Manager, Engineering - Core Storage Platform Automation
Datadog
Spain; Lisbon
6 days ago

About Datadog :

We're on a mission to build the best platform in the world for engineers to understand and scale their systems, applications, and teams.

We operate at high scale (trillions of data points per day), allowing for seamless collaboration and problem-solving among Dev, Ops, and Security teams globally for tens of thousands of companies.

Our engineering culture values pragmatism, honesty, and simplicity to solve hard problems the right way.

The Opportunity :

Datadog is going through a transformation from being focused on the three pillars of observability (logs, metrics, traces) to having products that span modern enterprises' needs: Security Monitoring, Real User Monitoring, and Synthetic Monitoring, with many more planned.

At the heart of the massive amount of streaming data generated by these systems are our Core Storage platforms. These platforms comprise tens of thousands of Kubernetes pods, petabytes of data, and complex storage and alerting technologies powering highly available, distributed, multi-tenant solutions for data ingestion, processing, and storage, and for serving data queries.

Our software stack is deployed in multiple geo regions and across the three major public cloud infrastructure providers.

Given the technical complexity and Datadog’s breakneck pace of growth, operating the storage solutions requires a non-trivial engineering effort that is hard to sustain.

We are looking for an experienced leader to spearhead the platform automation efforts in Core Storage. As the Engineering Manager, Core Storage Platform Automation, you will manage multiple teams embedded in, and working side by side with, the platform development teams.

The team will measure and reduce operational toil by devising advanced heuristics codified in automation workflows. Your organization will leverage infrastructure provided by other Datadog internal teams and will develop platform-specialized tooling, manageability solutions integrated with our platforms, automation components, and more.

You will staff and grow the team with talent from software development and SRE backgrounds. The mission of the Platform Automation team will be to eliminate, through software automation, 75% of all operational activities requiring human involvement. These include, but are not limited to, release deployments, failure remediation, long-running maintenance, and business processes.

You Will :

  • Co-author our Platform Automation strategy and liaise with the Infrastructure and the central SRE teams
  • Lead the team in designing and implementing the Core Storage Platform Automation solutions; scale those up and deliver advanced new capabilities such as improved auto-scaling, cost reductions, real-time error triangulation, and automated repairs, to name a few
  • Own mission-critical automation infrastructure and services
  • Work with the latest open-source frameworks and technologies, including Kafka, Airflow, Java, Go, and SQL, across AWS, GCP, and Azure
  • Manage the Team Leads and engineers, ensuring they deliver high quality, timely work and that they’re happy and motivated
  • Recruit and grow our teams in the US and Europe

You Are :

  • You have 7+ years of industry experience in software development and/or operating large-scale distributed systems and platforms, including 4+ years as a manager and at least 2 as a second-level manager
  • You’ve built / led / operated Internet-scale, high-performance distributed systems in production for 5+ years and know those systems from top to bottom
  • You are an excellent communicator, self-driven, and are excited to build best-in-industry platform automation solutions
  • You have a BS / MS / PhD in a scientific field or equivalent experience
  • You want to work in a fast-paced, rapid-growth environment where strong relationships in a good culture matter more than process

Bonus Points :

  • You have expertise with observability systems
  • You have expertise with datacenter automation, cluster management, runtime error triangulation, workflow automation or similar technologies

Why You Should Apply :

  • Generous and competitive global and US benefits
  • New hire stock equity (RSUs) and employee stock purchase plan
  • Continuous career development and pathing opportunities
  • Product training to develop an in-depth understanding of our product and space
  • Best-in-breed onboarding
  • Cross-departmental internal mentor and buddy program
  • Friendly and inclusive workplace culture

Equal Opportunity at Datadog :

Datadog is an Affirmative Action and Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

We also consider qualified applicants regardless of criminal histories, consistent with legal requirements.

Your Privacy :

Any information you submit to Datadog as part of your application will be processed in accordance with Datadog’s Applicant and Candidate Privacy Notice.
