Principal Site Reliability Engineer, Splunk Observability
Splunk
Barcelona, Spain
6 days ago

PRINCIPAL SITE RELIABILITY ENGINEER - OBSERVABILITY

Join us as we pursue our disruptive new vision to make machine data accessible, usable and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers.

At Splunk, we’re committed to our work, customers, having fun and most importantly to each other’s success. Learn more about Splunk careers and how you can become a part of our journey!

The Splunk Observability Suite is a new generation of cloud applications for microservices and distributed applications.

We work on new, world-class tools to monitor and observe microservice-based applications. Site Reliability Engineers at Splunk are hybrid Software / Systems Engineers whose overarching goal is to ensure that production services are always up and running reliably.

As a Principal Site Reliability Engineer, you will help us run one of the largest and most sophisticated cloud-scale, big data systems in the world.

You will be responsible for improving the operational efficiency, resource utilization and resiliency of a real-time streaming analytics platform.

You are passionate about automation, infrastructure-as-code, and getting rid of tedious, manual tasks.

Responsibilities

  • Automate and operationalize cloud provider infrastructure via Terraform, Kubernetes, Helm and Istio.
  • Monitor capacity and utilization, and work closely with the infrastructure team to orchestrate scaling backend services up and down.
  • Own and operate critical open-source back-end services such as Cassandra, Kafka, Elasticsearch, MongoDB and ZooKeeper.
  • Build tools and design processes that help improve observability and system resiliency.
  • Triage site availability incidents and proactively work towards reducing MTTR for customer-impacting incidents.
  • Implement service-level metrics and service-level objectives that act as indicators of service health.
  • Establish design patterns for monitoring, benchmarking and deploying new features for the backend services.

Requirements

  • Strong coding experience in one or more of Python, Go or Java.
  • Infrastructure as code experience within one or more of Terraform, Ansible, Puppet or Salt.
  • Strong experience with modern application development workflows and version control platforms such as GitHub, GitLab or Bitbucket.
  • Strong working knowledge of Docker containers and cloud platforms (AWS, GCP and / or Azure).
  • Strong working knowledge of orchestration engines and package management, including Kubernetes, Helm and Istio.
  • Experience operating one or more OSS technologies such as Kafka, Cassandra or ZooKeeper; experience with other backends and streaming systems is a plus.
  • Extensive understanding of Unix / Linux systems from kernel to shell and beyond (system libraries, file systems, and client-server protocols).
  • 12+ years of experience as a Site Reliability Engineer, Production Engineer or Backend Software Engineer for web-scale or similar platforms.
  • BS degree in Computer Science or a related technical field, or equivalent practical experience.

We value diversity at our company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.

For job positions in San Francisco, CA, and other locations where required, we will consider for employment qualified applicants with arrest and conviction records.

Thank you for your interest in Splunk!
