Senior Site Reliability Engineer, Splunk Observability - remote Spain
Splunk Inc
Madrid, Spain
3 days ago

Join us as we pursue our disruptive new vision to make machine data accessible, usable and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers.

At Splunk, we’re committed to our work, our customers, having fun and, most importantly, each other’s success. Learn more about Splunk careers and how you can become a part of our journey!

The Splunk Observability Suite is a new generation of cloud applications for microservices and distributed applications.

We work on new, world-class tools to monitor and observe microservice-based applications and are one of the main drivers of the emerging OpenTelemetry CNCF project.

Site Reliability Engineers for Splunk APM are hybrid Software / Systems Engineers whose overarching goal is to ensure that production services are always up and running reliably.

As a Site Reliability Engineer, you will help us operationalize one of the largest and most sophisticated cloud-scale, big data systems in the world.

You will be responsible for improving the operational efficiency, resource utilization and resiliency of a real-time streaming analytics platform.

You will own some of the open-source software that our platform relies on, and be a core participant in every significant engineering effort underway.

Our engineering teams are small, fast-paced and highly impactful. We are looking for experienced engineers who are passionate about the quality of their work and would like to work on impactful projects.

We will provide a work environment where you have clear deliverables, are empowered to do a great job and will be recognized for your achievements.

You will have the opportunity to work with highly entrepreneurial colleagues on a team that retains its startup DNA.

You may also choose to share and talk about your work through channels like conferences, meetups, and blogs.

Responsibilities

  • Automate and operationalize cloud provider infrastructure using Terraform as well as Kubernetes, Helm and Istio.
  • Monitor capacity & utilization and work closely with the infrastructure team to orchestrate scale-up / down of backend services.
  • Own & operate critical back-end open-source services like Cassandra, Kafka, and Zookeeper.
  • Build tools and design processes that help improve observability and system resiliency.
  • Triage site availability incidents and proactively work towards reducing MTTR for customer-impacting incidents.
  • Partner with service owners to implement service level metrics & service level objectives that act as service-level health indicators.
  • Establish design patterns for monitoring, benchmarking and deploying new features for the backend services.

Requirements

  • Coding experience in one or more of Python, Bash, Go or Java.
  • Infrastructure-as-code experience with one or more of Terraform, Ansible, Puppet or Salt.
  • Strong experience with modern application development workflows and version control platforms like GitHub, GitLab or Bitbucket.
  • Strong working knowledge of Docker containers and cloud platforms (AWS, GCP and / or Azure).
  • Strong working knowledge of orchestration engines and package management, including Kubernetes, Helm, and Istio.
  • Experience operating one or more OSS technologies like Kafka, Cassandra, Zookeeper; experience with other backends and streaming systems is a plus.
  • Extensive understanding of Unix / Linux systems from kernel to shell and beyond (system libraries, file systems, and client-server protocols).
  • 5+ years of experience as a Site Reliability Engineer, Production Engineer or Backend Software Engineer for web-scale or similar platforms.
  • BS degree in Computer Science or a related technical field, or equivalent practical experience.

We value diversity at our company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.

Thank you for your interest in Splunk!
