Business Critical Reliability Engineer Specialist
Location: Madrid, Spain
Category: Information Technology
Job Id: 202203-111592
Job Description
Reliability Engineering (RE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.
RE ensures that Roche’s services, both our internally critical and our externally visible systems, have reliability and uptime appropriate to users' needs and a fast rate of improvement.
Additionally, REs keep an ever-watchful eye on the capacity and performance of the system. Much of their work focuses on building infrastructure, optimizing existing systems, and eliminating work through automation.
A Reliability Engineer is an infrastructure engineer who knows how to apply engineering principles to operations. They are well versed in a large number of technologies and welcome new tools and techniques.
They work in conjunction with fellow engineering and operations members to come to the best possible solution. They are always looking for patterns and ways to increase efficiency, eliminate downtime, optimize costs, and maintain performance at scale.
They will also advise our consumers on the RE value proposition, adoption, industry best practices, and implementation strategy.
REs are responsible for the big picture of how the systems relate to each other, using a breadth of tools and approaches to solve a broad spectrum of problems.
Practices such as limiting time spent on operational work, blameless postmortems, and proactive identification of potential outages factor into iterative improvement that is key to both product quality and interesting and dynamic day-to-day work.
RE teams will have the opportunity to manage the complex challenges of scale which are unique to Roche while using expertise in coding, algorithms, complexity analysis, and large-scale system design.
RE's culture of diversity, intellectual curiosity, problem-solving, and openness is key to its success. It brings together people with a wide variety of backgrounds, experiences, and perspectives.
They are encouraged to collaborate, think big, and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.
REs provide on-call support to keep systems up and running, ensuring the consumers have the best and fastest experience possible.
Contribute to activities focused on availability, tuning, performance, efficiency, change management, monitoring, emergency response and capacity planning.
Engage in and improve, under guidance, the whole lifecycle of services from inception and design through deployment, operation and refinement.
Collaborate in the creation of a bridge between engineering and operations by applying a software engineering mindset to system administration topics.
Monitor and resolve incidents and problems with platform operations, suggesting priorities and collaborating in the resolution when required.
Contribute to supporting services before they go live through activities such as infrastructure design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
Collaborate with Managed Services suppliers and external consultancy, ensuring the collaboration is as effective as possible.
Scale systems sustainably through mechanisms like automation, and evolve systems by proposing changes that improve reliability and velocity.
Contribute to the maintenance of services once they are live by measuring and monitoring availability, latency and overall system health.
Look for continuous improvement opportunities across technical, teamwork, collaboration and process areas, and propose and contribute to the resulting activities.
Act as an analyst by transforming customer needs into specific requirements to be implemented in components managed by the team or by other teams.
Remain proactive and aware of operational challenges and opportunities and work with support team staff to resolve incidents and major incidents.
Ensure implemented solutions and components comply with Quality / Regulatory standards, as applicable.
Job Requirements / Qualifications
Good interpersonal skills.
Demonstrated customer & delivery focus.
Well-proven scripting and automation skills, with strong knowledge of delivering and managing infrastructure as code.
Ability to work effectively with team members and virtual teams from different locations and different cultural backgrounds.
Ability to function independently with minimal supervision and navigate ambiguity.
Strong problem-solving and decision-making skills.
Good oral and written communication skills in English. German, Spanish or Chinese (Mandarin) are significant pluses.
Moderate travel (20%) required and ability to work across multiple time zones, including on-call.
Education / Years of Experience
4-7 years of relevant work experience
or 2-5 years with a Bachelor’s degree
or 1-3 years with a Master’s degree
At least 1 year of experience working in one or more multinational work environments as a senior systems or software engineer (healthcare industry experience is a plus).
Hands-on technical skills in automation, infrastructure as code, logging, monitoring and observability, infrastructure configuration, scripting languages, and applications.
Knowledge about working with Infrastructure Systems internals, their administration, and networking.
Knowledge about applying design thinking, lean, prioritization, and agile methodologies to evolve services offered to partners.
Knowledge about the definition of technical computing infrastructure entirely under the control of software with no operator or human intervention.
Knowledge about defining Service Level Objectives and Service Level Indicators.
Knowledge about DevOps mindset, processes, and tools.
Cross-functional technical knowledge, tools, scripting, and methodologies for: configuration management, infrastructure as code, automation design, infrastructure development life cycle, and hybrid clouds.
Knowledge about algorithms, data structures, complexity analysis, and software design.
Knowledge about Microsoft-based technologies, both on-premises and in the cloud.
Knowledge about PowerShell DSC and scripting languages.
Who we are
At Roche, more than 100,000 people across 100 countries are pushing back the frontiers of healthcare. Working together, we’ve become one of the world’s leading research-focused healthcare groups.
Our success is built on innovation, curiosity and diversity.
Roche is an Equal Opportunity Employer.
Job Level :