Senior Site Reliability Engineer (80-100%)

About the role

You will play a key role in ensuring the reliability, stability, scalability and security of our Logging & Monitoring cloud systems and infrastructure. You will design, implement, and test highly automated solutions to shape a technology platform that fulfils our business and product vision and ultimately brings value to our customers through positive user experiences.

Key Responsibilities:

Take end-to-end responsibility, from development to production, for designing, deploying, operating, and continuously improving the performance and fault tolerance of large-scale multi-cloud solutions.
Ensure system security, data integrity, and high availability of the platform.
Establish and improve monitoring, logging, and alerting frameworks to detect and resolve issues promptly.
Keep up with technology trends and identify promising new solutions that meet our requirements.
Create technical support documentation and provide hands-on troubleshooting and consulting to our customers.

About the team

Our Logging & Monitoring squad develops and operates state-of-the-art logging, monitoring and event management platforms to collect application behaviour information, detect and limit service disruption, and provide the associated reporting capabilities. Our ambition is to empower developers and application and platform owners to identify any growing risks, have a clear understanding of their SLAs, reduce the mean time to resolution, and be ahead of the curve with regard to long-term trends.

About you

We are happy to meet you if you possess:

Experience in software development, continuous integration/deployment, and system engineering for large-scale, distributed cloud solutions.
Hands-on expertise in open-source application and infrastructure monitoring tools, e.g., ELK and/or TICK stack, Prometheus and Grafana.
Experience in distributed tracing with OpenTelemetry and in observability platforms.
Hands-on expertise in a container orchestration system such as Kubernetes, running in a hybrid cloud environment such as Azure and VMware.
Experience programming in one or more languages such as Go, Java, or Python, and in scripting languages (Shell or PowerShell).
Passion for sharing knowledge, through interactive sessions as well as documentation.
Strong analytical and problem-solving skills, as well as the ability to focus on details without losing track of the bigger picture.
Excellent oral and written English skills; additional language skills are a plus.

Nobody is perfect or meets 100% of our requirements. If you, however, meet some of the criteria above and are curious about the world of observability, we'll be more than happy to meet you!

About Swiss Re

Swiss Re is one of the world’s leading providers of reinsurance, insurance and other forms of insurance-based risk transfer, working to make the world more resilient. We anticipate and manage a wide variety of risks, from natural catastrophes and climate change to cybercrime. Combining experience with creative thinking and cutting-edge expertise, we create new opportunities and solutions for our clients. This is possible thanks to the collaboration of more than 14,000 employees across the world.

Our success depends on our ability to build an inclusive culture encouraging fresh perspectives and innovative thinking. We embrace a workplace where everyone has equal opportunities to thrive and develop professionally regardless of their age, gender, race, ethnicity, gender identity and/or expression, sexual orientation, physical or mental ability, skillset, thought or other characteristics. In our inclusive and flexible environment everyone can bring their authentic selves to work and their passion for sustainability.

Reference Code: 128052