Senior Site Reliability Engineer

About the Role

You will play a key role in ensuring the reliability, stability, scalability and security of our Logging & Monitoring cloud systems and infrastructure. You will design, implement, and test highly automated solutions to shape a technology platform that fulfils our business and product vision, ultimately bringing value to our customers through positive user experiences.

Key Responsibilities:

Take end-to-end responsibility, from development to production, for designing, deploying, operating, and continuously improving the performance and fault tolerance of large-scale multi-cloud solutions.
Ensure system security, data integrity, and high availability of the platform.
Establish and improve monitoring, logging, and alerting frameworks to detect and resolve issues promptly.
Keep up with technology trends and identify promising new solutions that meet our requirements.
Create technical support documentation and provide hands-on troubleshooting and consulting to our customers.

About the Team

Our Logging & Monitoring squad develops and operates state-of-the-art logging, monitoring and event management platforms to collect application behaviour information, detect and limit service disruptions, and provide the associated reporting capabilities. Our ambition is to empower developers and application and platform owners to identify growing risks, clearly understand their SLAs, reduce the mean time to resolution and stay ahead of the curve on long-term trends.

About you

We are happy to meet you if you possess:

Hands-on expertise in container orchestration systems such as Kubernetes, running in a hybrid cloud environment such as Azure.
Experience in continuous integration and deployment, and in systems engineering for large-scale, distributed cloud solutions.
Programming experience in one or more languages such as Go, Java or Python, and in scripting languages (Shell or PowerShell).
Hands-on expertise in open-source application and infrastructure monitoring tools, e.g., the ELK and/or TICK stack, Prometheus and Grafana.
Passion for sharing knowledge, through interactive sessions as well as documentation.
Strong analytical and problem-solving skills, as well as the ability to focus on details without losing track of the bigger picture.
Excellent oral and written English skills; additional language skills are a plus.

Nobody is perfect and meets 100% of our requirements. If you, however, meet some of the criteria above and are curious about the world of observability, we'll be more than happy to meet you!

About Swiss Re

Swiss Re is one of the world’s leading providers of reinsurance, insurance and other forms of insurance-based risk transfer, working to make the world more resilient. We anticipate and manage a wide variety of risks, from natural catastrophes and climate change to cybercrime. We cover both Property & Casualty and Life & Health. Combining experience with creative thinking and cutting-edge expertise, we create new opportunities and solutions for our clients. This is possible thanks to the collaboration of more than 14,000 employees across the world.

Our success depends on our ability to build an inclusive culture encouraging fresh perspectives and innovative thinking. We embrace a workplace where everyone has equal opportunities to thrive and develop professionally regardless of their age, gender, race, ethnicity, gender identity and/or expression, sexual orientation, physical or mental ability, skillset, thought or other characteristics. In our inclusive and flexible environment everyone can bring their authentic selves to work and their passion for sustainability.

Reference Code: 130498
