Site Reliability Engineer
3 weeks ago
This isn't just about keeping the lights on; it's about building systems and solutions that thrive under extreme pressure.
You'll collaborate with feature teams to extract rich telemetry and structured event data from their code, architect and build highly distributed observability solutions that handle massive data ingestion, and develop tools that turn complex signals into actionable insights for technical and business stakeholders alike.
Your mission: to ensure our systems stay fast, scalable, and reliable—even when millions of bets are placed simultaneously during major international sporting events.
Whether it's a local derby in Europe, a championship game in North America, or a global tournament, your work will be critical to delivering seamless betting experiences across continents.
If you love solving complex and novel problems, architecting scalable and distributed systems, and being a key player in a high-stakes environment, we'd love to hear from you.
Responsibilities:
Architect, build, and maintain large-scale, distributed telemetry pipelines and observability platforms that provide real-time insight into system performance and reliability.
Design innovative solutions for telemetry challenges in our uniquely asynchronous and distributed ecosystem, ensuring high visibility across services.
Act as a subject matter expert, collaborating with development teams to optimise instrumentation, observability tooling, and reliability strategies.
Drive capacity planning and proactive performance optimisation, always pushing the envelope to anticipate and meet evolving business and customer needs.
Partner with teams to define and refine the four golden signals, service levels, and error budgets, ensuring we measure and improve critical user journeys and business impact.
Take ownership in high-stakes production incidents, leading deep-dive investigations and implementing long-term solutions to prevent future disruptions.
Develop, refine, and automate reliability-focused tooling, reducing toil and increasing engineering efficiency across the platform.
Skills and Experience:
Deep expertise in site reliability engineering concepts and practices.
Advanced knowledge of observability and telemetry data principles, with hands-on experience in designing and implementing solutions at scale.
Experience with Linux system administration and fundamentals.
Solid understanding of network fundamentals, with an emphasis on Layer 7 protocols such as HTTP, gRPC, DNS, and TLS.
Extensive experience with IaaS platforms, both cloud and on-prem.
Strong experience with containerisation principles, tooling and orchestration.
Proficiency in one or more of the following: Go, Python, C#, Node.js, or similar programming languages.
Strong grasp of CI/CD automation and Infrastructure as Code (IaC) principles.
Bonus Skills and Experience:
Experience in fintech-style operations.
Experience working in large-scale, low-latency asynchronous systems.
Hands-on experience with the OpenTelemetry ecosystem.
A keen interest in new technologies and industry trends in the SRE and observability space.
Strong analytical and troubleshooting skills, with a systematic approach to problem-solving.
Excellent verbal and written communication skills, with the ability to document systems, processes, and troubleshooting steps clearly.
Benefits:
We are in a fantastic new office near Barangaroo, close to Wynyard station.
Our office has a sports hub, if you want to challenge a mate to a game of table tennis or darts.
Fancy a good cup of coffee? We have an in-house barista to get you that perfect cup.
Many social events to take part in (Melbourne Cup is just one of them).
Great work-life balance and flexibility.
A continued commitment to employee development.
Life insurance and income protection plans.
Wellness benefits.
-
Site Reliability Engineer
5 days ago
Sydney, New South Wales, Australia NXTGIG Full time
Site Reliability Engineer
NXT GIG is seeking a dedicated Site Reliability Engineer (SRE) to join our dynamic team and play a crucial role in ensuring the reliability and performance of our systems and applications. As an SRE, you will be responsible for building and maintaining our infrastructure, developing automation solutions, and monitoring system health....
-
Site Reliability Engineer
9 hours ago
Sydney, New South Wales, Australia NXTGIG Full time
Site Reliability Engineer
NXT GIG is seeking a dedicated Site Reliability Engineer (SRE) to join our dynamic team and play a crucial role in ensuring the reliability and performance of our systems and applications. As an SRE, you will be responsible for building and maintaining our infrastructure, developing automation solutions, and monitoring system health....
-
Site Reliability Engineer
1 week ago
Sydney, New South Wales, Australia Tribusgrp Full time
Site Reliability Engineer
A leading global hedge fund is seeking a Site Reliability Engineer to drive the performance, scalability, and automation of mission-critical trading systems. In this high-impact role, you'll work at the intersection of software and systems engineering, ensuring the seamless operation of sophisticated financial applications. Key...
-
Site Reliability Engineer
5 days ago
Sydney, New South Wales, Australia ClearRoute Full time
Join to apply for the Site Reliability Engineer (Quality Cloud Engineer) role at ClearRoute.
About Us: We are an engineering consultancy bridging Quality Engineering, Cloud Platforms and Developer Experience. Our values challenge us to do the best we can for ClearRoute, our Customers and most importantly our team. We want to create a collaborative team to help...
-
Site Reliability Engineer
2 weeks ago
Sydney, New South Wales, Australia Microsoft Full time
Overview: Are you interested in working on one of Microsoft's most exciting products? Are you passionate about exceeding customer expectations and advancing Microsoft's cloud-first strategy? If so, the Azure Customer Experience (CXP) Customer Reliability Engineering (CRE) Team is the place for you. Azure CXP CRE is a top-level pillar of Azure Engineering that...
-
Sr Site Reliability Engineer
1 week ago
Sydney, New South Wales, Australia Cisco Systems, Inc. Full time
The Site Reliability Engineering (SRE) team at Duo, a part of Cisco, plays a crucial role in maintaining the reliability, availability, and performance of Duo's security services. They are responsible for ensuring service reliability by implementing robust monitoring and alerting systems to proactively detect and address issues. The team leads incident...
-
Sr Site Reliability Engineer
6 days ago
Sydney, New South Wales, Australia Cisco Systems Full time
Meet The Team
The Site Reliability Engineering (SRE) team at Duo, a part of Cisco, plays a crucial role in maintaining the reliability, availability, and performance of Duo's security services. They are responsible for ensuring service reliability by implementing robust monitoring and alerting systems to proactively detect and address issues. The team leads...
-
Sr Site Reliability Engineer
1 week ago
Sydney, New South Wales, Australia Cisco Systems Full time
Meet The Team
The Site Reliability Engineering (SRE) team at Duo, a part of Cisco, plays a crucial role in maintaining the reliability, availability, and performance of Duo's security services. They are responsible for ensuring service reliability by implementing robust monitoring and alerting systems to proactively detect and address issues. The team leads...
-
Sr Site Reliability Engineer
5 days ago
Sydney, New South Wales, Australia Cisco Systems, Inc. Full time
The Site Reliability Engineering (SRE) team at Duo, a part of Cisco, plays a crucial role in maintaining the reliability, availability, and performance of Duo's security services. They are responsible for ensuring service reliability by implementing robust monitoring and alerting systems to proactively detect and address issues. The team leads incident...
-
Site Reliability Engineer
2 days ago
Sydney, New South Wales, Australia Paxus - Technology + Digital Talent Full time
Our client is delivering a major large-scale cloud infrastructure project and is looking for an experienced Site Reliability Engineer (SRE) to join the team. This role is focused on optimising, automating, and scaling critical cloud infrastructure to support real-time distributed systems at high scale. You will be responsible for ensuring reliability,...
-
Senior Site Reliability Engineer
5 days ago
Sydney, New South Wales, Australia Talent Insights Group Full time
Associate Director - Data Science, Data Engineering, Machine Learning, Platforms and CloudOps at Talent Insights Group
Talent Insights is looking to discuss a new Senior Site Reliability Engineer position, with a focus on Security, working full-time with a Technology organisation based in Sydney. They pride themselves on driving innovation and pushing the...
-
Site Reliability Engineer
5 days ago
Sydney, New South Wales, Australia Nuage Technology Group Full time
Nuage Technology Group provided pay range
This range is provided by Nuage Technology Group. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
Base pay range: A$180.00/yr - A$210.00/yr
Nuage has partnered with award-winning large enterprises to hire multiple Site Reliability Engineers (SRE) to help...
-
Site Reliability Engineer
6 days ago
Sydney, New South Wales, Australia Paxus - Technology + Digital Talent Full time
Our client is delivering a major large-scale cloud infrastructure project and is looking for an experienced Site Reliability Engineer (SRE) to join the team. This role is focused on optimising, automating, and scaling critical cloud infrastructure to support real-time distributed systems at high scale. You will be responsible for ensuring reliability,...
-
Site Reliability Engineer
2 days ago
Sydney, New South Wales, Australia Paxus Full time
This range is provided by Paxus. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
Base pay range: A$900.00/daily - A$1,000.00/daily
Our client is delivering a major large-scale cloud infrastructure project and is looking for an experienced Site Reliability Engineer (SRE) to join the team. This role is...
-
Site Reliability Engineer
3 weeks ago
Sydney, New South Wales, Australia Nuage Technology Group Full time
Nuage Technology Group provided pay range
This range is provided by Nuage Technology Group. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
Base pay range: A$180.00/yr - A$210.00/yr
Nuage has partnered with award-winning large enterprises to hire multiple Site Reliability Engineers (SRE) ...
-
Site Reliability Engineer
6 days ago
Sydney, New South Wales, Australia Paxus Full time
This range is provided by Paxus. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
Base pay range: A$900.00/daily - A$1,000.00/daily
Our client is delivering a major large-scale cloud infrastructure project and is looking for an experienced Site Reliability Engineer (SRE) to join the team. This role is focused...
-
Site Reliability Engineer
1 week ago
Sydney, New South Wales, Australia Kindred Group plc Full time
As a Site Reliability Engineer, you'll be at the heart of ensuring the resilience and observability of our Sports Book Platform. This isn't just about keeping the lights on; it's about building systems and solutions that thrive under extreme pressure. You'll collaborate with feature teams to extract rich telemetry and structured event data from their code,...
-
Site Reliability Engineer
2 weeks ago
Sydney, New South Wales, Australia TikTok Full time
About TikTok Data Security
TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. Data Security ("USDS") is a subsidiary of TikTok. This new, security-first division was created to bring heightened focus and governance to our data protection policies and content assurance protocols to keep users...
-
Site Reliability Engineer
2 weeks ago
Sydney, New South Wales, Australia ClearRoute Full time
About Us
We are an engineering consultancy bridging Quality Engineering, Cloud Platforms, and Developer Experience. Our values challenge us to do the best we can for ClearRoute, our Customers, and most importantly our team. We want to create a collaborative team to help build ClearRoute. This is an opportunity for you to build a consultancy from the ground...
-
Site Reliability Engineer
4 weeks ago
Sydney, New South Wales, Australia Cisco Systems, Inc. Full time
Site Reliability Engineer - Expressions of Interest
Location: Sydney, Australia
Alternate Location: Australia, Singapore, New Zealand
Area of Interest: Engineer - Software
Job Type: Professional
Job Id: 1436249
Duo Security, now a part of Cisco, is the leading provider of Trusted Access security and multi-factor authentication delivered through the...