Observability & Resilience Specialist (SRE)

1 week ago


Sydney, New South Wales, Australia WooliesX Full time

**Observability & Resilience Specialist (SRE)**
- **Permanent role**
- **Surry Hills based with WFH flexibility**
- **Be part of exciting digital growth**

**About WooliesX**
As a start-up business inside one of Australia's largest retailers, WooliesX aims to bring the best of Woolworths to our customers, powered by our team, technology and data. We're an innovation business that brings together the brightest minds in e-commerce, technology, media, and data to transform the way people live and shop.
With an industry-leading technology team, backed by analytics, we're resourceful and willing to experiment. Our agile teams are empowered to innovate and deliver an awesome experience for our customers, whether they choose to shop in-store or online.

**About the Role**
We are looking for talented team members who are passionate about observability, event intelligence and providing service insights and solutions that enable the availability, scalability and reliability of our Always On customer-facing technology.

You will provide the metrics and framework that drive the performance, availability and resilience of the platform by driving standardisation, supporting our teams, cultivating excellent operational practices and identifying automation opportunities.

You will have the unique opportunity to help shape the future of observability and resilient systems at WooliesX. You will work with talented and diverse teams and technology, gaining a deep understanding of the architecture of our systems and surfacing meaningful insights that drive better outcomes.

**Responsibilities**
- Drive service reliability and engineering excellence by implementing and maintaining tooling that surfaces metrics using SLIs, SLOs, and SLAs
- Drive and maintain event intelligence best practices, working with teams to measure outcomes
- Collaborate closely across teams to shape the future roadmap, improve reliability and establish strong operational readiness
- Identify areas for improvement across WooliesX and drive technical change to automate operational outcomes, maximising reliability and minimising recovery time.
- Share your knowledge by giving brown bags, tech talks, and evangelising technology and best practices.
- Contribute to Root Cause Analysis (RCA) investigations and implement appropriate APM solutions (such as Dynatrace) and automation as necessary.
- Roll out new tools, technologies and processes that have high business impact, are used by multiple teams, and improve reliability and velocity.
- Contribute to documentation and uplifting of teams.
- Engage and collaborate with senior leadership and business stakeholders

**Requirements**

- Current hands-on experience in Site Reliability or Observability engineering
- Prior experience in implementing SRE/Observability capability in a large-scale organisation.
- A solid level of understanding of Observability & Event Intelligence best practice
- Experience implementing Dynatrace across a diverse technology stack, and the ability to identify edge cases, failure modes, erroneous behaviour and specific implementations.
- Strong scripting/programming experience with at least one of the following languages is beneficial: .NET, Python, Java, Go, C# or similar
- Hands-on experience with cloud infrastructure (AWS, GCE, Azure, Kubernetes, Docker).
- Formal certification in one of the following is beneficial: AWS, GCE, Azure, Kubernetes, Docker
- Experience implementing circuit breakers, resilience frameworks, fault tolerance and self-healing mechanisms for services.
- Strong organisational and interpersonal skills, with experience developing and instilling a culture of operational maturity
- Systematic problem-solving approach, coupled with effective communication skills and a sense of ownership and drive.

**Grow with the Group**
As an inclusive, team-first company, our people are at the core of everything we do.

We care deeply about creating a workplace where our team members feel valued, respected and empowered. We are committed to providing equal opportunity regardless of gender identity, ethnicity, disability, sexual orientation or life stage. We are proud to be recognised as a Gold Tier Employer in the Australian Workplace Equality Index for LGBTQ+ inclusion and as an Employer of Choice for Gender Equality by the Workplace Gender Equality Agency.

As our Group continues to evolve, innovate and support our communities, we encourage our team members to do the same with their own careers, by providing ongoing opportunities to grow and make a real difference.

We value flexibility, and encourage our team members to work in ways that meet their work/life commitments and support their wellbeing.

We work hard to create a safe and inclusive environment for all, and most importantly, we're all about creating better experiences - for our customers and for each other.

**We'd love to hear from you**
Find your


