
Staff Site Reliability Engineer Australia
5 days ago
Aerospike is the real-time database for mission-critical use cases and workloads, including machine learning, generative, and agentic AI. Aerospike powers millions of transactions per second with millisecond latency, at a fraction of the total cost of ownership compared to other databases.
Global leaders, including Adobe, Airtel, Barclays, Criteo, DBS Bank, Experian, Grab, HDFC Bank, PayPal, Sony Interactive Entertainment, The Trade Desk, and Wayfair, rely on Aerospike for customer 360, fraud detection, real-time bidding, profile stores, recommendation engines, and other use cases.
At Aerospike, we dream big and deliver even bigger. Our mission is to unleash the power of the world's real-time data with a database built for infinite scale, speed, and sustainability.
If you're ready to shape the future of data, join us.
Staff Site Reliability Engineer
As a Staff Site Reliability Engineer at Aerospike, you'll be a technical leader within our global SRE organization, helping drive reliability, performance, and scalability across our hybrid and multi-cloud environments. You'll bring deep operational experience and lead by example—mentoring others, designing resilient systems, and championing modern SRE practices across new and legacy platforms.
You'll play a key role in shaping the direction of our infrastructure initiatives, from Kubernetes-based platforms like AKS and the Aerospike Kubernetes Operator to existing services in AWS and GCP. Your impact will span teams and systems as you solve complex problems, influence architecture, and foster a culture of ownership, resilience, and continuous improvement.
Key Responsibilities
- Provide technical leadership across multiple systems and environments, proactively identifying risks, shaping architecture decisions, and improving reliability and performance at scale.
- Lead key infrastructure efforts including Kubernetes platform expansion (AKS, AKO), and application of SRE principles to legacy systems and new cloud offerings.
- Define, measure, and enforce reliability standards through SLIs/SLOs, observability tooling, and incident response frameworks.
- Mentor and guide other SREs by leading design sessions, conducting technical deep dives, and reviewing code, configurations, and infrastructure decisions.
- Partner with product, engineering, and cloud teams to align reliability goals with delivery objectives.
- Lead root cause analyses and implement systemic fixes for issues spanning multiple platforms or services.
- Drive automation-first approaches using IaC, CI/CD pipelines, and scripting to reduce toil and increase deployment confidence.
- Influence cross-functional roadmaps, identifying areas for innovation, technical debt reduction, and long-term scalability.
- Participate in the global on-call rotation, bringing senior-level calm and clarity during incidents and escalations.
- 8+ years of experience in SRE, DevOps, or infrastructure engineering, including significant time operating production systems at scale.
- Deep hands-on experience with at least one major public cloud (AWS, GCP, Azure), and working knowledge of the others; Azure experience is a plus.
- Production experience with Kubernetes, including operating clusters, Helm, operators, and supporting microservices in real-world environments.
- Strong proficiency in infrastructure-as-code tools such as Terraform and CI/CD automation platforms.
- Expertise in observability tools and practices (Datadog, Prometheus, Grafana, ELK, etc.) and using them to define SLIs and SLOs; Datadog experience is a plus.
- Programming and scripting ability in one or more languages (Python, Go, Bash, etc.).
- Experience with large-scale incident response and post-incident review practices.
- Proven ability to mentor other engineers and influence technical strategy across multiple teams.
- Strong communication skills to articulate complex concepts to technical and non-technical stakeholders.
- Hands-on experience managing and optimizing database deployments and services in production environments, ensuring high availability and performance.
- Familiarity with Aerospike or other distributed databases is a plus.
- Kubernetes or cloud certifications (CKA, CKS, AWS/GCP DevOps/Architect) are a plus but not required.
- Track record of influencing architectural decisions across teams or domains.
Aerospike is an Equal Opportunity Employer. We are committed to providing an environment free from discrimination on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status, or any other basis covered by appropriate law.