When your system is reliable, your teams are freed up to focus on innovation.
SRE - with a sustainable, scalable and subscription-based twist
We inspect code, design, implementation, and operational procedures to find reliability gaps in your system.
We set up objectives, indicators, monitoring and observability. We also run workshops on SLIs/SLOs.
If critical blockers are found, we address them with a Production Readiness Upgrade to expedite improvements.
On-call incident response, fire drills and chaos testing help you meet reliability objectives, address root causes and protect your capacity to innovate.
While you can’t avoid the risk that comes with innovation, you can manage it.
Create self-healing systems and keep your engineering teams inspired.
Want reliable systems? Create chaos. Get technical insight on exactly how.
Find the right tool for the job and you’ve solved half the problem.
Patterns don’t lie. Our Ops patterns help you see where you are and what step to take next.
Fire drills train engineers and reveal the ‘flexibility’ and resilience in your processes.
(Spoiler) Simulated chaos prepares you for real and inevitable incidents.
WTF is SRE? is back on 28 April 2022! This is the second annual free, full-day conference designed by site reliability engineers, design thinkers and people with ‘innovation’ in their job descriptions.
Register for talks on site reliability engineering, DevSecOps, observability, reliability and working with complex distributed systems at scale.