When your system is reliable, your teams are freed up to focus on innovation.
SRE - with a sustainable, scalable and subscription-based twist
We inspect code, design, implementation, and operational procedures to find reliability gaps in your system.
We set up objectives, indicators, monitoring and observability. We also run workshops on SLIs/SLOs.
If critical blockers are found, we address them with a Production Readiness Upgrade to expedite improvements.
We provide on-call incident response, fire drills and chaos testing to meet reliability objectives, address root causes and protect your capacity to innovate.
While you can’t avoid the risk that comes with innovation, you can manage it.
Create self-healing systems and keep your engineering teams inspired.
Want reliable systems? Create chaos. Get technical insight on exactly how.
Find the right tool for the job and you’ve solved half the problem.
Patterns don’t lie. Our Ops patterns help you see where you are and what step to take next.
Fire drills train engineers and reveal ‘flexibility’ and resilience in your processes.
(Spoiler) Simulated chaos prepares you for real and inevitable incidents.
#WTFisSRE is over. We started some fires, put them out, and learned that we should start a few more if we want to create reliable systems. Thank you for helping build a stronger community and sharing lessons for a more reliable world.
Want to grab a recap of your favourite talk?
Or passively create FOMO for the colleague who ‘just doesn’t do virtual conferences’?
Check out our videos!