Rancher Cloud Native Accelerator for Operational Excellence

What problem are we solving?

  • Reduced effectiveness in Day 2 operations from repetitive manual tasks
  • Lack of predictability and reliability in platform engineering tasks
  • Changes are either not tracked or not transparent due to the team taking a “ClickOps” approach
  • Infrastructure teams lose track and control of multiple clusters across environments
  • Capability gaps in supporting Cloud Native Software Development Life Cycle (SDLC)

When we’re done, you get:

  • Effective Rancher Day 2 operations based on Rancher Management, with improved automation, observability, security, and networking
  • Increased productivity for Delivery teams and Platform teams
  • Advanced life cycle management for Kubernetes clusters at scale
  • Less toil and greater efficiency through automation
  • Documentation and delivery review for complete knowledge transfer and handover

Who benefits the most?

  • Companies that experience pain dealing with multiple cluster versions, cluster upgrades, and ongoing maintenance
  • Companies that want to mature their SDLC process to match Cloud Native technologies
  • Companies that want to scale their Rancher Kubernetes setup to operate reliable systems, upskill their teams, and continuously innovate using advanced Cloud Native tools and processes

How does it work? - the delivery

Phase 1: Review Rancher installation

You work with a balanced team of engineers, architects, and consultants to plot where you are, define your desired end state, and methodically plan the path to achieve it.

  • Compare the current state of the Rancher installation against best practices
  • Review the level of automation
  • Track the performance of third-party integrations: What are they supposed to do? Do they deliver the intended results?
  • Inspect security posture: role-based access control; encryption for critical data; secret management solution; authentication & authorisation; network security
  • Review observability approach: Are Cloud Native observability tools installed with centralised logs, metrics collection, and an effective alert management system?
  • Evaluate transparency of operational strategy
  • Assess operational practices and procedures
  • Review architecture: How does Rancher interact with the rest of the architecture?
  • Organisation: How are other teams working with the Rancher platform ops team?
  • Identify and prioritise work items for phases 2 and 3 together with the customer
  • Report on the review and work items for active stakeholder management
  • Build a prioritised list of work items for the enhanced MVP phase

Phase 2: Build enhanced MVP to stabilise the situation

This is where the change begins. A multifunctional team of engineers implements the designed solutions, integrations, and architectures together with your team.

  • Upgrade the installation following Rancher best practices
  • Impact engineering: Resolve high-priority blockers with MVP-style installation for cloud vendors, identity providers, and other CNCF tools
  • Finalise designs for updated Cloud Native architecture
  • Finalise the design of the infrastructure-as-code (IaC) approach
  • Immersive knowledge sharing: training sessions, pair programming, workshops, hackathons
  • Agree on a prioritised list of work items for the Cloud Native day 2 operations phase
  • Active stakeholder management: Presentation of solution designs & upgrades

Phase 3: Scale implementation to enable day 2 operational excellence

We implement the designed solutions, integrations, and architecture together with your team.

  • Improve authentication with identity provider configuration and authorisation using Role-based Access Control (RBAC) through built-in Rancher functionality
  • Establish Kubernetes security best practices according to CIS benchmarking, with configuration settings like node hardening and encryption of etcd
  • Improve container security by implementing vulnerability monitoring with tools like Trivy or NeuVector and implementing runtime security with Falco or NeuVector


  • Improve metrics collection to increase visibility across the environment
  • Implement alert management to respond to incidents more proactively
  • Enable log centralisation through built-in Rancher functionality or open-source tools like the ELK Stack and Grafana Loki, to allow practical insights from a single pane of glass


  • Apply horizontal pod and cluster autoscaling
  • Right-size instances and configure application resources to optimise the utilisation of resources in the environment
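
To illustrate what horizontal pod autoscaling does, here is a minimal Python sketch of the scaling rule described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up, and no change is made while that ratio stays within a tolerance (0.1 by default in Kubernetes). The function name and the example numbers are our own assumptions for illustration, not values from a real cluster.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Sketch of the horizontal pod autoscaling rule (hypothetical helper).

    current_metric and target_metric use the same unit,
    for example average CPU utilisation in percent.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Otherwise scale proportionally and round up.
    return math.ceil(current_replicas * ratio)


# Example values (assumed): 4 pods at 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))   # scales up
print(desired_replicas(4, 63, 60))   # inside tolerance, unchanged
print(desired_replicas(10, 30, 60))  # scales down
```

In a real installation the autoscaler also applies minimum and maximum replica limits and stabilisation windows, which this sketch leaves out.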


  • Reduce toil and manual operations with infrastructure automation (Ansible, Terraform) and improved CI/CD implementation
  • Implement a GitOps approach using open-source tools like ArgoCD to improve deployment strategy
  • Improve reliability
  • Validate your error budgets, SLIs, SLOs, and SLAs
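
As a rough illustration of how an error budget follows from an SLO, the Python sketch below converts an availability target into allowed downtime and reports how much of a request-based budget is left. The function names, the 30-day window, and the sample figures are assumptions chosen for the example, not values from a customer engagement.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1")
    allowed_failures = (1.0 - slo) * total_requests
    if allowed_failures == 0:
        raise ValueError("no requests observed, budget is undefined")
    return 1.0 - failed_requests / allowed_failures


# Assumed example: a 99.9% SLO over 30 days allows roughly 43 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))
# Assumed example: 250 failed requests out of 1,000,000 leaves about 75% of the budget.
print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

The SLI here is the share of successful requests; the SLO is the target for that share, and the error budget is whatever the SLO leaves over.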

Phase 4: Completion and handover

  • Provide options for further immersive upskilling of the customer team during implementation: training sessions, pair programming, workshops, and hackathons
  • Documentation
  • Complete knowledge transfer and handover
  • Delivery review: What did we achieve together?
  • Discuss further improvements

The world is changing fast.
The only way to survive it is to build for it.

Talk to our Experts

Book your 15 min 1:1 with:


Felix Evert
Global Partnerships Manager

Book a meeting

Let’s take the next step together


Fill in the form to request a detailed quotation