Prometheus is a monitoring solution that collects numerical time-series data. It helps with alerting on critical production issues, incident response, post-mortem analysis, and metrics. It is a Cloud Native system, and particularly a Kubernetes-native one, designed to provide unified observability of Cloud Native workloads.
This package aims to provide a robust Prometheus setup along with the metrics, dashboards, and alerts tailored to your specific needs. We will use a pre-existing Kubernetes cluster to deploy the Prometheus stack, and will initially monitor either our own demo application or your app through the new setup.
Along with documentation and a thorough handover, we will make sure that your engineers are confident in maintaining and configuring the package.
What You Will Need
- An existing Kubernetes cluster.
- Access for our team to that cluster to deploy the stack
- Availability of one Dev and one Ops engineer at all times
- Workloads already running on the cluster. If there are none, we will deploy a demo application to demonstrate the Prometheus stack.
- A metrics endpoint exposed for Prometheus to scrape, if your application is used for this project.
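As an illustration of the last requirement, the sketch below serves a `/metrics` endpoint in the Prometheus text exposition format using only the Python standard library. The metric name `demo_http_requests_total`, the helper names, and port 8000 are assumptions made for this example; a production application would more likely use an official Prometheus client library for its language.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a real application would update it
# from its own request-handling code.
_REQUESTS_TOTAL = {"GET": 0, "POST": 0}
_LOCK = threading.Lock()


def record_request(method: str) -> None:
    """Increment the request counter for one HTTP method."""
    with _LOCK:
        _REQUESTS_TOTAL[method] = _REQUESTS_TOTAL.get(method, 0) + 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_http_requests_total Total HTTP requests handled.",
        "# TYPE demo_http_requests_total counter",
    ]
    with _LOCK:
        for method, count in sorted(_REQUESTS_TOTAL.items()):
            lines.append(f'demo_http_requests_total{{method="{method}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics; every other path returns 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving metrics on the given port (assumed value)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; Prometheus can then be pointed at `http://<pod-ip>:8000/metrics` through whatever scrape configuration the cluster uses.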
What to Expect
Week 1: Preparation and information gathering
- Identify custom alerts for Alertmanager
- Gather requirements for the Grafana dashboards
- Identify cluster requirements (how many nodes, alert levels, needed metrics)
- Identify how to expose Grafana (DNS, internally, externally, SSL, etc.)
- Identify how to store the data (Cloud Storage, NFS, NAS, etc.)
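To make the storage discussion concrete, here is a rough sizing sketch. It follows the capacity-planning rule of thumb from the Prometheus storage documentation (retention time × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample on average). The function names and the example figures are assumptions for illustration, not measurements from any particular cluster.

```python
def samples_per_second(active_series: int, scrape_interval_seconds: float) -> float:
    """Approximate ingestion rate: each active series yields one sample per scrape."""
    return active_series / scrape_interval_seconds


def estimate_disk_bytes(
    retention_days: float,
    ingested_samples_per_second: float,
    bytes_per_sample: float = 2.0,  # conservative end of the 1-2 byte range
) -> float:
    """Estimate local TSDB disk usage for a given retention window."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * ingested_samples_per_second * bytes_per_sample


# Illustrative numbers only: 100,000 active series scraped every 30 s,
# kept for 15 days.
rate = samples_per_second(active_series=100_000, scrape_interval_seconds=30)
estimated_gb = estimate_disk_bytes(retention_days=15, ingested_samples_per_second=rate) / 1e9
```

With these example inputs the estimate comes to about 8.64 GB. Actual usage depends on series churn and compression, so the result is best treated as a starting point for choosing a persistent volume size.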
Weeks 2 and 3: Deployment
- Deploy the Prometheus stack, plus either a demo application or your company’s app, to the Kubernetes cluster
- Create custom alerts
- Create Grafana dashboards
- Allocate persistent storage
- Configure access to Grafana
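As a sketch of what a custom alert might look like, the snippet below builds a Prometheus alerting rule group as a Python dictionary and renders it as JSON. Prometheus rule files are normally written in YAML; this example relies on JSON documents also being parseable as YAML, which is an assumption worth confirming with `promtool check rules` before use. The group name, alert name, severity label, and 5-minute duration are hypothetical choices, and the real alerts would come out of the Week 1 discussions.

```python
import json


def build_instance_down_rule(for_duration: str = "5m", severity: str = "critical") -> dict:
    """Build a rule group that fires when a scrape target has been down for a while."""
    return {
        "groups": [
            {
                "name": "demo-availability",  # hypothetical group name
                "rules": [
                    {
                        "alert": "InstanceDown",
                        # `up` is 0 for every target whose last scrape failed.
                        "expr": "up == 0",
                        "for": for_duration,
                        "labels": {"severity": severity},
                        "annotations": {
                            "summary": "Instance {{ $labels.instance }} is down",
                        },
                    }
                ],
            }
        ]
    }


def render_rule_file(rule_groups: dict) -> str:
    """Serialize the rule groups as indented JSON text."""
    return json.dumps(rule_groups, indent=2) + "\n"


rule_text = render_rule_file(build_instance_down_rule())
```

Alertmanager can then route on the `severity` label, for example by sending `critical` alerts to an on-call channel.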
Week 4: Documentation and knowledge sharing
- Knowledge-sharing sessions covering all of the above
- Documentation delivered in your preferred format
- Code handover in your preferred tool