We have already looked at the best Kubernetes monitoring tools. With the increasing adoption of containers and microservices in enterprises, monitoring utilities have to handle more services and server instances than ever before. Kubernetes environments vary from deployment to deployment, but they generally have a handful of key components, resources, and potential errors in common. Currently, the Kubernetes ecosystem provides two add-ons for aggregating and reporting monitoring data from your cluster: (1) Metrics Server and (2) kube-state-metrics.
Metrics Server is a cluster add-on that collects resource usage data from each node and provides aggregated metrics through the Metrics API. Metrics Server makes resource metrics such as CPU and memory available for users to query, as well as for the Kubernetes Horizontal Pod Autoscaler to use for auto-scaling workloads.
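To make the resource metrics above more concrete, here is a minimal sketch that totals CPU and memory usage across nodes from a Metrics API node list. It assumes a payload shaped like the response of `kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes`. The `SAMPLE_NODE_METRICS` data and the helper names are made up for illustration, and the parser handles only the quantity suffixes listed in the code, not the full Kubernetes quantity format.

```python
# Sketch: aggregate node CPU and memory usage from a Metrics API node list.
# SAMPLE_NODE_METRICS is illustrative data, not output from a real cluster.

CPU_DIVISORS = {"n": 1_000_000_000, "u": 1_000_000, "m": 1_000}  # units per core
MEMORY_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}         # bytes per unit

SAMPLE_NODE_METRICS = {
    "kind": "NodeMetricsList",
    "items": [
        {"metadata": {"name": "node-a"}, "usage": {"cpu": "250m", "memory": "2048Mi"}},
        {"metadata": {"name": "node-b"}, "usage": {"cpu": "1500000000n", "memory": "1Gi"}},
    ],
}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '250m' or '1500000000n' to cores."""
    suffix = quantity[-1]
    if suffix in CPU_DIVISORS:
        return int(quantity[:-1]) / CPU_DIVISORS[suffix]
    return float(quantity)  # plain number of cores, e.g. '2'


def parse_memory(quantity):
    """Convert a memory quantity such as '2048Mi' to bytes."""
    suffix = quantity[-2:]
    if suffix in MEMORY_FACTORS:
        return int(quantity[:-2]) * MEMORY_FACTORS[suffix]
    return int(quantity)  # plain number of bytes


def aggregate_usage(node_metrics):
    """Sum CPU (cores) and memory (bytes) usage over all nodes in the list."""
    total_cpu = 0.0
    total_memory = 0
    for item in node_metrics["items"]:
        total_cpu += parse_cpu(item["usage"]["cpu"])
        total_memory += parse_memory(item["usage"]["memory"])
    return {"cpu_cores": total_cpu, "memory_bytes": total_memory}


if __name__ == "__main__":
    print(aggregate_usage(SAMPLE_NODE_METRICS))
```

In a real cluster you would fetch the payload from the Metrics API rather than hard-coding it; the aggregation step stays the same.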
In addition to monitoring the CPU and memory usage of cluster nodes and pods, you will also need a way to collect metrics that track the high-level status of the cluster and its constituent objects. The Kubernetes API server exposes data about the count, health, and availability of pods, nodes, and other Kubernetes objects. By installing the kube-state-metrics add-on in your cluster, you can consume these metrics to detect and resolve issues with cluster infrastructure, resource constraints, or pod scheduling.
The kube-state-metrics service provides additional cluster information that Metrics Server does not. Metrics Server exposes statistics about the resource utilization of Kubernetes objects, whereas kube-state-metrics listens to the Kubernetes API and generates metrics about the state of Kubernetes objects: node status, node capacity (CPU and memory), number of desired/available/unavailable/updated replicas per Deployment, pod status (e.g., waiting, running, ready), and so on.
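As an illustration of how these state metrics can be consumed, the sketch below compares desired and available replicas per Deployment using two kube-state-metrics series, `kube_deployment_spec_replicas` and `kube_deployment_status_replicas_available`. The sample scrape text, the namespace and deployment names, and the helper functions are assumptions made for the example. The parser is deliberately simplified: it covers only labeled samples without timestamps or escaped characters in label values.

```python
# Sketch: find Deployments with fewer available replicas than desired,
# from kube-state-metrics output in Prometheus text format.
import re

# Illustrative scrape text; the HELP strings and values are made up.
SAMPLE_KSM_OUTPUT = """\
# HELP kube_deployment_spec_replicas Desired replicas (sample text).
# TYPE kube_deployment_spec_replicas gauge
kube_deployment_spec_replicas{namespace="shop",deployment="web"} 3
kube_deployment_spec_replicas{namespace="shop",deployment="api"} 2
kube_deployment_status_replicas_available{namespace="shop",deployment="web"} 3
kube_deployment_status_replicas_available{namespace="shop",deployment="api"} 1
"""

# metric_name{label="value",...} value
LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (metric name, labels dict, value) for each labeled sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def deployments_missing_pods(text):
    """Map (namespace, deployment) to (available, desired) where available < desired."""
    desired, available = {}, {}
    for name, labels, value in parse_samples(text):
        key = (labels.get("namespace"), labels.get("deployment"))
        if name == "kube_deployment_spec_replicas":
            desired[key] = value
        elif name == "kube_deployment_status_replicas_available":
            available[key] = value
    return {
        key: (available.get(key, 0.0), want)
        for key, want in desired.items()
        if available.get(key, 0.0) < want
    }


if __name__ == "__main__":
    print(deployments_missing_pods(SAMPLE_KSM_OUTPUT))
```

In practice this comparison is usually written as an alerting rule in your monitoring system; the Python version only shows the logic.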
In this post, we are going to look at the key metrics and alerts that are required to monitor your Kubernetes cluster.
At a high level, below are the key metrics to monitor:
What to monitor? | Metrics to monitor | Alert criteria |
Cluster state | Monitor the aggregated resource usage across all nodes in your cluster. | |
Node resources | For each node, monitor resource usage, including CPU and memory. | Alert if a node’s CPU or memory usage crosses a desired threshold. |
Missing pods | Monitor the health and availability of your pod deployments. | Alert if the number of available pods for a deployment falls below the number of pods you specified when you created the deployment. |
Pods that are not running | If a pod isn’t running or even scheduled, there could be an issue with the pod, the cluster, or your entire Kubernetes deployment. | Base alerts on the status of your pods (“Failed”, “Pending”, or “Unknown”) persisting for the period of time you specify. |
Container restarts | Container restarts can happen when your containers hit a memory limit (for example, Out of Memory kills). There could also be an issue with the container itself or its host. | Kubernetes restarts containers automatically, but an alert gives you immediate notification so that you can analyze the cause later and set proper limits. |
Container resource usage | Monitor container resource usage to catch containers that are hitting resource limits or showing spikes in resource consumption. | Alert if container CPU or memory usage, relative to its limits, crosses your thresholds. |
Storage volumes | Monitor storage volumes, including available bytes and capacity. | Alert if available bytes or capacity crosses your thresholds. Identify persistent volumes and apply a different alert threshold or notification for these volumes, which likely hold important application data. |
Control Plane – Etcd | Monitor etcd, including pending and failed proposals. | Alert if there are any pending or failed proposals, or if they reach inappropriate thresholds. |
Control Plane – API Server | Monitor the API server, including the rate and number of HTTP requests. | Alert if the rate or number of HTTP requests crosses a desired threshold. |
Control Plane – Scheduler | Monitor the scheduler, including the rate and number of HTTP requests. | Alert if the rate or number of HTTP requests crosses a desired threshold. |
Control Plane – Controller Manager | Monitor the controller manager, including requests to the work queue. | Alert if requests to the work queue exceed a maximum threshold. |
Kubernetes events | Collecting events from Kubernetes and from the container engine (such as Docker) allows you to see how pod creation, destruction, starting, or stopping affects the performance of your infrastructure. | Alert on any failure or exception. |
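To show how one of these alert criteria might be evaluated, here is a rough sketch for the "pods that are not running" row: it flags pods that have stayed in a “Failed”, “Pending”, or “Unknown” phase for at least a period you choose. The `PodObservation` type, the pod names, and the five-minute grace period are all hypothetical, and the sketch assumes your collector already records when each pod entered its current phase.

```python
# Sketch: alert on pods stuck in a non-running phase for longer than a grace period.
from dataclasses import dataclass
from datetime import datetime, timedelta

ALERT_PHASES = {"Failed", "Pending", "Unknown"}


@dataclass
class PodObservation:
    pod: str
    phase: str
    since: datetime  # when the pod entered this phase (assumed to be tracked upstream)


def pods_to_alert(observations, now, grace_period):
    """Return sorted names of pods in an alert phase for at least grace_period."""
    return sorted(
        obs.pod
        for obs in observations
        if obs.phase in ALERT_PHASES and now - obs.since >= grace_period
    )


# Illustrative observations; names and timestamps are made up.
NOW = datetime(2024, 1, 1, 12, 0)
SAMPLE_OBSERVATIONS = [
    PodObservation("web-1", "Running", NOW - timedelta(hours=1)),
    PodObservation("api-1", "Pending", NOW - timedelta(minutes=10)),
    PodObservation("job-1", "Failed", NOW - timedelta(minutes=2)),
]

if __name__ == "__main__":
    print(pods_to_alert(SAMPLE_OBSERVATIONS, NOW, timedelta(minutes=5)))
```

The grace period keeps short-lived states, such as a pod that is briefly pending while it is scheduled, from triggering notifications.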
I hope I have covered the key metrics and alerts that are required to monitor your Kubernetes cluster. If I have missed any key metrics, do let me know.