Setting Up Kubernetes Monitoring with Prometheus and Grafana

Monitoring your Kubernetes clusters is crucial for maintaining healthy applications and infrastructure. Today, we’ll set up a complete monitoring stack using Prometheus and Grafana.

Why Monitor Kubernetes?

  • Resource Utilization: Track CPU, memory, and storage usage
  • Application Health: Monitor pod status and application metrics
  • Alerting: Get notified before issues become critical
  • Capacity Planning: Understand growth patterns

Installation with Helm

First, let’s install the kube-prometheus-stack chart, which bundles Prometheus, Grafana, and Alertmanager:

# Add the Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

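# Install kube-prometheus-stack into the monitoring namespace
# (admin123 is a weak example password; change it for any shared cluster)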
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=admin123
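
A hard-coded password such as admin123 is easy to guess. As a minimal sketch, assuming Python 3.8+ is available on the machine that runs Helm, the script below generates a random password and prints the same install command with it filled in. The function names generate_password and helm_install_command are hypothetical helpers for illustration, not part of Helm or the chart.

```python
import secrets
import shlex
from typing import List


def generate_password(num_bytes: int = 24) -> str:
    """Return a random URL-safe password (24 random bytes give 32 characters)."""
    return secrets.token_urlsafe(num_bytes)


def helm_install_command(password: str,
                         release: str = "prometheus",
                         namespace: str = "monitoring") -> List[str]:
    """Build the argument list for the install command used in this post."""
    return [
        "helm", "install", release,
        "prometheus-community/kube-prometheus-stack",
        "--namespace", namespace,
        "--create-namespace",
        "--set", f"grafana.adminPassword={password}",
    ]


# Print a copy-pasteable command; nothing is executed against the cluster.
command = helm_install_command(generate_password())
print(shlex.join(command))
```

The URL-safe alphabet avoids commas, which --set would otherwise treat as separators between values.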

Key Components

Prometheus Server

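  • Metrics Collection: Scrapes metrics over HTTP from targets such as node-exporter, kube-state-metrics, and the kubelet
  • Time Series Database: Stores the scraped samples as labeled time series
  • Query Engine: Evaluates PromQL expressions over the stored series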
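
To make the scraping step concrete: each target serves its current metrics as plain text, one sample per line, and Prometheus parses that text on every scrape. The sketch below parses a small hand-written sample of that text format. It is a simplified illustration, not Prometheus's own parser: parse_exposition is a hypothetical helper, the sample values are made up, and escape sequences inside label values are left as-is.

```python
import re
from typing import Dict, List, Tuple

# metric_name{label="value",...} value [timestamp]
SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+\d+)?$'
)
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Return (name, labels, value) for every sample line in the text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simplified pattern cannot read
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Made-up scrape output in the shape a node-exporter target would return.
SAMPLE_SCRAPE = """
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 234.5
up 1
"""

for name, labels, value in parse_exposition(SAMPLE_SCRAPE):
    print(name, labels, value)
```

Each parsed sample corresponds to one point in a time series, identified by the metric name plus its label set.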

Grafana

  • Visualization: Dashboards built from PromQL queries
  • Alerting: Alert rules managed from the Grafana UI
  • Data Sources: Queries Prometheus, which the chart preconfigures as a data source

Alertmanager

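  • Alert Routing: Routes alerts to receivers such as email, Slack, or PagerDuty based on their labels
  • Grouping: Combines related alerts into a single notification
  • Silencing: Mutes notifications for matching alerts for a set period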
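
The three behaviours above all work on alert labels. The toy model below illustrates the idea in a few lines. It is not how Alertmanager is implemented (the real thing uses a routing tree, regex matchers, and timers), and every name in it (matches, pick_receiver, group_alerts, the sample alerts and receivers) is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # an alert is represented here only by its labels


def matches(alert: Alert, matchers: Dict[str, str]) -> bool:
    """True when every matcher label has exactly the given value on the alert."""
    return all(alert.get(name) == value for name, value in matchers.items())


def pick_receiver(alert: Alert,
                  routes: List[Tuple[Dict[str, str], str]],
                  default: str) -> str:
    """Routing: the first route whose matchers fit the alert decides the receiver."""
    for matchers, receiver in routes:
        if matches(alert, matchers):
            return receiver
    return default


def group_alerts(alerts: List[Alert],
                 group_by: List[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Grouping: alerts sharing the group_by label values form one notification."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Made-up alerts, routes, and silence for illustration.
ALERTS = [
    {"alertname": "HighCPU", "namespace": "prod", "pod": "api-1", "severity": "critical"},
    {"alertname": "HighCPU", "namespace": "prod", "pod": "api-2", "severity": "critical"},
    {"alertname": "PodRestarting", "namespace": "dev", "pod": "worker-1", "severity": "warning"},
]
ROUTES = [({"severity": "critical"}, "pagerduty")]
SILENCE = {"namespace": "dev"}

# Silencing: drop alerts that match the silence before notifying anyone.
active = [alert for alert in ALERTS if not matches(alert, SILENCE)]

for key, members in group_alerts(active, ["alertname", "namespace"]).items():
    receiver = pick_receiver(members[0], ROUTES, default="email")
    print(key, "->", receiver, f"({len(members)} alerts in one notification)")
```

In this sample, the two HighCPU alerts end up in a single notification, and the dev alert is muted by the silence.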

Essential Kubernetes Metrics

The PromQL expressions below cover node CPU, pod memory, container restarts, and node disk usage:

# Node CPU usage (%), averaged across cores per instance
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Pod memory usage (bytes), per container, excluding the pause container
container_memory_usage_bytes{pod!="", container!="POD"}

# Container restarts over the last hour
increase(kube_pod_container_status_restarts_total[1h])

# Node disk usage (%), per filesystem
100 - ((node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes)
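
Any of the expressions above can also be run outside Grafana through the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. The sketch below assumes Prometheus is reachable at http://localhost:9090 (for example through a port-forward). That address, the helper names, and the sample response values are assumptions for illustration. The live call is left in a comment so the script runs without a cluster.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumed address, e.g. via port-forward
RESTARTS_QUERY = "increase(kube_pod_container_status_restarts_total[1h])"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL with the PromQL expression URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def extract_samples(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_query(base_url: str, promql: str, timeout: float = 10.0):
    """Send the query to a live Prometheus server and parse the response."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as response:
        return extract_samples(json.load(response))


# Made-up response in the shape the API returns for an instant vector.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"namespace": "default", "pod": "api-1", "container": "api"},
             "value": [1700000000.0, "3"]},
        ],
    },
}

print(build_query_url(PROMETHEUS_URL, RESTARTS_QUERY))
for labels, value in extract_samples(SAMPLE_RESPONSE):
    print(labels["pod"], value)

# Against a live server: run_query(PROMETHEUS_URL, RESTARTS_QUERY)
```

Keeping URL building and response parsing separate from the network call makes both easy to check without a running Prometheus.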

Tomorrow, we’ll explore advanced Prometheus queries and create custom alerting rules for your specific use cases.


How do you monitor your Kubernetes clusters? Share your monitoring strategies!