Prometheus & Metrics

Prometheus is an open-source monitoring system that has become the de facto standard for cloud-native infrastructure. It pulls metrics over HTTP, stores them as time series, and queries them with its own language, PromQL. If you run containers in production, you will almost certainly encounter Prometheus.

The Pull Model

Unlike push-based systems where applications send metrics to a collector, Prometheus scrapes metrics from your applications. Your service exposes an HTTP endpoint (typically /metrics), and Prometheus periodically fetches it.

Prometheus Server
  |
  |-- scrape every 15s --> App A /metrics
  |-- scrape every 15s --> App B /metrics
  |-- scrape every 15s --> Node Exporter /metrics
  |-- scrape every 15s --> Database Exporter /metrics
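To make the pull model concrete, here is a stdlib-only Python sketch: a toy app serves a hand-written /metrics page, and a scrape function fetches it the way Prometheus does on every interval. The metric name app_requests_total, the scrape helper, and the use of a random free port are illustrative assumptions; real applications should expose metrics through a client library rather than by formatting text by hand.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter that the toy app would increment
REQUESTS_SERVED = 0

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = (
            "# HELP app_requests_total Total requests served\n"
            "# TYPE app_requests_total counter\n"
            f"app_requests_total {REQUESTS_SERVED}\n"
        ).encode()
        self.send_response(200)
        # Content type used by the Prometheus text format, version 0.0.4
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet

def scrape(url):
    """Fetch one scrape's worth of metrics, as Prometheus does every interval."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode()

# Port 0 asks the OS for any free port
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

REQUESTS_SERVED = 42
text = scrape(f"http://127.0.0.1:{server.server_port}/metrics")
print(text)

server.shutdown()
server.server_close()
```

The app never needs to know where Prometheus lives; it only answers GET requests, which is also why curling the endpoint shows exactly what a scrape would see.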

Why Pull?

Service discovery. Prometheus finds targets through Kubernetes service discovery, DNS, Consul, and other mechanisms, so new instances are scraped automatically.
No client-side buffering. Applications do not need to worry about where to send metrics or what happens if the collector is down.
Easier debugging. You can curl the /metrics endpoint directly and see exactly what Prometheus sees.

Configuration

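# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: "myapp"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Scrape the port from prometheus.io/port, keeping the pod IP from __address__
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Use prometheus.io/path as the metrics path when it is set
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__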

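In Kubernetes, annotate the pods you want scraped. The prometheus.io/* annotations are a widely used convention, not built-in Prometheus behavior; they only take effect through relabel rules like the ones above: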

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
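Relabel rules are easier to follow once you see them applied to one target. The Python sketch below mimics only the keep and replace actions, following the documented defaults (source label values joined with ";", regexes anchored at both ends, $1-style group references). It assumes the common form of the port rule, which joins __address__ with the port annotation so the pod IP survives the rewrite; the pod's address and annotation values are made up, and other actions and edge cases are ignored.

```python
import re

def to_python_template(replacement):
    """Convert Prometheus-style $1 / ${1} references to Python's \\g<1>."""
    return re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)

def relabel(labels, rules):
    """Apply keep/replace rules; return the new label dict, or None if dropped."""
    labels = dict(labels)
    for rule in rules:
        # Source label values are joined with ";" (the default separator)
        value = ";".join(labels.get(name, "") for name in rule["source_labels"])
        # Prometheus anchors relabel regexes, so use fullmatch
        match = re.fullmatch(rule.get("regex", "(.*)"), value)
        if rule["action"] == "keep":
            if match is None:
                return None  # target is dropped and never scraped
        elif rule["action"] == "replace":
            if match is not None:
                template = to_python_template(rule.get("replacement", "$1"))
                labels[rule["target_label"]] = match.expand(template)
    return labels

rules = [
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"],
     "action": "keep", "regex": "true"},
    {"source_labels": ["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"],
     "action": "replace", "regex": r"([^:]+)(?::\d+)?;(\d+)",
     "replacement": "$1:$2", "target_label": "__address__"},
]

pod = {
    "__address__": "10.1.2.3:9090",  # hypothetical pod IP and declared port
    "__meta_kubernetes_pod_annotation_prometheus_io_scrape": "true",
    "__meta_kubernetes_pod_annotation_prometheus_io_port": "8080",
}
print(relabel(pod, rules)["__address__"])  # 10.1.2.3:8080
```

A pod without the scrape annotation yields an empty source value, fails the keep regex, and is dropped.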

The /metrics Endpoint

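The endpoint returns metrics in Prometheus's plain-text exposition format: # HELP and # TYPE comment lines describe each metric, followed by one line per time series with its labels and current value: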

# HELP http_requests_total Total number of HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/api/users",status="200"} 14523
http_requests_total{method="GET",path="/api/users",status="500"} 12
http_requests_total{method="POST",path="/api/users",status="201"} 891

# HELP http_request_duration_seconds Duration of HTTP requests
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{method="GET",le="0.01"} 10234
http_request_duration_seconds_bucket{method="GET",le="0.05"} 13892
http_request_duration_seconds_bucket{method="GET",le="0.1"} 14401
http_request_duration_seconds_bucket{method="GET",le="+Inf"} 14523
http_request_duration_seconds_sum{method="GET"} 523.41
http_request_duration_seconds_count{method="GET"} 14523

You can verify what your application exposes at any time:

curl http://localhost:8080/metrics

Metric Types

Counter

A value that only goes up. Resets to zero when the process restarts.

Use for: total requests, total errors, bytes transferred.

from prometheus_client import Counter

REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "path", "status"]
)

# In your request handler
REQUEST_COUNT.labels(method="GET", path="/api/users", status="200").inc()

You rarely care about a counter's raw value. Instead, use rate() to get its per-second growth or increase() to get its total growth over a time window; both account for counter resets.
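Reset handling is what makes counters safe across restarts: a drop in value is treated as the process starting again from zero. The Python sketch below shows that logic on a list of (timestamp, value) scrape samples; the sample data is hypothetical, and Prometheus additionally extrapolates to the edges of the range window, so its results differ slightly.

```python
def counter_increase(samples):
    """Total increase across (timestamp, value) samples, treating drops as resets.

    Simplified: PromQL's increase()/rate() also extrapolate to the window edges.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted at 0, so the whole current
        # value is new growth since the reset
        total += cur - prev if cur >= prev else cur
    return total

def counter_rate(samples):
    """Average per-second increase between the first and last sample."""
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed

# Hypothetical scrapes 15s apart; the process restarts between t=30 and t=45
scrapes = [(0, 100), (15, 130), (30, 160), (45, 20), (60, 50)]
print(counter_increase(scrapes))  # 110.0
print(counter_rate(scrapes))      # about 1.83 per second
```

Reading the raw value at t=60 (50) would suggest less traffic than at t=30 (160), which is exactly the mistake rate() and increase() avoid.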

Gauge

A value that goes up and down. Represents a current state.

Use for: temperature, queue depth, active connections, memory usage.

from prometheus_client import Gauge

ACTIVE_CONNECTIONS = Gauge(
    "active_connections",
    "Number of active connections"
)

ACTIVE_CONNECTIONS.inc()   # Connection opened
ACTIVE_CONNECTIONS.dec()   # Connection closed
ACTIVE_CONNECTIONS.set(42) # Set to specific value

Histogram

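Samples observations and counts them in configurable buckets, where each bucket counts the observations less than or equal to its upper bound (le). It also exposes the sum and count of all observations.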

Use for: request duration, response size, any value where you care about distribution.

from prometheus_client import Histogram

REQUEST_DURATION = Histogram(
    "http_request_duration_seconds",
    "HTTP request duration in seconds",
    ["method"],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
)

# In your request handler
with REQUEST_DURATION.labels(method="GET").time():
    handle_request()

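Choose buckets based on your SLOs. If your target is p99 under 500ms, place bucket boundaries around 0.5 seconds (for example 0.25, 0.5, and 1.0), because quantiles are estimated by interpolating within a bucket and are only as precise as its boundaries.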
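To see what a histogram does with each observation, here is a stdlib-only Python sketch of the bookkeeping: every value lands in the first bucket whose upper bound is greater than or equal to it, and the exported _bucket series are cumulative, which is why each le line in a /metrics page is at least as large as the one before it. The function name, bucket bounds, and durations are hypothetical.

```python
import bisect
import math

def bucket_counts(observations, bounds):
    """Cumulative counts per upper bound le (plus +Inf), with _sum and _count."""
    bounds = sorted(bounds)
    per_bucket = [0] * (len(bounds) + 1)  # last slot is the +Inf bucket
    for value in observations:
        # First bucket whose upper bound is >= value ("le" = less than or equal)
        per_bucket[bisect.bisect_left(bounds, value)] += 1
    cumulative, running = {}, 0
    for le, count in zip(bounds + [math.inf], per_bucket):
        running += count
        cumulative[le] = running
    return cumulative, sum(observations), len(observations)

durations = [0.003, 0.01, 0.02, 0.07, 0.3]  # hypothetical request times in seconds
counts, total, n = bucket_counts(durations, [0.01, 0.05, 0.1])
print(counts)  # {0.01: 2, 0.05: 3, 0.1: 4, inf: 5}
```

Note that 0.01 is counted in the le="0.01" bucket, because bucket bounds are inclusive.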

Summary

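Similar to a histogram, but calculates quantiles on the client side. Generally prefer histograms -- their buckets can be summed across instances before computing a quantile, whereas a summary's precomputed quantiles cannot be meaningfully averaged or added together.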
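A small numeric illustration of why precomputed quantiles do not aggregate, using a nearest-rank quantile on made-up latencies: averaging each instance's p99 gives a very different answer from the p99 of all requests combined.

```python
import math

def quantile(values, q):
    """Nearest-rank quantile of a list of observations."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latencies in seconds: instance A is busy and fast, B is quiet and slow
instance_a = [0.02] * 980 + [0.2] * 20
instance_b = [1.5] * 10

# What averaging two summaries' p99 values would give
averaged_p99 = (quantile(instance_a, 0.99) + quantile(instance_b, 0.99)) / 2
# The actual p99 over every request
true_p99 = quantile(instance_a + instance_b, 0.99)
print(averaged_p99, true_p99)
```

Here the averaged value is 0.85 seconds while the true p99 is 0.2 seconds. A summary's _sum and _count, by contrast, can still be added across instances.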

Labels

Labels add dimensions to metrics. A single metric name with different label values produces multiple time series.

http_requests_total{method="GET", status="200"}  -> one time series
http_requests_total{method="POST", status="201"} -> another time series
http_requests_total{method="GET", status="500"}  -> another time series

Label Best Practices

Keep cardinality low. Every unique label combination is a separate time series. A label with user IDs (millions of values) can exhaust Prometheus's memory and slow down every query that touches the metric.
Use labels for dimensions you will filter or group by: method, status code, endpoint, service, instance.
Do not put high-cardinality values in labels: user IDs, email addresses, request IDs, timestamps.

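# Good: low cardinality (matches the labels REQUEST_COUNT was declared with)
REQUEST_COUNT.labels(method="GET", path="/api/users", status="200").inc()

# Bad: high cardinality -- one time series per user (hypothetical metric)
USER_REQUESTS = Counter("user_requests_total", "Requests per user", ["method", "user_id"])
USER_REQUESTS.labels(method="GET", user_id="abc-123-def").inc()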
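The cost grows multiplicatively: the worst-case series count for one metric is the product of the number of distinct values of each label. A quick Python sketch with hypothetical label value sets:

```python
import math

def series_count(label_values):
    """Worst-case time series for one metric: product of distinct values per label."""
    return math.prod(len(values) for values in label_values.values())

# Hypothetical label value sets for http_requests_total
labels = {
    "method": ["GET", "POST", "PUT", "DELETE"],
    "path": [f"/api/route{i}" for i in range(25)],
    "status": ["200", "201", "400", "404", "500"],
}
print(series_count(labels))  # 500

# Adding a user_id label with 100,000 users multiplies everything
labels["user_id"] = [f"user-{i}" for i in range(100_000)]
print(series_count(labels))  # 50000000
```

In practice only combinations that actually occur become series, so this is an upper bound, but one unbounded label is enough to push it out of reach.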

PromQL Basics

PromQL is the query language for Prometheus. It operates on time-series data.

rate()

Compute the per-second rate of a counter over a time window:

rate(http_requests_total[5m])

This returns, for each time series (each label combination), the average number of requests per second over the last 5 minutes.

sum()

Aggregate across label dimensions:

sum(rate(http_requests_total[5m])) by (status)

Total request rate, grouped by status code.
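The by clause keeps only the listed labels and adds up the values of all series that share them. A Python sketch of that grouping over hypothetical rate() results:

```python
from collections import defaultdict

def sum_by(series, by):
    """Mimic PromQL `sum(...) by (labels)`: drop other labels, add values per group."""
    groups = defaultdict(float)
    for labels, value in series:
        # A missing label is treated as an empty string, as in PromQL
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key] += value
    return dict(groups)

# Hypothetical per-series output of rate(http_requests_total[5m]), in req/s
rates = [
    ({"instance": "a", "method": "GET", "status": "200"}, 40.0),
    ({"instance": "b", "method": "GET", "status": "200"}, 35.0),
    ({"instance": "a", "method": "POST", "status": "201"}, 5.0),
    ({"instance": "b", "method": "GET", "status": "500"}, 0.5),
]
print(sum_by(rates, ["status"]))
# {(('status', '200'),): 75.0, (('status', '201'),): 5.0, (('status', '500'),): 0.5}
```

The instance and method labels disappear from the result, which is why you list exactly the dimensions you want to keep.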

avg()

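Average across series, for example the mean number of active connections per instance:

avg(active_connections)

Avoid averaging per-instance ratios for latency: that weights a quiet instance the same as a busy one. Divide the summed rates instead:

sum(rate(http_request_duration_seconds_sum[5m])) by (method) / sum(rate(http_request_duration_seconds_count[5m])) by (method)

Average request duration by HTTP method, weighted by traffic.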
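The difference between averaging per-instance ratios, as avg(rate(..._sum[5m]) / rate(..._count[5m])) does, and dividing summed rates is plain arithmetic. A Python sketch with two hypothetical instances:

```python
# Hypothetical 5m rates for two instances serving the same method
instances = {
    "a": {"sum_rate": 10.0, "count_rate": 1000.0},  # busy, about 10ms per request
    "b": {"sum_rate": 5.0, "count_rate": 5.0},      # quiet, about 1s per request
}

# avg(rate(_sum) / rate(_count)): every instance weighs the same
avg_of_ratios = sum(i["sum_rate"] / i["count_rate"] for i in instances.values()) / len(instances)

# sum(rate(_sum)) / sum(rate(_count)): each instance weighted by its traffic
ratio_of_sums = (sum(i["sum_rate"] for i in instances.values())
                 / sum(i["count_rate"] for i in instances.values()))

print(f"{avg_of_ratios:.3f}s vs {ratio_of_sums:.3f}s")
```

The first figure (about 0.5s) is dominated by five slow requests; the second (about 0.015s) reflects what a typical request actually experienced.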

histogram_quantile()

Compute percentiles from histogram data:

histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))

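The 99th percentile request duration over the last 5 minutes, computed separately for each series (for example, per instance and method). To get a single percentile across all instances, sum the bucket rates and keep the le label, which histogram_quantile() needs:

histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))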
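histogram_quantile() estimates a percentile by finding the bucket that contains the target rank and interpolating linearly inside it. The Python sketch below is a simplified version of that idea, applied to the cumulative GET bucket counts from the /metrics example earlier; Prometheus runs the same arithmetic on per-second bucket rates and handles more edge cases than this does.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q < 1) from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with math.inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # Rank falls in the +Inf bucket: best answer is the last finite bound
                return lower
            # Assume observations are spread evenly inside the bucket
            return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)
        lower, prev_count = upper, count
    return math.nan

get_buckets = [(0.01, 10234), (0.05, 13892), (0.1, 14401), (math.inf, 14523)]
print(round(histogram_quantile(0.50, get_buckets), 4))  # 0.0071
print(round(histogram_quantile(0.99, get_buckets), 4))  # 0.0977
print(histogram_quantile(0.999, get_buckets))           # 0.1
```

The p99.9 result shows the limit of coarse buckets: the rank lands in the +Inf bucket, so the best available answer is simply the largest finite boundary.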

Practical Queries

# Error rate as a percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
* 100

# Request rate per endpoint
sum(rate(http_requests_total[5m])) by (path)

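# Memory working set as percentage of limit (skips containers with no limit set)
container_memory_working_set_bytes / (container_spec_memory_limit_bytes > 0) * 100

# Top 5 pods by CPU usage (container!="" drops the pod-level cgroup totals)
topk(5, sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod))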
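Queries like these can also be run programmatically through the Prometheus HTTP API's instant-query endpoint, /api/v1/query, which returns JSON. The Python sketch below builds the request URL and parses an instant-vector response; the base URL (Prometheus's default port 9090 on localhost) is an assumption, and the canned response is made-up data in the documented shape so the parser can be exercised without a server.

```python
import json
import urllib.parse
import urllib.request

def instant_query_url(base_url, promql):
    """Build a GET URL for the /api/v1/query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})

def parse_vector(body):
    """Turn an instant-vector JSON response into (labels, float value) pairs."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result carries "value": [unix_timestamp, "value as a string"]
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]

def run_query(base_url, promql):
    """Run an instant query against a live server (not called in this sketch)."""
    with urllib.request.urlopen(instant_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(resp.read())

# Usage against a live server (assumed address):
# run_query("http://localhost:9090", "sum(rate(http_requests_total[5m])) by (path)")

sample = ('{"status":"success","data":{"resultType":"vector","result":'
          '[{"metric":{"path":"/api/users"},"value":[1700000000.0,"12.5"]}]}}')
print(parse_vector(sample))  # [({'path': '/api/users'}, 12.5)]
```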

Client Libraries

Prometheus maintains official client libraries for Go, Java, Python, Ruby, and Rust, and community-maintained libraries cover most other languages:

# Python
pip install prometheus-client

# Go
go get github.com/prometheus/client_golang/prometheus

# Java (1.x client; io.prometheus:simpleclient is the legacy 0.x artifact)
# Add io.prometheus:prometheus-metrics-core to your build

# Node.js (community-maintained)
npm install prom-client

Instrumenting a Python Application

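import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("http_requests_total", "Total requests", ["method", "status"])
DURATION = Histogram("http_request_duration_seconds", "Request duration in seconds")

# The decorator records each call's duration in DURATION
@DURATION.time()
def handle_request(method):
    time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
    status = "200"
    REQUESTS.labels(method=method, status=status).inc()
    return status

if __name__ == "__main__":
    # Serve metrics on port 8000 (e.g. http://localhost:8000/metrics) from a background thread
    start_http_server(8000)
    # Keep the process alive; a real app would run its server loop here
    while True:
        handle_request("GET")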

Common Pitfalls

High-cardinality labels. Adding user IDs or request IDs as labels creates millions of time series and overwhelms Prometheus. Use logs or traces for that.
Not using rate() on counters. A raw counter value mostly reflects how long the process has been running, and it drops to zero on every restart. Always wrap counters with rate() or increase().
Scrape interval mismatch. rate() needs at least two samples inside its window, so a window shorter than about twice the scrape interval returns gaps or no data. Use a window at least 4x your scrape interval: rate(metric[1m]) with a 15s scrape is fine.
Exposing too many metrics. Every metric has a storage cost. Instrument what you will actually alert on or dashboard. Remove the rest.
Not setting up retention. Prometheus stores data on local disk and by default keeps 15 days with no size cap. Size retention to your disk with --storage.tsdb.retention.time and --storage.tsdb.retention.size, or use remote write for long-term storage.
Ignoring the /metrics endpoint. Always verify your instrumentation by curling the endpoint directly. If it is not there, Prometheus cannot scrape it.
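For retention planning, the Prometheus storage documentation gives a rough formula: needed disk space = retention time in seconds x ingested samples per second x bytes per sample, with samples averaging about 1 to 2 bytes. A Python sketch of that arithmetic, assuming 2 bytes per sample and a hypothetical workload (it ignores WAL and other overhead):

```python
def needed_disk_bytes(retention_days, active_series, scrape_interval_s, bytes_per_sample=2.0):
    """Rough TSDB size: retention seconds x samples per second x bytes per sample."""
    retention_s = retention_days * 24 * 3600
    samples_per_s = active_series / scrape_interval_s  # one sample per series per scrape
    return retention_s * samples_per_s * bytes_per_sample

# Hypothetical: 100,000 active series scraped every 15s, kept for 15 days
size = needed_disk_bytes(15, 100_000, 15)
print(f"{size / 1e9:.1f} GB")  # 17.3 GB
```

Doubling the series count or halving the scrape interval doubles the estimate, which is another reason to keep cardinality in check.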

Key Takeaways

Prometheus pulls metrics from your applications via HTTP. Expose a /metrics endpoint and let Prometheus find it.
Use counters for things that only increase, gauges for current state, histograms for distributions.
Keep label cardinality low. High-cardinality labels are the most common way to break Prometheus.
Learn four PromQL building blocks and you can build most everyday dashboards and alerts: rate(), sum(), avg(), and histogram_quantile().
Instrument your application code with client libraries. The /metrics endpoint takes minutes to add and pays for itself the first time something goes wrong in production.