metrics Introduced in L8

Prometheus

The open-source metrics standard. Pull-based scraping, time-series database, the basis of most cloud-native observability.

What it is

The plain-English version

Prometheus is an open-source monitoring and alerting toolkit, originally built at SoundCloud. It pulls metrics from applications by scraping them over HTTP, stores them in a time-series database, and lets you query them with PromQL. It is commonly paired with Grafana for dashboards and Alertmanager for routing alerts.
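
To make the scraping step concrete, here is a minimal sketch of what a scrape target serves, using only the Python standard library. The metric name, label values, counts, and port are hypothetical, and a real application would normally use an official client library rather than rendering the text format by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counter values, keyed by (method, status).
REQUESTS_TOTAL = {("GET", "200"): 42, ("POST", "500"): 3}


def render_metrics(counts):
    """Render one counter in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes a plain HTTP GET, conventionally on /metrics.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the sketch server; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling serve() and fetching http://127.0.0.1:8000/metrics returns the same text that render_metrics produces. Prometheus would fetch that page on every scrape interval and store each line as a sample.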

Why it exists

The problem it solves

Prometheus is a CNCF graduated project and the de facto default for metrics in cloud-native stacks. Most Kubernetes deployments run it or something compatible with it. The pull model is unusual, but it suits ephemeral containers: with Kubernetes service discovery configured, each new Pod is discovered and scraped automatically. It's the standard layer for self-hosted observability.

What it competes with

Alternatives

Alternative | Type | When it wins
Sentry | errors | The error-tracking standard. Captures frontend and backend exceptions with full context. First tool teams add for production observability.
Datadog | APM | Comprehensive observability: metrics, logs, APM, RUM, all under one expensive roof.
ELK Stack | log mgmt | Elasticsearch + Logstash + Kibana, the open-source log management trio. Now also "Elastic Stack" with Beats.

Vocabulary

The words you'll hear

Exporter
An agent that exposes metrics from a system that doesn't speak the Prometheus format natively. Examples: node_exporter (host metrics), postgres_exporter (PostgreSQL metrics).
Scrape
Prometheus pulls /metrics from each target on an interval.
Metric type
Counter (only goes up, resets to zero on restart), gauge (can go up or down), histogram (counts observations in configurable buckets), summary (quantiles computed in the client).
PromQL
The query language. Example: rate(http_requests_total[5m]) returns the per-second request rate averaged over the last five minutes.
Alertmanager
Companion service that routes alerts to email, PagerDuty, Slack.
Grafana
Dashboard tool, often paired with Prometheus.
Recording rule
Pre-computed query stored as a new metric. Saves repeated work.
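
To make the counter and PromQL entries concrete, here is a simplified sketch of what a rate() style calculation does with raw counter samples. It is an illustration only: the function names and sample data are hypothetical, and the real PromQL rate() also extrapolates to the edges of the window, which this sketch omits.

```python
def increase(samples):
    """Total growth of a counter across (timestamp_seconds, value) samples.

    A drop in value is treated as a counter reset (for example a process
    restart), so the value after the reset counts as growth from zero.
    """
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        total += (curr - prev) if curr >= prev else curr
    return total


def simple_rate(samples):
    """Average per-second rate over the time span covered by the samples."""
    if len(samples) < 2:
        return None  # a rate needs at least two points
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase(samples) / elapsed


# Hypothetical scrapes every 60 s; the counter resets between t=120 and t=180.
scrapes = [(0, 100), (60, 160), (120, 220), (180, 10), (240, 70), (300, 130)]
requests_per_second = simple_rate(scrapes)
```

The reset handling is why counters are queried through rate() rather than read directly: the raw value is meaningless across restarts, but its growth is not.
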
Prompting

Bad vs. good prompt for Prometheus

✕ Bad prompt
set up prometheus
✓ Good prompt
Set up Prometheus monitoring for our Node.js Express app. Use prom-client to expose /metrics with default Node and HTTP request metrics (method, route, status, duration histogram). Write the Prometheus scrape config snippet and the Grafana panel JSON for: requests per second by status, p50/p95/p99 latency, error rate. Show only these key panels, not the full dashboard.

Why it works: It specifies the client library (prom-client), the metrics to expose, the scrape config, and the dashboard panels that are actually useful. Most 'set up Prometheus' answers stop at /metrics; this one finishes the job.
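
The p50/p95/p99 panels in the good prompt are estimates derived from histogram buckets, not exact measurements. The sketch below shows the idea under stated assumptions: it takes cumulative bucket counts and interpolates linearly inside the bucket that contains the target rank, which is the approach PromQL's histogram_quantile() is documented to take. The function is a simplified stand-in and the bucket data is hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative buckets.

    buckets is a list of (upper_bound_seconds, cumulative_count) pairs,
    including a final (math.inf, total_count) bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0  # assumes observations are non-negative
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate into +Inf; fall back to the last finite bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical request-duration buckets: 100 requests in total.
duration_buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p95 = histogram_quantile(0.95, duration_buckets)
```

Because the estimate is interpolated, its accuracy depends on bucket boundaries. Buckets should be chosen so the latencies you care about fall between finite bounds rather than in the +Inf bucket.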

Pitfalls

What bites real teams

⚠ Cardinality explosion

Each unique combination of label values is its own time series, so an unbounded label like user_id can create millions of series and push Prometheus out of memory. Keep label cardinality bounded.
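
A quick back-of-the-envelope calculation shows why this happens. The sketch below assumes every combination of label values actually occurs, which gives an upper bound; the label names and counts are hypothetical.

```python
from math import prod


def max_series(label_cardinalities):
    """Upper bound on time series for one metric.

    Each distinct combination of label values is a separate series, so the
    bound is the product of the number of distinct values per label.
    """
    return prod(label_cardinalities.values())


# Hypothetical label sets for an HTTP request counter.
bounded = {"method": 5, "route": 30, "status": 8}
with_user_id = {**bounded, "user_id": 100_000}

bounded_series = max_series(bounded)
exploded_series = max_series(with_user_id)
```

Here the bounded label set tops out at 1,200 series, while adding user_id raises the ceiling to 120 million. For a histogram the count grows further, because each bucket is stored as its own series.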

⚠ Long-term storage

Prometheus's local storage is single-node and isn't designed for years of data; retention defaults to 15 days. Pair it with Thanos, Cortex, or Mimir for long retention.

⚠ Alert fatigue

It's tempting to alert on every metric. Most paging alerts should fire on SLO burn rate, meaning how fast the error budget is being consumed, rather than on raw thresholds.
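
As a rough illustration of burn rate, the sketch below computes how fast an error budget is being spent and applies a two-window check. The function names are hypothetical, and the 14.4 threshold and 30-day period are assumptions taken from commonly cited SRE practice, not values from this page.

```python
def burn_rate(error_ratio, slo_target):
    """How many times faster than 'exactly on budget' errors are occurring.

    A burn rate of 1.0 spends the whole error budget over the SLO period.
    """
    budget = 1.0 - slo_target
    return error_ratio / budget


def budget_consumed(rate, window_hours, slo_period_hours=30 * 24):
    """Fraction of the period's error budget spent if `rate` holds for the window."""
    return rate * window_hours / slo_period_hours


def should_page(long_window_ratio, short_window_ratio, slo_target, threshold=14.4):
    """Page only if both the long and the short window burn faster than the threshold.

    The short window keeps the alert from firing long after the problem has stopped.
    """
    return (
        burn_rate(long_window_ratio, slo_target) >= threshold
        and burn_rate(short_window_ratio, slo_target) >= threshold
    )
```

With a 99.9% SLO, a sustained 2% error ratio is a burn rate of about 20, so should_page(0.02, 0.02, 0.999) pages. A burn rate of 14.4 held for one hour spends 2% of a 30-day budget, which is where that threshold comes from.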
