Category: metrics. Introduced in L8.

Prometheus

The open-source metrics standard. Pull-based scraping, time-series database, the basis of most cloud-native observability.


What it is

The plain-English version

Prometheus is an open-source monitoring and alerting toolkit, originally built at SoundCloud. It pulls metrics from applications by scraping an HTTP endpoint, stores them in a time-series database, and lets you query them with PromQL. It is usually paired with Grafana for dashboards and Alertmanager for routing alerts.
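To make the scrape step concrete, here is a minimal Python sketch of the plain-text format a target returns from /metrics. The helper names (render_metric, escape_label_value) and the sample metric are hypothetical and exist only for illustration; a real application would normally use an official client library rather than build this output by hand.

```python
from typing import Dict, List, Tuple, Union

Number = Union[int, float]
Sample = Tuple[Dict[str, str], Number]


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, help_text: str, metric_type: str,
                  samples: List[Sample]) -> str:
    """Render one metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            # Labels are sorted only to make the output deterministic.
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data: one counter with two label combinations.
    print(render_metric(
        "http_requests_total", "Total HTTP requests.", "counter",
        [({"method": "GET", "status": "200"}, 1027),
         ({"method": "POST", "status": "500"}, 3)],
    ), end="")
```

Each line with a distinct set of label values is its own time series; on every scrape Prometheus records the current value of each one.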

Why it exists

The problem it solves

Prometheus is a CNCF graduated project and the de facto default for metrics in Kubernetes environments. The pull model is unusual but well suited to ephemeral containers: with service discovery, each new Pod is found and scraped automatically. It's the standard layer for self-hosted observability.

What it competes with

Alternatives

Alternative	Type	When it wins
Sentry	errors	The error-tracking standard. Captures frontend and backend exceptions with full context. First tool teams add for production observability.
Datadog	APM	Comprehensive observability: metrics, logs, APM, and RUM, all under one expensive roof.
ELK Stack	log mgmt	Elasticsearch + Logstash + Kibana, the open-source log management trio. Now called the Elastic Stack, which also includes Beats.

Where it shows up in Shipyard

Deep links

L8 SysOps

Vocabulary

The words you'll hear

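Exporter: An agent that exposes another system's metrics in Prometheus format. Examples: node_exporter, postgres_exporter.
Scrape: Prometheus pulls /metrics from each target on an interval.
Metric type: Counter (only goes up, resets to zero on restart), gauge (can go up or down), histogram (observations counted into buckets), summary (quantiles computed on the client).
PromQL: The query language. Example: rate(http_requests_total[5m]) gives the per-second request rate averaged over the last 5 minutes.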
Alertmanager: Companion service that routes alerts to email, PagerDuty, Slack.
Grafana: Dashboard tool, often paired with Prometheus.
Recording rule: Pre-computed query stored as a new metric. Saves repeated work.
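To show why counters and rate() go together, the following Python sketch computes a per-second rate from raw counter samples and tolerates a counter reset. It is a deliberate simplification: the function name simple_rate and the sample data are hypothetical, and the real PromQL rate() also extrapolates to the edges of the window, which this sketch does not attempt.

```python
from typing import List, Optional, Tuple


def simple_rate(samples: List[Tuple[float, float]],
                window_seconds: float, now: float) -> Optional[float]:
    """Per-second increase of a counter over the window ending at `now`.

    `samples` is a list of (timestamp_seconds, counter_value) sorted by time.
    Simplification: no extrapolation to the window edges, unlike PromQL.
    """
    in_window = [(t, v) for t, v in samples if now - window_seconds <= t <= now]
    if len(in_window) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    previous = in_window[0][1]
    for _, value in in_window[1:]:
        if value < previous:
            # The counter went down, so the process restarted: count from zero.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = in_window[-1][0] - in_window[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


if __name__ == "__main__":
    # Hypothetical scrapes every 60 s; the counter resets between 60 s and 120 s.
    scrapes = [(0, 100), (60, 160), (120, 10), (180, 70)]
    print(simple_rate(scrapes, window_seconds=300, now=180))
```

Because resets are absorbed this way, a restart does not show up as a negative rate, which is the reason request totals are modeled as counters rather than gauges.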

Prompting

Bad vs. good prompt for Prometheus

✕ Bad prompt

set up prometheus

✓ Good prompt

Set up Prometheus monitoring for our Node.js Express app. Use prom-client to expose /metrics with the default Node.js process metrics plus HTTP request metrics (a duration histogram labeled by method, route, and status). Write the Prometheus scrape config snippet and Grafana panel JSON for: requests per second by status, p50/p95/p99 latency, and error rate. Don't include the full dashboard; show only the key panels.

Why it works: It names the client library, the metrics to expose, the scrape config, and the dashboard panels that are actually useful. Most 'set up Prometheus' answers stop at /metrics; this one finishes the job.

Pitfalls

What bites real teams

⚠ Cardinality explosion

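Every distinct combination of label values is a separate time series, so an unbounded label such as user_id can create millions of series and make Prometheus run out of memory. Keep label cardinality bounded.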
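A rough way to see the risk is to multiply the number of distinct values each label can take. The Python sketch below does that; the function name worst_case_series and the label counts are made-up illustrations, and the result is an upper bound, since not every combination necessarily occurs in practice.

```python
from math import prod
from typing import Dict


def worst_case_series(label_cardinalities: Dict[str, int]) -> int:
    """Upper bound on time series for one metric: product of label value counts."""
    return prod(label_cardinalities.values())


# Hypothetical label sets for a single HTTP request counter.
bounded = {"method": 5, "route": 20, "status": 6}
unbounded = dict(bounded, user_id=100_000)

if __name__ == "__main__":
    print(worst_case_series(bounded))    # 5 * 20 * 6
    print(worst_case_series(unbounded))  # the same, times 100,000 users
```

With these assumed numbers, one extra label turns 600 possible series into 60 million for a single metric.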

⚠ Long-term storage

Prometheus's local storage lives on a single node and defaults to 15 days of retention; it isn't designed to hold years of data. Pair it with Thanos, Cortex, or Mimir for long retention.

⚠ Alert fatigue

It's tempting to alert on every metric. Most paging alerts should fire on SLO burn rate (how fast the error budget is being consumed), not on static thresholds for individual metrics.
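As a sketch of what a burn-rate check computes: burn rate is the observed error ratio divided by the error budget (1 minus the SLO target). The Python below assumes a 99.9% SLO and borrows the commonly cited threshold of 14.4, which corresponds to spending 2% of a 30-day budget in one hour (14.4 × 1 h / 720 h = 0.02). The function names, the two-window check, and the numbers are illustrative assumptions, not a prescribed configuration.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    error_budget = 1.0 - slo_target
    if error_budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / error_budget


def should_page(long_window_error_ratio: float,
                short_window_error_ratio: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only if both a long and a short window exceed the threshold.

    The long window (for example 1 h) shows the burn is significant; the
    short window (for example 5 min) shows it is still happening now.
    """
    return (burn_rate(long_window_error_ratio, slo_target) >= threshold
            and burn_rate(short_window_error_ratio, slo_target) >= threshold)


if __name__ == "__main__":
    # Hypothetical error ratios: 2% over the last hour, 3% over the last 5 minutes.
    print(should_page(0.02, 0.03, slo_target=0.999))
    # The incident has already recovered in the short window, so no page.
    print(should_page(0.02, 0.0005, slo_target=0.999))
```

In a real setup the two error ratios would come from PromQL expressions, typically precomputed with recording rules, and the comparison would live in an alerting rule rather than in application code.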

References

Official docs only

prometheus.io/docs
