What DevOps actually is

DevOps is not a job title, a tool, or a team. It is a culture and a set of practices for shipping software reliably and often. The one-sentence version: the people who write the software are accountable for running it. That accountability flips a lot of engineering decisions.

The folklore version: in the bad old days, developers wrote code and threw it over the wall to operations, who deployed and ran it. When something broke, the two sides blamed each other. DevOps is the deliberate erasure of that wall — the same people own the deploy, the on-call, the incident, the postmortem, and the next iteration.

The practical pillars covered in this level:

  1. Describe infrastructure as code, not as clicks in a console (IaC).
  2. Package applications so they run identically anywhere (containers).
  3. Automate the build → test → deploy path (pipelines).
  4. Ship changes with a controlled blast radius (deploy strategies, feature flags).
  5. Reconcile actual state to desired state continuously (GitOps).
  6. Manage secrets and configuration at scale.
  7. Operate with reliability discipline (SRE).

Infrastructure as Code

IaC means: instead of clicking around in the AWS console to create a server, a database, and a load balancer, you write a text file describing what you want, and a tool makes it happen. The text file is committed to git like any other code. Pull requests are reviewed. History is auditable. Re-creating production from scratch is a single command.

The tools you'll meet:

  • Terraform (HCL, its own DSL): multi-cloud, the de facto default. Huge ecosystem of providers.
  • Pulumi (TypeScript, Python, Go, etc.): "real programming language" IaC with loops, conditions, and libraries. Same providers as Terraform underneath.
  • AWS CDK (TypeScript, Python, Java): AWS-native. Compiles down to CloudFormation. Tight AWS integration.
  • CloudFormation (YAML / JSON): AWS's original. Verbose. Most teams reach for CDK on top of it now.
  • SST, Serverless Framework (YAML / TS): specialized for serverless apps on AWS.
HCL
# A tiny Terraform example — nothing to memorize, just recognize the shape.
resource "aws_s3_bucket" "tasklane_uploads" {
  bucket = "tasklane-uploads-prod"

  tags = {
    Project     = "tasklane"
    Environment = "production"
  }
}

resource "aws_s3_bucket_versioning" "tasklane_uploads" {
  bucket = aws_s3_bucket.tasklane_uploads.id

  versioning_configuration {
    status = "Enabled"
  }
}

The vocabulary you'll hear:

Plan
Terraform's "show me what would change before applying it." Always read the plan.
Apply
Execute the plan; modify real infrastructure.
State
Terraform's record of what it currently believes is deployed. Stored remotely (S3, Terraform Cloud) so the whole team shares one source of truth.
Drift
When real infrastructure differs from the IaC code (someone clicked something in the console). Drift is bad; it's also inevitable. Detect and reconcile.
Module
A reusable bundle of IaC code. "We have a VPC module" means there is a standardized way to create the network setup.
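To make "module" concrete, here is a sketch of how one might be consumed from a root Terraform configuration. The module path, variable names (cidr_block, az_count), and output name are hypothetical, invented for illustration rather than taken from a real module.

```hcl
# Hypothetical usage of an internal VPC module. The source path and
# the variables it accepts are illustrative, not a real module's API.
module "vpc" {
  source = "./modules/vpc"

  cidr_block = "10.0.0.0/16"
  az_count   = 3

  tags = {
    Project     = "tasklane"
    Environment = "production"
  }
}

# Module outputs can feed other resources, e.g.:
# subnet_ids = module.vpc.private_subnet_ids
```

The point of the pattern: every team creates networks the same reviewed way, and a fix to the module propagates everywhere it is used.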

Containers — Docker, the noun and the verb

A container is a packaged, portable bundle of an application plus everything it needs to run — the binaries, libraries, files, environment. The whole bundle runs in isolation on any machine that can run containers. The same bundle that runs on your laptop runs on a server in production, byte-identical.

The vocabulary that constantly trips people up:

Image
The static blueprint. A read-only snapshot built from a Dockerfile. Stored in a registry.
Container
A running instance of an image. You can run many containers from one image.
Dockerfile
The recipe. A text file with instructions for how to build the image.
Registry
Where images live. Docker Hub, GitHub Container Registry, AWS ECR, GCP Artifact Registry.
Layer
Each step in a Dockerfile produces a layer. Layers are cached and reused — get the order right and rebuilds become fast.
DOCKERFILE
# Tasklane's Dockerfile — typical multi-stage Node app.
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci

FROM node:20-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build

FROM node:20-alpine AS runtime
WORKDIR /app
COPY --from=build /app/.next ./.next
COPY --from=build /app/public ./public
COPY package*.json ./
RUN npm ci --omit=dev
EXPOSE 3000
CMD ["npm", "start"]
Dockerfile → image (pushed to a registry: ECR, GHCR, Hub) → container. The image is the unit shipped between machines; the same image runs identically on a laptop, a CI runner, and a production cluster.

Kubernetes at concept-level — enough to read a YAML file

Kubernetes (often shortened to k8s — eight letters between K and s) is the orchestrator. It runs and manages containers across many machines. You declare what you want (3 copies of the Tasklane app, exposed on port 80, autoscaling between 3 and 20 replicas based on CPU); k8s makes it true and keeps it that way.

The vocabulary, in order of how often you'll hear it:

Pod

The smallest unit. One or more tightly-coupled containers that share network and storage. In practice, most pods have one container. "There are three pods running" means three replicas of your app are alive.

Deployment

The declaration of "I want N pods running this image." Handles rollouts and rollbacks. The manifest you'll touch most often.

Service

A stable network endpoint pointing at a set of pods. Pods come and go (they have ephemeral IPs); the Service has a stable name. "Frontend talks to backend.svc.cluster.local" — that's a Service.

Ingress

The thing that handles incoming traffic from outside the cluster — TLS, hostnames, path routing. "tasklane.com → ingress → service → pods."

Namespace

A logical division of the cluster. Often dev, staging, prod, or per-team. Resources in different namespaces can have the same name without colliding.

ConfigMap & Secret

Where configuration and secrets live. Mounted into pods as files or environment variables. Secrets are base64-encoded by default (not encrypted!) — production setups layer real secret stores on top.

Helm

A package manager for k8s. A "chart" is a parameterized bundle of YAML you can install with one command. helm install postgres bitnami/postgresql beats hand-writing the deployment.

kubectl

The CLI. Pronounced "cube-cuttle" or "cube-control" — the team will judge you for either. kubectl get pods, kubectl logs <pod>, kubectl apply -f deploy.yaml — these three get you most of the way.
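To tie the vocabulary together, here is a minimal Deployment plus Service pair, roughly the shape of manifest this section promises you'll be able to read. The names, namespace, replica count, and image tag are illustrative, not from a real cluster.

```yaml
# A minimal Deployment + Service pair (illustrative names and image).
# The Deployment declares "3 pods of this image"; the Service gives
# those pods one stable in-cluster name, whatever their IPs are today.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tasklane-web
  namespace: prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tasklane-web
  template:
    metadata:
      labels:
        app: tasklane-web
    spec:
      containers:
        - name: web
          image: ghcr.io/tasklane/web:v1.42
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: tasklane-web
  namespace: prod
spec:
  selector:
    app: tasklane-web
  ports:
    - port: 80
      targetPort: 3000
```

Note how the pieces link by labels, not by name: the Service routes to whatever pods carry `app: tasklane-web`, which is how pods can come and go freely.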

Pipelines, deeper — Jenkins, Actions, GitLab CI

L4 introduced CI/CD as the conveyor belt. At this level, three things change: pipelines get richer (multi-stage, with artifact promotion), they vary by tool more visibly, and you'll meet the older world (Jenkins) that still runs much of enterprise.

  • GitHub Actions (YAML in .github/workflows/; GitHub-hosted or self-hosted runners): anything on GitHub. Cleanest YAML of the modern tools. Massive marketplace.
  • GitLab CI (YAML in .gitlab-ci.yml; GitLab runners, cloud or self-hosted): the same job for GitLab. Matched dev experience.
  • Jenkins (Groovy in a Jenkinsfile, declarative or scripted pipelines; self-hosted controller + agents): existing enterprise environments. Plugin ecosystem unmatched. Heavy to operate.
  • CircleCI (YAML in .circleci/config.yml; CircleCI cloud or self-hosted runners): strong caching and parallelism. Common in mid-sized companies.
  • Argo Workflows (YAML CRDs, inside Kubernetes): k8s-native, complex DAG workflows. Often paired with ArgoCD.
  • Tekton (YAML CRDs, inside Kubernetes): cloud-native CI building blocks; usually wrapped by other tools.
  • Buildkite, Drone, TeamCity, Bamboo (varies): niche or legacy. Real teams run them; rarely the default for a new project.
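For a feel of the modern end of that table, here is roughly the same lint/test/build pipeline sketched in GitHub Actions syntax. The workflow is a plausible shape rather than a drop-in file; the branch filter and Node version are assumptions.

```yaml
# Sketch of a CI workflow in .github/workflows/ci.yml (illustrative).
# Runs on pushes to main and on every pull request.
name: CI
on:
  push:
    branches: [main]
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run lint
      - run: npm test
      - run: npm run build
```

Compare it with the Jenkinsfile below: the same conveyor belt of stages, but configuration lives in the repo and the runners are rented rather than operated.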

Jenkins — what every operations person eventually meets

Jenkins is the elder of CI/CD. Open source since 2011 (forked from Hudson, originally 2005). It runs on a controller server that schedules jobs to agents (worker machines). You write a Jenkinsfile in Groovy describing the pipeline. Plugins extend it for almost anything you can imagine — and many things you can't.

GROOVY
// Jenkinsfile — declarative pipeline. Reads top-to-bottom.
pipeline {
  agent any

  environment {
    NODE_ENV = 'production'
  }

  stages {
    stage('Checkout') { steps { checkout scm } }
    stage('Install')  { steps { sh 'npm ci' } }
    stage('Lint')     { steps { sh 'npm run lint' } }
    stage('Test')     { steps { sh 'npm test' } }
    stage('Build')    { steps { sh 'npm run build' } }
    stage('Deploy') {
      when { branch 'main' }
      steps {
        withCredentials([string(credentialsId: 'CF_TOKEN', variable: 'CF_TOKEN')]) {
          sh 'wrangler pages deploy ./out'
        }
      }
    }
  }

  post {
    failure {
      mail to: 'oncall@tasklane.example', subject: "Build failed: ${env.BUILD_TAG}"
    }
  }
}

Multi-stage pipelines — promote artifacts, don't rebuild

A mature pipeline builds an artifact once, then promotes that exact artifact through environments. The same Docker image that passed staging is the one that hits production. Rebuilds for prod are how subtle bugs ship.

Build image v1.42 → test the same image → deploy v1.42 to staging → manual approval gate → production runs the same v1.42. The artifact flows through every stage, never rebuilt. Promote one artifact through environments: the thing tested in staging is the thing live in prod.

Ephemeral preview environments

For every PR, the pipeline can spin up a temporary deployment at a unique URL — pr-847.preview.tasklane.example. Reviewers click the URL, smoke-test the change in a real environment, then the environment tears down on merge or close. Vercel, Netlify, and Cloudflare Pages do this automatically. On k8s it takes more plumbing: a pipeline job that creates and destroys a per-PR namespace, or something like ArgoCD's pull-request generator creating an app per open PR.

Deploy strategies — controlling blast radius

"Push the button, code is live for 100% of users immediately" is one strategy. It is the riskiest. The mature alternatives:

Rolling deploy
Replace pods one at a time. Default for k8s deployments. New version comes up, old one comes down, repeat. Cheap, simple, the right default for most cases.
Blue/green
Two complete environments. Blue is live; green is the new version. Once green is verified, flip traffic from blue to green at the load balancer. Rollback = flip back. Requires roughly double the infrastructure during the swap.
Canary
Send a small percentage of traffic (1%, then 5%, 25%, 100%) to the new version while watching error and latency metrics. If anything spikes, roll back automatically. Safest pattern for big or risky changes.
Shadow / mirroring
Send real traffic to the new version but ignore its responses — useful for performance and correctness testing without user-visible risk.
Canary rollout, traffic shift over time: t=0 100% old, t=5m 99/1, t=15m 90/10, t=30m 70/30, t=1h 50/50, t=2h 10/90, t=4h 100% new. Ramp traffic to the new version gradually, watching metrics. Pause or roll back at the first sign of trouble.
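The "watch metrics, roll back on spikes" loop can be sketched in a few lines. The ramp schedule and thresholds below are invented for illustration; a real controller would read live error and latency metrics from a monitoring system rather than take them as arguments.

```python
# Sketch of a canary controller's decision step (illustrative numbers).
RAMP = [1, 5, 25, 100]  # percent of traffic sent to the new version

def next_step(current_pct: int, error_rate: float, p99_ms: float,
              max_error_rate: float = 0.01, max_p99_ms: float = 500.0) -> int:
    """Return the next traffic percentage: advance, hold at 100, or roll back to 0."""
    if error_rate > max_error_rate or p99_ms > max_p99_ms:
        return 0  # metrics spiked: send all traffic back to the old version
    later = [p for p in RAMP if p > current_pct]
    return later[0] if later else current_pct  # advance one step, or stay done

# Healthy metrics at 5%: advance to 25%.
print(next_step(5, error_rate=0.001, p99_ms=240.0))   # 25
# Error spike at 25%: roll back to 0%.
print(next_step(25, error_rate=0.08, p99_ms=240.0))   # 0
```

The essential property: rollback is a routine, automated outcome of the loop, not an emergency procedure someone has to remember at 3am.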

Feature flags — separating deploy from release

A feature flag is a runtime switch that turns a feature on or off without a deploy. The code for the new feature is shipped to production; whether anyone sees it is controlled by a flag.

Why this matters: deploys and releases are different things. A deploy is a technical event (new code is running). A release is a product event (users see the new behavior). Conflating them is how launches go badly. With flags, you can deploy code on Tuesday and release it on Friday, with full ability to roll back the release without touching the deploy.
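Under the hood, a percentage rollout is usually deterministic bucketing: hash the flag and user together so the same user always gets the same answer and the feature doesn't flicker between requests. A minimal sketch, with an invented flag name; real providers layer targeting rules, environments, and a UI on top of this idea.

```python
import hashlib

def flag_enabled(flag: str, user_id: str, rollout_pct: int) -> bool:
    """Deterministically place a user in a bucket 0-99; enable if below the rollout %."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_pct

# Same user, same flag, same answer every time: the rollout is stable.
assert flag_enabled("new-editor", "user-42", 50) == flag_enabled("new-editor", "user-42", 50)
```

Raising rollout_pct from 10 to 50 in the flag service releases the feature to more users instantly, with no deploy, and dropping it to 0 is the rollback.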

Tools you'll meet:

  • LaunchDarkly — the category leader. SaaS, mature, expensive at scale.
  • Statsig — flags + experimentation + product analytics. Cheaper, growing fast.
  • Unleash — open source, self-hostable. Good fit if you don't want a SaaS dependency.
  • PostHog — analytics-first, with feature flags as part of the suite.
  • OpenFeature — vendor-neutral standard / API. Lets you switch providers without rewriting integration code.

GitOps — the repo is the source of truth

GitOps is a deployment pattern where the desired state of your infrastructure (and applications, on k8s) is declared in a git repo, and an automated controller continuously reconciles the actual state to match. You don't run kubectl apply; you commit a YAML file, and ArgoCD or Flux notices and applies it.

The shift in mindset: git becomes the only way to change production. Want to scale up? Commit a change. Want to roll back? Revert the commit. Want to know what's deployed right now? Look at the repo. The audit trail, the review process, the rollback mechanism — all collapse into one workflow you already know.

Two main controllers:

  • ArgoCD — the most popular. Has a great UI showing what's deployed vs. what the repo says. CNCF-graduated.
  • Flux — also mature, often paired with Helm. Slightly lighter touch, no built-in UI.
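In ArgoCD terms, "the repo is the source of truth" is declared with an Application resource: a pointer from the cluster to a repo path it should mirror. The repo URL, paths, and names below are made up for illustration.

```yaml
# Illustrative ArgoCD Application: "keep the prod namespace in sync
# with the k8s/prod directory of this repo, automatically."
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tasklane
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/tasklane/deploy.git
    targetRevision: main
    path: k8s/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: prod
  syncPolicy:
    automated:
      prune: true     # delete resources removed from the repo
      selfHeal: true  # revert manual kubectl changes (drift) back to the repo's state
```

The selfHeal flag is GitOps in one line: if someone changes the cluster by hand, the controller puts it back the way the repo says.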

Secrets at scale

L3 introduced .env files for a single project. At organizational scale, secrets need: rotation, scoping (only this service can read this secret), audit (who accessed what when), and zero appearance in source control.

  • HashiCorp Vault (self-hosted or HCP): the category leader. Powerful, complex, dynamic secrets (auto-generated, short-lived DB credentials).
  • AWS Secrets Manager / Parameter Store (AWS): cleanly integrated with everything AWS. The default if you're already there.
  • GCP Secret Manager / Azure Key Vault (GCP / Azure): the equivalents on the other clouds.
  • Doppler (SaaS): developer-friendly, multi-environment, integrates with most platforms.
  • Infisical (SaaS or self-hosted): open-source secrets manager with E2E encryption. Newer but well-built.
  • SOPS + age / git-crypt (git-based): encrypted secrets stored alongside code. Pairs well with GitOps.

Configuration management — Ansible, Chef, Puppet

Before containers and IaC took over, the dominant pattern was configuration management — tools that make a fleet of servers look the same. They still matter for hybrid environments, virtual-machine workloads, and anywhere you can't easily containerize.

  • Ansible (push-based, agentless, YAML "playbooks"): the dominant choice today. Easy to start with. SSH-driven; no agent on target hosts. Owned by Red Hat.
  • Chef (pull-based, agent on each node, Ruby DSL "cookbooks"): older, still in heavy enterprise use. Steeper curve.
  • Puppet (pull-based, agent on each node, its own DSL): the original of the bunch (2005). Big in regulated industries.
  • SaltStack (both push and pull): less common; powerful event-driven model.
YAML
# Ansible playbook example — install Nginx on a fleet of Ubuntu hosts.
- name: Provision web servers
  hosts: webservers
  become: true
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present
        update_cache: true
    - name: Ensure nginx is running
      service:
        name: nginx
        state: started
        enabled: true
    - name: Deploy site config
      template:
        src: templates/tasklane.conf.j2
        dest: /etc/nginx/sites-enabled/tasklane.conf
      notify: reload nginx
  handlers:
    - name: reload nginx
      service: { name: nginx, state: reloaded }

The mental model: IaC creates the boxes, configuration management sets up what runs inside them, containers package the apps. Modern stacks blur the lines (Kubernetes does most of what config management used to do), but the older world is still alive.

SRE basics — error budgets, SLI / SLO / SLA revisited

L6 introduced p50 / p95 / p99 latencies and the SLI / SLO / SLA acronyms. SRE turns those into a discipline:

SLI (Service Level Indicator)
The thing you measure. "Percentage of homepage requests that return in under 500ms."
SLO (Service Level Objective)
The target. "99.9% of homepage requests under 500ms over a 28-day window."
SLA (Service Level Agreement)
The contract — usually with customers, often with money attached. "If we drop below 99.5%, we refund a percentage of the bill."
Error budget
The mathematical complement of the SLO. If your SLO is 99.9%, your error budget is 0.1% of the period. Once you've spent the budget, you stop shipping risky changes and focus on reliability work.
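The error-budget arithmetic is worth doing once by hand. A small sketch using the 99.9% / 28-day SLO from above:

```python
# Turning an SLO into a concrete error budget, using the
# 99.9%-over-28-days example from the text.
def error_budget_minutes(slo: float, window_days: int) -> float:
    """Minutes of allowed 'bad' time per window for a given SLO."""
    total_minutes = window_days * 24 * 60  # 28 days = 40,320 minutes
    return total_minutes * (1 - slo)

print(round(error_budget_minutes(0.999, 28), 1))   # 40.3: about 40 minutes of budget
print(round(error_budget_minutes(0.9999, 28), 1))  # 4.0: one more nine leaves ~4 minutes
```

Spend those 40 minutes on one bad deploy and the error-budget policy kicks in: no more risky releases until the window rolls over.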

Wrap-up

Jargon recap

DevOps
Culture and practice — same people own writing, deploying, and running.
IaC
Infrastructure as Code. Terraform, Pulumi, CDK. Console clicks → text files.
Container / image / registry
Packaged app / static blueprint / where images live.
Kubernetes / k8s / kubectl
Container orchestrator / its nickname / its CLI.
Pod / Deployment / Service / Ingress
k8s primitives — running container(s) / declared state / stable endpoint / external traffic.
Jenkins / Jenkinsfile
Self-hosted CI/CD elder. Groovy pipeline definition.
Rolling / blue-green / canary
Three deploy strategies of increasing safety and complexity.
Feature flag
Runtime switch. Separates deploy from release.
GitOps
Git is the source of truth. Controllers reconcile reality to repo.
Vault / Secrets Manager
Secret stores at scale. Scoped, rotated, audited.
Ansible / Chef / Puppet
Configuration management — make a fleet of machines identical.
SLI / SLO / SLA / error budget
Reliability discipline. Indicator / objective / contract / budget.

Mini-exercise

Pick a service you use (Slack, Linear, Stripe). Sketch how you imagine they deploy it: what's the unit of code, how does it get from commit to production, what deploy strategy fits their risk profile? Then look up any public engineering blog posts they have and compare. The gap between your guess and the reality is the lesson.