Level 07 · 100 → 120 · Yardmaster path
DevOps
The practice of shipping reliably. Infrastructure as code. Containers and orchestration. Deeper CI/CD with Jenkins, GitHub Actions, GitLab CI. Deploy strategies — blue/green, canary, rolling. Feature flags. GitOps. Secrets at scale. Configuration management with Ansible, Chef, Puppet. SRE basics — SLIs, SLOs, error budgets.
What DevOps actually is
DevOps is not a job title, a tool, or a team. It is a culture and a stack of practices for shipping software reliably and often. The single sentence: the people who write the software are accountable for running it. That accountability flips a lot of engineering decisions.
The folklore version: in the bad old days, developers wrote code and threw it over the wall to operations, who deployed and ran it. When something broke, the two sides blamed each other. DevOps is the deliberate erasure of that wall — the same people own the deploy, the on-call, the incident, the postmortem, and the next iteration.
The practical pillars covered in this level:
- Describe infrastructure as code, not as clicks in a console (IaC).
- Package applications so they run identically anywhere (containers).
- Automate the build → test → deploy path (pipelines).
- Ship changes with a controlled blast radius (deploy strategies, feature flags).
- Reconcile actual state to desired state continuously (GitOps).
- Manage secrets and configuration at scale.
- Operate with reliability discipline (SRE).
Infrastructure as Code
IaC means: instead of clicking around in the AWS console to create a server, a database, and a load balancer, you write a text file describing what you want, and a tool makes it happen. The text file is committed to git like any other code. Pull requests are reviewed. History is auditable. Re-creating production from scratch is a single command.
The tools you'll meet:
| Tool | Language | Sweet spot |
|---|---|---|
| Terraform | HCL (its own DSL) | Multi-cloud, the de facto default. Huge ecosystem of providers. |
| Pulumi | TypeScript, Python, Go, etc. | "Real programming language" IaC — loops, conditions, libraries. Same providers as Terraform underneath. |
| AWS CDK | TypeScript, Python, Java | AWS-native. Compiles down to CloudFormation. Tight AWS integration. |
| CloudFormation | YAML / JSON | AWS's original. Verbose. Most teams reach for CDK on top of it now. |
| SST, Serverless Framework | YAML / TS | Specialized for serverless apps on AWS. |
```hcl
# A tiny Terraform example — nothing to memorize, just recognize the shape.
resource "aws_s3_bucket" "tasklane_uploads" {
  bucket = "tasklane-uploads-prod"

  tags = {
    Project     = "tasklane"
    Environment = "production"
  }
}

resource "aws_s3_bucket_versioning" "tasklane_uploads" {
  bucket = aws_s3_bucket.tasklane_uploads.id

  versioning_configuration {
    status = "Enabled"
  }
}
```
The vocabulary you'll hear:
- Plan
- Terraform's "show me what would change before applying it." Always read the plan.
- Apply
- Execute the plan; modify real infrastructure.
- State
- Terraform's record of what it currently believes is deployed. Stored remotely (S3, Terraform Cloud) so the whole team shares one source of truth.
- Drift
- When real infrastructure differs from the IaC code (someone clicked something in the console). Drift is bad; it's also inevitable. Detect and reconcile.
- Module
- A reusable bundle of IaC code. "We have a VPC module" means a standardized, reviewed way to stand up the network setup.
Containers — Docker, the noun and the verb
A container is a packaged, portable bundle of an application plus everything it needs to run — the binaries, libraries, files, environment. The whole bundle runs in isolation on any machine that can run containers. The same bundle that runs on your laptop runs on a server in production, byte-identical.
The vocabulary that constantly trips people up:
- Image
- The static blueprint. A read-only snapshot built from a Dockerfile. Stored in a registry.
- Container
- A running instance of an image. You can run many containers from one image.
- Dockerfile
- The recipe. A text file with instructions for how to build the image.
- Registry
- Where images live. Docker Hub, GitHub Container Registry, AWS ECR, GCP Artifact Registry.
- Layer
- Each step in a Dockerfile produces a layer. Layers are cached and reused — get the order right and rebuilds become fast.
```dockerfile
# Tasklane's Dockerfile — typical multi-stage Node app.
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci

FROM node:20-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build

FROM node:20-alpine AS runtime
WORKDIR /app
COPY --from=build /app/.next ./.next
COPY --from=build /app/public ./public
COPY package*.json ./
RUN npm ci --omit=dev
EXPOSE 3000
CMD ["npm", "start"]
```
Kubernetes at concept-level — enough to read a YAML file
Kubernetes (often shortened to k8s — eight letters between K and s) is the orchestrator. It runs and manages containers across many machines. You declare what you want (3 copies of the Tasklane app, exposed on port 80, autoscaling between 3 and 20 replicas based on CPU); k8s makes it true and keeps it that way.
The vocabulary, in order of how often you'll hear it:
- Pod
- The smallest unit. One or more tightly-coupled containers that share network and storage. In practice, most pods have one container. "There are three pods running" means three replicas of your app are alive.
- Deployment
- The declaration of "I want N pods running this image." Handles rollouts and rollbacks. The manifest you'll touch most often.
- Service
- A stable network endpoint pointing at a set of pods. Pods come and go (they have ephemeral IPs); the Service has a stable name. "Frontend talks to backend.svc.cluster.local" — that's a Service.
- Ingress
- The thing that handles incoming traffic from outside the cluster — TLS, hostnames, path routing. "tasklane.com → ingress → service → pods."
- Namespace
- A logical division of the cluster. Often dev, staging, prod, or per-team. Resources in different namespaces can have the same name without colliding.
- ConfigMap & Secret
- Where configuration and secrets live. Mounted into pods as files or environment variables. Secrets are base64-encoded by default (not encrypted!) — production setups layer real secret stores on top.
- Helm
- A package manager for k8s. A "chart" is a parameterized bundle of YAML you can install with one command. helm install postgres bitnami/postgresql beats hand-writing the deployment.
- kubectl
- The CLI. Pronounced "cube-cuttle" or "cube-control" — the team will judge you for either. kubectl get pods, kubectl logs <pod>, kubectl apply -f deploy.yaml — these three get you most of the way.
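To make the vocabulary concrete, here is a minimal Deployment plus Service pair. Everything in it is illustrative — the names, namespace, image tag, and ports are assumptions, not taken from a real cluster — but the shape is what you'll see in practice:

```yaml
# Hypothetical manifests for the Tasklane app. A Deployment declares the
# desired pods; a Service gives them a stable network name.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tasklane-web
  namespace: prod
spec:
  replicas: 3                    # "I want 3 pods running this image"
  selector:
    matchLabels:
      app: tasklane-web
  template:
    metadata:
      labels:
        app: tasklane-web        # pods get this label; the Service selects on it
    spec:
      containers:
        - name: web
          image: ghcr.io/tasklane/web:1.4.2
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: tasklane-web             # stable DNS name inside the cluster
  namespace: prod
spec:
  selector:
    app: tasklane-web            # routes to whichever pods carry this label
  ports:
    - port: 80                   # port the Service exposes
      targetPort: 3000           # port the container listens on
```

Read it top-to-bottom and the vocabulary maps directly: the Deployment declares replicas, the pods come from the template, and the Service is the stable endpoint in front of them.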
Pipelines, deeper — Jenkins, Actions, GitLab CI
L4 introduced CI/CD as the conveyor belt. At this level, three things change: pipelines get richer (multi-stage, with artifact promotion), they vary by tool more visibly, and you'll meet the older world (Jenkins) that still runs much of enterprise.
| Tool | Configuration | Lives where | Sweet spot |
|---|---|---|---|
| GitHub Actions | YAML in .github/workflows/ | GitHub-hosted runners or self-hosted | Anything on GitHub. Cleanest YAML of the modern tools. Massive marketplace. |
| GitLab CI | YAML in .gitlab-ci.yml | GitLab runners (cloud or self-hosted) | Same job for GitLab. Matched dev experience. |
| Jenkins | Groovy in Jenkinsfile (declarative or scripted pipelines) | Self-hosted Jenkins controller + agents | Existing enterprise environments. Plugin ecosystem unmatched. Heavy to operate. |
| CircleCI | YAML in .circleci/config.yml | CircleCI cloud or self-hosted runners | Strong caching and parallelism. Common in mid-sized companies. |
| Argo Workflows | YAML CRDs on Kubernetes | Inside k8s | K8s-native, complex DAG workflows. Often paired with ArgoCD. |
| Tekton | YAML CRDs on Kubernetes | Inside k8s | Cloud-native CI building blocks; usually wrapped by other tools. |
| Buildkite, Drone, TeamCity, Bamboo | varies | varies | Niche or legacy. Real teams run them; rarely the default for a new project. |
Jenkins — what every operations person eventually meets
Jenkins is the elder of CI/CD — forked from Hudson in 2011, with a lineage going back to 2005. It runs on a controller server that schedules jobs to agents (worker machines). You write a Jenkinsfile in Groovy describing the pipeline. Plugins extend it for almost anything you can imagine — and many things you can't.
```groovy
// Jenkinsfile — declarative pipeline. Reads top-to-bottom.
pipeline {
  agent any
  environment {
    NODE_ENV = 'production'
  }
  stages {
    stage('Checkout') { steps { checkout scm } }
    stage('Install')  { steps { sh 'npm ci' } }
    stage('Lint')     { steps { sh 'npm run lint' } }
    stage('Test')     { steps { sh 'npm test' } }
    stage('Build')    { steps { sh 'npm run build' } }
    stage('Deploy') {
      when { branch 'main' }
      steps {
        withCredentials([string(credentialsId: 'CF_TOKEN', variable: 'CF_TOKEN')]) {
          sh 'wrangler pages deploy ./out'
        }
      }
    }
  }
  post {
    failure {
      mail to: 'oncall@tasklane.example',
           subject: "Build failed: ${env.BUILD_TAG}",
           body: "See ${env.BUILD_URL} for details."
    }
  }
}
```
Multi-stage pipelines — promote artifacts, don't rebuild
A mature pipeline builds an artifact once, then promotes that exact artifact through environments. The same Docker image that passed staging is the one that hits production. Rebuilds for prod are how subtle bugs ship.
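A sketch of the build-once-promote pattern in GitHub Actions. The repository, registry path, and the `deploy.sh` script are assumptions for illustration — the point is that the image is tagged with the commit SHA once, and the same tag moves through every environment:

```yaml
# Hypothetical workflow: build the image once, tag it with the commit SHA,
# then promote that exact tag through staging and production.
name: build-and-promote
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push an image tagged with the commit SHA
        run: |
          docker build -t ghcr.io/tasklane/web:${{ github.sha }} .
          docker push ghcr.io/tasklane/web:${{ github.sha }}

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - name: Deploy the already-built image — no rebuild
        run: ./scripts/deploy.sh staging ghcr.io/tasklane/web:${{ github.sha }}

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production    # can require manual approval in repo settings
    steps:
      - name: Promote the exact image that passed staging
        run: ./scripts/deploy.sh production ghcr.io/tasklane/web:${{ github.sha }}
```

Notice that only the `build` job runs `docker build`. The deploy jobs receive a SHA, not source code.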
Ephemeral preview environments
For every PR, the pipeline can spin up a temporary deployment at a unique URL — pr-847.preview.tasklane.example. Reviewers click the URL, smoke-test the change in a real environment, then the environment tears down on merge or close. Vercel, Netlify, and Cloudflare Pages do this automatically. On k8s it takes more wiring — for example, Argo CD's pull-request ApplicationSet generator can create and destroy an application per PR.
Deploy strategies — controlling blast radius
"Push the button, code is live for 100% of users immediately" is one strategy. It is the riskiest. The mature alternatives:
- Rolling deploy
- Replace pods one at a time. Default for k8s deployments. New version comes up, old one comes down, repeat. Cheap, simple, the right default for most cases.
- Blue/green
- Two complete environments. Blue is live; green is the new version. Once green is verified, flip traffic from blue to green at the load balancer. Rollback = flip back. Requires roughly double the infrastructure during the swap.
- Canary
- Send a small percentage of traffic (1%, then 5%, 25%, 100%) to the new version while watching error and latency metrics. If anything spikes, roll back automatically. Safest pattern for big or risky changes.
- Shadow / mirroring
- Send real traffic to the new version but ignore its responses — useful for performance and correctness testing without user-visible risk.
Feature flags — separating deploy from release
A feature flag is a runtime switch that turns a feature on or off without a deploy. The code for the new feature is shipped to production; whether anyone sees it is controlled by a flag.
Why this matters: deploys and releases are different things. A deploy is a technical event (new code is running). A release is a product event (users see the new behavior). Conflating them is how launches go badly. With flags, you can deploy code on Tuesday and release it on Friday, with full ability to roll back the release without touching the deploy.
Tools you'll meet:
- LaunchDarkly — the category leader. SaaS, mature, expensive at scale.
- Statsig — flags + experimentation + product analytics. Cheaper, growing fast.
- Unleash — open source, self-hostable. Good fit if you don't want a SaaS dependency.
- PostHog — analytics-first, with feature flags as part of the suite.
- OpenFeature — vendor-neutral standard / API. Lets you switch providers without rewriting integration code.
GitOps — the repo is the source of truth
GitOps is a deployment pattern where the desired state of your infrastructure (and applications, on k8s) is declared in a git repo, and an automated controller continuously reconciles the actual state to match. You don't run kubectl apply; you commit a YAML file, and ArgoCD or Flux notices and applies it.
The shift in mindset: git becomes the only way to change production. Want to scale up? Commit a change. Want to roll back? Revert the commit. Want to know what's deployed right now? Look at the repo. The audit trail, the review process, the rollback mechanism — all collapse into one workflow you already know.
Two main controllers:
- ArgoCD — the most popular. Has a great UI showing what's deployed vs. what the repo says. CNCF-graduated.
- Flux — also mature, often paired with Helm. Slightly lighter touch, no built-in UI.
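In ArgoCD, the link between a repo and a cluster is itself a YAML resource. A sketch of an Application manifest — the repo URL, paths, and namespaces are assumptions for illustration:

```yaml
# Hypothetical ArgoCD Application: "keep the prod namespace in sync with
# the k8s/prod directory of this repo."
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tasklane
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/tasklane/infra.git
    targetRevision: main           # track the main branch
    path: k8s/prod                 # the directory of manifests to apply
  destination:
    server: https://kubernetes.default.svc
    namespace: prod
  syncPolicy:
    automated:
      prune: true                  # delete resources removed from the repo
      selfHeal: true               # revert manual drift back to the repo's state
```

With `selfHeal` on, even a stray kubectl edit gets reverted — the repo really is the only way to change production.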
Secrets at scale
L3 introduced .env files for a single project. At organizational scale, secrets need: rotation, scoping (only this service can read this secret), audit (who accessed what when), and zero appearance in source control.
| Tool | Hosting | Sweet spot |
|---|---|---|
| HashiCorp Vault | Self-hosted or HCP | The category leader. Powerful, complex, dynamic secrets (auto-generated, short-lived DB credentials). |
| AWS Secrets Manager / Parameter Store | AWS | Cleanly integrated with everything AWS. The default if you're already there. |
| GCP Secret Manager / Azure Key Vault | GCP / Azure | Equivalents on the other clouds. |
| Doppler | SaaS | Developer-friendly, multi-environment, integrates with most platforms. |
| Infisical | SaaS or self-hosted | Open-source secrets manager with E2E encryption. Newer but well-built. |
| SOPS + age / git-crypt | Git-based | Encrypted secrets stored alongside code. Pairs well with GitOps. |
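The SOPS-in-git pattern from the last table row is worth seeing. A sketch of a `.sops.yaml` policy file — the path pattern is an assumption, and the age public key below is a placeholder, not a real key:

```yaml
# Hypothetical .sops.yaml — tells SOPS which files to encrypt and with what.
creation_rules:
  - path_regex: secrets/.*\.yaml$          # only files under secrets/
    encrypted_regex: ^(data|stringData)$   # encrypt values, leave k8s structure readable
    age: age1PLACEHOLDER0000000000000000000000000000000000000000000000
```

The encrypted file commits safely to git; anyone holding the matching private key can decrypt it, which is what makes the pattern pair well with GitOps.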
Configuration management — Ansible, Chef, Puppet
Before containers and IaC took over, the dominant pattern was configuration management — tools that make a fleet of servers look the same. They still matter for hybrid environments, virtual-machine workloads, and anywhere you can't easily containerize.
| Tool | Style | Notes |
|---|---|---|
| Ansible | Push-based, agentless, YAML "playbooks" | The dominant choice today. Easy to start with. SSH-driven; no agent on target hosts. Owned by Red Hat. |
| Chef | Pull-based, agent on each node, Ruby DSL "cookbooks" | Older, still in heavy enterprise use. Steeper curve. |
| Puppet | Pull-based, agent on each node, its own DSL | The original of the bunch (2005). Big in regulated industries. |
| SaltStack | Both push and pull | Less common; powerful event-driven model. |
```yaml
# Ansible playbook example — install Nginx on a fleet of Ubuntu hosts.
- name: Provision web servers
  hosts: webservers
  become: true
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present
        update_cache: true
    - name: Ensure nginx is running
      service:
        name: nginx
        state: started
        enabled: true
    - name: Deploy site config
      template:
        src: templates/tasklane.conf.j2
        dest: /etc/nginx/sites-enabled/tasklane.conf
      notify: reload nginx
  handlers:
    - name: reload nginx
      service: { name: nginx, state: reloaded }
```
The mental model: IaC creates the boxes, configuration management sets up what runs inside them, containers package the apps. Modern stacks blur the lines (Kubernetes does most of what config management used to do), but the older world is still alive.
SRE basics — error budgets, SLI / SLO / SLA revisited
L6 introduced p50 / p95 / p99 latencies and the SLI / SLO / SLA acronyms. SRE turns those into a discipline:
- SLI (Service Level Indicator)
- The thing you measure. "Percentage of homepage requests that return in under 500ms."
- SLO (Service Level Objective)
- The target. "99.9% of homepage requests under 500ms over a 28-day window."
- SLA (Service Level Agreement)
- The contract — usually with customers, often with money attached. "If we drop below 99.5%, we refund a percentage of the bill."
- Error budget
- The mathematical complement of the SLO. If your SLO is 99.9%, your error budget is 0.1% of the period. Once you've spent the budget, you stop shipping risky changes and focus on reliability work.
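The arithmetic is worth doing once. For a 99.9% SLO over a 28-day window:

```text
28 days × 24 h × 60 min = 40,320 minutes in the window
error budget = 0.1% of 40,320 ≈ 40.3 minutes

One 45-minute outage blows the entire budget; so does a steady
0.1% failure rate sustained across all 28 days.
```

The budget is deliberately spendable — on risky deploys, experiments, planned maintenance. The discipline is in stopping when it's gone.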
Wrap-up
Jargon recap
- DevOps
- Culture and practice — same people own writing, deploying, and running.
- IaC
- Infrastructure as Code. Terraform, Pulumi, CDK. Console clicks → text files.
- Container / image / registry
- Packaged app / static blueprint / where images live.
- Kubernetes / k8s / kubectl
- Container orchestrator / its nickname / its CLI.
- Pod / Deployment / Service / Ingress
- k8s primitives — running container(s) / declared state / stable endpoint / external traffic.
- Jenkins / Jenkinsfile
- Self-hosted CI/CD elder. Groovy pipeline definition.
- Rolling / blue-green / canary
- Three deploy strategies of increasing safety and complexity.
- Feature flag
- Runtime switch. Separates deploy from release.
- GitOps
- Git is the source of truth. Controllers reconcile reality to repo.
- Vault / Secrets Manager
- Secret stores at scale. Scoped, rotated, audited.
- Ansible / Chef / Puppet
- Configuration management — make a fleet of machines identical.
- SLI / SLO / SLA / error budget
- Reliability discipline. Indicator / objective / contract / budget.
You should now be able to
- Explain DevOps as shared ownership of writing, deploying, and running software.
- Read a Terraform file and describe plan, apply, state, and drift.
- Distinguish images, containers, Dockerfiles, registries, and layers.
- Read a basic Kubernetes manifest and name the primitives involved.
- Compare rolling, blue/green, and canary deploys by blast radius.
- Explain why feature flags separate deploy from release.
- Describe GitOps reconciliation, and why secrets at scale need rotation, scoping, and audit.
- Define SLI, SLO, SLA, and error budget, and compute a budget from an SLO.
Mini-exercise
Pick a service you use (Slack, Linear, Stripe). Sketch how you imagine they deploy it: what's the unit of code, how does it get from commit to production, what deploy strategy fits their risk profile? Then look up any public engineering blog posts they have and compare. The gap between your guess and the reality is the lesson.