What DevOps actually is

DevOps is not a job title, a tool, or a team. It is a culture and a set of practices for shipping software reliably and often. The one-sentence version: the people who write the software are accountable for running it. That accountability flips a lot of engineering decisions.

The folklore version: in the bad old days, developers wrote code and threw it over the wall to operations, who deployed and ran it. When something broke, the two sides blamed each other. DevOps is the deliberate erasure of that wall — the same people own the deploy, the on-call, the incident, the postmortem, and the next iteration.

The practical pillars covered in this level:

  1. Describe infrastructure as code, not as clicks in a console (IaC).
  2. Package applications so they run identically anywhere (containers).
  3. Automate the build → test → deploy path (pipelines).
  4. Ship changes with a controlled blast radius (deploy strategies, feature flags).
  5. Reconcile actual state to desired state continuously (GitOps).
  6. Manage secrets and configuration at scale.
  7. Operate with reliability discipline (SRE).

Infrastructure as Code

IaC means: instead of clicking around in the AWS console to create a server, a database, and a load balancer, you write a text file describing what you want, and a tool makes it happen. The text file is committed to git like any other code. Pull requests are reviewed. History is auditable. Re-creating production from scratch is a single command.

The tools you'll meet:

  • Terraform (HCL, its own DSL): multi-cloud, the de facto default. Huge ecosystem of providers.
  • Pulumi (TypeScript, Python, Go, etc.): "real programming language" IaC with loops, conditions, and libraries. Same providers as Terraform underneath.
  • AWS CDK (TypeScript, Python, Java): AWS-native. Compiles down to CloudFormation. Tight AWS integration.
  • CloudFormation (YAML / JSON): AWS's original. Verbose. Most teams reach for CDK on top of it now.
  • SST, Serverless Framework (YAML / TS): specialized for serverless apps on AWS.
HCL
# A tiny Terraform example — nothing to memorize, just recognize the shape.
resource "aws_s3_bucket" "tasklane_uploads" {
  bucket = "tasklane-uploads-prod"

  tags = {
    Project     = "tasklane"
    Environment = "production"
  }
}

resource "aws_s3_bucket_versioning" "tasklane_uploads" {
  bucket = aws_s3_bucket.tasklane_uploads.id

  versioning_configuration {
    status = "Enabled"
  }
}

The vocabulary you'll hear:

Plan
Terraform's "show me what would change before applying it." Always read the plan.
Apply
Execute the plan; modify real infrastructure.
State
Terraform's record of what it currently believes is deployed. Stored remotely (S3, Terraform Cloud) so the whole team shares one source of truth.
Drift
When real infrastructure differs from the IaC code (someone clicked something in the console). Drift is bad; it's also inevitable. Detect and reconcile.
Module
A reusable bundle of IaC code. "We have a VPC module" means there is a standardized way to create the network setup.
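To make "module" concrete, here is a sketch of how one might be consumed from a root Terraform configuration. The module path, variable names (cidr_block, az_count), and output name are hypothetical, invented for illustration rather than taken from a real module.

```hcl
# Hypothetical usage of an internal VPC module. The source path and
# the variables it accepts are illustrative, not a real module's API.
module "vpc" {
  source = "./modules/vpc"

  cidr_block = "10.0.0.0/16"
  az_count   = 3

  tags = {
    Project     = "tasklane"
    Environment = "production"
  }
}

# Module outputs can feed other resources, e.g.:
# subnet_ids = module.vpc.private_subnet_ids
```

The point of the pattern: every team creates networks the same reviewed way, and a fix to the module propagates everywhere it is used.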

Containers — Docker, the noun and the verb

A container is a packaged, portable bundle of an application plus everything it needs to run — the binaries, libraries, files, environment. The whole bundle runs in isolation on any machine that can run containers. The same bundle that runs on your laptop runs on a server in production, byte-identical.

The vocabulary that constantly trips people up:

Image
The static blueprint. A read-only snapshot built from a Dockerfile. Stored in a registry.
Container
A running instance of an image. You can run many containers from one image.
Dockerfile
The recipe. A text file with instructions for how to build the image.
Registry
Where images live. Docker Hub, GitHub Container Registry, AWS ECR, GCP Artifact Registry.
Layer
Each step in a Dockerfile produces a layer. Layers are cached and reused — get the order right and rebuilds become fast.
DOCKERFILE
# Tasklane's Dockerfile — typical multi-stage Node app.
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci

FROM node:20-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build

FROM node:20-alpine AS runtime
WORKDIR /app
COPY --from=build /app/.next ./.next
COPY --from=build /app/public ./public
COPY package*.json ./
RUN npm ci --omit=dev
EXPOSE 3000
CMD ["npm", "start"]
Dockerfile → image (pushed to a registry: ECR, GHCR, Hub) → container. The image is the unit shipped between machines; the same image runs identically on a laptop, a CI runner, and a production cluster.

Kubernetes at concept-level — enough to read a YAML file

Kubernetes (often shortened to k8s — eight letters between K and s) is the orchestrator. It runs and manages containers across many machines. You declare what you want (3 copies of the Tasklane app, exposed on port 80, autoscaling between 3 and 20 replicas based on CPU); k8s makes it true and keeps it that way.

The vocabulary, in order of how often you'll hear it:

Pod

The smallest unit. One or more tightly-coupled containers that share network and storage. In practice, most pods have one container. "There are three pods running" means three replicas of your app are alive.

Deployment

The declaration of "I want N pods running this image." Handles rollouts and rollbacks. The manifest you'll touch most often.

Service

A stable network endpoint pointing at a set of pods. Pods come and go (they have ephemeral IPs); the Service has a stable name. "Frontend talks to backend.svc.cluster.local" — that's a Service.

Ingress

The thing that handles incoming traffic from outside the cluster — TLS, hostnames, path routing. "tasklane.com → ingress → service → pods."

Namespace

A logical division of the cluster. Often dev, staging, prod, or per-team. Resources in different namespaces can have the same name without colliding.

ConfigMap & Secret

Where configuration and secrets live. Mounted into pods as files or environment variables. Secrets are base64-encoded by default (not encrypted!) — production setups layer real secret stores on top.

Helm

A package manager for k8s. A "chart" is a parameterized bundle of YAML you can install with one command. helm install postgres bitnami/postgresql beats hand-writing the deployment.

kubectl

The CLI. Pronounced "cube-cuttle" or "cube-control" — the team will judge you for either. kubectl get pods, kubectl logs <pod>, kubectl apply -f deploy.yaml — these three get you most of the way.
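To tie the vocabulary together, here is a minimal Deployment plus Service pair, roughly the shape of manifest this section promises you'll be able to read. The names, namespace, replica count, and image tag are illustrative, not from a real cluster.

```yaml
# A minimal Deployment + Service pair (illustrative names and image).
# The Deployment declares "3 pods of this image"; the Service gives
# those pods one stable in-cluster name, whatever their IPs are today.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tasklane-web
  namespace: prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tasklane-web
  template:
    metadata:
      labels:
        app: tasklane-web
    spec:
      containers:
        - name: web
          image: ghcr.io/tasklane/web:v1.42
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: tasklane-web
  namespace: prod
spec:
  selector:
    app: tasklane-web
  ports:
    - port: 80
      targetPort: 3000
```

Note how the pieces link by labels, not by name: the Service routes to whatever pods carry `app: tasklane-web`, which is how pods can come and go freely.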

Pipelines, deeper — Jenkins, Actions, GitLab CI

L4 introduced CI/CD as the conveyor belt. At this level, three things change: pipelines get richer (multi-stage, with artifact promotion), they vary by tool more visibly, and you'll meet the older world (Jenkins) that still runs much of enterprise.

  • GitHub Actions (YAML in .github/workflows/; GitHub-hosted or self-hosted runners): anything on GitHub. Cleanest YAML of the modern tools. Massive marketplace.
  • GitLab CI (YAML in .gitlab-ci.yml; GitLab runners, cloud or self-hosted): the same job for GitLab. Matched dev experience.
  • Jenkins (Groovy in a Jenkinsfile, declarative or scripted pipelines; self-hosted controller + agents): existing enterprise environments. Plugin ecosystem unmatched. Heavy to operate.
  • CircleCI (YAML in .circleci/config.yml; CircleCI cloud or self-hosted runners): strong caching and parallelism. Common in mid-sized companies.
  • Argo Workflows (YAML CRDs, inside Kubernetes): k8s-native, complex DAG workflows. Often paired with ArgoCD.
  • Tekton (YAML CRDs, inside Kubernetes): cloud-native CI building blocks; usually wrapped by other tools.
  • Buildkite, Drone, TeamCity, Bamboo (varies): niche or legacy. Real teams run them; rarely the default for a new project.
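For a feel of the modern end of that table, here is roughly the same lint/test/build pipeline sketched in GitHub Actions syntax. The workflow is a plausible shape rather than a drop-in file; the branch filter and Node version are assumptions.

```yaml
# Sketch of a CI workflow in .github/workflows/ci.yml (illustrative).
# Runs on pushes to main and on every pull request.
name: CI
on:
  push:
    branches: [main]
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run lint
      - run: npm test
      - run: npm run build
```

Compare it with the Jenkinsfile below: the same conveyor belt of stages, but configuration lives in the repo and the runners are rented rather than operated.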

Jenkins — what every operations person eventually meets

Jenkins is the elder of CI/CD. Open source since 2011 (forked from Hudson, originally 2005). It runs on a controller server that schedules jobs to agents (worker machines). You write a Jenkinsfile in Groovy describing the pipeline. Plugins extend it for almost anything you can imagine — and many things you can't.

GROOVY
// Jenkinsfile — declarative pipeline. Reads top-to-bottom.
pipeline {
  agent any

  environment {
    NODE_ENV = 'production'
  }

  stages {
    stage('Checkout') { steps { checkout scm } }
    stage('Install')  { steps { sh 'npm ci' } }
    stage('Lint')     { steps { sh 'npm run lint' } }
    stage('Test')     { steps { sh 'npm test' } }
    stage('Build')    { steps { sh 'npm run build' } }
    stage('Deploy') {
      when { branch 'main' }
      steps {
        withCredentials([string(credentialsId: 'CF_TOKEN', variable: 'CF_TOKEN')]) {
          sh 'wrangler pages deploy ./out'
        }
      }
    }
  }

  post {
    failure {
      mail to: 'oncall@tasklane.example', subject: "Build failed: ${env.BUILD_TAG}"
    }
  }
}

Multi-stage pipelines — promote artifacts, don't rebuild

A mature pipeline builds an artifact once, then promotes that exact artifact through environments. The same Docker image that passed staging is the one that hits production. Rebuilds for prod are how subtle bugs ship.

Build image v1.42 → test the same image → deploy v1.42 to staging → manual approval gate → production runs the same v1.42. The artifact flows through every stage, never rebuilt. Promote one artifact through environments: the thing tested in staging is the thing live in prod.

Ephemeral preview environments

For every PR, the pipeline can spin up a temporary deployment at a unique URL — pr-847.preview.tasklane.example. Reviewers click the URL, smoke-test the change in a real environment, then the environment tears down on merge or close. Vercel, Netlify, and Cloudflare Pages do this automatically. On k8s it takes more plumbing: a pipeline job that creates and destroys a per-PR namespace, or something like ArgoCD's pull-request generator creating an app per open PR.

Deploy strategies — controlling blast radius

"Push the button, code is live for 100% of users immediately" is one strategy. It is the riskiest. The mature alternatives:

Rolling deploy
Replace pods one at a time. Default for k8s deployments. New version comes up, old one comes down, repeat. Cheap, simple, the right default for most cases.
Blue/green
Two complete environments. Blue is live; green is the new version. Once green is verified, flip traffic from blue to green at the load balancer. Rollback = flip back. Requires roughly double the infrastructure during the swap.
Canary
Send a small percentage of traffic (1%, then 5%, 25%, 100%) to the new version while watching error and latency metrics. If anything spikes, roll back automatically. Safest pattern for big or risky changes.
Shadow / mirroring
Send real traffic to the new version but ignore its responses — useful for performance and correctness testing without user-visible risk.
Canary rollout, traffic shift over time: t=0 100% old, t=5m 99/1, t=15m 90/10, t=30m 70/30, t=1h 50/50, t=2h 10/90, t=4h 100% new. Ramp traffic to the new version gradually, watching metrics. Pause or roll back at the first sign of trouble.
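The "watch metrics, roll back on spikes" loop can be sketched in a few lines. The ramp schedule and thresholds below are invented for illustration; a real controller would read live error and latency metrics from a monitoring system rather than take them as arguments.

```python
# Sketch of a canary controller's decision step (illustrative numbers).
RAMP = [1, 5, 25, 100]  # percent of traffic sent to the new version

def next_step(current_pct: int, error_rate: float, p99_ms: float,
              max_error_rate: float = 0.01, max_p99_ms: float = 500.0) -> int:
    """Return the next traffic percentage: advance, hold at 100, or roll back to 0."""
    if error_rate > max_error_rate or p99_ms > max_p99_ms:
        return 0  # metrics spiked: send all traffic back to the old version
    later = [p for p in RAMP if p > current_pct]
    return later[0] if later else current_pct  # advance one step, or stay done

# Healthy metrics at 5%: advance to 25%.
print(next_step(5, error_rate=0.001, p99_ms=240.0))   # 25
# Error spike at 25%: roll back to 0%.
print(next_step(25, error_rate=0.08, p99_ms=240.0))   # 0
```

The essential property: rollback is a routine, automated outcome of the loop, not an emergency procedure someone has to remember at 3am.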

Feature flags — separating deploy from release

A feature flag is a runtime switch that turns a feature on or off without a deploy. The code for the new feature is shipped to production; whether anyone sees it is controlled by a flag.

Why this matters: deploys and releases are different things. A deploy is a technical event (new code is running). A release is a product event (users see the new behavior). Conflating them is how launches go badly. With flags, you can deploy code on Tuesday and release it on Friday, with full ability to roll back the release without touching the deploy.
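Under the hood, a percentage rollout is usually deterministic bucketing: hash the flag and user together so the same user always gets the same answer and the feature doesn't flicker between requests. A minimal sketch, with an invented flag name; real providers layer targeting rules, environments, and a UI on top of this idea.

```python
import hashlib

def flag_enabled(flag: str, user_id: str, rollout_pct: int) -> bool:
    """Deterministically place a user in a bucket 0-99; enable if below the rollout %."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_pct

# Same user, same flag, same answer every time: the rollout is stable.
assert flag_enabled("new-editor", "user-42", 50) == flag_enabled("new-editor", "user-42", 50)
```

Raising rollout_pct from 10 to 50 in the flag service releases the feature to more users instantly, with no deploy, and dropping it to 0 is the rollback.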

Tools you'll meet:

  • LaunchDarkly — the category leader. SaaS, mature, expensive at scale.
  • Statsig — flags + experimentation + product analytics. Cheaper, growing fast.
  • Unleash — open source, self-hostable. Good fit if you don't want a SaaS dependency.
  • PostHog — analytics-first, with feature flags as part of the suite.
  • OpenFeature — vendor-neutral standard / API. Lets you switch providers without rewriting integration code.

GitOps — the repo is the source of truth

GitOps is a deployment pattern where the desired state of your infrastructure (and applications, on k8s) is declared in a git repo, and an automated controller continuously reconciles the actual state to match. You don't run kubectl apply; you commit a YAML file, and ArgoCD or Flux notices and applies it.

The shift in mindset: git becomes the only way to change production. Want to scale up? Commit a change. Want to roll back? Revert the commit. Want to know what's deployed right now? Look at the repo. The audit trail, the review process, the rollback mechanism — all collapse into one workflow you already know.

Two main controllers:

  • ArgoCD — the most popular. Has a great UI showing what's deployed vs. what the repo says. CNCF-graduated.
  • Flux — also mature, often paired with Helm. Slightly lighter touch, no built-in UI.
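In ArgoCD terms, "the repo is the source of truth" is declared with an Application resource: a pointer from the cluster to a repo path it should mirror. The repo URL, paths, and names below are made up for illustration.

```yaml
# Illustrative ArgoCD Application: "keep the prod namespace in sync
# with the k8s/prod directory of this repo, automatically."
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tasklane
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/tasklane/deploy.git
    targetRevision: main
    path: k8s/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: prod
  syncPolicy:
    automated:
      prune: true     # delete resources removed from the repo
      selfHeal: true  # revert manual kubectl changes (drift) back to the repo's state
```

The selfHeal flag is GitOps in one line: if someone changes the cluster by hand, the controller puts it back the way the repo says.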

Secrets at scale

L3 introduced .env files for a single project. At organizational scale, secrets need: rotation, scoping (only this service can read this secret), audit (who accessed what when), and zero appearance in source control.

  • HashiCorp Vault (self-hosted or HCP): the category leader. Powerful, complex, dynamic secrets (auto-generated, short-lived DB credentials).
  • AWS Secrets Manager / Parameter Store (AWS): cleanly integrated with everything AWS. The default if you're already there.
  • GCP Secret Manager / Azure Key Vault (GCP / Azure): the equivalents on the other clouds.
  • Doppler (SaaS): developer-friendly, multi-environment, integrates with most platforms.
  • Infisical (SaaS or self-hosted): open-source secrets manager with E2E encryption. Newer but well-built.
  • SOPS + age / git-crypt (git-based): encrypted secrets stored alongside code. Pairs well with GitOps.

Configuration management — Ansible, Chef, Puppet

Before containers and IaC took over, the dominant pattern was configuration management — tools that make a fleet of servers look the same. They still matter for hybrid environments, virtual-machine workloads, and anywhere you can't easily containerize.

  • Ansible (push-based, agentless, YAML "playbooks"): the dominant choice today. Easy to start with. SSH-driven; no agent on target hosts. Owned by Red Hat.
  • Chef (pull-based, agent on each node, Ruby DSL "cookbooks"): older, still in heavy enterprise use. Steeper curve.
  • Puppet (pull-based, agent on each node, its own DSL): the original of the bunch (2005). Big in regulated industries.
  • SaltStack (both push and pull): less common; powerful event-driven model.
YAML
# Ansible playbook example — install Nginx on a fleet of Ubuntu hosts.
- name: Provision web servers
  hosts: webservers
  become: true
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present
        update_cache: true
    - name: Ensure nginx is running
      service:
        name: nginx
        state: started
        enabled: true
    - name: Deploy site config
      template:
        src: templates/tasklane.conf.j2
        dest: /etc/nginx/sites-enabled/tasklane.conf
      notify: reload nginx
  handlers:
    - name: reload nginx
      service: { name: nginx, state: reloaded }

The mental model: IaC creates the boxes, configuration management sets up what runs inside them, containers package the apps. Modern stacks blur the lines (Kubernetes does most of what config management used to do), but the older world is still alive.

SRE basics — error budgets, SLI / SLO / SLA revisited

L6 introduced p50 / p95 / p99 latencies and the SLI / SLO / SLA acronyms. SRE turns those into a discipline:

SLI (Service Level Indicator)
The thing you measure. "Percentage of homepage requests that return in under 500ms."
SLO (Service Level Objective)
The target. "99.9% of homepage requests under 500ms over a 28-day window."
SLA (Service Level Agreement)
The contract — usually with customers, often with money attached. "If we drop below 99.5%, we refund a percentage of the bill."
Error budget
The mathematical complement of the SLO. If your SLO is 99.9%, your error budget is 0.1% of the period. Once you've spent the budget, you stop shipping risky changes and focus on reliability work.
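The error-budget arithmetic is worth doing once by hand. A small sketch using the 99.9% / 28-day SLO from above:

```python
# Turning an SLO into a concrete error budget, using the
# 99.9%-over-28-days example from the text.
def error_budget_minutes(slo: float, window_days: int) -> float:
    """Minutes of allowed 'bad' time per window for a given SLO."""
    total_minutes = window_days * 24 * 60  # 28 days = 40,320 minutes
    return total_minutes * (1 - slo)

print(round(error_budget_minutes(0.999, 28), 1))   # 40.3: about 40 minutes of budget
print(round(error_budget_minutes(0.9999, 28), 1))  # 4.0: one more nine leaves ~4 minutes
```

Spend those 40 minutes on one bad deploy and the error-budget policy kicks in: no more risky releases until the window rolls over.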

Wrap-up

Jargon recap

DevOps
Culture and practice — same people own writing, deploying, and running.
IaC
Infrastructure as Code. Terraform, Pulumi, CDK. Console clicks → text files.
Container / image / registry
Packaged app / static blueprint / where images live.
Kubernetes / k8s / kubectl
Container orchestrator / its nickname / its CLI.
Pod / Deployment / Service / Ingress
k8s primitives — running container(s) / declared state / stable endpoint / external traffic.
Jenkins / Jenkinsfile
Self-hosted CI/CD elder. Groovy pipeline definition.
Rolling / blue-green / canary
Three deploy strategies of increasing safety and complexity.
Feature flag
Runtime switch. Separates deploy from release.
GitOps
Git is the source of truth. Controllers reconcile reality to repo.
Vault / Secrets Manager
Secret stores at scale. Scoped, rotated, audited.
Ansible / Chef / Puppet
Configuration management — make a fleet of machines identical.
SLI / SLO / SLA / error budget
Reliability discipline. Indicator / objective / contract / budget.

Mini-exercise

Pick a service you use (Slack, Linear, Stripe). Sketch how you imagine they deploy it: what's the unit of code, how does it get from commit to production, what deploy strategy fits their risk profile? Then look up any public engineering blog posts they have and compare. The gap between your guess and the reality is the lesson.