
What DevOps Is

DevOps is a culture, not a job title. It is a set of practices that brings software development and IT operations together so that building, testing, and releasing software happens faster and more reliably. If your org chart has a "DevOps team" that sits between developers and operations, you have rebranded your ops team, not adopted DevOps.

The Origin

The term was coined in 2009 by Patrick Debois, who organized the first DevOpsDays conference in Ghent, Belgium. The core frustration was simple: developers wrote code and threw it over the wall to operations, who then scrambled to deploy it. Deployments were scary. Releases happened quarterly. Outages led to blame games. DevOps emerged as a response to this dysfunction.

The foundational texts are The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford (a novel about a fictional IT department), and The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, and John Willis. Both describe a way of working, not a technology stack.

The CALMS Framework

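CALMS is a widely used framework for evaluating DevOps maturity. It grew out of the CAMS acronym (Culture, Automation, Measurement, Sharing) coined by John Willis and Damon Edwards in 2010; Jez Humble later added the L for Lean. It has five pillars: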

Culture

DevOps starts with how people work together. Developers and operations engineers share responsibility for the software in production. When something breaks at 2 AM, the developer who wrote the code is as accountable as the ops engineer who got paged. Blameless post-mortems replace finger-pointing. Teams own their services end to end.

Before DevOps:
  Developer: "It works on my machine."
  Ops: "Well, your machine isn't production."
  Manager: "Whose fault is this?"

After DevOps:
  Team: "The deploy failed. Let's look at the logs together,
         fix it, and add a test so it doesn't happen again."

Automation

Manual processes do not scale and are error-prone. DevOps teams automate everything they can: builds, tests, deployments, infrastructure provisioning, monitoring setup. The goal is not to eliminate humans but to free them from repetitive work so they can focus on problems that require judgment.

What to automate (in rough priority order):
  1. Build and test (CI)        — Every commit triggers a build and test run
  2. Deployment (CD)            — Push a button (or merge a PR) to deploy
  3. Infrastructure provisioning — Terraform, CloudFormation, Pulumi
  4. Monitoring and alerting    — Auto-configured when a service is deployed
  5. Security scanning          — Dependency checks, container image scans
  6. Runbooks                   — Automated remediation for known failure modes

The most impactful automation is usually the simplest. A CI pipeline that runs tests on every pull request will typically catch more bugs than a sophisticated canary deployment system. Start with the basics.
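As a minimal sketch of what "start with the basics" can mean, the script below runs a list of check commands in order and stops at the first failure, which is roughly what a CI job does on each pull request. The step names and commands are placeholders chosen for illustration, not part of any particular CI product; swap in your project's real lint and test commands.

```python
import subprocess
import sys

# Hypothetical check steps; replace the commands with your project's own.
STEPS = [
    ("syntax check", [sys.executable, "-c", "print('syntax ok')"]),
    ("unit tests", [sys.executable, "-c", "print('tests ok')"]),
]


def run_checks(steps):
    """Run each step in order; return the name of the first failing step, or None."""
    for name, command in steps:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"FAILED: {name}\n{result.stderr}")
            return name  # fail fast: later steps are skipped
        print(f"passed: {name}")
    return None


# A CI system blocks the merge when the job exits non-zero, so a real
# script would pass this value to sys.exit().
exit_code = 0 if run_checks(STEPS) is None else 1
```

Stopping at the first failure keeps the feedback loop short: the author learns about the broken step without waiting for the rest of the run.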

Lean

Borrowed from manufacturing, lean thinking means reducing waste. In software, waste includes: waiting (for approvals, for builds, for deploys), handoffs (tickets thrown between teams), partially done work (features sitting in branches for weeks), and context switching. Small batch sizes, short feedback loops, and limiting work in progress are lean principles applied to software delivery.
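One way to see why limiting work in progress shortens feedback loops is Little's Law from queueing theory: for a stable system, average lead time equals average work in progress divided by average throughput. The law is not part of the text above and is brought in here only as an illustration; the numbers are made up.

```python
def average_lead_time_days(work_in_progress, throughput_per_day):
    """Little's Law: average lead time = average WIP / average throughput."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return work_in_progress / throughput_per_day


# Illustrative numbers: the team finishes 2 items per day in both cases.
print(average_lead_time_days(20, 2))  # 20 items in flight -> 10.0 days
print(average_lead_time_days(6, 2))   # WIP limit of 6     -> 3.0 days
```

With throughput unchanged, cutting WIP from 20 to 6 cuts the average wait for any single item from ten days to three.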

Measurement

You cannot improve what you do not measure. The four key metrics from the DORA research program are:

1. Deployment frequency       — How often you deploy to production
2. Lead time for changes      — Time from commit to production
3. Change failure rate         — Percentage of deploys causing failures
4. Mean time to recovery (MTTR) — How quickly you restore service after failure

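In the DORA reports, elite performers typically deploy on demand (multiple times per day), have lead times under an hour, keep change failure rates at or below 15%, and restore service in under an hour. The exact thresholds shift somewhat from one year's report to the next.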

These metrics are not aspirational ideals. The DORA State of DevOps report, published annually since 2014, surveys thousands of organizations and consistently finds that elite performers outperform low performers by orders of magnitude. The gap is not explained by team size, industry, or technology stack. It is explained by practices: trunk-based development, CI/CD, infrastructure as code, monitoring, and -- above all -- culture.
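The four metrics can be computed from a simple log of deployments. The sketch below is one possible way to do it, not DORA's own tooling: the Deployment record and its field names are assumptions made for this example, and recovery time is measured from the moment of deployment, which simplifies how a real incident would be timed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:  # hypothetical record shape, for illustration only
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deployments, period_days):
    """Compute the four key metrics over a reporting period."""
    failures = [d for d in deployments if d.failed]
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    # Simplification: treat the failure as starting at deploy time.
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deployments) / period_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / len(deployments) if deployments else None,
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


# Made-up sample data: four deployments over two days, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(minutes=30)),
    Deployment(t0, t0 + timedelta(minutes=50)),
    Deployment(t0, t0 + timedelta(minutes=40), failed=True,
               restored_at=t0 + timedelta(minutes=60)),
    Deployment(t0, t0 + timedelta(minutes=45)),
]
metrics = dora_metrics(sample, period_days=2)
print(metrics["deploys_per_day"])      # 2.0
print(metrics["change_failure_rate"])  # 0.25
```

The median is used for lead time so that one unusually slow change does not dominate the figure; a team could reasonably choose a mean or a percentile instead.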

Sharing

Knowledge silos kill velocity. Sharing means shared repositories, shared dashboards, shared on-call rotations, shared documentation. When an ops engineer writes a runbook for a failure mode, the developer who owns the affected code reads it. When a developer writes a new feature, the ops engineer understands the deployment requirements before release day.

The DevOps Lifecycle

DevOps is often described as an infinite loop with eight phases:

Plan -> Code -> Build -> Test -> Release -> Deploy -> Operate -> Monitor
  ^                                                                  |
  |------------------------------------------------------------------|

  • Plan -- Define work, prioritize backlog, set sprint goals
  • Code -- Write code, review pull requests, merge to main
  • Build -- Compile, package, create artifacts (CI pipeline)
  • Test -- Unit tests, integration tests, security scans, linting
  • Release -- Version the artifact, tag the release, approve for deployment
  • Deploy -- Push to staging, then production (CD pipeline)
  • Operate -- Run the software: scaling, patching, incident response
  • Monitor -- Collect metrics, logs, traces; alert on anomalies

The loop is continuous. Monitoring informs planning. Production incidents create backlog items. Feedback from operations flows back into development.

Breaking Silos

The core organizational change in DevOps is eliminating the wall between teams. Compare the two models:

Traditional model:
  Dev team -> writes code -> throws artifact over wall -> Ops team deploys

DevOps model:
  Cross-functional team owns the service from commit to production

This does not mean developers must become sysadmins. It means:

  • Developers understand how their code runs in production
  • Operations engineers participate in design reviews
  • Both teams share on-call responsibilities
  • Deployment is the team's problem, not a separate team's problem
  • Incident response involves the people who wrote the code

Amazon's "you build it, you run it" philosophy, articulated by Werner Vogels in 2006, is the purest expression of this idea.

What Shared Ownership Looks Like in Practice

A team practicing DevOps does not have separate roles for "writing code" and "deploying code." The daily workflow looks like this:

Morning:
  Developer picks up a ticket from the sprint board.
  Writes code. Writes tests. Opens a PR.

Afternoon:
  PR is reviewed by a teammate (who also understands the deploy pipeline).
  PR merges to main. CI runs tests. CD deploys to staging.
  Developer verifies the change in staging.
  Developer promotes to production (or it auto-deploys after passing checks).

If something goes wrong:
  Monitoring detects the issue. Alert fires.
  The on-call engineer (a developer this week) investigates.
  They check metrics, logs, and traces — tools they helped configure.
  They roll back or push a fix. They write a blameless post-mortem.

The key difference from the old model: the person who wrote the code is equipped to understand its behavior in production, and the person responding to incidents has context on the code that is running.
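The "roll back or push a fix" step in the workflow above is often the first part of incident response that teams automate. The sketch below shows the shape of that logic under stated assumptions: deploy and is_healthy are hypothetical hooks supplied by the caller, standing in for whatever deployment tooling and monitoring the team actually uses.

```python
def deploy_with_rollback(new_version, current_version, deploy, is_healthy):
    """Deploy new_version; if the health check fails, redeploy current_version.

    `deploy` and `is_healthy` are caller-supplied callables (hypothetical
    hooks into the team's own tooling). Returns the version left running.
    """
    deploy(new_version)
    if is_healthy():
        return new_version
    deploy(current_version)  # roll back to the last known good version
    return current_version


# Usage with stand-in functions that only record what happened.
history = []
running = deploy_with_rollback(
    "v2", "v1",
    deploy=history.append,
    is_healthy=lambda: history[-1] != "v2",  # pretend v2 is unhealthy
)
print(running, history)  # v1 ['v2', 'v1']
```

Passing the hooks in as arguments keeps the decision logic testable without touching real infrastructure, which also makes it easy for developers on call to read and reason about.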

Why "We Hired a DevOps Engineer" Misses the Point

Job postings for "DevOps Engineer" are everywhere. The title is not inherently wrong -- someone has to write the CI/CD pipelines, manage infrastructure, and configure monitoring. The problem is when organizations treat it as a substitute for cultural change.

What usually happens:
  1. Company has slow, painful deployments
  2. Management reads about DevOps
  3. Company hires a "DevOps engineer"
  4. DevOps engineer writes pipelines, manages Kubernetes
  5. Developers still throw code over the wall -- now to the DevOps engineer
  6. Nothing has changed except the job title of the person catching the code

What should happen:
  1. Company has slow, painful deployments
  2. Management reads about DevOps
  3. Teams are restructured to own services end to end
  4. Developers learn to write Dockerfiles, read logs, respond to alerts
  5. Ops engineers participate in code reviews, design sessions
  6. Someone (call them whatever you want) builds shared tooling
  7. Deploy frequency goes up, failure rate goes down

The DevOps engineer role is useful when it focuses on building platforms and tooling that enable other teams to ship faster. It fails when it becomes a rebranded ops silo.

The litmus test: if your DevOps engineer left tomorrow, would developers be unable to deploy? If yes, you do not have DevOps. You have a single point of failure with a trendy title.

Real-World Example

A mid-size e-commerce company had a typical setup: 40 developers, 5 ops engineers, monthly releases. Deployments took a full weekend. Every release broke something. The ops team was drowning.

They made three changes:

  1. Shared on-call -- Developers joined the on-call rotation for their own services. Suddenly, developers cared about log quality and error handling because they were the ones getting paged.
  2. Automated deployments -- They invested in a CI/CD pipeline. Deployments went from a weekend ritual to a button click. Within three months, they were deploying daily.
  3. Blameless post-mortems -- After every incident, they held a post-mortem focused on what happened and how to prevent it, not who was at fault. Root causes were usually systemic (missing tests, poor monitoring, unclear runbooks), not individual mistakes.

Six months later, deploy frequency went from monthly to daily. Change failure rate dropped from 30% to under 10%. MTTR dropped from hours to minutes. The ops team stopped being a bottleneck and started building internal tooling.

How to Start

If your organization is not practicing DevOps, the path forward is incremental, not revolutionary:

Week 1-2:   Set up a CI pipeline. Every PR runs automated tests.
Week 3-4:   Automate deployments to staging. One command or one button.
Month 2:    Add monitoring: know when your application is unhealthy before users tell you.
Month 3:    Extend automated deployment to production. Start with low-risk services.
Month 4:    Begin shared on-call. Developers join the rotation for their own services.
Ongoing:    Measure DORA metrics. Hold blameless post-mortems. Iterate.

You do not need to reorganize the company overnight. You need to start with one team, one pipeline, and one set of metrics. Success is contagious -- when other teams see faster, safer deployments, they adopt the practices voluntarily.
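For the monitoring step in the plan above, a first version can be as small as a periodic health check plus a rule for when to page someone. The sketch below assumes the service exposes an HTTP health endpoint (the URL is whatever your service provides; none is implied by this article) and uses a made-up threshold of three consecutive failures before alerting.

```python
import urllib.error
import urllib.request


def check_health(url, timeout_seconds=2.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout_seconds) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def should_alert(results, threshold=3):
    """Alert only after `threshold` consecutive failed checks.

    `results` is a list of booleans, oldest first. Requiring several
    failures in a row avoids paging someone for a single network blip.
    """
    recent = results[-threshold:]
    return len(recent) == threshold and not any(recent)


print(should_alert([True, False, False, False]))  # True
print(should_alert([False, False, True]))         # False
```

A scheduler or loop would call check_health at a fixed interval, append each result to the list, and trigger the alert when should_alert returns True. Dedicated monitoring tools do far more, but this is enough to learn about an outage before users report it.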

Common Pitfalls

  • Renaming ops to DevOps -- Changing the team name without changing how teams collaborate is organizational theater
  • Tool-first thinking -- Adopting Kubernetes before understanding why deployments are slow; the tool is not the solution, the practice is
  • Ignoring measurement -- Adopting DevOps "because it's modern" without tracking DORA metrics means you cannot tell if it is working
  • Half-hearted culture change -- Developers say "that's an ops problem" when paged; ops engineers refuse to attend sprint planning; management does not enforce shared ownership
  • Automation without understanding -- Automating a broken process produces broken results faster; fix the process first, then automate

Key Takeaways

  • DevOps is a culture of shared ownership between development and operations, not a team name or a tool
  • The CALMS framework (Culture, Automation, Lean, Measurement, Sharing) is the foundation
  • The DORA metrics (deployment frequency, lead time, change failure rate, MTTR) are how you measure progress
  • Breaking silos means developers understand production and ops engineers participate in development
  • Hiring a "DevOps engineer" without cultural change just moves the silo; it does not remove it
  • Start with one team, one pipeline, and one set of metrics; let culture and measurement, not tools, drive the change