Managing State & Secrets

Terraform state is the mapping between your configuration and real-world infrastructure. Without it, Terraform does not know what it has created, what needs to change, or what to destroy. Managing state correctly -- remote storage, locking, drift detection, and import -- is the difference between a reliable infrastructure workflow and a dangerous one. Managing secrets is equally critical: one leaked database password in a state file can compromise your entire system.

What State Is

When you run terraform apply, Terraform creates a state file (terraform.tfstate), or updates the existing one, recording every resource it manages along with its current attributes.

Configuration:                    State file:
resource "aws_instance" "web" {   "aws_instance.web": {
  ami           = "ami-abc123"      "id": "i-0def456ghi789",
  instance_type = "t3.micro"        "ami": "ami-abc123",
}                                   "instance_type": "t3.micro",
                                    "public_ip": "54.123.45.67",
                                    "private_ip": "10.0.1.15",
                                    ...
                                  }

State serves three purposes:

Mapping -- Connects resource blocks in your config to real resources in the cloud
Metadata -- Tracks dependencies between resources for correct ordering
Performance -- Caches resource attributes to avoid querying every API on every plan

Remote State Backends

Local state (terraform.tfstate on disk) is fine for learning. For teams, it is unacceptable:

Nothing coordinates concurrent runs: two people applying from separate copies of the state will overwrite each other's changes
The state file is on one person's laptop (not shared)
Losing the state file means Terraform loses track of all infrastructure

S3 + DynamoDB (AWS)

The most common remote backend for AWS users:

terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "production/networking/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}

Set up the backend infrastructure (do this once, manually or with a bootstrap Terraform config):

# bootstrap/main.tf -- run this first to create the backend resources

resource "aws_s3_bucket" "terraform_state" {
  bucket = "myorg-terraform-state"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

S3 provides durable, versioned storage. DynamoDB provides locking. Encryption protects sensitive data in state.
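Because the bucket holds secrets, it is also worth making sure it can never be exposed publicly. A minimal sketch, assuming it is added to the same bootstrap configuration and reuses the aws_s3_bucket.terraform_state resource defined there:

```hcl
# Sketch: block every form of public access to the state bucket.
# Assumes aws_s3_bucket.terraform_state from the bootstrap config above.
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true # reject new public ACLs
  block_public_policy     = true # reject bucket policies that grant public access
  ignore_public_acls      = true # ignore any public ACLs that already exist
  restrict_public_buckets = true # limit access to authorized principals only
}
```

All four settings are independent switches; turning every one on is the usual choice for a bucket that should only ever be read by Terraform runs.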

Terraform Cloud

Terraform Cloud (now called HCP Terraform) provides remote state, locking, and a web UI out of the box:

terraform {
  cloud {
    organization = "myorg"
    workspaces {
      name = "production-networking"
    }
  }
}

Terraform Cloud also offers remote execution (runs happen on HashiCorp's servers, not your laptop), policy enforcement (Sentinel or OPA), and team management. The free tier supports up to 500 managed resources.

Other Backends

Google Cloud Storage:  Similar to S3, native for GCP users
Azure Blob Storage:    Similar to S3, native for Azure users
Consul:                HashiCorp's service discovery tool; state lives in its key/value store, with native locking
PostgreSQL:            Store state in a database; useful in air-gapped environments
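These backends are configured the same way as S3, with a backend block inside the terraform block. As a rough sketch for Google Cloud Storage (the bucket name and prefix are placeholders chosen to mirror the S3 example, not values from a real project):

```hcl
terraform {
  backend "gcs" {
    bucket = "myorg-terraform-state" # hypothetical bucket; it must already exist
    prefix = "production/networking" # state objects are stored under this prefix
  }
}
```

The GCS backend handles locking itself, so no separate lock table is needed.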

State Locking

When two people run terraform apply at the same time, they can corrupt the state or create conflicting resources. State locking prevents this.

Developer A runs terraform apply:
  1. Acquires lock on state
  2. Reads current state
  3. Plans changes
  4. Applies changes
  5. Writes new state
  6. Releases lock

Developer B tries terraform apply during step 3:
  Error: Error acquiring the state lock
  Lock Info:
    ID:        a1b2c3d4-e5f6-7890
    Operation: OperationTypeApply
    Who:       developer-a@laptop
    Created:   2024-01-15 14:30:00 UTC

With S3, locking uses DynamoDB. With Terraform Cloud, locking is built in. If a lock gets stuck (the process that acquired it crashed), you can force-unlock:

terraform force-unlock LOCK_ID

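Use this sparingly. Force-unlocking only removes the lock; it does not stop the other process. If another apply is still in progress, two writers can then update the state at once and corrupt it.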
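Newer Terraform releases can also lock S3 state without DynamoDB, by writing a lock file next to the state object. The sketch below assumes Terraform 1.10 or later, where the S3 backend accepts the use_lockfile argument; check the backend documentation for your version before relying on it:

```hcl
terraform {
  backend "s3" {
    bucket       = "myorg-terraform-state"
    key          = "production/networking/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # assumption: Terraform 1.10+; locks via a .tflock object in the bucket
  }
}
```

With this in place the dynamodb_table argument and the lock table are no longer required, which leaves one fewer resource to bootstrap.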

State Drift

State drift occurs when the real-world infrastructure no longer matches what Terraform's state says. This happens when someone changes a resource manually (via the console, CLI, or another tool).

State says:                          AWS says:
aws_instance.web:                    i-0def456ghi789:
  instance_type: "t3.micro"           instance_type: "t3.large"  <- Someone changed this manually
  tags: { Name: "web-server" }        tags: { Name: "web-prod" } <- And this

On the next terraform plan, Terraform detects the drift and proposes changes to bring reality back in line with the configuration:

Terraform will perform the following actions:

  # aws_instance.web will be updated in-place
  ~ resource "aws_instance" "web" {
      ~ instance_type = "t3.large" -> "t3.micro"
      ~ tags          = {
          ~ Name = "web-prod" -> "web-server"
        }
    }

This is correct behavior: Terraform enforces the declared configuration, not whatever currently exists. But it surprises teams that made intentional manual changes. The fix is cultural: all changes go through Terraform, never through the console.

# Detect drift without making changes
terraform plan -refresh-only
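Where an attribute is legitimately managed outside Terraform, you can tell Terraform to stop correcting it rather than fight the drift on every plan. A minimal sketch, assuming (purely for illustration) that another tool owns the tags on the instance from the example above:

```hcl
resource "aws_instance" "web" {
  ami           = "ami-abc123"
  instance_type = "t3.micro"

  tags = {
    Name = "web-server" # applied at creation only, because of ignore_changes below
  }

  lifecycle {
    # Assumption for illustration: an external tool manages tags after creation.
    # Terraform will no longer propose reverting tag changes made elsewhere.
    ignore_changes = [tags]
  }
}
```

Use this narrowly. Every ignored attribute is one Terraform no longer enforces, so instance_type is deliberately left out here and a manual resize would still show up in the plan.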

Importing Existing Resources

When you adopt Terraform for existing infrastructure, you need to import resources into state. Terraform does not know about resources it did not create.

# Import an existing EC2 instance
terraform import aws_instance.web i-0def456ghi789

# Import an existing S3 bucket
terraform import aws_s3_bucket.assets my-app-assets

# Import an existing RDS instance
terraform import aws_db_instance.main mydb-production

The import command adds the resource to state but does not generate configuration. You must write the matching resource block in your .tf files first. After importing, run terraform plan and adjust your config until the plan shows no changes.

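Terraform 1.5+ introduced import blocks, which declare the import in configuration so that it goes through the normal plan and apply cycle:

import {
  to = aws_instance.web
  id = "i-0def456ghi789"
}

resource "aws_instance" "web" {
  ami           = "ami-abc123"
  instance_type = "t3.micro"
  # ... fill in the remaining attributes to match the existing resource
}

If you would rather not write the resource block by hand, leave it out and let Terraform generate configuration for every import block that has no matching resource:

terraform plan -generate-config-out=generated.tf

This writes the generated resource blocks to generated.tf, which must not already exist. Treat the result as a starting point: review it, tidy it, and re-run terraform plan until it shows only the import and no other changes.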
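When many similar resources need importing, import blocks can be repeated with for_each. The following is a sketch that assumes Terraform 1.7 or later; the local map and the second bucket name are hypothetical:

```hcl
locals {
  # Hypothetical map of logical name => existing bucket name.
  buckets = {
    assets = "my-app-assets"
    logs   = "my-app-logs"
  }
}

# One import per map entry (assumes Terraform 1.7+ for for_each in import blocks).
import {
  for_each = local.buckets
  to       = aws_s3_bucket.this[each.key]
  id       = each.value
}

resource "aws_s3_bucket" "this" {
  for_each = local.buckets
  bucket   = each.value
}
```

Once the apply has succeeded, the import block can be deleted; the resources stay in state.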

Secrets in Terraform

Terraform state stores the full attributes of every managed resource. For databases, this includes the master password. For API keys, this includes the key value. State files are a high-value target.

The Problem

resource "aws_db_instance" "main" {
  engine         = "postgres"
  instance_class = "db.r6g.large"
  username       = "dbadmin"                  # "admin" is a reserved name on RDS PostgreSQL
  password       = "super_secret_password"    # This ends up in state
}

Even if you use a variable instead of a hardcoded string, the password is stored in plaintext in the state file after apply.

Mitigation Strategies

1. Encrypt state at rest. Always enable encryption on your state backend (S3 SSE, Terraform Cloud encryption).

2. Restrict access to state. Use IAM policies to limit who can read the S3 bucket or Terraform Cloud workspace.
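As a rough sketch of what such a policy could look like for the S3 + DynamoDB backend shown earlier: the actions listed are the ones the S3 backend needs for one state key, while the policy name and statement IDs are made up for illustration. It assumes the aws_dynamodb_table.terraform_locks resource from the bootstrap configuration is in scope.

```hcl
data "aws_iam_policy_document" "terraform_state_access" {
  statement {
    sid       = "ListStateBucket"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::myorg-terraform-state"]
  }

  statement {
    sid     = "ReadWriteNetworkingState"
    actions = ["s3:GetObject", "s3:PutObject"]
    # Scoped to a single state key, so this role cannot read other teams' state.
    resources = ["arn:aws:s3:::myorg-terraform-state/production/networking/terraform.tfstate"]
  }

  statement {
    sid       = "StateLocking"
    actions   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"]
    resources = [aws_dynamodb_table.terraform_locks.arn] # assumes the bootstrap lock table
  }
}

resource "aws_iam_policy" "terraform_state_access" {
  name   = "terraform-state-production-networking" # hypothetical policy name
  policy = data.aws_iam_policy_document.terraform_state_access.json
}
```

If the bucket is encrypted with a customer-managed KMS key, the role also needs permission to use that key; that part is omitted here.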

3. Use environment variables for sensitive inputs:

export TF_VAR_db_password="super_secret_password"
terraform apply

The password is still in state after apply, but it is not in any configuration file or version control.
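For TF_VAR_db_password to be picked up, the configuration needs a matching variable declaration. A minimal sketch follows; the description text, the dbadmin username, and the allocated_storage value are illustrative choices, not taken from a real setup:

```hcl
variable "db_password" {
  description = "Master password for the database, supplied via TF_VAR_db_password"
  type        = string
  sensitive   = true # redacts the value in plan and apply output
}

resource "aws_db_instance" "main" {
  engine            = "postgres"
  instance_class    = "db.r6g.large"
  allocated_storage = 20        # illustrative size in GiB
  username          = "dbadmin" # illustrative username
  password          = var.db_password
}
```

Marking the variable sensitive only affects what Terraform prints. It does not change what is stored in state.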

4. Generate secrets outside Terraform and reference them:

# Store the password in AWS Secrets Manager (created outside Terraform)
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/db/password"
}

resource "aws_db_instance" "main" {
  engine         = "postgres"
  instance_class = "db.r6g.large"
  username       = "dbadmin"    # "admin" is a reserved name on RDS PostgreSQL
  password       = data.aws_secretsmanager_secret_version.db_password.secret_string
}

The password still ends up in state, but it is managed in Secrets Manager and never appears in .tf files or .tfvars files.
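For RDS specifically, there is a way to keep the password out of Terraform altogether: let RDS create and store the master password in Secrets Manager itself. The sketch below assumes a recent AWS provider that supports the manage_master_user_password argument; the username, storage size, and output name are illustrative.

```hcl
resource "aws_db_instance" "main" {
  engine            = "postgres"
  instance_class    = "db.r6g.large"
  allocated_storage = 20        # illustrative size in GiB
  username          = "dbadmin" # illustrative username

  # RDS generates the master password and stores it in Secrets Manager.
  # No password argument is set, so Terraform never handles the value.
  manage_master_user_password = true
}

# Expose only the ARN of the secret, not the secret itself.
output "db_master_secret_arn" {
  value = aws_db_instance.main.master_user_secret[0].secret_arn
}
```

Applications then fetch the password from Secrets Manager by ARN at runtime, and state holds a reference rather than the credential.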

5. Use Vault for dynamic secrets:

provider "vault" {
  address = "https://vault.example.com"
}

data "vault_generic_secret" "db_creds" {
  path = "database/creds/app-role"
}

resource "aws_db_instance" "main" {
  engine         = "postgres"
  instance_class = "db.r6g.large"
  username       = data.vault_generic_secret.db_creds.data["username"]
  password       = data.vault_generic_secret.db_creds.data["password"]
}

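Vault can generate short-lived credentials that expire automatically. The values Terraform reads are still written to state, but a leaked credential stops working once its lease ends, which limits the blast radius if state is compromised. Note that a dynamic path such as database/creds/... issues a new credential on every read, so Terraform sees a different value on each run; for a long-lived master password, a static secret is usually the better fit.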

6. Mark outputs as sensitive:

output "db_password" {
  value     = aws_db_instance.main.password
  sensitive = true
}

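This redacts the value in plan and apply output and in the terraform output listing. It does not remove it from state, and anyone with access to the state can still print it explicitly with terraform output db_password or terraform output -json.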

State Operations

Occasionally you need to manipulate state directly:

# List all resources in state
terraform state list

# Show details of a specific resource
terraform state show aws_instance.web

# Move a resource (rename without destroy/recreate)
terraform state mv aws_instance.web aws_instance.api_server

# Remove a resource from state (Terraform forgets about it; the real resource is untouched)
terraform state rm aws_instance.web

# Pull remote state to a local file (for inspection)
terraform state pull > state.json

State operations are powerful and dangerous. Always back up state before manipulating it. With S3 versioning enabled, you can recover previous state versions if something goes wrong.
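For renames and removals, newer Terraform versions offer declarative alternatives that go through plan and apply, so the change is reviewed like any other. A sketch, assuming Terraform 1.1+ for moved blocks and 1.7+ for removed blocks, and reusing resource addresses from earlier examples:

```hcl
# The resource block has been renamed from "web" to "api_server" in configuration.
resource "aws_instance" "api_server" {
  ami           = "ami-abc123"
  instance_type = "t3.micro"
}

# Declarative counterpart of:
#   terraform state mv aws_instance.web aws_instance.api_server
moved {
  from = aws_instance.web
  to   = aws_instance.api_server
}

# Declarative counterpart of:
#   terraform state rm aws_s3_bucket.assets
# The resource block for aws_s3_bucket.assets must already be deleted from configuration.
removed {
  from = aws_s3_bucket.assets

  lifecycle {
    destroy = false # forget the bucket; do not delete the real resource
  }
}
```

Because these blocks live in version control, every teammate and every environment applies the same state change, which manual terraform state commands cannot guarantee.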

Common Pitfalls

Local state in production -- Local state is not shared, not locked, and easily lost. Use a remote backend from day one.
State file in Git -- State files contain sensitive data and change on every apply. They do not belong in version control. Add *.tfstate and *.tfstate.backup to .gitignore.
Hardcoded secrets in .tf files -- Anyone with repo access can read them. Use environment variables, Secrets Manager, or Vault.
Ignoring state drift -- Manual changes cause Terraform to propose unexpected changes on the next plan. Run terraform plan -refresh-only periodically to detect drift.
Not enabling S3 versioning -- Without versioning, a corrupted state write is unrecoverable. Always enable versioning on the state bucket.
Force-unlocking without investigation -- A stuck lock usually means a process is still running or crashed mid-apply. Investigate before force-unlocking.
Running terraform state rm without understanding the consequences -- Removing a resource from state does not delete it. It means Terraform no longer manages it. The resource continues to exist and to cost money, unmanaged.

Key Takeaways

State is the bridge between your Terraform configuration and real-world infrastructure; losing it means losing track of everything Terraform manages
Always use a remote backend with locking (S3 + DynamoDB, Terraform Cloud, or equivalent)
State drift happens when resources are changed outside Terraform; the fix is cultural (all changes through Terraform) and operational (regular drift detection)
Import existing resources into state with terraform import or import blocks before managing them with Terraform
Secrets in state are unavoidable for some resources; encrypt state at rest, restrict access, and manage secrets through Secrets Manager or Vault
Back up state before any state operations; enable versioning on your state bucket