Zero Trust Architecture
Zero trust is a security model that eliminates implicit trust based on network location. Traditional security treats everything inside the corporate network as trusted and everything outside as untrusted. Zero trust assumes no network location, user, or device is inherently trustworthy and verifies every request.
Never Trust, Always Verify
The core principle of zero trust is that every access request must be authenticated, authorized, and encrypted regardless of where it originates.
Traditional Perimeter Security vs Zero Trust
Traditional (castle and moat):
Outside firewall: untrusted, blocked
Inside firewall: trusted, full access
Problem: once an attacker gets inside (phishing, VPN compromise,
insider threat), they have unrestricted lateral movement.
Zero trust:
Every request: authenticate, authorize, encrypt
Network location: irrelevant to trust decisions
Internal service A calling internal service B: must prove identity
If an attacker compromises one service, they cannot automatically
access other services. Every hop requires authentication.
Zero Trust Principles
Core principles:
1. Verify explicitly
Every request is authenticated and authorized using all available signals:
identity, device health, location, behavior pattern, data classification.
2. Use least privilege access
Grant minimum permissions for the minimum time needed.
Just-in-time access: admin rights granted for 1 hour, then revoked.
Just-enough access: read-only when write is not needed.
3. Assume breach
Design as if attackers are already inside the network.
Minimize blast radius through segmentation.
Monitor and log everything for detection.
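As a rough illustration of the least-privilege principle, the sketch below models a just-in-time grant that carries its own expiry. The `Grant` type, the function names, and the `db:admin` permission string are hypothetical and chosen only for this example. The one-hour default mirrors the admin-rights example in the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """A time-boxed permission (hypothetical type for illustration)."""
    user: str
    permission: str
    expires_at: datetime


def grant_jit(user: str, permission: str, now: datetime,
              ttl: timedelta = timedelta(hours=1)) -> Grant:
    # Just-in-time: the grant expires on its own instead of being permanent.
    return Grant(user, permission, now + ttl)


def is_allowed(grant: Grant, user: str, permission: str, now: datetime) -> bool:
    # Just-enough: the grant covers exactly one permission for one user,
    # and only until the expiry time.
    return (grant.user == user
            and grant.permission == permission
            and now < grant.expires_at)


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
grant = grant_jit("alice", "db:admin", start)
```

Because the expiry is checked on every call, no separate revocation step is needed once the hour has passed.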
Google's BeyondCorp is the best-known zero trust implementation. After the Operation Aurora attack in 2009, Google moved away from VPN-based access. Google employees reach internal applications through an identity-aware proxy that verifies their identity and device health on every request, whether they are in the office or at home.
Trust Signals
Signals evaluated per request:
Identity:
Who is making the request?
Is the identity verified (MFA, certificate)?
Device:
Is the device managed and compliant?
Is the OS patched and up to date?
Is disk encryption enabled?
Is endpoint detection running?
Context:
What time is the request? (3 AM login from new location?)
What is the risk level of the requested resource?
Does this match the user's normal behavior pattern?
Network:
Is the connection encrypted?
Is the source IP from a known-risk geography?
(Network location alone is NOT sufficient for trust)
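One way to picture per-request evaluation is a small decision function that combines the signals listed above. This is a minimal sketch, not a real policy engine. The field names, the three outcomes, and the specific rules (for example, sending an unhealthy device to step-up authentication for low-risk resources) are assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RequestSignals:
    mfa_verified: bool
    device_managed: bool
    os_patched: bool
    disk_encrypted: bool
    connection_encrypted: bool
    unusual_context: bool       # e.g. 3 AM login from a new location
    resource_risk: str          # "low" or "high"
    on_corporate_network: bool  # recorded, but deliberately never consulted


def decide(s: RequestSignals) -> str:
    """Return "allow", "step_up" (re-authenticate), or "deny"."""
    if not (s.mfa_verified and s.connection_encrypted):
        return "deny"
    device_healthy = s.device_managed and s.os_patched and s.disk_encrypted
    if not device_healthy:
        # Illustrative rule: unhealthy devices never reach high-risk resources.
        return "deny" if s.resource_risk == "high" else "step_up"
    if s.unusual_context:
        return "step_up"
    return "allow"


baseline = RequestSignals(
    mfa_verified=True, device_managed=True, os_patched=True,
    disk_encrypted=True, connection_encrypted=True,
    unusual_context=False, resource_risk="low", on_corporate_network=False,
)
```

Note that `on_corporate_network` never appears in `decide`: being inside the office network does not rescue a request that fails the identity or device checks.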
Micro-Segmentation
Micro-segmentation divides the network into small, isolated zones. Each zone has its own access controls, limiting how far an attacker can move after compromising a single component.
Network Segmentation Levels
Traditional segmentation:
DMZ | Application Tier | Database Tier
Three zones, broad access within each zone.
Compromise one app server --> access all app servers.
Micro-segmentation:
Each service has its own security boundary.
Order Service can talk to Payment Service (port 443 only).
Order Service CANNOT talk to User Database directly.
Payment Service can talk to Payment Database (port 5432 only).
Payment Service CANNOT talk to Order Database.
Even within the same Kubernetes namespace, pods can be isolated
from each other by network policies.
Implementation Approaches
Network policy (Kubernetes):
Define which pods can communicate with which other pods.
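Default deny: no traffic unless explicitly allowed. Kubernetes allows all pod-to-pod traffic until a deny-all policy is applied, and enforcement requires a network plugin that supports NetworkPolicy.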
Example policy:
Allow: order-service --> payment-service (port 443)
Allow: payment-service --> payment-db (port 5432)
Deny: all other traffic
A compromised order-service pod cannot reach the payment database
because no policy allows that traffic.
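The default-deny logic of the example policy above can be modeled as a simple allow-list lookup. This is a sketch of the decision only, not of how Kubernetes or a network plugin enforces it. `ALLOWED_FLOWS` and `is_flow_allowed` are hypothetical names.

```python
# Allow-list mirroring the example policy: (source, destination, port).
ALLOWED_FLOWS = frozenset({
    ("order-service", "payment-service", 443),
    ("payment-service", "payment-db", 5432),
})


def is_flow_allowed(source: str, destination: str, port: int) -> bool:
    # Default deny: anything not explicitly listed is rejected.
    return (source, destination, port) in ALLOWED_FLOWS
```

There is no deny rule for order-service to payment-db. The flow is rejected simply because nothing allows it.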
Software-defined networking:
VMware NSX, Cisco ACI, cloud VPCs
Policy defined at the workload level, not the network level
Follows the workload as it moves between hosts
Host-based firewalls:
iptables/nftables rules on each host
Fine-grained but harder to manage at scale
Micro-Segmentation Challenges
Challenges:
- Mapping all legitimate communication paths (service dependency mapping)
- Maintaining policies as services evolve (new services, new dependencies)
- Debugging connectivity issues (blocked traffic looks like service failures)
- Performance overhead of policy evaluation on every packet
Strategies:
- Start with monitoring mode (log but don't block) to discover traffic patterns
- Use service mesh telemetry to derive candidate policies from observed traffic
- Implement gradual rollout: enforce for one service at a time
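The monitoring-mode strategy can be sketched as two small helpers: one collapses observed flows into candidate allow rules, and one reports which flows a draft policy would block without actually blocking them. The `Flow` record and both function names are assumptions for illustration. Real flow logs would come from the network plugin, cloud provider, or mesh telemetry.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Set


class Flow(NamedTuple):
    source: str
    destination: str
    port: int


def candidate_rules(observed: Iterable[Flow], min_count: int = 1) -> List[Flow]:
    """Collapse raw flow records into a sorted list of distinct allow rules."""
    counts = Counter(observed)
    return sorted(flow for flow, n in counts.items() if n >= min_count)


def would_block(observed: Iterable[Flow], policy: Set[Flow]) -> List[Flow]:
    """Monitoring mode: report flows the policy would deny, without blocking."""
    return sorted(set(observed) - policy)


order_to_payment = Flow("order-service", "payment-service", 443)
payment_to_db = Flow("payment-service", "payment-db", 5432)
order_to_user_db = Flow("order-service", "user-db", 5432)

observed = [order_to_payment] * 3 + [payment_to_db, order_to_user_db]
draft_policy = {order_to_payment, payment_to_db}
```

Each flow reported by `would_block` is either a missing rule or traffic that should not exist, and it is cheaper to find that out from a log than from an outage.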
Netflix uses micro-segmentation across their AWS infrastructure. Each microservice runs in its own security group with explicit ingress and egress rules, limiting the blast radius of any single compromise.
Service Mesh
A service mesh provides infrastructure-level security, observability, and traffic management for service-to-service communication. It implements zero trust networking without requiring application code changes.
Service Mesh Architecture
Service mesh components:
Data plane (sidecar proxy per service):
A proxy (Envoy in Istio, linkerd2-proxy in Linkerd) runs alongside each service instance
Intercepts all inbound and outbound network traffic
Handles mTLS, authorization, retries, observability
Control plane (centralized management):
Istiod (Istio) or Linkerd control plane
Distributes certificates, policies, and configuration
Manages proxy configuration across all sidecars
Traffic flow:
Service A --> [Envoy sidecar A] --mTLS--> [Envoy sidecar B] --> Service B
Application code makes a plain HTTP call to Service B.
The sidecar handles encryption, authentication, and authorization transparently.
Service Mesh Security Features
Security capabilities:
Automatic mTLS:
Every service-to-service call is encrypted
Both sides present certificates (mutual authentication)
Certificates are automatically rotated
No application code changes required
Authorization policies:
Define which services can call which endpoints
Based on service identity (not IP address)
Can include request-level conditions (HTTP method, path, headers)
Certificate management:
Automatic issuance and rotation (24-hour lifetime typical)
Short-lived certificates limit exposure if compromised
Identity tied to workload, not to network location
Istio authorization policy example:
Rule: only order-service can call payment-service
Service: payment-service
Allow:
Source: order-service (verified by mTLS identity)
Methods: POST
Paths: /api/payments/*
Deny:
All other sources
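The rule above can be read as a predicate over three request attributes. The sketch below models that predicate in Python. It is not Istio's API or configuration format, and `AllowRule`, `PAYMENT_SERVICE_RULES`, and `authorize` are hypothetical names. The path pattern `/api/payments/*` is treated here as a simple prefix match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    source: str          # caller identity taken from the verified mTLS certificate
    methods: frozenset   # permitted HTTP methods
    path_prefix: str     # "/api/payments/" stands in for "/api/payments/*"


PAYMENT_SERVICE_RULES = (
    AllowRule("order-service", frozenset({"POST"}), "/api/payments/"),
)


def authorize(source: str, method: str, path: str,
              rules=PAYMENT_SERVICE_RULES) -> bool:
    # Allowed only if some rule matches all three attributes.
    # With no matching rule, the implicit decision is deny.
    return any(
        source == r.source
        and method in r.methods
        and path.startswith(r.path_prefix)
        for r in rules
    )
```

The source is compared against a service identity, not an IP address, so the decision stays the same when a workload is rescheduled onto another host.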
Airbnb adopted Istio service mesh to enforce zero trust between their microservices. Every service call is mutually authenticated, and authorization policies define exactly which services can communicate.
Mutual TLS (mTLS)
Standard TLS authenticates only the server: the client verifies the server's certificate. Mutual TLS requires both sides to present and verify certificates, ensuring both client and server identities are verified.
TLS vs mTLS
Standard TLS:
Client verifies server certificate: "Am I talking to the real server?"
Server does NOT verify the client at the TLS layer: "I'll accept any client"
Used for: browser-to-server HTTPS
Mutual TLS:
Client verifies server certificate: "Am I talking to the real server?"
Server verifies client certificate: "Is this client who it claims to be?"
Used for: service-to-service communication in zero trust
mTLS Handshake
mTLS handshake (additional steps beyond regular TLS):
1. Client --> Server: ClientHello (same as TLS)
2. Server --> Client: ServerHello + server certificate (same as TLS)
3. Server --> Client: CertificateRequest (NEW: server asks for client cert)
4. Client --> Server: client certificate + CertificateVerify (NEW: client proves it holds the matching private key)
5. Both verify each other's certificates against a trusted CA
6. Encrypted communication established
Server now knows: this request comes from "order-service" (not just "some client")
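In application code, the extra handshake steps are usually switched on through TLS library configuration rather than written by hand. The following minimal sketch uses Python's standard `ssl` module. The file-path parameters are hypothetical and optional, so that the contexts can be constructed without real certificates. A working deployment would supply a CA bundle, a certificate, and a private key on both sides.

```python
import ssl
from typing import Optional


def server_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # CERT_REQUIRED makes the server request a client certificate and fail
    # the handshake if none is presented or it does not validate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signs client certs
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def client_context(ca_file: Optional[str] = None,
                   cert_file: Optional[str] = None,
                   key_file: Optional[str] = None) -> ssl.SSLContext:
    # PROTOCOL_TLS_CLIENT verifies the server certificate and hostname by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signs server certs
    if cert_file:
        # Certificate the client presents when the server asks for one.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

The only difference from a standard TLS server is the `verify_mode` line. Everything else in the handshake is handled by the library.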
mTLS in Practice
mTLS implementation approaches:
Service mesh (recommended):
Sidecar proxies handle mTLS automatically
Applications see plain HTTP traffic
Certificate rotation is transparent
Zero application code changes
Application-level:
Application configures TLS with client certificate
Must handle certificate loading, rotation, and error handling
More control but more development work
Infrastructure-level:
Load balancers or API gateways terminate mTLS
Backend services receive verified identity as a header (and must accept that header only from the gateway)
Simpler for backends but trust shifts to the infrastructure
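For the application-level approach, the server has to read the caller's identity out of the verified certificate once the handshake completes. The sketch below assumes a dictionary shaped like the return value of `ssl.SSLSocket.getpeercert()`. The `peer_identity` helper and the sample identities are hypothetical, and preferring a URI entry over a DNS entry is a choice made for this example.

```python
from typing import Optional


def peer_identity(cert: dict) -> Optional[str]:
    """Return the workload identity from a getpeercert()-style dict.

    Prefers a URI subject alternative name, then falls back to a DNS name.
    """
    sans = cert.get("subjectAltName", ())
    for wanted in ("URI", "DNS"):
        for kind, value in sans:
            if kind == wanted:
                return value
    return None


# Sample data in the shape getpeercert() returns (values are made up).
sample_cert = {
    "subjectAltName": (
        ("DNS", "order-service.default.svc"),
        ("URI", "spiffe://cluster.local/ns/default/sa/order-service"),
    ),
}
```

The returned identity is what an authorization check should compare against, since it comes from a certificate the TLS layer has already validated.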
Certificate Lifecycle
Short-lived certificate pattern:
1. Service starts, requests certificate from CA (e.g., istiod in Istio, formerly Citadel)
2. CA issues certificate valid for 24 hours
3. Service uses certificate for mTLS
4. Before expiration, service requests new certificate
5. Old certificate expires, limiting exposure window
Traditional: certificates valid for 1-2 years
If compromised, attacker has months of access
Zero trust: certificates valid for hours
If compromised, attacker has hours at most
Automatic rotation eliminates manual renewal failures
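Step 4 of the pattern above ("before expiration, service requests new certificate") needs a rule for when to renew. A common approach is to renew after a fixed fraction of the lifetime has elapsed. The sketch below assumes a fraction of 0.5, which is an illustrative value and not a documented default of any particular mesh.

```python
from datetime import datetime, timedelta, timezone


def should_renew(not_before: datetime, not_after: datetime, now: datetime,
                 renew_fraction: float = 0.5) -> bool:
    """Renew once the given fraction of the certificate lifetime has elapsed."""
    lifetime = not_after - not_before
    return now >= not_before + lifetime * renew_fraction


issued = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=24)  # 24-hour certificate, as in step 2
```

Renewing well before expiry leaves time to retry if the CA is briefly unavailable, which matters more as lifetimes get shorter.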
Stripe uses mTLS between all internal services. Every API call between their payment processing components requires mutual certificate verification, ensuring that even if an attacker gains network access, they cannot impersonate a service.
Zero Trust Implementation Strategy
Phased rollout:
Phase 1: Visibility (months 1-3)
Deploy service mesh in permissive mode
Map all service-to-service communication
Identify all dependencies and traffic patterns
No enforcement yet
Phase 2: Authentication (months 3-6)
Enable mTLS in strict mode
All service communication is encrypted and authenticated
Services without certificates cannot communicate
Phase 3: Authorization (months 6-12)
Define and enforce authorization policies
Start with coarse policies (service-level allow/deny)
Gradually add fine-grained rules (method, path, headers)
Phase 4: Continuous improvement
Add device health checks and user context
Implement just-in-time access for privileged operations
Automate policy updates based on observed traffic patterns
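The difference between Phase 1 and Phase 2 comes down to how a connection without a valid client certificate is treated. The sketch below models that switch. The `MtlsMode` enum and `accept_connection` function are hypothetical names, loosely modeled on the permissive and strict modes mentioned above.

```python
from enum import Enum


class MtlsMode(Enum):
    PERMISSIVE = "permissive"  # Phase 1: accept plaintext and mTLS, observe only
    STRICT = "strict"          # Phase 2: accept mTLS only


def accept_connection(mode: MtlsMode, has_valid_client_cert: bool) -> bool:
    if has_valid_client_cert:
        return True
    # Without a valid certificate, only permissive mode lets the call through.
    return mode is MtlsMode.PERMISSIVE
```

Moving a service from `PERMISSIVE` to `STRICT` is safe only once the visibility phase shows that every one of its callers already presents a certificate.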
Common Pitfalls
- Treating zero trust as a product. Zero trust is an architecture and a mindset, not a single tool you can buy. It requires changes across identity, network, application, and data layers.
- Skipping the visibility phase. Enforcing strict policies without understanding existing traffic patterns causes outages. Always start with observe-only mode.
- mTLS without authorization. Encrypting and authenticating traffic without authorization policies means any authenticated service can call any other service. Authentication without authorization is incomplete.
- Ignoring legacy systems. Older systems that cannot participate in mTLS or identity-aware access create holes in zero trust coverage. Plan migration paths or use compensating controls (network segmentation, API gateways).
- Certificate management failures. Expired certificates cause outages. Automated rotation with monitoring for expiration is essential. Manual certificate management does not scale.
- Policy sprawl. Thousands of fine-grained rules become impossible to audit and maintain. Start coarse and refine only where needed.
Key Takeaways
- Zero trust eliminates implicit trust based on network location. Every request must be authenticated, authorized, and encrypted regardless of origin.
- Micro-segmentation limits blast radius by isolating services into small security zones with explicit communication policies.
- Service meshes implement zero trust at the infrastructure level, providing automatic mTLS, authorization, and certificate management without application code changes.
- mTLS ensures both sides of a service-to-service connection verify each other's identity, replacing the traditional model of trusting anything inside the network.
- Implement zero trust incrementally: visibility first, authentication second, authorization third. Rushing to enforcement without understanding traffic patterns causes outages.