Profiling with pprof

Go has profiling built into the standard library. No third-party tools, no agents, no complex setup. The runtime/pprof and net/http/pprof packages produce profiles that go tool pprof analyzes. The workflow is simple: benchmark, profile, find the hot path, optimize, benchmark again.

Two Ways to Profile

net/http/pprof: For Running Servers

Add a blank import and the profiling endpoints appear:

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers its handlers on http.DefaultServeMux
)

func main() {
    // If your server already uses http.DefaultServeMux, the endpoints are live.
    // Otherwise, start a dedicated debug server:
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // Your application...
}

Collect a 30-second CPU profile:

go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

Available endpoints:

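Endpoint                     Description
------------------------------------------------------------------------------------
/debug/pprof/profile         CPU profile (30 seconds unless ?seconds=N is given)
/debug/pprof/heap            Sampled heap allocations; defaults to live objects
/debug/pprof/goroutine       Stack traces of all current goroutines
/debug/pprof/block           Blocking on synchronization primitives
/debug/pprof/mutex           Mutex contention
/debug/pprof/allocs          Sampled heap allocations; defaults to all past allocations
/debug/pprof/trace           Execution trace (open with go tool trace, not pprof)

The block and mutex profiles stay empty until the program enables them with runtime.SetBlockProfileRate and runtime.SetMutexProfileFraction.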
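
Turning on the block and mutex profiles takes two runtime calls early in main. A minimal sketch, assuming a block rate of 1 (record every blocking event, which is costly) and a mutex fraction of 5; both values and the helper names enableContentionProfiles and currentMutexFraction are illustrative choices, not recommendations:

```go
package main

import (
	"fmt"
	"runtime"
)

// enableContentionProfiles turns on the block and mutex profiles, which are
// off by default. The function name is hypothetical, not part of any API.
func enableContentionProfiles(blockRate, mutexFraction int) {
	// Aim to sample one blocking event per blockRate nanoseconds spent
	// blocked; 1 records every event, 0 or less disables the profile.
	runtime.SetBlockProfileRate(blockRate)
	// Report on average 1/mutexFraction of mutex contention events;
	// 0 disables the profile.
	runtime.SetMutexProfileFraction(mutexFraction)
}

// currentMutexFraction reads the active setting without changing it
// (a negative argument only reports the current value).
func currentMutexFraction() int {
	return runtime.SetMutexProfileFraction(-1)
}

func main() {
	enableContentionProfiles(1, 5)
	fmt.Println("mutex profile fraction:", currentMutexFraction())
}
```

Higher rates and fractions sample less and cost less, which is usually the better trade in production.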

Always bind the debug server to localhost. Never expose pprof endpoints to the internet.

runtime/pprof: For Benchmarks & CLI Tools

For programs that are not long-running servers:

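import (
    "log"
    "os"
    "runtime"
    "runtime/pprof"
)

func main() {
    cpuFile, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer cpuFile.Close()
    if err := pprof.StartCPUProfile(cpuFile); err != nil {
        log.Fatal(err)
    }
    defer pprof.StopCPUProfile()
    // Your application logic...
    memFile, err := os.Create("mem.prof")
    if err != nil {
        log.Panic(err) // unlike log.Fatal, still runs the deferred StopCPUProfile
    }
    defer memFile.Close()
    runtime.GC() // refresh heap statistics before the snapshot
    if err := pprof.WriteHeapProfile(memFile); err != nil {
        log.Panic(err)
    }
}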

Analyze the profile:

go tool pprof cpu.prof

Profiling from Benchmarks

The easiest way to profile is through go test -bench. The profile flags only work when a single package is tested, so name one package rather than ./...:

go test -bench=BenchmarkProcess -cpuprofile=cpu.prof -memprofile=mem.prof .
go tool pprof cpu.prof

This is the recommended approach: write a benchmark, profile it, optimize, benchmark again.

go tool pprof: Interactive Analysis

When you open a profile, pprof drops you into an interactive shell:

go tool pprof cpu.prof
(pprof) top 10

top: Find Hot Functions

(pprof) top 10
Showing nodes accounting for 3.8s, 76% of 5s total
      flat  flat%   sum%        cum   cum%
     2.1s 42.00% 42.00%      2.1s 42.00%  runtime.memmove
     0.8s 16.00% 58.00%      0.8s 16.00%  encoding/json.(*decodeState).scanWhile
     0.5s 10.00% 68.00%      1.3s 26.00%  myapp/internal/parser.Parse
     0.4s  8.00% 76.00%      0.4s  8.00%  runtime.mallocgc

flat: Time spent in this function only (not its callees)
cum: Cumulative time including all functions it calls
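
The flat/cum distinction is easier to see in code. In this sketch (the function names checksum and checksumAll are made up for illustration), checksum does its work in its own loop, so its time would appear as flat time, while checksumAll only delegates and would show high cum but almost no flat:

```go
package main

import "fmt"

// checksum does all of its work in its own loop. In a CPU profile its cost
// would show up as flat time (and the same amount as cum time).
//
//go:noinline
func checksum(data []byte) uint32 {
	var sum uint32
	for _, b := range data {
		sum = sum*31 + uint32(b)
	}
	return sum
}

// checksumAll does almost nothing itself; it only calls checksum. In a CPU
// profile it would have low flat time but high cum time, because cum
// includes everything spent in its callees.
func checksumAll(chunks [][]byte) uint32 {
	var total uint32
	for _, c := range chunks {
		total += checksum(c)
	}
	return total
}

func main() {
	chunks := [][]byte{[]byte("a"), []byte("b")}
	fmt.Println(checksumAll(chunks))
}
```

The go:noinline directive is there only to keep the two functions visibly separate in this illustration; it is not something to add to real code by default.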

list: See Source Code

(pprof) list Parse
     0.5s      1.3s (flat, cum) 26.00% of Total
         .          .     42:func Parse(data []byte) (*Result, error) {
     0.1s      0.1s     43:    lines := bytes.Split(data, []byte("\n"))
         .          .     44:    results := make([]Item, 0)
     0.4s      0.4s     45:    for _, line := range lines {
         .      0.8s     46:        item, err := parseLine(line)

web: Visual Call Graph

(pprof) web

This opens a visual call graph in your browser (requires Graphviz).

Flame Graphs

Flame graphs are the most intuitive way to read profiles. Use pprof's built-in web UI:

go tool pprof -http=:8081 cpu.prof

This opens a browser with interactive flame graphs, call graphs, and source views. The flame graph shows:

Width = time spent (wider = slower)
Depth = call stack depth
Color = package (same package, same color)

Look for wide bars. Those are your hot paths.

CPU Profiling

CPU profiling answers: "Where is my program spending CPU time?"

go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

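The profiler samples the call stack 100 times per second by default. Each sample records the full stack of whatever was running on the CPU at that instant, so a function that appears in more samples used more CPU time. Time spent blocked or sleeping is not sampled.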
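
At the default rate each sample stands for about 10ms of CPU time, so the times pprof prints are sample counts in disguise. A small worked conversion, assuming the default 100 Hz rate (the helper estimatedCPUTime is hypothetical):

```go
package main

import (
	"fmt"
	"time"
)

// defaultSampleHz is the CPU profiler's default sampling rate.
const defaultSampleHz = 100

// estimatedCPUTime converts a sample count into approximate CPU time.
func estimatedCPUTime(samples int) time.Duration {
	return time.Duration(samples) * time.Second / defaultSampleHz
}

func main() {
	// 210 samples at 100 Hz correspond to the 2.1s reported for
	// runtime.memmove in the top output shown earlier.
	fmt.Println(estimatedCPUTime(210))
}
```

The practical consequence is that a function with only a handful of samples is within the noise; collect longer profiles before drawing conclusions about small numbers.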

What to Look For

Functions with high flat time: they are doing the work
Functions with high cum but low flat: they call expensive functions
runtime.mallocgc high in the profile: too many allocations
runtime.memmove high: copying too much data
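
A common source of both runtime.mallocgc and runtime.memmove is a slice that grows by repeated append. The sketch below compares that pattern with preallocating the capacity; the function names are illustrative, and testing.AllocsPerRun is borrowed here only to count allocations outside a test:

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results reachable so the compiler cannot optimize the
// allocations away.
var sink []int

// collectGrowing appends into a nil slice, so the runtime reallocates and
// copies the backing array each time capacity runs out.
func collectGrowing(n int) []int {
	var out []int
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	return out
}

// collectPrealloc sizes the slice once up front, so append never has to
// grow or copy it.
func collectPrealloc(n int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	return out
}

// allocsPerCall reports the average number of heap allocations per call.
func allocsPerCall(f func(int) []int, n int) float64 {
	return testing.AllocsPerRun(100, func() { sink = f(n) })
}

func main() {
	fmt.Println("growing: ", allocsPerCall(collectGrowing, 10000))
	fmt.Println("prealloc:", allocsPerCall(collectPrealloc, 10000))
}
```

Whether this matters in a given program is exactly what the profile should decide; preallocation only helps when the size is known or cheaply estimated.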

Memory Profiling

Memory profiling answers: "Where is my program allocating memory?"

go tool pprof http://localhost:6060/debug/pprof/heap

alloc vs inuse

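Inside the interactive shell, switch between the two views with sample_index:

(pprof) sample_index=alloc_space
(pprof) sample_index=inuse_space

alloc_space: Total bytes allocated, even if since freed. Shows where allocations happen. High allocations cause GC pressure even if memory is freed quickly.
inuse_space: Bytes currently in use (the default for the heap endpoint). Shows what is holding memory right now. Use this to find memory leaks.

The same choice can be made on the command line with go tool pprof -sample_index=alloc_space.

(pprof) sample_index=alloc_space
(pprof) top
     flat  flat%
   120MB 45.00%  myapp/internal/parser.Parse
    80MB 30.00%  encoding/json.(*Decoder).Decode
    40MB 15.00%  bytes.Split

This tells you Parse itself has allocated about 120MB since the program started; heap profiles are cumulative and sampled, not tied to a collection window. Even if all of that memory has been freed, allocating it created GC pressure.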
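
The same distinction exists in the runtime's own counters, which makes for a rough analogy: runtime.MemStats.TotalAlloc only ever grows (like alloc_space), while HeapAlloc tracks what is still on the heap (like inuse_space). A minimal sketch, assuming a hypothetical churn helper that allocates buffers and immediately drops them:

```go
package main

import (
	"fmt"
	"runtime"
)

// sink forces each buffer onto the heap.
var sink []byte

// churn allocates count buffers of size bytes, dropping each one as soon as
// the next is created. It returns how much the cumulative allocation counter
// grew and how many bytes remain on the heap afterwards.
func churn(count, size int) (allocated, live uint64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < count; i++ {
		sink = make([]byte, size)
	}
	sink = nil
	runtime.GC() // collect the garbage so HeapAlloc mostly reflects live data
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, after.HeapAlloc
}

func main() {
	allocated, live := churn(64, 1<<20)
	fmt.Printf("allocated %d MiB in total, %d KiB still live\n", allocated>>20, live>>10)
}
```

A program like this has no leak, yet it would rank high under alloc_space. That is the GC-pressure case; a leak would instead show up as inuse_space that keeps growing.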

Goroutine Profiling

go tool pprof http://localhost:6060/debug/pprof/goroutine

Shows the current stack of every goroutine, which tells you where each one is running or blocked. Useful for finding:

Goroutine leaks (goroutine count growing over time)
Deadlocks (goroutines waiting for each other)
Resource contention (many goroutines waiting on the same lock)
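
To see what a leak looks like, the sketch below leaks goroutines on purpose and prints the text form of the goroutine profile. The helpers leak, leakedBy, and goroutineDump are hypothetical; pprof.Lookup("goroutine") is the same profile the HTTP endpoint serves:

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// leak starts n goroutines that block forever on a channel nobody writes to.
// This is a deliberate bug for demonstration.
func leak(n int) {
	never := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() { <-never }()
	}
}

// leakedBy reports how much the goroutine count grew after leaking n.
func leakedBy(n int) int {
	before := runtime.NumGoroutine()
	leak(n)
	return runtime.NumGoroutine() - before
}

// goroutineDump returns the text form of the goroutine profile, the same
// data served at /debug/pprof/goroutine?debug=1.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	fmt.Println("goroutines added:", leakedBy(10))

	dump, err := goroutineDump()
	if err != nil {
		fmt.Println("dump failed:", err)
		return
	}
	fmt.Print(dump)
}
```

In the dump, identical stacks are grouped with a count, so a leak tends to stand out as one stack whose count keeps rising between snapshots.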

The Profiling Workflow

The workflow is a loop:

1. Write a benchmark
2. Run the benchmark to get a baseline
3. Profile the benchmark
4. Find the hot path
5. Optimize
6. Benchmark again to verify improvement
7. If not fast enough, go to step 3

func BenchmarkProcess(b *testing.B) {
    data := loadTestData()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        Process(data)
    }
}

# Baseline (saved for the comparison below)
go test -bench=BenchmarkProcess -count=5 . | tee old.txt

# Profile (profile flags accept only one package, so not ./...)
go test -bench=BenchmarkProcess -cpuprofile=cpu.prof .
go tool pprof -http=:8081 cpu.prof

# After optimization, compare
go test -bench=BenchmarkProcess -count=5 . | tee new.txt
benchstat old.txt new.txt

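benchstat reports the difference between the two files and marks changes that are not statistically significant, which is why each run uses -count=5. It is not part of the Go distribution; install it with go install golang.org/x/perf/cmd/benchstat@latest.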

Common Pitfalls

Profiling in development only. Production workloads differ from benchmarks. Enable net/http/pprof in production (on a separate, internal-only port).
Exposing pprof to the internet. Profiling endpoints leak internal details. Always bind to localhost or put behind authentication.
Optimizing without a benchmark. Without before/after numbers, you do not know if your change helped. Always benchmark first.
Looking only at flat time. A function with low flat but high cum calls an expensive function. Optimizing the callee helps more.
Ignoring allocation profiles. A CPU profile shows allocation and GC cost as runtime functions, not as bytes per call site. If runtime.mallocgc is high in your CPU profile, switch to memory profiling to find the allocation source.
Optimizing the wrong thing. Profile first. Developers are notoriously bad at guessing where the bottleneck is.
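
One way to keep pprof on a separate, internal-only port is to register its handlers on a private mux. This is a sketch under assumptions: the helper names newDebugMux, serveDebug, and statusOf are hypothetical, and the address is up to you. Note that importing net/http/pprof still registers the handlers on http.DefaultServeMux as a side effect, so the public server should use its own mux as well.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux registers the pprof handlers on a private mux so they can be
// served from a listener separate from the public one.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, goroutine, etc.
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// serveDebug blocks while serving the pprof endpoints on addr (for example
// "localhost:6060"); run it in its own goroutine next to the public server.
func serveDebug(addr string) error {
	return http.ListenAndServe(addr, newDebugMux())
}

// statusOf sends an in-process GET to the mux and returns the status code.
func statusOf(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest("GET", path, nil))
	return rec.Code
}

func main() {
	// In-process check, no network listener needed.
	mux := newDebugMux()
	fmt.Println("index status:  ", statusOf(mux, "/debug/pprof/"))
	fmt.Println("unknown status:", statusOf(mux, "/healthz"))
}
```

In a real service you would call go serveDebug("localhost:6060") during startup and restrict access to that port at the network level.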

Key Takeaways

Go has built-in profiling: net/http/pprof for servers, runtime/pprof for CLI tools.
The easiest profiling path: write a benchmark, run it with -cpuprofile or -memprofile, analyze with go tool pprof.
Use top to find hot functions, list to see source code, web for a call graph, and -http for flame graphs.
Memory profiling distinguishes alloc_space (total allocations, GC pressure) from inuse_space (current memory, leaks).
The workflow is a loop: benchmark, profile, optimize, benchmark again.
Always profile before optimizing. Intuition about bottlenecks is usually wrong.