Profiling with pprof

Go has profiling built into the standard library. No third-party tools, no agents, no complex setup. The runtime/pprof and net/http/pprof packages produce profiles that go tool pprof analyzes. The workflow is simple: benchmark, profile, find the hot path, optimize, benchmark again.

Two Ways to Profile

net/http/pprof: For Running Servers

Add a blank import and the profiling endpoints appear:

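package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
    // If you already serve http.DefaultServeMux, the endpoints are live on that server.
    // For a dedicated debug server:
    go func() {
        // A nil handler means http.DefaultServeMux.
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // Your application...
}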

Collect a 30-second CPU profile:

go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

Available endpoints:

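Endpoint                     Description
--------------------------------------------------------------------------------
/debug/pprof/profile         CPU profile (30 seconds unless ?seconds=N is given)
/debug/pprof/heap            Sampled memory allocations of live objects
/debug/pprof/goroutine       Stack traces of all current goroutines
/debug/pprof/block           Blocking on synchronization primitives (disabled by default)
/debug/pprof/mutex           Holders of contended mutexes (disabled by default)
/debug/pprof/allocs          Sampled past memory allocations, including freed objects
/debug/pprof/trace           Execution trace (open with go tool trace, not go tool pprof)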
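
The block and mutex endpoints return empty profiles until sampling is switched on with runtime.SetBlockProfileRate and runtime.SetMutexProfileFraction. The sketch below enables both and then provokes some lock contention so the mutex profile has something to report. The helper names and the rate of 1 (sample everything) are illustrative assumptions; a production service would usually pick a coarser rate.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// enableContentionProfiles turns on block and mutex sampling.
// A rate of 1 records every event; this is an example value only.
func enableContentionProfiles() {
	runtime.SetBlockProfileRate(1)
	runtime.SetMutexProfileFraction(1)
}

// contend makes several goroutines compete for a single mutex.
func contend(workers, rounds int) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := 0; r < rounds; r++ {
				mu.Lock()
				time.Sleep(time.Millisecond) // hold the lock so the others must wait
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
}

// mutexStacks reports how many distinct stacks the mutex profile holds.
func mutexStacks() int {
	return pprof.Lookup("mutex").Count()
}

func main() {
	enableContentionProfiles()
	contend(4, 10)
	fmt.Println("stacks in mutex profile:", mutexStacks())
}
```

With net/http/pprof imported, the same data is then available from /debug/pprof/mutex and /debug/pprof/block.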

Always bind the debug server to localhost. Never expose pprof endpoints to the internet.
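
One way to keep the profiling handlers away from application routes is to register them on a mux of their own and serve only that mux on the loopback address. The following is a minimal sketch under that assumption; newDebugMux and statusFor are hypothetical helper names, and the requests are made in memory with httptest so the example runs without opening a port. Note that importing net/http/pprof still registers its handlers on http.DefaultServeMux as a side effect, so this only helps if the public server does not serve DefaultServeMux.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux returns a mux that serves only the pprof handlers.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, goroutine, allocs, ...
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// statusFor performs an in-memory GET against h and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newDebugMux()
	fmt.Println(statusFor(mux, "/debug/pprof/goroutine")) // 200: served through pprof.Index
	fmt.Println(statusFor(mux, "/healthz"))               // 404: nothing else is registered

	// In a real service, serve the mux on loopback only, for example:
	//   go func() { log.Println(http.ListenAndServe("localhost:6060", mux)) }()
}
```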

runtime/pprof: For Benchmarks & CLI Tools

For programs that are not long-running servers:

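import (
    "log"
    "os"
    "runtime"
    "runtime/pprof"
)

func main() {
    cpuFile, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer cpuFile.Close()
    if err := pprof.StartCPUProfile(cpuFile); err != nil {
        log.Fatal(err)
    }
    defer pprof.StopCPUProfile() // runs before cpuFile.Close, flushing the profile
    // Your application logic...
    memFile, err := os.Create("mem.prof")
    if err != nil {
        log.Panic(err) // Panic rather than Fatal so the deferred calls still run
    }
    defer memFile.Close()
    runtime.GC() // update heap statistics before taking the snapshot
    if err := pprof.WriteHeapProfile(memFile); err != nil {
        log.Panic(err)
    }
}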

Analyze the profile:

go tool pprof cpu.prof
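
pprof.StartCPUProfile accepts any io.Writer, so the same API can wrap a single function and keep the profile in memory, which is handy in tools and tests. The sketch below assumes a hypothetical helper cpuProfileBytes and a stand-in workload called spin; neither comes from the original program.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"runtime/pprof"
)

var sink int // keeps the compiler from discarding the workload

// cpuProfileBytes runs work under the CPU profiler and returns the raw profile.
// Only one CPU profile can be active at a time, so StartCPUProfile returns an
// error if profiling is already running.
func cpuProfileBytes(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the remaining samples into buf
	return buf.Bytes(), nil
}

// spin is a stand-in CPU-bound workload.
func spin() {
	sum := 0
	for i := 0; i < 50_000_000; i++ {
		sum += i % 7
	}
	sink = sum
}

func main() {
	prof, err := cpuProfileBytes(spin)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("collected %d bytes of profile data\n", len(prof))
}
```

Writing the returned bytes to a file gives something go tool pprof can open in the usual way.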

Profiling from Benchmarks

The easiest way to profile is through go test -bench:

# Profile flags accept only one package at a time, so name the package instead of ./...
go test -bench=BenchmarkProcess -cpuprofile=cpu.prof -memprofile=mem.prof .
go tool pprof cpu.prof

This is the recommended approach: write a benchmark, profile it, optimize, benchmark again.

go tool pprof: Interactive Analysis

When you open a profile, pprof drops you into an interactive shell:

go tool pprof cpu.prof
(pprof) top 10

top: Find Hot Functions

(pprof) top 10
Showing nodes accounting for 4.5s, 90% of 5s total
      flat  flat%   sum%        cum   cum%
     2.1s 42.00% 42.00%      2.1s 42.00%  runtime.memmove
     0.8s 16.00% 58.00%      0.8s 16.00%  encoding/json.(*decodeState).scanWhile
     0.5s 10.00% 68.00%      1.3s 26.00%  myapp/internal/parser.Parse
     0.4s  8.00% 76.00%      0.4s  8.00%  runtime.mallocgc

  • flat: Time spent in this function only (not its callees)
  • cum: Cumulative time including all functions it calls
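
The relationship between the two columns can be sketched with a toy call tree: cum is a function's flat time plus the cum of everything it calls. The node type is hypothetical, and the 0.8s cost given to parseLine is an assumption chosen so the result matches Parse's 0.5s flat and 1.3s cum in the output above. Real profiles aggregate sampled stacks rather than walking a tree like this.

```go
package main

import (
	"fmt"
	"time"
)

// node is a simplified, hypothetical call-tree entry.
type node struct {
	name    string
	flat    time.Duration // time spent in the function's own code
	callees []*node
}

// cum returns the flat time plus the cumulative time of every callee.
func (n *node) cum() time.Duration {
	total := n.flat
	for _, c := range n.callees {
		total += c.cum()
	}
	return total
}

func main() {
	parseLine := &node{name: "parseLine", flat: 800 * time.Millisecond}
	parse := &node{
		name:    "Parse",
		flat:    500 * time.Millisecond,
		callees: []*node{parseLine},
	}
	fmt.Printf("%s flat=%v cum=%v\n", parse.name, parse.flat, parse.cum())
	// prints: Parse flat=500ms cum=1.3s
}
```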

list: See Source Code

(pprof) list Parse
     0.5s      1.3s (flat, cum) 26.00% of Total
         .          .     42:func Parse(data []byte) (*Result, error) {
     0.1s      0.1s     43:    lines := bytes.Split(data, []byte("\n"))
         .          .     44:    results := make([]Item, 0)
     0.4s      1.2s     45:    for _, line := range lines {
         .      0.8s     46:        item, err := parseLine(line)

web: Visual Call Graph

(pprof) web

This opens a visual call graph in your browser (requires Graphviz).

Flame Graphs

Flame graphs are the most intuitive way to read profiles. Use pprof's built-in web UI:

go tool pprof -http=:8081 cpu.prof

This opens a browser with interactive flame graphs, call graphs, and source views. The flame graph shows:

  • Width = time spent (wider = slower)
  • Depth = call stack depth
  • Color = package (same package, same color)

Look for wide bars. Those are your hot paths.

CPU Profiling

CPU profiling answers: "Where is my program spending CPU time?"

go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

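By default the profiler samples the call stack 100 times per second of CPU time. Each sample records the full stack that was executing, so a function that appears in more samples used more CPU time. A 30-second profile of one fully busy core therefore holds roughly 3,000 samples.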

What to Look For

  • Functions with high flat time: they are doing the work
  • Functions with high cum but low flat: they call expensive functions
  • runtime.mallocgc high in the profile: too many allocations
  • runtime.memmove high: copying too much data
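
A common source of both runtime.mallocgc and runtime.memmove is a slice that grows by repeated append, because each growth step allocates a larger backing array and copies the old contents. The sketch below compares that pattern with a preallocated slice using testing.AllocsPerRun. The function names are illustrative and do not come from the profiled application.

```go
package main

import (
	"fmt"
	"testing"
)

var sink []int // package-level sink so the slices escape to the heap

// collectNaive grows the slice as it goes, so append reallocates and copies repeatedly.
func collectNaive(n int) []int {
	var out []int
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	return out
}

// collectPrealloc sizes the backing array once up front.
func collectPrealloc(n int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	return out
}

// allocsPerCall reports the average number of heap allocations per call of f(n).
func allocsPerCall(f func(int) []int, n int) float64 {
	return testing.AllocsPerRun(100, func() { sink = f(n) })
}

func main() {
	fmt.Println("naive allocations per call:   ", allocsPerCall(collectNaive, 10000))
	fmt.Println("prealloc allocations per call:", allocsPerCall(collectPrealloc, 10000))
}
```

The exact count for the naive version depends on the runtime's growth policy, but it should be well above the single allocation of the preallocated version.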

Memory Profiling

Memory profiling answers: "Where is my program allocating memory?"

go tool pprof http://localhost:6060/debug/pprof/heap

alloc vs inuse

(pprof) sample_index=alloc_space
(pprof) top
(pprof) sample_index=inuse_space
(pprof) top

  • alloc_space: Total bytes allocated, even if already freed. Shows where allocations happen. High allocations cause GC pressure even if memory is freed quickly.
  • inuse_space: Bytes currently in use (the default for the heap endpoint). Shows what is holding memory right now. Use this to find memory leaks.

The same choice is available on the command line as go tool pprof -sample_index=alloc_space.

(pprof) sample_index=alloc_space
(pprof) top
     flat  flat%
   120MB 45.00%  myapp/internal/parser.Parse
    80MB 30.00%  encoding/json.(*Decoder).Decode
    40MB 15.00%  bytes.Split

This tells you Parse has allocated 120MB since the program started (heap profiles are cumulative). Even if all of it is freed, it creates GC pressure.
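
The difference between allocated and in-use memory can be illustrated with runtime.MemStats, whose TotalAlloc and HeapAlloc fields play roughly the roles of alloc_space and inuse_space. This is only an analogy: MemStats is exact and process-wide, while the heap profile is sampled and broken down by call stack. The helpers churn and measure are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
)

var sink []byte // holds only the most recent buffer

// churn allocates n buffers of size bytes and drops each one immediately.
func churn(n, size int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, size)
	}
}

// measure returns the total bytes allocated while f ran and the change in
// live heap bytes once a garbage collection has completed.
func measure(f func()) (allocated uint64, liveDelta int64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	f()
	runtime.GC()
	runtime.ReadMemStats(&after)
	allocated = after.TotalAlloc - before.TotalAlloc
	liveDelta = int64(after.HeapAlloc) - int64(before.HeapAlloc)
	return allocated, liveDelta
}

func main() {
	allocated, live := measure(func() { churn(100, 1<<20) })
	fmt.Printf("allocated %d MiB in total, live heap changed by %d KiB\n",
		allocated>>20, live>>10)
}
```

The workload allocates about 100 MiB but retains at most one buffer, so it would rank high under alloc_space and barely register under inuse_space.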

Goroutine Profiling

go tool pprof http://localhost:6060/debug/pprof/goroutine

Shows the stack trace of every live goroutine, including where each one is blocked. Useful for finding:

  • Goroutine leaks (goroutine count growing over time)
  • Deadlocks (goroutines waiting for each other)
  • Resource contention (many goroutines waiting on the same lock)
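
The same profile is available in-process through pprof.Lookup("goroutine"). The sketch below deliberately leaks a few goroutines, a hypothetical stand-in for a real leak, and shows the count rising. Writing the profile with debug level 1 prints the stacks grouped by identical trace, which makes a leak stand out as one stack with a large count.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"runtime/pprof"
)

// leak starts n goroutines that block forever on a channel nobody sends to.
func leak(n int) {
	stuck := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() {
			<-stuck // never unblocks: this is the leak
		}()
	}
}

// goroutineCount reads the number of live goroutines from the goroutine profile.
func goroutineCount() int {
	return pprof.Lookup("goroutine").Count()
}

func main() {
	before := goroutineCount()
	leak(5)
	fmt.Printf("goroutines: %d before, %d after\n", before, goroutineCount())

	// Debug level 1 prints a text dump with identical stacks grouped together.
	if err := pprof.Lookup("goroutine").WriteTo(os.Stdout, 1); err != nil {
		log.Fatal(err)
	}
}
```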

The Profiling Workflow

The workflow is a loop:

1. Write a benchmark
2. Run the benchmark to get a baseline
3. Profile the benchmark
4. Find the hot path
5. Optimize
6. Benchmark again to verify improvement
7. If not fast enough, go to step 3

import "testing"

func BenchmarkProcess(b *testing.B) {
    data := loadTestData() // setup, excluded from the timing by ResetTimer
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        Process(data)
    }
}

# Baseline
go test -bench=BenchmarkProcess -count=5 ./... | tee old.txt

# Profile (profile flags accept only one package, so use . rather than ./...)
go test -bench=BenchmarkProcess -cpuprofile=cpu.prof .
go tool pprof -http=:8081 cpu.prof

# After optimization, compare
go test -bench=BenchmarkProcess -count=5 ./... | tee new.txt
benchstat old.txt new.txt

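benchstat reports whether the difference between the two runs is statistically significant. It is not part of the Go distribution; install it with go install golang.org/x/perf/cmd/benchstat@latest.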

Common Pitfalls

  • Profiling in development only. Production workloads differ from benchmarks. Enable net/http/pprof in production (on a separate, internal-only port).
  • Exposing pprof to the internet. Profiling endpoints leak internal details. Always bind to localhost or put behind authentication.
  • Optimizing without a benchmark. Without before/after numbers, you do not know if your change helped. Always benchmark first.
  • Looking only at flat time. A function with low flat but high cum is not slow itself; the time goes to the functions it calls. Optimize the callee, or call it less often.
  • Ignoring allocation profiles. A CPU profile shows that allocation and GC are expensive, but background GC work is not attributed to the code that allocated. If runtime.mallocgc is high in your CPU profile, switch to memory profiling to find the allocation source.
  • Optimizing the wrong thing. Profile first. Developers are notoriously bad at guessing where the bottleneck is.

Key Takeaways

  • Go has built-in profiling: net/http/pprof for servers, runtime/pprof for CLI tools.
  • The easiest profiling path: write a benchmark, run it with -cpuprofile or -memprofile, analyze with go tool pprof.
  • Use top to find hot functions, list to see source code, web for a call graph, and -http for flame graphs.
  • Memory profiling distinguishes alloc_space (total allocations, GC pressure) from inuse_space (current memory, leaks).
  • The workflow is a loop: benchmark, profile, optimize, benchmark again.
  • Always profile before optimizing. Intuition about bottlenecks is usually wrong.