Profiling with pprof
Go has profiling built into the standard library. No third-party tools, no agents, no complex setup. The runtime/pprof and net/http/pprof packages produce profiles that go tool pprof analyzes. The workflow is simple: benchmark, profile, find the hot path, optimize, benchmark again.
Two Ways to Profile
net/http/pprof: For Running Servers
Add a blank import and the profiling endpoints appear:
import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
)

func main() {
    // If your server already uses http.DefaultServeMux, the endpoints are served there.
    // For a dedicated debug server:
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    // Your application...
}
Collect a 30-second CPU profile:
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
Available endpoints:
Endpoint                  Description
------------------------  ------------------------------------------------
/debug/pprof/profile      CPU profile
/debug/pprof/heap         Memory allocations of live objects
/debug/pprof/goroutine    Stack traces of all current goroutines
/debug/pprof/block        Blocking on synchronization primitives
/debug/pprof/mutex        Mutex contention
/debug/pprof/allocs       All past memory allocations
/debug/pprof/trace        Execution trace (analyze with go tool trace)
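The block and mutex endpoints return empty profiles unless sampling is switched on in the runtime. A minimal sketch of enabling both at startup follows; the helper names and the rates 1 and 5 are illustrative assumptions, not recommendations, and higher sampling adds overhead.

```go
package main

import (
	"fmt"
	"runtime"
)

// enableContentionProfiles switches on sampling for the block and mutex
// profiles. Both are off by default. (Hypothetical helper name.)
func enableContentionProfiles() {
	// Rate 1 records every blocking event; larger values sample less often.
	runtime.SetBlockProfileRate(1)
	// Report on average 1 out of every 5 mutex contention events.
	runtime.SetMutexProfileFraction(5)
}

// currentMutexFraction reads the current setting; a negative argument
// returns the value without changing it.
func currentMutexFraction() int {
	return runtime.SetMutexProfileFraction(-1)
}

func main() {
	enableContentionProfiles()
	fmt.Println("mutex profile fraction:", currentMutexFraction())
}
```

Call the enabling function once, before the workload you want to observe, since events that happen earlier are not recorded.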
Always bind the debug server to localhost. Never expose pprof endpoints to the internet.
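One way to keep the profiling routes apart from application routes is to give the debug server its own mux. The sketch below assumes that layout; newDebugMux and statusOf are hypothetical helper names. Note that importing net/http/pprof still registers the handlers on http.DefaultServeMux, so this only helps if the public server does not serve DefaultServeMux.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux returns a mux that serves only the pprof handlers.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Index also serves the named profiles (heap, goroutine, allocs, ...).
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// statusOf sends an in-memory GET request to h and returns the status code.
// It is only here to check the wiring without opening a port.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newDebugMux()
	fmt.Println("index:", statusOf(mux, "/debug/pprof/"))
	fmt.Println("unrelated path:", statusOf(mux, "/metrics"))
	// In a real program, serve it on a loopback address instead:
	//   go func() { log.Println(http.ListenAndServe("localhost:6060", mux)) }()
}
```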
runtime/pprof: For Benchmarks & CLI Tools
For programs that are not long-running servers:
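import (
    "log"
    "os"
    "runtime/pprof"
)
func main() {
    cpuFile, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer cpuFile.Close()
    pprof.StartCPUProfile(cpuFile) // returns an error if profiling is already enabled
    defer pprof.StopCPUProfile()   // flushes the profile; must run before exit
    // Your application logic...
    memFile, err := os.Create("mem.prof")
    if err != nil {
        log.Panic(err) // Panic, unlike Fatal, still runs the deferred StopCPUProfile
    }
    defer memFile.Close()
    pprof.WriteHeapProfile(memFile) // call runtime.GC() first for up-to-date statistics
}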
Analyze the profile:
go tool pprof cpu.prof
Profiling from Benchmarks
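The easiest way to profile is through go test -bench. The profile flags work on one package at a time (go test rejects them when ./... matches several packages), so run the command from the package directory:
go test -bench=BenchmarkProcess -cpuprofile=cpu.prof -memprofile=mem.prof .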
go tool pprof cpu.prof
This is the recommended approach: write a benchmark, profile it, optimize, benchmark again.
go tool pprof: Interactive Analysis
When you open a profile, pprof drops you into an interactive shell:
go tool pprof cpu.prof
(pprof) top 10
top: Find Hot Functions
(pprof) top 10
Showing nodes accounting for 4.5s, 90% of 5s total
      flat  flat%   sum%        cum   cum%
      2.1s 42.00% 42.00%       2.1s 42.00%  runtime.memmove
      0.8s 16.00% 58.00%       0.8s 16.00%  encoding/json.(*decodeState).scanWhile
      0.5s 10.00% 68.00%       1.3s 26.00%  myapp/internal/parser.Parse
      0.4s  8.00% 76.00%       0.4s  8.00%  runtime.mallocgc
- flat: Time spent in this function only (not its callees)
- cum: Cumulative time including all functions it calls
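To make the flat and cum distinction concrete, here is a small illustrative pair of functions (hypothetical names, not taken from the profile above). In a CPU profile of this program, sumSquares would carry most of the flat time, while process would show low flat time but a high cum value because it spends its time inside its callee.

```go
package main

import "fmt"

// sumSquares does the arithmetic itself, so samples taken here count
// toward its flat time.
//
//go:noinline
func sumSquares(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i * i
	}
	return total
}

// process does almost no work of its own. Its flat time stays low, but
// its cum time includes everything spent inside sumSquares.
func process(batches, n int) int {
	total := 0
	for b := 0; b < batches; b++ {
		total += sumSquares(n)
	}
	return total
}

func main() {
	fmt.Println(process(3, 4)) // 3 * (0 + 1 + 4 + 9) = 42
}
```

The go:noinline directive is only there to keep sumSquares as a separate frame in this toy example; real code normally does not need it.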
list: See Source Code
(pprof) list Parse
      0.5s       1.3s (flat, cum) 26.00% of Total
         .          .     42:func Parse(data []byte) (*Result, error) {
      0.1s       0.1s     43:    lines := bytes.Split(data, []byte("\n"))
         .          .     44:    results := make([]Item, 0)
      0.4s       0.4s     45:    for _, line := range lines {
         .       0.8s     46:        item, err := parseLine(line)
web: Visual Call Graph
(pprof) web
This opens a visual call graph in your browser (requires Graphviz).
Flame Graphs
Flame graphs are the most intuitive way to read profiles. Use pprof's built-in web UI:
go tool pprof -http=:8081 cpu.prof
This opens a browser with interactive flame graphs, call graphs, and source views. The flame graph shows:
- Width = time spent (wider = slower)
- Depth = call stack depth
- Color = package (same package, same color)
Look for wide bars. Those are your hot paths.
CPU Profiling
CPU profiling answers: "Where is my program spending CPU time?"
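go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
The profiler samples the call stack 100 times per second by default. Each sample records the stack of the code that was executing, so more samples in a function means more CPU time. At that rate each sample stands for roughly 10 ms of CPU time, and a 30-second profile of one fully busy core contains about 3,000 samples.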
What to Look For
- Functions with high flat time: they are doing the work
- Functions with high cum but low flat: they call expensive functions
- runtime.mallocgc high in the profile: too many allocations
- runtime.memmove high: copying too much data
Memory Profiling
Memory profiling answers: "Where is my program allocating memory?"
go tool pprof http://localhost:6060/debug/pprof/heap
alloc vs inuse
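The heap profile carries several sample types. alloc_space counts total bytes allocated (even if since freed) and inuse_space counts bytes currently in use. Select one with sample_index, then run top:
(pprof) sample_index=alloc_space
(pprof) top
(pprof) sample_index=inuse_space
(pprof) top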
- alloc_space: Shows where allocations happen. High allocations cause GC pressure even if memory is freed quickly.
- inuse_space: Shows what is holding memory right now. Use this to find memory leaks.
(pprof) sample_index=alloc_space
(pprof) top
      flat  flat%
     120MB 45.00%  myapp/internal/parser.Parse
      80MB 30.00%  encoding/json.(*Decoder).Decode
      40MB 15.00%  bytes.Split
This tells you that Parse itself has allocated 120MB since the process started (heap profile counters are cumulative, not limited to a collection window). Even if all of it has been freed, it creates GC pressure.
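One common way to reduce alloc_space is to reserve slice capacity up front instead of letting append grow the backing array. The sketch below is an illustration under that assumption, not the parser from the profile above; collectGrow, collectPrealloc, and allocsPerCall are hypothetical names. It uses testing.AllocsPerRun to count allocations per call.

```go
package main

import (
	"fmt"
	"testing"
)

// collectGrow appends into a slice with no initial capacity, so the
// runtime reallocates and copies the backing array as it grows.
func collectGrow(n int) []int {
	out := make([]int, 0)
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	return out
}

// collectPrealloc reserves the full capacity once.
func collectPrealloc(n int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	return out
}

// sink keeps the results reachable so they are allocated on the heap.
var sink []int

// allocsPerCall reports the average number of heap allocations per call to f.
func allocsPerCall(f func(int) []int, n int) float64 {
	return testing.AllocsPerRun(100, func() { sink = f(n) })
}

func main() {
	fmt.Println("grow:    ", allocsPerCall(collectGrow, 1000))
	fmt.Println("prealloc:", allocsPerCall(collectPrealloc, 1000))
}
```

The growing version also spends time in runtime.memmove when it copies old elements into each new backing array, so the same change can show up in the CPU profile.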
Goroutine Profiling
go tool pprof http://localhost:6060/debug/pprof/goroutine
Shows the current stack of every goroutine, including where each one is blocked. Useful for finding:
- Goroutine leaks (goroutine count growing over time)
- Deadlocks (goroutines waiting for each other)
- Resource contention (many goroutines waiting on the same lock)
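A goroutine leak can also be spotted from inside the process. The following sketch, with hypothetical names leakyWorker, goroutineDump, and leakDemo, compares runtime.NumGoroutine before and after starting goroutines that never return, then reads the same goroutine profile in text form through pprof.Lookup.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"strings"
	"time"
)

// leakyWorker blocks forever on a channel nobody writes to, which is a
// typical shape for a goroutine leak.
func leakyWorker(ch <-chan int) {
	<-ch
}

// goroutineDump returns the goroutine profile as text. Debug level 1
// groups goroutines that share an identical stack.
func goroutineDump() string {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return ""
	}
	return buf.String()
}

// leakDemo starts n leaky goroutines and reports how much the goroutine
// count grew and whether the dump mentions leakyWorker.
func leakDemo(n int) (grew int, found bool) {
	before := runtime.NumGoroutine()
	ch := make(chan int) // never written to, so the workers never exit
	for i := 0; i < n; i++ {
		go leakyWorker(ch)
	}
	time.Sleep(50 * time.Millisecond) // give the goroutines time to start
	grew = runtime.NumGoroutine() - before
	found = strings.Contains(goroutineDump(), "leakyWorker")
	return grew, found
}

func main() {
	grew, found := leakDemo(3)
	fmt.Println("goroutines added:", grew, "leakyWorker in dump:", found)
}
```

In a server, logging runtime.NumGoroutine periodically gives the trend over time, and the dump shows which stacks are accumulating.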
The Profiling Workflow
The workflow is a loop:
1. Write a benchmark
2. Run the benchmark to get a baseline
3. Profile the benchmark
4. Find the hot path
5. Optimize
6. Benchmark again to verify improvement
7. If not fast enough, go to step 3
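import "testing"

// loadTestData and Process are the application's own functions.
func BenchmarkProcess(b *testing.B) {
    data := loadTestData()
    b.ResetTimer() // exclude the setup above from the measurement
    for i := 0; i < b.N; i++ {
        Process(data)
    }
}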
# Baseline
go test -bench=BenchmarkProcess -count=5 . | tee old.txt
# Profile (profile flags accept only one package at a time)
go test -bench=BenchmarkProcess -cpuprofile=cpu.prof .
go tool pprof -http=:8081 cpu.prof
# After optimization, compare
go test -bench=BenchmarkProcess -count=5 . | tee new.txt
benchstat old.txt new.txt
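benchstat is a separate tool (go install golang.org/x/perf/cmd/benchstat@latest). It compares the two sets of runs and reports whether the difference is statistically significant, which is why each benchmark runs several times with -count.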
Common Pitfalls
- Profiling in development only. Production workloads differ from benchmarks. Enable net/http/pprof in production (on a separate, internal-only port).
- Exposing pprof to the internet. Profiling endpoints leak internal details. Always bind to localhost or put them behind authentication.
- Optimizing without a benchmark. Without before/after numbers, you do not know if your change helped. Always benchmark first.
- Looking only at flat time. A function with low flat but high cum calls an expensive function. Optimizing the callee helps more.
- Ignoring allocation profiles. A CPU profile shows that time goes to allocation and garbage collection, but not always which code allocated the memory. If runtime.mallocgc is high in your CPU profile, switch to memory profiling to find the allocation source.
- Optimizing the wrong thing. Profile first. Developers are notoriously bad at guessing where the bottleneck is.
Key Takeaways
- Go has built-in profiling: net/http/pprof for servers, runtime/pprof for CLI tools.
- The easiest profiling path: write a benchmark, run it with -cpuprofile or -memprofile, analyze with go tool pprof.
- Use top to find hot functions, list to see source code, web for a call graph, and -http for flame graphs.
- Memory profiling distinguishes alloc_space (total allocations, GC pressure) from inuse_space (current memory, leaks).
- The workflow is a loop: benchmark, profile, optimize, benchmark again.
- Always profile before optimizing. Intuition about bottlenecks is usually wrong.