Memory & GC
Go's garbage collector is concurrent and optimized for low latency. It typically pauses for under a millisecond, even with gigabytes of heap. Understanding how memory works in Go -- stack vs heap, escape analysis, the GC algorithm -- lets you write code that is both fast and memory-efficient without fighting the runtime.
Go's Garbage Collector
Go uses a concurrent, tri-color mark-and-sweep collector:
- Mark phase: The GC traces all reachable objects starting from roots (goroutine stacks, globals). Objects are colored white (unvisited), grey (visited, children not yet scanned), or black (visited, all children scanned).
- Sweep phase: White objects (unreachable) are freed. Black objects survive.
- Concurrent: Most GC work happens concurrently with your application. Only brief stop-the-world pauses occur at the start and end of marking.
Phase          Mode         Duration
------------------------------------------------------
Mark setup     STW          < 1ms
Marking        Concurrent   proportional to live heap
Mark cleanup   STW          < 1ms
Sweeping       Concurrent   proportional to freed heap
The key property: stop-the-world pauses do not scale with heap size. The two pause phases do a small, bounded amount of bookkeeping, while the heap itself is traced concurrently with your code. A 10GB heap does not mean 10GB of scanning in a pause.
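The tri-color scheme described above can be sketched on a toy object graph. This is a simplified, single-threaded illustration, not the runtime's implementation: the `object` type, the `mark` and `sweep` functions, and the `demo` helper are all hypothetical names, and the real collector also needs write barriers so that marking can run concurrently with the program.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not yet visited
	grey               // visited, children not yet scanned
	black              // visited, all children scanned
)

// object is a toy heap object that points at other objects.
type object struct {
	name string
	refs []*object
	c    color
}

// mark colors everything reachable from roots black, using a grey worklist.
func mark(roots []*object) {
	var worklist []*object
	for _, r := range roots {
		if r.c == white {
			r.c = grey
			worklist = append(worklist, r)
		}
	}
	for len(worklist) > 0 {
		obj := worklist[len(worklist)-1]
		worklist = worklist[:len(worklist)-1]
		for _, child := range obj.refs {
			if child.c == white {
				child.c = grey
				worklist = append(worklist, child)
			}
		}
		obj.c = black // all children have been queued or already visited
	}
}

// sweep keeps black objects (resetting them to white for the next cycle)
// and drops white ones, which were never reached.
func sweep(heap []*object) []*object {
	var live []*object
	for _, o := range heap {
		if o.c == black {
			o.c = white
			live = append(live, o)
		}
	}
	return live
}

// demo builds a small graph, runs one mark/sweep cycle, and returns survivor names.
func demo() []string {
	a := &object{name: "a"}
	b := &object{name: "b"}
	c := &object{name: "c"} // unreachable, even though it points at b
	d := &object{name: "d"} // unreachable
	a.refs = []*object{b}
	b.refs = []*object{a} // a cycle is handled: a is no longer white when revisited
	c.refs = []*object{b}

	heap := []*object{a, b, c, d}
	mark([]*object{a}) // a is the only root
	var names []string
	for _, o := range sweep(heap) {
		names = append(names, o.name)
	}
	return names
}

func main() {
	fmt.Println(demo()) // prints [a b]
}
```

Note that `c` is freed even though it holds a reference to a live object: reachability is traced from the roots, not from incoming or outgoing pointers in general.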
GOGC: Tuning the GC
GOGC controls how often the GC runs. The default is GOGC=100, meaning the GC triggers when the heap grows to 2x the size of the live heap after the last collection.
# GC runs when heap doubles (default)
GOGC=100 ./myapp
# GC runs when heap grows 50% -- more frequent, less memory
GOGC=50 ./myapp
# GC runs when heap triples -- less frequent, more memory
GOGC=200 ./myapp
# Disable GC entirely (for short-lived programs)
GOGC=off ./myapp
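The arithmetic behind these settings can be written out as a small sketch. The `heapGoal` function is a hypothetical helper that applies the "live heap plus GOGC percent" rule stated above; the real pacer also accounts for other inputs (since Go 1.18, goroutine stacks and globals), so treat the numbers as approximations. `debug.SetGCPercent` is the standard-library way to change the same setting at run time.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// heapGoal approximates the heap size that triggers the next collection:
// the live heap plus gogc percent of it. Hypothetical helper for illustration.
func heapGoal(liveHeap uint64, gogc int) uint64 {
	return liveHeap + liveHeap*uint64(gogc)/100
}

func main() {
	const live = 100 << 20 // assume 100 MiB survived the last collection
	for _, gogc := range []int{50, 100, 200} {
		fmt.Printf("GOGC=%d: next GC at about %d MiB\n", gogc, heapGoal(live, gogc)>>20)
	}

	// SetGCPercent changes GOGC at run time and returns the previous value.
	prev := debug.SetGCPercent(200)
	fmt.Println("previous GOGC:", prev)
	debug.SetGCPercent(prev) // restore the original setting
}
```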
GOMEMLIMIT (Go 1.19+)
GOMEMLIMIT sets a soft memory limit. The GC runs more aggressively as memory approaches the limit:
GOMEMLIMIT=1GiB ./myapp
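When the program runs under a fixed memory budget, this is usually a better first knob than tuning GOGC directly. A common starting point is to set GOMEMLIMIT to roughly 80% of your container memory limit: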
# Container has 2GB
GOMEMLIMIT=1600MiB ./myapp
The GC will work harder to stay under the limit without needing you to guess the right GOGC value.
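The same limit can also be set from code with `debug.SetMemoryLimit` (Go 1.19+). The sketch below assumes the container limit is already known as a byte count; `softLimit` is a hypothetical helper that applies the 80% rule of thumb from above, and how you discover the container limit (for example from cgroup files) is left out.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// softLimit returns 80% of the container limit, leaving headroom for
// the OS and non-Go memory. Hypothetical helper for illustration.
func softLimit(containerBytes int64) int64 {
	return containerBytes * 8 / 10
}

func main() {
	const containerLimit = 2 << 30 // assumption: a 2 GiB container

	// SetMemoryLimit installs the new soft limit and returns the previous one.
	prev := debug.SetMemoryLimit(softLimit(containerLimit))
	fmt.Println("previous limit (bytes):", prev)

	// A negative argument leaves the limit unchanged and reports the current value.
	fmt.Println("current limit (MiB):", debug.SetMemoryLimit(-1)>>20)
}
```

Setting the limit through the environment variable is usually simpler to operate; the programmatic form is mainly useful when the value has to be derived at startup.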
Stack vs Heap Allocation
Go has two places to allocate memory:
Stack
- Fast: just a pointer bump
- Automatically freed when the function returns
- No GC involvement
- Limited to data that does not outlive the function
Heap
- Slower: requires GC tracking
- Lives until the GC determines it is unreachable
- Required for data that escapes the function scope
func stackAllocation() int {
x := 42 // allocated on the stack
return x // value is copied to caller
}
func heapAllocation() *int {
x := 42 // allocated on the heap
return &x // pointer escapes the function
}
In the second function, x must live beyond the function return, so the compiler allocates it on the heap.
Escape Analysis
The compiler decides where to allocate through escape analysis. View its decisions with -gcflags '-m':
go build -gcflags '-m' ./...
./main.go:10:2: x escapes to heap
./main.go:15:2: y does not escape
./main.go:20:9: make([]byte, n) escapes to heap
Common reasons values escape:
// Escapes: returned pointer
func newUser() *User {
u := User{Name: "Alice"} // escapes to heap
return &u
}
// Escapes: assigned to interface
func process(v any) { /* ... */ }
func main() {
x := 42
process(x) // x may escape: converting a value to an interface can box it on the heap
}
// Escapes: closure captures variable
func counter() func() int {
n := 0 // escapes: captured by closure
return func() int {
n++
return n
}
}
// Escapes: too large for stack
func bigSlice() {
data := make([]byte, 10_000_000) // escapes: too big for stack
_ = data
}
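Compiler output tells you what escapes; measuring tells you what it costs. One way to check, sketched below, is `testing.AllocsPerRun`, which reports the average number of heap allocations per call. The function names, the `sink` variable, and the `//go:noinline` directives are assumptions added for illustration: without them the compiler may inline the calls and remove the allocation entirely.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the pointer result alive so the allocation cannot be optimized away.
var sink *int

//go:noinline
func byValue() int {
	x := 42
	return x // copied to the caller; x stays on the stack
}

//go:noinline
func byPointer() *int {
	x := 42
	return &x // x outlives the call, so it is moved to the heap
}

// allocsByValue reports average heap allocations per call of byValue.
func allocsByValue() float64 {
	return testing.AllocsPerRun(1000, func() { _ = byValue() })
}

// allocsByPointer reports average heap allocations per call of byPointer.
func allocsByPointer() float64 {
	return testing.AllocsPerRun(1000, func() { sink = byPointer() })
}

func main() {
	fmt.Println("by value, allocs per call:  ", allocsByValue())
	fmt.Println("by pointer, allocs per call:", allocsByPointer())
}
```

In a real project the same measurement usually lives in a benchmark with `b.ReportAllocs()`, which reports allocations alongside timings.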
Reducing Allocations
sync.Pool: Reuse Temporary Objects
sync.Pool maintains a pool of reusable objects, reducing GC pressure:
var bufferPool = sync.Pool{
New: func() any {
return new(bytes.Buffer)
},
}
func processRequest(data []byte) string {
buf := bufferPool.Get().(*bytes.Buffer)
defer func() {
buf.Reset()
bufferPool.Put(buf)
}()
buf.Write(data)
buf.WriteString(" processed")
return buf.String()
}
Use sync.Pool for objects that are allocated and freed frequently (buffers, encoders, temporary slices). Do not use it for objects with a long lifetime.
Pre-Allocated Slices
// Bad: grows and reallocates multiple times
func collect(n int) []string {
var result []string
for i := 0; i < n; i++ {
result = append(result, fmt.Sprintf("item-%d", i))
}
return result
}
// Good: one allocation for the backing array
func collect(n int) []string {
result := make([]string, 0, n)
for i := 0; i < n; i++ {
result = append(result, fmt.Sprintf("item-%d", i))
}
return result
}
When you know the size (or a reasonable upper bound), pre-allocate with make([]T, 0, capacity).
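To make the difference visible, the sketch below counts how often `append` has to move to a new backing array. `growths` is a hypothetical helper: it watches `cap` before and after each append, and a change means a reallocation happened. The exact count without pre-allocation depends on the runtime's growth policy, so only the pre-allocated result is stated.

```go
package main

import "fmt"

// growths appends n ints to s and counts how often the capacity changes,
// which is when append had to allocate a new backing array.
func growths(s []int, n int) int {
	count := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			count++
		}
	}
	return count
}

func main() {
	const n = 10_000
	fmt.Println("no pre-allocation:", growths(nil, n))
	fmt.Println("pre-allocated:    ", growths(make([]int, 0, n), n)) // prints 0
}
```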
Avoid String Concatenation in Loops
// Bad: O(n) allocations and O(n^2) bytes copied
func join(items []string) string {
result := ""
for _, item := range items {
result += item + ", " // allocates a new string each iteration
}
return result
}
// Good: O(n) bytes copied and only a few buffer growths (also drops the trailing separator)
func join(items []string) string {
var buf strings.Builder
for i, item := range items {
if i > 0 {
buf.WriteString(", ")
}
buf.WriteString(item)
}
return buf.String()
}
strings.Builder minimizes allocations by growing an internal buffer.
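When the final length can be computed up front, `Builder.Grow` reserves the whole buffer in one step, so the writes that follow do not need to reallocate. The `joinGrow` function below is a sketch of that idea with a hypothetical name; for this exact task the standard library's `strings.Join` already does the same thing.

```go
package main

import (
	"fmt"
	"strings"
)

// joinGrow joins items with ", ", sizing the buffer once before writing.
func joinGrow(items []string) string {
	if len(items) == 0 {
		return ""
	}
	size := 2 * (len(items) - 1) // bytes needed for the ", " separators
	for _, item := range items {
		size += len(item)
	}

	var buf strings.Builder
	buf.Grow(size) // reserve the full capacity up front
	for i, item := range items {
		if i > 0 {
			buf.WriteString(", ")
		}
		buf.WriteString(item)
	}
	return buf.String()
}

func main() {
	items := []string{"a", "b", "c"}
	fmt.Println(joinGrow(items))                              // prints a, b, c
	fmt.Println(joinGrow(items) == strings.Join(items, ", ")) // prints true
}
```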
Accept Interfaces, Return Structs
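// Likely allocates: the concrete value is hidden behind an interface,
// so the compiler usually cannot keep it on the stack
func NewReader() io.Reader {
return &myReader{} // escapes to heap
}
// May avoid the allocation: with a concrete return type the compiler can
// inline the constructor and see how the caller uses the value
func NewReader() *myReader {
return &myReader{} // can stay on the stack if inlined and the caller does not let it escape
}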
Avoid Pointers to Small Values
// Counterintuitive: pointer causes heap allocation
type Config struct {
Port *int // forces heap allocation for the int
Verbose *bool // forces heap allocation for the bool
}
// Better for small values: use the value directly
type Config struct {
Port int
Verbose bool
}
Pointers to small types (int, bool, small structs) often cost more than copying the value, because the pointed-to value usually ends up on the heap and every access goes through an extra indirection.
When GC Pauses Matter & When They Don't
They Matter
- Real-time trading systems (sub-millisecond latency requirements)
- Game servers (frame timing sensitive)
- Low-latency network proxies
They Usually Do Not Matter
- Web APIs (network latency dwarfs GC pauses)
- Batch processing (throughput matters, not latency)
- CLI tools (run once and exit)
For most Go applications, the GC is not the bottleneck. Profile before tuning.
Monitoring GC in Production
Enable GC logging:
GODEBUG=gctrace=1 ./myapp
gc 1 @0.012s 2%: 0.015+1.2+0.006 ms clock, 0.12+0.8/1.0/0+0.048 ms cpu, 4->4->2 MB, 4 MB goal
- 0.015+1.2+0.006 ms: STW pause + concurrent mark + STW pause
- 4->4->2 MB: heap before, heap after mark, live heap
- 4 MB goal: target heap size for next GC
In code, use runtime.ReadMemStats:
var m runtime.MemStats
runtime.ReadMemStats(&m)
slog.Info("memory",
"heap_alloc", m.HeapAlloc,
"heap_sys", m.HeapSys,
"num_gc", m.NumGC,
"gc_pause_total", m.PauseTotalNs,
)
Common Pitfalls
- Tuning GOGC before profiling. The default is fine for most applications. Profile first, tune only if GC is actually a bottleneck.
- Using sync.Pool for long-lived objects. Pooled objects can be dropped by the garbage collector at any time, so the pool cannot be relied on to retain anything. It is for temporary, frequently-allocated objects only.
- Assuming pointers are always faster. Pointers to small values cause heap allocations. Passing small structs by value is often faster.
- Pre-optimizing allocations. Write clear code first. Profile. Optimize only the hot paths.
- Setting GOMEMLIMIT without headroom. If your container has 2GB, do not set GOMEMLIMIT=2GiB. Leave 20% headroom for the OS and non-Go memory.
- Ignoring escape analysis output. Run go build -gcflags '-m' on hot paths. One unexpected escape can cause significant allocation overhead.
Key Takeaways
- Go's GC is concurrent with sub-millisecond pauses. It is not the bottleneck for most applications.
- GOMEMLIMIT (Go 1.19+) is the preferred tuning knob. Set it to ~80% of container memory.
- Stack allocation is free. Heap allocation requires GC work. Escape analysis decides which.
- Run go build -gcflags '-m' to see what escapes to the heap.
- Reduce allocations with sync.Pool, pre-allocated slices, strings.Builder, and by returning concrete types.
- Profile before tuning. Most Go applications do not need GC tuning.
- Use GODEBUG=gctrace=1 and runtime.ReadMemStats to monitor GC behavior in production.