The Build Process
When you run gcc main.c -o main, four distinct stages happen in sequence: preprocessing, compilation, assembly, and linking. Understanding each stage explains why certain errors occur, why header files exist, and how large C projects are organized.
The Four Stages
Stage 1: Preprocessing
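The preprocessor handles all # directives. It expands #include by pasting file contents, replaces #define macros with their expansions, and evaluates conditional blocks such as #ifdef/#endif, keeping or discarding the code they enclose. The output is a single, expanded C source file in which the directives have been resolved; the only # lines left are line markers recording where each piece of text came from, plus any #pragma lines, which are passed through.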
gcc -E main.c -o main.i
The .i file can be large. Even a simple file that includes <stdio.h> typically expands to hundreds of lines, and larger sets of headers can reach many thousands, because standard headers include other headers recursively.
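To make the three preprocessor jobs concrete, here is a minimal sketch of a file you could run through gcc -E. The file name, the macro names, and the CLAMP_SIDE switch are all hypothetical, chosen only for illustration.

```c
/* preprocess_demo.c: hypothetical file for experimenting with gcc -E */
#define MAX_SIDE 100                 /* object-like macro */
#define SQUARE(x) ((x) * (x))        /* function-like macro */

#ifdef CLAMP_SIDE
/* Survives preprocessing only when built with -DCLAMP_SIDE */
static int limit(int side) { return side > MAX_SIDE ? MAX_SIDE : side; }
#else
/* Default branch: the clamping version above is discarded entirely */
static int limit(int side) { return side; }
#endif

int square_area(int side) {
    int s = limit(side);
    /* After preprocessing, the next line reads: return ((s) * (s)); */
    return SQUARE(s);
}
```

Comparing the output of gcc -E preprocess_demo.c with and without -DCLAMP_SIDE should show only one of the two limit definitions surviving, and no trace of SQUARE or MAX_SIDE as names.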
Stage 2: Compilation
The compiler translates the preprocessed C source into assembly language. It parses the C code, builds an abstract syntax tree, performs optimizations, and generates assembly instructions for the target architecture.
gcc -S main.c -o main.s
The .s file contains human-readable assembly. This is where syntax errors and type errors are caught. The compiler sees only one translation unit at a time — one .c file with all its included headers.
Stage 3: Assembly
The assembler converts assembly language into machine code, producing an object file. Object files contain binary machine instructions but are not yet executable because they may reference functions and variables defined elsewhere.
gcc -c main.c -o main.o
The .o file is a binary file in ELF format (Linux), Mach-O format (macOS), or COFF/PE format (Windows). It contains machine code, a symbol table listing defined and referenced symbols, and relocation information.
Stage 4: Linking
The linker combines one or more object files and libraries into a final executable. It resolves symbol references: if main.o calls calculate() and math.o defines calculate(), the linker connects them.
gcc main.o math.o -o program
The linker also brings in the C runtime startup code (traditionally crt0; on modern Linux, files such as crt1.o) that sets up the process and calls main(), and it links the standard C library.
Object Files & Symbol Tables
Every .o file has a symbol table listing what it defines and what it needs. You can inspect it with nm.
nm main.o
U calculate
0000000000000000 T main
U printf
T means the symbol is defined in the text (code) section. U means undefined — this symbol is referenced but not defined here. The linker must find every U symbol in some other object file or library.
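As a rough illustration of where these letters come from, the sketch below is a hypothetical source file annotated with the symbol type nm would typically report after gcc -c at -O0. The file and function names are assumptions. The D and lowercase t entries go beyond the two letters discussed above: D marks initialized data, and a lowercase letter marks a symbol local to the file.

```c
/* symbols_demo.c: hypothetical file for experimenting with nm */
#include <stdio.h>

int scale = 2;                      /* initialized global: usually shown as D (data) */

static int add(int a, int b) {      /* internal linkage: usually shown as lowercase t */
    return a + b;
}

int scaled_sum(int a, int b) {      /* external function: shown as T (text) */
    return scale * add(a, b);
}

void print_scaled_sum(int a, int b) {
    /* printf lives in the C library, so this file lists it as U (undefined) */
    printf("%d\n", scaled_sum(a, b));
}
```

With optimization enabled, the compiler may inline add and drop its symbol entirely, so the listing depends on the flags used.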
Static Libraries
A static library is an archive of object files. It is created with the ar tool.
gcc -c utils.c -o utils.o
gcc -c math_helpers.c -o math_helpers.o
ar rcs libmylib.a utils.o math_helpers.o
In ar rcs, r inserts the listed object files (replacing any members with the same name), c creates the archive if it does not exist, and s writes a symbol index for fast lookup. The naming convention is lib<name>.a.
To link against it:
gcc main.o -L. -lmylib -o program
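-L. tells the linker to search the current directory. -lmylib tells it to look for a library named mylib: libmylib.a here, though if a shared libmylib.so sits in the same directory the linker prefers it by default. From a static library, the linker extracts only the object files that contain symbols your program actually references.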
How Static Linking Works
With static libraries, the referenced object code is copied into your executable. The resulting binary is self-contained: it does not need the .a file at runtime. The trade-off is a larger executable and no way to pick up a new version of the library without relinking the program.
Shared Libraries
Shared libraries (.so on Linux, .dylib on macOS, .dll on Windows) are loaded at runtime rather than copied into the executable.
gcc -shared -fPIC utils.c math_helpers.c -o libmylib.so
gcc main.c -L. -lmylib -o program
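The -fPIC flag generates position-independent code, which is required for shared libraries because the library can be loaded at any memory address. At runtime, the dynamic linker (ld.so on Linux) finds and loads the shared library. By default it searches only the standard system directories, so a library left in the build directory is found only if you point the loader at it, for example through the LD_LIBRARY_PATH environment variable or an rpath embedded at link time.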
Static vs Shared Trade-offs
/* This code works the same regardless of library type */
#include "mylib.h"
int main(void) {
int result = calculate(10, 20);
return result;
}
Static libraries produce larger executables but have no runtime dependencies. Shared libraries produce smaller executables and can be updated without relinking the programs that use them, but a compatible version of the library must be present at runtime.
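The snippet above includes mylib.h without showing it. A minimal sketch of what that header and its implementation might contain follows. The file names and the choice to make calculate add its arguments are assumptions for illustration, since the text does not define them.

```c
/* mylib.h (hypothetical): the declaration that main.c sees */
#ifndef MYLIB_H
#define MYLIB_H

int calculate(int a, int b);

#endif /* MYLIB_H */

/* mylib.c (hypothetical): the definition that ends up in libmylib.a or
   libmylib.so. What calculate actually computes is an assumption here. */
int calculate(int a, int b) {
    return a + b;
}
```

Whether mylib.c is archived into a static library or built as a shared one, main.c and mylib.h stay the same; only the build commands differ.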
Symbol Resolution & Undefined References
The most common linker error is "undefined reference."
/usr/bin/ld: main.o: undefined reference to `calculate'
collect2: error: ld returned 1 exit status
This means the linker found a call to calculate but could not find its definition in any object file or library. Common causes:
/* You forgot to compile the file that defines calculate */
/* gcc main.c -o program <-- missing math.c */
/* gcc main.c math.c -o program <-- correct */
/* You declared the function but never defined it */
int calculate(int a, int b); /* Declaration in header */
/* But no .c file contains the definition */
/* You misspelled the function name */
int calulate(int a, int b) { return a + b; } /* Typo: this defines calulate, not calculate */
/* You forgot to link the library */
/* gcc main.o -o program <-- missing -lm */
/* gcc main.o -lm -o program <-- correct */
Multiple Definitions
The opposite error occurs when a symbol is defined in more than one object file.
/usr/bin/ld: math.o: multiple definition of `calculate'; utils.o: first defined here
This happens when you define a function in a header file and include that header in multiple .c files. Each .c file gets its own copy of the function, and the linker sees duplicates. The fix is to declare the function in the header and define it in exactly one .c file, or use static inline in the header.
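The static inline option can be sketched as follows. The header name and the clamp_int function are hypothetical; the point is only the storage class on the definition.

```c
/* clamp.h: hypothetical header that is safe to include from several .c files */
#ifndef CLAMP_H
#define CLAMP_H

/* static gives each including translation unit its own private copy, so no
   external symbol named clamp_int is emitted and the linker never sees
   duplicates. */
static inline int clamp_int(int value, int lo, int hi) {
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

#endif /* CLAMP_H */
```

This suits small helpers. For anything larger, the declare-in-header, define-in-one-.c-file approach avoids duplicating the code in every object file.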
Header Files as Contracts
Header files serve a specific purpose: they declare the interface that a .c file provides. They are contracts between compilation units.
/* calculator.h - the contract */
#ifndef CALCULATOR_H
#define CALCULATOR_H
int add(int a, int b);
int subtract(int a, int b);
#endif
/* calculator.c - the implementation */
#include "calculator.h"
int add(int a, int b) {
return a + b;
}
int subtract(int a, int b) {
return a - b;
}
/* main.c - the consumer */
#include "calculator.h"
int main(void) {
int result = add(3, 4);
return result;
}
The header tells the compiler what functions exist and what types they use. The compiler can check that main.c calls add with the correct arguments without seeing the implementation. The linker later connects the call to the actual code.
A Complete Build Example
Consider a project with three source files.
project/
main.c
parser.c
parser.h
utils.c
utils.h
Building it step by step:
gcc -c main.c -o main.o
gcc -c parser.c -o parser.o
gcc -c utils.c -o utils.o
gcc main.o parser.o utils.o -o program
Each .c file compiles independently. If you modify only parser.c, you only need to recompile parser.o and relink. This is the foundation of incremental builds, which build systems automate.
Compilation Flags That Matter
gcc -Wall -Wextra -Wpedantic -std=c17 -O2 -g main.c -o program
- -Wall enables most common warnings
- -Wextra enables additional warnings
- -Wpedantic warns about code that departs from strict standard compliance
- -std=c17 specifies the C standard version
- -O2 enables optimization (use -O0 for debugging)
- -g includes debug information for GDB
Always compile with warnings enabled. A clean compile with -Wall -Wextra catches many bugs before they become runtime crashes.
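As an illustration of the kind of mistake these flags catch, the sketch below shows a small, correct function with comments marking two slips that GCC would warn about. The function name find_index is hypothetical.

```c
#include <stddef.h>

/* Returns the index of the first element equal to target, or -1 if absent. */
int find_index(const int *values, size_t count, int target) {
    /* Declaring i as int would draw a -Wsign-compare warning (part of -Wextra
       for C), because i < count would compare signed with unsigned. */
    for (size_t i = 0; i < count; i++) {
        int current = values[i];
        /* Writing `if (current = target)` would draw a -Wparentheses warning
           (part of -Wall): an assignment used as a condition. */
        if (current == target) {
            return (int)i;
        }
    }
    return -1;
}
```

Both slips compile without complaint when warnings are off, which is why a warning-free build is worth maintaining.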
Common Pitfalls
- Forgetting to link all object files: If you add a new .c file to the project but do not add it to the build command, you get undefined reference errors. Build systems prevent this.
- Defining functions in headers: A function defined (not just declared) in a header that is included by multiple .c files causes multiple definition errors at link time.
- Wrong library link order: The linker processes files left to right. If main.o depends on -lmath, then -lmath must come after main.o. Put libraries at the end of the link command.
- Mixing C and C++ without extern "C": C++ mangles function names. If you link C and C++ object files, wrap C headers in extern "C" blocks when included from C++.
- Not recompiling after changing a header: If you change parser.h but only recompile parser.c, other files that include parser.h may use stale declarations. Build systems track header dependencies to avoid this.
- Confusing compiler errors with linker errors: Compiler errors reference source lines. Linker errors reference symbol names. If you see "undefined reference," the problem is not in the code syntax but in which files are being linked.
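For the extern "C" pitfall, a common pattern is to guard the block with the __cplusplus macro so the same header works from both languages. The sketch below assumes a hypothetical fields.h and count_fields function; only the guard pattern is the point.

```c
/* fields.h: hypothetical C header that can also be included from C++ */
#ifndef FIELDS_H
#define FIELDS_H

#ifdef __cplusplus
extern "C" {            /* seen only by a C++ compiler: disables name mangling */
#endif

int count_fields(const char *line);

#ifdef __cplusplus
}
#endif

#endif /* FIELDS_H */

/* fields.c: compiled as C, so the symbol is plain count_fields */
int count_fields(const char *line) {
    int fields = 1;     /* an empty line still counts as one (empty) field */
    for (; *line != '\0'; line++) {
        if (*line == ',') {
            fields++;
        }
    }
    return fields;
}
```

A C compiler never defines __cplusplus, so it skips both guarded lines and sees an ordinary prototype.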
Key Takeaways
- The C build process has four stages: preprocessing (expand macros and includes), compilation (C to assembly), assembly (assembly to object code), and linking (combine objects into executable).
- Object files contain machine code and symbol tables. The linker resolves undefined symbols by finding them in other object files or libraries.
- Static libraries (.a) are archives of object files copied into the executable. Shared libraries (.so/.dylib/.dll) are loaded at runtime.
- "Undefined reference" means the linker cannot find a function or variable definition. Check that all source files are compiled and all libraries are linked.
- Header files declare interfaces. They are contracts between compilation units that let the compiler check types without seeing implementations.
- Always compile with -Wall -Wextra to catch bugs at compile time rather than at runtime.