Hello World & Compilation

Learning C starts with understanding the compilation pipeline. Unlike interpreted languages, where you write code and run it immediately, C requires an explicit build step that transforms human-readable source code into machine code. This is not a limitation; it is a large part of the reason C is fast.

Your First C Program

#include <stdio.h>

int main(void) {
    printf("Hello, world!\n");
    return 0;
}

Save this as hello.c and compile it:

$ gcc -o hello hello.c
$ ./hello
Hello, world!

Every piece of this program matters:

  • #include <stdio.h>: a preprocessor directive that includes the standard I/O header, which declares printf
  • int main(void): the entry point; startup code supplied by the C runtime calls this function once the OS has loaded your program
  • printf("Hello, world!\n"): writes to standard output; \n is a newline character
  • return 0: becomes the program's exit status; zero tells the OS the program succeeded, nonzero means failure

The Compilation Pipeline

When you run gcc -o hello hello.c, four distinct phases execute in sequence. Knowing which phase produced an error message clears up much of the confusion beginners run into.

Phase 1: Preprocessor (Source to Expanded Source)

The preprocessor handles lines starting with #. It performs textual substitution before the compiler ever sees your code.

// Before preprocessing
#include <stdio.h>
#define MAX_SIZE 1024

int buffer[MAX_SIZE];

// After preprocessing (conceptually)
// ... thousands of lines from stdio.h pasted here ...

int buffer[1024];

See the preprocessor output yourself:

$ gcc -E hello.c | tail -20

The preprocessor does not understand C. It performs text manipulation: #include pastes file contents, #define does find-and-replace, #ifdef conditionally includes or excludes blocks.

Phase 2: Compiler (Expanded Source to Assembly)

The compiler parses C code, checks types, optimizes, and produces assembly language for your target architecture.

$ gcc -S hello.c
$ cat hello.s

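This produces a .s file containing assembly instructions. On x86-64 Linux, the printf call becomes a call instruction to a library symbol. With GCC that symbol is often puts rather than printf, because the compiler may substitute the simpler function when the format string is a constant ending in a newline. You do not need to read assembly, but knowing it exists demystifies what your code becomes.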

Phase 3: Assembler (Assembly to Object Code)

The assembler converts human-readable assembly into machine code — binary instructions the CPU understands. The output is an object file (.o).

$ gcc -c hello.c
$ file hello.o
hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

Object files contain machine code but are not yet runnable. They have unresolved references — your code calls printf, but the object file does not contain printf's implementation.

Phase 4: Linker (Object Files to Executable)

The linker resolves references between object files and libraries. It finds printf in the C standard library (libc), connects your call to its implementation, and produces the final executable.

$ gcc -o hello hello.o
$ file hello
hello: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked

The linker is why you can split a C program across multiple files: each file compiles independently into an object file, and the linker combines them.

The Full Pipeline

hello.c  -->  [Preprocessor]  -->  hello.i  (expanded source)
hello.i  -->  [Compiler]      -->  hello.s  (assembly)
hello.s  -->  [Assembler]     -->  hello.o  (object code)
hello.o  -->  [Linker]        -->  hello    (executable)

gcc & clang

The two major C compilers are GCC (GNU Compiler Collection) and Clang (part of the LLVM project).

$ gcc -o hello hello.c
$ clang -o hello hello.c

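Both accept the same flags for most purposes. Clang has a reputation for clearer error messages, although GCC's diagnostics have improved considerably. GCC is the default on most Linux distributions. On macOS, Apple's developer tools provide Clang, and the gcc command there is typically a wrapper that runs Clang.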

Check which compiler you have:

$ gcc --version
$ clang --version

Compile Flags That Matter

Never compile without warnings. The compiler knows more about C than you do.

The Essential Flags

$ gcc -Wall -Wextra -Werror -std=c17 -o hello hello.c

  • -Wall: enables the most common warnings (uninitialized variables, unused values, format string mismatches); despite the name, it does not enable every warning
  • -Wextra: enables additional warnings beyond -Wall (unused parameters, sign comparison)
  • -Werror: treats all warnings as errors; your code does not compile until every warning is fixed
  • -std=c17: selects the C17 standard and turns off GNU extensions that conflict with it, which helps keep behavior consistent across compilers

Optimization Flags

$ gcc -O0 hello.c   # No optimization (default). Best for debugging.
$ gcc -O2 hello.c   # Standard optimization. Use for production builds.
$ gcc -O3 hello.c   # Aggressive optimization. Occasionally slower due to code size.
$ gcc -Os hello.c   # Optimize for size. Good for embedded systems.

Debugging Flags

$ gcc -g hello.c              # Include debug symbols for gdb
$ gcc -fsanitize=address hello.c   # Enable AddressSanitizer (catches memory bugs)
$ gcc -fsanitize=undefined hello.c # Enable UBSan (catches undefined behavior)

A good development command:

$ gcc -Wall -Wextra -Werror -std=c17 -g -fsanitize=address,undefined -o hello hello.c

Multi-File Compilation

Real C programs span multiple files. Each .c file is a translation unit that compiles independently.

// math_utils.h
#ifndef MATH_UTILS_H
#define MATH_UTILS_H

int add(int a, int b);
int multiply(int a, int b);

#endif

// math_utils.c
#include "math_utils.h"

int add(int a, int b) {
    return a + b;
}

int multiply(int a, int b) {
    return a * b;
}

// main.c
#include <stdio.h>
#include "math_utils.h"

int main(void) {
    printf("%d\n", add(3, 4));
    printf("%d\n", multiply(3, 4));
    return 0;
}

Compile and link:

$ gcc -Wall -Wextra -std=c17 -c main.c         # produces main.o
$ gcc -Wall -Wextra -std=c17 -c math_utils.c    # produces math_utils.o
$ gcc -o program main.o math_utils.o             # links into executable
$ ./program
7
12

Or in one step:

$ gcc -Wall -Wextra -std=c17 -o program main.c math_utils.c

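The include guard (#ifndef MATH_UTILS_H) prevents the header's contents from being processed twice in the same translation unit. Repeating a function declaration is harmless in C, but repeating a definition, such as a struct or enum, is an error, so guarding every header is the safe habit.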

Why the Compilation Model Makes C Fast

The compilation model is not an inconvenience — it is an engineering decision with profound consequences.

No interpretation overhead. The standard Python implementation compiles your source to bytecode at run time and then interprets that bytecode. Java compiles to bytecode that a virtual machine interprets or JIT-compiles. C compiles directly to machine code before your program ever runs. When the CPU executes your program, it runs native instructions with no interpreter in between.

Aggressive optimization. The compiler has time to analyze each translation unit as a whole (and, with link-time optimization, the entire program) before anything runs. It can inline functions, unroll loops, eliminate dead code, reorder instructions, and vectorize operations. A simple interpreter has far less opportunity to do this, because it must produce results while the program is already running.

Predictable performance. Because there is no garbage collector, no JIT warmup, and no interpreter overhead, C programs have predictable, consistent performance. This is one reason real-time systems (audio processing, robotics, avionics) are so often written in C.

Common Pitfalls

  • Not reading compiler warnings: the compiler is telling you about likely bugs. -Wall -Wextra -Werror forces you to listen. Many C bugs that reach production were flagged by a warning that someone ignored.
  • Confusing compiler errors with linker errors: "undefined reference to foo" is a linker error, not a compiler error. It means the linker cannot find the implementation of foo. You probably forgot to compile or link the file that defines it.
  • Forgetting include guards: without #ifndef guards, including a header that defines types (structs, enums) twice in one translation unit causes redefinition errors.
  • Mixing up #include <...> and #include "...": angle brackets search the system include paths; quotes search the directory of the including file first, then fall back to the system paths. Use quotes for your own headers.
  • Compiling without -std=c17: without a standard specified, the compiler uses its own default, which for GCC and Clang is a GNU dialect with extensions enabled, so code can silently come to depend on features that are not portable.

Key Takeaways

  • The C compilation pipeline has four phases: preprocessor, compiler, assembler, linker. Each solves a specific problem.
  • Always compile with -Wall -Wextra -Werror -std=c17. Warnings are not suggestions — they are bug reports from the compiler.
  • C is fast because compilation produces native machine code with no runtime interpretation or garbage collection overhead.
  • Multi-file programs compile each .c file independently into object files, then link them together. Headers (.h files) declare interfaces between translation units.
  • Understanding the compilation pipeline is foundational. When you know what the preprocessor, compiler, and linker each do, error messages make sense and build systems become logical.