Low-Level I/O
POSIX File Descriptors
At the operating system level, files are represented by small non-negative integers called file descriptors. The POSIX API (open, read, write, close) works directly with these descriptors, bypassing the C library's buffering layer.
Three file descriptors are open by default:
| FD | Name | C equivalent |
|---|---|---|
| 0 | stdin | stdin |
| 1 | stdout | stdout |
| 2 | stderr | stderr |
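As a minimal sketch of using these descriptors directly, the helper below writes a string to whichever descriptor it is given. The name fd_puts is hypothetical (it is not a library function); STDOUT_FILENO and STDERR_FILENO are the symbolic names for 1 and 2 from unistd.h.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a NUL-terminated string to fd.
 * Returns the byte count reported by write, or -1 on error.
 * A single write call may transfer fewer bytes than requested. */
ssize_t fd_puts(int fd, const char *s) {
    return write(fd, s, strlen(s));
}
```

A caller might use fd_puts(STDOUT_FILENO, "result\n") for normal output and fd_puts(STDERR_FILENO, "warning\n") for diagnostics.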
open, read, write, close
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    /* Open a file for writing, create if needed, truncate if exists */
    int fd = open("output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    const char *msg = "Hello from POSIX I/O\n";
    ssize_t written = write(fd, msg, strlen(msg));
    if (written < 0) {
        perror("write");
        close(fd);
        return 1;
    }
    printf("Wrote %zd bytes\n", written);
    close(fd);

    /* Open for reading */
    fd = open("output.txt", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char buf[128];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n < 0) {
        perror("read");
        close(fd);
        return 1;
    }
    buf[n] = '\0';   /* read does not NUL-terminate */
    printf("Read: %s", buf);
    close(fd);
    return 0;
}
Wrote 21 bytes
Read: Hello from POSIX I/O
Key differences from stdio:
- open returns an int (file descriptor), not a FILE*.
- read and write work with raw bytes, not formatted text.
- There is no user-space buffering. Each read or write is a system call.
- read and write return ssize_t: the number of bytes transferred, or -1 on error.
Common open Flags
| Flag | Meaning |
|---|---|
| O_RDONLY | Open for reading only |
| O_WRONLY | Open for writing only |
| O_RDWR | Open for reading and writing |
| O_CREAT | Create file if it does not exist |
| O_TRUNC | Truncate file to zero length |
| O_APPEND | Writes go to end of file |
| O_EXCL | Fail if file already exists (with O_CREAT) |
When using O_CREAT, always pass a mode argument (e.g., 0644). The mode gives the requested permissions for a newly created file; the process umask is then applied to it.
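Combining O_CREAT with O_EXCL is a common way to create a file only if nobody else has created it first. The following is a small sketch of that idea; the helper name create_exclusive is an assumption made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create path only if it does not already exist.
 * With O_CREAT | O_EXCL the existence check and the creation happen as
 * one atomic step inside open.
 * Returns a write-only descriptor, or -1 with errno set
 * (errno == EEXIST when the file is already present). */
int create_exclusive(const char *path) {
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```

A caller would typically check for -1, compare errno against EEXIST to tell "already there" apart from other failures, and close the descriptor when finished.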
lseek: Moving the File Offset
lseek repositions the read/write offset within a file:
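#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Write some data (a short transfer is treated as a failure here) */
    int values[] = {10, 20, 30, 40, 50};
    if (write(fd, values, sizeof(values)) != (ssize_t)sizeof(values)) {
        fprintf(stderr, "write failed or was short\n"); close(fd); return 1;
    }

    /* Seek to the third integer */
    if (lseek(fd, 2 * (off_t)sizeof(int), SEEK_SET) < 0) {
        perror("lseek"); close(fd); return 1;
    }

    /* Overwrite it */
    int new_val = 99;
    if (write(fd, &new_val, sizeof(int)) != (ssize_t)sizeof(int)) {
        fprintf(stderr, "write failed or was short\n"); close(fd); return 1;
    }

    /* Seek back to the beginning and read all values */
    if (lseek(fd, 0, SEEK_SET) < 0) {
        perror("lseek"); close(fd); return 1;
    }
    int result[5];
    if (read(fd, result, sizeof(result)) != (ssize_t)sizeof(result)) {
        fprintf(stderr, "read failed or was short\n"); close(fd); return 1;
    }

    for (int i = 0; i < 5; i++) {
        printf("result[%d] = %d\n", i, result[i]);
    }
    close(fd);
    return 0;
}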
result[0] = 10
result[1] = 20
result[2] = 99
result[3] = 40
result[4] = 50
lseek whence values:
| Constant | Meaning |
|---|---|
| SEEK_SET | Offset from start of file |
| SEEK_CUR | Offset from current position |
| SEEK_END | Offset from end of file |
stdio vs POSIX: When to Use Which
stdio (Buffered)
- fopen, fread, fwrite, fprintf, fgets, fclose
- Buffers reads and writes in user space, reducing system calls
- Provides formatted I/O (fprintf, fscanf)
- Portable across all C implementations
Best for: text processing, line-by-line reading, formatted output, general-purpose file I/O.
POSIX (Unbuffered)
- open, read, write, close, lseek
- Each call is a direct system call (no buffering unless you add it)
- Returns file descriptors, which are needed for mmap, dup2, select, poll
- Available on Unix-like systems (not standard C)
Best for: binary protocols, network sockets, memory-mapped files, file descriptor manipulation, and cases where you need precise control over I/O behavior.
/* Use stdio for text */
FILE *config = fopen("config.ini", "r");
if (config != NULL) {
    char line[256];
    while (fgets(line, sizeof(line), config)) {
        /* process line */
    }
    fclose(config);
}

/* Use POSIX for binary network data */
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
/* ... connect(sockfd, ...) must succeed before reading ... */
char packet[1024];
ssize_t n = read(sockfd, packet, sizeof(packet));
mmap: Memory-Mapped Files
mmap maps a file into memory. The file's contents appear as a byte array that you can read and write directly, without explicit read/write calls.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <string.h>
#include <ctype.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); close(fd); return 1; }

    /* Note: mmap fails for an empty file (length 0) */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); close(fd); return 1; }
    close(fd); /* the mapping keeps the data accessible */

    /* Count words in the file */
    int words = 0;
    int in_word = 0;
    for (off_t i = 0; i < st.st_size; i++) {
        if (isspace((unsigned char)data[i])) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }

    printf("File size: %lld bytes\n", (long long)st.st_size);
    printf("Word count: %d\n", words);
    munmap(data, st.st_size);
    return 0;
}
Advantages of mmap:
- The OS handles paging. Only accessed pages are loaded into RAM.
- Multiple processes can share the same mapping (with MAP_SHARED).
- Random access is natural: just use array indexing.
- For large files, mmap can be faster than repeated read calls.
Disadvantages:
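- Error handling is harder: if the file is truncated after it was mapped, touching a page past the new end raises SIGBUS instead of returning an error code.
- Not suitable for pipes or sockets, and only some device files support it.
- MAP_SHARED writes are not atomic.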
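To illustrate the MAP_SHARED case, here is a sketch that modifies a file in place through a writable shared mapping. The helper name uppercase_file is hypothetical, and the explicit msync call is one reasonable way to request write-back before unmapping, not the only one.

```c
#include <ctype.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: upper-case every byte of a file in place through a
 * shared, writable mapping. Returns 0 on success, -1 on error. */
int uppercase_file(const char *path) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }   /* mmap rejects length 0 */

    size_t len = (size_t)st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (p == MAP_FAILED) return -1;

    for (size_t i = 0; i < len; i++)
        p[i] = (unsigned char)toupper(p[i]);

    /* Ask the kernel to write the modified pages back to the file. */
    int rc = msync(p, len, MS_SYNC);
    if (munmap(p, len) < 0) rc = -1;
    return rc;
}
```

With MAP_PRIVATE instead of MAP_SHARED the same stores would land in private copy-on-write pages and the file on disk would stay unchanged.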
dup2: File Descriptor Redirection
dup2 replaces one file descriptor with a copy of another. This is how shells implement I/O redirection.
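#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    /* Redirect stdout to a file */
    int fd = open("output.log", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    fflush(stdout);                        /* flush anything pending for the old stdout */
    int saved_stdout = dup(STDOUT_FILENO); /* save original stdout */
    if (saved_stdout < 0) { perror("dup"); close(fd); return 1; }
    if (dup2(fd, STDOUT_FILENO) < 0) { perror("dup2"); close(fd); return 1; }
    close(fd);                             /* stdout now writes to the file */

    printf("This goes to the file\n");
    printf("So does this\n");

    /* stdio buffers these lines; flush while descriptor 1 still refers to
       the file, otherwise they would reach the terminal after the restore */
    fflush(stdout);

    /* Restore original stdout */
    dup2(saved_stdout, STDOUT_FILENO);
    close(saved_stdout);

    printf("This goes to the terminal\n");
    return 0;
}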
This goes to the terminal
The first two printf calls write to output.log. After restoring stdout, output returns to the terminal.
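The shell case can be sketched in a few lines: fork a child, point descriptor 1 at a file with dup2, then exec the command. This is a simplified illustration of the idea behind cmd > file, not how any particular shell is implemented; run_with_stdout_to is a hypothetical name.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with its stdout redirected to out_path.
 * Returns the child's exit status, or -1 on failure. */
int run_with_stdout_to(const char *out_path, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {                        /* child */
        int fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) _exit(127);
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) < 0) _exit(127);
            close(fd);                     /* descriptor 1 now refers to the file */
        }
        execvp(argv[0], argv);
        _exit(127);                        /* reached only if exec failed */
    }

    int status;                            /* parent: wait for the child */
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the redirection happens in the child after fork, the parent's own stdout is untouched and nothing needs to be saved or restored.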
Real-World Example: Copying a File with POSIX I/O
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int copy_file(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in < 0) { perror("open src"); return -1; }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { perror("open dst"); close(in); return -1; }

    char buf[8192];
    ssize_t n;
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        ssize_t written = 0;
        while (written < n) {
            ssize_t w = write(out, buf + written, n - written);
            if (w < 0) {
                perror("write");
                close(in);
                close(out);
                return -1;
            }
            written += w;
        }
    }
    if (n < 0) { perror("read"); }

    close(out);
    close(in);
    return (n < 0) ? -1 : 0;
}

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }
    if (copy_file(argv[1], argv[2]) != 0) {
        return 1;   /* report failure to the caller via the exit status */
    }
    printf("Copy complete\n");
    return 0;
}
Note the inner write loop: write may transfer fewer bytes than requested (a short write). The loop ensures all bytes are written.
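The same loop can be factored into a reusable helper. The sketch below also retries when write is interrupted by a signal (errno == EINTR), which the copy example does not handle; write_all is a hypothetical name.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes or fail.
 * Retries after short writes and after EINTR.
 * Returns 0 on success, -1 on error with errno set by write. */
int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR) continue;   /* interrupted: try again */
            return -1;
        }
        p += w;                             /* skip what was written */
        len -= (size_t)w;
    }
    return 0;
}
```

With such a helper, the inner loop of copy_file would reduce to a single call along the lines of write_all(out, buf, (size_t)n).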
Common Pitfalls
- Not handling short reads and writes. read and write may transfer fewer bytes than requested, especially on sockets and pipes. Always loop until all data is transferred or an error occurs.
- Forgetting the mode argument with O_CREAT. open is variadic, so without a mode it reads an indeterminate value for the permission bits and creates files with unpredictable permissions.
- Mixing stdio and POSIX on the same file. A FILE* from fopen has its own buffer. Mixing fprintf and write on the same file produces out-of-order output unless the stream is flushed in between. Use fileno(fp) to get the descriptor when you need it, and stick to one API.
- Not checking return values. Every POSIX I/O call can fail. Ignoring return values leads to silent data loss.
- Using mmap on special files. Pipes, sockets, and some device files cannot be memory-mapped. Always check the return value of mmap.
- Leaking file descriptors. Like memory leaks, failing to close file descriptors eventually exhausts the per-process limit (often 1024 by default on Linux). Use cleanup patterns to ensure descriptors are closed.
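For the first pitfall, the read side can be handled with a loop that keeps reading until the requested count arrives, the file ends, or an error occurs. This is a sketch under the assumption of a blocking descriptor; read_full is a hypothetical name.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read until len bytes have arrived, end of file is
 * reached, or an error occurs. Returns the number of bytes read (less
 * than len only at end of file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len) {
    char *p = buf;
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, p + total, len - total);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted: try again */
            return -1;
        }
        if (n == 0) break;                  /* end of file */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

A return value smaller than len therefore means end of file, while -1 means a real error, so callers can tell the two apart.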
Key Takeaways
- POSIX I/O operates on integer file descriptors with open, read, write, close, and lseek.
- File descriptors 0, 1, 2 are stdin, stdout, and stderr.
- stdio is buffered and portable; POSIX is unbuffered and gives direct control. Use stdio for text, POSIX for binary, sockets, and descriptor manipulation.
- mmap maps files into memory for efficient random access to large files.
- dup2 redirects file descriptors, which is how shell redirection works.
- Always handle short reads/writes, check return values, and close file descriptors when done.