String Manipulation Patterns
C has no string type — just arrays of bytes with a null terminator convention. This means there is no built-in concatenation operator, no string interpolation, no split method, no regex. You build everything from a small set of standard library functions and manual buffer management. These patterns appear in every C codebase that handles text.
Building Strings with snprintf
snprintf is the most important string function in C. It formats data into a buffer with bounds checking and always null-terminates (when the buffer size is at least 1).
#include <stdio.h>

int main(void) {
    char buffer[256];

    // Simple formatting
    snprintf(buffer, sizeof(buffer), "User %s has %d points", "alice", 1500);
    printf("%s\n", buffer);

    // Building a path
    const char *dir = "/var/log";
    const char *file = "app.log";
    snprintf(buffer, sizeof(buffer), "%s/%s", dir, file);
    printf("%s\n", buffer);

    // Formatting with precision
    double price = 19.99;
    snprintf(buffer, sizeof(buffer), "Total: $%.2f", price);
    printf("%s\n", buffer);

    return 0;
}
User alice has 1500 points
/var/log/app.log
Total: $19.99
Appending to a String with snprintf
Build a string incrementally by tracking the write position:
#include <stdio.h>

int main(void) {
    char buffer[256];
    int offset = 0;
    int remaining = (int)sizeof(buffer);
    const char *items[] = {"apple", "banana", "cherry"};
    int prices[] = {120, 85, 200};
    int count = 3;

    buffer[0] = '\0';  // valid empty string even if nothing is written
    for (int i = 0; i < count; i++) {
        int written = snprintf(buffer + offset, (size_t)remaining,
                               "%s: $%d.%02d\n",
                               items[i], prices[i] / 100, prices[i] % 100);
        if (written < 0 || written >= remaining) {
            buffer[offset] = '\0';  // discard the partially written entry
            fprintf(stderr, "buffer full\n");
            break;
        }
        offset += written;
        remaining -= written;
    }
    printf("%s", buffer);
    return 0;
}
apple: $1.20
banana: $0.85
cherry: $2.00
Track offset (where to write next) and remaining (how much space is left). This pattern avoids repeated calls to strcat, which must scan the entire string each time.
Tokenizing Strings
strtok — Stateful Tokenizer
strtok splits a string by a set of delimiter characters. It modifies the original string and uses internal state, making it not thread-safe and unsuitable for nested tokenization.
#include <stdio.h>
#include <string.h>

int main(void) {
    char csv[] = "alice,42,engineer,seattle";

    // First call: pass the string
    char *token = strtok(csv, ",");
    while (token != NULL) {
        printf("field: [%s]\n", token);
        // Subsequent calls: pass NULL to continue with the same string
        token = strtok(NULL, ",");
    }

    // WARNING: csv is now modified — commas replaced with '\0'
    printf("original string is destroyed: [%s]\n", csv);
    return 0;
}
field: [alice]
field: [42]
field: [engineer]
field: [seattle]
original string is destroyed: [alice]
strtok replaces each delimiter with '\0' and returns a pointer into the original string. The original string is destroyed. You cannot tokenize two strings simultaneously with strtok because it uses a single internal pointer.
strtok_r — Reentrant Version
strtok_r is the reentrant, nestable variant. It is specified by POSIX rather than ISO C, and it keeps its position in a caller-provided pointer instead of hidden static state:
#include <stdio.h>
#include <string.h>

int main(void) {
    char input[] = "name=alice;age=30;city=seattle";
    char *saveptr1;

    char *pair = strtok_r(input, ";", &saveptr1);
    while (pair != NULL) {
        // Nested tokenization: split each pair on '='
        char *saveptr2;
        char *key = strtok_r(pair, "=", &saveptr2);
        char *value = strtok_r(NULL, "=", &saveptr2);
        if (key != NULL && value != NULL) {
            printf("%s -> %s\n", key, value);
        }
        pair = strtok_r(NULL, ";", &saveptr1);
    }
    return 0;
}
name -> alice
age -> 30
city -> seattle
Searching Strings
strstr — Find a Substring
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *haystack = "The quick brown fox jumps over the lazy dog";

    const char *found = strstr(haystack, "brown fox");
    if (found != NULL) {
        printf("found at offset %td: \"%s\"\n",
               found - haystack, found);
    }

    // Check if a string contains a substring
    if (strstr(haystack, "cat") == NULL) {
        printf("no cat found\n");
    }
    return 0;
}
found at offset 10: "brown fox jumps over the lazy dog"
no cat found
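strstr returns a pointer to the first occurrence of the substring, or NULL if not found. The search starts at the beginning of the haystack; the C standard does not specify the algorithm, so the worst-case cost depends on the implementation.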
strchr & strrchr — Find a Character
strchr finds the first occurrence of a character. strrchr finds the last. Both return NULL if the character is not found.
#include <stdio.h>
#include <string.h>
int main(void) {
    const char *path = "/usr/local/bin/gcc";
    const char *last_slash = strrchr(path, '/');
    if (last_slash != NULL) {
        printf("filename: %s\n", last_slash + 1); // "gcc"
    }
    return 0;
}
A common pattern for counting occurrences: loop with strchr, advancing past each match.
Parsing Numbers from Strings
strtol — String to Long Integer
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int parse_int(const char *str, int *out) {
    char *end;
    errno = 0;
    long value = strtol(str, &end, 10);

    if (end == str) {
        fprintf(stderr, "no digits found in \"%s\"\n", str);
        return -1;
    }
    if (*end != '\0') {
        fprintf(stderr, "trailing characters in \"%s\": \"%s\"\n", str, end);
        return -1;
    }
    // long may be wider than int, so check the int range as well as ERANGE
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        fprintf(stderr, "out of range: \"%s\"\n", str);
        return -1;
    }
    *out = (int)value;
    return 0;
}

int main(void) {
    int value;
    if (parse_int("12345", &value) == 0) {
        printf("parsed: %d\n", value);
    }
    parse_int("12abc", &value);  // trailing characters
    parse_int("hello", &value);  // no digits
    parse_int("", &value);       // no digits
    return 0;
}
parsed: 12345
trailing characters in "12abc": "abc"
no digits found in "hello"
no digits found in ""
strtol is superior to atoi in three ways:
- It reports where parsing stopped (via end).
- It reports overflow (via errno).
- It supports arbitrary bases (10 for decimal, 16 for hex, 0 for auto-detect).
strtod works the same way for parsing doubles, except that it takes no base argument, and it has the same advantages over atof.
Dynamic Strings: malloc & realloc
When you do not know the final string size in advance, grow a buffer dynamically:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *data;
    size_t length;
    size_t capacity;
} String;

int string_append(String *s, const char *text) {
    size_t text_len = strlen(text);
    if (s->length + text_len + 1 > s->capacity) {
        size_t new_cap = (s->length + text_len + 1) * 2;
        char *tmp = realloc(s->data, new_cap);
        if (tmp == NULL) return -1;  // original buffer preserved
        s->data = tmp;
        s->capacity = new_cap;
    }
    memcpy(s->data + s->length, text, text_len + 1);
    s->length += text_len;
    return 0;
}

int main(void) {
    String s = { .data = malloc(16), .length = 0, .capacity = 16 };
    if (s.data == NULL) return 1;
    s.data[0] = '\0';

    string_append(&s, "Hello");
    string_append(&s, ", ");
    string_append(&s, "world!");
    printf("%s (len=%zu, cap=%zu)\n", s.data, s.length, s.capacity);

    free(s.data);
    return 0;
}
Hello, world! (len=13, cap=16)
The pattern: start with a small buffer and, when an append would overflow it, grow to twice the size that append requires. Because the capacity grows geometrically, the cost is amortized O(1) per character appended. Always check realloc's return value with a temporary pointer: assigning directly to the original pointer leaks memory if realloc fails.
This is the pattern used by Redis (sds strings), SQLite, and many other C projects. Because C offers only char * and char[] with a termination convention, these projects wrap the buffer in a string struct (data + length + capacity). Storing the length alongside the data means you never need to scan for the null terminator.
Common Pitfalls
- Using strtok in multi-threaded code: strtok uses internal static state. Two threads calling strtok simultaneously will corrupt each other's position. Use strtok_r.
- Forgetting that strtok modifies the input: strtok replaces delimiters with '\0'. If you need the original string, copy it first.
- Using atoi for user input: atoi("hello") returns 0 with no error indication, and atoi("99999999999") has undefined behavior because the value does not fit in an int. Always use strtol.
- Not checking realloc's return: buffer = realloc(buffer, size) leaks the original buffer if realloc returns NULL. Use a temporary variable.
- Repeated strcat calls: each strcat scans to the end of the destination string. Building a string with N strcat calls is O(n^2) in the total length. Use snprintf with offset tracking instead.
- Assuming ASCII: C strings are byte arrays. Non-ASCII text (UTF-8) uses multi-byte sequences. strlen counts bytes, not characters. strchr finds bytes, not code points. String functions work on bytes, and UTF-8 awareness requires additional logic.
Key Takeaways
- snprintf with offset tracking is a safe and efficient way to build strings incrementally. Avoid repeated strcat calls.
- strtok is simple but destructive, stateful, and not thread-safe. Use strtok_r for reentrant tokenization, or tokenize manually for full control.
- strstr finds substrings. strchr/strrchr find single characters. All three return NULL when nothing is found.
- strtol and strtod are the robust way to parse numbers from strings. They provide error reporting through the end pointer and errno.
- Dynamic strings use the malloc/realloc doubling pattern. Always check realloc's return value with a temporary pointer.
- C has no string type, just byte arrays with a convention. Many real-world projects build a string struct (data + length + capacity) to fill this gap.