
String Manipulation Patterns

C has no string type — just arrays of bytes with a null terminator convention. This means there is no built-in concatenation operator, no string interpolation, no split method, no regex. You build everything from a small set of standard library functions and manual buffer management. These patterns appear in every C codebase that handles text.

Building Strings with snprintf

snprintf is the most important string function in C. It formats data into a buffer with bounds checking and always null-terminates (when the buffer size is at least 1).

#include <stdio.h>

int main(void) {
    char buffer[256];

    // Simple formatting
    snprintf(buffer, sizeof(buffer), "User %s has %d points", "alice", 1500);
    printf("%s\n", buffer);

    // Building a path
    const char *dir = "/var/log";
    const char *file = "app.log";
    snprintf(buffer, sizeof(buffer), "%s/%s", dir, file);
    printf("%s\n", buffer);

    // Formatting with precision
    double price = 19.99;
    snprintf(buffer, sizeof(buffer), "Total: $%.2f", price);
    printf("%s\n", buffer);

    return 0;
}
User alice has 1500 points
/var/log/app.log
Total: $19.99
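snprintf returns the number of characters the complete output would have needed, not counting the terminator, so comparing that value with the buffer size detects truncation. A minimal sketch, assuming a hypothetical helper named build_path that is not part of the examples above:

```c
#include <stdio.h>

// Hypothetical helper: joins dir and file into out.
// Returns 0 on success, -1 if formatting failed or the result did not fit.
int build_path(char *out, size_t out_size, const char *dir, const char *file) {
    int needed = snprintf(out, out_size, "%s/%s", dir, file);
    if (needed < 0) {
        return -1;                  // formatting error
    }
    if ((size_t)needed >= out_size) {
        return -1;                  // truncated: out holds only a prefix
    }
    return 0;
}
```

A caller would treat -1 as "do not use this path", since a truncated path can silently name a different file.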

Appending to a String with snprintf

Build a string incrementally by tracking the write position:

#include <stdio.h>

int main(void) {
    char buffer[256] = "";   // start empty so the final printf is safe even if nothing is written
    int offset = 0;
    int remaining = sizeof(buffer);

    const char *items[] = {"apple", "banana", "cherry"};
    int prices[] = {120, 85, 200};
    int count = 3;

    for (int i = 0; i < count; i++) {
        int written = snprintf(buffer + offset, remaining,
                               "%s: $%d.%02d\n",
                               items[i], prices[i] / 100, prices[i] % 100);
        if (written < 0 || written >= remaining) {
            fprintf(stderr, "buffer full\n");
            break;
        }
        offset += written;
        remaining -= written;
    }

    printf("%s", buffer);

    return 0;
}
apple: $1.20
banana: $0.85
cherry: $2.00

Track offset (where to write next) and remaining (how much space is left). This pattern avoids repeated calls to strcat, which must scan the entire string each time.
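The offset bookkeeping can be wrapped in a small helper so that call sites do not repeat it. The sketch below is one possible shape, assuming hypothetical names (FixedBuf, fixedbuf_init, fixedbuf_appendf) and using vsnprintf to forward the variable arguments:

```c
#include <stdarg.h>
#include <stdio.h>

// Hypothetical wrapper around the offset/remaining pattern.
typedef struct {
    char *data;      // caller-owned storage
    size_t size;     // total size of data in bytes
    size_t offset;   // index of the current null terminator
} FixedBuf;

void fixedbuf_init(FixedBuf *b, char *storage, size_t size) {
    b->data = storage;
    b->size = size;
    b->offset = 0;
    if (size > 0) {
        storage[0] = '\0';
    }
}

// Appends formatted text. Returns 0 on success, -1 if it did not fit.
// On failure the buffer keeps its previous contents.
int fixedbuf_appendf(FixedBuf *b, const char *fmt, ...) {
    if (b->offset >= b->size) {
        return -1;
    }
    size_t remaining = b->size - b->offset;

    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(b->data + b->offset, remaining, fmt, args);
    va_end(args);

    if (written < 0 || (size_t)written >= remaining) {
        b->data[b->offset] = '\0';   // discard the partial write
        return -1;
    }
    b->offset += (size_t)written;
    return 0;
}
```

Discarding a partial write is a design choice made for this sketch; the loop above keeps the truncated text instead, which may be preferable for log output.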

Tokenizing Strings

strtok — Stateful Tokenizer

strtok splits a string by a set of delimiter characters. It modifies the original string and uses internal state, making it not thread-safe and unsuitable for nested tokenization.

#include <stdio.h>
#include <string.h>

int main(void) {
    char csv[] = "alice,42,engineer,seattle";

    // First call: pass the string
    char *token = strtok(csv, ",");
    while (token != NULL) {
        printf("field: [%s]\n", token);
        // Subsequent calls: pass NULL to continue with the same string
        token = strtok(NULL, ",");
    }

    // WARNING: csv is now modified — commas replaced with '\0'
    printf("original string is destroyed: [%s]\n", csv);

    return 0;
}
field: [alice]
field: [42]
field: [engineer]
field: [seattle]
original string is destroyed: [alice]

strtok replaces each delimiter with '\0' and returns a pointer into the original string. The original string is destroyed. You cannot tokenize two strings simultaneously with strtok because it uses a single internal pointer.
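When the original string must survive, one option is to tokenize a copy. The sketch below assumes a hypothetical helper named count_fields and copies with malloc and memcpy so that it relies only on standard C:

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: counts delimiter-separated fields without
// modifying the caller's string. Returns -1 if allocation fails.
int count_fields(const char *text, const char *delims) {
    size_t size = strlen(text) + 1;
    char *copy = malloc(size);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, text, size);   // strtok will write '\0' into the copy only

    int count = 0;
    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        count++;
    }

    free(copy);
    return count;
}
```

Note that strtok treats a run of adjacent delimiters as a single separator, so "a,,b" yields two fields here, not three.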

strtok_r — Reentrant Version

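strtok_r is the reentrant version: it is nestable and safe to call from multiple threads. It comes from POSIX, so it is available on Linux, macOS, and the BSDs but is not guaranteed on every C implementation. It stores its state in a caller-provided pointer instead of hidden static state: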

#include <stdio.h>
#include <string.h>

int main(void) {
    char input[] = "name=alice;age=30;city=seattle";
    char *saveptr1;

    char *pair = strtok_r(input, ";", &saveptr1);
    while (pair != NULL) {
        // Nested tokenization: split each pair on '='
        char *saveptr2;
        char *key = strtok_r(pair, "=", &saveptr2);
        char *value = strtok_r(NULL, "=", &saveptr2);

        if (key != NULL && value != NULL) {
            printf("%s -> %s\n", key, value);
        }

        pair = strtok_r(NULL, ";", &saveptr1);
    }

    return 0;
}
name -> alice
age -> 30
city -> seattle
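If empty fields matter, or the input must stay read-only, the string can be split by hand. The following is a sketch only, built on strcspn and assuming a hypothetical function named next_field that returns each field as a pointer plus a length:

```c
#include <string.h>

// Hypothetical cursor-based splitter. It never writes to the input and it
// reports empty fields. *cursor must point at the string before the first call.
// Returns the start of the next field and stores its length in *len,
// or returns NULL once every field has been consumed.
const char *next_field(const char **cursor, char delim, size_t *len) {
    if (*cursor == NULL) {
        return NULL;
    }
    const char *start = *cursor;
    const char delims[2] = { delim, '\0' };
    *len = strcspn(start, delims);       // bytes before the next delimiter

    if (delim != '\0' && start[*len] == delim) {
        *cursor = start + *len + 1;      // continue after the delimiter
    } else {
        *cursor = NULL;                  // reached the end of the string
    }
    return start;
}
```

Fields are not null-terminated, so print them with a length, for example printf("%.*s\n", (int)len, field).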

Searching Strings

strstr — Find a Substring

#include <stdio.h>
#include <string.h>

int main(void) {
    const char *haystack = "The quick brown fox jumps over the lazy dog";

    const char *found = strstr(haystack, "brown fox");
    if (found != NULL) {
        printf("found at offset %td: \"%s\"\n",
               found - haystack, found);
    }

    // Check if a string contains a substring
    if (strstr(haystack, "cat") == NULL) {
        printf("no cat found\n");
    }

    return 0;
}
found at offset 10: "brown fox jumps over the lazy dog"
no cat found

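strstr returns a pointer to the first occurrence of the substring, or NULL if not found. The search always starts at the pointer you pass in, so finding later occurrences means calling it again from a position past the previous match. The C standard does not specify its algorithm or running time.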
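Because strstr reports only the first match, visiting every match takes a loop that restarts the search after each hit. A minimal sketch, assuming a hypothetical helper named count_substr that counts non-overlapping matches:

```c
#include <string.h>

// Hypothetical helper: counts non-overlapping occurrences of needle.
size_t count_substr(const char *haystack, const char *needle) {
    size_t needle_len = strlen(needle);
    if (needle_len == 0) {
        return 0;   // strstr matches an empty needle at every position
    }

    size_t count = 0;
    const char *p = haystack;
    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p += needle_len;   // resume the search after this match
    }
    return count;
}
```

Advancing by one byte instead of needle_len would count overlapping matches; which behavior is right depends on the caller.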

strchr & strrchr — Find a Character

strchr finds the first occurrence of a character. strrchr finds the last. Both return NULL if the character is not found.

const char *path = "/usr/local/bin/gcc";
const char *last_slash = strrchr(path, '/');
if (last_slash != NULL) {
    printf("filename: %s\n", last_slash + 1);   // "gcc"
}

A common pattern for counting occurrences: loop with strchr, advancing past each match.
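The counting loop described above might look like the following sketch, where count_char is a hypothetical name:

```c
#include <string.h>

// Hypothetical helper: counts how many times c appears in s.
size_t count_char(const char *s, char c) {
    if (c == '\0') {
        return 0;   // strchr would otherwise match the terminator itself
    }

    size_t count = 0;
    for (const char *p = strchr(s, c); p != NULL; p = strchr(p + 1, c)) {
        count++;    // p points at a match; the next search starts one byte later
    }
    return count;
}
```

For the path used earlier, count_char("/usr/local/bin/gcc", '/') returns 4.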

Parsing Numbers from Strings

strtol — String to Long Integer

#include <stdio.h>
#include <stdlib.h>
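#include <errno.h>
#include <limits.h>

int parse_int(const char *str, int *out) {
    char *end;
    errno = 0;   // strtol only sets errno on failure, so clear it first
    long value = strtol(str, &end, 10);

    if (end == str) {
        fprintf(stderr, "no digits found in \"%s\"\n", str);
        return -1;
    }
    if (*end != '\0') {
        fprintf(stderr, "trailing characters in \"%s\": \"%s\"\n", str, end);
        return -1;
    }
    // ERANGE covers the range of long; long may be wider than int,
    // so also check that the value fits in an int before casting.
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        fprintf(stderr, "out of range: \"%s\"\n", str);
        return -1;
    }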

    *out = (int)value;
    return 0;
}

int main(void) {
    int value;

    if (parse_int("12345", &value) == 0) {
        printf("parsed: %d\n", value);
    }

    parse_int("12abc", &value);    // trailing characters
    parse_int("hello", &value);    // no digits
    parse_int("", &value);         // no digits

    return 0;
}
parsed: 12345
trailing characters in "12abc": "abc"
no digits found in "hello"
no digits found in ""

strtol improves on atoi in three ways:

  • It reports where parsing stopped (via end)
  • It reports overflow (via errno)
  • It supports arbitrary bases (10 for decimal, 16 for hex, 0 for auto-detect)

strtod works the same way for parsing doubles: the end pointer shows where parsing stopped, and errno reports range errors. atof, like atoi, offers neither.
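The same checks carry over to strtod and to base auto-detection. The helpers below are a sketch with hypothetical names (parse_double, parse_long_auto), following the structure of parse_int:

```c
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper: parses a whole string as a double.
int parse_double(const char *str, double *out) {
    char *end;
    errno = 0;
    double value = strtod(str, &end);
    if (end == str || *end != '\0' || errno == ERANGE) {
        return -1;   // no digits, trailing characters, or out of range
    }
    *out = value;
    return 0;
}

// Hypothetical helper: base 0 lets strtol choose the base from the prefix
// ("0x" for hexadecimal, a leading "0" for octal, otherwise decimal).
int parse_long_auto(const char *str, long *out) {
    char *end;
    errno = 0;
    long value = strtol(str, &end, 0);
    if (end == str || *end != '\0' || errno == ERANGE) {
        return -1;
    }
    *out = value;
    return 0;
}
```

One consequence of base 0 is that "08" is rejected: the leading zero selects octal, and 8 is not an octal digit, so parsing stops early and the trailing-character check fails.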

Dynamic Strings: malloc & realloc

When you do not know the final string size in advance, grow a buffer dynamically:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *data;
    size_t length;
    size_t capacity;
} String;

int string_append(String *s, const char *text) {
    size_t text_len = strlen(text);
    if (s->length + text_len + 1 > s->capacity) {
        size_t new_cap = (s->length + text_len + 1) * 2;
        char *tmp = realloc(s->data, new_cap);
        if (tmp == NULL) return -1;   // original buffer preserved
        s->data = tmp;
        s->capacity = new_cap;
    }
    memcpy(s->data + s->length, text, text_len + 1);
    s->length += text_len;
    return 0;
}

int main(void) {
    String s = { .data = malloc(16), .length = 0, .capacity = 16 };
    if (s.data == NULL) return 1;
    s.data[0] = '\0';

    string_append(&s, "Hello");
    string_append(&s, ", ");
    string_append(&s, "world!");

    printf("%s (len=%zu, cap=%zu)\n", s.data, s.length, s.capacity);
    free(s.data);
    return 0;
}
Hello, world! (len=13, cap=16)

The pattern: start with a small buffer, double its capacity when it fills up. This gives amortized O(1) cost per character appended. Always check realloc's return value with a temporary pointer — assigning directly to the original pointer leaks memory if realloc fails.

Variations of this pattern appear in Redis (sds strings), SQLite, and many other C projects. Because C offers only char * and char[] with a termination convention, these projects define a string struct (data + length + capacity) and store the length alongside the data, so they never need to scan for the null terminator.
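The amortized cost of doubling can be made visible by counting reallocations. The sketch below is illustrative only: CountedString and counted_push are hypothetical names, the reallocs field exists purely for instrumentation, and the growth rule is plain doubling from an assumed starting capacity of 16.

```c
#include <stdlib.h>

typedef struct {
    char *data;
    size_t length;
    size_t capacity;
    size_t reallocs;   // instrumentation for this sketch only
} CountedString;

// Appends one character, doubling the capacity whenever it runs out.
// Returns 0 on success, -1 if realloc fails (the old buffer stays valid).
int counted_push(CountedString *s, char c) {
    if (s->length + 2 > s->capacity) {   // need room for c and the terminator
        size_t new_cap = s->capacity ? s->capacity * 2 : 16;
        char *tmp = realloc(s->data, new_cap);
        if (tmp == NULL) {
            return -1;
        }
        s->data = tmp;
        s->capacity = new_cap;
        s->reallocs++;
    }
    s->data[s->length++] = c;
    s->data[s->length] = '\0';
    return 0;
}
```

Starting from a zero-initialized CountedString, pushing 1000 characters triggers 7 reallocations (capacities 16 through 1024), whereas growing by one byte at a time would reallocate on nearly every push.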

Common Pitfalls

  • Using strtok in multi-threaded code: strtok uses internal static state. Two threads calling strtok simultaneously will corrupt each other's state. Use strtok_r.
  • Forgetting that strtok modifies the input: strtok replaces delimiters with '\0'. If you need the original string, copy it first.
  • Using atoi for user input: atoi("hello") returns 0 with no error indication, and atoi("99999999999") has undefined behavior when the value does not fit in an int. Use strtol instead.
  • Not checking realloc's return: buffer = realloc(buffer, size) leaks the original buffer if realloc returns NULL. Use a temporary variable.
  • Repeated strcat calls: each strcat scans to the end of the destination string, so building a string of total length n from many strcat calls costs O(n^2). Use snprintf with offset tracking instead.
  • Assuming ASCII: C strings are byte arrays. Non-ASCII text (UTF-8) uses multi-byte sequences. strlen counts bytes, not characters. strchr finds bytes, not code points. String functions work on bytes, and UTF-8 awareness requires additional logic.
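As an example of the additional logic UTF-8 needs, code points can be counted by skipping continuation bytes, which always have the bit pattern 10xxxxxx. This is a sketch under the assumption that the input is already valid UTF-8, and utf8_count is a hypothetical name:

```c
#include <stddef.h>

// Hypothetical helper: counts code points in a UTF-8 string.
// Assumes valid UTF-8; it performs no validation.
size_t utf8_count(const char *s) {
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {
            count++;   // not a continuation byte, so a new code point starts here
        }
    }
    return count;
}
```

For the UTF-8 encoding of "café", strlen reports 5 bytes while utf8_count reports 4 code points. Code points are still not the same as user-visible characters, since combining marks take a code point of their own.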

Key Takeaways

  • snprintf with offset tracking is a safe and efficient way to build strings incrementally in a fixed buffer. Avoid repeated strcat calls.
  • strtok is simple but destructive, stateful, and not thread-safe. Use strtok_r for reentrant tokenization, or tokenize manually for full control.
  • strstr finds substrings. strchr/strrchr find single characters. All three return NULL on failure.
  • strtol and strtod are the reliable way to parse numbers from strings, because they report errors through the end pointer and errno.
  • Dynamic strings use the malloc/realloc doubling pattern. Always check realloc's return value with a temporary pointer.
  • C has no string type — just byte arrays with a convention. Many real-world projects build a string struct (data + length + capacity) to fill this gap.