Hash Tables
A hash table maps keys to values with O(1) average-case lookup, insertion, and deletion. It is the most practical data structure in C programming. Every time you need fast key-value storage — symbol tables, caches, deduplication, counting — a hash table is usually the answer. C does not provide one in the standard library, so you will implement them yourself.
How Hash Tables Work
A hash table has three components: an array of buckets, a hash function, and a collision resolution strategy.
- The hash function converts a key into an integer
- The integer is reduced to a bucket index (typically with modulo)
- The value is stored at that bucket
- When two keys hash to the same bucket (a collision), the collision strategy resolves it
size_t index = hash(key) % num_buckets;
Hash Functions
A good hash function distributes keys uniformly across buckets with minimal clustering.
djb2
size_t djb2(const char *str) {
size_t hash = 5381;
int c;
while ((c = (unsigned char)*str++)) {
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
}
return hash;
}
djb2 is simple, fast, and produces reasonable distribution for string keys. It is a good default when you need a quick hash function.
FNV-1a
#include <stdint.h>
#include <stddef.h>
size_t fnv1a(const char *str) {
size_t hash = 14695981039346656037ULL; /* FNV offset basis */
while (*str) {
hash ^= (unsigned char)*str++;
hash *= 1099511628211ULL; /* FNV prime */
}
return hash;
}
FNV-1a has better avalanche properties than djb2 — small changes in input produce large changes in output. It is widely used in hash table implementations.
Hashing Non-String Keys
For integer keys, a simple multiplication-based hash works well:
size_t hash_int(int key) {
size_t x = (size_t)key;
x = ((x >> 16) ^ x) * 0x45d9f3b;
x = ((x >> 16) ^ x) * 0x45d9f3b;
x = (x >> 16) ^ x;
return x;
}
Collision Handling: Chaining
Chaining stores colliding entries in a linked list at each bucket.
struct Entry {
char *key;
int value;
struct Entry *next;
};
struct HashMap {
struct Entry **buckets;
size_t num_buckets;
size_t size;
};
struct HashMap *hashmap_create(size_t num_buckets) {
struct HashMap *map = malloc(sizeof(struct HashMap));
if (map == NULL) return NULL;
map->buckets = calloc(num_buckets, sizeof(struct Entry *));
if (map->buckets == NULL) {
free(map);
return NULL;
}
map->num_buckets = num_buckets;
map->size = 0;
return map;
}
Insert
#include <string.h>
void hashmap_put(struct HashMap *map, const char *key, int value) {
size_t index = fnv1a(key) % map->num_buckets;
/* Check if key already exists */
struct Entry *current = map->buckets[index];
while (current != NULL) {
if (strcmp(current->key, key) == 0) {
current->value = value; /* Update existing */
return;
}
current = current->next;
}
/* Insert new entry at head of chain */
struct Entry *entry = malloc(sizeof(struct Entry));
if (entry == NULL) return;
entry->key = strdup(key);
entry->value = value;
entry->next = map->buckets[index];
map->buckets[index] = entry;
map->size++;
}
Lookup
int hashmap_get(struct HashMap *map, const char *key, int *out_value) {
size_t index = fnv1a(key) % map->num_buckets;
struct Entry *current = map->buckets[index];
while (current != NULL) {
if (strcmp(current->key, key) == 0) {
*out_value = current->value;
return 1; /* Found */
}
current = current->next;
}
return 0; /* Not found */
}
Delete
int hashmap_delete(struct HashMap *map, const char *key) {
size_t index = fnv1a(key) % map->num_buckets;
struct Entry *current = map->buckets[index];
struct Entry *prev = NULL;
while (current != NULL) {
if (strcmp(current->key, key) == 0) {
if (prev == NULL) {
map->buckets[index] = current->next;
} else {
prev->next = current->next;
}
free(current->key);
free(current);
map->size--;
return 1;
}
prev = current;
current = current->next;
}
return 0;
}
Cleanup
void hashmap_destroy(struct HashMap *map) {
for (size_t i = 0; i < map->num_buckets; i++) {
struct Entry *current = map->buckets[i];
while (current != NULL) {
struct Entry *next = current->next;
free(current->key);
free(current);
current = next;
}
}
free(map->buckets);
free(map);
}
Collision Handling: Open Addressing
Open addressing stores all entries directly in the bucket array. When a collision occurs, it probes subsequent buckets until an empty one is found.
Linear Probing
struct OAEntry {
char *key;
int value;
int occupied;
int deleted; /* Tombstone marker */
};
struct OAHashMap {
struct OAEntry *entries;
size_t capacity;
size_t size;
};
void oa_put(struct OAHashMap *map, const char *key, int value) {
size_t index = fnv1a(key) % map->capacity;
for (size_t i = 0; i < map->capacity; i++) {
size_t probe = (index + i) % map->capacity;
struct OAEntry *entry = &map->entries[probe];
if (!entry->occupied || entry->deleted) {
entry->key = strdup(key);
entry->value = value;
entry->occupied = 1;
entry->deleted = 0;
map->size++;
return;
}
if (strcmp(entry->key, key) == 0) {
entry->value = value; /* Update */
return;
}
}
/* Table is full - should have resized before this */
}
Open addressing has better cache performance than chaining because entries are stored contiguously in memory. However, deletion requires tombstone markers, and clustering can degrade performance as the table fills.
Load Factor & Resizing
The load factor is the ratio of entries to buckets: size / num_buckets. As the load factor increases, collision chains grow longer and performance degrades.
- Chaining: resize when load factor exceeds 0.75
- Open addressing: resize when load factor exceeds 0.5-0.7
void hashmap_resize(struct HashMap *map, size_t new_capacity) {
struct Entry **new_buckets = calloc(new_capacity, sizeof(struct Entry *));
if (new_buckets == NULL) return;
/* Rehash all existing entries */
for (size_t i = 0; i < map->num_buckets; i++) {
struct Entry *current = map->buckets[i];
while (current != NULL) {
struct Entry *next = current->next;
size_t new_index = fnv1a(current->key) % new_capacity;
current->next = new_buckets[new_index];
new_buckets[new_index] = current;
current = next;
}
}
free(map->buckets);
map->buckets = new_buckets;
map->num_buckets = new_capacity;
}
Resizing typically doubles the capacity. Every entry must be rehashed because the bucket index depends on the number of buckets. This makes individual insertions occasionally O(n), but amortized insertion remains O(1).
Complexity Analysis
| Operation | Average Case | Worst Case |
|---|---|---|
| Insert | O(1) | O(n) |
| Lookup | O(1) | O(n) |
| Delete | O(1) | O(n) |
The worst case occurs when all keys hash to the same bucket, creating a single linked list of length n. A good hash function and proper resizing keep the average case at O(1).
Complete Usage Example
#include <stdio.h>
int main(void) {
struct HashMap *word_count = hashmap_create(64);
hashmap_put(word_count, "hello", 1);
hashmap_put(word_count, "world", 2);
hashmap_put(word_count, "hello", 3); /* Updates existing */
int value;
if (hashmap_get(word_count, "hello", &value)) {
printf("hello: %d\n", value);
}
if (hashmap_get(word_count, "missing", &value)) {
printf("found missing\n");
} else {
printf("missing: not found\n");
}
hashmap_delete(word_count, "world");
hashmap_destroy(word_count);
return 0;
}
hello: 3
missing: not found
Common Pitfalls
- Poor hash function — A hash function that produces clustered output causes many collisions and degrades O(1) to O(n). Test your hash function's distribution with real data.
- Not resizing — A hash table that never resizes becomes a linked list as the load factor approaches 1.0. Check the load factor on every insert and resize when it exceeds the threshold.
- Forgetting to copy keys — If you store a pointer to the caller's string and the caller modifies or frees it, your hash table keys become garbage. Use
strdupto copy keys. - Memory leaks on delete — Deleting an entry must free both the key (if copied with
strdup) and the entry struct. Missing either leaks memory. - Using modulo with signed integers —
hash % num_bucketscan produce negative values ifhashis signed. Usesize_t(unsigned) for hash values and bucket counts. - Not handling the full table — Open addressing without a resize check loops forever when the table is full. Always resize before the table reaches critical load.
Key Takeaways
- Hash tables provide O(1) average-case lookup, insertion, and deletion. They are the most useful data structure for key-value storage in C.
- A hash function maps keys to bucket indices. FNV-1a and djb2 are good defaults for string keys.
- Chaining resolves collisions with linked lists per bucket. Open addressing stores entries in the bucket array and probes for empty slots.
- The load factor determines when to resize. Resize at 0.75 for chaining and 0.5-0.7 for open addressing.
- Resizing rehashes all entries. Individual insertions are amortized O(1).
- Always copy keys with
strdup, free them on deletion, and use unsigned types for hash values and indices. - Hash tables are O(1) average but O(n) worst case. A good hash function and proper resizing keep the worst case from occurring in practice.