Hash Table

Photo by Aedrian on Unsplash

Hash Table

Β·

11 min read

Hello friend πŸ‘‹πŸΎ. One of the best data structures is what I present to you. It’s an incredibly fast, efficient, and well-known data structure.

Ever heard of a dictionary? A dictionary stores words and their meanings. Each word is distinct and can be expressed as the key in the dictionary; while its meaning is expressed as the value of the key. In computer science, a dictionary is a key-value store. A hash table is a method to implement a dictionary.

Hash tables have been used in a variety of fields, from database systems to operating system caching. In this article, you'll learn:

  • What a hash table is.

  • Application of a hash table.

  • What a hash function is.

  • Hash table operations: Insert, Get, and Remove.

  • What a hash collision is.

  • Techniques to resolve hash collisions.

  • Resolving a hash collision using the "separate chaining" method.

Understanding linked lists and the C/C++ programming languages are prerequisites for reading this article. It is okay to continue reading this article even if you don't meet any of these requirements. You'll still be introduced to the concept of "hash tables."

What exactly is a hash table?

A hash table is a form of data structure that stores data associatively. A hash table β€” called a hash map β€” stores data as value and key pair.

A unique index is assigned to each value. These values are stored in an array, but the indexes are distinct in terms of the key associated with the value.

In school, every student has a locker. The lockers are kept somewhere while they are linearly aligned. Each locker is distinct for every student, and there is a key to access it. The student is the key in this situation, and the locker is an index in the array (the locker room). The locker in this situation will also be referred to as "Bucket".

The implementation of a hash table is called "hashing." Hashing is a technique used to perform insertions, deletions, and searches in constant time. To describe a hash table, we need to define the type of data it can store.

struct HashNode
{
    char *key;
    char *value;
};

// method to create a node
struct HashNode *initNode(char *key, char *value)
{
    struct HashNode *hNode;
    // a key cannot be empty.
    if (key == NULL || strlen(key) == 0 || value == NULL) return (NULL);
    hNode = malloc(sizeof(struct HashNode));
    if (hNode == NULL) return (NULL);
    hNode->key = strdup(key);
    hNode->value = strdup(value);
    return (hNode);
}

On average, it takes a time complexity of O(1) to search and insert values into a hash map. In my opinion, a hash table is the data structure with the fastest operations available.

The time complexity of a hash table can range from O(1) to O(N), depending on the technique used to resolve its collision.

Application

A hash table is used for various reasons, which include:

  • A lookup table for compiler operations.

  • A phonebook.

  • For password verification.

  • For graphics.

  • For caching.

  • For building high-performance databases

  • for implementing a set.

A hash table is a preferred data structure due to its efficiency, uniqueness, speed, and scalability.


Hash Functions

A hash function converts a given key into a unique ID for the hash table. It is responsible for mapping the key to its value.

A hash function is said to be nearly perfect in the sense that it rarely leads to a hash collision. For this article, I will be using the djb2 hashing algorithm.

/* djb2 hash function */
unsigned long djb2(unsigned char *str)
{
    unsigned long hash = 5381;
    int ch;
    while ((ch = *str++))
        hash = hash * 33 + ch;
    return (hash);
}

Now that we have our hash function, we can create the hash map data structure.

struct HashMap
{
    int size;
    struct HashNode **array;
};

struct HashMap *initHashMap(int size)
{
    struct HashMap *hMap;
    if (size <= 0) return (NULL);
    hMap = malloc(sizeof(struct HashMap));
    if (!hMap) return (NULL);
    hMap->size = size;
    hMap->array = (struct HashNode **)malloc(sizeof(struct HashNode *) * size);
    if (hMap->array == NULL)
    {
        free(hMap); return (NULL);
    }
    for (int i = 0; i < size; i++)
        hMap->array[i] = NULL;
    return (hMap);
}

We also need to write a wrapper function that'll make use of the "djb2" hashing algorithm.

unsigned long key_index(const char *key, unsigned long size)
{
    unsigned long hIndex  = djb2((const unsigned char *)key);
    return (hIndex % size);
}

Hash table operations: Insert, Get, and Delete.

It will be impossible to express the efficiency of a hash table without an example. In the article, I will create functions to insert, search, and delete a hash node from a hash table.

Insert.

Inserting a hash node into a hash table is simple. I will create a hash node and map its key to a distinct hash index.

A hash node must have a valid key. This means that the key cannot be empty. Though the hash node value can be empty or filled,

Inserting a hash node into a hash map is as quick as adding a value to the index of an array.

// given an array
#define ARRAY_SIZE 100
int arr_score[ARRAY_SIZE];
// the time complexity of the task below
// is O(1).
arr_score[5] = 67;

Same with a hash table β€” given the hash key indexβ€” we can insert a hash node in the hash table.

Here is how to do it:

bool insert(struct HashMap *hmap, const char *key, const char *value)
{
    if (!hmap) return false;
    // create the a hash node
    struct HashNode *node = initHashNode(key, value);
    if (!node) return false;
    unsigned long int hIndex = key_index(key, hmap->size);
    if (hmap->array[hIndex] == NULL){
        hmap->array[hIndex] = node;
        return true;
    }
    // if the key already exists I'll replace the value
    // by deleting it and duplicating the new value.
    if (strcmp(hmap->array[hIndex]->key, key) == 0)
    {
        free(node->key);   
        free(node->value);
        free(node);
        node = NULL;
        node = hmap->array[hIndex];
        free(node->value);
        node->value = strdup(value);
        return true;
    }
}

The insertion function creates a node and generates its hash index. It checks if the index is empty and adds the new hash node. However, if the index is not empty, the value will be replaced because the hash key is the same.

int main()
{
    struct HashMap *phonebook = initHashMap(1024);
    insert(phonebook, "mama", "09065345");
    insert(phonebook, "dada", "01566324");
    printMap(phonebook);
    // OMG!!!! I have to change mama's phone no. she gat a new one! 😲
    insert(phonebook, "mama", "08065354");
    printMap(phonebook);
}

Here is our function to print the hash table.

void printMap(struct HashMap *hmap)
{
    if (!hmap) return;
    printf("{\n");
    for (int i = 0; i < hmap->size; i++){
        struct HashNode *node = hmap->array[i];
        if (node)
            printf("\t\"%s\" : \"%s\"\n", node->key, node->value);
    }
    printf("}\n");
}

Get

Sometimes, we might want to get the value of a node in a hash table. Given the key, we can obtain the value from the map.

const char *value get(struct HashMap *map, const char *key)
{
    if (!map || !key) return (NULL);
    unsigned long hIndex = key_index(key, map->size);
    if (map->array[hIndex] == NULL) return (NULL);
    else
        return map->array[hIndex]->value;
}

Remove

Removing a node from a hash map is also simple. It requires the node key.

bool remove(struct HashMap *map, const char *key)
{
    if (!map || !key) return false;
    unsigned long hIndex = key_index(key, map->size);
    struct HashNode *hnode = map->array[hIndex];
    if (hnode){
        free(hnode->key);
        free(hnode->value);
        free(hnode);
        map->array[hIndex] = NULL;
    }
    return true;
}

You can find a working example here. It contains two or more programs, all describing the use of insert, get, and remove operations in a hash table.

Hash Collision

two bisons fighting head photo.

A hash collision occurs when two or more different keys result in the same hash index. A hash table will be incapable of correctly distinguishing between the hash keys, resulting in a problem.

"Hash collisions are no laughing matter, but sometimes you just have to hash it out!"

- ChatGPT

For example, this code indicates the keys β€” dram and vivency β€” map to the same hash key.

int main()
{
    char *str1 = "dram";
    char *str2 = "vivency";
    printf("hash index for %s is %ul\n", str1, key_index(str1, 1024));
    printf("hash index for %s is %ul\n", str2, key_index(str2, 1024));
    return (0);
}

There is no perfect hashing algorithm that can prevent a hash collision. The chances of a hash collision can only be minimized. Nevertheless, a collision is still likely to happen. A few techniques are considered to handle a colliding occurence.

There are various methods to resolve a hash collision. These include:

  • Separate Chaining

  • Double Hashing

  • Open Addressing

  • Linear Probing

  • Quadratic Probing

Each method has its pros and cons. For simplicity, I'll be using the "separate chaining" method.

Separate Chaining Technique

Separate chaining or open hashing is a method of resolving hash collisions. For this technique, each index in a hash table is treated as a separate linked list. We will add the new hash node to the list when there is a collision of keys. It can be added to the head of the list or the end of the list. This method allows for quick lookup, but it can decrease the time complexity of the hash table when traversing through the list. With just a little tweak to our previous codes, the collision will be resolved.

Handling Hash Collision

struct HashNode
{
    struct HashNode *next;
    char *key;
    char *value;  
};

// ... In initHashNode
{
    node->next = NULL;
}

Our hash node now works as a linked list node. Whenever a hash collision occurs, a new hash node is made and inserted into the head of the hash index.

A linked list is a data structure that arranges elements in linear order. Each element consists of a data part and a pointer to the next element in the list. We will use a linked list to chain each component of the same hash index (but with different hash keys) to each other.

A singly linked list whose nodes contain two fields: an integer value and a link to the next node

In the insert function, to find a value, I will have to traverse down the list and compare the key for each key till I find the current key.

bool insert(struct HashMap *hmap, const char *key, const char *value)
{
    if (!hmap) return false;
    // create a hash node
    struct HashNode *node = NULL;
    unsigned long int hIndex = key_index(key, hmap->size);
    if (hmap->array[hIndex] == NULL){
        node = initHashNode(key, value);
        if (!node) return false;
        hmap->array[hIndex] = node;
        return true;
    }
    // if the key already exists, I'll replace the value
    // by deleting it and duplicating the new value.
    node = hmap->array[hIndex];
    while (node)
    {
        if (strcmp(node->key, key) == 0){
            free(node->value);
            node->value = strdup(node->value);
            return true;
        }
        node = node->next;
    }
    // at this point, the node for the hash key does not exist.
    // though, the hash key exists.
    // I'll add the new node to the head.
    node = initHashNode(key, value);
    if (!node) return false;
    node->next = hmap->array[hIndex];
    hmap->array[hIndex]->next = node;
    return true;
}I

I added a few lines of code to the function. If the hash key outcome is valid, I will traverse the list to the hash key while comparing the keys; if any already exists, I will replace its value. However, if the key is not found in any of the nodes, I will have to create a node and add it to the head of the list at the hash index.

Handling a hash collision itself is not that bad. Remember that a collision occurs when two different hash keys map to the same hash bucket. We treated each index as a linked list and then kept inserting different hash keys at the head of the list. This might not be the most efficient and best way to handle a collision; However, it encourages simplicity and is maintainable.

The function for getting and removing a hash node from a hash table needs to be revamped. You can have this as your task. Here is what we will do.

// get(...)
// ...
node = map->array[hIndex];
while (node)
{
    if (strcmp(node->key, key) == 0)
        return node->value;
    node = node->next;
}
return (NULL); // value not found.

The revamped code:

//...remove(...)
// if the node is at the beginning of the list
if (hnode && strcmp(hnode->key, key) == 0)
{
    next = hnode->next;
    free(hnode->key);
    free(hnode->value);
    free(hnode);
    map->array[hIndex] = next;
    return true;
}
// else traverse through the list and remove it.
next = hnode->next;
while (next)
{
    if (strcmp(next->key, key) == 0)
    {
        hnode->next = next->next;
        free(next->key);
        free(next->value);
        free(next);
        return true;
    }
    hnode = next;
    next = next->next;
}
return false;

Here is the function to print the map.

void printMap(struct HashMap *hmap)
{
    if (!hmap) return;
    printf("{\n");
    for (int i = 0; i < hmap->size; i++){
        struct HashNode *node = hmap->array[i];
        while (node)
        {
            printf("\t\"%s\" : \"%s\"\n", node->key, node->value);
            node = node->next;
        }
    }
    printf("}\n");
}

We need to clear our hash map after use. Here is a function to do that.

void clear(HashMap *map)
{
    if (!map)   return;
    HashNode *node = NULL;
    HashNode *next = NULL;

    for (unsigned long int i = 0; i < map->size; i++)
    {
        node = map->array[i];
        while (node)
        {
            next = node->next;
            free(node->value);
            free(node->key);
            free(node);
            node = NULL;
            node = next;
        }
    }

    free(map->array);
    free(map);
    map = (HashMap *)0;
}

In this article, I hope I have been able to introduce and highlight what a hash table is and its usefulness. Confidently, you can now implement a hash table yourself. If you require some example codes, you can find them here.

The repository consists of two branches: the main branch, which contains the code for a hash table without collision resolution, and the collision branch, which includes a collision resolution implementation. Be sure to check them out, and throw me a starπŸ’›.

Conclusion

Hashing has been applied in many fields like databases, compilers, cryptography, etc. Hopefully, I have been able to impart knowledge about this data structure to you.

Thanks for reading this article. If you find it intriguing, you can drop some questions, suggestions, or contributions in the comment section. You can follow me and subscribe to my newsletter if you enjoyed my article and want to read more.

Happy 😊 Learning πŸ’ͺ.

Β