Hash Map (chaining)

demo

ref –
http://www.algolist.net/Data_structures/Hash_table/Chaining
http://stackoverflow.com/questions/9214353/hash-table-runtime-complexity-insert-search-and-delete

Hash Map with Chaining

The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.

When the number of entries in the hash table exceeds the product of the load factor and the current capacity,
the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has
approximately twice the number of buckets.

The Node

.h file

.cpp file

The Hash Map

HashTable.hpp

The way we go about creating the Hash Table is by using an array of pointers. Each pointer at array[i] will point to a node. That node can link to another node. This is so that we can have an array of linked list. Which by definition, is what constitutes a chained hash map.

Thus, we use double pointers to point to this array of pointers: LinkedHashEntry ** table.

*table (or table[0]) dereferences the 1st array element.
*(table+1) (or table[1]) dereferences the 2nd array element.
..and so on

In order to allocate memory, we allocate this array of pointers on the heap.

Think of it like this:

We want to allocate on the heap (new)
an array of certain size ([MAX])
of LinkedHashEntry pointers (LinkedHashEntry *):

LinkedHashEntry ** table = new LinkedHashEntry * [MAX]

By definition, we always have a pointer pointing to the address of the first element of the array.

In every array, we always assign an array anchor (a pointer) to the 1st element of the array.
For example, an array of integers, int * i = new int [8];

int * i points to the first integer element’s address.
*i will dereference and get the int value.

In our case, since our first element is a pointer, we need a pointer to a pointer to act as the array anchor, and that is
why we use LinkedHashEntry ** table.

Now, we have an array of LinkedHashEntry pointers, and we are set to work.

Insertion

While certain operations for a given algorithm may have a significant cost in resources, other operations may not be as costly.

Amortized analysis considers both the costly and less costly operations together over the whole series of operations of the algorithm. This may include accounting for different types of input, length of the input, and other factors that affect its performance

insertion O(1) average and amortized case complexity, however it suffers from O(n) worst case time complexity.

BUT WHY?
Hash tables suffer from O(n) worst time complexity due to two reasons:

If too many elements were hashed into the same key. Thus, it degrades into a linked list
for that particular slot. Searching through this list may take O(n) time. see 2), where it keeps appending elements at the same slot.

Once a hash table has passed its load balance, it has to rehash [create a new bigger table, and re-insert each element to the table].

The reason why it is O(1) average and amortized case because:

  • It is very rare that many items will be hashed to the same key
  • if you chose a good hash function and you don’t have too big load balance.
  • The rehash operation, which is O(n), can at most happen after n/2 ops
  • which are all assumed O(1): Thus when you sum the average time per op, you get : (n*O(1) + O(n)) / n) = O(1)

Deletion

Deletion has the same concept as Insertion. Amortized case O(1), worst time O(n).

deleteNode is just a simple function that sets a LinkedHashEntry pointer to NULL and wipes out everything.

recurseAndRemove uses recursion to run through a list of LinkedHashEntry nodes, and remove the node with the specified key.

The list of nodes is given by LinkedHashEntry ** temp. Pointer of a pointer means that the double pointer points to the the address of a pointer. That pointer points to the address of the node.

hashentry_dblptr

Hence, if we want to reassign the pointer at this->table[i] to another node, it means:

this->table[i] is a pointer, thus we have to use a double pointer (HashEntry ** temp) to point to this->table[i]. Deference it like this (*temp) to control the pointer this->table[i] itself….and assign it to another node as shown in the image.

If we simply have a pointer like HashEntry * singlePtr = this->table[i], it won’t work because singlePtr will be pointing to whatever this->table[i] is pointing to. singlePtr is pushed onto the local stack, and has nothing to do with the pointer that is this->table[i].

singlePtr simply points to whatever this->table[i] points to. singlePtr is pushed onto the local stack

Once you understand this, the rest is just simple recursion and node traversal.

There are 2 cases:

  • You need to remove the first node. This means in order to get control of the first pointer, you need to get the address of the initial pointer, dereference it (at this point, you have full control of the pointer) and point it
  • You need to remove 2nd..and after node. This is simple because dereferencing the temp pointer gives you full control of the next pointer.