Custom HashSet Implementation in Java

A hash set is a data structure offers constant time performance for the basic operations (addremovecontains and size), assuming the hash function disperses the elements properly among the buckets. There are a number of ways of implementing this data structure. This post is about the simple implementation of hashmap in Java using an array of a linked list.

So, lets first define a class representing a node of a linked list as:

class Node<T> {
    T data;
    Node next;

    Node(T data) {
        this.data = data;
        this.next = null;
    }
    // getters, equals, hashCode and toString
}

Inserting Element

To insert an element, a key and value, we do the following:

  1. First, compute the element's hash code, which will usually be an int. The two different objects could have the same hash code, as there may be an infinite number of elements and a finite number of ints.
  2. Then, calculate the index in the array using hash code using modulo as hashCode (key) % array_length. Here, two different hash codes could map to the same index.
  3. Get the linked list at this index calculated above. Store the element in this index. The use of a linked list is important because of collisions: you could have two different keys with the same hash code or two different hash codes that map to the same index.

The picture below shows explains this.

hashmap-example

Source: Cracking the Coding Interview, 6th edition

This can be implemented as:

public class MyHashSet<T> {

    private static final Integer INITIAL_CAPACITY = 1 << 4; // 16

    private Node<T>[] buckets;

    private int size;

    public MyHashSet(final int capacity) {
        this.buckets = new Node[capacity];
        this.size = 0;
    }

    public MyHashSet() {
        this(INITIAL_CAPACITY);
    }

    public void add(T t) {
        int index = hashCode(t) % buckets.length;

        Node bucket = buckets[index];

        Node<T> newNode = new Node<>(t);

        if (bucket == null) {
            buckets[index] = newNode;
            size++;
            return;
        }

        while (bucket.next != null) {
            if (bucket.data.equals(t)) {
                return;
            }
            bucket = bucket.next;
        }

        // add only if last element doesn't have the value being added
        if (!bucket.data.equals(t)) {
            bucket.next = newNode;
            size++;
        }
    }
    // . . .
}

 

Removing Element

The removal of the element from the hash set can be done with the following steps:

  1. Compute the hash code from the element, and then compute the index from the hash code with modulo operation.
  2. Then, get the linked list at index computed above and search through the linked list for the value with this value.
  3. Remove the node by changing the next reference of the previous element.

The implementation can be as simple as below:

public T remove(T t) {
    int index = hashCode(t) % buckets.length;

    Node bucket = buckets[index];

    if (bucket == null) {
        throw new NoSuchElementException("No Element Found");
    }

    if (bucket.data.equals(t)) {
        buckets[index] = bucket.next;
        size--;
        return t;
    }

    Node prev = bucket;

    while (bucket != null) {
        if (bucket.data.equals(t)) {
            prev.next = bucket.next;
            size--;
            return t;
        }
        prev = bucket;
        bucket = bucket.next;
    }
    return null;
}

Testing

The custom hash set implemented above can be tested easily as:

@Test
public void testMyHashSet() {
    MyHashSet<Integer> set = new MyHashSet<>();

    set.add(3);
    set.add(4);
    set.add(8);
    set.add(4);
    set.add(8);
    set.add(1000);

    assertEquals(4, set.size());

    assertEquals(Integer.valueOf(8), set.remove(8));
    assertEquals(3, set.size());
}

 

Time complexity

Since different keys can be mapped to the same index, there is a chance of collision. If the number of collisions is very high, the worst case runtime is O(N), where N is the number of keys.
However, we generally assume a good implementation that keeps collisions to a minimum, in which case the lookup time is O(1).

 

Conclusion

This post illustrated how the hash set can be implemented with an array-based linked list. You can take a look more examples on Cracking the Coding Interview by Gayle Laakmann McDowell.

The source code for all example presented above is available on GitHub.

Java HashMap Implementation in a nutshell