Prashant Shubham

Prashant Shubham

Computers are all I want to know about

26 Dec 2019

HashMap in Java


In part1 and part2 of this series, we looked upon Hashtable and hashcode() and equals() contract in Java. One might ask why just not use Hashtable instead of HashMap, it’s because as of Java1.2 this collection has been retrofitted to implement the Map interface and the Java docs suggest using HashMap if a thread-safe implementation is not needed and ConcurrentHashMap if a thread-safe implementation is needed.

Now coming to HashMap it implements Map interface and is part of the Java Collections framework. HashMap allows retrieval of values based on a given key, e.g: “key1”->value1, “key2”->value2. We first insert the values corresponding to a key and later we can retrieve them using the keys.

In short, HashMap is backed by an associative array data structure. During insertion of key and value, the hashcode of the key is calculated, and Entry (key+value) is inserted in the array (based on hashcode % array’s size). If more keys are added with the same hashcode, LinkedList is formed with previously added keys. A good hashCode() implementation is required so that the keys don’t get poorly distributed across the array. Please read about the equals() and hashcode() in part2 of this series.

To create a HashMap we import java.util.HashMap package. For initialization,Map<String, String> hash_map = new HashMap<>();

For insertion of keys, hash_map.put(“yourKey”, “yourValue”);

For retrieval of keys, `hash_map.get(“yourKey”); //Output — “yourValue”

Now coming to its internal implementation when we insert one key to the map it calculates the hashCode of the given key and maps it to an array bucket using hashCode() % size_of_array . If hashcode() implementation is poor many collisions will be there i.e multiple keys will be mapped to the same array bucket. When multiple hashCode() values end up in the same bucket, values are placed in an ad-hoc linked list. In the worst case, when all keys are mapped to the same bucket, thus degenerating hash map to the linked list - from O(1) to O(n) lookup time.

In Java 8, HashMap replaces the linked list with a binary tree when the number of elements in a bucket reaches a certain threshold i.e TREEIFY_THRESHOLD . If a bucket contains more than eight items, it should become a tree. Conversion of the list to binary tree allows handling worst-case scenarios from O(n) to O(logn) which is a significant improvement in terms of performance. HashMappromotes list into a binary tree, using hash code as a branching variable. If two hashes are different but ended up in the same bucket, one is considered bigger and goes to the right. If hashes are equal (as in our case), HashMap hopes that the keys are comparable, so that it can establish some order.

If entries are removed from the map, the number of entries in the bucket might reduce such that this tree structure is no longer necessary. The UNTREEIFY_THRESHOLD value allows for this conversion to list if elements number gets less than 6. One more factor which affects the conversion of list to tree is MIN_TREEIFY_CAPACITY ,this constant basically says not to start making buckets into trees if our hash map is very small — it should resize to be larger first instead.

Points to note:

If the initial capacity and load-factors are not provided, they default to 16 and 0.75 respectively. Please refer to part2 of this series for more info on these factors. If the initial capacity is reached, it is increased to the closest factor of 2. Best Case: O(1) Hashcode of all the keys are distinct, Worst case: O(n) before reaching the MIN_TREEIFY_CAPACITY and TREEIFY_THRESHOLD when lists are used. Worst case: O(logn) when list converted to tree. Unless there are programming mistakes or malicious code, hashcodes generally follow Poisson distribution, hence the probability of one bucket having more than 8 elements is extremely rare i.e 1 in 10 million. HashMap class uses Node as a subclass for holding key-value entries. Members of this class are: final int hash; final K key; V value; Node<K,V> next; The binary tree implementation is a Red-Black tree. A red-black tree is a sorted tree, in which search takes max log(n) operations. A linked list is converted to the Red-Black tree only if the list’s size exceeds the threshold (8) and array size is greater than the threshold (64). HashMap can store null key and null as values. By default, if a key is not present in the map and we try to retrieve it will return null as value. HashMap is not thread-safe for thread-safe implementation use ConcurrentHashMap. HashMap doesn’t store the elements in insertion order for maintaining insertion order one needs to use LinkedHashMap. And here ends the HashMap implementation, I highly recommend going through the source code which will allow you to see the implementation of this highly used data structure. One thing we can learn from this implementation is there are always trade-offs between performance and space as in this case using trees instead of linked-list gives us performance improvement over space trade-off. In the next part, we will look into various other map implementations like ConcurrentHashMap, LinkedHashMap till then, Ciao!!!

References: