Prashant Shubham

Prashant Shubham

Computers are all I want to know about

12 Oct 2019

Hashtable in Java

heimdall.lan

heimdall.lan [HashPotato]

Let’s take an instance, you are asked to keep track of list of students in your school along with their phone number. What is the first approach that comes to your mind? Hmmm…

Let’s take a look at few approaches:

  1. We’ll go to each student and ask their phone number and keep writing them in our ledger as we come across them.

This was easy if we have 10 students, when we want to look it up we just check in the ledger for a said student and we get the number. But what if we get 100 students, it will get tough for us to look up his or her number as we wrote down the numbers in random order.

  1. Okay, let’s think of another approach, this time we learnt from our mistake and now we will write them in sorted order so that we can track them easily, as students with name starting with A will be on one page and B will be together on one page and so on…

This makes it easy for us to search for names now as when asked for Alex’s phone number we know it will reside on 1st page. But what if there are too many people with their initials with A, there will again be too much entries on one page and we will have to again look manually traversing the whole page. 🙄

Of course, we can go ahead and sort the page with A initials but that again is time consuming. If we need to the same task via computer programs we need some data structure like map to store these Keys(Student name) and Values(Phone number). These pages in our ledger earlier will correspond to the index in map.

In Java there is no class named Map like in other languages like C++ where map is part of STL and implemented as class, Map is provided as an interface in Java which will be implemented by the classes. List of classes implementing Map interface can be found here. Here we will be talking about one of these classes i.e HashTable.

As per Wikipedia,

hash table is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found.

                   # bucket content/linked list
[bucket] --> ["KEY" | "VALUE" | (HASH_VALUE) --> (Pointer to next linked list in case of collision)] --> [Next entry]

The values for a given key can be stored within bucket/index address or it can be stored with its own data structure, while bucket simply points to the data structure as in above case.

So in summary Hashtable is implementation of map interface in java and is used to store key, value pairs.

  1. How do we create a Hashtable?
// Creating hashtable
Hashtable<String, String> phoneRecord = new Hashtable<String, String>();
phoneRecord.put("Alex", "0123456789"); // Inserting Alex's phone number in table

When we create the Hashtable by default its created with a default initial capacity (11) and load factor (0.75). However, Java does provide us with constructors using which we can specify the above values.

Hashtable(int initialCapacity) // default load factor(0.75)
Hashtable(int initialCapacity, float loadFactor)

As name suggest initialCapacity refers to the size of Hashtable at creation. We will discuss about load factor later in this article.

  1. How do we retrieve values from Hashtable?
// Retrieving the number
String phoneNumber = phoneRecord.get("Alex");
if (phoneNumber != null) {
    System.out.println("Alex's number = " + phoneNumber);
}

Here, you will have observed we are checking for the null value for phone number after retrieving it. Because it might be we haven’t inserted Alex’s phone number in the record yet so when we retrieve it we might get null i.e nothing.

Since we have dealt with the basic functionalities let’s look into the implementation details and facts about HashTable.

For each element we insert we calculate its unique address, its index in the array and we put it to that “address”. Then once we need to retrieve it, we again calculate its index and just return it with a single operation (something like return array[index]). The process of calculating element’s unique address is called hashing and the algorithm how it is done is called a hash function.

Now while calculating hash code followed by it’s index address for a given key can result in same value. This is the reason a bad hash function can lead to a slower lookup time. HashTable has O(1) time complexity, but key search can be a O(n) procedure, where n depends on the number of hash collisions. In case of collision hash table deal with them in one of the following ways:

  1. By having each index address contain a linked list of elements that are hashed to that index address. This is also known as chaining. Now chaining can also be of multiple types like: Separate chaining with linked lists, Separate chaining with list head cells etc.
  2. Another method is Linear probing, technique works on the concept of incrementing until you find an empty index address and place the value at that address.
  3. Double hashing technique where we use two different hashing functions h1(key) and h2(key). If the index address at h1(key) is occupied then the second hashing function h2(key) is used to increment the index and find next available.

Both Linear probing and Double hashing methods belong to open addressing technique of collision resolution. There are methods as well like Coalesced hashing, Robin Hood hashing etc.

Remember earlier we talked about initial capacity and load factor values while creating a Hashtable. What happens if the Hashtable starts getting filled with keys and values here loadfactor comes into picture. The default load factor which by default is 0.75 depicts that when the Hashtable 75% indexes get filled it automatically grows (usually, double) the size of the table, thus forcing to rehash all entries. This rehashing is also known as Dynamic resizing.

As part of dynamic resizing, there are multiple techniques like: Resizing by copying all entries, Incremental resizing in which we allocate new hash table, but keep the old table unchanged and then in each lookup or insert operation we do move from old table to new table.

Now that we have looked into Hashtable implementation, let’s list some of the quirks:

  1. Hashtable is one of the oldest implementations in Java along with array, vector, this class was retrofitted to implement the Map interface, making it a member of the Java Collections Framework.
  2. Hashtable can’t have null values as key.
  3. Hashtable is synchronized thus making it thread safe, it is advised that if a thread-safe is not required HashMap shall be used which we will talk about in later article.
  4. Hashtable guarantees that the order of the map will remain constant over time.
  5. Hashtable search complexity can be O(1) average case and O(n) in worst case as described above.
  6. If the Hashtable uses dynamic resizing, an insertion or deletion operation may occasionally take time proportional to the number of entries.
  7. Hash tables in general exhibit poor locality of reference, data is stored randomly in memory which can cause misses, arrays provide a better option in cases where we need to the linear search if table is relatively small.

Now, since we have discussed a lot about Hashtable there is one thing to note here that it’s regarded as legacy code in Java and java docs itself suggests to use either HashMap or ConcurrentHashMap(in case synchronization is needed).

You can ask why did we read Hashtable then, its cause it resembles quite a lot with HashMap which we will be seeing in next article. There are many things which we will looking in next article along with HashMap like hashcode() and equals() contract, differences between HashMap and Hashtable. Stay tuned…

Resources: