Diving Into Hashtables and Dictionaries

time to read 4 min | 758 words

 

Most programmers had to implement a Hashtable or a map at some point (usually at school). Hopefully, they never do it again, but gain valuable insight about how the collection works.

In .Net, there are two methods that each object implements which are used with hash tables. Those methods are GetHashCode() and Equals(). When you create your own classes, you get the default behavior, which is to check for object reference equality (to the C/C++ guys, that is pointer equality).

The problem starts when you want to provide a different semantic for  Equals. For instance, if you have a range, and you want to say that if both items are the same, they are equals. So you override Equals(), make the appropriate check, and you are ready to go. [1]

First, a Hashtable is composed of buckets, like the ones of the left. A bucket has key and value, although in some cases, it has several key / value pairs.

Well, the Hashtable calls to the key's GetHashCode() method, divide the resulting value with the current capacity of the table. It then takes the remainder and check if there is already a value in the bucket.  If there is, it will double its capacity and try again[2].  Usually if there is a collision (two different keys with the same hash code), both of them will be put into the same bucket.

How does it pull the values out, then? Well, it is actually a lot simpler than that. You give the Hashtable a key and ask for a value. It calls the GetHashCode() method on the key, get the remainder from division with the capacity and then it knows in which bucket to check.

Remember, a bucket may contain more than one key, and even if we limited ourselves to a single key per bucket, it is entirely possible to have two keys with the same hash code. Therefore, the Hashtable will call the Equals method on the two objects. It will continue to do so on all the keys in the bucket  and find one where Equals() return true. If it can't find any key where Equals() is true, it will return null (or throw).

Do you see now why it is so important to implement Equals() and GetHashCode() together? And why it is crucial that objects that are equal to one another will return the same hash code?

If objects that are equals to one another do not return the same hash code, the Hashtable will look in a different bucket and will never find the correct key.

It is also important to remember that you may not want to put your objects as keys in a Hashtable, but there are plenty of others that will. Many frameworks uses Hashtable to add additional data to an object, which they can later access and act upon. Breaking the Equals()/GetHashCode() assumption will cause nothing but trouble, and it is one of those bugs that are extremely hard to find. Mostly because it is extremely non-obvious.

That said, you don't need to start implementing Equals() and GetHashCode() on all your objects, only on those where reference equality functionality is not enough. In the example above, a range is equal if the start and end are equals, and the hash code is simple adding the hash codes of the start and end.

 

 

 



[1] Except that annoying compiler warning, asking you to implement GetHashCode() as well.

[2] There are a lot of things I gloss over here. I'm not describing how to implement a Hashtable, but a high level overview of how it works.