Diving Into Hashtables and Dictionaries
Most
programmers had to implement a Hashtable or a map at some point (usually at
school). Hopefully, they never do it again, but gain valuable insight about how
the collection works.
In
.Net, there are two methods that each object implements which are used with
hash tables. Those methods are GetHashCode() and Equals(). When you create your
own classes, you get the default behavior, which is to check for object
reference equality (to the C/C++ guys, that is pointer equality).
The
problem starts when you want to provide a different semantic for Equals. For
instance, if you have a range, and you want to say that if both items are the
same, they are equals. So you override Equals(), make the appropriate check,
and you are ready to go. [1]
First,
a Hashtable is composed of buckets, like the ones of the left. A bucket has key
and value, although in some cases, it has several key / value pairs.
Well,
the Hashtable calls to the key's GetHashCode() method, divide the resulting
value with the current capacity of the table. It then takes the remainder and
check if there is already a value in the bucket. If there is, it will double
its capacity and try again[2].
Usually if there is a collision (two different keys with the same hash code),
both of them will be put into the same bucket.
How
does it pull the values out, then? Well, it is actually a lot simpler than
that. You give the Hashtable a key and ask for a value. It calls the
GetHashCode() method on the key, get the remainder from division with the
capacity and then it knows in which bucket to check.
Remember,
a bucket may contain more than one key, and even if we limited ourselves to a
single key per bucket, it is entirely possible to have two keys with the same
hash code. Therefore, the Hashtable will call the Equals method on the two
objects. It will continue to do so on all the keys in the bucket and find one
where Equals() return true. If it can't find any key where Equals() is true, it
will return null (or throw).
Do
you see now why it is so important to implement Equals() and GetHashCode()
together? And why it is crucial that objects that are equal to one
another will return the same hash code?
If
objects that are equals to one another do not return the same hash code, the
Hashtable will look in a different bucket and will never find the correct key.
It
is also important to remember that you may not want to put your objects
as keys in a Hashtable, but there are plenty of others that will. Many
frameworks uses Hashtable to add additional data to an object, which they can
later access and act upon. Breaking the Equals()/GetHashCode() assumption will
cause nothing but trouble, and it is one of those bugs that are extremely hard
to find. Mostly because it is extremely non-obvious.
That
said, you don't need to start implementing Equals() and GetHashCode() on all
your objects, only on those where reference equality functionality is not
enough. In the example above, a range is equal if the start and end are equals,
and the hash code is simple adding the hash codes of the start and end.
Comments
Comment preview