Thursday, August 18, 2011

.Net Method to Combine Hash Codes

XOR Venn DiagramA while back I developed a helper method that I've been using to aid me in computing good hash values from sets of properties for an object. I was writing a dotPeek of the Week entry about .Net GetHashcode implementations and I decided to spruce up my implementation and post it on my blog.

I know It's pretty uncommon that you need to override the Equals method, but when you do, you have to take particular care in considering overriding the GetHashCode method as well. To make this a little easier, I developed a method I call CombineHashCodes().

Using CombineHashCodes, I can easily compute a well randomized composite hash code based on several parameters from an object. This curtails a lot of the work involved when dealing with objects that have complicated .Equals overrides.

Microsoft's documentation on Object.GetHashCode() lists the following rules:
A hash function must have the following properties:

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

For the best performance, a hash function must generate a random distribution for all input.

Thus, if you override the .Equals method such that it's no longer doing the default reference equality, you probably need to change your GetHashCode implementation. I've found this is usually a very simple task. I whipped up a quick example in LINQPad:
void Main()
{
 var lauren1 = new Person { FirstName = "Lauren" };
 var lauren2 = new Person { FirstName = "Lauren" };
 
 Console.WriteLine(lauren1.Equals(lauren2));
 Console.WriteLine("{0}.GetHashCode() = {1}", "lauren1", lauren1.GetHashCode());
 Console.WriteLine("{0}.GetHashCode() = {1}", "lauren2", lauren2.GetHashCode());
 
 if (lauren1.Equals(lauren2) && lauren1.GetHashCode() != lauren2.GetHashCode())
  Console.WriteLine("This is bad.  This is very bad.");
}

public class Person
{
   public string FirstName { get; set; }
}

/*
False
lauren1.GetHashCode() = 47858386
lauren2.GetHashCode() = 20006478
*/

As one would expect, lauren1 does not equal lauren2 and neither do the hashcodes; however, if we were to override the .Equals method on the Person object to compare just the first name, we'll find that lauren1 and lauren2 will be equal but will have different hashcodes! This violates the first rule of GetHashCode Club. Here's an example:
public class Person
{
   public string FirstName { get; set; }
  
   public override bool Equals (object obj)
   {
  var otherPerson = obj as Person;
  
  if (otherPerson == null)
   return false;
   
  return String.Compare(this.FirstName, otherPerson.FirstName, StringComparison.OrdinalIgnoreCase) == 0;
 }
}

/*
True
lauren1.GetHashCode() = 33193253
lauren2.GetHashCode() = 37386806
This is bad.  This is very bad.
*/

That's why we override GetHashCode as well and something like this will work:
public override int GetHashCode()
{
 return FirstName.GetHashCode();
}

/*
True
lauren1.GetHashCode() = 2962686
lauren2.GetHashCode() = 2962686
*/

That's all well and good (except the .Equals() does a case insensitive compare so "Lauren" and "lauren" will be equal but will result in different hash codes unless you do something like return FirstName.ToLower().GetHashCode(). Why'd I leave that error in the post just to parenthetically correct it . . . because I think it's a good demonstration of how easy it is to screw this up :)). Consider what happens when you end up with a more complex class that looks like this:
public class Person
{
   public string FirstName { get; set; }
 public string LastName { get; set; }
 public DateTime BirthDate { get; set; }
  
   public override bool Equals (object obj)
   {
  var otherPerson = obj as Person;
  
  if (otherPerson == null)
   return false;
   
  return String.Compare(FirstName, otherPerson.FirstName, StringComparison.OrdinalIgnoreCase) == 0
   && String.Compare(LastName, otherPerson.LastName, StringComparison.OrdinalIgnoreCase) == 0
   && BirthDate.Equals(otherPerson.BirthDate);
 }
}

You could leave the GetHashCode function the way it was previously, but then all Lauren's will be lumped in the same bucket together which violates the third rule about a random distribution. I've seen people combat this problem by concatenating the string values of the equals fields delimited by some extremely unlikely string and getting the hash code of that.

I felt like that was not really the best way to do it in case someone decides a colon is a good character in a first name or something worse.

Thus, I wrote something like this (which has recently been modified to what you see now based on my explorations for my dotPeek of the Week series). This is CombineHashCodes:
public static class HashCodeHelper
{
 public static int CombineHashCodes(params object[] args)
 {
  return CombineHashCodes(EqualityComparer<object>.Default, args);
 }

 public static int CombineHashCodes(IEqualityComparer comparer, params object[] args)
 {
  if (args == null) throw new ArgumentNullException("args");
  if (args.Length == 0) throw new ArgumentException("args");

  int hashcode = 0;

  unchecked
  {
   foreach (var arg in args)
    hashcode = (hashcode << 5) - hashcode ^ comparer.GetHashCode(arg);
  }

  return hashcode;
 }
}

With this method, you can get a nice pseudo-random distribution of hash codes without violating any of the GetHashCode rules as long as you include the values relevant to your Equals method in your call to CombineHashCodes:
public override int GetHashCode()
{
 return HashCodeHelper.CombineHashCodes(FirstName.ToLower(), LastName.ToLower(), BirthDate);
}

/*
True
lauren1.GetHashCode() = -891990792
lauren2.GetHashCode() = -891990792
*/

1 comment:

  1. Just what I was looking for. Thanks a lot for this post!

    ReplyDelete