D. Patrick Caldwell on Software Engineering: .Net GetHashcode Functions

Last week, I wrote a post about a .Net method to combine hashcodes. In the process of designing that method, I looked into dozens of .Net's GetHashcode implementations.

If you'd like to know the principles of hashcodes, take a look at the aforementioned article. This one is part of the dotPeek of the Week series so I'll just be sharing the insight I got from the framework's implementations here.

First, some really basic ones:

// from Int32
public override int GetHashCode()
{
  return this;
}

// from Int16
public override int GetHashCode()
{
  return (int) (ushort) this | (int) this << 16;
}

// from Int64
public override int GetHashCode()
{
  return (int) this ^ (int) (this >> 32);
}

The Int32 implementation easily meets the hashcode requirements by returning itself. The Int16 copies itself to the top half of the Int32 and returns that. The Int64 takes the bottom half of itself and XORs it with the top half.

This XOR is the first valuable piece of information. If you look at the logical functions, the XOR produces the best bit twiddling for hashing. Basically, the XOR produces a relatively even distribution of bits as opposed to the OR and the AND operations which will be biased 3 to 1 in one direction or the other.

Some more complicated implementations:

// from String
public override unsafe int GetHashCode()
{
  fixed (char* chPtr = this)
  {
    int num1 = 352654597;
    int num2 = num1;
    int* numPtr = (int*) chPtr;
    int length = this.Length;
    while (length > 0)
    {
      num1 = (num1 << 5) + num1 + (num1 >> 27) ^ *numPtr;
      if (length > 2)
      {
        num2 = (num2 << 5) + num2 + (num2 >> 27) ^ numPtr[1];
        numPtr += 2;
        length -= 4;
      }
      else
        break;
    }
    return num1 + num2 * 1566083941;
  }
}

// from Tuple<>
internal static int CombineHashCodes(int h1, int h2)
{
  return (h1 << 5) + h1 ^ h2;
}

These two methods have some really good information in them. The String.GetHashcode implementation has what's called a rolling hash. It loops through the characters, does a barrel shift by 5, and then XORs with the next character.

While I liked this, I preferred the simplicity of the Bernstein Hash. The main component of the Bernstein Hash is the (i << 5) + i. i << 5 == i * 32 (except that bit shifting is much faster than multiplying).

Thus, i << 5 + i == 32i + i == 33i. The Bernstein Hash just takes the current hash value, multiplies it by 33, and XORs in the new value.

I didn't use 33 in my hash function because I feel like using prime numbers is healthy. Thus, instead of adding I subtract so my hashcode method is (hash << 5) - hash ^ value (or 31 * hash ^ value). This, by the way, is the way Java tends to do it.

Here are some interesting ones from System.Drawing:

// from Size
public override int GetHashCode()
{
  return this.width ^ this.height;
}

// from Rectangle
public override int GetHashCode()
{
  return this.X ^ (this.Y << 13 | (int) ((uint) this.Y >> 19)) ^ (this.Width << 26 | (int) ((uint) this.Width >> 6)) ^ (this.Height << 7 | (int) ((uint) this.Height >> 25));
}

// from Point
public override int GetHashCode()
{
  return this.x ^ this.y;
}

D. Patrick Caldwell on Software Engineering

Thursday, August 25, 2011

.Net GetHashcode Functions

No comments:

Post a Comment