At work this week, I had the need to serialize objects and encrypt them while trying to keep the smallest data footprint I could. I figured the easiest thing to do would be to binary serialize them with the
BinaryFormatter. It was, indeed, the easiest thing to do; however, the BinaryFormatter seems to come with a fair amount of overhead. By the time the BinaryFormatter was finished listing all of the necessary assembly-qualified type names, the 60 bytes of data I wanted to preserve were more than a kilobyte!
I needed another way so I extended
BinaryWriter (mostly to get it to serialize the types I needed it to) and now my 60 bytes take 68 bytes to serialize. In the process of writing this class, I looked through the disassembled BinaryWriter using
JetBrains's dotPeek and found this little gem (
Write7BitEncodedInt(int)) and decided it'd make a great
dotPeek of the Week:
protected void Write7BitEncodedInt(int value)
{
uint num = (uint) value;
while (num >= 128U)
{
this.Write((byte) (num | 128U));
num >>= 7;
}
this.Write((byte) num);
}
This is how the
BinaryWriter.Write(string) encodes the length of the string. Thus, if you have a string with fewer than 128 characters, it only takes one byte to encode the length. In my implementation, I used this to specify the length of collections too. In fact, when writing a 32 bit signed integer, you should be able to save space for all positive numbers less than 2 ^ 22 and break even (require four bytes) for 2 ^ 29. Encoding an Int32 that is greater than or equal to 2 ^29 will require five bytes. Thus, if your integers tend to be smaller than 536,870,912, you'll probably save space encoding this way. A similar function could be used for a long where all positive values less than 2 ^ 57 will result in at least breaking even.
Here's how it works:
- Convert the number to an unsigned int so you can do arithmetic on positive numbers (after all, you're just writing bits)
-
While the converted value is greater than or equal to 128 (i.e., 8 bits)
- Write low 7 bits and put a 1 in the high bit (to indicate to the decoder more bytes are coming)
- Shift the 7 bits you just wrote off the number
- When the loop finishes, there will be 7 or fewer bits to write so write them
I think this is really clever and very easy. Reading the data back is a smidgen more complicated (
Read7BitEncodedInt()):
protected internal int Read7BitEncodedInt()
{
// some names have been changed to protect the readability
int returnValue = 0;
int bitIndex = 0;
while (bitIndex != 35)
{
byte currentByte = this.ReadByte();
returnValue |= ((int) currentByte & (int) sbyte.MaxValue) << bitIndex;
bitIndex += 7;
if (((int) currentByte & 128) == 0)
return returnValue;
}
throw new FormatException(Environment.GetResourceString("Format_Bad7BitInt32"));
}
Here's how this works:
- Set up an int to accumulate your data as you read them
- Set up a place to keep track of which 7-bit block you're reading
- While your bit index is less than 35 (i.e., you've read no more than 5 bytes comprising 1 "more bytes" indicator and 7 data bits)
- Read a byte
- Take the byte you just read and logical conjuction (bitwise and) it with sbyte.MaxValue (127 or in binary 0111 1111)
- Left shift those 7 bits to the position in which they belong
- Use a logical disjunction (bitwise or) to write those seven bits into your accumulator int
- Add 7 to your bit index for the next 7 bits you read
- If the current byte you read does not have a 1 in its most significant bit (i.e, byte & 1000 0000 == 0)
- There are no more bytes to read so just return the current accumulator value
- If you get to this point, you've read a 6th of 5 bytes and it's time to let the user know there was a formatting problem