I needed another way so I extended BinaryWriter (mostly to get it to serialize the types I needed it to) and now my 60 bytes take 68 bytes to serialize. In the process of writing this class, I looked through the disassembled BinaryWriter using JetBrains's dotPeek and found this little gem (Write7BitEncodedInt(int)) and decided it'd make a great dotPeek of the Week:
protected void Write7BitEncodedInt(int value) { uint num = (uint) value; while (num >= 128U) { this.Write((byte) (num | 128U)); num >>= 7; } this.Write((byte) num); }This is how the BinaryWriter.Write(string) encodes the length of the string. Thus, if you have a string with fewer than 128 characters, it only takes one byte to encode the length. In my implementation, I used this to specify the length of collections too. In fact, when writing a 32 bit signed integer, you should be able to save space for all positive numbers less than 2 ^ 22 and break even (require four bytes) for 2 ^ 29. Encoding an Int32 that is greater than or equal to 2 ^29 will require five bytes. Thus, if your integers tend to be smaller than 536,870,912, you'll probably save space encoding this way. A similar function could be used for a long where all positive values less than 2 ^ 57 will result in at least breaking even.
Here's how it works:
- Convert the number to an unsigned int so you can do arithmetic on positive numbers (after all, you're just writing bits)
-
While the converted value is greater than or equal to 128 (i.e., 8 bits)
- Write low 7 bits and put a 1 in the high bit (to indicate to the decoder more bytes are coming)
- Shift the 7 bits you just wrote off the number
- When the loop finishes, there will be 7 or fewer bits to write so write them
protected internal int Read7BitEncodedInt() { // some names have been changed to protect the readability int returnValue = 0; int bitIndex = 0; while (bitIndex != 35) { byte currentByte = this.ReadByte(); returnValue |= ((int) currentByte & (int) sbyte.MaxValue) << bitIndex; bitIndex += 7; if (((int) currentByte & 128) == 0) return returnValue; } throw new FormatException(Environment.GetResourceString("Format_Bad7BitInt32")); }Here's how this works:
- Set up an int to accumulate your data as you read them
- Set up a place to keep track of which 7-bit block you're reading
- While your bit index is less than 35 (i.e., you've read no more than 5 bytes comprising 1 "more bytes" indicator and 7 data bits)
- Read a byte
- Take the byte you just read and logical conjuction (bitwise and) it with sbyte.MaxValue (127 or in binary 0111 1111)
- Left shift those 7 bits to the position in which they belong
- Use a logical disjunction (bitwise or) to write those seven bits into your accumulator int
- Add 7 to your bit index for the next 7 bits you read
- If the current byte you read does not have a 1 in its most significant bit (i.e, byte & 1000 0000 == 0)
- There are no more bytes to read so just return the current accumulator value
- If you get to this point, you've read a 6th of 5 bytes and it's time to let the user know there was a formatting problem
I am building a native win32 program that communicates with a .net application that reads data with a BinaryReader. The sample code here was exactly what I needed to immediately be able to send strings to the .net server. Thanks a lot.
ReplyDelete