The image files in Lemmings Revolution are somewhat misleadingly named ".bmp", since the files themselves are not actual Windows Bitmap files. They do, however, contain Windows Bitmap data behind a layer of compression.
The File Format
The file format is as follows:
File
=============================================================================
ChunkNum UInt8 The number of chunks in the file
Chunks Chunk[] The data chunks
=============================================================================
Chunk
=============================================================================
Header ChunkHeader Describes how the chunk is encoded
Data Byte[] The compressed input data
=============================================================================
ChunkHeader
=============================================================================
PackedSize UInt16 The total size of this chunk, including the header
UnpackedSize UInt16 The size of the data this chunk represents
(Unknown) UInt8 Value of unknown significance; usually (always?) 4
RawTree BinTree Binary tree for raw byte counts
LengthTree BinTree Binary tree for distance/length lengths
DistanceTree BinTree Binary tree for distance/length distances
=============================================================================
BinTree
=============================================================================
The lower 4 bits of the first byte indicates how many additional values need
to be read from the input. If the value is 0, no additional input is needed
as the binary tree will not be used. If it's greater than 0, then add 2 to it
and read that many more values.
The values themselves are in 4-bit increments starting with the high bits of
the byte that contained the count, followed by however many bytes are
necessary to encode the remaining 4-bit values. For each byte, the lower 4
bits are read first, followed by the higher 4 bits.
The last byte may have 4 bits of padding so that the next data begins on a
byte boundary.
=============================================================================
Since the compression algorithm's implementation was designed for 16-bit systems, chunks are limited to an unpacked size of 64 KB. For files larger than this, additional chunks are present and they are simply concatenated to form the decompressed file.
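The chunk layout above can be walked without decompressing anything, since PackedSize covers the entire chunk including its header. A minimal sketch in Python, assuming the UInt16 header fields are little-endian (the format description does not state the byte order, so that part is an assumption):

```python
import struct

def iter_chunks(data: bytes):
    """Yield (packed_size, unpacked_size, chunk_bytes) for each chunk.

    Assumes little-endian UInt16 header fields; the format notes do not
    state the byte order explicitly.
    """
    chunk_count = data[0]          # ChunkNum: number of chunks in the file
    pos = 1
    for _ in range(chunk_count):
        packed_size, unpacked_size = struct.unpack_from("<HH", data, pos)
        # PackedSize includes the header, so the whole chunk spans
        # data[pos : pos + packed_size].
        yield packed_size, unpacked_size, data[pos:pos + packed_size]
        pos += packed_size
```

Each chunk then decompresses to at most 64 KB, and the outputs are concatenated in order.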
Binary Trees
UPDATE: Apparently the means by which these trees are stored in the file is called canonical Huffman code.
Each chunk header can define up to 3 binary trees that allow numbers to be encoded more concisely in the data; the trees themselves are stored so that those concise numbers can be decoded. The first tree is used when reading raw byte counts, the second when reading distance/length lengths, and the third when reading distance/length distances. More on those later.
Note: For the uninitiated, you can extract the lower 4 bits of a value by performing a bitwise AND with the number 15.
The lower 4 bits of the first byte read for a binary tree indicates how many values are to be added to the tree, and those values describe how to construct that tree. These values are all 4 bits, and can be decoded with the following algorithm:
- Read a byte from the input into CURBYTE
- Copy the lower 4 bits of CURBYTE into COUNT
- If COUNT is 0
- This binary tree will not be used
- Otherwise
- Add 2 to COUNT
- Create an array NIBBLES that contains COUNT elements, with the first element being at index 0 in the list
- LOOP: Iterate X from 0 to COUNT - 1
- If X is even, divide CURBYTE by 16 (or shift CURBYTE right by 4)
- If X is odd, read the next byte from the input into CURBYTE
- Copy the lower 4 bits of CURBYTE into NIBBLES[X]
For example, the byte string 22 31 03 will yield 4 nibbles: 2, 1, 3, 3; the final 0 is just padding.
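The nibble-decoding steps above can be sketched in Python (a sketch only; the name read_nibbles is my own):

```python
def read_nibbles(data: bytes):
    """Decode the nibble list that describes one binary tree.

    Returns (nibbles, bytes_consumed); nibbles is empty when the tree
    will not be used.
    """
    cur = data[0]
    count = cur & 15              # lower 4 bits: value count, minus 2
    if count == 0:
        return [], 1              # this binary tree will not be used
    count += 2
    nibbles = []
    consumed = 1
    for x in range(count):
        if x % 2 == 0:
            cur >>= 4             # even index: use the high bits of the current byte
        else:
            cur = data[consumed]  # odd index: read the next byte
            consumed += 1
        nibbles.append(cur & 15)
    return nibbles, consumed
```

Running it on the example byte string 22 31 03 yields the nibbles 2, 1, 3, 3.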
The final structure of the binary tree is not known until all of the nibbles are processed. Note that any given node in a binary tree can either be a
terminal node with a value, or a
branching node that has two child nodes. These are the only two types of nodes that can exist in a binary tree, and the topmost node that connects everything else is called the
root node.
Initially, the root node is set to be a branching node and its child nodes are of unknown type (in regards to terminal or branching). The root node is at the top of the tree, meaning it has no "depth" into the tree. For this documentation, I'll refer to this as having a "depth level" of 0. The two child nodes of the root node have a depth level of 1. If they have child nodes, their children will be at depth level 2 and so-on.
The initial tree looks like this:
Level 0      root
            0/  \1
Level 1    ?      ?
To determine the significance of these nodes, the list of nibbles is searched in order to find one where the nibble
value is equal to the depth level of the node in question.
IMPORTANT: The order in which the nodes are considered is determined by the
preorder traversal algorithm, where "left" corresponds with the 0 side (not the 1 side).
Consider the list of nibbles from earlier:
Nibble Value: 2 1 3 3
Nibble Index: 0 1 2 3
In this case, the first node of unknown significance is the root node's 0 child (according to preorder traversal), which is in depth level 1. Therefore, a nibble with a
value of 1 is sought. There is one, so the nibble's
index becomes the
node's value--in this case also a 1--and the nibble is processed in such a way so that it will not be considered in future searches: it is effectively "crossed out."
At this point, the tree looks like this:
Level 0      root
            0/  \1
Level 1   (1)     ?

Nibble Value: 2 x 3 3
Nibble Index: 0 1 2 3
Note that the value of the nibble at index 1 has been crossed out.
When the list of nibbles is searched again for another nibble with a
value of 1 (for the current depth level), none can be found because the only remaining nibbles contain 2, 3 and 3. In this situation, the node becomes a branching node and that node's 0 child is the next to be considered (according to the preorder traversal algorithm):
Level 0      root
            0/  \1
Level 1   (1)     @
                0/  \1
Level 2        ?      ?

Nibble Value: 2 x 3 3
Nibble Index: 0 1 2 3
The remainder of the nibbles and nodes are processed in the same manner--again, according to preorder traversal--and the remaining steps look like this:
=Step 3=
Level 0      root
            0/  \1
Level 1   (1)     @
                0/  \1
Level 2        (0)    ?

Nibble Value: x x 3 3
Nibble Index: 0 1 2 3

=Step 4=
Level 0      root
            0/  \1
Level 1   (1)     @
                0/  \1
Level 2        (0)    @
                    0/  \1
Level 3            ?      ?

Nibble Value: x x 3 3
Nibble Index: 0 1 2 3

=Step 5=
Level 0      root
            0/  \1
Level 1   (1)     @
                0/  \1
Level 2        (0)    @
                    0/  \1
Level 3            (2)    ?

Nibble Value: x x x 3
Nibble Index: 0 1 2 3

=Step 6=
Level 0      root
            0/  \1
Level 1   (1)     @
                0/  \1
Level 2        (0)    @
                    0/  \1
Level 3            (2)   (3)

Nibble Value: x x x x
Nibble Index: 0 1 2 3
Note: Nibbles need not define any values for a particular depth level. In fact, in most situations, level 1 will not contain any definitions and thus both of the child nodes of the root node themselves become branching nodes. It's important to stick to the preorder traversal algorithm and attempt to use the nibbles to assign values at each node, one step at a time.
Note: If a given nibble index doesn't need to appear in the binary tree as a node value, the nibble value at that index can be 0. Since a nibble specifying depth level 0 will never be searched for, this will effectively force that index to be skipped. For the same reason, nibbles can be "crossed out" by setting their values to 0.
Note: Data errors can easily make the tree impossible to construct. If, when searching, all of the nibbles have a value less than the current depth level, the process should be aborted before an endless loop and a memory leak occur.
Note: The process ordinarily terminates when there are no further nodes of unknown significance. In the event there are still remaining, "unused" nibbles, they are simply ignored.
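Putting these rules together, the whole construction can be sketched as follows (a sketch; representing branching nodes as tuples and terminal nodes as plain integers is my own choice):

```python
def build_tree(nibbles):
    """Build the binary tree from its nibble list.

    Nodes are visited in preorder, 0 side first. A terminal node is
    represented as an integer (the nibble's index); a branching node as
    a (zero_child, one_child) tuple.
    """
    remaining = list(nibbles)

    def build(depth):
        # Search the nibbles in index order for one matching this depth.
        for i, v in enumerate(remaining):
            if v == depth:
                remaining[i] = None   # "cross out" the nibble
                return i              # terminal node: its value is the index
        # No nibble matches this depth: the node becomes a branching node.
        # Guard against corrupt data that would never terminate.
        if all(v is None or v < depth for v in remaining):
            raise ValueError("cannot construct tree from these nibbles")
        return (build(depth + 1), build(depth + 1))

    # The root is always a branching node; its children sit at depth 1.
    return (build(1), build(1))
```

For the example nibbles 2, 1, 3, 3 this produces the tree reached in Step 6: value 1 at depth 1, value 0 at depth 2, and values 2 and 3 at depth 3.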
The Compression Algorithm
The compression algorithm itself is a type of ROLZ that, in addition to encoding repeating strings of bytes as distance/length references (the technique used by LZ77), is able to reduce the size of literal numbers in the data by using a binary tree, presumably constructed with the Huffman technique. This particular implementation, however, is quite restrictive and will lose out to conventional compression algorithms for larger files. For the small data in Lemmings Revolution, however, this variant of ROLZ will generally out-perform DEFLATE, the algorithm used in ZIP files, in total compression ratio.
The general decompression algorithm looks like this:
- Initialize the decoder to the RAW state
- LOOP: Begin decoding until all output data is processed or an error occurs
- If the current decoder state is RAW
- Read a number from the input
- Copy that number of bytes (8-bit values) from the input to the output
- If the current decoder state is COPY
- Read a number from the input for LENGTH and add 2
- Read a number from the input for DISTANCE and add 1
- From the current position in the output buffer, start reading from DISTANCE bytes earlier and copy LENGTH bytes to the end of the output
- Alternate decoder states: if it's RAW, switch to COPY and vice-versa
Note: It's totally valid for LENGTH to be greater than DISTANCE. All that means is that it will copy bytes that have already been copied during the current operation. When repeating a single byte many times, DISTANCE can just be 1 and LENGTH can be however many copies you want.
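The overlapping-copy behaviour described in that note falls out naturally from copying one byte at a time, since freshly written bytes get re-read. A sketch (the name lz_copy is my own):

```python
def lz_copy(output: bytearray, distance: int, length: int):
    """Copy LENGTH bytes starting DISTANCE bytes back in the output.

    Copying byte-by-byte makes overlapping copies (LENGTH > DISTANCE)
    work: bytes written earlier in this same operation are re-read.
    """
    start = len(output) - distance
    for i in range(length):
        output.append(output[start + i])
```

With DISTANCE 1, this repeats the last byte LENGTH times, exactly as the note describes.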
Reading Bits
For the purpose of reading bits from the input data, the first bit is the highest bit (2^7) of the first byte of the input data. The second bit is 2^6 and so-forth. The ninth bit will be the highest bit of the second byte, and this continues through the input data. For example:
- Input data (as hex): 9A 36 E3
- Input data (as bits): 10011010 00110110 11100011
- To read 3 bits will return 100
- To read another 4 bits will return 1101
- To read another 10 bits will return 0001101101
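This MSB-first bit stream can be sketched as a small reader class (an illustration; the class name and interface are my own):

```python
class BitReader:
    """Read bits MSB-first from a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position in the stream

    def read(self, count: int) -> int:
        """Return the next `count` bits as an integer, MSB first."""
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value
```

Replaying the example: reading 3, then 4, then 10 bits from 9A 36 E3 yields binary 100, 1101, and 0001101101.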
Reading Numbers
Reading a number from the input, as the algorithm above indicates, follows a particular convention. Numbers can be read in one of two ways, depending on whether or not there is a binary tree present for the purpose of the number being read (raw, length or distance). This is determined by the number of nibbles read for the tree: if the number was 0, then the tree is not present.
If no tree is present for the current purpose, then numbers are read entirely from the input and conform to a "normal" format.
If a tree
is present for the current purpose, then the contents of that tree will dictate what bits are read from the input and what the value of the number will be.
The Normal Number Format
The Normal format for numbers is as follows:
- Initialize COUNT to 0
- LOOP: Begin reading bits from the input until the bit read is a 0
- If the bit is a 1, increment COUNT
- Read an additional COUNT bits from the input into BITS
- The value of the number is 2 ^ COUNT - 1 + BITS
Note: If COUNT reaches 16 (perhaps 17?), then the data contains an error and the loop should be aborted. This isn't a violation of the format, per se, but it will cause the 16-bit Lemmings Revolution implementation to overflow.
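The Normal format steps can be sketched directly (a sketch; read_bits(n) is an assumed interface returning the next n bits MSB-first, and bits_from is a tiny helper for feeding it):

```python
def read_number_normal(read_bits) -> int:
    """Read a number in the 'normal' format: a unary prefix gives COUNT,
    then COUNT extra bits are read.

    `read_bits(n)` is an assumed interface returning the next n bits
    MSB-first as an integer.
    """
    count = 0
    while read_bits(1) == 1:
        count += 1
        if count >= 16:
            raise ValueError("corrupt data: unary prefix too long")
    extra = read_bits(count)
    return (1 << count) - 1 + extra  # 2 ^ COUNT - 1 + BITS

def bits_from(seq):
    """Helper for illustration: serve bits from a list of 0s and 1s."""
    it = iter(seq)
    def read_bits(n):
        value = 0
        for _ in range(n):
            value = (value << 1) | next(it)
        return value
    return read_bits
```

For example, the bit sequence 0 decodes to 0, the sequence 101 decodes to 2, and 11011 decodes to 6.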
The Tree Number Format
The Tree format for numbers is as follows:
- Initialize NODE to the root node of the appropriate binary tree
- LOOP: Read 1 bit at a time from the input until NODE is a terminal node
- If the bit is a 0, set NODE to its 0 child node
- If the bit is a 1, set NODE to its 1 child node
- Copy the value of NODE into PARAM
- If PARAM is less than 2
- The value of the number is PARAM
- Otherwise
- Subtract 1 from PARAM
- Read PARAM bits from the input into BITS
- The value of the number is 2 ^ PARAM + BITS
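The Tree format steps can be sketched the same way (a sketch; the tree representation as nested (zero, one) tuples with integer terminal values, the read_bits(n) interface, and the bits_from helper are my own choices for illustration):

```python
def read_number_tree(node, read_bits) -> int:
    """Read a number in the 'tree' format.

    `node` is the root of the appropriate binary tree: branching nodes
    are (zero_child, one_child) tuples, terminal nodes are integers.
    `read_bits(n)` returns the next n bits MSB-first as an integer.
    """
    while isinstance(node, tuple):    # descend until a terminal node
        node = node[read_bits(1)]
    param = node
    if param < 2:
        return param                  # small values are used directly
    param -= 1
    extra = read_bits(param)
    return (1 << param) + extra       # 2 ^ PARAM + BITS

def bits_from(seq):
    """Helper for illustration: serve bits from a list of 0s and 1s."""
    it = iter(seq)
    def read_bits(n):
        value = 0
        for _ in range(n):
            value = (value << 1) | next(it)
        return value
    return read_bits
```

Using the tree built from the example nibbles 2, 1, 3, 3 (value 1 at depth 1, value 0 at depth 2, values 2 and 3 at depth 3): the bit 0 decodes to 1, the bits 10 decode to 0, and 1101 decodes to 3.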