Like we did for Revolution, I'm starting a thread to document the Lemmings 2 data files and their contents. I am aware that there is existing documentation out there, but I don't feel that it's quite verbose enough to make full sense of what the files truly represent, so I'm starting from scratch. However, I would like to thank geoo, Mindless and any other contributors ahead of time, as I am using those documents for reference.
__________
Compression Format
Integer data types are little-endian.
File
===============================================================================
Identifier String*4 ASCII "GSCM"
UnpackedSize UInt32 Total size of decompressed data
Chunks Chunk[] Compressed data chunks
===============================================================================
Chunk
===============================================================================
LastChunkFlag Byte Last chunk = 0xFF; 0x00 otherwise
SymbolCount UInt16 Number of symbol definitions
SymbolList Byte[Count] Destination symbol names
SymbolValuesA Byte[Count] First byte of symbol
SymbolValuesB Byte[Count] Second byte of symbol
DataSize UInt16 Number of bytes in encoded data
Data Byte[Size] Encoded data bytes
===============================================================================
At the beginning of each chunk, a 256-symbol dictionary is initialized.
Dictionary entries are indexed 0x00 through 0xFF, and each symbol is
initialized to the single byte value that matches its index. The symbol
definitions are then modified by concatenating symbols as specified by the
lists in the chunk header.
For each entry in SymbolList, the symbol named by that entry is redefined as
the concatenation of the two symbols named by the corresponding entries in
SymbolValuesA and SymbolValuesB. The three lists are parallel: they are all
the same length, and the item at position X in one list is used with the
items at position X in the other two lists.
The following specifies the algorithm for each item X in the lists, from 0 to
SymbolCount - 1. Symbols must be redefined in that order. Let "+" represent a
byte-wise concatenation operator:
Symbol[List[X]] = Symbol[ValuesA[X]] + Symbol[ValuesB[X]]
When bytes are processed from Data, they represent the indexes of symbols in
the dictionary. The symbols are copied wholesale, in the order they are
specified in Data, to the output. This takes place after the symbols are
redefined using the symbol lists.
Chunks will continue to be processed until a chunk with a non-zero
LastChunkFlag has finished processing.
In the event data is not compressed, it will not begin with the "GSCM"
identifier. Use the data as-is in this case. Both compressed and uncompressed
data can be used in most contexts, so they are processed according to whether
that "GSCM" identifier exists or not.
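To make the steps above concrete, here is a minimal Python sketch of the decompression as I've described it. The function name decompress is just something I made up for illustration, and the sketch skips over UnpackedSize without checking it, which is a choice on my part and not something the format requires.

```python
import struct

def decompress(blob: bytes) -> bytes:
    """Sketch of the GSCM decompression described above (hypothetical helper)."""
    # Data without the "GSCM" identifier is not compressed: use it as-is.
    if blob[:4] != b"GSCM":
        return bytes(blob)

    pos = 8  # skip Identifier (4 bytes) and UnpackedSize (UInt32)
    out = bytearray()
    while True:
        # LastChunkFlag (Byte) followed by SymbolCount (UInt16), little-endian.
        last_flag, count = struct.unpack_from("<BH", blob, pos)
        pos += 3
        names = blob[pos:pos + count]; pos += count      # SymbolList
        values_a = blob[pos:pos + count]; pos += count   # SymbolValuesA
        values_b = blob[pos:pos + count]; pos += count   # SymbolValuesB
        (size,) = struct.unpack_from("<H", blob, pos)    # DataSize
        pos += 2
        data = blob[pos:pos + size]; pos += size         # Data

        # Fresh dictionary for every chunk: symbol N starts out as the byte N.
        symbols = [bytes([i]) for i in range(256)]
        # Redefine symbols strictly in list order, since later entries may
        # refer to symbols redefined by earlier ones.
        for x in range(count):
            symbols[names[x]] = symbols[values_a[x]] + symbols[values_b[x]]

        # Each data byte is a dictionary index; copy its symbol to the output.
        for index in data:
            out += symbols[index]

        if last_flag != 0:
            break
    return bytes(out)
```

Comparing len() of the result against UnpackedSize would be an easy sanity check to bolt on if you want one.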
Hermann Schinagl made a LEMZIP utility for converting files to and from this compression format. It can be found at the following page:
http://www.camanis.net.ipv4.sixxs.org/lemmings/tools.php
__________
Data File
Many of the data files in the game are packed into a structured archive format, where the file is broken up into sections. This includes the .dat files for graphics and levels.
Integer data types are little-endian.
File
===============================================================================
Identifier String*4 ASCII "FORM"
DataSize UInt32 Size of the remainder of the file
DataType String*4 ASCII identifier for the data file type
Section Section[] File sections containing additional data
===============================================================================
Section
===============================================================================
Identifier String*4 ASCII identifier for the current section
DataSize UInt32 Size of the remainder of the section
Data Byte[Size] Data specific to the type of section
===============================================================================
The number of sections in the file depends on the respective sizes of existing
sections as well as the total data size reported in the file header.
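As a rough illustration of walking the sections, here is a Python sketch. The name parse_form is hypothetical. It assumes that the file-level DataSize is counted from the byte right after that field (so it includes DataType), and that sections follow one another with no padding in between. Neither assumption is spelled out in the tables above, so treat this as a starting point only.

```python
import struct

def parse_form(blob: bytes):
    """Sketch: split a "FORM" data file into (identifier, data) sections."""
    ident, size, data_type = struct.unpack_from("<4sI4s", blob, 0)
    if ident != b"FORM":
        raise ValueError("not a FORM data file")

    # Assumption: DataSize covers everything after the DataSize field itself.
    end = min(8 + size, len(blob))
    pos = 12  # first section starts right after DataType
    sections = []
    # Keep reading sections until the reported data size is used up.
    while pos + 8 <= end:
        sec_id, sec_size = struct.unpack_from("<4sI", blob, pos)
        pos += 8
        sections.append((sec_id.decode("ascii"), blob[pos:pos + sec_size]))
        pos += sec_size  # assumption: no padding between sections
    return data_type.decode("ascii"), sections
```

The loop condition is what the paragraph above means by the section count depending on the sizes: there is no explicit count, so you read sections until you run out of bytes.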
__________
Graphics Representation
For 256-color graphics, palettes are specified and pixels are plotted on the screen. Pixel information is specified in the "styles" data files as well as the full-screen background images. These images specify a particular pixel order, which is demonstrated with the following animation:
Pixels are drawn every fourth pixel, starting with the top-left pixel of the image. Pixels proceed left-to-right, skipping three pixels each time. All rows of pixels are drawn in this manner, from top to bottom. Once the end of the last row is reached, drawing moves back to the top row, but starts from the second pixel from the left. As before, every fourth pixel is drawn for every row of pixels. After four passes, every pixel in the image has been drawn.
Let's say you have a byte buffer containing pixel data for an image. Each byte is one pixel. The pixel Byte within that buffer, designated as Data[Byte], can be translated to X and Y coordinates with the following general formula:
Operators:
= Assignment
/ Integer division
% Remainder division
* Multiplication
+ Addition
Stride = Width / 4
Pass = Byte / (Stride * Height)
X = Byte % Stride * 4 + Pass
Y = Byte / Stride % Height
Here, Byte is the current byte in the buffer, starting at index 0. Stride represents the number of pixels drawn per scanline in a single pass. Pass indicates the current pass of drawing pixels for each scanline. Width and Height are the final dimensions of the image, and X and Y are the pixel coordinates within that image, relative to the top-left corner, of the pixel specified by the current byte.
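The general formula translates directly into code. Here is a small Python sketch, where pixel_xy is a name I picked for illustration and the image width is assumed to be a multiple of 4:

```python
def pixel_xy(byte: int, width: int, height: int) -> tuple[int, int]:
    """Sketch: map a byte index in the pixel buffer to (X, Y) coordinates."""
    stride = width // 4                     # pixels drawn per scanline per pass
    pass_index = byte // (stride * height)  # which of the four passes we are in
    x = byte % stride * 4 + pass_index
    y = byte // stride % height
    return x, y

# The first few bytes of a 16-pixel-wide image land on every fourth column
# of the top row.
for index in range(4):
    print(index, pixel_xy(index, 16, 8))
```

Feeding every index from 0 to Width * Height - 1 through this function should hit each pixel of the image exactly once.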
Tiles in level data are 16x8 pixels in size. Plugging in those numbers produces the following formula:
Stride = 16 / 4
(Stride = 4)
Pass = Byte / (4 * 8)
X = Byte % 4 * 4 + Pass
Y = Byte / 4 % 8
...
X = Byte % 4 * 4 + Byte / 32
Y = Byte / 4 % 8
This is a somewhat complex formula for the X and Y coordinates, and it cannot be simplified any further. However, processing pixel data can be simplified by taking a different approach.
Let's now say that you read pixel data from the original byte buffer, called Data, and you want to re-order them and store them in a new byte buffer, called Output. To do this, we have the input byte index specified as Data[Byte], and the output byte index specified as Output[Pos]. From here, we get the simple end goal formula:
Output[Pos] = Data[Byte]
The calculation of Pos, therefore, is necessary. Using the earlier X and Y pixel coordinates, Pos can be calculated accordingly:
Pos = Y * Width + X
Substituting for X, Y and Width yields a pretty ugly expression, but it can be simplified. Take a look:
Operators:
<< Bitwise left shift
>> Bitwise right shift
| Bitwise OR
& Bitwise AND
Pos = (Byte / 4 % 8) * 16 +
(Byte % 4 * 4 + Byte / 32)
...
Pos = (((Byte >> 2) & 7) << 4) +
(( Byte & 3) << 2) + (Byte >> 5)
...
Pos = ((Byte & 28) << 2) +
((Byte & 3) << 2) + (Byte >> 5)
...
Pos = ((Byte & 31) << 2) + (Byte >> 5)
...
Pos = Byte % 32 * 4 + Byte / 32
How 'bout them apples? Turns out both of those 32s in the expression come from the following expression:
Width * Height / 4
Meaning our final, unadulterated formula looks like this:
Quarter = Width * Height / 4
Output[Byte % Quarter * 4 + Byte / Quarter] = Data[Byte]
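For anyone who wants to try it, here is the final formula as a short Python sketch. The function name deinterleave is my own, and it assumes Width * Height is a multiple of 4 and that Data holds exactly Width * Height bytes:

```python
def deinterleave(data: bytes, width: int, height: int) -> bytes:
    """Sketch: reorder four-pass pixel data into ordinary row-major order."""
    total = width * height
    quarter = total // 4  # number of pixels drawn in a single pass
    output = bytearray(total)
    for byte in range(total):
        # Output[Byte % Quarter * 4 + Byte / Quarter] = Data[Byte]
        output[byte % quarter * 4 + byte // quarter] = data[byte]
    return bytes(output)

# Example: a 16x8 tile whose bytes are simply numbered 0..127.
tile = deinterleave(bytes(range(128)), 16, 8)
print(list(tile[:8]))  # source byte indexes for the first 8 pixels of row 0
```

After the call, the pixel at (X, Y) is found at tile[Y * Width + X], which is usually what an image library wants.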
This works not only for the level tiles, but any of the graphics stored in the same general format: