Author Topic: Lemmings 2 File Formats  (Read 16336 times)


Offline GuyPerfect

  • Posts: 363
    • View Profile
Lemmings 2 File Formats
« on: March 30, 2013, 09:06:52 PM »
Like we did for Revolution, I'm starting a thread to document the Lemmings 2 data files and their contents. I am aware that there is existing documentation out there, but I don't feel that it's quite verbose enough to make full sense of what the files truly represent, so I'm starting from scratch. However, I would like to thank geoo, Mindless and any other contributors ahead of time, as I am using those documents for reference.
__________

Compression Format

Code: [Select]
Integer data types are little-endian.

File
===============================================================================
Identifier         String*4       ASCII "GSCM"
UnpackedSize       UInt32         Total size of decompressed data
Chunks             Chunk[]        Compressed data chunks
===============================================================================

Chunk
===============================================================================
LastChunkFlag      Byte           Last chunk = 0xFF; 0x00 otherwise
SymbolCount        UInt16         Number of symbol definitions
SymbolList         Byte[Count]    Destination symbol names
SymbolValuesA      Byte[Count]    First byte of symbol
SymbolValuesB      Byte[Count]    Second byte of symbol
DataSize           UInt16         Number of bytes in encoded data
Data               Byte[Size]     Encoded data bytes
===============================================================================

At the beginning of each chunk, a 256-symbol dictionary is initialized.
Dictionary entries are indexed 0x00 through 0xFF, and each symbol is
initialized to the single byte value that matches its index. The symbol
definitions are then modified by concatenating symbols as specified by the
lists in the chunk header.

Each entry in SymbolList names the dictionary symbol to redefine. That symbol
becomes the concatenation of the two symbols named by the corresponding
entries in SymbolValuesA and SymbolValuesB. The three lists are parallel:
they all contain SymbolCount entries, and the item at position X in one list
is used with the items at position X in the other two lists.

The following specifies the algorithm for each item X in the lists, from 0 to
SymbolCount - 1. Symbols must be redefined in that order. Let "+" represent a
byte-wise concatenation operator:

  Symbol[List[X]] = Symbol[ValuesA[X]] + Symbol[ValuesB[X]]

When bytes are processed from Data, they represent the indexes of symbols in
the dictionary. The symbols are copied wholesale, in the order they are
specified in Data, to the output. This takes place after the symbols are
redefined using the symbol lists.

Chunks will continue to be processed until a chunk with a non-zero
LastChunkFlag has finished processing.

In the event data is not compressed, it will not begin with the "GSCM"
identifier. Use the data as-is in this case. Both compressed and uncompressed
data can be used in most contexts, so they are processed according to whether
that "GSCM" identifier exists or not.
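
The description above can be sketched in Python as follows. This is a minimal sketch rather than a tested reference implementation: the function name `decompress_gscm` is made up for illustration, and the final size check against UnpackedSize is a sanity check added here, not something the format description requires.

```python
import struct

def decompress_gscm(blob: bytes) -> bytes:
    """Expand GSCM-compressed data; return any other data unchanged."""
    if blob[:4] != b"GSCM":
        return blob  # not compressed: use the data as-is
    (unpacked_size,) = struct.unpack_from("<I", blob, 4)
    pos = 8
    out = bytearray()
    while True:
        last_flag = blob[pos]
        (count,) = struct.unpack_from("<H", blob, pos + 1)
        pos += 3
        names = blob[pos:pos + count]
        values_a = blob[pos + count:pos + 2 * count]
        values_b = blob[pos + 2 * count:pos + 3 * count]
        pos += 3 * count
        (size,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        data = blob[pos:pos + size]
        pos += size

        # Every symbol starts out as the single byte matching its index.
        symbols = [bytes([i]) for i in range(256)]
        # Redefine symbols in list order; later entries may use earlier ones.
        for name, a, b in zip(names, values_a, values_b):
            symbols[name] = symbols[a] + symbols[b]
        # Each data byte is a dictionary index; copy the symbols to the output.
        for index in data:
            out += symbols[index]
        if last_flag != 0:
            break
    if len(out) != unpacked_size:  # sanity check (assumption, see above)
        raise ValueError("decompressed size does not match header")
    return bytes(out)

# One chunk: symbol 0x80 = "A" + "B", symbol 0x81 = 0x80 + 0x80.
example = (b"GSCM\x07\x00\x00\x00" b"\xff\x02\x00" b"\x80\x81" b"\x41\x80"
           b"\x42\x80" b"\x03\x00" b"\x81\x43\x80")
print(decompress_gscm(example))  # b'ABABCAB'
```

Note that symbol 0x81 is built from the already redefined symbol 0x80, which is why the definitions have to be applied in list order.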

Hermann Schinagl made a LEMZIP utility for converting files to and from this compression format. It can be found at the following page:

http://www.camanis.net.ipv4.sixxs.org/lemmings/tools.php
__________

Data File

Many of the data files in the game are packed into a structured archive format, where the file is broken up into sections. This includes the .dat files for graphics and levels.

Code: [Select]
Integer data types are big-endian.

File
===============================================================================
Identifier         String*4       ASCII "FORM"
DataSize           UInt32         Size of the remainder of the file
DataType           String*4       ASCII identifier for the data file type
Section            Section[]      File sections containing additional data
===============================================================================

Section
===============================================================================
Identifier         String*4       ASCII identifier for the current section
DataSize           UInt32         Size of the remainder of the section
Data               Byte[Size]     Data specific to the type of section
===============================================================================

The number of sections in the file depends on the respective sizes of existing
sections as well as the total data size reported in the file header.
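
Walking the sections could look roughly like the sketch below. It assumes the sizes are big-endian (a later post in this thread notes that the FORM data stores integers as big-endian) and that sections follow one another with no alignment padding. Neither assumption has been verified against the game files here, and the name `parse_form` is made up for illustration.

```python
import struct

def parse_form(blob: bytes):
    """Return (data_type, [(section_id, section_data), ...]) for a FORM file."""
    if blob[:4] != b"FORM":
        raise ValueError("missing FORM identifier")
    (data_size,) = struct.unpack_from(">I", blob, 4)  # assumed big-endian
    data_type = blob[8:12].decode("ascii")
    end = 8 + data_size  # DataSize counts everything after the size field
    pos = 12
    sections = []
    # Keep reading sections until the size reported in the header is used up.
    while pos + 8 <= end:
        section_id = blob[pos:pos + 4].decode("ascii")
        (size,) = struct.unpack_from(">I", blob, pos + 4)
        sections.append((section_id, blob[pos + 8:pos + 8 + size]))
        pos += 8 + size  # assumption: no padding between sections
    return data_type, sections
```

If the file is GSCM-compressed, it has to be decompressed first; the FORM identifier is only visible in the decompressed data.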
__________

Graphics Representation

For 256-color graphics, palettes are specified and pixels are plotted on the screen. Pixel information is specified in the "styles" data files as well as the full-screen background images. These images specify a particular pixel order, which is demonstrated with the following animation:



Pixels are drawn every fourth pixel, starting with the top-left pixel of the image. Pixels proceed left-to-right, skipping three pixels each time. All rows of pixels are drawn in this manner, from top to bottom. Once the end of the last row is reached, drawing moves back to the top row, but starts from the second pixel from the left. As before, every fourth pixel is drawn for every row of pixels. After four passes, every pixel in the image has been drawn.

Let's say you have a byte buffer containing pixel data for an image. Each byte is one pixel. The pixel Byte within that buffer, designated as Data[Byte], can be translated to X and Y coordinates with the following general formula:

Code: [Select]
Operators:
= Assignment
/ Integer division
% Remainder division
* Multiplication
+ Addition

Stride = Width / 4
Pass   = Byte / (Stride * Height)
X      = Byte % Stride * 4 + Pass
Y      = Byte / Stride % Height

Here, Byte is the current byte in the buffer, starting at index 0. Stride represents the number of pixels drawn per scanline in a single pass. Pass indicates the current pass of drawing pixels for each scanline. Width and Height are the final dimensions of the image, and X and Y are the pixel coordinates within that image, relative to the top-left corner, of the pixel specified by the current byte.
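
As a quick illustration, here is the same formula as a small Python function. The name `pixel_position` is just for this sketch, and it assumes Width is a multiple of 4.

```python
def pixel_position(byte_index: int, width: int, height: int):
    """Map a byte index in the four-pass buffer to (x, y) in the image."""
    stride = width // 4  # pixels drawn per scanline in a single pass
    pass_number = byte_index // (stride * height)
    x = byte_index % stride * 4 + pass_number
    y = byte_index // stride % height
    return x, y

# 16x8 image: the first byte of each of the four passes, then the last byte.
print([pixel_position(i, 16, 8) for i in (0, 32, 64, 96, 127)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (15, 7)]
```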

Tiles in level data are 16x8 pixels in size. Plugging in those numbers produces the following formula:

Code: [Select]
Stride = 16 / 4
(Stride = 4)
Pass   = Byte / (4 * 8)
X      = Byte % 4 * 4 + Pass
Y      = Byte / 4 % 8

...

X = Byte % 4 * 4 + Byte / 32
Y = Byte / 4 % 8

This is a somewhat complex formula for the X and Y coordinates, but it cannot be simplified any further. However, processing pixel data can be simplified by taking a different approach.

Let's now say that you read pixel data from the original byte buffer, called Data, and you want to re-order them and store them in a new byte buffer, called Output. To do this, we have the input byte index specified as Data[Byte], and the output byte index specified as Output[Pos]. From here, we get the simple end goal formula:

Code: [Select]
Output[Pos] = Data[Byte]
The calculation of Pos, therefore, is necessary. Using the earlier X and Y pixel coordinates, Pos can be calculated accordingly:

Code: [Select]
Pos = Y * Width + X
Substituting for X, Y and Width yields a pretty ugly expression, but it can be simplified. Take a look:

Code: [Select]
Operators:
<< Bitwise left shift
>> Bitwise right shift
|  Bitwise OR
&  Bitwise AND

Pos = (Byte / 4 % 8) * 16 +
      (Byte % 4 * 4 + Byte / 32)

...

Pos = (((Byte >> 2) & 7) << 4) +
      (( Byte       & 3) << 2) + (Byte >> 5)

...

Pos = ((Byte & 28) << 2) +
      ((Byte &  3) << 2) + (Byte >> 5)

...

Pos = ((Byte & 31) << 2) + (Byte >> 5)

...

Pos = Byte % 32 * 4 + Byte / 32

How 'bout them apples? Turns out both of those 32s in the expression come from the following expression:

Code: [Select]
Width * Height / 4
Meaning our final, unadulterated formula looks like this:

Code: [Select]
Quarter = Width * Height / 4
Output[Byte % Quarter * 4 + Byte / Quarter] = Data[Byte]
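
In Python, the reordering might be sketched like this. The function names are made up for illustration. The second function recomputes the position through the X/Y formula purely as a cross-check that the simplified expression gives the same result, and both assume Width is a multiple of 4.

```python
def deinterleave(data: bytes, width: int, height: int) -> bytes:
    """Reorder four-pass pixel data into left-to-right, top-to-bottom order."""
    quarter = width * height // 4
    output = bytearray(width * height)
    for byte_index in range(width * height):
        output[byte_index % quarter * 4 + byte_index // quarter] = data[byte_index]
    return bytes(output)

def deinterleave_by_coordinates(data: bytes, width: int, height: int) -> bytes:
    """Same reordering, going through X and Y explicitly (cross-check only)."""
    stride = width // 4
    output = bytearray(width * height)
    for byte_index in range(width * height):
        pass_number = byte_index // (stride * height)
        x = byte_index % stride * 4 + pass_number
        y = byte_index // stride % height
        output[y * width + x] = data[byte_index]
    return bytes(output)

tile = bytes(range(128))  # stand-in pixel data for one 16x8 tile
assert deinterleave(tile, 16, 8) == deinterleave_by_coordinates(tile, 16, 8)
```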

This works not only for the level tiles, but any of the graphics stored in the same general format:


Offline Mindless

  • Posts: 719
  • Inactive - may respond to PM.
    • View Profile
Re: Lemmings 2 File Formats
« Reply #1 on: March 31, 2013, 03:43:01 AM »
Hermann Schinagl made a LEMZIP utility for converting files to and from this compression format.

Just a note from when I contacted Hermann about LEMZIP:

Quote from: Hermann Schinagl
Decompression works well, but I didn't get the compression
work perfectly. Maybe you can do this part.

To be honest, I 'found' the decompression in Lemmings II code...
That's why it is assembly language. So be carefull with using it
officially.

So you might not want to use LEMZIP for compression -- there is however lem2zip which, as far as I know, compresses correctly.

Offline GuyPerfect

  • Posts: 363
    • View Profile
Re: Lemmings 2 File Formats
« Reply #2 on: April 03, 2013, 08:56:23 PM »
I'm not fully done with the level styles yet, but I figured I'd post my progress before my eyes fall out.
__________

Style Palette - L2CL

This data section is found within a FORM data file and has the section name "L2CL"

The palette is expressed as 128 RGB triplets. They represent colors 0-127 in the global palette. The other 128 entries are used by the GUI.

Code: [Select]
Integer data types are little-endian.
Note that the FORM data stores integers as big-endian.

L2CL
===============================================================================
(Unknown)          UInt16         Unknown. Seems to always be 0x0001.
Palette            RGB[128]       RGB color data
===============================================================================

RGB
===============================================================================
Red                Byte           Red channel (lower 6 bits only)
Green              Byte           Green channel (lower 6 bits only)
Blue               Byte           Blue channel (lower 6 bits only)
===============================================================================
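
Reading the palette could be sketched as below. Widening the 6-bit channels to 8 bits is an assumption made here for display on modern hardware (it follows the usual VGA convention) and is not part of the file format. The name `parse_l2cl` is made up for illustration.

```python
import struct

def parse_l2cl(section: bytes):
    """Return 128 (r, g, b) tuples scaled to 8 bits per channel."""
    (unknown,) = struct.unpack_from("<H", section, 0)  # seems to always be 1
    palette = []
    for i in range(128):
        r, g, b = section[2 + 3 * i:5 + 3 * i]
        # Keep only the lower 6 bits, then widen 0..63 to 0..255.
        palette.append(tuple((v & 0x3F) << 2 | (v & 0x3F) >> 4 for v in (r, g, b)))
    return palette
```

Here `section` is the Data portion of the L2CL section, without the 8-byte section header.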

The palette from MEDIEVAL.DAT:


__________

Style Tiles - L2BL

This data section is found within a FORM data file and has the section name "L2BL"

Most map features, both animated and not, are built up from a number of 16x8-pixel "blocks" of tile data.

Pixels with color value 0 are considered to be "air", while all other pixels are "solid".

Code: [Select]
Integer data types are little-endian.
Note that the FORM data stores integers as big-endian.

L2BL
===============================================================================
TileCount          UInt16         The total number of tiles in the style.
Tiles              Tile[]         Tile pixel data
===============================================================================

Tile
===============================================================================
Pixels             Byte[128]      1 byte per pixel, refers to palette indexes.
===============================================================================

To re-order the pixel buffer so that pixels are stored linearly left-to-right
and top-to-bottom, process every byte B in the Pixels array accordingly:

    LinearPixels[(B % 32) * 4 + B / 32] = Pixels[B]

In the above expression, % represents remainder division.
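
A rough Python sketch of reading the whole section, assuming `section` holds only the L2BL data (without the section header). The function names are made up for illustration.

```python
import struct

TILE_BYTES = 128  # 16x8 pixels, one byte per pixel

def parse_l2bl(section: bytes):
    """Return a list of tiles, each 128 palette indexes in row-major order."""
    (tile_count,) = struct.unpack_from("<H", section, 0)
    tiles = []
    for t in range(tile_count):
        pixels = section[2 + t * TILE_BYTES:2 + (t + 1) * TILE_BYTES]
        linear = bytearray(TILE_BYTES)
        for b in range(TILE_BYTES):
            linear[(b % 32) * 4 + b // 32] = pixels[b]
        tiles.append(bytes(linear))
    return tiles

def solid_mask(tile: bytes):
    """True where a pixel is solid (non-zero), False where it is air."""
    return [p != 0 for p in tile]
```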

The tiles from MEDIEVAL.DAT:


__________

Style Presets - L2BE

This data section is found within a FORM data file and has the section name "L2BE"

Styles are packaged with "presets", rectangular groups of tiles that are useful for creating level terrain without specifying tiles individually. The game engine does not use these presets, however, but rather levels are stored as big groups of tiles. These presets are only useful to level editors.

Code: [Select]
Integer data types are little-endian.
Note that the FORM data stores integers as big-endian.

L2BE
===============================================================================
PresetCount        UInt16         The total number of presets in the style.
Presets            Preset[]       Preset definitions.
===============================================================================

Preset
===============================================================================
(Unknown1)         Byte           Unknown. Possibly used by editor.
(Unknown2)         Byte           Unknown. Possibly used by editor.
Width              Byte           Number of tiles wide.
Height             Byte           Number of tiles tall.
DataSize           UInt16         Total size of this Preset, including header.
Tiles              UInt16[]       Tile indexes (from L2BL).
===============================================================================

Tiles are arranged within presets in the order of left-to-right then
top-to-bottom.
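
A possible way to read the presets in Python is sketched below. It assumes each preset stores exactly Width * Height tile indexes directly after its 6-byte header, and it uses DataSize only to find the next preset. The name `parse_l2be` is made up for illustration.

```python
import struct

def parse_l2be(section: bytes):
    """Return a list of presets as dicts holding width, height and tile rows."""
    (preset_count,) = struct.unpack_from("<H", section, 0)
    pos = 2
    presets = []
    for _ in range(preset_count):
        unknown1, unknown2, width, height, data_size = struct.unpack_from(
            "<BBBBH", section, pos)
        # Assumption: Width * Height little-endian tile indexes follow the header.
        indexes = struct.unpack_from("<%dH" % (width * height), section, pos + 6)
        rows = [list(indexes[r * width:(r + 1) * width]) for r in range(height)]
        presets.append({"width": width, "height": height, "rows": rows})
        pos += data_size  # DataSize covers the header plus the tile list
    return presets
```

Each entry in `rows` is one row of the preset from left to right, and the rows run from top to bottom.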

Here are some sample presets from MEDIEVAL.DAT:


__________

Style Sprites - L2SS

This data section is found within a FORM data file and has the section name "L2SS"

There is a handful of different objects in the game that can interact with Lemmings, yet themselves are not represented by tile graphics. These include the cannons and the Medieval dragon and catapult.

Code: [Select]
Integer data types are little-endian.
Note that the FORM data stores integers as big-endian.

L2SS
===============================================================================
SpriteCount        UInt16         The total number of sprites in the style.
Sprites            Sprite[]       Sprite definitions.
===============================================================================

Sprite
===============================================================================
DataSize           UInt16         Size in bytes of the remainder of the Sprite.
Width              UInt16         Width of the Sprite, in pixels.
Height             UInt16         Height of the Sprite, in pixels.
ImagePointers      UInt16[4]      Pointers to the data for the 4 pixel layers.
ImageData          Byte[]         Encoded data representing image content.
===============================================================================

The values of the ImagePointers are relative to the first byte of the Sprites
array. However, they do not account for the DataSize fields, meaning these
offsets need to be increased by 2 for every Sprite up to and including the
one that contains them.

The offset relative to the first byte in the Sprites array, therefore, within
sprite S in the array (beginning at 0), can be calculated with the following
formula:

  RealImagePointer = ImagePointer + 2 * (S + 1)
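
As a sketch, the sprite headers could be walked like this in Python. It applies the formula above exactly as written and then adds the position of the Sprites array so that the result can index the section data directly. The name `parse_l2ss_headers` is made up for illustration, and `section` is assumed to hold only the L2SS data (without the section header).

```python
import struct

def parse_l2ss_headers(section: bytes):
    """Walk the Sprites array and return each sprite's header fields."""
    (sprite_count,) = struct.unpack_from("<H", section, 0)
    sprites_base = 2  # first byte of the Sprites array within the section
    pos = sprites_base
    sprites = []
    for s in range(sprite_count):
        data_size, width, height, *pointers = struct.unpack_from("<7H", section, pos)
        # RealImagePointer = ImagePointer + 2 * (S + 1), made absolute.
        layers = [sprites_base + p + 2 * (s + 1) for p in pointers]
        sprites.append({"width": width, "height": height, "layers": layers})
        pos += 2 + data_size  # DataSize counts the bytes that follow it
    return sprites
```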

As with the other graphics in the game, these sprite graphics are expressed as
4 layers, where each layer represents vertical "stripes" of pixels every 4
columns. The data pointed to by the first ImagePointer represents pixel columns
0, 4, 8, 12, etc. The second ImagePointer represents columns 1, 5, 9, 13, etc.
This continues for the remaining pointers.

The ImageData bytes encode pixel values in such a way that pixels with palette
entry 0 (transparent pixels) are not directly expressed. This results in a
moderate level of compression in most cases, but in some instances will cause
the image data to bloat.

When first processing the data for an image layer, as pointed to by the
ImagePointer, the current X and Y drawing positions within the sprite are
initialized to 0. This corresponds with the top-left pixel of the sprite. When
bytes are read from the data to be used as pixel values, the X position will be
incremented once per byte. A special "newline" command will reset the X
position to 0 and increment Y by 1.

Bytes are processed 4 bits at a time, starting with the high 4 bits of each
byte. The resulting nibbles, here called the "high nibble" and "low nibble",
are bit-packed values containing the following fields:

  Field  mccc
  Bit    3  0

    m = Mode
        0 - Copy
        1 - Skip

    c = Count

Drawing stops when two specific conditions are met: the X position is 0, and
the high nibble is 0xF. Once this occurs, no further drawing is processed for
the current layer.

The actual X position in the final image is equal to the X position within the
layer, multiplied by 4, then with the layer number added (where the first layer
is layer 0). In other words:

  FinalX = LayerX * 4 + LayerNum

Pixels are processed first by the high nibble, then by the low nibble.

If Mode is Copy, then Count bytes are read from ImageData and stored in the
pixel buffer. If Mode is Skip, then the X position is increased by Count, but
no bytes are read from ImageData.
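
Putting the nibble rules together (including the newline that follows a byte whose low nibble is 0), decoding one layer might look like the sketch below. It assumes the pixel bytes for a Copy nibble come right after the command byte, with the high nibble's pixels before the low nibble's. The bounds check is a defensive addition, and the name `decode_layer` is made up for illustration.

```python
def decode_layer(data: bytes, start: int, layer: int,
                 width: int, height: int, image: bytearray) -> None:
    """Draw one of the four layers into a row-major width * height image."""
    pos = start
    x = y = 0
    while True:
        command = data[pos]
        pos += 1
        high, low = command >> 4, command & 0x0F
        if x == 0 and high == 0xF:
            return  # end of this layer
        for nibble in (high, low):  # high nibble first, then low
            mode, count = nibble >> 3, nibble & 7
            if mode == 0:  # Copy: Count pixel bytes follow (assumed order)
                for _ in range(count):
                    final_x = x * 4 + layer
                    if final_x < width and y < height:
                        image[y * width + final_x] = data[pos]
                    pos += 1
                    x += 1
            else:  # Skip: leave Count pixels transparent
                x += count
        if low == 0:  # "newline"
            x = 0
            y += 1

# Layer 1 of an 8x2 sprite: two pixels on row 0, one pixel on row 1.
image = bytearray(8 * 2)
decode_layer(bytes([0x20, 5, 6, 0x91, 7, 0x00, 0xF0]), 0, 1, 8, 2, image)
print(list(image))  # [0, 5, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0]
```

Calling it once per layer, with the four corrected image pointers as `start` and the same `image` buffer, fills in the whole sprite. Pixels that are never written stay at 0, which is the transparent color.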

Should the value of the low nibble be 0, then, after processing both nibbles, a
"newline" operation occurs: X is reset to 0, and Y is incremented by 1.

The special sprites from MEDIEVAL.DAT:


Offline GuyPerfect

  • Posts: 363
    • View Profile
Re: Lemmings 2 File Formats
« Reply #3 on: April 03, 2013, 09:02:55 PM »
Just a note that the L2SS pixel representation is rather different than in geoo's document. The format as I've documented it here is 100% correct (to the best of my research), and I have conducted extensive testing with the game engine to figure out how data gets processed.

Offline geoo

  • Administrator
  • Posts: 1475
    • View Profile
Re: Lemmings 2 File Formats
« Reply #4 on: April 04, 2013, 03:30:28 AM »
I tested my L2SS docu by exporting all sections of the style graphics, VLEMMS and the .IFF files, and they display correctly (see here): http://www.lemmingsforums.com/index.php?topic=329.msg8284#msg8284
I didn't test for any compression codes not used in the original files. Are you sure you read my documentation correctly (the descriptions are not very detailed)?

The Lemmings 2 demo uses a somewhat different format, btw.

Offline GuyPerfect

  • Posts: 363
    • View Profile
Re: Lemmings 2 File Formats
« Reply #5 on: April 04, 2013, 05:54:00 PM »
Are you sure you read my documentation correctly (the descriptions are not very detailed)?

Yes I am. Did you read mine? (-:

Your work with the format was very helpful; I don't want to belittle your efforts at all. When I looked at the data, I couldn't make heads or tails of what it was representing, which meant I couldn't make modifications and go in the game to see what changed. Your document gave me a place to start, let me know what I was looking at, and enabled me to press on to find all of the details of how pixels are processed for these sprites.

Having said that, a few things do stand out in your document:
  • Nibble values are not 4-bit numeric values, but rather a 1-bit flag and a 3-bit numeric value. The stuff with "< 8" and ">= 8" doesn't accurately apply to how data is being processed.
  • Byte value 0xFF does not necessarily indicate end of stream. In fact, it's a perfectly valid encoding that effectively represents "skip 14".
    • Additionally, byte values other than 0xFF can be used to indicate end of stream.
  • A high nibble value of 0xE is not a special-case value. It's just the code for "skip 6".
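
To make the nibble layout concrete, here is a tiny Python sketch that splits a code into its 1-bit mode and 3-bit count. The function names are made up for illustration.

```python
def describe_nibble(nibble: int) -> str:
    """Split a 4-bit code into its 1-bit mode and 3-bit count."""
    mode = "skip" if nibble & 0x8 else "copy"
    return f"{mode} {nibble & 0x7}"

def describe_byte(value: int):
    """Describe the high nibble and then the low nibble of a command byte."""
    return describe_nibble(value >> 4), describe_nibble(value & 0x0F)

print(describe_nibble(0xE))  # skip 6
print(describe_byte(0xFF))   # ('skip 7', 'skip 7'), 14 pixels in total
```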

Offline geoo

  • Administrator
  • Posts: 1475
    • View Profile
Re: Lemmings 2 File Formats
« Reply #6 on: April 04, 2013, 08:41:32 PM »
Nope, I actually haven't yet, takes a bit of time to read and make sense of it which I wasn't willing to spend this week. Perhaps next week I can give a somewhat more qualified statement. I just thought you meant something is wrong with its content when you said 'different', not just different in writing style, sorry about that. (You don't need to remind me that it's horribly written, I'm aware of that :XD:, good thing you're giving it a proper writeup now.) It might be missing some cases that are not used in the game, and in hindsight might be very weirdly written (especially the special case E, which for some reason back then I thought didn't fit the pattern), but from the plain viewpoint of content, it should be a subset of yours as it works correctly on the original data. I'll read yours next week, as it's probably a big improvement over this old horrible description of mine, writing-wise.

Offline EricLang

  • Posts: 464
    • View Profile
Re: Lemmings 2 File Formats
« Reply #7 on: January 06, 2014, 09:28:25 AM »
Ok I don't understand the decompression.... 

At the beginning of each chunk, a 256-symbol dictionary is initialized.
Dictionary entries are indexed 0x00 through 0xFF, and each symbol is
initialized to the single byte value that matches its index


Code: [Select]
var dictionary: array[0..255] of byte;
for i := 0 to 255 do
  dictionary[i] := i;

So we get a dictionary 0,1,2,3,4,5.....

What to do next?
if someone is trying to explain it please refer clearly to what is what.

we have
the chunk
- symbols[0.. symbolcount - 1]
- symbolvaluesa[0.. symbolcount - 1]
- symbolvaluesb[0..symbolcount - 1]
- compresseddata[0..datasize - 1]
and
- some dictionary[0..255]




Offline namida

  • Administrator
  • Posts: 12399
    • View Profile
    • NeoLemmix Website
Re: Lemmings 2 File Formats
« Reply #8 on: January 06, 2014, 09:38:44 AM »
dictionary[0] = 0
dictionary[1] = 1
dictionary[2] = 2
etc

Initially, that is. I didn't read the thing in full, but I think it can change later?
My Lemmings projects
2D Lemmings: NeoLemmix (engine) | Lemmings Plus Series (level packs) | Doomsday Lemmings (level pack)
3D Lemmings: Loap (engine) | L3DEdit (level / graphics editor) | L3DUtils (replay / etc utility) | Lemmings Plus 3D (level pack)

Offline EricLang

  • Posts: 464
    • View Profile
Re: Lemmings 2 File Formats
« Reply #9 on: January 06, 2014, 02:17:41 PM »
I had a look in the C-code that I found, so decompression is working :)

Offline ccexplore

  • Posts: 5311
    • View Profile
Re: Lemmings 2 File Formats
« Reply #10 on: January 06, 2014, 08:19:20 PM »
Glad to hear you got it worked out.  I suppose the documentation could maybe use some pseudo-code and examples there to aid implementers?

Offline namida

  • Administrator
  • Posts: 12399
    • View Profile
    • NeoLemmix Website
Re: Lemmings 2 File Formats
« Reply #11 on: January 06, 2014, 10:23:05 PM »
I found the examples quite helpful in your DOS formats documentation, definitely. (Though one thing I found quite confusing - maybe it's just me - is how, when you get a 9-bit value, you were writing it as say, 0x000, 0x008, 0x010, 0x018, etc... rather than just 0x00, 0x01, etc. I don't know, to some people that way probably makes more sense, but for me... :/ xD)

Offline ccexplore

  • Posts: 5311
    • View Profile
Re: Lemmings 2 File Formats
« Reply #12 on: January 06, 2014, 10:38:21 PM »
@namida: I think you are actually referring to snippets of the DOS Lem1 LVL (ie. uncompressed level) file format documentation, which was actually written by rt (of Clones fame) and not me.  I noticed the same thing there and agree with you on this.

Offline EricLang

  • Posts: 464
    • View Profile
Re: Lemmings 2 File Formats
« Reply #13 on: January 06, 2014, 11:00:59 PM »
I just was wondering if the compression in dos lemmings (original, ohno etc.) is basically the same as in lemmings tribes. I never really studied the algorithm, but just 'blindly' translated the C or VB code that I encountered.

Offline namida

  • Administrator
  • Posts: 12399
    • View Profile
    • NeoLemmix Website
Re: Lemmings 2 File Formats
« Reply #14 on: January 06, 2014, 11:14:22 PM »
@namida: I think you are actually referring to snippets of the DOS Lem1 LVL (ie. uncompressed level) file format documentation, which was actually written by rt (of Clones fame) and not me.  I noticed the same thing there and agree with you on this.

To be honest, I'm mostly just going off the L1 documentations as a whole and I remember having problems with this, I don't remember specifically where it was implemented. And if it's the LVL format, that may very well be my fault, as the "TalveSturges" mentioned in the credits is yet another of my millions of past aliases.

Offline ccexplore

  • Posts: 5311
    • View Profile
Re: Lemmings 2 File Formats
« Reply #15 on: January 07, 2014, 01:32:37 AM »
I just was wondering if the compression in dos lemmings (original, ohno etc.) is basically the same as in lemmings tribes. I never really studied the algorithm, but just 'blindly' translated the C or VB code that I encountered.

I'm pretty sure no, they aren't the same, even beyond superficial differences like headers etc.  To be fair I have yet to read anything on the Lemmings 2 compression format in detail, but from what little I skimmed, it sounds like if nothing else, Lemmings 1 examines the data as bits when looking for redundancy, while Lemmings 2 does so in bytes.

Offline namida

  • Administrator
  • Posts: 12399
    • View Profile
    • NeoLemmix Website
Re: Lemmings 2 File Formats
« Reply #16 on: January 07, 2014, 03:58:09 AM »
From a quick glance, I don't even see any minor resemblances...

Offline EricLang

  • Posts: 464
    • View Profile
Re: Lemmings 2 File Formats
« Reply #17 on: January 08, 2014, 07:02:15 AM »
So trying to do L2 objects now. In the style files we have palettes of 128 length. But if my eyes do not deceive me then - when staring at the bytes of the L2SS section - I see palette indices > 127.
What and where is the second part (GUI part) of the palette? Or am I staring wrong :)

Offline EricLang

  • Posts: 464
    • View Profile
Re: Lemmings 2 File Formats
« Reply #18 on: January 08, 2014, 09:37:23 AM »
Pascal code for decompressing L2.
(Delphi added some new syntax, but the code should be readable by a programmer).

Code: [Select]
unit L2Decompress;

interface

uses
  Classes, SysUtils; // TStream, Exception

type
  TSigChars = array[0..3] of AnsiChar;

  TSignature = packed record
    case byte of
      0: ( Chars: TSigChars );
      1: ( Id: Cardinal );
  end;

  TL2Decompressor = class
  private
    const
      CHUNK_SIZE = 2048; // maximum decompression size
    type
      TBuffer = array[0..CHUNK_SIZE - 1] of Byte; // decompressbuffer
      // ref entity for chunk decompression
      TRef = packed record
        Byte: Byte;
        Pair: array[0..1] of Byte;
      end;
      // one compressed chunk in compressed file
      TChunk = packed record
        LastChunkFlag  :  Byte;         // Last chunk = $FF; $00 otherwise
        RefCount       : UInt16;        // Number of refs
        Refs           : TArray<TRef>;  // 'dictionary' [RefCount]
        CompressedSize : UInt16;        // Number of bytes in compressed data
        CompressedData : TArray<Byte>;  // compressed data [CompressedSize]
        procedure LoadFromStream(S: TStream; out IsLast: Boolean);
        procedure DecompressChunk(out Buffer: TBuffer; out Size: Integer);
      end;
  public
    procedure Decompress(Src, Dst: TStream); overload;
  end;

implementation

procedure TL2Decompressor.TChunk.LoadFromStream(S: TStream; out IsLast: Boolean);
// load one chunk of compressed data from stream
var
  j, i: Integer;
begin
  // read the chunk flag
  S.ReadBuffer(LastChunkFlag, 1);
  if not (LastChunkFlag in [$FF, $00]) then raise Exception.Create('LastChunkFlag error');
  IsLast := LastChunkFlag = $FF;

  // read the refcount
  S.ReadBuffer(RefCount, 2);
  SetLength(Refs, RefCount);

  // and fill the refs byte by byte
  for i := 0 to RefCount - 1 do
    if S.Read(Refs[i].Byte, 1) <> 1 then raise Exception.Create('chunk ref read error a');

  // and fill the byte-pairs
  for j := 0 to 1 do
    for i := 0 to RefCount - 1 do
      if S.Read(Refs[i].Pair[j], 1) <> 1 then raise Exception.Create('chunk byte pair read error b');

  // read the compressed size
  S.ReadBuffer(CompressedSize, SizeOf(CompressedSize));
  if CompressedSize > 100000 then raise exception.Create('datasize error');
  // and finally read the compressed data in to the buffer
  SetLength(CompressedData, CompressedSize);
  S.ReadBuffer(CompressedData[0], CompressedSize);
end;

procedure TL2Decompressor.TChunk.DecompressChunk(out Buffer: TBuffer; out Size: Integer);
// decompress one chunk of data
var
  i, j, d, s, Cnt: Integer;
begin
  Size := CompressedSize;
  if Size > CHUNK_SIZE then raise Exception.Create('chunk decompression error (chunksize)');
  // initialize the decompression buffer
  FillChar(Buffer, SizeOf(Buffer), 0);
  Move(CompressedData[0], Buffer[0], Size);

  // apply the refs in reverse order, expanding each ref byte into its pair in place
  for i := RefCount - 1 downto 0 do begin
    // count the occurrences of this ref byte to know how much the data grows
    Cnt := 0;
    for j := Size - 1 downto 0 do
      if Buffer[j] = Refs[i].Byte then
        Inc(Cnt);
    if Size + Cnt > CHUNK_SIZE then raise Exception.Create('chunk decompression error (overflow)');
    // copy from the end so that unread source bytes are never overwritten
    d := Size - 1 + Cnt;
    for s := Size - 1 downto 0 do begin
      if Buffer[s] = Refs[i].Byte then begin
        for j := 1 downto 0 do begin
          Buffer[d] := Refs[i].Pair[j];
          Dec(d);
        end;
      end
      else begin
        Buffer[d] := Buffer[s];
        Dec(d);
      end;
      if d < -1 then raise Exception.Create('chunk decompression error (underflow)');
    end;
    Inc(Size, Cnt);
  end;
end;

procedure TL2Decompressor.Decompress(Src, Dst: TStream);
// decompress all chunks from sourcestream (src) to destinationstream (dst)
const
  MAX_DECOMPRESSED_SIZE = 1024 * 1024; // arbitrary sanity limit; the real limit is unknown
var
  Chunk: TChunk;
  OutSize: Integer;
  TotalSize: Integer;
  Buffer: TBuffer;
  IsLast: Boolean;
  Sig: TSignature;
  DecompressedSize: UInt32;
begin
  // read signature
  Src.ReadBuffer(Sig, 4);
  if Sig.Chars <> 'GSCM' then raise Exception.Create('L2 decompression read signature error');

  // read size
  Src.ReadBuffer(DecompressedSize, 4);
  if DecompressedSize > MAX_DECOMPRESSED_SIZE then raise Exception.Create('decompress error: decompressed size too large');

  TotalSize := 0;
  IsLast := False;
  while not IsLast do begin
    Chunk.LoadFromStream(Src, IsLast); // read 1 compressed chunk
    OutSize := 0;
    Chunk.DecompressChunk(Buffer, OutSize); // decompress 1 chunk
    Inc(TotalSize, OutSize);
    Dst.WriteBuffer(Buffer[0], OutSize);
  end;

  if TotalSize <> DecompressedSize then raise Exception.Create('Decompress OutSize mismatch');
end;

end.

Offline ccexplore

  • Posts: 5311
    • View Profile
Re: Lemmings 2 File Formats
« Reply #19 on: January 08, 2014, 11:21:39 AM »
So trying to do L2 objects now. In the style files we have palettes of 128 length. But if my eyes do not deceive me then - when staring at the bytes of the L2SS section - I see palette indices > 127.
What and where is the second part (GUI part) of the palette? Or am I staring wrong :)

Unfortunately it doesn't look like the second part of the palette is documented, unless I missed something on reading (quite possible).  I believe it's similar to how in Lemmings 1, the palette is split into two halves, with the lower 8 entries being effectively fixed across all graphics sets, while the upper 8 are the only ones that truly change from set to set.  In the L2 case it apparently means the fixed upper half is not stored in the style files, but rather in some unknown location in other files of the game (possibly L2.exe itself).

Your best bet may unfortunately be to just create a modified L2SS section that uses the 128-255 indices, get a screenshot of that graphics in DOSBox, and consume the resulting bitmap to find out what those 128 palette entries should be.

Offline EricLang

  • Posts: 464
    • View Profile
Re: Lemmings 2 File Formats
« Reply #20 on: January 08, 2014, 11:54:26 AM »
Yep, same system I thought too. Only the low and high palettes are 128 in size now.
I'll first scan through all tribes files to check if there is some GUI palette hidden inside there.

Offline ccexplore

  • Posts: 5311
    • View Profile
Re: Lemmings 2 File Formats
« Reply #21 on: January 08, 2014, 07:19:58 PM »
If/when you do manage to find out the values for the remainder of the palette, do post something here that allows the documentation to be completed.  Even a binary file attachment (with suitable explanation on how the values are stored) can suffice--I can take over converting that to something more human-readable if necessary.

Offline EricLang

  • Posts: 464
    • View Profile
Re: Lemmings 2 File Formats
« Reply #22 on: January 08, 2014, 09:41:25 PM »
Currently exploring the files.

Candidate found, but I think this is just a picture...
Picture attachment goes wrong on the forum
The first (128 * 3) bytes (RGB) of ARK.ANM certainly look like a palette.

Candidate found in LOAD.RKO at address 202

Offline ccexplore

  • Posts: 5311
    • View Profile
Re: Lemmings 2 File Formats
« Reply #23 on: January 08, 2014, 10:02:08 PM »
I don't think there's a good way for you to be sure without ultimately testing it out with a specially created L2SS that uses all the upper palette entries.  There are far too many other graphics in this game that are not styles.  They could have their own palettes that nevertheless still only apply to the lower entries.  For example, I'm guessing ARK.ANM contains graphics for the animation of the ark pieces (on the screen that shows the map of all 12 tribes, and your progress as represented by ark pieces moving towards the center).  LOAD.RKO looks more promising, but could also potentially be a number of other things in the game.

I actually seem to recall now that there is somehow a way in one of DOSBox's more hidden UI to make it display the current palette in use by the (emulated) graphics card--I'm 99% sure I saw it with my own eyes at some point.  If that's the case, then I guess there is a far simpler way to solve this issue.  I'll try to find out more about this later today. [edit: scratch that for now]

Offline EricLang

  • Posts: 464
    • View Profile
Re: Lemmings 2 File Formats
« Reply #24 on: January 08, 2014, 10:12:46 PM »
The L2SS is a good idea, but I just started decoding the stuff, so that is a bit premature for me.

I ran a test on the files in the root of the Lemmings tribes.
There are not many candidates there, so that is hopeful too.
Here are some pictures:
http://ericenzwaan.nl/eric/lemmings/Temp/candidates.zip
Say I found something in VSTYLE.DAT at position 27377, then the bitmap is called VSTYLE_DAT_27377.bmp
None of the results is in a compressed file.

I just mapped a 128 * 3 byte buffer through the files, doing 2 checks
1) all values must have R < 64 and G < 64 and B < 64.
2) if 80% is zero the check fails too
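
The scan described above could be sketched roughly as follows. This is not the code that produced the results, just an illustration: the function name find_palette_candidates is made up, the exact treatment of the 80% threshold is a guess, and jumping ahead by one full window after a hit is an assumption inferred from the 384-byte spacing of the offsets in the candidate list.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the offsets at which a 128 * 3 byte window
// passes both checks (all components < 64, and less than 80% zero bytes).
std::vector<std::size_t> find_palette_candidates(const std::vector<std::uint8_t>& data) {
    constexpr std::size_t kWindow = 128 * 3; // 128 RGB triplets
    std::vector<std::size_t> hits;

    std::size_t pos = 0;
    while (pos + kWindow <= data.size()) {
        std::size_t zeros = 0;
        bool in_range = true;
        for (std::size_t i = 0; i < kWindow; ++i) {
            const std::uint8_t v = data[pos + i];
            if (v >= 64) { in_range = false; break; } // check 1: R, G, B < 64
            if (v == 0) ++zeros;
        }
        // check 2: reject windows where 80% or more of the bytes are zero
        if (in_range && zeros * 5 < kWindow * 4) {
            hits.push_back(pos);
            pos += kWindow; // assumption: continue after the candidate
        } else {
            ++pos;
        }
    }
    return hits;
}
```

Calling it on the raw bytes of a file gives the list of offsets to dump as bitmaps for visual inspection.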

The end section of VSTYLE.DAT is the best candidate now.

[candidate: ][ARK.ANM][0]
[candidate: ][ARK.ANM][384]
[candidate: ][INSTALL.EXE][40890]
[candidate: ][L2.RKO][21643]
[candidate: ][LOAD.RKO][202]
[candidate: ][LOAD.RKO][586]
[candidate: ][LOAD.RKO][970]
[candidate: ][LOAD.RKO][1354]
[candidate: ][LOAD.RKO][1738]
[candidate: ][PROCESS.RKO][1096]
[candidate: ][PROCESS.RKO][21070]
[candidate: ][VGA.RKO][14229]
[candidate: ][VGA.RKO][14613]
[candidate: ][VSTYLE.DAT][27377]

Offline ccexplore

  • Posts: 5311
    • View Profile
Re: Lemmings 2 File Formats
« Reply #25 on: January 08, 2014, 10:34:51 PM »
I actually seem to recall now that there is somehow a way in one of DOSBox's more hidden UI to make it display the current palette in use by the (emulated) graphics card--I'm 99% sure I saw it with my own eyes at some point.  If that's the case, then I guess there is a far simpler way to solve this issue.  I'll try to find out more about this later today.

On second thought, I think it's far more likely that I mis-remembered, and what I remember seeing was probably from an emulator for something else like Genesis or something.  Granted, I haven't updated my DOSBox from v0.73, so maybe there is such a feature in more current versions, but I won't get my hopes up.

Offline kaimitai

  • Posts: 6
    • View Profile
    • kaimitai's GitHub
Re: Lemmings 2 File Formats
« Reply #26 on: September 03, 2021, 10:07:39 PM »
Yep, same system I thought too. Only the low and high palettes are 128 in size now.
I'll first scan through all tribes files to check if there is some GUI palette hidden inside there.

Since the topic is not locked, I will take my chances reviving an old thread.

I have injected a tile of palette entries 128-255 into the style files and used it in one level of each style, which allowed me to extract the RGB values for the high palette of each. An interesting find is that palette entries 145, 146 and 147 cycle in all styles: red, red, yellow (at different modes, so you can use them to make some kind of monochrome animation). Palette entries 164 and up are black in all styles.

The attached file contains the styles and levels (need to be used in combination) that I used, as well as screenshots of each palette, plus a text file with RGB values for each entry. Note that these colors are according to DOSBox on my system. Those who are interested can load up the files on their own systems and compare. (I used the l2-fix executable to run the game)



I also want to point out that the formula used in the original post for creating bitmaps from the 4-layer encoded graphics only works if the width of the image is a multiple of 4. Otherwise the 4 layers will have different sizes, the first being the biggest. For example, if the width is 43, layers 1, 2 and 3 will have width 11, while layer 4 has width 10. (At least this is how it seems to be.)
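
The layer widths can be computed with a small helper like the sketch below. The name layer_widths is made up, and the formula assumes that layer k (counting from 0) holds image columns k, k + 4, k + 8 and so on, which matches the 43-pixel example but has not been checked against every sprite.

```cpp
#include <array>

// Hypothetical helper: widths of the four layers for an image of the given
// pixel width, assuming layer k holds columns k, k + 4, k + 8, ...
std::array<int, 4> layer_widths(int image_width) {
    std::array<int, 4> widths{};
    for (int layer = 0; layer < 4; ++layer)
        widths[layer] = (image_width + 3 - layer) / 4; // rounds up for the leading layers
    return widths;
}
```

For a width that is a multiple of 4 all four layers come out equal, so the original formula is the special case of this one.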

This is important for the run-length encoded sprite data (L2SS section) in the style-files which I have trouble decoding in some cases. For Medieval I have no problems, for Classic there is supposed to be one sprite which I cannot decode, and for Circus the 7th sprite cannot be decoded by my implementation.
EDIT: After loading in the extracted palettes and redrawing the sprites, they are all a little off, especially near the bottom. I think my sprite decoding algorithm is not 100%.

For those who can answer, I have 2 questions (which I will investigate myself if I have to, but if someone can tell me it will save me some time):

  • Is it possible for the run length encoding to spill over from one line to the next (meaning I need to check the x-index for each pixel copied/skipped and "manually" do a line shift)?
  • Can there be garbage data in this section if it is not used by any sprite animation? Say if the animations only use sprites #0, #1 and #2 - there is no use trying to decode sprite #3 even if the L2SS header tells me there are 4 sprites?


My implementation for decoding one layer is roughly as follows (I believe I have the correct layer width parameter, as I know the widths will vary for sprites where (width mod 4) != 0):

Code: [Select]
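#include <cstddef>
#include <cstdint>
#include <vector>

using byte = std::uint8_t;

// Decodes one layer of a run-length encoded sprite.
// Each control byte holds two nibbles: bit 3 selects the mode
// (0 = copy, 1 = skip) and bits 0-2 give the count.
std::vector<byte> decode_sprite_layer(int layer_width, int layer_height, const std::vector<byte>& input) {
    std::vector<byte> result(layer_width * layer_height, 0); // initialize with all zeroes
    int x{ 0 }, y{ 0 };
    std::size_t stream_index{ 0 };

    // perform copy/skip for one nibble
    auto apply_nibble = [&](int nibble) {
        bool copy_mode = (nibble >> 3) == 0;
        int count = nibble & 0b0111;

        if (copy_mode) {
            // copy input[stream_index] to input[stream_index + count - 1] inclusive,
            // to result, starting at result[y * layer_width + x]
            for (int i = 0; i < count; ++i)
                result.at(y * layer_width + x + i) = input.at(stream_index + i);
            stream_index += count;
        }
        // x advances by count in both modes; skip mode does not touch the stream index
        x += count;
    };

    while (true) {
        int high_nibble = (input.at(stream_index) & 0xf0) >> 4;
        int low_nibble = input.at(stream_index) & 0x0f;

        ++stream_index;

        // a high nibble of 0xf at the start of a line ends the layer
        if (x == 0 && high_nibble == 0xf)
            return result;

        apply_nibble(high_nibble);
        apply_nibble(low_nibble); // logic the same as for the high nibble

        // a low nibble of 0 ends the current line
        if (low_nibble == 0) {
            x = 0;
            ++y;
        }
    }
}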
« Last Edit: September 03, 2021, 11:12:18 PM by kaimitai »
Life is like a sewer. What you get out of it, depends on what you put into it. --Tom Lehrer

Offline kaimitai

  • Posts: 6
    • View Profile
    • kaimitai's GitHub
Re: Lemmings 2 File Formats
« Reply #27 on: September 04, 2021, 10:52:36 PM »
The sprite decoding algorithm I described above is correct, I just had a bug in my concrete implementation that made it "almost" work. I have decoded all the sprites successfully, and to answer my own questions - the answers are no and no.

Incidentally, all the styles have at least one sprite associated with them - but the styles that only have one sprite have this image, in different palettes:
