Lemmings Forums

Lemmings Boards => Tech & Research => Topic started by: deuteragenie on October 31, 2024, 11:56:07 AM

Title: Decompression question
Post by: deuteragenie on October 31, 2024, 11:56:07 AM
Hello,

I am trying to decompress an (original version) main.dat file, following the instructions at:
https://www.camanis.net/lemmings/files/docs/lemmings_dat_file_format.txt

The length of the first section is 8670 bytes (compressed), incl. 10 bytes for the header.

At position 8670 in main.dat, I find the following bits (in reverse order):

111 00000010 0000000 ...

According to the spec, "111" indicates that the following 8 bits represent an amount of bytes to read + 9.

So, we should have 2 + 9 = 11 bytes to read.

But ... the ldecomp says 13, i.e. 4 + 9 bytes to read.

Am I overlooking something ? Should it be that the next 9 bits indicate the number of bytes to read, like this:

111 000000100 00000000 ...

Or is ldecomp wrong ? Or the spec wrong ?

Any help appreciated.
Title: Re: Decompression question
Post by: Simon on October 31, 2024, 01:01:18 PM
Welcome to the forums!

In 2008, I implemented this decompression in C++ from that same spec document lemmings_dat_file_format.txt. See crunch.h (https://github.com/SimonN/Lix/blob/master/src/level/crunch.h) and crunch.cpp (https://github.com/SimonN/Lix/blob/master/src/level/crunch.cpp) on github. The part with algo == 6 should be relevant.

I'll see if I find the time to take a detailed look myself and answer your question directly. I'll have to study the spec and my implementation again.

I believe that, in 2008, I didn't see such a bug in the spec. I only submitted to ccexplore/Mindless the "small fix by Simon" noted at the top of lemmings_dat_file_format. This fix was about which sections come when, not about how to interpret the bitstream.

Caveats with my C++ source:


-- Simon
Title: Re: Decompression question
Post by: deuteragenie on October 31, 2024, 03:32:25 PM
Thanks,

It is quite useful to see another implementation.

It looks like in case 6 ("111"), you also took the next 8 bits to determine the length / number of bytes to read.  This is according to specs, but ... does it produce the correct results ?
Title: Re: Decompression question
Post by: Mindless on October 31, 2024, 05:13:00 PM
111 00000100 ... is what I see for the first operation in decompressing the first segment of MAIN.DAT (of Lemmings and Oh No! More Lemmings).
Title: Re: Decompression question
Post by: deuteragenie on October 31, 2024, 06:46:14 PM
Thanks,

I am looking at main.dat with a hex editor, position 8670
The first 2 bytes (reversed) are 11100000 01000000
Broken according to specs, this is : 111 00000010

So I am still puzzled as to how one gets 4 + 9 for the number of bytes to read, as opposed to 2 + 9
Title: Re: Decompression question
Post by: Mindless on November 03, 2024, 10:28:03 PM
The num_bits_in_first_byte field is 7 so you need to throw away the last bit of the first byte, which the leaves you with 1110000 01000000 or 111 00000100 ....
Title: Re: Decompression question
Post by: deuteragenie on November 04, 2024, 07:23:10 AM
Yes, I think that might be it. Thank you !

I was puzzled by the source code of ldecomp which doesn´t seem to consider num_bits_in_first_byte, or maybe I overlooked it.

I will try again and report how it goes.


Title: Re: Decompression question
Post by: deuteragenie on November 05, 2024, 06:32:13 PM
Alright,

Taking into account num_bits_in_first_byte I can produce the same output as ldecomp for main.dat.

Many thanks for your help !
Title: Re: Decompression question
Post by: EricLang on November 07, 2024, 04:07:36 PM
I ported this decompression algorithm in 4 languages: delphi, c#, rust and zig. But I never dived into it, what it really does :)
I know it makes a lot of sense to check this checksum as well to be sure of a correct result.
Title: Re: Decompression question
Post by: deuteragenie on November 09, 2024, 08:58:52 AM
Hello,

To make it dead simple, at the cost of a bit of memory and cpu, but that is irrelevant compared to 1990's, I ended up doing this in Python:

1. Preparation
Convert the input bytestream to a bit array
Reverse the bit array
Suppress the extraneous bit(s) as determined by bits_in_first_byte

2. Processing
Then the 6 cases outlined in the specs are straightfowardedly implemented by reading the bit array "in order" until all decompressed bytes are produced and written to file.

To ensure correctness, I compared the output with the one of ldecompress.

As for the checksum, it is the compressed bytes xor'd, so maybe it is not useful to check the correctness of a decompression implementation.

Hope it helps.