Hello,
I am trying to decompress an (original version) main.dat file, following the instructions at:
https://www.camanis.net/lemmings/files/docs/lemmings_dat_file_format.txt
The length of the first section is 8670 bytes (compressed), incl. 10 bytes for the header.
At position 8670 in main.dat, I find the following bits (in reverse order):
111 00000010 0000000 ...
According to the spec, "111" indicates that the following 8 bits, plus 9, give the number of bytes to read.
So, we should have 2 + 9 = 11 bytes to read.
But ... ldecomp says 13, i.e. 4 + 9 bytes to read.
Am I overlooking something? Should it be that the next 9 bits indicate the number of bytes to read, like this:
111 000000100 00000000 ...
Or is ldecomp wrong? Or is the spec wrong?
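For reference, here is a minimal Python sketch of my reading (the byte values 0x07 and 0x02 are just the two bytes whose bit-reversed forms give the stream quoted above; the helper names are mine):

def bits_reversed(data):
    # Yield each byte's bits in reverse order (least significant bit first).
    for byte in data:
        for i in range(8):
            yield (byte >> i) & 1

bits = bits_reversed(bytes([0x07, 0x02]))  # reversed: 11100000 01000000
tag = [next(bits) for _ in range(3)]       # -> [1, 1, 1]
count = 0
for _ in range(8):                         # read the next 8 bits as an integer
    count = (count << 1) | next(bits)
print(count + 9)                           # prints 11, yet ldecomp says 13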
Any help appreciated.
Welcome to the forums!
In 2008, I implemented this decompression in C++ from that same spec document, lemmings_dat_file_format.txt. See crunch.h (https://github.com/SimonN/Lix/blob/master/src/level/crunch.h) and crunch.cpp (https://github.com/SimonN/Lix/blob/master/src/level/crunch.cpp) on GitHub. The part with algo == 6 should be relevant.
I'll see if I find the time to take a detailed look myself and answer your question directly. I'll have to study the spec and my implementation again.
I believe that, in 2008, I didn't see such a bug in the spec. I only submitted to ccexplore/Mindless the "small fix by Simon" noted at the top of lemmings_dat_file_format.txt. This fix was about which sections come when, not about how to interpret the bitstream.
Caveats with my C++ source:
- Comments are in German. I was a beginner at open-source software development then.
- I never decompressed main.dat with this. I used my implementation only on levels and tilesets. Maybe both crunch.cpp and the spec are wrong, and my usages merely never ran into such a bug.
- It's been 16 years since I wrote this, and I haven't maintained the C++ repository for 10 years.
-- Simon
Thanks,
It is quite useful to see another implementation.
It looks like in case 6 ("111"), you also took the next 8 bits to determine the length, i.e. the number of bytes to read. This is according to the spec, but ... does it produce the correct results?
111 00000100 ... is what I see for the first operation in decompressing the first segment of MAIN.DAT (of Lemmings and Oh No! More Lemmings).
Thanks,
I am looking at main.dat with a hex editor, at position 8670.
The first 2 bytes (reversed) are 11100000 01000000
Broken up according to the spec, this is: 111 00000010
So I am still puzzled as to how one gets 4 + 9 for the number of bytes to read, as opposed to 2 + 9.
The num_bits_in_first_byte field is 7, so you need to throw away the last bit of the first byte, which then leaves you with 1110000 01000000, or 111 00000100 ...
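In code, the only difference from the naive reading is the handling of the first byte. A minimal Python sketch (helper names are mine):

def bits_reversed(data, num_bits_in_first_byte):
    # Yield each byte's bits in reverse order, but keep only
    # num_bits_in_first_byte bits of the first byte.
    for k, byte in enumerate(data):
        n = num_bits_in_first_byte if k == 0 else 8
        for i in range(n):
            yield (byte >> i) & 1

bits = bits_reversed(bytes([0x07, 0x02]), 7)  # reversed: 1110000 01000000
tag = [next(bits) for _ in range(3)]          # still [1, 1, 1]
count = 0
for _ in range(8):
    count = (count << 1) | next(bits)
print(count + 9)                              # prints 13, matching ldecomp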
Yes, I think that might be it. Thank you!
I was puzzled by the source code of ldecomp, which doesn't seem to consider num_bits_in_first_byte; or maybe I overlooked it.
I will try again and report how it goes.
Alright,
Taking into account num_bits_in_first_byte I can produce the same output as ldecomp for main.dat.
Many thanks for your help!
I ported this decompression algorithm to 4 languages: Delphi, C#, Rust, and Zig. But I never dived into what it really does :)
I know it makes a lot of sense to check the checksum as well, to be sure of a correct result.
Hello,
To make it dead simple, at the cost of a bit of memory and CPU (which is irrelevant compared to the 1990s), I ended up doing this in Python:
1. Preparation
- Convert the input bytestream to a bit array
- Reverse the bit array
- Suppress the extraneous bit(s) as determined by num_bits_in_first_byte
2. Processing
The 6 cases outlined in the spec are then straightforwardly implemented by reading the bit array in order, until all decompressed bytes are produced and written to file; a sketch follows below.
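Roughly like this (a minimal sketch, not my actual code: the names are mine, only the "111" raw-copy case discussed earlier in this thread is spelled out, the other five cases are left to the spec, and the decompressed size is passed in; per the spec it comes from the section header):

def prepare(body, num_bits_in_first_byte):
    # Step 1: flatten the compressed bytes (in the order they are to be
    # read) into one bit list, reversing the bit order of each byte and
    # keeping only num_bits_in_first_byte bits of the first byte.
    bits = []
    for k, byte in enumerate(body):
        n = num_bits_in_first_byte if k == 0 else 8
        bits.extend((byte >> i) & 1 for i in range(n))
    return bits

def decompress(body, num_bits_in_first_byte, decompressed_size):
    # Step 2: consume the bit list strictly in order.
    bits = prepare(body, num_bits_in_first_byte)
    pos = 0

    def take(n):
        # Read n bits as an integer, first bit read = most significant.
        nonlocal pos
        value = 0
        for _ in range(n):
            value = (value << 1) | bits[pos]
            pos += 1
        return value

    out = bytearray()
    while len(out) < decompressed_size:
        # Tags have variable length per the spec; "111" is read bit by bit.
        if take(1) and take(1) and take(1):   # tag 111: raw copy
            for _ in range(take(8) + 9):      # next 8 bits = byte count - 9
                out.append(take(8))
        else:
            raise NotImplementedError("remaining five cases, per the spec")
    # Note: the spec writes the decompressed data from end to beginning,
    # so a real implementation has to account for the output direction.
    return bytes(out)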
To ensure correctness, I compared the output with that of ldecompress.
As for the checksum, it is just the compressed bytes XOR'd together, so it is of limited use for checking the correctness of a decompression implementation: it validates the compressed input, not the decompressed output.
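For completeness, that check is only a few lines (a sketch, assuming, per my reading of the spec, that the checksum is the second byte of the 10-byte section header):

def checksum_ok(section):
    # XOR all compressed bytes together; this validates the compressed
    # input, not the decompressed output.
    value = 0
    for byte in section[10:]:   # the compressed data after the header
        value ^= byte
    return value == section[1]  # assumed: checksum at header offset 1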
Hope it helps.