Hm... would I be right in thinking, then, that the first code (in the stored order - the last code in processing order) would *have* to be a reference one in order for it to work properly?
I think the short direct encoding (ie. 00 nnn bytes ...) can actually also work there. The game's decompression algorithm would have already read the byte containing the "00 nnn" bits before it got overwritten, so even though the memory location holding "00 nnn" is overwritten before the final 00 nnn code is processed, the algorithm can still pull the correct bits out of the value it read earlier (as opposed to having to re-read them from the now-overwritten memory location). The long direct encoding (ie. 111 nnnnnnnn bytes), on the other hand, will not work as the final processed code, because "111 nnnnnnnn" cannot fit within one byte - by the time both bytes get overwritten from processing the second-to-last code, the game would have already read the byte containing the "111" but not yet the "nnnnnnnn", so the nnnnnnnn is lost.
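Here's a toy sketch of the principle (all names invented, and it reads forward rather than in the game's actual stream direction, but the buffered-byte effect is the same): once a byte has been loaded into the bit reader's register, overwriting that byte in the shared buffer no longer matters - bits that straddle into a not-yet-loaded byte are the ones that get lost.

```python
class BitReader:
    """Reads bits MSB-first from a mutable buffer, loading one byte at a time."""
    def __init__(self, buf, pos):
        self.buf = buf      # shared in-place buffer (output overwrites this)
        self.pos = pos      # index of the next byte to load
        self.bits = 0       # bits already pulled out of the buffer
        self.nbits = 0      # how many buffered bits we hold

    def read(self, n):
        # Only touch the buffer when the register runs dry.
        while self.nbits < n:
            self.bits = (self.bits << 8) | self.buf[self.pos]
            self.pos += 1
            self.nbits += 8
        self.nbits -= n
        val = self.bits >> self.nbits
        self.bits &= (1 << self.nbits) - 1
        return val

# One byte holding a "00" tag followed by a 3-bit count: 0b00_011_010
buf = bytearray([0b00011010, 0xFF])
r = BitReader(buf, 0)
tag = r.read(2)     # loads byte 0 into the register; tag == 0b00
buf[0] = 0x00       # simulate decompressed output clobbering that byte
count = r.read(3)   # still correct (0b011): bits come from the register
```

If the tag and its payload had been split across two bytes (as with 111 nnnnnnnn), the second byte might still be sitting in the buffer when it gets clobbered, and `read` would load garbage.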
To be fair, it does seem theoretically possible to create LVL data for which a non-aware compression algorithm would use the long direct encoding there. (Indeed, even several consecutive short direct encodings there would become problematic.) I will have to do some more testing and investigation to better understand this.
Of course, for VGAGR files you could simply add some junk data that's never actually referred to by any objects, but for other files (especially levels) that could be problematic (however, it seems that so far, everything's managed to deal with it).
The junk data approach is probably the simplest way to handle this for VGAGR files. I don't believe a general method exists - in theory you could be given essentially incompressible data, such that there'd be no way to encode it in the compressed format without taking up more bytes than the original data. Whenever that's the case, it'd be impossible for the algorithm to work correctly using the same memory buffer for both.
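A quick back-of-the-envelope illustration of why incompressible data always grows: assuming the long direct encoding ("111" tag plus an 8-bit count, so raw blocks of at most 255 bytes - the exact count semantics are my assumption) is the cheapest way to store bytes that can't be referenced, every block adds 11 header bits on top of the raw payload.

```python
import math

def min_compressed_bits(n_raw_bytes, max_block=255):
    """Lower bound on compressed size for data stored purely as raw blocks.

    Assumes each block costs 3 tag bits + 8 count bits of overhead;
    max_block is a guess at the largest raw run one code can hold.
    """
    blocks = math.ceil(n_raw_bytes / max_block)
    return n_raw_bytes * 8 + blocks * 11

# 1000 incompressible bytes need at least 4 blocks -> 8044 bits > 8000 bits.
print(min_compressed_bits(1000))
```

So for data with no usable repetitions, the "compressed" form is strictly larger, and the in-place trick can't work.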
Also - a thought on the unused bytes in the DAT files: since they're big-endian, isn't it possible that the data sizes are just 4-byte values?
Well yes, the format may have been designed as such, but the DOS game definitely only uses the lower 2 bytes and ignores the upper 2. That's probably because real-mode DOS programs like Lemmings are 16-bit - conceivably the same file format could've been used across other platforms like Amiga, where 32-bit arithmetic might have been natively supported by the CPU and those ports could handle the data sizes as 32-bit values.