In regard to my last post – which, after making the video, I somehow never completed…
One of the cruxes of VTerm was the relatively large amount of data (haha – “large amount” meaning one or two thousand bytes) that needed to be transferred from MAME to PiTrex.
My first example was Lunar Lander, which has a surprisingly high vector count for such an old game – easily up to 500 vectors. The data PiTrex received per frame was approximately 1700–1800 bytes every 1/50 second (over a 912000 baud serial connection).
Right now, I’m at the Baltic Sea again. I don’t have a Vectrex with me – but I still feel the urge to follow up.
Size Reduction
Therefore, I have been communicating with ChatGPT and, with its help, implemented:
- RLE
- LZ4 algorithm
- Huffman encoding
- Binary DIFF
RLE
RLE (Run-Length Encoding) is a simple compression method that replaces sequences of the same data value with a single value and a count.
Example: AAAABBBCCDAA
→ 4A3B2C1D2A
Ok. No. I lied – I did not. Run-Length Encoding is probably not a good way to reduce the size of “arbitrary” vector data.
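For completeness, the scheme from the example above is trivial to sketch (this is my own illustration, not code from VTerm):

```python
def rle_encode(data: str) -> str:
    # Collapse runs of identical characters into count+character pairs,
    # e.g. "AAAABBBCCDAA" -> "4A3B2C1D2A".
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out.append(f"{j - i}{data[i]}")
        i = j
    return "".join(out)
```

The problem is visible right in the example: a run of length 1 (the single “D”) costs two output characters, and arbitrary vector coordinates rarely repeat back-to-back.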
LZ4
LZ (Lempel-Ziv) and LZ4 are compression methods that replace repeated data with references (offset and length) to earlier occurrences in the data. LZ4 is a fast variant optimized for speed over maximum compression.
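To make the offset/length idea concrete, here is a toy LZ77-style round trip – deliberately not the real LZ4 block format, just a greedy sketch of the principle (all names are mine):

```python
def lz_compress(data: bytes, window: int = 255) -> list:
    # Greedy LZ77: emit (offset, length, next_byte) tokens, where the
    # offset points back into already-emitted output.
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        for off in range(max(0, i - window), i):
            length = 0
            while (i + length < len(data)
                   and data[off + length] == data[i + length]
                   and length < 255):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - off
        nxt = data[i + best_len] if i + best_len < len(data) else None
        tokens.append((best_off, best_len, nxt))
        i += best_len + 1
    return tokens

def lz_decompress(tokens) -> bytes:
    out = bytearray()
    for off, length, nxt in tokens:
        for _ in range(length):
            out.append(out[-off])   # copy from earlier output (may overlap)
        if nxt is not None:
            out.append(nxt)
    return bytes(out)
```

The real LZ4 format packs these tokens into a compact byte layout and uses hashing instead of a brute-force match search, but the back-reference mechanism is the same.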
Huffman
Huffman encoding is a compression method that assigns shorter binary codes to more frequent symbols and longer codes to less frequent ones, reducing overall data size.
A noteworthy point about Huffman encoding is that it always requires a dictionary – one that specifies how many bits, and which bits, represent each original value.
The Huffman code I used is based on a 256-byte dictionary.
So, each packed data block also includes 256 bytes that have nothing to do with the original content.
It might be possible to find a fixed dictionary that works well. If such a fixed dictionary is used on both ends (encoding and decoding), the size of Huffman-compressed data could be reduced by those 256 bytes.
(I haven’t started an analysis yet – perhaps even a separate dictionary per game?)
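The exact dictionary layout used in VTerm isn’t specified above; the following sketch assumes the 256 bytes store one code length per possible byte value (as in canonical Huffman coding), which is enough to estimate where the overhead comes from:

```python
import heapq
from collections import Counter

def huffman_code_lengths(data: bytes) -> dict:
    # Build a Huffman tree and return the code length (in bits)
    # for each byte value that occurs in `data`.
    freq = Counter(data)
    if len(freq) == 1:                       # degenerate case: one symbol
        return {next(iter(freq)): 1}
    # Heap entries: (frequency, tiebreaker, symbols in this subtree).
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freq}
    uid = len(heap)
    while len(heap) > 1:
        fa, _, sa = heapq.heappop(heap)
        fb, _, sb = heapq.heappop(heap)
        for s in sa + sb:                    # every symbol under the merge
            lengths[s] += 1                  # moves one bit deeper
        heapq.heappush(heap, (fa + fb, uid, sa + sb))
        uid += 1
    return lengths

def huffman_packed_size(data: bytes) -> int:
    # Compressed payload (rounded up to whole bytes) plus a fixed
    # 256-byte table holding one code length per possible byte value.
    lengths = huffman_code_lengths(data)
    payload_bits = sum(lengths[b] for b in data)
    return (payload_bits + 7) // 8 + 256
```

For frames of only a few hundred bytes, that fixed 256-byte table can easily eat the entire saving – which is exactly what the measurements below show.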
Diff
Diff encoding stores only the differences between two versions of data (e.g. current and previous), instead of the full new version. This reduces size when changes are small.
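The actual binary DIFF format isn’t described here, so as an illustration only: a minimal sketch that assumes equal-length frames and encodes each changed run of bytes as an (offset, bytes) pair:

```python
def diff_encode(prev: bytes, cur: bytes) -> list:
    # Emit (offset, changed-bytes) runs; assumes frames of equal length.
    assert len(prev) == len(cur)
    runs = []
    i = 0
    while i < len(cur):
        if cur[i] != prev[i]:
            j = i
            while j < len(cur) and cur[j] != prev[j]:
                j += 1
            runs.append((i, cur[i:j]))
            i = j
        else:
            i += 1
    return runs

def diff_apply(prev: bytes, runs) -> bytes:
    # Reconstruct the current frame from the previous one plus the runs.
    out = bytearray(prev)
    for off, chunk in runs:
        out[off:off + len(chunk)] = chunk
    return bytes(out)
```

When most of the screen is static between two frames (the usual case at 50 Hz), only the few moving vectors produce runs, which is why this approach wins below.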
Mode
Another variant I experimented with is how the data to be sent is represented.
So far, I’ve tried three different approaches:
- Full Lines
  Each line is represented by an absolute start point (x0, y0) and an absolute end point (x1, y1).
- Half Delta
  Each line is represented by an absolute start point (x0, y0) and delta values (dx, dy) relative to that point.
- Full Delta
  Each line is represented entirely by deltas: a delta from the last known position (starting at (0,0)) to the start point, giving (dx0, dy0), followed by a delta from there to the end point (dx1, dy1).
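The three representations carry the same information and convert losslessly into one another. A sketch of the Full Delta packing (helper names are mine, lines given as (x0, y0, x1, y1) tuples):

```python
def to_full_delta(lines):
    # Full Delta: (dx0, dy0) moves from the previous pen position
    # (starting at 0,0) to the line's start, (dx1, dy1) draws to its end.
    px, py = 0, 0
    out = []
    for x0, y0, x1, y1 in lines:
        out.append((x0 - px, y0 - py, x1 - x0, y1 - y0))
        px, py = x1, y1
    return out

def from_full_delta(deltas):
    # Inverse: integrate the deltas back into absolute coordinates.
    px, py = 0, 0
    out = []
    for dx0, dy0, dx1, dy1 in deltas:
        x0, y0 = px + dx0, py + dy0
        x1, y1 = x0 + dx1, y0 + dy1
        out.append((x0, y0, x1, y1))
        px, py = x1, y1
    return out
```

A plausible reason Full Delta compresses so much better under LZ4/Huffman: a shape that merely moves across the screen keeps identical deltas from frame to frame, so the byte stream contains far more repetition than absolute coordinates would.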
As it turns out, the mode of representation makes a big difference when applying LZ4 or Huffman encoding – but almost no difference when doing “diffs.”
I also tried stacking different encodings – for example, applying LZ4 first and then Huffman (resulting in LZH encoding). However, due to the relatively small size of the data, this usually yields little to no benefit. In some cases, the final size is even larger than the original uncompressed data, often due to the fact that Huffman encoding always adds 256 bytes for the dictionary.
Result
In my experiments (I’ve only tested a few games so far), the simple DIFF encoding is by far the most effective. Adding Huffman compression could help – if a fixed dictionary could be found. Without a fixed dictionary, however, Huffman always increases the data size!
Although the chosen mode of representation doesn’t affect the DIFF method as much as it does the other encodings, all the data clearly shows that the combination of Full Delta mode + DIFF results in very efficient data transfer.
On average:
- Lunar Lander
- Without packing: ~1750 bytes
- With DIFF + Full Delta: ~150 bytes
- Major Havoc
- Without packing: ~1500 bytes (in-game), up to ~3500 bytes (title/other screens)
- With DIFF + Full Delta: ~300 bytes
- Star Wars
- Without packing: ~1500–2000 bytes (space scenes), 2000–2800 bytes (trench scenes)
- With DIFF + Full Delta: ~320 bytes
Some Data:
Lunar Lander
After about 250 “screens”
Mode: Full Lines
- LZ4 average size: 1583 bytes
- Huffman average size: 1669 bytes
- DIFF average size: 155 bytes
- DIFF + Huffman average size: 343 bytes
Mode: Half Delta
- LZ4 average size: 1357 bytes
- Huffman average size: 1271 bytes
- DIFF average size: 154 bytes
- DIFF + Huffman average size: 341 bytes
Mode: Full Delta
- LZ4 average size: 1101 bytes
- Huffman average size: 824 bytes
- DIFF average size: 151 bytes
- DIFF + Huffman average size: 341 bytes
Star Wars
After about 2000 screens in game, entering the trench.
Mode: Full Lines
- LZ4 average size: 2093 bytes
- Huffman average size: 2201 bytes
- DIFF average size: 513 bytes
- DIFF + Huffman average size: 650 bytes
Mode: Half Delta
- LZ4 average size: 1983 bytes
- Huffman average size: 1853 bytes
- DIFF average size: 578 bytes
- DIFF + Huffman average size: 682 bytes
Mode: Full Delta
- LZ4 average size: 1610 bytes
- Huffman average size: 1235 bytes
- DIFF average size: 319 bytes
- DIFF + Huffman average size: 460 bytes
Major Havoc
After about 2400 screens, walking in the first level.
Mode: Full Lines
- LZ4 average size: 1240 bytes
- Huffman average size: 1404 bytes
- DIFF average size: 505 bytes
- DIFF + Huffman average size: 647 bytes
Mode: Half Delta
- LZ4 average size: 1130 bytes
- Huffman average size: 1132 bytes
- DIFF average size: 430 bytes
- DIFF + Huffman average size: 541 bytes
Mode: Full Delta
- LZ4 average size: 925 bytes
- Huffman average size: 771 bytes
- DIFF average size: 305 bytes
- DIFF + Huffman average size: 438 bytes