Misc Ideas - cr88192/bgbtech_misc GitHub Wiki

BTLZHZIP

Possible standalone stream compressor for BTLZH.

  • Would serve a similar purpose and have a similar interface to GZIP or XZ.
Possibility A:
  • Reuse GZIP format, but with BTLZH payload.
Possibility B:
  • Use BTIC CTLV or BTLV2
  • Compressed buffer would consist of a number of compressed segments, each encoded independently.
    • Segment size will be specified in a header, but is likely measured in MB for normal files.
    • Likely, the segment size will be between 1x and 4x the size of the sliding window.
FOURCC("BTLZ")
  • params:WORD
    • Indicates which optional parameters are present.
  • segSize:BYTE
    • log2 of segment size.
  • winSize:BYTE
    • log2 of window size.
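A minimal sketch of reading these fields, assuming they are stored contiguously right after the four tag bytes and that the WORD is little-endian; neither is stated in the notes. Any CTLV/BTLV2 length framing around the tag is not modeled, and `BtlzHeader` and `parse_btlz_header` are hypothetical names.

```python
import struct
from typing import NamedTuple

# Assumed layout: 4-byte tag, 16-bit little-endian params, two log2 bytes.
HEADER_FORMAT = "<4sHBB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes under this assumption


class BtlzHeader(NamedTuple):
    params: int    # bit flags: which optional parameters are present
    seg_log2: int  # log2 of segment size
    win_log2: int  # log2 of window size

    @property
    def seg_size(self) -> int:
        return 1 << self.seg_log2

    @property
    def win_size(self) -> int:
        return 1 << self.win_log2


def parse_btlz_header(buf: bytes) -> BtlzHeader:
    """Decode the BTLZ header fields from the start of buf."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer too short for BTLZ header")
    tag, params, seg_log2, win_log2 = struct.unpack_from(HEADER_FORMAT, buf, 0)
    if tag != b"BTLZ":
        raise ValueError("missing BTLZ tag")
    return BtlzHeader(params, seg_log2, win_log2)
```

For example, `seg_log2=22` with `win_log2=20` gives 4 MB segments over a 1 MB window, which sits at the 4x end of the range suggested above.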
Segment:
  • Each will consist of a Context-Dependent Data marker, which in turn holds compressed data in Zlib format.
  • Either Deflate or BTLZH or BTLZA may be used.
Possible:

(Possible) AdERk

Unary prefix followed by ((Q+1)*(k+(Q>>2))>>1) bits.

  • k=0: 0, 10, 110, 1110, 111100 111101 ...
  • / k=1: 00 01, 1000..1011, 110000..110111, 111000000..111011111, 111100000000..111101111111, ...
  • k=1: 0, 100 101, 1100..1101, 111000..111011, 11110000..11110111, ...
  • k=2: 00 01, 1000..1011, 110000..110111, 11100000..11101111, ...
  • ...
Uses same basic adaptation rules as AdRice.

Base (for Q):

  • k=0: 0, 1, 2, 3, 4, 6, ...
  • k=1: 0, 1, 3, 5, 9, ...
  • k=2: 0, 2, 6, 14, 30, ...
Issue: Sucks.

ARk2

Idea: Scheme similar to AdRice, but tweaking the code scheme.

As before, the Rk factor will specify a constant number of extra bits.

So, Prefixes:

  • 0, 0
  • 1, 10
  • 2, 110
  • 3, 1110
  • 4+, 1111 (Gamma)
    • Encode Q-4 using a Gamma code.
    • Gamma code consists of a Unary code (N) followed by N bits.
Issue: Can't reuse Rice LUTs.
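A rough sketch of the ARk2 prefix scheme as an encoder to bit strings. Several details are assumptions not given in the notes: Q is taken as `value >> k` (as in Rice coding), the k extra bits follow the prefix, a unary code for N is N one-bits followed by a zero, and the Gamma code maps a value v by splitting v+1 into its bit length and low bits. The function names are made up for illustration.

```python
def unary(n: int) -> str:
    """Assumed unary form: n one-bits terminated by a zero."""
    return "1" * n + "0"


def gamma(v: int) -> str:
    """Unary code N followed by N bits (assumed mapping: encodes v + 1)."""
    u = v + 1
    n = u.bit_length() - 1           # number of suffix bits
    low = u - (1 << n)               # u without its leading one-bit
    suffix = format(low, "b").zfill(n) if n else ""
    return unary(n) + suffix


def ark2_encode(value: int, k: int) -> str:
    """Encode value with a fixed count of k extra bits after the prefix."""
    q = value >> k
    if q < 4:
        prefix = unary(q)            # 0, 10, 110, 1110
    else:
        prefix = "1111" + gamma(q - 4)
    extra = format(value & ((1 << k) - 1), "b").zfill(k) if k else ""
    return prefix + extra
```

Under these assumptions, Q=4 costs 5 bits and Q=7 costs 9 bits, so the prefix grows logarithmically past the 1111 escape rather than linearly as in plain Rice.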

AdRiceDC

AdRice + Damage Control

AdRiceDC:

 Q=0..7: Encoded as AdRice
 Otherwise, use a VLI (with multiples of 5 bits)
     0-   31: 1111-1111 0xxx-xx
     32-1023: 1111-1111 10xx-xxxx xxxx
  1024-32767: 1111-1111 110x-xxxx xxxx-xxxx xx
 32768-   1M: 1111-1111 1110-xxxx xxxx-xxxx xxxx-xxxx
    1M-  32M: 1111-1111 1111-0xxx xxxx-xxxx xxxx-xxxx xxxx-xxxx xxxx-xx
   32M-   1G: 1111-1111 1111-10xx xxxx-xxxx xxxx-xxxx xxxx-xxxx xxxx-xxxx xxxx

In the VLI case, k is increased by 2 for each group of 5 bits in the suffix.
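A sketch of the escape path in the layout above, in its initial form (fixed cutoff of 8, 5-bit groups). The notes do not say what the VLI payload represents (Q, Q-8, or the whole symbol), so the hypothetical `vli_escape` below just takes the payload `v` as given and picks the smallest group count whose suffix can hold it.

```python
MAX_GROUPS = 6  # the layout shows suffixes of 5 to 30 bits


def vli_escape(v: int) -> tuple[str, int]:
    """Return (bit string, k increase) for payload v on the VLI path."""
    if v < 0:
        raise ValueError("payload must be non-negative")
    groups = 1
    while v >= (1 << (5 * groups)):
        groups += 1
    if groups > MAX_GROUPS:
        raise ValueError("payload needs more than 30 bits")
    escape = "1" * 8                      # Q=8 unary run marks the escape
    length = "1" * (groups - 1) + "0"     # unary count of 5-bit groups
    suffix = format(v, "b").zfill(5 * groups)
    k_increase = 2 * groups               # +2 per 5-bit group
    return escape + length + suffix, k_increase
```

The point of the escape is that a badly mispredicted symbol costs at most 44 bits here, instead of a unary run proportional to Q.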

For AdRice:

  • 0: k=k-1
  • 1: k=k
  • 2: k=k+1
  • 3: k=k+1
  • 4..7: k=k+2
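The adaptation table above as a small function. Clamping k at zero is an assumption (the notes do not say what happens when k is already 0), and `adapt_k` is a hypothetical name.

```python
def adapt_k(k: int, q: int) -> int:
    """Update k after coding a symbol whose unary prefix length was q."""
    if q < 0:
        raise ValueError("q must be non-negative")
    if q == 0:
        return max(k - 1, 0)   # assumed floor at k=0
    if q == 1:
        return k
    if q <= 3:
        return k + 1
    if q <= 7:
        return k + 2
    raise ValueError("Q >= 8 is handled by the VLI path")
```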
Issue:
  • Works, but contrived.
Update:
  • Initial form was distribution sensitive, better for some cases, but worse for others.
  • Changing cutoff from 8 to (8+k) and 5 to (5+k) seems to have improved results.