entropy example 4

To understand the formula

<math>\label{approx}
R(B) \approx \frac{\frac 1nH_{\mu_B}(\Lambda^n)}{\log_2(\#\Lambda)}, </math>

consider a lossless compression algorithm applied to a message <math>B</math> of finite length <math>N\ .</math> To save on the output length, the decoding instructions must be relatively short compared to <math>N\ .</math> This is easily achieved by codes that refer only to relatively short components of the message. For example, the decoding instructions may consist of a complete list of the subwords <math>A</math> of some carefully chosen length <math>n</math> appearing in <math>B\ ,</math> together with their images <math>\Phi(A)\ .</math> The images may have different lengths (as short as possible). The image of <math>B</math> is obtained by cutting <math>B</math> into subwords <math>B=A_1A_2\dots A_k</math> of length <math>n</math> and concatenating the images of these subwords: <math>\Phi(B)=\Phi(A_1)\Phi(A_2)\cdots\Phi(A_k)\ .</math> (There is an additional issue here: in order for such a code to be invertible, the images <math>\Phi(A)</math> must form a prefix-free family, so that there is always a unique way of partitioning <math>\Phi(B)</math> back into the images <math>\Phi(A_i)\ .</math> But this does not essentially affect the computations.) For the best compression, it is reasonable to assign the shortest images to the subwords appearing in <math>B</math> with the highest frequencies. It can be proved, with the help of the Shannon-McMillan-Breiman theorem, that such a code achieves the compression rate \eqref{approx} for nearly all sufficiently long messages <math>B\ .</math>
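To make this concrete, here is a minimal Python sketch of the first step of such a scheme: it cuts a message into non-overlapping blocks of length <math>n\ ,</math> computes their empirical frequencies, and evaluates the right-hand side of \eqref{approx}. The function name `block_entropy_rate` and its interface are our own illustration, not part of any standard library.

```python
from collections import Counter
from math import log2

def block_entropy_rate(message, n, alphabet_size=2):
    """Estimate ((1/n) * H) / log2(#alphabet) from the empirical
    frequencies of the length-n blocks of `message`."""
    # Cut the message into consecutive, non-overlapping blocks of length n.
    blocks = [message[i:i+n] for i in range(0, len(message) - n + 1, n)]
    total = len(blocks)
    # Empirical entropy of the block distribution, in bits per block.
    H = -sum((c / total) * log2(c / total) for c in Counter(blocks).values())
    # Normalize to bits per symbol, relative to log2 of the alphabet size.
    return H / (n * log2(alphabet_size))
```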

For example, consider the binary message of length 30:

<math>
B \ = \ 011001111001001111010001111110 \ = \ 011,001,111,001,001,111,010,001,111,110. </math>

On the right, <math>B</math> is shown divided into subwords of length 3. The frequencies of the subwords in this division are:

<math>
\begin{matrix} 000 - 0 & 001 - 0.4 & 010 - 0.1 & 011 - 0.1 \\ 100 - 0 & 101 - 0\ \ \ & 110 - 0.1 & 111 - 0.3 \end{matrix} </math>
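These frequencies can be reproduced by a direct count of the ten blocks (a small Python check; the variable name `B` is ours):

```python
from collections import Counter

B = "011001111001001111010001111110"
print(Counter(B[i:i+3] for i in range(0, len(B), 3)))
# Counter({'001': 4, '111': 3, '011': 1, '010': 1, '110': 1})
```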

The theoretical value of the best compression ratio, obtained from formula \eqref{approx} with <math>n=3</math> (here <math>\log_2(\#\Lambda)=1</math> for the binary alphabet, so the block entropy is divided by <math>3</math>), is

<math>
\frac{-0.4\log_2(0.4)-0.3\log_2(0.3)-3\cdot 0.1\log_2(0.1)}{3}\approx 68.2\%. </math>
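Numerically, the sketch above gives the same value (reusing `block_entropy_rate` and `B` defined earlier):

```python
print(round(block_entropy_rate(B, 3), 3))  # 0.682, i.e. about 68.2%
```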

A binary prefix-free code giving the shortest images to the most frequent subwords is

<math>
\begin{matrix} 001 &\mapsto& 0, \\ 111 &\mapsto& 10, \\ 010 &\mapsto& 110, \\ 011 &\mapsto& 1110, \\ 110 &\mapsto& 1111. \end{matrix} </math>
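A quick way to confirm that this family is prefix-free is to check that no image is an initial segment of another (the dictionary name `code` is ours):

```python
code = {"001": "0", "111": "10", "010": "110", "011": "1110", "110": "1111"}
images = list(code.values())
# Prefix-free: no image is a proper prefix of a different image.
assert not any(a != b and b.startswith(a) for a in images for b in images)
```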

The image of <math>B</math> by this code is:

<math>
\Phi(B)= 1110,0,10,0,0,10,110,0,10,1111 = 111001000101100101111 </math>

The compression rate achieved on <math>B</math> using this code equals <math>21/30 = 70\%\ ,</math> close to the theoretical value of <math>68.2\%\ ,</math> so the code is quite efficient on <math>B\ .</math>
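Encoding is just a block-by-block table lookup; the following lines reproduce <math>\Phi(B)</math> and the 70% figure (reusing `B` and `code` from above; `phi_B` is our own name):

```python
phi_B = "".join(code[B[i:i+3]] for i in range(0, len(B), 3))
print(phi_B)                # 111001000101100101111
print(len(phi_B) / len(B))  # 0.7, i.e. a 70% compression rate
```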

Remark: The image given here does not include the decoding instructions, which consist simply of a record of the above prefix-free code. One has to imagine that <math>B</math> is much longer; the decoding instructions then have a fixed length and do not affect the compression ratio.
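Because the images form a prefix-free family, <math>\Phi(B)</math> can be parsed back greedily from left to right, with no separators needed. A sketch of such a decoder, assuming the `code` table and `phi_B` from above:

```python
def decode(bits, code):
    """Greedily parse a prefix-free encoding back into the original blocks."""
    inverse = {image: block for block, image in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # a complete image has been read off
            out.append(inverse[current])
            current = ""
    assert current == "", "leftover bits: not a valid encoding"
    return "".join(out)

assert decode(phi_B, code) == B  # recovers the original message
```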
