# Appendix: Palette Prediction
A palette refers to a subset of the color space. Palette prediction is used to reduce the redundancy in coding pixel information within a given block. Palette prediction is suitable for applications where most colors come from a very small subset of the color space (e.g. PC screen content), where there is little or no noise in the pictures, and where the pictures may involve repetitive patterns.
In AV1, a palette specifies the most likely colors present in a given block. Palette prediction makes use of separate color index maps with 2 to 8 base colors for each of the Y, U and V planes, and is applicable only to blocks that are 8x8 or larger and whose width and height are both smaller than 128.
The context associated with each pixel in the block is used to encode the palette index for that pixel. Wavefront processing of the pixels is considered to allow for parallel processing. The bitstream includes the number of base colors and the base colors used in encoding a given block.
To illustrate the idea behind palette prediction, consider the 4x4 block shown in the figure below. In this case the palette consists of three colors with indices 0, 1 and 2. Based on the selected palette, the index map of the block can be generated as shown below. The encoding of the indices then proceeds in a wavefront manner as indicated below.
Figure 1. Example of a 4x4 source block, corresponding palette, index map and wavefront processing pattern.
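To make the index-map construction concrete, the sketch below assigns each pixel of a hypothetical 4x4 luma block the index of its closest palette color (closest in absolute sample difference). The block values, the three base colors and the function name are invented for the example; this is an illustration, not encoder code.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative helper: assign each pixel the index of its closest palette
   color (closest in absolute sample difference). */
static void build_index_map(const uint8_t *block, int width, int height,
                            const uint8_t *palette, int palette_size,
                            uint8_t *index_map)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            const int pix = block[y * width + x];
            int best_idx  = 0;
            int best_dist = abs(pix - palette[0]);
            for (int i = 1; i < palette_size; i++) {
                const int dist = abs(pix - palette[i]);
                if (dist < best_dist) {
                    best_dist = dist;
                    best_idx  = i;
                }
            }
            index_map[y * width + x] = (uint8_t)best_idx;
        }
    }
}

int main(void)
{
    /* Hypothetical 4x4 luma block containing three distinct colors. */
    const uint8_t block[16] = {
         10,  10, 128, 128,
         10, 128, 128, 255,
        128, 128, 255, 255,
        128, 255, 255, 255
    };
    const uint8_t palette[3] = { 10, 128, 255 };  /* base colors, indices 0..2 */
    uint8_t index_map[16];

    build_index_map(block, 4, 4, palette, 3, index_map);

    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            printf("%d ", index_map[y * 4 + x]);
        printf("\n");
    }
    return 0;
}
```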
The index for each pixel is encoded using the top and left encoded indices as context, as shown in the table below.
Pixel | Context |
---|---|
0 | - |
1 | 0 |
2 | 0 |
3 | 1 |
4 | 1, 2 |
5 | 2 |
6 | 3 |
… | |
15 | 13, 14 |
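The pattern in the table above can be reproduced with the short sketch below, which numbers the pixels of a 4x4 block in anti-diagonal (wavefront) order and prints, for each pixel, the already-coded top and left neighbors that serve as its context. The code only illustrates the ordering; it does not reflect the encoder's actual data structures.

```c
#include <stdio.h>

#define BW 4  /* block width  */
#define BH 4  /* block height */

int main(void)
{
    int order[BH][BW];                   /* coding order assigned to each position */
    int row_of[BW * BH], col_of[BW * BH];
    int idx = 0;

    /* Wavefront numbering: walk the anti-diagonals from the top-left corner,
       and each diagonal from its bottom-left end upwards. */
    for (int d = 0; d <= (BW - 1) + (BH - 1); d++) {
        for (int r = (d < BH ? d : BH - 1); r >= 0; r--) {
            const int c = d - r;
            if (c < 0 || c >= BW)
                continue;
            order[r][c] = idx;
            row_of[idx] = r;
            col_of[idx] = c;
            idx++;
        }
    }

    /* The context of a pixel is formed from its already-coded left and top
       neighbors, when those neighbors fall inside the block. */
    printf("Pixel | Context\n");
    for (int p = 0; p < BW * BH; p++) {
        const int r = row_of[p], c = col_of[p];
        printf("%5d | ", p);
        if (r == 0 && c == 0) {
            printf("-");
        } else {
            if (c > 0) printf("%d ", order[r][c - 1]);  /* left neighbor */
            if (r > 0) printf("%d",  order[r - 1][c]);  /* top neighbor  */
        }
        printf("\n");
    }
    return 0;
}
```

Running the sketch prints exactly the pixel/context pairs listed in the table.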
Inputs: Input source video
Outputs: Palette information (palette size, colors, and map tokens)
Control macros/flags:
Flag | Level (Sequence/Picture) | Description |
---|---|---|
--palette | Configuration | Enables palette prediction from the CLI. 0: off; 1: slow; … ; 6: fastest. Defaults to auto mode (-1) if not set in the configuration. |
palette_mode | Picture | Set based on the configured palette mode. For auto mode it is set to 6 for M0. |
The feature is currently active only:
- For encoder mode 0.
- When screen content encoding is active, either through
  - Setting screen content encoding to Auto mode, where the screen-content type of each picture is flagged based on detector information, or
  - Setting screen content encoding to Manual mode, where the input sequence is encoded as screen content.
Palette prediction applies to both Intra and Inter frames.
## Details of the implementation
### Candidates Injection

For the blocks where palette is enabled (based on block size and only when screen content encoding is set), as indicated by `svt_av1_allow_palette`, the function `inject_palette_candidates` is invoked to create and inject palette candidates. The candidates are signaled using the Intra DC mode. This function calls `search_palette_luma` to determine all palette candidates.
The palette prediction candidates are determined by performing two types of search, namely a search based on the most dominant colors and a search based on the K-means clustering of the colors in the block.
Most dominant colors search: In this search, a histogram of the colors in the source block is generated. The number of most-used colors picked from the histogram depends on the desired palette size, which ranges from a minimum of 2 to a maximum of 8 colors.
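As a rough illustration of the dominant-colors search, the sketch below builds a histogram of the 8-bit samples in a block and keeps the N most frequent values as palette colors. The function name and the simple repeated-maximum selection are illustrative assumptions, not the encoder's exact heuristics.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: pick the num_colors most frequent 8-bit sample values of
   the source block as palette base colors. Returns how many were found (the
   block may contain fewer distinct values than requested). */
static int pick_dominant_colors(const uint8_t *block, int num_pixels,
                                int num_colors, uint8_t *palette)
{
    int hist[256];
    memset(hist, 0, sizeof(hist));

    for (int i = 0; i < num_pixels; i++)
        hist[block[i]]++;

    int found = 0;
    for (int n = 0; n < num_colors; n++) {
        int best_val = -1, best_count = 0;
        for (int v = 0; v < 256; v++) {
            if (hist[v] > best_count) {
                best_count = hist[v];
                best_val   = v;
            }
        }
        if (best_val < 0)
            break;                  /* fewer distinct colors than requested */
        palette[found++] = (uint8_t)best_val;
        hist[best_val]   = 0;       /* exclude it from the next pass */
    }
    return found;
}

int main(void)
{
    const uint8_t block[16] = {
         10,  10, 128, 128,
         10, 128, 128, 255,
        128, 128, 255, 255,
        128, 255, 255, 255
    };
    uint8_t palette[8];
    const int n = pick_dominant_colors(block, 16, 3, palette);
    for (int i = 0; i < n; i++)
        printf("%d ", palette[i]);
    printf("\n");
    return 0;
}
```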
K-means clustering of the colors: A K-means algorithm is used to cluster the colors of the input source block into a fixed number of clusters. The number of clusters to consider depends on the desired palette size. The algorithm runs for a maximum of 50 iterations to converge on the best clustering for a given number of clusters.
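The clustering step can be pictured with the minimal 1-D K-means sketch below; the 50-iteration cap follows the description above, while the initialization, convergence test and function name are illustrative assumptions.

```c
#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define MAX_ITERS 50  /* iteration cap mentioned in the text */

/* Illustrative 1-D K-means: cluster the samples of a block into k clusters
   and return the rounded cluster centers as palette colors. */
static void kmeans_palette(const uint8_t *block, int num_pixels,
                           int k, uint8_t *palette)
{
    double centers[8];
    int    assign[64 * 64];  /* enough for the block sizes discussed here */

    /* Spread the initial centers evenly over the 8-bit range. */
    for (int c = 0; c < k; c++)
        centers[c] = 255.0 * (c + 0.5) / k;

    for (int iter = 0; iter < MAX_ITERS; iter++) {
        int changed = 0;

        /* Assignment step: attach every sample to its nearest center. */
        for (int i = 0; i < num_pixels; i++) {
            int best = 0;
            double best_d = fabs((double)block[i] - centers[0]);
            for (int c = 1; c < k; c++) {
                const double d = fabs((double)block[i] - centers[c]);
                if (d < best_d) { best_d = d; best = c; }
            }
            if (iter == 0 || assign[i] != best) {
                assign[i] = best;
                changed = 1;
            }
        }

        /* Update step: move each center to the mean of its samples. */
        for (int c = 0; c < k; c++) {
            double sum = 0.0;
            int count = 0;
            for (int i = 0; i < num_pixels; i++) {
                if (assign[i] == c) { sum += block[i]; count++; }
            }
            if (count > 0)
                centers[c] = sum / count;
        }

        if (!changed)
            break;  /* converged before hitting the iteration cap */
    }

    for (int c = 0; c < k; c++)
        palette[c] = (uint8_t)(centers[c] + 0.5);
}

int main(void)
{
    const uint8_t block[16] = {
         12,   9, 126, 131,
         11, 127, 129, 250,
        125, 130, 252, 254,
        128, 251, 255, 253
    };
    uint8_t palette[8];
    kmeans_palette(block, 16, 3, palette);
    for (int c = 0; c < 3; c++)
        printf("%d ", palette[c]);
    printf("\n");
    return 0;
}
```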
Given that the number of colors should be in the range [2..8], up to 7 candidates can be generated from each of the two methods described above. In total, up to 14 candidates are created for each source block.
In mode decision, palette prediction candidates are assigned a special MD candidate class.
The following quality-complexity trade-offs are present in the code. The trade-off is realized by adjusting the number of input candidates (NIC) in the last MD stage and by restricting the search options for the best number of colors. The NIC is expressed as a triplet x/y/z corresponding to base layer pictures, reference pictures and non-reference pictures, respectively (e.g. the number of input palette prediction candidates for reference pictures at the last MD stage is y). The input candidates can be a mix of candidates generated by the two search methods outlined above.
palette_mode | Description |
---|---|
1:Slow | NIC=7/4/4 |
2 | NIC=7/2/2 |
3 | NIC=7/2/2 + No K-means injection for non-reference frames |
4 | NIC=4/2/1 |
5 | NIC=4/2/1 + No K-means for Inter frame |
6:Fastest | NIC=4/2/1 + No K-means for non-base + step_2 for non-base for most dominant colors |
For `palette_mode` 6, step 2 refers to a method of selecting the subset of the most dominant colors to include in the palette. For example, if the block involves 7 colors, then only three candidates with palettes based on the most dominant 7, 5 and 3 colors are injected, and the candidates based on the most dominant 6 and 4 colors are discarded.
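A possible reading of this step-2 selection is sketched below: candidate palette sizes are generated from the number of distinct colors in the block with a step of two instead of one, so a block with 7 colors yields the 7-, 5- and 3-color candidates only. The function name and constants are illustrative.

```c
#include <stdio.h>

#define MIN_PALETTE_COLORS 2
#define MAX_PALETTE_COLORS 8

/* Illustrative: list the candidate palette sizes for a block that contains
   num_block_colors distinct colors, either with the full step-1 sweep or
   with the step-2 sweep used by the fastest palette mode. */
static int candidate_sizes(int num_block_colors, int step, int *sizes)
{
    int n = 0;
    const int max_size = num_block_colors < MAX_PALETTE_COLORS
                       ? num_block_colors : MAX_PALETTE_COLORS;
    for (int s = max_size; s >= MIN_PALETTE_COLORS; s -= step)
        sizes[n++] = s;
    return n;
}

int main(void)
{
    int sizes[8];

    /* A block with 7 distinct colors, step 2: only 7, 5 and 3 are kept. */
    int n = candidate_sizes(7, 2, sizes);
    for (int i = 0; i < n; i++)
        printf("%d ", sizes[i]);
    printf("\n");

    /* The same block with step 1 would also try 6, 5, 4, 3 and 2. */
    n = candidate_sizes(7, 1, sizes);
    for (int i = 0; i < n; i++)
        printf("%d ", sizes[i]);
    printf("\n");
    return 0;
}
```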
The most important signals/parameters sent in the bitstream for palette prediction are the following:
- Size of the palette: How many colors are used for the current block.
- The palette colors:
  - Special techniques are used to code the colors in the bitstream.
  - The encoder and decoder maintain a cache of colors computed based on the neighboring colors. Information on whether some colors can be re-used from the cache is sent in the bitstream using fewer bits than signaling the colors directly (see the sketch after this list).
  - Moreover, in the search, some palette colors are changed to use one of the cache elements when the two are close to each other.
- The palette index map
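To illustrate the color-cache idea referenced above, the sketch below emits one reuse flag per cache entry and counts how many palette colors would still need to be sent explicitly. It is a conceptual sketch only and does not follow the actual AV1 palette syntax; the cache contents and function name are invented for the example.

```c
#include <stdio.h>
#include <stdint.h>

/* Conceptual sketch: split a palette into colors that can be re-used from a
   cache of neighboring colors (signaled with 1-bit flags) and colors that
   must be transmitted explicitly. Not the actual AV1 syntax. */
static void signal_palette(const uint8_t *palette, int palette_size,
                           const uint8_t *cache, int cache_size)
{
    int reused = 0;

    for (int c = 0; c < cache_size; c++) {
        int found = 0;
        for (int p = 0; p < palette_size; p++) {
            if (palette[p] == cache[c]) {
                found = 1;
                break;
            }
        }
        printf("cache[%d]=%3d reuse_flag=%d\n", c, cache[c], found);
        reused += found;
    }

    printf("colors sent explicitly: %d of %d\n",
           palette_size - reused, palette_size);
}

int main(void)
{
    const uint8_t cache[4]   = { 10, 64, 128, 200 };  /* from neighboring blocks */
    const uint8_t palette[3] = { 10, 128, 255 };      /* current block palette   */

    signal_palette(palette, 3, cache, 4);
    return 0;
}
```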