Misc. technical details - HelpSeeker/Restricted-WebM GitHub Wiki

This page is a collection of thoughts / technical details for those interested.


Table of contents

  1. General process
  2. Audio bitrate
  3. Bitrate modes
  4. Multithreading
  5. User filters
  6. Colormatrix
  7. AV1

General process

Here's a quick overview of the process behind this script.

  -----------      ----------------------      -----------      -------------      ------------------  
  |  INPUT  |  ->  |  GENERAL SETTINGS  |  ->  |  LIMIT  |  ->  |  ENHANCE  |  ->  |  FINAL OUTPUT  |  
  -----------      ----------------------      -----------      -------------      ------------------  

The goal is to produce an output that is smaller than the max. size limit (--size) and bigger than the min. size limit = undershoot ratio * max size (--undershoot). It's easier to split this goal into two parts and focus on one at a time.
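
The two limits can be expressed in a few lines of Python (a minimal sketch, not the script's actual code; it assumes --size is given in MiB and the helper name is hypothetical):

```python
def size_limits(max_size_mib, undershoot_ratio):
    """Return (min_size, max_size) in bytes for a given --size and --undershoot."""
    max_size = int(max_size_mib * 1024 * 1024)
    min_size = int(max_size * undershoot_ratio)
    return min_size, max_size

# e.g. a 3 MiB limit with an undershoot ratio of 0.75
min_size, max_size = size_limits(3, 0.75)
```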

First we collect the general settings. Those settings will stay the same for all encoding attempts. They are

  • Input settings (input/output path)
  • Trim settings (start time, output duration)
  • Map settings (stream selection)
  • Audio settings (channel bitrate, stream bitrate, sample rate)
  • Initial video bitrate

With those settings gathered we can start limiting the output file size to be below the max. size limit.

To limit the output size we go through up to 3 bitrate modes (all of them in the worst case).

  • VBR / CQ + qmax (minimum quality)
  • VBR / CQ
  • CBR

Each bitrate mode gets i encoding attempts (--iterations) to reduce the file size below the max. size limit, and each encoding attempt includes the following steps:

  • set bitrate (either initial one for the first attempt of a mode or calculated based on the previous one and how much the resulting size exceeds the limit; a min. reduction of 10% is enforced)
  • reduce height / frame rate based on bitrate and bpp threshold
  • assemble video settings
  • assemble filter string
  • convert input to temporary file
  • read temp file size
  • if it's the smallest file yet, rename the temp file to the output and save its size as the best try
  • if the current temp size is within 1% of the previous one, skip ahead to the next bitrate mode
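
The bitrate adjustment and the skip-ahead check above can be sketched in Python (hypothetical helpers, not the script's actual code):

```python
def adjusted_bitrate(old_bitrate, attempt_size, max_size):
    """Scale the bitrate down proportionally to the overshoot,
    but enforce a minimum reduction of 10% per attempt."""
    scaled = int(old_bitrate * max_size / attempt_size)
    capped = int(old_bitrate * 0.9)
    return min(scaled, capped)

def should_skip_mode(last_size, current_size):
    """Skip to the next bitrate mode if the size barely changed (within 1%)."""
    return abs(current_size - last_size) / last_size < 0.01
```

For example, a 1000 Kbps attempt that produced a 4 MB file against a 3 MB limit gets scaled down to 750 Kbps, while one that overshot by only 2% still gets cut to 900 Kbps due to the 10% floor.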

The first output below the max. limit will terminate the limiting process. If the smallest file still exceeds the max. limit after all 3 modes, then the script will print an error and move on to the next input.

Provided the output is small enough, it enters the enhance process. Its purpose is to raise the output size above the min. limit. If the file is already big enough after the limit process, the enhance process will immediately terminate and the script will move on to the next input.

Otherwise the enhance process has another i encoding attempts to raise the output size, using the bitrate mode that produced the first small enough output. Those attempts go through the same stages as before, except that

  • the first bitrate already gets calculated based on the last bitrate/size combination
  • the temp size needs to be smaller than the max. limit, but also bigger than the current best try, to become the new best try

The first output in between the min. and max. size limit will terminate the enhance process. Then the script moves on to the next file.
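
The acceptance and termination conditions of the enhance process can be sketched like this (illustrative helpers only, not the script's actual code):

```python
def is_new_best(temp_size, best_size, max_size):
    """During enhancing, a temp file only becomes the new best try if it
    still fits under the max. limit AND beats the current best size."""
    return temp_size < max_size and temp_size > best_size

def enhance_done(best_size, min_size, max_size):
    """The enhance process terminates once the best try lies between the limits."""
    return min_size <= best_size <= max_size
```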

If the output size is still too small after the enhance process, then the script will print an error message and move on as well.

Audio bitrate

Just like the initial video bitrate, the audio bitrate gets chosen based on a pre-defined formula. This formula takes the max. size limit, the output duration and the number of audio channels (of all output audio streams) into consideration. To reduce the amount of variable settings for each encoding attempt, the audio bitrate gets chosen in the beginning and stays the same throughout all attempts.

When I talk about the audio bitrate, I actually mean the audio channel bitrate. This value, multiplied by the number of channels of the stream, results in the bitrate that is used to encode the stream.

Here's the actual decision process:

factor = max_size_in_Bytes * 8 / (duration_in_sec * audio_factor * audio_channels * 4 * 1000)

factor  <   1 : bitrate =  6 Kbps
factor  <   2 : bitrate =  8 Kbps
factor  <   3 : bitrate = 12 Kbps
factor  <   4 : bitrate = 16 Kbps
factor  <   6 : bitrate = 24 Kbps
factor  <   8 : bitrate = 32 Kbps
factor  <  28 : bitrate = 48 Kbps
factor  <  72 : bitrate = 64 Kbps
factor  < 120 : bitrate = 80 Kbps
factor >= 120 : bitrate = 96 Kbps

It is, at its core, supposed to recreate my experience with 4MB WebMs and was extended from there. For low size limits the audio factor represents the video/audio bitrate ratio (i.e. factor 5.5 -> audio bitrate ~18% of the video bitrate). Its meaning for higher bitrates is more abstract, as the threshold for switching to the next higher bitrate gets continually increased.
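
The decision process above translates directly into code; a sketch (function name hypothetical; sizes in bytes, duration in seconds):

```python
def audio_channel_bitrate(max_size, duration, audio_factor, audio_channels):
    """Pick the per-channel audio bitrate (in Kbps) via the factor formula."""
    factor = max_size * 8 / (duration * audio_factor * audio_channels * 4 * 1000)
    # (upper bound, bitrate) pairs from the table above
    thresholds = [(1, 6), (2, 8), (3, 12), (4, 16), (6, 24),
                  (8, 32), (28, 48), (72, 64), (120, 80)]
    for limit, bitrate in thresholds:
        if factor < limit:
            return bitrate
    return 96

# e.g. a 4 MiB limit, 60 s duration, audio factor 5.5, stereo -> 48 Kbps per channel
rate = audio_channel_bitrate(4 * 1024 * 1024, 60, 5.5, 2)
```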

Bitrate modes

I admit that I use the phrase "bitrate mode" far too loosely. Strictly speaking libvpx offers 4 distinct ways to control the bitrate.

  • Constant quality (Q)
  • Constrained quality (CQ)
  • Variable bitrate (VBR)
  • Constant bitrate (CBR)

You can find more in-depth information here.

Restricted-WebM by default uses VBR and CBR, as those are most suited to produce files with a certain size.

CQ may be used with the corresponding flag, but in general it's not worth it. Having a minimum quality via qmax is far more important for the overall quality than the usage of CQ. CQ has the potential to produce a slightly more consistent quality, but at the same time it makes the output size far more unpredictable. 2-pass VBR is likely to hit the mark even with very high undershoot ratios (>0.95), while CQ has trouble with ratios >0.9. The difference in max. achievable file size (while still staying below the max. size limit) is often more beneficial to the quality than CQ's slightly better bitrate distribution.
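
For reference, the four modes map roughly to these option combinations in FFmpeg's libvpx wrapper (a sketch based on my understanding of the wrapper; the script's exact flags may differ):

```python
def mode_flags(mode, bitrate=0, crf=None, qmax=None):
    """Return FFmpeg flags for libvpx's bitrate control modes (bitrate in Kbps)."""
    if mode == "Q":           # constant quality: crf with the bitrate pinned to 0
        return ["-crf", str(crf), "-b:v", "0"]
    if mode == "CQ":          # constrained quality: crf plus a bitrate ceiling
        return ["-crf", str(crf), "-b:v", f"{bitrate}K"]
    if mode == "VBR":         # variable bitrate, optionally with a minimum quality
        flags = ["-b:v", f"{bitrate}K"]
        if qmax is not None:
            flags += ["-qmax", str(qmax)]
        return flags
    if mode == "CBR":         # constant bitrate: pin min/max rate to the target
        return ["-minrate", f"{bitrate}K", "-maxrate", f"{bitrate}K",
                "-b:v", f"{bitrate}K"]
    raise ValueError(mode)
```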

The ability to use Q won't be included, as Restricted-WebM's focus lies on size control and not max. quality.

Multithreading

A few words regarding multithreading.

libvpx isn't a good encoder. One of its many shortcomings is the lack of advanced multithreading support. Currently libvpx only offers tile-based multithreading, which has three drawbacks

  • you'll never reach 100% CPU usage on anything but ancient hardware
  • more tiles decrease the quality slightly
  • consecutive encoding attempts with the same settings produce slightly different output sizes
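
To illustrate the first drawback: the number of usable tile columns is bounded by the frame width (VP9 tiles must be at least 256 px wide), so small resolutions simply can't feed many threads. A sketch (hypothetical helper, assuming that 256 px minimum):

```python
import math

def tile_flags(width, threads):
    """Pick -tile-columns (a log2 value) so libvpx can actually use the
    requested threads; limited by the frame width."""
    max_cols = max(0, int(math.log2(max(1, width // 256))))
    wanted = int(math.log2(threads)) if threads > 1 else 0
    cols = min(max_cols, wanted)
    return ["-threads", str(threads), "-tile-columns", str(cols)]
```

A 1280 px wide video can use 4 tile columns at most; a 640 px one only 2, no matter how many cores are available.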

The legacy version provides a different approach to multithreading (see this wiki entry). To summarize, it split the input video into X parts and then encoded each part single-threaded in parallel. This approach made it possible to reach 100% CPU usage even for small resolutions and new CPUs, but unfortunately it introduced a multitude of problems

  • drastically reduced quality for low bitrate encodes (libvpx can't shift bits to where they are needed)
  • could potentially explode the file size (based on the used bitrate mode)
  • using a lot of threads required ridiculous amounts of RAM
  • time-based filters couldn't be used

I decided against including this makeshift multithreading in newer versions of Restricted-WebM, because of those problems.

User filters

FFmpeg filters are powerful tools for media manipulation. The possible combinations and ways to chain them together are nearly endless. That's why a simple-sounding task, like adding two filters to the end of an already existing filter string, is actually quite complicated.

Let's look at a few examples:

yadif
            -> yadif,scale=-2:480:flags=lanczos,fps=24

volume=200
            -> [0:a]volume=200;[0:v]scale=-2:480:flags=lanczos,fps=24

[0:v]colormatrix=bt709:bt601;[0:a]volume=50
            -> [0:v]colormatrix=bt709:bt601,scale=-2:480:flags=lanczos,fps=24;[0:a]volume=50 or
            -> [0:v]colormatrix=bt709:bt601[x];[0:a]volume=50;[x]scale=-2:480:flags=lanczos,fps=24 or
            -> [0:a]volume=50;[0:v]colormatrix=bt709:bt601,scale=-2:480:flags=lanczos,fps=24

Yes, filter concatenation can get quite tricky. However, there's a workaround that this script uses. Instead of trying to apply all filters in one go, it applies the user filters first and then the filters set by the script. This is done via a "raw command" before the actual conversion.

ffmpeg -i input.video -c:v rawvideo -c:a pcm_s16le -filter_complex USER_FILTER -strict -2 -f matroska - | \
ffmpeg -i - [...] output.webm

This simplifies things, but there's one drawback: the first FFmpeg command causes additional resource usage. I'd be lying if I claimed that this performance impact isn't noticeable. To minimize the impact, Restricted-WebM toggles stream copying for the raw command if no filter gets applied to a stream type (i.e. video gets copied if only audio filters are present and vice versa).
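
The raw command, including the stream-copy toggle, could be assembled like this (a sketch with hypothetical helper names; the real script differs):

```python
def raw_command(in_path, user_filter, has_video_filter, has_audio_filter):
    """First FFmpeg command of the pipe: decode/filter only what the user
    filters actually touch, stream-copy the rest."""
    video = ["-c:v", "rawvideo"] if has_video_filter else ["-c:v", "copy"]
    audio = ["-c:a", "pcm_s16le"] if has_audio_filter else ["-c:a", "copy"]
    cmd = ["ffmpeg", "-i", in_path] + video + audio
    if user_filter:
        cmd += ["-filter_complex", user_filter]
    return cmd + ["-strict", "-2", "-f", "matroska", "-"]
```

For an audio-only filter like volume=200 the video stream is simply copied, which avoids the pointless rawvideo decode.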

Colormatrix

When converting an input video with a color space other than BT.601 to VP8 or SD VP9, the output colors will be slightly off. The reason is that ffmpeg tags the output as BT.601, but ignores any color transformation that might be necessary. For VP8 that's always the case as it doesn't support any other color space. VP9 supports BT.709, but ffmpeg still drops any information about the correct color profile when scaling it down to SD resolution.

I've wanted to include an automatic correction for some time now, but at this point I deem it impossible. While ffprobe is capable of showing the necessary information, most videos I've encountered don't have their color space properly tagged, leaving one with nothing but "unknown" to work with.
Therefore I leave it to the user to address this issue. Use ffmpeg's colormatrix filter to properly transform the colors. For example: to convert BT.709 to BT.601, use colormatrix=bt709:bt601.
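
To check whether an input is tagged at all, one can query ffprobe for the color space tag (a sketch; in practice the result is very often just "unknown" or empty):

```python
import subprocess

def color_space_cmd(path):
    """ffprobe invocation that prints only the first video stream's color space tag."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "stream=color_space",
            "-of", "default=noprint_wrappers=1:nokey=1", path]

def probe_color_space(path):
    """Run ffprobe; returns e.g. "bt709" or "smpte170m", but usually "unknown"."""
    out = subprocess.run(color_space_cmd(path), capture_output=True, text=True)
    return out.stdout.strip()
```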

If somebody knows how to reliably detect the input color space or tell ffmpeg to automatically do the transformation, please open an issue and let me know.

AV1

AV1 is a new video coding format developed by the industry consortium Alliance for Open Media. AV1 is actually quite interesting. It's an open standard, but in contrast to past open video standards (Theora, VP8, VP9), it actually offers better compression than its patent-encumbered counterparts (AVC, HEVC). Because of this and the enormous industry backing, people are actually interested in AV1 and have already started working on FOSS encoders other than the reference implementation libaom, which hopefully will prevent another libvpx situation (i.e. the standard suffering from only having subpar encoders).

The reason why I write about AV1 at this point is that AV1 is WebM-compatible and would therefore qualify to be added to this script as another video option. However, despite my interest in the topic, I won't include AV1 support any time soon. There are a few reasons for this:

  • Encoding speed

This is the biggest drawback of AV1 right now. The encoding speed of libaom is abysmal. While this is understandable for a reference encoder of a new format, it is still less than ideal for a script that uses multiple encoding attempts to fit a file within a certain file size limit.

  • FFmpeg only ships with libaom

As mentioned before, there are other FOSS encoders besides libaom, namely rav1e and SVT-AV1. I've had especially good experiences with SVT-AV1, and in fact I would implement AV1 support if most FFmpeg builds came with it by default. However, you have to compile FFmpeg yourself to add SVT-AV1 support, which can't be expected of the average user.

  • Not enough experience with AV1

As it stands now I just don't have enough experience with AV1 encoding. I've managed to get some good results with SVT-AV1, dabbled a bit with rav1e and did some lossy and lossless comparisons with libaom, but that's about it. The slow encoding speeds and the relative newness of the format stopped me from doing more than that.