Legacy Information - HelpSeeker/Restricted-WebM GitHub Wiki

This page contains all the previous wiki entries regarding the original (legacy) version.

Please note that I can't guarantee 100% technical accuracy. I haven't updated/corrected these paragraphs in a long time and don't plan to do so in the near future.


Table of contents

  1. Options in more detail
  2. Change defaults permanently
  3. Fast encoding mode
  4. Known errors

Options in more detail

Trim mode:

Lets you define which part of the input video to use. Input is required for each individual file, so if you want to convert a folder full of videos in one go, trim them beforehand. The start and end time must be specified in seconds (fractions of a second also work). If the input line is left empty, the start defaults to 0 seconds and the end to the complete length of the video.

Audio mode:

Adds an audio stream to the output webm. The audio bitrate gets chosen automatically based on video length and file size limit.

  • Standard mode: 48 - 192 kbps
  • HQ mode: 96 - 192 kbps
  • Audio showcase mode: 96 - 192 kbps (higher bitrates are also chosen sooner)

The script will also attempt to copy the input audio stream, if

  • it has the same codec (Vorbis or Opus, based on the -n flag).
  • the audio bitrate of the input is less than or equal to the one chosen by the script.
  • the trim mode isn't being used.

Audio mode is off by default, but gets used automatically during the audio showcase mode. So there's no need for the -a flag if you already have the -c flag set.

HQ (high quality) mode:

This mode does three things to ensure a higher quality output.

  • Further downscaling compared to normal mode
  • 2-pass encoding
  • Minimum audio bitrate of 96kbps

It should be mentioned that HQ mode (despite its name) doesn't always produce superior results compared to the normal mode, mainly when the video bitrate is less than 200 kbps. The reason for that is VP8's 2-pass encoding.

VP8's 2-pass encoding delivers better results at high bitrates, but at the lower end of the spectrum it becomes a wildcard. Consistency is the keyword: single pass tends to produce a much more consistent visual quality throughout the whole video, while 2-pass will make some frames look much better and others a lot worse. This shift in quality between scenes is jarring and makes the video look worse as a result.

VP9 doesn't suffer from the same problem and should always be used in combination with HQ mode.

To summarize: in general HQ mode will produce better looking webms at a smaller resolution. If the video bitrate is less than 200 kbps and you use VP8, it's usually better to stick with single pass, or do both and compare the results.

Newer codecs:

VP9/Opus are the successors of VP8/Vorbis and produce better results, especially at low bitrates. If a website allows webms with those codecs and you don't mind (much) longer encoding times, then go for them.

Please note that VP9 should only be used in combination with HQ mode. Single pass VP9 is in my opinion broken and has the potential to produce worse results than VP8.

Fast encoding mode:

See the fast encoding mode wiki page.

File size limit:

Not much to say here. While I had 4chan's limits in mind while writing this script, it works with any file size limit.

Audio showcase mode:

Produces webm files with a frame rate of 1 fps and a static image as the video stream. This leads to an incredibly small video stream, so more of the bitrate budget can go to the audio (which ranges from 96 kbps to 192 kbps in this mode). The three flavours of this mode define how to locate the necessary input images.

  • auto: The script assumes that there is a picture (with matching filename) for every input file in to_convert, located in showcase_pictures. The extension doesn't matter. You can use any image that your version of ffmpeg is able to handle. This is the best option if you want to convert many files in one go.
  • manual: The script asks you for the location of each input picture. This requires no additional folder structure, but prevents unattended batch encoding.
  • video: Instead of looping an input picture, the script applies the usual showcase settings to the input videos in to_convert. Use this if you already have a video with a static image as video stream (e.g. from YouTube). Any other video content will become a slide show and is likely to exceed the specified file size limit.
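For illustration, a hedged sketch of what looping a still image at 1 fps looks like in plain ffmpeg. The filenames are placeholders and this is not the script's actual command; it only builds and prints the command string:

```shell
#!/bin/sh
# Hypothetical inputs - a cover image and an audio file
image="cover.png"
audio="song.mp3"

# -loop 1 repeats the single input image, -framerate 1 gives the 1 fps
# video stream described above, -shortest stops when the audio ends.
cmd="ffmpeg -loop 1 -framerate 1 -i $image -i $audio -c:v libvpx -c:a libvorbis -shortest out.webm"
echo "$cmd"
```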

Filters:

Here you can enter your usual ffmpeg filters. This string will be used directly in the ffmpeg command, so it'll throw an error if you make any mistakes in it. Note that using the scale filter will disable automatic downscaling, so if you want to force the input resolution or have a min. resolution higher than 180p, use the scale filter manually.

Normally it doesn't matter in which sequence the individual filters are lined up, but that changes if:

  • audio and video filters will be applied at the same time

AND

  • automatic downscaling shall be used

In such a case audio filters must come before video filters. For example:

-f "[0:a]afade=t=out:st=60:d=5;[0:v]fade=t=out:st=60:d=5"

These filters will fade out the video and audio after 60 seconds, over the course of 5 seconds. It's also important to use quotes, as audio and video filters need to be separated by a semicolon (;), which the shell would otherwise interpret as a command separator.

Undershoot limit:

The initial video bitrate calculation is no exact science and the final output size depends heavily on the footage. The undershoot limit prevents the script from stopping (or moving on to the next file) when the output doesn't utilize a certain percentage of the given file size limit (default: 75%). It can range from 0 (completely disabled) to 1 (the output has to match the file size limit exactly). Personally I use 0.9 most of the time, which works fine. I wouldn't go higher than 0.95 though, as it only leads to a lot more encoding attempts for a minimal gain.
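The check described above can be sketched like this. Variable names and the size values are hypothetical, not taken from the script; awk handles the fractional math:

```shell
#!/bin/sh
# An attempt only counts as finished if the output lands between
# undershoot_limit * file_size and file_size itself.
file_size=3           # size limit in MiB
undershoot_limit=0.9
output_size=2.5       # hypothetical result of an encoding attempt

verdict=$(awk -v out="$output_size" -v limit="$file_size" -v under="$undershoot_limit" \
    'BEGIN { if (out <= limit && out >= under * limit) print "done"; else print "retry" }')
echo "$verdict"       # "retry" here: 2.5 MiB is below 0.9 * 3 = 2.7 MiB
```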

Iterations:

The script cycles through 3 bitrate modes and during each bitrate mode it adjusts the bitrate several times. With the -i flag you can specify how many encoding attempts there will be for each bitrate mode (default: 3). Additionally the script will make i*2 attempts (so 6 by default) once it has produced a webm within the file size limit, but below the undershoot limit.
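The attempt arithmetic above, with the default -i value, works out to:

```shell
#!/bin/sh
# Worst-case attempt counts per file, as described in the text
# (my reading of it; variable names are made up for this sketch).
i=3                           # the -i flag's default
modes=3                       # number of bitrate modes the script cycles through

max_regular=$((modes * i))    # up to 9 attempts across the three bitrate modes
max_undershoot=$((i * 2))     # up to 6 further attempts to clear the undershoot limit
echo "$max_regular $max_undershoot"
```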

Height threshold:

To prevent those famous "webms for ants", there's a minimum height threshold (180 pixels by default), which limits how far the script can downscale the output. If the input video's height is already below the threshold, the output keeps the input's height. This option overrides the default threshold and provides an easy way to define a minimum height for the output webm.
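The height floor behaves roughly like the following sketch (hypothetical variable names, not the script's actual code):

```shell
#!/bin/sh
input_height=1080
scaled_height=120        # what the quality-based downscaling would pick
height_threshold=180     # the default minimum

out_height=$scaled_height
if [ "$out_height" -lt "$height_threshold" ]; then
    out_height=$height_threshold         # never go below the threshold
fi
if [ "$out_height" -gt "$input_height" ]; then
    out_height=$input_height             # inputs shorter than the threshold keep their height
fi
echo "$out_height"       # 180 in this example
```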

Bpp threshold:

The bpp (bits per pixel) value is a quality control factor, mainly used to determine how much to downscale the output. Basically: higher bpp threshold -> smaller output resolution -> higher perceived quality. By default the normal mode uses a bpp threshold of 0.04, while HQ and audio showcase mode aim for 0.075.
Setting a custom bpp threshold lets you further improve the quality of your webm at the cost of resolution. Note that values between 0.1 and 0.2 should already provide very high quality output.
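The bpp value these thresholds refer to is commonly computed as bitrate / (framerate * width * height). A sketch with example numbers (awk does the floating point math):

```shell
#!/bin/sh
bitrate=1000000      # 1 Mbps video bitrate
fps=24
width=1280
height=720

bpp=$(awk -v b="$bitrate" -v f="$fps" -v w="$width" -v h="$height" \
    'BEGIN { printf "%.4f", b / (f * w * h) }')
echo "$bpp"          # 0.0452 - just above the normal mode's 0.04 threshold
```

With the HQ threshold of 0.075, the same bitrate would force a smaller resolution.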

HQ min. audio bitrate:

HQ mode is supposed to produce higher quality webms. This holds true for both video and audio, so during HQ mode there's a higher minimum audio bitrate. By default it's 96 kbps, which is a decent bitrate for Vorbis (comparable to MP3 at 128 kbps; see Results of the public multiformat listening test - July 2014).
This minimum audio bitrate can become troublesome for very long webms with a relatively small file size limit (e.g. 4MB for a 4 minute long video) or if you don't want to waste the bitrate on human speech. With this option you can reduce the minimum audio bitrate (or increase it for whatever reason) for HQ mode. Setting it to anything between 0 and 48 will produce the same results audio-wise as the normal mode.


Change defaults permanently

You can easily change the default behaviour of this script to cater to your needs.
If you open the script with a text editor, you'll see the following section at the beginning.

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Default settings
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# These values represent the script's default behaviour
trim_mode=false
audio_mode=false
hq_mode=false
showcase=false
new_codecs=false
parallel_convert=false
parallel_afilter=false
ask_parallel_afilter=true

# These values change the default limits of the script
file_size=3
parallel_process=1
undershoot_limit=0.75
adjust_iterations=3
height_threshold=180
bpp_threshold=0.04
# hq_bpp_threshold is used for both HQ and audio showcase mode
hq_bpp_threshold=0.075
# 96kbps produces decent results for Vorbis, comparable to mp3 at 128kbps
# http://listening-test.coresv.net/results.htm
hq_min_audio=96

Modes

To use a special mode (e.g. HQ mode) or VP9/Opus by default, set the corresponding variable from this list to true.

trim_mode=false
audio_mode=false
hq_mode=false
new_codecs=false

Note that showcase and parallel_convert aren't on this list.

If you want to use the audio showcase mode by default, you also need to add a default showcase_mode (auto, manual or video).
For example:

showcase=true
showcase_mode="auto"

To stop using the audio showcase mode by default, be sure to also remove the default showcase_mode. Otherwise it will break the script.

If you want to use the fast encoding mode by default, you also need to adjust parallel_process further down. Ideally use the number of cores your CPU has.

parallel_convert=true
[...]
parallel_process=6

Only if both variables have a custom value will the fast encoding mode be used. Otherwise the script will revert to its normal behaviour.

Audio filter prompt

The audio filter prompt only appears under specific circumstances (see the fast encoding mode wiki page for more info).
If this prompt annoys you, take a look at these two variables:

parallel_afilter=false
ask_parallel_afilter=true

parallel_afilter sets the default behaviour of the script during fast encoding mode.

true ... The script will apply audio filters, if there are any present.
false ... The script won't apply audio filters.

ask_parallel_afilter controls whether or not the prompt appears.

true ... The prompt will appear. parallel_afilter's default doesn't matter as it gets overwritten.
false ... No prompt will appear. The script will fall back on parallel_afilter's default value.

Limits/thresholds

The following list holds all limits and thresholds used throughout the script. If you haven't done so, take a look at the Options in more detail page. Afterwards all these variables should be self-explanatory.

file_size=3
parallel_process=1
undershoot_limit=0.75
adjust_iterations=3
height_threshold=180
bpp_threshold=0.04
# hq_bpp_threshold is used for both HQ and audio showcase mode
hq_bpp_threshold=0.075
# 96kbps produces decent results for Vorbis, comparable to mp3 at 128kbps
# http://listening-test.coresv.net/results.htm
hq_min_audio=96

Fast encoding mode

Overview

By default this script takes a long time to produce webms. The main reason is the usage of -threads 1 in the ffmpeg commands. While very slow, as it only uses a fraction of your CPU's capabilities, it produces a very predictable output file size (±1% when using the same settings for consecutive tries). Some functionality of the script depends on this to skip unnecessary encoding attempts. Additionally, multi-threading for VP8 and VP9 is rather weird (some might even call it stupid): the maximum number of threads utilized depends on the video's width. Max. threads = width / 500 (always rounded down). So if you want to encode e.g. a 1920x1080 video, you can use 3 threads at most, even if you specify more via the -threads option.
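The thread cap from the formula above, for a 1080p video:

```shell
#!/bin/sh
# Max. threads = width / 500, rounded down.
width=1920
max_threads=$((width / 500))   # shell integer division already rounds down
echo "$max_threads"            # 3 for a 1920x1080 video
```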

To provide a way to fully utilize your CPU (therefore avoiding this weird restriction) and still produce a predictable file size, the fast encoding mode uses a different approach. The video gets split into n parts (n being specified by the user). These parts then get encoded at the same time. Given enough parts this will fully utilize your CPU. Afterwards the individual clips will be combined into the final video (without any additional losses, as you can copy those streams into the same container).

Note that this only applies to the video footage. The audio stream gets encoded normally, as the audio can't be trimmed with the necessary precision. Audible cuts would be the result of converting the audio in separate parts.
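A minimal sketch of the lossless join at the end, using ffmpeg's concat demuxer. The part filenames are placeholders and the script's actual commands may differ; only the list file is generated here, the ffmpeg call itself is shown as a comment:

```shell
#!/bin/sh
# Assume the n parts were already encoded as part_1.webm, part_2.webm, ...
n=3

: > parts.txt                  # the concat demuxer reads a list file
i=1
while [ "$i" -le "$n" ]; do
    printf "file 'part_%d.webm'\n" "$i" >> parts.txt
    i=$((i + 1))
done
cat parts.txt

# The parts would then be joined without re-encoding via stream copy:
#   ffmpeg -f concat -safe 0 -i parts.txt -c copy output.webm
```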

How to choose n?

n <= 1 will be silently ignored and the script reverts to its normal behaviour. n > 1 will result in an overall faster encoding speed. Ideally n should be the number of threads your CPU has. This way you will achieve 100% CPU usage. You can set n even higher, but this will have barely any additional effect on the encoding speed.

Fast encoding and audio showcase mode

These two don't work together. The 1 fps frame rate makes splitting the video into individual parts too inaccurate, and the keyframe interval adjustment (currently the only way to adjust the file size during audio showcase mode) isn't applicable.
Therefore the audio showcase mode automatically deactivates the fast encoding mode.

Drawbacks

Right now there are several drawbacks to the fast encoding mode, which is why it's not the default behaviour.

File size difference compared to normal mode

The resulting file size will differ from that of one continuous encoding attempt. In my experience this is heavily influenced by the bitrate mode used and whether you use VP8 or VP9. For VP8 the constrained quality mode seems to produce the most consistent results (differences range from less than 0.1 up to 0.5 MiB). Classic VBR leads to an explosion in file size at very high bitrates (effectively doubling the file size at 10 Mbps). While not used in this script, constant quality mode has the opposite effect and becomes unpredictable for high CRF values (i.e. low bitrates), albeit not as bad as with VBR.

I didn't perform many tests with VP9 yet, but it seems even less predictable than VP8.

Decreased output quality for difficult footage

The fast encoding mode shouldn't be used for difficult encodes. The encoder will likely produce worse results as the bitrate allocation is done separately for each part of the video, with all parts having the same importance.
For example the encoder might decide to use less bitrate on the last 30 seconds of a video, as the first 30 seconds depict more movement and therefore need a higher bitrate to achieve the same visual quality. When split into individual parts, the encoder doesn't know about the last 30 seconds (which are done in a different part) and can't optimize the bitrate allocation for the entire video.

2-pass VP9 is the only option that still produces acceptable (although worse) results in such a situation.

High RAM usage

Running a great number of ffmpeg instances in parallel consumes quite a bit of RAM. A combination of a powerful CPU (e.g. 16 threads) and <=8GB RAM can lead to problems. If you experience problems under Linux, check your swappiness and if necessary lower n.

Time-based filters

This problem is fairly obvious. Splitting the original video into separate parts prevents ffmpeg from applying time-based filters correctly. Telling it to fade to black after 90 seconds has no effect if each individual clip is only 50 seconds long. The exception to this shortcoming are audio filters: since the audio gets encoded in one go, you can still apply time-based audio filters like you always would. However, this may take a long time (an explanation can be found further down). Therefore the script prompts the user once at the very beginning whether or not it should assume (any) audio filters, if

  • the fast encoding mode is active
  • the user set custom filters
  • audio gets encoded
~~~~~~~~~~~~~~~~~~
ATTENTION!
Please specify if your filter string contains audio filters.
Choosing 1 (Yes) will lead to audio filters getting applied. This may take some time for long videos.
Choosing 2 (No) will lead to no audio filters getting applied. This will speed up the conversion.
~~~~~~~~~~~~~~~~~~
1) Yes
2) No
Assume audio filters? 

Why might it take so long? Because two factors play together.

  1. The user defined filters are one long string. The script doesn't know which filters get applied. They could be video or audio filters.
  2. ffmpeg doesn't suppress the video stream, if video filters get applied.

Knowing these two facts, there are two possible outcomes when trying to only encode the audio:

  1. The user told the script that there will be no audio filters -> no filter string gets applied -> fast audio-only conversion
  2. The user told the script that there will be audio filters -> filter string gets applied -> video stream can't be suppressed -> ffmpeg falls back on a default video encoder (in this case libtheora) -> slow video/audio conversion

The second outcome can be problematic. It might not matter for a 30 second clip, but it does for a 30 minute video.


Known errors

This is a list of errors that I know of / expect, but trust the user to avoid. Some of them can also be safely ignored.

Audio filters and stream copying:

The script will try to copy the input audio stream if three requirements are met:

  • The input has the same codec (Vorbis or Opus, based on the -n flag)
  • The input's audio bitrate is less than or equal to the one chosen by the script
  • Trim mode is inactive

If you want to apply audio filters via the -f flag while the script tries to copy the input audio stream, ffmpeg will throw an error (as you can't apply filters without re-encoding). To avoid this error, use the trim mode for such files and simply hit enter when asked for further input.

Only audio filters:

Currently you can't pass only audio filters with this script. For example

-f volume=0.5

will throw an error, because of the automatic downscaling. The final filter string would look like this

-filter_complex volume=0.5,scale=-2:480

and goes against ffmpeg's filter syntax. Currently you can solve this by scaling manually

-f "[0:a]volume=0.5;[0:v]scale=-2:480"

or applying an arbitrary video filter

-f "[0:a]volume=0.5;[0:v]crop=iw:ih"

VP9 and -tune ssim:

The -tune ssim option leads to better results when using VP8. VP9 doesn't offer this option (despite ffmpeg's internal documentation saying the opposite), so you'll see an error message when using VP9 (Failed to set VP8E_SET_TUNING codec control: Invalid parameter. Additional information: Option --tune=ssim is not currently supported in VP9.). This message can however be safely ignored, as ffmpeg drops the -tune option and continues as usual.

Pictures with transparency:

VP8 (and perhaps VP9, I haven't tested it yet) is unable to handle input images with transparency (e.g. RGBA PNG files). It seems like there is a bug that prevents those pictures from working with the alternate reference frame. Since I can't think of a way to detect those pictures with ffprobe and the script uses the alternate reference frame for all 2-pass encodes, I'll leave it to the user to make sure that no input picture has an alpha channel.

GIFs are the exception to this problem. The script gives them their own ffmpeg commands.

Wrong color matrix:

When using input videos with a BT.709 color matrix, converting them to VP8 or SD VP9 webms will lead to the colors being slightly off. The reason is that the encoder switches to the BT.601 color matrix. For VP8 that's always the case, as it doesn't support any other color matrix. VP9 uses BT.709 for HD and BT.601 for SD footage (that's pretty much the norm nowadays). I'm currently working on detecting those videos automatically with ffprobe. Until then, apply the filter colormatrix=bt709:bt601, but be certain that your input really has the BT.709 color matrix. Using this filter on a BT.601 input will also lead to wrong colors.
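As a hedged sketch of the detection idea: ffprobe can report a video stream's color matrix, which could then drive the colormatrix filter. The input filename is a placeholder; only the command string is built and printed here:

```shell
#!/bin/sh
# Query the color matrix of the first video stream (input.mp4 is hypothetical).
probe="ffprobe -v error -select_streams v:0 -show_entries stream=color_space \
-of default=noprint_wrappers=1:nokey=1 input.mp4"
echo "$probe"
# If running this command prints bt709, pass -f "colormatrix=bt709:bt601"
# to the script as described above.
```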