📺 How to Upscale a Video - Sirosky/Upscale-Hub GitHub Wiki

🏠 Introduction

This guide is a beginner's introduction to video upscaling, and focuses on the easiest way to get started. It assumes that you already have a model picked out, as well as a video you would like to upscale. If you need help downloading or picking a model, check out this guide.

Note that while chaiNNer and enhancr both have video upscaling capabilities, we are using VideoJaNai here because it is significantly faster than chaiNNer at processing video and is free (whereas enhancr is freemium). This makes it the best option for video upscaling if you need a user interface.

📜 Instructions

Install VideoJaNai, which is a free and open source app for upscaling and interpolating videos. Follow the link and download the latest release (either portable or installer).
Launch the app and it will begin downloading dependencies. This may take a while. Note that as of the time of writing, you may need to re-attempt the download a few times, as there have been reports of problems while installing. Use the Reinstall Python Dependencies button to attempt a redownload.

After you complete installation, you should be greeted with the following screen.

1. Input and Output

From here, you can select the video(s) you want to upscale, as well as where to save them.
Unless you know what you're doing, it's suggested to just stick with the included FFmpeg presets. The default presets are the following:
- NVENC: Fast, but moderate file size and lower quality than x264 / x265.
- x265 (CPU): Very slow, but smaller file size and good quality.
- x264 (CPU): Slow, but somewhat small file size (bigger than x265) and good quality.
- Lossless (CPU): Fast and good quality, but massive file size. Not recommended unless the output is just a temporary or intermediate step.
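
To give a rough idea of what sits behind presets like these, here is a minimal Python sketch that maps each preset name to a plausible set of FFmpeg video encoder arguments. The specific values (the CRF numbers, the `p7` NVENC preset) and the names `ENCODER_ARGS` and `ffmpeg_command` are illustrative assumptions, not VideoJaNai's actual preset strings.

```python
# Illustrative encoder arguments only; VideoJaNai's real presets may differ.
ENCODER_ARGS = {
    "NVENC": ["-c:v", "hevc_nvenc", "-preset", "p7"],              # GPU encoder
    "x265": ["-c:v", "libx265", "-crf", "16", "-preset", "slow"],  # CPU, HEVC
    "x264": ["-c:v", "libx264", "-crf", "13", "-preset", "slow"],  # CPU, H.264
    "Lossless": ["-c:v", "ffv1"],                                  # CPU, lossless
}


def ffmpeg_command(preset, src, dst):
    """Return an FFmpeg argument list that re-encodes src to dst with a preset."""
    if preset not in ENCODER_ARGS:
        raise ValueError(f"unknown preset: {preset}")
    return ["ffmpeg", "-i", src, *ENCODER_ARGS[preset], dst]


print(" ".join(ffmpeg_command("x264", "input.mkv", "output.mkv")))
```

Lower CRF values mean higher quality and larger files, which is the same trade-off the preset descriptions above summarize.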

2. Upscale Settings

Load the ONNX model. If the model you downloaded is in PTH format, simply follow the PTH to ONNX conversion guide.
Select the backend. For recent NVIDIA GPUs, select TensorRT. For everything else (older NVIDIA GPUs or AMD GPUs), try DirectML first. TensorRT is extremely fast, but limited to fairly new NVIDIA GPUs. I don't know what the exact cutoff is, but 2000, 3000 and 4000 series GPUs definitely support TensorRT.
- For TensorRT, in most instances, you can just use the Automatic TensorRT Engine Settings (a new feature in 1.0.0!). However, if something isn't working, you can try custom engine commands. See the below section for further details on engine conversion.
There are also a few settings to adjust resolutions before resizing and after upscaling, should you need them.

3. Upscale

Click upscale, and the process should begin. Note that for TensorRT, VideoJaNai will build an engine file first. This engine file is GPU-specific (i.e., you can't just build one and share it with a friend), and may take up to 20-30 minutes depending on your specifications and the size of the model. However, this process only happens once per model: after you've built an engine for a model, you won't need to do it again the next time you use that model.

👀 TensorRT for AniSD and Other Custom Models (Advanced)

TensorRT is an NVIDIA software library designed to optimize AI models on most NVIDIA GPUs. Some models can see up to a 30x uplift in speed over plain old PyTorch inference. As discussed above, VideoJaNai supports TRT inference. This portion of the guide will cover advanced usage of VideoJaNai to maximize TRT performance and compatibility for models.

Basics of TRT Inference

First, you need a model that can be converted from PTH into ONNX. Most models can be converted to ONNX (see the guide here). However, some, such as those based on the CRAFT architecture, are completely incompatible with ONNX. This in turn means no TRT inference.

Then, the ONNX file will need to be converted into an engine file, which is essentially an optimized version of the model. Some architectures may be compatible with ONNX, but then aren't compatible with the engine conversion process-- this means no TRT inference for them.

As of version 1.0.0, VideoJaNai has greatly improved its engine building behavior. It will correctly recognize architectures (or at least most of them) and use optimal settings to build the engine. However, there may be times when you'll want to build it manually, for example to set optimal resolutions, which VideoJaNai otherwise defaults to 1080p. This guide therefore covers manual engine building.

Engine Building

To access the engine building commands, disable Use Automatic TensorRT Engine Settings as indicated above. This will give you some new options to play with.
Then, make sure you know the following about the model you're creating an engine for.
1. Does the architecture support TensorRT to begin with? Some are incompatible, such as CRAFT, HAT, RGT, ATD, etc. This can be discovered through trial and error.
2. Is your ONNX model static or dynamic? This is usually indicated by the trainer when you download it.
3. Does your model support FP16 or BF16? FP16 tends to be fastest, but it isn't compatible with all archs.
Some of the more popular archs are discussed below, so if you know what arch you're using, skip there. The VideoJaNai interface also provides a helpful overview.
Once you've answered the questions in step 2, use the appropriate preset. For a dynamic engine, it doesn't hurt to modify the target resolutions to match your source. For example, AniSD models should be used on 480p, 576p and other standard definition sources, so it doesn't make sense to target 1080p with a dynamic engine.
1. Example of an FP16 engine generation targeting SD sources:
--fp16 --minShapes=input:1x3x256x256 --optShapes=input:1x3x480x640 --maxShapes=input:1x3x576x720 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference
This sets the minimum resolution of the video to 256x256 (feel free to tweak, but there shouldn't be anything smaller than that), the optimal resolution to 640x480 and the max resolution to 720x576. Note that the shapes in the command list height before width, the reverse of how resolutions are typically written.
Once you start the upscale, the engine building should begin automatically. Be patient as it may take up to 30 minutes to complete this process.
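
Because the shape flags put height before width, it is easy to swap the two by accident. The following is a minimal Python sketch that builds the three shape flags from resolutions written the usual way (width, height). The helper names `shape_arg` and `dynamic_shape_flags` are made up for this example, and the tensor name `input` is taken from the command above; a model exported with a different input name would need that changed.

```python
def shape_arg(width, height, channels=3, batch=1):
    """Format a resolution as batch x channels x height x width (height first)."""
    return f"input:{batch}x{channels}x{height}x{width}"


def dynamic_shape_flags(min_res, opt_res, max_res):
    """Each argument is a (width, height) tuple, as resolutions are normally written."""
    return [
        f"--minShapes={shape_arg(*min_res)}",
        f"--optShapes={shape_arg(*opt_res)}",
        f"--maxShapes={shape_arg(*max_res)}",
    ]


flags = dynamic_shape_flags((256, 256), (640, 480), (720, 576))
print(" ".join(flags))
# --minShapes=input:1x3x256x256 --optShapes=input:1x3x480x640 --maxShapes=input:1x3x576x720
```

The printed flags match the SD example above, so the same helper can be reused for other source resolutions by changing the three tuples.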

Architecture Overview

Some archs have their own quirks for conversion. See below for an overview.

SwinIR

SwinIR only works with static-axes ONNX, such as the ones provided with AniSD. An FP16 engine provides optimal speeds. An example command:

trtexec.exe --fp16 --onnx="2x_AniSD_AC_G6i2b_SwinIR_117500_256x320_fp32.onnx" --saveEngine=2x_AniSD_AC_G6i2b_SwinIR_117500_256x320_fp32.engine --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw --tacticSources=+CUDNN --skipInference

Note that the tile size used by the ONNX file cannot exceed the dimensions of the video. In the example above, the tile size is 320x256, which is much smaller than most video files. A good rule of thumb is that the ONNX file's tile size should be about half of the video's dimensions for optimal speeds (at least when I tested on my system).
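
The two tile-size constraints above can be checked with a few lines of Python. This is only a sketch: `tile_fits` and `suggested_tile` are hypothetical helper names, and reading "half" as half of each dimension (rather than half the pixel count) is an assumption on my part.

```python
def tile_fits(tile, video):
    """tile and video are (width, height); the tile must not exceed the video."""
    return tile[0] <= video[0] and tile[1] <= video[1]


def suggested_tile(video):
    """Rule of thumb: roughly half of each video dimension (assumed reading)."""
    return (video[0] // 2, video[1] // 2)


print(tile_fits((320, 256), (640, 480)))  # True
print(suggested_tile((640, 480)))         # (320, 240)
```

For a 640x480 source, the suggested tile of 320x240 is close to the 320x256 tile of the AniSD SwinIR model in the example command.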

This command should also work for OmniSR, but you may need to remove the --fp16 flag.

DAT2

DAT2 doesn't require a static ONNX, but does require a static engine. You can use a command like this:

trtexec.exe --bf16 --onnx="2x_AniSD_DC_DAT2_97500_fp32FO.onnx" --optShapes=input:1x3x480x640 --saveEngine=2x_AniSD_DC_DAT2_97500_fp32FO.engine --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw --tacticSources=+CUDNN --skipInference

Note the use of --bf16. An FP16 engine does not work with DAT2. You can customize --optShapes to your target resolution. In the example, it targets 640x480.

RealPLKSR

Normal RealPLKSR (see below for RealPLKSR with dysample) works with a dynamic engine and dynamic ONNX. Make sure to use an FP16 engine, otherwise it'll be slower than DML inference (at which point, you might as well just use DML).

trtexec.exe --fp16 --onnx="2x_AniSD_AC_RealPLKSR_127500_fp32_FO_dynamic_FP16e.onnx" --optShapes=input:1x3x480x640 --saveEngine=2x_AniSD_AC_RealPLKSR_127500_fp32_FO_dynamic_FP16e.engine --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw --tacticSources=+CUDNN --skipInference

However, note that RealPLKSR with dysample (should be indicated in the model description) does not support dynamic ONNX. Thus, you will need to convert to static ONNX, then use the static engine.
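
If the model description doesn't say whether an ONNX file is static or dynamic, you can inspect its input shape yourself. The following Python sketch assumes the third-party `onnx` package is installed and that the first graph input is the image tensor. `is_dynamic` and `input_dims` are names made up for this example.

```python
def is_dynamic(dims):
    """dims has one entry per axis: a positive int if fixed, anything else if symbolic."""
    return any(not (isinstance(d, int) and d > 0) for d in dims)


def input_dims(onnx_path):
    """Read the first input's shape; requires the third-party onnx package."""
    import onnx  # imported lazily so is_dynamic works without it

    model = onnx.load(onnx_path)
    shape = model.graph.input[0].type.tensor_type.shape
    # Fixed axes carry dim_value; symbolic axes carry a dim_param name (or nothing).
    return [d.dim_value if d.HasField("dim_value") else d.dim_param for d in shape.dim]


print(is_dynamic([1, 3, 256, 320]))           # False: static axes
print(is_dynamic([1, 3, "height", "width"]))  # True: dynamic axes
```

In practice you would call `is_dynamic(input_dims("your_model.onnx"))`, where the file name is a placeholder for your own model.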

[!TIP] RealPLKSR also supports DML inference, which isn't as fast as TRT inference, but is still a reasonably fast option. See below for further information on DML inference.
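
As a recap, the per-architecture notes above can be written down as a small lookup table plus a helper that assembles a trtexec argument list in the same style as the example commands. This is a sketch rather than anything VideoJaNai does internally: `ARCH_NOTES` and `trtexec_args` are hypothetical names, and the precision for RealPLKSR with dysample is left unset because it isn't stated above.

```python
# Summary of the notes above; the dictionary layout is just for illustration.
ARCH_NOTES = {
    "SwinIR": {"static_onnx_required": True, "engine": "static", "precision": "--fp16"},
    "DAT2": {"static_onnx_required": False, "engine": "static", "precision": "--bf16"},
    "RealPLKSR": {"static_onnx_required": False, "engine": "dynamic", "precision": "--fp16"},
    "RealPLKSR (dysample)": {"static_onnx_required": True, "engine": "static", "precision": None},
}


def trtexec_args(arch, onnx_path, opt_res=None):
    """Build a trtexec argument list; opt_res is an optional (width, height)."""
    notes = ARCH_NOTES[arch]
    engine_path = onnx_path.rsplit(".", 1)[0] + ".engine"
    args = ["trtexec.exe"]
    if notes["precision"]:
        args.append(notes["precision"])
    args.append(f"--onnx={onnx_path}")
    if opt_res is not None:
        width, height = opt_res
        args.append(f"--optShapes=input:1x3x{height}x{width}")  # height first
    args += [
        f"--saveEngine={engine_path}",
        "--inputIOFormats=fp32:chw",
        "--outputIOFormats=fp32:chw",
        "--tacticSources=+CUDNN",
        "--skipInference",
    ]
    return args


print(" ".join(trtexec_args("DAT2", "2x_AniSD_DC_DAT2_97500_fp32FO.onnx", (640, 480))))
```

The printed line has the same flags as the DAT2 command above, minus the quotes around the ONNX file name, which an argument list doesn't need.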

👀 DirectML for AniSD and Other Custom Models (Easy)

DirectML is an alternative inference backend, like NCNN and PyTorch. To use DML, simply select DirectML in the Upscale Settings options and load in an ONNX as you normally would.

While TensorRT is the fastest inference backend for many archs, DML remains a good option for those without NVIDIA GPUs. It's worth noting that RealPLKSR is also surprisingly quick with DML.

DML can take dynamic ONNX, which makes it more accessible (think any ONNX converted in chaiNNer) for some archs. However, DML does not appear to work with SwinIR and DAT.
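
Putting the compatibility notes from this guide together, the choice of backend can be sketched as a short Python function. The set contents come from the text above, except that treating DAT2 as covered by the "DAT" note is my assumption. The function name `backends_to_try` is made up, and the result is only a list of candidates worth trying, not a guarantee that a given arch works.

```python
ONNX_INCOMPATIBLE = {"CRAFT"}                     # cannot be converted to ONNX at all
TRT_INCOMPATIBLE = {"CRAFT", "HAT", "RGT", "ATD"}  # no TensorRT engine
DML_INCOMPATIBLE = {"SwinIR", "DAT", "DAT2"}       # DAT2 assumed to count as DAT


def backends_to_try(has_recent_nvidia_gpu, arch):
    """Return candidate backends in the order this guide suggests trying them."""
    if arch in ONNX_INCOMPATIBLE:
        return []  # both backends need an ONNX file
    candidates = []
    if has_recent_nvidia_gpu and arch not in TRT_INCOMPATIBLE:
        candidates.append("TensorRT")
    if arch not in DML_INCOMPATIBLE:
        candidates.append("DirectML")
    return candidates


print(backends_to_try(True, "RealPLKSR"))  # ['TensorRT', 'DirectML']
print(backends_to_try(False, "SwinIR"))    # []
```

An empty list means neither backend covered here is expected to work for that combination of GPU and arch.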