Appendix GM

Global Motion Compensation Appendix

1. Description of the algorithm

Global motion compensation concerns the estimation and compensation of motion characteristics that affect the whole frame, as occurs, for example, in video clips shot with a hand-held camera. In the example shown in Figure 1 below, matched features in the two images indicate a translation and rotation between the two pictures. In general, the key steps involved in estimating global motion are identifying features in both images, matching the identified features, and estimating the global motion parameters based on the matched features.

Figure 1. Example of global motion involving translation and rotation.

The general motion model is given by:

$$\begin{bmatrix} x_r \\ y_r \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} x_c \\ y_c \end{bmatrix} + \begin{bmatrix} h_{13} \\ h_{23} \end{bmatrix}$$

where $(x_c, y_c)$ and $(x_r, y_r)$ are the pixel coordinates in the current and reference frames, respectively. The supported motion models are listed below, followed by a small illustrative code sketch:

  • Affine projection: $h_{11}, h_{12}, h_{21}, h_{22}, h_{13}, h_{23}$ are all free. This transformation preserves parallelism and has six parameters to estimate.

  • Rotation-zoom projection: $h_{22} = h_{11}$ and $h_{21} = -h_{12}$, which corresponds to a rotation combined with scaling. This transformation preserves angles and has four parameters to estimate.

  • Translation: $h_{11} = h_{22} = 1$ and $h_{12} = h_{21} = 0$, so that only $h_{13}$ and $h_{23}$ are estimated. This transformation preserves orientation and size and has two parameters to estimate.
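
For illustration only, the minimal C sketch below applies the three models to a single pixel coordinate. The structure and parameter names (h11 … h23) simply follow the matrix notation above; they are not the SVT-AV1 data structures.

```c
#include <stdio.h>

/* Illustrative 6-parameter warp mapping a current-frame pixel (xc, yc) to a
 * reference-frame location (xr, yr). Parameter names follow the matrix above;
 * this is not the SVT-AV1 data layout. */
typedef struct {
    double h11, h12, h13; /* first row:  xr = h11*xc + h12*yc + h13 */
    double h21, h22, h23; /* second row: yr = h21*xc + h22*yc + h23 */
} WarpModel;

static void warp_point(const WarpModel *m, double xc, double yc,
                       double *xr, double *yr) {
    *xr = m->h11 * xc + m->h12 * yc + m->h13;
    *yr = m->h21 * xc + m->h22 * yc + m->h23;
}

int main(void) {
    /* Affine: all six parameters are free. */
    const WarpModel affine      = { 1.03, 0.02, 3.0, 0.01, 0.98, -2.0 };
    /* Rotation-zoom: h22 = h11 and h21 = -h12, i.e. four free parameters. */
    const WarpModel rotzoom     = { 1.02, 0.05, 3.0, -0.05, 1.02, -2.0 };
    /* Translation: identity 2x2 part, only h13 and h23 are free. */
    const WarpModel translation = { 1.0, 0.0, 3.0, 0.0, 1.0, -2.0 };

    double xr, yr;
    warp_point(&affine, 10.0, 20.0, &xr, &yr);
    printf("affine:        (10, 20) -> (%.2f, %.2f)\n", xr, yr);
    warp_point(&rotzoom, 10.0, 20.0, &xr, &yr);
    printf("rotation-zoom: (10, 20) -> (%.2f, %.2f)\n", xr, yr);
    warp_point(&translation, 10.0, 20.0, &xr, &yr);
    printf("translation:   (10, 20) -> (%.2f, %.2f)\n", xr, yr);
    return 0;
}
```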

The global motion estimation involves two main steps. The first step is feature matching, where the objective is to identify features that are present in both the source and reference pictures. The second step is model identification, where the identified features are used to estimate the motion model parameters. In SVT-AV1, the global motion parameters are computed for each reference frame using feature matching and the random sample consensus (RANSAC) algorithm, and the estimated parameters are sent in the bitstream.

Feature matching

To identify features that are common to both the source and reference pictures, the Features from Accelerated Segment Test (FAST) algorithm is used as a feature detector. The FAST algorithm identifies corner points by examining a circle of 16 pixels (a Bresenham circle of radius 3) around the pixel p of interest. If 12 contiguous pixels out of the 16 all have values above that of p by at least a given threshold, or all have values below that of p by at least a given threshold, then p is considered a feature (corner point) in the image. Such features are robust to motion and brightness changes. Once features in the source frame and in the reference frame are identified, feature matching is performed by computing the normalized cross-correlation function between the two sets of features. A feature (i.e. corner point) is selected if (a simplified sketch of the correlation computation follows the list):

  • The feature on the reference frame is located within a pre-specified distance from the feature in the source frame.

  • The correlation between the point in the reference frame and that in the source frame is highest.
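
As a rough illustration of the matching criterion, the sketch below computes the normalized cross-correlation between two equal-sized patches centered on candidate features. It is a simplified floating-point version, not the SVT-AV1 implementation, which operates directly on the integer pixel buffers.

```c
#include <math.h>

/* Normalized cross-correlation between two equal-sized patches, one centered
 * on a feature in the source frame and one on a candidate feature in the
 * reference frame. Returns a value in [-1, 1]; higher means a better match. */
static double normalized_cross_correlation(const unsigned char *src, int src_stride,
                                           const unsigned char *ref, int ref_stride,
                                           int size) {
    double sum_s = 0.0, sum_r = 0.0;
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            sum_s += src[y * src_stride + x];
            sum_r += ref[y * ref_stride + x];
        }
    }
    const double n = (double)size * size;
    const double mean_s = sum_s / n, mean_r = sum_r / n;

    double num = 0.0, var_s = 0.0, var_r = 0.0;
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            const double ds = src[y * src_stride + x] - mean_s;
            const double dr = ref[y * ref_stride + x] - mean_r;
            num   += ds * dr;
            var_s += ds * ds;
            var_r += dr * dr;
        }
    }
    if (var_s == 0.0 || var_r == 0.0) return 0.0; /* flat patch: no usable correlation */
    return num / sqrt(var_s * var_r);
}
```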

Model identification

The model is identified based on the matched feature points from the feature matching step. A least squares estimation is performed to compute the model parameters using the matched feature points, and the RANSAC algorithm is used in the estimation to minimize the impact of noise and outliers in the data. The set of parameters to be estimated depends on the specified motion model (translation, rotation-zoom, or affine). The identified parameters are included in the bitstream.

The RANSAC algorithm finds the model parameters that yield the best match to the motion of the identified features. The steps involved in the algorithm are as follows (a simplified sketch of the loop follows the list):

  1. A small number of matched features (corner points) are used in the model parameter estimation (as dictated by the number of parameters to estimate).

  2. The remaining features are used to evaluate the fitness of the model by counting the number of those matched features for which the model yields a small error (inliers). The remaining tested features are considered outliers.

  3. Steps 1 and 2 are repeated based on another small set of matched features and the number of resulting outliers is recorded.

  4. The process stops when the number of outliers is below a specified threshold.
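
The sketch below illustrates the loop structure under a simplifying assumption: it fits only a translation model, for which a single matched pair is a sufficient minimal sample, whereas the encoder fits rotation-zoom and affine models with a least-squares solve. The function and variable names are purely illustrative.

```c
#include <stdlib.h>
#include <math.h>

/* A matched feature: position in the current frame and in the reference frame. */
typedef struct { double xc, yc, xr, yr; } Match;

/* RANSAC for a translation model (dx, dy). Returns the best inlier count and
 * writes the best parameters to best_dx/best_dy. Illustrative only. */
static int ransac_translation(const Match *matches, int n, int num_trials,
                              double tol, double *best_dx, double *best_dy) {
    int best_inliers = -1;
    for (int t = 0; t < num_trials; t++) {
        /* Step 1: pick a minimal sample (one pair is enough for a translation). */
        const Match *s = &matches[rand() % n];
        const double dx = s->xr - s->xc, dy = s->yr - s->yc;

        /* Step 2: count inliers among the remaining matches. */
        int inliers = 0;
        for (int i = 0; i < n; i++) {
            const double ex = matches[i].xc + dx - matches[i].xr;
            const double ey = matches[i].yc + dy - matches[i].yr;
            if (sqrt(ex * ex + ey * ey) < tol) inliers++;
        }

        /* Step 3: keep the model that yields the fewest outliers so far. */
        if (inliers > best_inliers) {
            best_inliers = inliers;
            *best_dx = dx;
            *best_dy = dy;
        }

        /* Step 4: stop early once the number of outliers is small enough. */
        if (n - best_inliers < n / 10) break;
    }
    return best_inliers;
}
```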

2. Implementation of the algorithm

Input to motion_estimation_kernel: Input frames of the stream.

Outputs of motion_estimation_kernel: Estimated global motion models per frame with their references.

Input to enc_dec_kernel: Estimated global motion models.

Outputs of enc_dec_kernel: Encoded frame with global motion encoded blocks if they provide a cost advantage.

Control macros/flags:

Table 1 below summarizes the control macros and flags for global motion compensation.

Table 1. Control macros and flags for global motion compensation.
| Flag | Level (sequence/picture) | Description |
| --- | --- | --- |
| GLOBAL_WARPED_MOTION | | Macro to enable global warped motion estimation and mode insertion. When disabled, it restores the previous global motion implementation, which only supports the TRANSLATION mode. |
| global_mv_injection | Block | Controls whether global motion candidates should be injected. |

Details of the implementation

The global motion tool consists of two parts, namely global motion estimation and mode decision.

Global motion estimation

This process is executed by the global_motion_estimation function, which is called in the motion_estimation_kernel for the first segment of each frame but processes the whole frame. The function contains a loop that runs over all reference frames. At each iteration of the loop, the FAST features of the reference frame are extracted and matched against those of the current frame. The features are computed in the av1_fast_corner_detect function using the fastfeat third-party library. The matching is done in the av1_compute_cross_correlation_c function by computing the cross-correlation between patches around the feature centers.

Then, the rotation-zoom and affine global motion models are tested with the RANSAC algorithm in the av1_compute_global_motion function, and their parameters are refined in the av1_refine_integerized_param function.

As coding the global motion parameters takes space in the bitstream, a global motion model is kept only if the potential rate-distortion gain justifies it. This decision is based on the computed frame error, the storage cost of the global motion parameters, and empirical thresholds.
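
As a hedged sketch of the shape of that decision (not the actual SVT-AV1 code, whose constants, names, and exact formula differ), one can weigh the frame error obtained with the warped model against the error without it and against the parameter coding cost:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative gating of a global motion model. error_adv_threshold and
 * cost_weighted_threshold stand in for hypothetical empirical constants;
 * SVT-AV1 uses its own thresholds. */
static bool keep_global_model(int64_t warped_frame_error, /* frame error with the model   */
                              int64_t ref_frame_error,    /* frame error without it        */
                              int params_cost_bits,       /* bits needed to code the model */
                              double error_adv_threshold,
                              double cost_weighted_threshold) {
    /* Ratio < 1 means the warped model reduces the frame error. */
    const double error_advantage =
        (double)warped_frame_error / (double)ref_frame_error;
    /* Keep the model only if it clearly helps, and the improvement is not
     * outweighed by the bits spent on its parameters. */
    return error_advantage < error_adv_threshold &&
           error_advantage * params_cost_bits < cost_weighted_threshold;
}
```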

The AV1 specifications define four global motion types:

  • IDENTITY for an identity model,

  • TRANSLATION for a translation model,

  • ROTZOOM for a rotation and zoom model,

  • AFFINE for an affine model.

In the DetectGlobalMotion function, only the ROTZOOM and AFFINE models are considered. The evaluation of the TRANSLATION model is not very useful since translations can already be well captured by other local predictors.

Mode decision

Each block that is 8x8 or larger can be a candidate for local or global warped motion. For each block, global motion candidates are injected in the inject_inter_candidates function for the simple and compound modes for the LAST_FRAME and BWDREF_FRAME reference frame types. The compound mode implementation only combines global warped motion for both references.
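
The following sketch shows, under assumed names (BlockContext, gm_type_last, gm_type_bwdref, and should_inject_global_candidates are illustrative, not the SVT-AV1 structures), the kind of gating applied before injecting global warped motion candidates for a block:

```c
#include <stdbool.h>

/* Hypothetical types for illustration only; SVT-AV1 uses its own structures. */
typedef enum { IDENTITY, TRANSLATION, ROTZOOM, AFFINE } GlobalMotionType;

typedef struct {
    int              block_width;
    int              block_height;
    GlobalMotionType gm_type_last;        /* model estimated against LAST_FRAME   */
    GlobalMotionType gm_type_bwdref;      /* model estimated against BWDREF_FRAME */
    bool             global_mv_injection; /* the flag from Table 1 */
} BlockContext;

/* Inject global warped motion candidates only for blocks that are at least
 * 8x8 and only when at least one reference carries a warped (non-translation)
 * global model. */
static bool should_inject_global_candidates(const BlockContext *ctx) {
    if (!ctx->global_mv_injection) return false;
    if (ctx->block_width < 8 || ctx->block_height < 8) return false;
    return ctx->gm_type_last > TRANSLATION || ctx->gm_type_bwdref > TRANSLATION;
}
```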

To identify global warped motion candidates, the warped_motion_prediction function has been modified to support the compound mode for warped motions, both when high bit-depth is enabled and when it is not.

3. Optimization of the algorithm

There is currently no particular optimization of the global motion estimation process and candidate ranking. However, with respect to ranking the global motion candidates, the current implementation supports a dedicated path for global motion candidates that allows some of those candidates to survive until the last and most costly stage of the mode decision process.

4. Signaling

The global motion parameters are written in the bitstream for each encoded frame with their corresponding references.

Boolean parameters encode the type of global motion model among the four available (IDENTITY, TRANSLATION, ROTZOOM, or AFFINE), as indicated in Table 2. A small sketch of the corresponding decoding logic follows the table.

Table 2. Global motion model signals.
| Frame level | Values (for 8-bit content) | Number of bits |
| --- | --- | --- |
| is_global | {0, 1} | 1 |
| is_rot_zoom | {0, 1} | 1 |
| is_translation | {0, 1} | 1 |
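
Per the AV1 specification, these flags are read hierarchically: is_rot_zoom is only read when is_global is set, and is_translation is only read when is_rot_zoom is not. The sketch below reproduces that decision tree with a hypothetical read_bit callback standing in for the spec's symbol reader.

```c
/* Global motion types defined by the AV1 specification. */
typedef enum { IDENTITY, TRANSLATION, ROTZOOM, AFFINE } GlobalMotionType;

/* Hypothetical one-bit reader used for illustration only. */
typedef int (*ReadBitFn)(void *bitstream);

static GlobalMotionType read_global_motion_type(ReadBitFn read_bit, void *bs) {
    if (!read_bit(bs))   /* is_global == 0: 1 bit, identity model          */
        return IDENTITY;
    if (read_bit(bs))    /* is_rot_zoom == 1: 2 bits, rotation-zoom        */
        return ROTZOOM;
    return read_bit(bs)  /* is_translation: 3 bits, translation or affine  */
               ? TRANSLATION
               : AFFINE;
}
```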

Depending on the model complexity, parameters corresponding to entries in the affine transformation matrix are also encoded, as indicated in Table 3.

Table 3. Global motion parameters in the bitstream.
| Frame level | Values (for 8-bit content) | Number of bits |
| --- | --- | --- |
| Global motion parameters | 0 parameters for IDENTITY, 2 parameters for TRANSLATION, 4 parameters for ROTZOOM, 6 parameters for AFFINE | Up to 12 |

References

  • Sarah Parker, Yue Chen, David Barker, Peter de Rivaz, Debargha Mukherjee, “Global and Locally Adaptive Warped Motion Compensation in Video Compression,” International Conference on Image Processing, pp. 275-279, 2017.

  • Peter de Rivaz and Jack Haughton, “AV1 Bitstream & Decoding Process Specification,” 2019.
