Appendix Local Warped Motion - AlexBoswellVCD/gitlab_wiki_test GitHub Wiki
Warped motion modes are inter-prediction modes where the prediction is generated by applying an (affine) transform to the reference. AV1 has two affine prediction modes: global warped motion and local warped motion (LW). The following description is with respect to the latter.
AV1 has three types of motion modes that specify the motion of a block, namely SIMPLE, OBMC and LW. Local warped motion aims at describing various types of local motion. Minimal signaling overhead is realized by signaling one flag in the inter block mode info, and that only under some conditions. LW cannot be combined with OBMC.
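The local warped motion model is given by:

$$
\begin{bmatrix} x_r \\ y_r \end{bmatrix} =
\begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}
\begin{bmatrix} x_s \\ y_s \end{bmatrix} +
\begin{bmatrix} h_{13} \\ h_{23} \end{bmatrix}
$$

where $(x_s, y_s)$ and $(x_r, y_r)$ represent the sample pixel coordinates in the current and reference frames, respectively. The decoder performs the same model estimation, so the encoder needs only to signal whether local warped motion is the selected mode for the current block and the corresponding translational model parameters $h_{13}$ and $h_{23}$, i.e. the rest of the model parameters are not signaled in the bitstream.
To simplify the model estimation, $h_{13}$ and $h_{23}$ are assumed to represent the entries in the current block motion vector (MV in Figure 1 below).
Let MV $= (mv_x, mv_y)$. Then the above implies $h_{13} = mv_x$ and $h_{23} = mv_y$. The remaining parameters $h_{11}$, $h_{12}$, $h_{21}$ and $h_{22}$ are estimated using a least squares approach.
To illustrate the estimation of the parameters $h_{11}$, $h_{12}$, $h_{21}$ and $h_{22}$ using a least squares approach, consider the example shown in Figure 1 below.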
Figure 1. Current block in yellow is a 32x32 block. Neighboring blocks that refer to the same reference picture as the current block are in blue. MVs (in orange) for the current block and the blue blocks are used to infer the local warp motion of the yellow block.
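In the following, assume the x and y coordinates are specified with reference to the top left corner of the yellow block. Let $C = (x_c, y_c)$ be the center of the current block, and $C' = (x'_c, y'_c)$ the projection of $C$ onto the reference frame using the motion vector MV for the current block. According to the motion model:

$$
C' = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} C + \begin{bmatrix} h_{13} \\ h_{23} \end{bmatrix}
$$

For block 6, define $C_6$ to be the center of block 6, and $C'_6$ to be the projection of $C_6$ onto the reference frame using the motion vector MV6 for block 6. Assuming the same motion model as above, it follows that:

$$
C'_6 = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} C_6 + \begin{bmatrix} h_{13} \\ h_{23} \end{bmatrix}
$$

Taking the difference between the two equations above:

$$
C'_6 - C' = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} (C_6 - C)
$$

The local warp transformation defines how the vector relating $C$ and $C_6$ in the source frame is projected into the vector relating $C'$ and $C'_6$ in the reference frame:

$$
\mathbf{v}'_6 = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \mathbf{v}_6
\quad \text{where} \quad \mathbf{v}_6 = C_6 - C, \;\; \mathbf{v}'_6 = C'_6 - C'
$$

The vectors $\mathbf{v}_6$ and $\mathbf{v}'_6$ are shown in purple in Figure 1. The steps above are then repeated for blocks 5 and 3. The least squares minimization problem is then broken into two estimation problems: one to estimate the parameters $H_1 = (h_{11}, h_{12})^T$ and one to estimate the parameters $H_2 = (h_{21}, h_{22})^T$, such that $A_1 H_1 = B_1$ and $A_2 H_2 = B_2$, where the matrices $A_1$, $B_1$, $A_2$ and $B_2$ are constructed from the data above. The solutions to the least squares estimation problems are then given by $H_1 = (A_1^T A_1)^{-1} A_1^T B_1$ and $H_2 = (A_2^T A_2)^{-1} A_2^T B_2$.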
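The least squares step above can be sketched as follows. This is a minimal floating-point illustration, not the encoder's fixed-point implementation. The function name `estimate_affine` and the names `h11`, `h12`, `h21`, `h22` for the four non-translational model parameters are choices made for this sketch. It also assumes that each projected center equals the block center plus that block's MV, and that both sub-problems use the same source-frame difference vectors as the rows of A, so they share one 2x2 normal-equation matrix.

```python
def estimate_affine(center, mv, neighbors):
    """Least-squares estimate of (h11, h12, h21, h22).

    center:    (x, y) center of the current block
    mv:        (mvx, mvy) motion vector of the current block
    neighbors: list of ((x, y), (mvx, mvy)) pairs, one per neighboring block
    Returns None when the sample set is degenerate.
    """
    # Entries of A'A (shared by both sub-problems in this sketch).
    sxx = sxy = syy = 0.0
    # Entries of A'B1 (x rows) and A'B2 (y rows).
    bx0 = bx1 = by0 = by1 = 0.0
    for (nx, ny), (nmvx, nmvy) in neighbors:
        # Source-frame difference vector between the two block centers.
        dx = nx - center[0]
        dy = ny - center[1]
        # Reference-frame difference vector (assumes projection = center + MV).
        px = dx + (nmvx - mv[0])
        py = dy + (nmvy - mv[1])
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
        bx0 += dx * px
        bx1 += dy * px
        by0 += dx * py
        by1 += dy * py
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None  # fewer than two independent samples
    # Apply the closed-form inverse of the 2x2 matrix A'A.
    h11 = (syy * bx0 - sxy * bx1) / det
    h12 = (sxx * bx1 - sxy * bx0) / det
    h21 = (syy * by0 - sxy * by1) / det
    h22 = (sxx * by1 - sxy * by0) / det
    return h11, h12, h21, h22
```

With three neighbors, as in the Figure 1 example, each sub-problem has three equations and two unknowns, so the fit is overdetermined.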
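For implementation purposes, the local warp transform is implemented as two shears: a horizontal shear and a vertical shear. The model matrix H is then decomposed as follows:

$$
H = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ \gamma & 1+\delta \end{bmatrix}
\begin{bmatrix} 1+\alpha & \beta \\ 0 & 1 \end{bmatrix}
$$

where $\alpha$, $\beta$, $\gamma$ and $\delta$ are shear model parameters. The vertical shear is given by the following model:

$$
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ \gamma & 1+\delta \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
$$

whereas the horizontal shear is given by:

$$
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} 1+\alpha & \beta \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
$$

The combined transform is given by:

$$
\begin{bmatrix} x' \\ y' \end{bmatrix} =
\begin{bmatrix} 1+\alpha & \beta \\ \gamma(1+\alpha) & \gamma\beta + 1 + \delta \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
$$

The shear parameters are determined based on the parameters $h_{11}$, $h_{12}$, $h_{21}$ and $h_{22}$.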
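As an illustration of how the shear parameters follow from the model parameters, the sketch below assumes the factorization of the 2x2 model matrix `[[h11, h12], [h21, h22]]` into a vertical shear `[[1, 0], [gamma, 1 + delta]]` applied after a horizontal shear `[[1 + alpha, beta], [0, 1]]`. The function names and the floating-point arithmetic are assumptions of this sketch; a real codec works in fixed point and applies additional validity checks.

```python
def shear_params(h11, h12, h21, h22):
    """Solve for (alpha, beta, gamma, delta) by matching matrix entries."""
    # This sketch rejects non-positive h11 so the division is well defined
    # (an assumption of the sketch, not a rule stated in the text).
    if h11 <= 0:
        return None
    alpha = h11 - 1.0                   # from h11 = 1 + alpha
    beta = h12                          # from h12 = beta
    gamma = h21 / h11                   # from h21 = gamma * (1 + alpha)
    delta = h22 - gamma * h12 - 1.0     # from h22 = gamma * beta + 1 + delta
    return alpha, beta, gamma, delta


def compose_shears(alpha, beta, gamma, delta):
    """Multiply vertical shear by horizontal shear; returns (h11, h12, h21, h22)."""
    return (
        1.0 + alpha,
        beta,
        gamma * (1.0 + alpha),
        gamma * beta + 1.0 + delta,
    )
```

Recomposing the two shears with `compose_shears` should reproduce the original four parameters up to rounding, which is a convenient sanity check on the decomposition.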
Both the horizontal and vertical shears are implemented using 8-tap interpolation filters with 1/64-pixel precision.
The final warped motion model is applied on an 8x8 basis in the source frame. The predicted block is constructed by assembling the 8x8 predicted warped blocks from the reference picture.
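The 8x8 processing can be pictured with the rough sketch below. It walks the block in 8x8 tiles and projects each tile center into the reference frame with the affine model. The function name `warp_tiles`, the parameter tuple layout and the choice to project only the tile center are assumptions of this sketch; the actual prediction filters every sample of each tile using the shear parameters.

```python
def warp_tiles(block_w, block_h, h, tile=8):
    """List (tile_x, tile_y, ref_x, ref_y) for every tile of the block.

    h = (h11, h12, h13, h21, h22, h23); coordinates are relative to the
    top-left corner of the current block.
    """
    h11, h12, h13, h21, h22, h23 = h
    tiles = []
    for ty in range(0, block_h, tile):
        for tx in range(0, block_w, tile):
            # Center of the current tile in the source frame.
            cx = tx + tile / 2
            cy = ty + tile / 2
            # Position of that center in the reference frame.
            rx = h11 * cx + h12 * cy + h13
            ry = h21 * cx + h22 * cy + h23
            tiles.append((tx, ty, rx, ry))
    return tiles
```

For a 32x32 block this yields 16 tiles, whose predictions would then be assembled into the full predicted block.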
At the decoder side, the affine transform parameters are derived at the block-level using as input the motion vectors of the current and neighboring blocks.
Control macros/flags:
LW can be enabled/disabled at the sequence and the picture level as indicated in Table 1.
| Flag | Level (Sequence/Picture) | Description |
|---|---|---|
| -local-warp | Config / Sequence | Encoder configuration to enable/disable LW |
| enable_local_warp_flag | Sequence | Enable/disable LW at the sequence level |
| allow_warped_motion | Picture | Enable/disable LW at the picture level |
At the block level, LW is applied when the following conditions are true:
- Both width and height of the current block are equal to or greater than 8.
- The flag force_integer_mv, which is usually employed for screen content, is not set to 1.
- The flag motion_mode is set to 1 (that is, motion mode is not SIMPLE).
- The neighboring motion vectors are small enough in order to allow the derivation of applicable warping parameters.
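The block-level conditions above can be summarized in a small predicate. This is only a sketch: the function name `lw_block_allowed` and its parameters are hypothetical, and the last two inputs are passed in as precomputed booleans rather than derived from the bitstream.

```python
def lw_block_allowed(width, height, force_integer_mv,
                     motion_mode_not_simple, warp_params_valid):
    """Return True when all listed block-level conditions for LW hold."""
    return (
        width >= 8 and height >= 8      # both dimensions at least 8
        and not force_integer_mv        # integer-MV mode is not forced
        and motion_mode_not_simple      # motion mode is not SIMPLE
        and warp_params_valid           # neighboring MVs gave usable parameters
    )
```

For example, a 16x8 block with `force_integer_mv` unset, a non-SIMPLE motion mode and valid warp parameters passes, while a 4x16 block fails on the size condition alone.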
Details of the implementation
As with other prediction mode candidates in the encoder, candidates for the LW mode are first injected into mode decision (MD) and then processed through several MD stages of RD optimization. A high-level diagram of the function calls relevant to the two main LW functions, namely inject_inter_candidates and warped_motion_prediction, is given in Figure 2 below.
The two main steps involved in the LW processing in MD, namely the injection of the LW candidates and the generation of the LW predictions, are outlined in the following.
Step 1: Injection of the LW candidates.
The injection is performed by the function inject_inter_candidates. A diagram of the relevant function calls is given in Figure 3.
Figure 3. Continuation of Figure 2 with the function calls related to the injection of LW candidates.
- Check if the current block has overlappable blocks above and/or to the left of the current block (has_overlappable_candidates). Overlappable blocks are adjacent blocks above or to the left of the current block that are inter blocks with width >= 8 and height >= 8.
- Inject warped candidate (function inject_warped_motion_candidates) if the current block is such that width >= 8 and height >= 8 and warped_motion_injection is set.
  - Get an MV. The MV would be from List 0 and could correspond to NEAREST MV, NEAR MV or NEW MV.
  - Compute warped parameters (function warped_motion_parameters).
    - Get warp samples.
      - Get MVs from overlappable neighboring blocks in the causal neighborhood, i.e. top and left of the current block (wm_find_samples).
      - Generate the list of warp samples, i.e., selection of samples (select_samples). To perform the selection of samples, the difference between the MV for the current block and the MV of the neighboring block is computed. The sum of the absolute values of the x and y components of the difference is compared to a threshold. Neighboring blocks that result in a large sum are not considered. Stop if the number of samples in the list is small, since the estimated warping parameters would be unreliable.
    - Warp parameters estimation (function eb_find_projection).
      - Generate the warping parameters with the warp samples using the least squares fit (find_affine_int). Stop if the parameters don’t fit the threshold criteria.
      - Generate warp variables alpha, beta, gamma and delta for the two shearing operations, i.e., horizontal and vertical, which combined make the full affine transformation (eb_get_shear_params). Stop if the shear parameters are not valid (is_affine_shear_allowed).
  - If not discarded, the LW candidate is added to the RD candidate list.
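The sample selection rule described in the steps above can be sketched as follows. The function name `select_warp_samples`, the `thresh` and `min_samples` parameters and their values are hypothetical; the text only states that a threshold is applied and that too few samples stops the process.

```python
def select_warp_samples(cur_mv, neighbor_mvs, thresh, min_samples=1):
    """Keep neighbor MVs close to the current block MV.

    cur_mv:       (mvx, mvy) of the current block
    neighbor_mvs: list of (mvx, mvy) from overlappable neighboring blocks
    thresh:       largest accepted sum of absolute component differences
    Returns the kept MVs, or None when fewer than min_samples remain.
    """
    kept = []
    for mvx, mvy in neighbor_mvs:
        # Sum of absolute differences of the x and y components.
        diff = abs(mvx - cur_mv[0]) + abs(mvy - cur_mv[1])
        if diff <= thresh:
            kept.append((mvx, mvy))
    if len(kept) < min_samples:
        return None  # too few samples for a reliable estimate
    return kept
```

For instance, with a current MV of (10, -4) and a threshold of 8, a neighbor MV of (40, 0) has a difference sum of 34 and is dropped, while (12, -3) and (9, -6) are kept.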
Step 2: Evaluation of the LW candidates in MD
The generation of the LW predictions in MD is performed using the function warped_motion_prediction. A diagram of the associated function call is shown in Figure 4 below.
Figure 4. Continuation of Figure 2 with the function calls related to the evaluation of the LW predictions in MD.
The steps involved in the generation and evaluation of the predictions are outlined below.
- Generate warped motion predicted samples for each plane (plane_warped_motion_prediction).
  - Create the 2D prediction array containing the warped inter predicted samples (eb_av1_warp_affine). Horizontal shear is applied first, followed by vertical shear.
  - This step is performed at the level of 8x8 blocks, until the prediction for the entire block is generated.
- Compute RD for the LW prediction. Rate includes the signaling of the syntax element motion_mode.
LW is enabled for reference pictures in encoder preset 0, and for base layer pictures in encoder presets 1 to 5.
The configuration flag enable_local_warp_flag controls the encoder use of LW at the sequence level. At frame level, the use of LW is controlled by allow_warped_motion. At the block level, the use of LW is signaled by the syntax element motion_mode, which indicates the type of motion for a block: simple translation, OBMC, or warped motion.