
VirtualDub Plugin SDK 1.2

Video decoder model

The video decoder model simulates the video decoder's conversion of samples to decoded frames. It is used by the host to determine ahead of time which samples will be required by the decoder without having to immediately decode the samples.

Operation

Video decoder models implement the IVDXVideoDecoderModel interface. The primary methods that drive the video decoder model are SetDesiredFrame(), which selects the frame to decode, and GetNextRequiredSample(), which returns, one per call, the samples required to decode that frame. The video decoder model generally shares data structures with the parent video source in order to implement these methods.

For each sample that is required to decode the frame, the video decoder model updates its copy of the video decoder state with the results of decoding that sample and returns the sample number to the caller. The host may stop sending samples before the frame is fully decoded; the video decoder model must still reflect the changes made to the video decoder up to that point if they affect the decoding of later frames.

It is important to note that the video decoder model does not receive any sample data itself; it must predict the necessary samples solely from internal information. This information usually comes from reading an index or from a prescan when the file is opened. The video decoder model thus allows the host to prefetch video samples before they are needed.
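As a rough illustration of how a host might drive a model, the sketch below selects a frame and then drains the required samples so they could be prefetched. The DecoderModel base class is a simplified, hypothetical stand-in for IVDXVideoDecoderModel (the real declaration in the SDK headers may have different signatures), and KeyFrameOnlyModel and CollectRequiredSamples are names invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for the decoder model interface (hypothetical; the
// real IVDXVideoDecoderModel declaration may differ).
struct DecoderModel {
    virtual ~DecoderModel() = default;
    virtual void SetDesiredFrame(int64_t frame) = 0;
    // Returns the next sample to feed to the decoder, or -1 when none remain.
    virtual int64_t GetNextRequiredSample() = 0;
};

// Trivial model for a source with key frames only: each frame needs exactly
// its own sample, unless that frame is already in the frame buffer.
class KeyFrameOnlyModel : public DecoderModel {
public:
    void SetDesiredFrame(int64_t frame) override {
        assert(frame >= 0);
        next_ = (frame == last_) ? -1 : frame;
    }

    int64_t GetNextRequiredSample() override {
        if (next_ < 0)
            return -1;
        last_ = next_;  // the modeled frame buffer now holds this sample
        next_ = -1;
        return last_;
    }

private:
    int64_t last_ = -1;  // sample currently in the modeled frame buffer
    int64_t next_ = -1;  // next sample to request, or -1 for none
};

// Hypothetical host-side helper: lists every sample needed for 'frame'
// without touching any sample data.
std::vector<int64_t> CollectRequiredSamples(DecoderModel& model, int64_t frame) {
    std::vector<int64_t> samples;
    model.SetDesiredFrame(frame);
    for (int64_t s = model.GetNextRequiredSample(); s >= 0;
         s = model.GetNextRequiredSample())
        samples.push_back(s);
    return samples;
}
```

With this sketch, asking for frame 7 yields the single sample 7, and asking for frame 7 again yields an empty list because the modeled frame buffer already holds it.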

Key frame / delta frame decoding model (I/P frames only)

In a simple delta compression model, each frame is either a key frame, which is self-contained, or a delta frame, which is stored as differences from the previous frame. These are sometimes known as intra (I) frames and forward-predicted (P) frames.

If the decoder only has one frame buffer, then the decoding model is simple: decoding a sample updates the frame buffer with the results of that sample. The samples needed to decode a frame are the one for that frame and all samples prior to that, back to either the one after the last decoded frame or the last key frame, whichever is closer. In other words, the needed logic is as follows:

SetDesiredFrame(desired_sample) {
    target_sample = desired_sample;

    // find the nearest key frame at or before the desired sample
    next_sample = nearest_key(desired_sample);

    // resume after the last decoded sample if it is closer
    if (last_sample >= next_sample && last_sample <= desired_sample)
        next_sample = last_sample + 1;

    // check if we already have the frame
    if (next_sample > desired_sample)
        next_sample = -1;
}

GetNextRequiredSample() {
    if (next_sample < 0 || next_sample > target_sample)
        return -1;

    // model the decode: the frame buffer now holds this sample
    last_sample = next_sample;
    return next_sample++;
}

Note that if the desired frame has already been decoded, no samples are required at all, and -1 is returned immediately. This is a special case in which the client is required to send a dummy frame to tell the decoder to present one of its internally buffered frames.

If there are no delta frames, the decoder model simply requests the sample corresponding to each frame.
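The single-buffer I/P logic can be tried out with a small self-contained C++ sketch. It assumes the key frame positions are available as a sorted list that starts at sample 0 (for example, from an index); the class name IPDecoderModel, the Drain helper, and the key frame list are hypothetical and not part of the SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

class IPDecoderModel {
public:
    // key_frames: sorted sample numbers of all key frames; sample 0 must be
    // a key frame (assumption made for this sketch).
    explicit IPDecoderModel(std::vector<int64_t> key_frames)
        : keys_(std::move(key_frames)) {
        assert(!keys_.empty() && keys_.front() == 0);
        assert(std::is_sorted(keys_.begin(), keys_.end()));
    }

    void SetDesiredFrame(int64_t desired) {
        assert(desired >= 0);
        target_ = desired;
        next_ = NearestKey(desired);

        // Resume after the last decoded sample if that is closer.
        if (last_ >= next_ && last_ <= desired)
            next_ = last_ + 1;

        // The desired frame is already in the frame buffer.
        if (next_ > desired)
            next_ = -1;
    }

    int64_t GetNextRequiredSample() {
        if (next_ < 0 || next_ > target_)
            return -1;
        last_ = next_;  // model the decode of this sample
        return next_++;
    }

private:
    // Last key frame at or before 'sample'.
    int64_t NearestKey(int64_t sample) const {
        auto it = std::upper_bound(keys_.begin(), keys_.end(), sample);
        return *std::prev(it);  // valid because keys_.front() == 0 <= sample
    }

    std::vector<int64_t> keys_;
    int64_t last_ = -1;    // sample currently in the modeled frame buffer
    int64_t target_ = -1;  // frame selected by SetDesiredFrame()
    int64_t next_ = -1;    // next sample to request, or -1 for none
};

// Hypothetical helper: collects all samples required for 'frame'.
std::vector<int64_t> Drain(IPDecoderModel& model, int64_t frame) {
    std::vector<int64_t> out;
    model.SetDesiredFrame(frame);
    for (int64_t s = model.GetNextRequiredSample(); s >= 0;
         s = model.GetNextRequiredSample())
        out.push_back(s);
    return out;
}
```

With key frames at samples 0 and 10, requesting frame 12 yields samples 10, 11, 12; a following request for frame 14 yields only 13 and 14; repeating the request for frame 14 yields nothing; and seeking back to frame 5 restarts from key frame 0.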

This decoder model is sufficiently common that it is provided automatically by the host when the mDecoderModel field of the VDXVideoSourceInfo structure is set to kDecoderModelDefaultIP.

The default I/P model does not work in VirtualDub 1.7.4 (API V1). It does work in 1.7.5 (API V2).

Sources with bidirectionally predicted frames (I/P/B frames)

If the video source contains bidirectional prediction, the decoder model becomes considerably more complicated, because the decoding order is not the same as the presentation order. The reordering is required because each bidirectionally predicted (B) frame depends on both a past and a future frame, so the future frame must be decoded out of order:

Frame type               I  B  B  P  B  B  P  B  B  P
Display (frame) order    0  1  2  3  4  5  6  7  8  9
Decoding (sample) order  0  3  1  2  6  4  5  9  7  8
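The decoding row above can be reproduced with a short sketch. It assumes that each B frame references the nearest I/P frame on either side, so an I/P frame is decoded before the B frames that precede it in display order. DecodeOrder and SampleForFrame are hypothetical helper names, and frame types are given as a string in display order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns display frame numbers listed in decoding (sample) order.
// Trailing B frames with no following I/P frame are emitted last; this is
// an assumption of the sketch, not a rule stated by the SDK.
std::vector<std::size_t> DecodeOrder(const std::string& types) {
    std::vector<std::size_t> order;
    std::vector<std::size_t> pending_b;  // B frames waiting for their future I/P frame
    for (std::size_t i = 0; i < types.size(); ++i) {
        assert(types[i] == 'I' || types[i] == 'P' || types[i] == 'B');
        if (types[i] == 'B') {
            pending_b.push_back(i);
        } else {
            order.push_back(i);  // the I/P frame is decoded first
            order.insert(order.end(), pending_b.begin(), pending_b.end());
            pending_b.clear();
        }
    }
    order.insert(order.end(), pending_b.begin(), pending_b.end());
    return order;
}

// Inverse mapping: the sample number that holds each display frame.
std::vector<std::size_t> SampleForFrame(const std::string& types) {
    const std::vector<std::size_t> order = DecodeOrder(types);
    std::vector<std::size_t> sample_of(order.size());
    for (std::size_t s = 0; s < order.size(); ++s)
        sample_of[order[s]] = s;
    return sample_of;
}
```

For the pattern "IBBPBBPBBP", DecodeOrder gives 0 3 1 2 6 4 5 9 7 8, matching the table. The inverse mapping is 0 2 3 1 5 6 4 8 9 7, meaning, for example, that display frame 3 (the first P frame) is stored as sample 1.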

Whereas a delta-frame decoder can make do with one or two frame buffers, a B-frame decoder needs three: one for the previous I/P frame (forward prediction), one for the next I/P frame (backward prediction), and one for a B frame predicted between the first two.

The full logic for handling B-frame decoding is too complex to include here, but the gist is as follows:

  • If the desired frame is already present in an internal buffer, that internal buffer is presented.
  • For I and P frames, the decoding model requests only samples corresponding to I/P frames, and is the same as the I/P frame model except that it keeps the previous I/P frame as well as the one being decoded.
  • For B frames, the decoding model first requests samples until the previous I/P frame is present, then the next I/P frame, and finally the sample for the desired B frame.
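The request order in the rules above can be sketched for the simplest case, a decoder whose buffers are all empty. The sketch deliberately ignores the buffer tracking that makes the full model complex, and it assumes that the stream starts with an I frame and that each B frame references the nearest I/P frame on either side. SampleNumbers and RequiredSamplesColdStart are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sample number of each display frame: an I/P frame is stored before the
// B frames that precede it in display order.
std::vector<int> SampleNumbers(const std::string& types) {
    std::vector<int> sample_of(types.size(), -1);
    std::vector<std::size_t> pending_b;
    int next_sample = 0;
    for (std::size_t i = 0; i < types.size(); ++i) {
        if (types[i] == 'B') { pending_b.push_back(i); continue; }
        sample_of[i] = next_sample++;
        for (std::size_t b : pending_b) sample_of[b] = next_sample++;
        pending_b.clear();
    }
    for (std::size_t b : pending_b) sample_of[b] = next_sample++;
    return sample_of;
}

// Samples needed to decode 'desired' when no frame is buffered yet.
std::vector<int> RequiredSamplesColdStart(const std::string& types,
                                          std::size_t desired) {
    assert(desired < types.size());
    const std::vector<int> sample_of = SampleNumbers(types);

    // Last I/P frame needed: the frame itself, or for a B frame the next
    // I/P frame in display order.
    std::size_t last_anchor = desired;
    while (last_anchor < types.size() && types[last_anchor] == 'B')
        ++last_anchor;
    assert(last_anchor < types.size());  // a B frame needs a following I/P frame

    // Walk back to the I frame that starts the prediction chain.
    std::size_t first = desired;
    while (types[first] != 'I') {
        assert(first > 0);  // the stream must start with an I frame
        --first;
    }

    // Request the I/P chain in order, then the B frame itself if needed.
    std::vector<int> samples;
    for (std::size_t f = first; f <= last_anchor; ++f)
        if (types[f] != 'B')
            samples.push_back(sample_of[f]);
    if (types[desired] == 'B')
        samples.push_back(sample_of[desired]);
    return samples;
}
```

For the pattern "IBBPBBPBBP", display frame 3 (P) needs samples 0 and 1, while display frame 4 (B) needs samples 0, 1, 4 and 5: the I frame, the previous P frame, the next P frame, and finally the B frame itself. A real model would skip any of these samples whose frames are already held in its buffers.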

Copyright (C) 2007-2012 Avery Lee.
