videofilt_gotchas - shekh/VirtualDub2 GitHub Wiki

VirtualDub Plugin SDK 1.2

Advice and gotchas

Here are some things to watch out for when writing your video filter:

In-place filter cropping

The fa->dst.offset field can introduce a bug if you write your filter as in-place. By default, this field is initialized to zero, which places the destination bitmap at the start of the buffer. This is appropriate for two-buffer filters, but for an in-place filter it misplaces the destination bitmap relative to the source bitmap. In most cases this isn't a problem, but when the source bitmap is cropped, the two bitmaps no longer line up and the cropping malfunctions.

To avoid this problem, simply copy over the offset from the source bitmap to the output bitmap:

        fa->dst.offset = fa->src.offset;

Buffer overruns

When a scanline is 319 pixels long at 32 bits per pixel (1276 bytes), you may read or write at most 1276 bytes. You may not round this value up, write 1280 bytes, and hope there is additional unused memory at the end of the scanline. Doing so can corrupt adjacent buffers, destabilize the application, or cause a crash. The same goes for alignment: you cannot assume that a scanline is aligned to 16 bytes just because it would be convenient. This may require some inconvenient fixup code for the odd bytes at the edges of the scanline.
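The fixup code mentioned above can be sketched as follows. This is only an illustration, assuming 32-bit pixels; invertPixel() and invertScanline() are hypothetical names, and the four-pixel loop stands in for whatever 16-byte vector loop a real filter would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-pixel operation: invert the colour channels and leave the
// top byte untouched.
static inline std::uint32_t invertPixel(std::uint32_t px) {
    return px ^ 0x00FFFFFFu;
}

// Processes exactly `width` pixels of one scanline and nothing beyond it.
// The bulk is handled in blocks of four pixels (16 bytes); the zero to three
// leftover pixels are handled one at a time instead of rounding the width up.
void invertScanline(std::uint32_t* row, std::size_t width) {
    assert(row != nullptr || width == 0);

    const std::size_t blocked = width & ~static_cast<std::size_t>(3);
    std::size_t x = 0;

    for (; x < blocked; x += 4) {
        row[x + 0] = invertPixel(row[x + 0]);
        row[x + 1] = invertPixel(row[x + 1]);
        row[x + 2] = invertPixel(row[x + 2]);
        row[x + 3] = invertPixel(row[x + 3]);
    }

    // Tail fixup: never touches memory past row[width - 1].
    for (; x < width; ++x)
        row[x] = invertPixel(row[x]);
}
```

For a 319 pixel scanline, the block loop covers pixels 0 to 315 and the tail loop covers the remaining three, so the pixel at index 319 is never read or written.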

There are a couple of exceptions. The first is that if you are running on a host that supports API V14 or higher, you can request aligned scanlines to simplify the situation (see CPU dependent optimization). The second is that you can bend the rules slightly if you are doing an advanced trick for optimizing unaligned accesses.

The trick is that you can safely read an unaligned word by reading the two aligned words that contain it. In particular, you may read all bytes in any 16 byte region aligned to a 16 byte boundary that overlaps the scanline, or write all of those bytes as long as you do not actually change the values of bytes not belonging to the scanline. In other words, if a scanline spans 0136F004 to 0136F503, you can issue an aligned 16-byte read at 0136F000 without risking an access violation. Optimized algorithms that require aligned scanlines can often be used on unaligned scanlines simply by applying appropriate masks on the edges.
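To make the edge masking concrete, here is a small sketch that only computes which aligned 16-byte blocks cover a scanline and how many bytes must be masked off at each end. AlignedSpan and coverScanline() are hypothetical names, and the actual masked loads and stores are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Describes the aligned 16-byte blocks that cover an unaligned scanline.
struct AlignedSpan {
    std::uintptr_t first;   // address of the first aligned block
    std::size_t blocks;     // number of 16-byte blocks covering the scanline
    std::size_t headSkip;   // bytes in the first block before the scanline
    std::size_t tailSkip;   // bytes in the last block after the scanline
};

// `start` is the address of the first scanline byte, `byteLen` its length.
AlignedSpan coverScanline(std::uintptr_t start, std::size_t byteLen) {
    assert(byteLen > 0);

    const std::uintptr_t mask = 15;
    const std::uintptr_t first = start & ~mask;        // round start down
    const std::uintptr_t end = start + byteLen;        // one past the last byte
    const std::uintptr_t last = (end + mask) & ~mask;  // round end up

    AlignedSpan span;
    span.first = first;
    span.blocks = static_cast<std::size_t>((last - first) / 16);
    span.headSkip = static_cast<std::size_t>(start - first);
    span.tailSkip = static_cast<std::size_t>(last - end);
    return span;
}
```

For the scanline in the text (0136F004 to 0136F503, which is 1280 bytes), this gives 81 blocks starting at 0136F000, with 4 bytes to mask at the head and 12 bytes at the tail. Only the first and last blocks need masking; the blocks in between lie entirely inside the scanline.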

V14+ only: If FILTERPARAM_ALIGN_SCANLINES is set in paramProc(), you can safely write an integral number of 16 byte xmmwords even if this runs beyond the end of the scanline. For instance, if you are writing a 319 pixel wide, 32-bit scanline, you would normally only be able to write 319 * 4 = 1276 bytes, but instead you can write 80*16 = 1280 bytes. The contents of the bytes beyond the end of the scanline are ignored.

Pointer arithmetic (a.k.a. how not to crash on an upside-down image)

When working with the pitch of a bitmap/pixmap, you should always treat it as signed and use the standard C/C++ ptrdiff_t type. In particular, pixmaps can and often do have negative pitches, and using an unsigned type will cause crashes. In most cases you can get away with using a regular signed int, but it's simpler and actually faster just to use the correct type instead.
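A minimal sketch of row addressing with a signed pitch, assuming 32-bit pixels. fillImage() and its parameters are hypothetical and not part of the SDK; the point is only that the row offset is computed in ptrdiff_t, so a negative pitch walks backwards through memory correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fills a 32-bit image with one colour. `data` points at the first row in
// display order; `pitch` is the byte distance between consecutive rows and
// may be negative for an upside-down (bottom-up) image.
void fillImage(void* data, std::ptrdiff_t pitch, int w, int h, std::uint32_t color) {
    assert(w >= 0 && h >= 0);

    char* base = static_cast<char*>(data);

    for (int y = 0; y < h; ++y) {
        // Signed arithmetic: with an unsigned type, a negative pitch would
        // turn into a huge positive offset and land far outside the buffer.
        char* row = base + static_cast<std::ptrdiff_t>(y) * pitch;
        std::uint32_t* px = reinterpret_cast<std::uint32_t*>(row);

        for (int x = 0; x < w; ++x)
            px[x] = color;
    }
}
```

For a bottom-up 4x3 image stored in a 12-pixel buffer, the caller would pass a pointer to the last row in memory (buffer + 8 pixels) and a pitch of -16.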

Alpha channel

The upper byte of each 32-bit pixel, the alpha channel, is unused. Its value is completely arbitrary on entry to the video filter and ignored on output. If you are porting image processing code from other sources, make sure it does not rely on the alpha byte being set in any particular manner.
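One place where ported code tends to trip over this is pixel comparison. A small sketch, with hypothetical names, of comparing two pixels while ignoring the arbitrary upper byte:

```cpp
#include <cstdint>

// Mask selecting the three colour bytes of a 32-bit pixel; the upper byte is
// the unused alpha channel.
constexpr std::uint32_t kColorMask = 0x00FFFFFFu;

// Returns true if two pixels have the same colour, regardless of whatever
// happens to be in their alpha bytes.
constexpr bool sameColor(std::uint32_t a, std::uint32_t b) {
    return ((a ^ b) & kColorMask) == 0;
}
```

A plain a == b comparison would treat 0xFF112233 and 0x00112233 as different pixels even though they represent the same colour to the host.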

Runtime libraries

Make sure your filter doesn't depend on any runtime libraries that you don't ship with the filter binary. With Microsoft Visual C++, make sure you either statically link your filter to the C runtime library (CRT) or distribute the CRT DLL. This is particularly important for versions of VC++ beyond 6.0, which no longer use MSVCRT.DLL when the DLL version of the CRT is enabled. In VC++, the CRT linking mode is controlled in the Code Generation page of the compiler options in the project's settings. Statically linking filter DLLs to the CRT is the safest and most convenient model for distribution, and is recommended unless you are experienced with CRT distribution issues.

VirtualDub specific: Older versions of VirtualDub could run up against the operating system's limit for thread local storage (TLS) slots, of which one was required for each instance of the CRT loaded. One slot is consumed for each filter loaded when filters are statically linked to the CRT, and other DLLs that needed to load, like video and audio codecs, also consumed TLS slots. Because there are only 64 TLS slots available per process in Windows 95/NT4 and 80 in Windows 98, it was possible to have enough filters and codecs installed that DLLs would fail to load. This is mostly a non-issue now because the TLS slot limit was raised to 1088 in Windows 2000 and VirtualDub started dynamically loading and unloading filter DLLs starting with 1.5.0.

Don't open UI outside of configProc

You aren't guaranteed that the thread that startProc or runProc is called on is a UI thread. COM may not be initialized — particularly important for the shell APIs — and the thread may not even have a message pump. In fact, you aren't even guaranteed that two consecutive calls to runProc occur on the same thread, only that two threads won't run that function at the same time. Therefore, one thing that you should never do is attempt to create a window or dialog from those functions. Attempting to do so will likely destabilize the host process. UI should be opened only in configProc.

If you really must create UI in normally non-interactive entry points, the only safe way to do so is to launch a separate thread and perform all UI operations there. When doing so, you must make sure that this UI never attempts to block on a host thread, such as by calling SendMessage() on a host window, as that may cause a deadlock. If this UI can persist longer than filter instances, you must also hold a reference on the filter DLL so that it isn't unloaded by the OS when the host attempts to unload it.

Leave the floating point settings alone

The precision mode bits in the x87 floating point control word (FPUCW) belong to the application. So do the exception mask bits. And the SSE flush-to-zero (FTZ) and denormals-are-zero (DAZ) bits. Do you see a pattern here? Leave 'em alone and don't attempt to flip them in your filter unless you change them back before returning to the host. In fact, changing some of these bits and calling external code is also a violation of the Win32 calling interface. Don't do it, or at least, do it on a thread that you control.
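If a filter really does need different settings internally, the "change them back before returning" rule can be sketched with a scope guard. This is a portable illustration built on the standard <cfenv> functions, not something the SDK provides; FpEnvGuard and roundingModeInsideWork() are hypothetical names. Which control bits fenv_t captures is implementation-defined, so check that it covers the precision, exception mask, FTZ, and DAZ bits on your toolchain. With Microsoft Visual C++, the runtime's _controlfp_s() is the native way to read and set the control word.

```cpp
#include <cassert>
#include <cfenv>

// Scope guard: captures the floating-point environment on construction and
// restores it on destruction, so the host gets its settings back even on an
// early return.
class FpEnvGuard {
public:
    FpEnvGuard() {
        const int rc = std::fegetenv(&saved_);
        assert(rc == 0);
        (void)rc;
    }
    ~FpEnvGuard() { std::fesetenv(&saved_); }

    FpEnvGuard(const FpEnvGuard&) = delete;
    FpEnvGuard& operator=(const FpEnvGuard&) = delete;

private:
    std::fenv_t saved_;
};

// Hypothetical filter routine that wants a non-default rounding mode while it
// works. Returns the rounding mode that was active inside the guarded scope.
int roundingModeInsideWork() {
    FpEnvGuard guard;              // saves the host's settings
    std::fesetround(FE_UPWARD);    // temporary change, local to this call
    return std::fegetround();      // guard restores the settings on return
}
```

The guard must not stay active across calls into external code, since the changed settings would then leak into that code.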

The most common way that this rule is violated is via the compiler runtime. Microsoft Visual C++ shouldn't cause a problem here, even with /arch:SSE or /arch:SSE2, but older Borland C/C++ and Delphi runtimes attempted to change FPU exception and precision settings on initialization, and some versions of the Intel C/C++ runtime may also attempt to do so depending on compile settings. Make sure these mis-features are disabled in your filter's compilation settings.

The other way to violate this rule by accident is to initialize Direct3D, which by default changes the x87 FPU precision to single precision (32-bit floating point) for the current thread. You can avoid this by initializing Direct3D in a worker thread, which you will probably need to do anyway because of the threading issues noted above, even with D3DCREATE_MULTITHREADED. If you are initializing and shutting down Direct3D within a single call, another way to avoid this problem is to set D3DCREATE_FPU_PRESERVE. Typically the vertex processing load is so low in a video filter that this shouldn't introduce any noticeable performance penalty.

VirtualDub specific: VirtualDub aggressively checks for and corrects any detected violations in the above rules. In many cases, if your filter leaves FPU settings in an incorrect state, it will force them back to the correct values. For certain egregious violations, most notably leaving MMX active, it will also display a warning to the user that the filter is broken.

Secondary source frames

Source frames beyond the first only have a subset of valid fields. None of the base bitmap fields are valid on secondary frames; only the pixmap and pixmap layout can be used to access the image data. The frame number, frame timestamps, and cookie are valid.