Maintaining OOVPAs for Symbol detection - Cxbx-Reloaded/XbSymbolDatabase GitHub Wiki

Go to TL;DR section

Introduction

XbSymbolDatabase scans the contents of an XBE using so-called OOVPAs.

OOVPA stands for "Optimized (Offset, Value)-Pair Array".

It's a data structure that was thought up by Aaron Robinson (also known as Caustik), the initiator of Cxbx, back in 2003.

Its initial description can be read at http://www.caustik.com/cxbx/download/progress.htm, which says:

In order to efficiently locate a given chunk of assembly code (i.e. a High Level Function), a database of (offset,value) pairs can be used.

Offset represents the offset (in bytes) from the start of the function.

Value represents the byte value at that location.

With this datatype, we can locate the function by hand, and then write down important (offset,value) pairs.

This process is time consuming, but very rewarding.

Cxbx is able to successfully (and with no false identifications to date) identify High Level Functions inside an arbitrary XBE file.

This is due to the fact that, statistically, carefully chosen (offset,value) pairs are capable of uniquely identifying relocatable code.

The likelihood of falsely locating a function body is inversely proportional to the number of pairs combined with the rarity of those pairs.

Each OOVPA describes one unique function which originated from a specific version of a library.

XbSymbolDatabase uses an OOVPA to scan for the location of that function in an XBE.

OOVPAs are registered in a library's OOVPATable.

In its current state, XbSymbolDatabase contains one OOVPATable per library.

Scanning

Scanning for functions using OOVPAs goes roughly like this:

  • XbSymbolDatabase walks through a list of OOVPAs, and for each of these, the address range is determined and scanned through.
  • For each location in the address range, all byte offsets mentioned in the OOVPA are read from the executable and checked against the value that should be there according to the OOVPA.
  • If all checks are valid, the location is considered a match for the OOVPA and scanning continues with the next OOVPA.
  • If one or more values mismatch, it's a miss, and scanning continues through the rest of the address range.
  • If the entire range is checked without finding a match, the OOVPA (or rather, the function it describes) is considered not present in the executable.
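The steps above can be sketched in C as follows. This is a minimal illustration, not XbSymbolDatabase's actual implementation: the `OffsetValuePair` and `SimpleOovpa` types and the `oovpa_matches_at` and `scan_range` functions are hypothetical names invented for this example, and XRef handling is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; the real library defines its own. */
typedef struct {
    uint16_t offset; /* byte offset from the start of the function */
    uint8_t  value;  /* byte value expected at that offset */
} OffsetValuePair;

typedef struct {
    const OffsetValuePair* pairs;
    size_t count;
} SimpleOovpa;

/* Returns 1 if every (offset, value) pair matches at `location`, else 0.
   `available` is the number of readable bytes starting at `location`. */
static int oovpa_matches_at(const SimpleOovpa* oovpa,
                            const uint8_t* location,
                            size_t available)
{
    for (size_t i = 0; i < oovpa->count; i++) {
        const OffsetValuePair* pair = &oovpa->pairs[i];
        if (pair->offset >= available) {
            return 0; /* pair would read past the end of the range */
        }
        if (location[pair->offset] != pair->value) {
            return 0; /* one mismatch is enough for a miss */
        }
    }
    return 1;
}

/* Scans `size` bytes starting at `base`. Returns the position of the first
   match relative to `base`, or -1 if the OOVPA is not present in the range. */
static long scan_range(const SimpleOovpa* oovpa,
                       const uint8_t* base,
                       size_t size)
{
    for (size_t pos = 0; pos < size; pos++) {
        if (oovpa_matches_at(oovpa, base + pos, size - pos)) {
            return (long)pos;
        }
    }
    return -1;
}
```

A caller would run `scan_range` once per OOVPA in the list and move on to the next OOVPA as soon as a match is returned.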

XRefs

Apart from OOVPAs, XbSymbolDatabase contains a set of so-called XRef numbers (short for cross-reference numbers), which are unique IDs that identify a function.

Some OOVPAs are defined with one of these XRef numbers attached.

When there's a match found for an OOVPA that has an XRef, the matching location is written to a list, indexed by the XRef number.

Once there's a location recorded for an XRef, it's final, meaning that XbSymbolDatabase will skip a scan with any other OOVPA mentioning that same XRef number.
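The "record once, then final" behavior described above could look roughly like this. It is a sketch under assumptions: the `XREF_EXAMPLE_*` IDs, the `xref_table` array, the `xref_is_located` and `xref_record` helpers, and the use of 0 as the "not yet located" marker are all hypothetical and not taken from the library.

```c
#include <stdint.h>

/* Hypothetical XRef IDs; the real database defines its own set. */
enum { XREF_EXAMPLE_A, XREF_EXAMPLE_B, XREF_EXAMPLE_COUNT };

/* Assumption for this sketch: address 0 means "not located yet". */
#define XREF_UNSET 0u

/* One slot per XRef number; static storage starts out all zero (unset). */
static uint32_t xref_table[XREF_EXAMPLE_COUNT];

/* Returns 1 if a location has already been recorded for this XRef. */
static int xref_is_located(int xref)
{
    return xref_table[xref] != XREF_UNSET;
}

/* Records `address` for `xref` unless a location is already present.
   Returns 1 if the address was stored, 0 if the existing entry was kept. */
static int xref_record(int xref, uint32_t address)
{
    if (xref_is_located(xref)) {
        return 0; /* the first recorded location is final */
    }
    xref_table[xref] = address;
    return 1;
}
```

A scanner built this way would call `xref_is_located` before scanning with an OOVPA and skip the scan when it returns 1.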

Some OOVPAs contain an XRef that must be checked for, together with the (Offset, Value)-pairs.

This check requires the mentioned XRef to be previously recorded.

If that XRef isn't set yet during scanning, the OOVPA is skipped and retried in a later pass.

(Scanning is done in passes, repeating until no more XRefs are located.)

If this XRef IS set however, the code must reference this location to be valid.

If not, the OOVPA search continues looking through the executable.

Checking for an XRef means comparing the recorded location to the 4 bytes that are present at the mentioned offset.

The 4 bytes are interpreted both as a direct (absolute) reference and as an indirect (address-relative) reference; if either interpretation yields the recorded location, the XRef check holds, and all other (Offset, Value) pairs are checked.

If the XRef check fails, the rest of the OOVPA is not checked: the location is deemed a miss, and scanning continues with the next address.
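The XRef check can be sketched as below. The names (`check_xref`, `XRefCheckResult`, `XREF_UNSET`) are hypothetical, and two details are assumptions rather than facts from the library: that an unset XRef is marked with 0, and that the address-relative form follows the x86 `rel32` convention, where the displacement is relative to the address just past the 4-byte field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
typedef enum {
    XREF_CHECK_PENDING, /* XRef not located yet: retry in a later pass */
    XREF_CHECK_MISS,    /* the 4 bytes do not reference the recorded location */
    XREF_CHECK_MATCH    /* reference holds: go on to the (offset, value) pairs */
} XRefCheckResult;

/* Assumption for this sketch: address 0 means "not located yet". */
#define XREF_UNSET 0u

/* `image` holds `image_size` bytes that are mapped at virtual address
   `image_base`. `func_addr` is the candidate function address and `offset`
   is where the 4-byte reference is expected inside that function. */
static XRefCheckResult check_xref(const uint8_t* image, size_t image_size,
                                  uint32_t image_base, uint32_t func_addr,
                                  uint32_t offset, uint32_t recorded_addr)
{
    if (recorded_addr == XREF_UNSET) {
        return XREF_CHECK_PENDING;
    }

    uint32_t field_addr = func_addr + offset;
    if (image_size < 4 || field_addr < image_base ||
        (size_t)(field_addr - image_base) > image_size - 4) {
        return XREF_CHECK_MISS; /* the field lies outside the image */
    }

    /* Decode the 4 bytes as a little-endian value, as x86 stores them. */
    const uint8_t* p = image + (field_addr - image_base);
    uint32_t stored = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                      ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

    /* Direct (absolute) reference. */
    if (stored == recorded_addr) {
        return XREF_CHECK_MATCH;
    }
    /* Address-relative reference; unsigned wraparound covers negative
       displacements. */
    if (field_addr + 4u + stored == recorded_addr) {
        return XREF_CHECK_MATCH;
    }
    return XREF_CHECK_MISS;
}
```

Returning a separate "pending" result keeps the retry-in-a-later-pass case distinct from a real miss, so the caller can decide whether another pass is worthwhile.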

Maintaining OOVPAs

Each OOVPA must be distinct from all other OOVPAs. Otherwise, the same location could be matched to more than one function, which would lead to incorrectly placed patches and, in turn, to unpredictable behavior (mostly crashes).

An OOVPA is formed by choosing a few offsets in the machine code of a function and writing down their byte values, in such a way that no other function has the same bytes at those offsets.

The function an OOVPA scans for can differ between library versions. To get reliable emulation, XbSymbolDatabase needs to contain unique OOVPA definitions that together match all existing versions of a function.

Sometimes, after a function changed in one version, it changes once more in another, later version.

In some rare cases, a function might even re-appear in a prior form!

In this case, the OOVPA for that re-appearance must not be copied over from an earlier version, but instead an alias must be registered.

(Aliases are simply #define function_new_version function_old_version)

Too Long, Didn't Read

To keep things simple, there are two ways to define an OOVPA:

  • OOVPA_SIG_HEADER_NO_XREF: use this when the signature does not require a reference to another OOVPA. However, this approach carries a risk of false detection. It is recommended to match at least 10 to 12 unique offset values. You can use the following template:
OOVPA_SIG_HEADER_NO_XREF(/*Name of a function or address*/,
                         /*XDK version*/)
OOVPA_SIG_MATCH(
    // OV_MATCH( Offset, opcode value, ... ),
    // commented line here is a requirement, see https://github.com/Cxbx-Reloaded/XbSymbolDatabase/issues/146
);

For example:

OOVPA_SIG_HEADER_NO_XREF(DirectSoundCreate,
                         3936)
OOVPA_SIG_MATCH(
    // add eax, 8
    OV_MATCH(0x23, 0x83, 0xC0, 0x08),

    // push 0x1C
    OV_MATCH(0x34, 0x6A, 0x1C),

    // sbb eax, eax
    OV_MATCH(0x75, 0x1B, 0xC0),

    // retn 0x0C
    OV_MATCH(0x9B, 0xC2, 0x0C, 0x00),
    //
);
  • OOVPA_SIG_HEADER_XREF: this method greatly reduces false detections. It can also cover earlier and later XDK builds with a single signature, unless the function changed between those builds. You can use the following template:
OOVPA_SIG_HEADER_XREF(/*Name of a function or address*/,
                      /*XDK version*/,
                      /*Number of XREF_ENTRY lines used. All XREF_ENTRY lines must come first,
                        before any OV_MATCH line; placing them elsewhere breaks the scan.*/)
OOVPA_SIG_MATCH(
    // XREF_ENTRY( Offset, XRefDataBaseOffset value ),
    //...

    // OV_MATCH( Offset, opcode value, ... ),
    //...
    // commented line here is a requirement, see https://github.com/Cxbx-Reloaded/XbSymbolDatabase/issues/146
);

For example:

OOVPA_SIG_HEADER_XREF(CDirectSoundBuffer_GetStatus,
                      3936,
                      XRefOne)
OOVPA_SIG_MATCH(
    // call [CMcpxBuffer::GetStatus]
    XREF_ENTRY(0x15, XREF_CMcpxBuffer_GetStatus),

    // push [esp+0x10]
    OV_MATCH(0x07, 0xFF, 0x74, 0x24, 0x10),
    // mov ecx, [eax+0x20]
    OV_MATCH(0x11, 0x8B, 0x48, 0x20),

    // retn 0x08
    OV_MATCH(0x2E, 0xC2, 0x08, 0x00),
    //
);