Advanced Reloc - ethteck/splat GitHub Wiki

Advanced: Manual relocation handling

Sometimes the disassembler is unable to determine the correct symbol referenced by a given function or data symbol. In cases like this, you can use a reloc_addrs.txt file to tell the disassembler which symbol should be used at each specific location and how it should be referenced.

Assembly functions and data symbols reference other symbols through relocations (relocs for short). This section explains how to override the automatic relocations generated by the disassembler.

By default, splat reads the reloc_addrs.txt file from the root of the project, if it exists. It is also possible to rename the file, move it to a different folder, or even provide multiple files for organizational purposes. Please refer to the reloc_addrs_path option for more information.

Syntax

To override relocs you need at least one reloc_addrs.txt file. Each line of this file corresponds to one reloc override entry. Empty lines are allowed, and comments start with //.

The format for defining an entry is:

rom:0x04B440 reloc:MIPS_HI16 symbol:BonusWait addend:-0x3

Each attribute is defined as follows:

  • rom: The rom address where the instruction or data reference you want to affect is located.
  • symbol: The symbol that you want to reference at this given place.
  • addend: Optional. A displacement into the symbol. It can be either positive or negative.
  • reloc: The relocation kind to be used. It mirrors the standard MIPS relocations. The following values are valid for this attribute:
    • Function relocs:
      • MIPS_HI16: Corresponds to the %hi reloc operator.
      • MIPS_LO16: Corresponds to the %lo reloc operator.
      • MIPS_GPREL16: Corresponds to the %gp_rel reloc operator.
      • MIPS_GOT16: Corresponds to the %got reloc operator.
      • MIPS_CALL16: Corresponds to the %call16 reloc operator.
      • MIPS_GOT_HI16: Corresponds to the %got_hi reloc operator.
      • MIPS_GOT_LO16: Corresponds to the %got_lo reloc operator.
      • MIPS_CALL_HI16: Corresponds to the %call_hi reloc operator.
      • MIPS_CALL_LO16: Corresponds to the %call_lo reloc operator.
      • MIPS_26: No direct operator. Used in jal (jump and link) and j (jump) instructions.
      • MIPS_PC16: No direct operator. Used in branch instructions.
    • Data relocs:
      • MIPS_32: Corresponds to .word.
      • MIPS_GPREL32: Corresponds to .gpword.
    • No reloc:
      • MIPS_NONE: No reloc is used at all, so the disassembler emits the raw value instead. Useful for false positives from the symbol detector.
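
The entry format above can be illustrated with a small parser. This is only a sketch for checking your own files, not splat's actual implementation: the names RelocEntry and parse_reloc_line are hypothetical, and treating symbol as optional (for example for MIPS_NONE entries) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Reloc kinds listed above; anything else is rejected by this sketch.
VALID_RELOCS = {
    "MIPS_HI16", "MIPS_LO16", "MIPS_GPREL16", "MIPS_GOT16", "MIPS_CALL16",
    "MIPS_GOT_HI16", "MIPS_GOT_LO16", "MIPS_CALL_HI16", "MIPS_CALL_LO16",
    "MIPS_26", "MIPS_PC16", "MIPS_32", "MIPS_GPREL32", "MIPS_NONE",
}


@dataclass
class RelocEntry:  # hypothetical container, not a splat class
    rom: int
    reloc: str
    symbol: Optional[str] = None  # assumption: may be absent
    addend: int = 0


def parse_reloc_line(line: str) -> Optional[RelocEntry]:
    """Parse one line; return None for blank or comment-only lines."""
    line = line.split("//", 1)[0].strip()  # drop a trailing // comment
    if not line:
        return None
    attrs = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {token!r}")
        attrs[key] = value
    reloc = attrs["reloc"]
    if reloc not in VALID_RELOCS:
        raise ValueError(f"unknown reloc kind {reloc!r}")
    return RelocEntry(
        rom=int(attrs["rom"], 0),  # base 0 accepts the 0x prefix
        reloc=reloc,
        symbol=attrs.get("symbol"),
        addend=int(attrs.get("addend", "0"), 0),  # also accepts -0x3
    )


entry = parse_reloc_line("rom:0x04B440 reloc:MIPS_HI16 symbol:BonusWait addend:-0x3")
print(entry)
```

Running a check like this over a reloc_addrs.txt file before invoking the disassembler can catch misspelled reloc kinds early.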

Examples

The following are common patterns where providing manual relocs is useful.

Negative offsets

A common pattern in many C functions is to access a global array with a subtraction in the index, which makes the compiler emit an assembly access that points to a different address than the one you would expect.

Take for example the following C code.

int some_sym = 0;
int some_array[3] = {0};

int get_value(int index) {
    return some_array[index - 1];
}

With optimizations enabled, some compilers fold the - 1 into the address of the array, so the generated assembly never performs the subtraction explicitly, as seen in the following example assembly:

sll     $t6, $a0, 0x2
lui     $v0, %hi(some_array - 0x4)
addu    $v0, $v0, $t6
jr      $ra
 lw      $v0, %lo(some_array - 0x4)($v0)

This then gets built and linked into a final binary. That binary won't have those explicit relocations, only raw addresses, meaning that the code will technically reference the symbol "behind" the array we actually want to use. In this case we end up referring to some_sym instead.

A direct disassembly of this code would look similar to the following assembly. Note there's no mention of some_array anywhere.

/* 0000 80000000 00047080 */ sll     $t6, $a0, 0x2
/* 0004 80000004 3C028000 */ lui     $v0, %hi(some_sym)
/* 0008 80000008 004E1021 */ addu    $v0, $v0, $t6
/* 000C 8000000C 03E00008 */ jr      $ra
/* 0010 80000010 8C420020 */  lw      $v0, %lo(some_sym)($v0)

While this assembly is very likely to build to a matching binary again, it may cause issues under a few circumstances, like when a project aims to achieve proper shiftability while it hasn't been completely matched.

To fix this disassembly, provide reloc entries in a reloc_addrs.txt file like the following:

rom:0x0004 reloc:MIPS_HI16 symbol:some_array addend:-0x4
rom:0x0010 reloc:MIPS_LO16 symbol:some_array addend:-0x4

This tells the disassembler to reference some_array - 0x4 at the instruction at rom address 0x0004 (the lui instruction) using the %hi reloc operator. The same logic applies to the instruction at rom address 0x0010 (the lw), except that it uses the %lo reloc operator. This generates assembly like the following:

/* 0000 80000000 00047080 */ sll     $t6, $a0, 0x2
/* 0004 80000004 3C028000 */ lui     $v0, %hi(some_array - 0x4)
/* 0008 80000008 004E1021 */ addu    $v0, $v0, $t6
/* 000C 8000000C 03E00008 */ jr      $ra
/* 0010 80000010 8C420020 */  lw      $v0, %lo(some_array - 0x4)($v0)
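
To see why both spellings assemble to the same bytes, the %hi and %lo halves can be computed by hand. The sketch below assumes some_sym lives at 0x80000020 and some_array at 0x80000024; these addresses are not stated in the text and were chosen only because they agree with the instruction words shown above (3C028000 and 8C420020). The helper names hi16 and lo16 are hypothetical.

```python
def hi16(address: int) -> int:
    """Upper half used by %hi; adds 0x8000 to compensate for the signed %lo."""
    return ((address + 0x8000) >> 16) & 0xFFFF


def lo16(address: int) -> int:
    """Lower half used by %lo, as the raw 16-bit immediate."""
    return address & 0xFFFF


# Assumed addresses, chosen to match the instruction words shown above.
SOME_SYM = 0x80000020
SOME_ARRAY = 0x80000024  # some_sym is a 4-byte int placed right before it

target = SOME_ARRAY - 0x4  # what the compiler actually referenced
print(hex(hi16(target)), hex(lo16(target)))      # 0x8000 0x20
print(hex(hi16(SOME_SYM)), hex(lo16(SOME_SYM)))  # 0x8000 0x20
```

Both expressions resolve to the same immediates, which is why the disassembler cannot tell them apart without a manual reloc entry.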

Segment symbols in code

A common pattern in N64 projects is code that references special symbols known as "segment symbols" to load fragments or segments of the ROM into VRAM. N64 games do this because it isn't possible to load the whole ROM into VRAM at once, and because they lack a proper filesystem.

Segment symbols exist to describe addresses or sizes of specific regions of the ROM, like the start and end of the ROM addresses of a given segment, the expected start and end of the VRAM addresses of that segment, the size of the segment, etc.

Sadly, the disassembler is unable to properly disambiguate these segment addresses from other symbols or even plain numbers, so the disassembly of code referencing segment symbols tends to be far from optimal.

Take for example the following C code:

/* segment symbols */
extern u32 segment_menu_ROM_START[];
extern u32 segment_menu_ROM_END[];

void load_menu_segment(void *dst) {
    load_segment(segment_menu_ROM_START, (u32)segment_menu_ROM_END - (u32)segment_menu_ROM_START, dst);
}

After compiling this code, the generated assembly would look like the following:

addiu    $sp, $sp, -0x18
sw       $ra, 0x10($sp)
addu     $a2, $a0, $zero
lui      $a0, %hi(segment_menu_ROM_START)
addiu    $a0, $a0, %lo(segment_menu_ROM_START)
lui      $a1, %hi(segment_menu_ROM_END)
addiu    $a1, $a1, %lo(segment_menu_ROM_END)
jal      load_segment
 subu    $a1, $a1, $a0
lw       $ra, 0x10($sp)
addiu    $sp, $sp, 0x18
jr       $ra
 nop

But when this gets linked into a ROM, those symbols get replaced with their raw numeric values. Since they point to rom or vram addresses at the boundaries of each segment, the disassembler struggles to symbolize them and falls back to generic symbols, like the following:

/* 0000 80000000 27BDFFE8 */ addiu    $sp, $sp, -0x18
/* 0004 80000004 AFBF0010 */ sw       $ra, 0x10($sp)
/* 0008 80000008 00803021 */ addu     $a2, $a0, $zero
/* 000C 8000000C 3C040000 */ lui      $a0, %hi(D_000FB480)
/* 0010 80000010 24840000 */ addiu    $a0, $a0, %lo(D_000FB480)
/* 0014 80000014 3C050000 */ lui      $a1, %hi(D_00101A80)
/* 0018 80000018 24A50000 */ addiu    $a1, $a1, %lo(D_00101A80)
/* 001C 8000001C 0C000000 */ jal      load_segment
/* 0020 80000020 00A42823 */  subu    $a1, $a1, $a0
/* 0024 80000024 8FBF0010 */ lw       $ra, 0x10($sp)
/* 0028 80000028 27BD0018 */ addiu    $sp, $sp, 0x18
/* 002C 8000002C 03E00008 */ jr       $ra
/* 0030 80000030 00000000 */  nop

To fix the disassembly and make it use the proper segment symbols, we add more entries to the reloc_addrs.txt file. Note here we don't need to specify an addend, since we just want to refer to the symbol without any other calculation.

rom:0x000C reloc:MIPS_HI16 symbol:segment_menu_ROM_START
rom:0x0010 reloc:MIPS_LO16 symbol:segment_menu_ROM_START
rom:0x0014 reloc:MIPS_HI16 symbol:segment_menu_ROM_END
rom:0x0018 reloc:MIPS_LO16 symbol:segment_menu_ROM_END
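
Since %hi/%lo references always come in pairs, writing these entries by hand gets repetitive. A minimal sketch of a generator follows; format_entry and hi_lo_pair are hypothetical helper names and are not part of splat.

```python
from typing import List


def format_entry(rom: int, reloc: str, symbol: str, addend: int = 0) -> str:
    """Format one reloc_addrs.txt line; the addend is omitted when zero."""
    parts = [f"rom:0x{rom:04X}", f"reloc:{reloc}", f"symbol:{symbol}"]
    if addend:
        sign = "-" if addend < 0 else ""
        parts.append(f"addend:{sign}0x{abs(addend):X}")
    return " ".join(parts)


def hi_lo_pair(hi_rom: int, lo_rom: int, symbol: str, addend: int = 0) -> List[str]:
    """Entries for a lui (%hi) instruction and its matching %lo instruction."""
    return [
        format_entry(hi_rom, "MIPS_HI16", symbol, addend),
        format_entry(lo_rom, "MIPS_LO16", symbol, addend),
    ]


lines = (
    hi_lo_pair(0x000C, 0x0010, "segment_menu_ROM_START")
    + hi_lo_pair(0x0014, 0x0018, "segment_menu_ROM_END")
)
print("\n".join(lines))
```

The printed lines match the four entries above and can be appended to a reloc_addrs.txt file as-is.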

Segment symbols in data

On the other hand, segment symbols may also be referenced in data structures like arrays or structs. See the previous section, "Segment symbols in code", for more about what segment symbols are.

Take for example the following C code and its corresponding compiled assembly:

extern u32 segment_menu_ROM_START[];
extern u32 segment_menu_ROM_END[];

u32 *menu_addresses[] = {
    segment_menu_ROM_START, segment_menu_ROM_END,
};

.word segment_menu_ROM_START
.word segment_menu_ROM_END

But when we try disassembling a rom with this data, the disassembler won't be able to recognize these as segment symbols:

/* 0100 80000100 000FB480 */ .word D_000FB480 # It may use a D_ symbol
/* 0104 80000104 00101A80 */ .word 0x00101A80 # Or it may even fail completely to symbolize it

In this case we can use the MIPS_32 reloc in our reloc_addrs.txt file to fix this kind of issue.

rom:0x0100 reloc:MIPS_32 symbol:segment_menu_ROM_START
rom:0x0104 reloc:MIPS_32 symbol:segment_menu_ROM_END
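
For larger tables, candidate entries can be found by scanning the raw data for words that equal a known segment address. The sketch below is only an illustration: it assumes segment_menu_ROM_START is 0x000FB480 and segment_menu_ROM_END is 0x00101A80 (the values suggested by the D_ symbols in the code example), and suggest_data_relocs is a hypothetical helper, not a splat feature.

```python
import struct
from typing import List

# Assumed segment symbol values, taken from the D_ names in the code example.
SEGMENT_SYMBOLS = {
    0x000FB480: "segment_menu_ROM_START",
    0x00101A80: "segment_menu_ROM_END",
}


def suggest_data_relocs(data: bytes, rom_start: int) -> List[str]:
    """Return MIPS_32 entries for each big-endian word matching a segment symbol."""
    entries = []
    for offset in range(0, len(data) - 3, 4):
        (word,) = struct.unpack_from(">I", data, offset)
        symbol = SEGMENT_SYMBOLS.get(word)
        if symbol is not None:
            rom = rom_start + offset
            entries.append(f"rom:0x{rom:04X} reloc:MIPS_32 symbol:{symbol}")
    return entries


# Two words as they would appear in the ROM at offset 0x0100.
table = bytes.fromhex("000FB480" "00101A80")
suggestions = suggest_data_relocs(table, 0x0100)
print("\n".join(suggestions))
```

Any word that merely happens to equal a segment address will also match, so the suggestions should be reviewed before they are added to the file.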