Hooks - morcSkyrim/SkyrimSE GitHub Wiki

Hooks

Still rather poorly written, proceed with caution

Required:

  • Basics of being able to read and write assembly
  • Decompiler of some form, IDA, Ghidra or Radare2, basically mandatory
  • Some knowledge of x64 calling conventions - not mandatory, but will make your life MUCH easier

One of the most useful tools for debugging and modifying Skyrim functions is the hook. A hook overwrites the instructions at a chosen memory location with a branch to your own code.

CommonLib provides a way to implement hooks through its trampoline class (I'm guessing it's called a trampoline because execution is going to jmp in here, then bounce back out again, but I'm not sure). The trampoline class provides us with dynamically allocated memory for our code, and it overwrites the code at the source location with a jmp to our trampoline.

Trampoline Initialization

Trampoline initialization is managed by SKSE. Exactly what SKSE does and why is still unknown to me, but I'll do my best to provide an educated guess.

SKSE::AllocTrampoline(TrampolineSize);

This function allocates memory for the trampoline and should be called during the SKSEPlugin_Load function. Potential guesses as to why this must be called in SKSEPlugin_Load are:

  • During SKSEPlugin_Load, we can generate instruction memory instead of data memory on the heap. Skyrim probably throws errors if we move the instruction pointer (RIP) into data memory instead of executable memory. (Need more reading on dlls and executables.)
  • Allocating the memory during the plugin load sequence keeps the trampoline close to the game's own instructions. This would reduce cache misses and, more importantly, keep the offset small enough for a short jmp: the 5-byte jmp written at the hook site only holds a signed 32-bit displacement, which is why write_SafeBranch6 below fails with "displacement is out of range" if the trampoline ends up too far away.
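To make the point about jmp offsets concrete, here is a minimal sketch of the range check a 5-byte jmp has to pass. fits_rel32 is a hypothetical helper written for this page, not something CommonLib provides, and the addresses in the usage note are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not a CommonLib function): returns true when a
// 5-byte `jmp rel32` placed at `src` is able to reach `dst`.
constexpr bool fits_rel32(std::uint64_t src, std::uint64_t dst)
{
    // The displacement is measured from the end of the 5-byte instruction.
    const auto disp = static_cast<std::int64_t>(dst - (src + 5));
    return disp >= std::numeric_limits<std::int32_t>::min() &&
           disp <= std::numeric_limits<std::int32_t>::max();
}
```

For example, fits_rel32(0x140889cb0, 0x140000000) is true, while a destination 4 GiB above the source is rejected, since the displacement no longer fits in 32 bits.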

The next step is getting the trampoline singleton,

auto& trampoline = SKSE::GetTrampoline();

which is the object which we will use to write our functions.

Analyzing the Target Function

Before we can write to our trampoline or the target function, we must first look at what is happening at the start of the target function. To do this, we need a disassembler (the tools listed above all include one). Then we need to work out where the instruction boundaries fall, which this page refers to as the byte alignment. How much this matters depends on what sort of functionality you would like to add with your hook.

If we assume that you would like to preserve the functionality of the function, then you must accurately assess the byte alignment of the instructions, so that you can return to the original function on an instruction boundary after you do something in your trampoline. As an example, consider the following function:

ulonglong __fastcall FUN_140889cb0(longlong param_1)

140889cb0        PUSH       RDI
140889cb2        SUB        RSP,0x30
140889cb6        MOV        qword ptr [RSP + local_18],-0x2

This function should be considered 6B aligned for our purposes. PUSH RDI occupies 2 bytes (0x140889cb0 to 0x140889cb1) and SUB RSP,0x30 occupies 4 bytes (0x140889cb2 to 0x140889cb5), so the first two instructions end exactly 6 bytes in. If we were to use a 5B alignment, the SUB instruction would be cut short, leaving its final byte (the 0x30 immediate) stranded. On returning to the function, execution would resume in the middle of an instruction, so all subsequent operations are potentially garbage, and a crash is likely. If we use 6B alignment, the return lands on the MOV instruction, and the function will operate correctly.
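The alignment choice above can be sketched as a small calculation: add up whole instruction lengths until at least 5 bytes (the size of the jmp) are covered. bytes_to_relocate is a hypothetical helper for illustration only. The lengths 2 and 4 come from the addresses in the listing; the length of the MOV is not shown in the listing, so the 9 used in the usage note is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given the lengths of the instructions at the start of
// the target function (read off the disassembly), return the smallest number
// of bytes that covers whole instructions and is at least `patch_size`.
// Returns 0 if the listed instructions do not cover `patch_size` bytes.
inline std::size_t bytes_to_relocate(const std::vector<std::size_t>& lengths,
                                     std::size_t patch_size)
{
    std::size_t total = 0;
    for (const auto len : lengths) {
        if (total >= patch_size) {
            break;  // already on an instruction boundary past the patch
        }
        total += len;
    }
    return total >= patch_size ? total : 0;
}
```

With the example function, bytes_to_relocate({2, 4, 9}, 5) gives 6, matching the 6B alignment chosen above. If the first instruction boundary past 5 bytes were further in, a fixed 6-byte copy would not be enough.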

Writing our Trampoline

Now that you've allocated memory for your trampoline, you must actually write code to it and overwrite the start of the source function with a jmp instruction to your trampoline.

You can write code to the trampoline memory however you want, but CommonLib provides some relatively automated methods of doing so in combination with Xbyak, a JIT assembler. To write our code we first need to allocate memory from the trampoline object (xbyakCode here stands for your own Xbyak code generator object holding the assembled code)

auto& trampoline = SKSE::GetTrampoline();
auto result = trampoline.allocate(xbyakCode.getSize());

next, we need to copy our code to the memory we've just allocated on the trampoline

std::memcpy(result, xbyakCode.getCode(), xbyakCode.getSize());

Finally, we need to decide what we would like to use to write our hook. CommonLib provides us several options, in the form of write_branch5, write_branch6, write_call5, and write_call6. I'd generally recommend against using the call options, as they aren't really necessary. I've also written my own write_SafeBranch6 and write_SafeBranch5 functions to partially automate the process of writing trampolines. The only thing left to do is specify the function address using the Relocation library,

constexpr REL::ID funcOffset(50928);
trampoline.write_SafeBranch6(funcOffset.address(), (std::uintptr_t)result);

where 50928 is the Address Library ID that resolves to our previous function, FUN_140889cb0.

write_SafeBranch6

This function handles the setup of the landing for the trampoline, as well as the writing to the source location for 6B alignment.

void write_SafeBranch6(std::uintptr_t a_src, std::uintptr_t a_dst)
{
#pragma pack(push, 1)
	struct SrcAssembly
	{
		// jmp rel32 (this function only ever writes the 0xE9 jmp form)
		std::uint8_t opcode;  // 0 - 0xE9
		std::int32_t disp;	  // 1 - displacement from a_src + 5 to a_dst
	};
	static_assert(offsetof(SrcAssembly, opcode) == 0x0);
	static_assert(offsetof(SrcAssembly, disp) == 0x1);
	static_assert(sizeof(SrcAssembly) == 0x5);

	struct TrampolineAssembly
	{
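		// 6 bytes copied from the source, then jmp rel32 back to a_src + 6
		std::uint32_t  srcOp1;	  // 0 - first 4 bytes copied from a_src
		std::uint16_t  srcOp2;  // 4 - next 2 bytes copied from a_src
		std::uint8_t jmp;  // 6 - 0xE9
		std::uint32_t disp;  // 7 - displacement back to a_src + 6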
	};
	static_assert(offsetof(TrampolineAssembly, srcOp1) == 0x0);
	static_assert(offsetof(TrampolineAssembly, srcOp2) == 0x4);
	static_assert(offsetof(TrampolineAssembly, jmp) == 0x6);
	static_assert(offsetof(TrampolineAssembly, disp) == 0x7);
	static_assert(sizeof(TrampolineAssembly) == 0xB);
#pragma pack(pop)

	TrampolineAssembly* mem = nullptr;
	if (const auto it = _6branches.find(a_dst); it != _6branches.end()) {
		mem = reinterpret_cast<TrampolineAssembly*>(it->second);
	}
	else {
		mem = allocate<TrampolineAssembly>();
		_6branches.emplace(a_dst, reinterpret_cast<std::byte*>(mem));
	}

	const auto disp = a_dst - (a_src + sizeof(SrcAssembly));
	if (!in_range(disp)) {	// the trampoline should already be in range, so this should never happen
		stl::report_and_fail("displacement is out of range"sv);
	}

	const auto disp_ret = a_src + 6 - (reinterpret_cast<std::uint64_t>(mem) + offsetof(TrampolineAssembly, disp) + 4);
	mem->srcOp1 = *reinterpret_cast<std::uint32_t*>(a_src);
	mem->srcOp2 = *reinterpret_cast<std::uint16_t*>(a_src + 4);
	mem->jmp = 0xE9;
	mem->disp = static_cast<std::uint32_t>(disp_ret);

	SrcAssembly assembly;
	assembly.opcode = 0xE9;
	assembly.disp = static_cast<std::int32_t>(disp);
	REL::safe_write(a_src, assembly);
}

To break down the function in parts,

struct SrcAssembly
{
  // jmp rel32 (this function only ever writes the 0xE9 jmp form)
  std::uint8_t opcode;  // 0 - 0xE9
  std::int32_t disp;	  // 1 - displacement from a_src + 5 to a_dst
};
static_assert(offsetof(SrcAssembly, opcode) == 0x0);
static_assert(offsetof(SrcAssembly, disp) == 0x1);
static_assert(sizeof(SrcAssembly) == 0x5);

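This struct defines the bytes we'll use to overwrite the source assembly. It consists of a jmp opcode (0xE9) followed by a signed 32-bit displacement, which is added to the instruction pointer as it stands at the end of the 5-byte jmp. Note that only 5 of the 6 bytes are overwritten; the sixth is left as-is and is never executed, because the trampoline returns to a_src + 6.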
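As a rough illustration of what ends up in those 5 bytes, the sketch below encodes the jmp into a plain byte array instead of game memory. encode_jmp_rel32 is a hypothetical helper and the addresses in the usage note are made up; the displacement formula is the same one write_SafeBranch6 uses, a_dst - (a_src + sizeof(SrcAssembly)).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: encode `jmp rel32` for a jump placed at `src`
// that should land on `dst`. Nothing is written to real code here.
inline std::array<std::uint8_t, 5> encode_jmp_rel32(std::uint64_t src, std::uint64_t dst)
{
    // Displacement is relative to the end of the 5-byte instruction.
    const auto disp64 = static_cast<std::int64_t>(dst - (src + 5));
    assert(disp64 >= INT32_MIN && disp64 <= INT32_MAX);

    const auto disp = static_cast<std::uint32_t>(static_cast<std::int32_t>(disp64));
    std::array<std::uint8_t, 5> bytes{};
    bytes[0] = 0xE9;  // jmp rel32 opcode
    for (std::size_t i = 0; i < 4; ++i) {
        // x64 stores the displacement least significant byte first
        bytes[1 + i] = static_cast<std::uint8_t>((disp >> (8 * i)) & 0xFF);
    }
    return bytes;
}
```

A jump from 0x1000 to 0x2000 encodes as E9 FB 0F 00 00, since 0x2000 - 0x1005 = 0xFFB. A backwards jump simply produces a negative displacement in two's complement.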

struct TrampolineAssembly
{
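  // 6 bytes copied from the source, then jmp rel32 back to a_src + 6
  std::uint32_t  srcOp1;	  // 0 - first 4 bytes copied from a_src
  std::uint16_t  srcOp2;  // 4 - next 2 bytes copied from a_src
  std::uint8_t jmp;  // 6 - 0xE9
  std::uint32_t disp;  // 7 - displacement back to a_src + 6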
};
static_assert(offsetof(TrampolineAssembly, srcOp1) == 0x0);
static_assert(offsetof(TrampolineAssembly, srcOp2) == 0x4);
static_assert(offsetof(TrampolineAssembly, jmp) == 0x6);
static_assert(offsetof(TrampolineAssembly, disp) == 0x7);
static_assert(sizeof(TrampolineAssembly) == 0xB);

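The TrampolineAssembly struct defines the bytes that terminate the trampoline. Here, we see 6B dedicated to the instructions we'll copy from the source assembly, followed by a jmp instruction with the associated displacement, which returns to the original function. This presumably relies on the trampoline handing out memory sequentially, so that this block sits directly after the code you copied in earlier and execution falls through into it.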

One question would be why we return with a jmp instead of entering with a call and leaving with a ret. The motivation here is that call pushes a return address onto the stack, which moves the stack pointer, while jmp leaves the stack untouched. If the instructions which we write over are stack related, such as the PUSH RDI and SUB RSP,0x30 in the example above, then running them with an extra return address on the stack makes the use of call and ret extremely complicated.

Next, the amount of memory required by the termination of the trampoline is allocated, and the displacement between the source assembly and the start of the trampoline is calculated.

TrampolineAssembly* mem = nullptr;
if (const auto it = _6branches.find(a_dst); it != _6branches.end()) {
  mem = reinterpret_cast<TrampolineAssembly*>(it->second);
}
else {
  mem = allocate<TrampolineAssembly>();
  _6branches.emplace(a_dst, reinterpret_cast<std::byte*>(mem));
}

const auto disp = a_dst - (a_src + sizeof(SrcAssembly));
if (!in_range(disp)) {	// the trampoline should already be in range, so this should never happen
  stl::report_and_fail("displacement is out of range"sv);
}

The final part of the function starts by calculating the displacement for the return jmp: from the end of the trampoline's jmp instruction (mem + 7 + 4) back to the source address plus the 6 byte offset. Then the 6 bytes at the source are copied into the trampoline, 4 bytes and then 2. Finally, we overwrite the source location with the branch to our code.

const auto disp_ret = a_src + 6 - (reinterpret_cast<std::uint64_t>(mem) + offsetof(TrampolineAssembly, disp) + 4);
mem->srcOp1 = *reinterpret_cast<std::uint32_t*>(a_src);
mem->srcOp2 = *reinterpret_cast<std::uint16_t*>(a_src + 4);
mem->jmp = 0xE9;
mem->disp = static_cast<std::uint32_t>(disp_ret);

SrcAssembly assembly;
assembly.opcode = 0xE9;
assembly.disp = static_cast<std::int32_t>(disp);
REL::safe_write(a_src, assembly);
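To check the return displacement by hand, the arithmetic can be modelled with plain integers. return_disp and jump_target below are hypothetical helpers, and the landing address in the usage note is made up; the constants 6, 7 and 11 are the offsets from TrampolineAssembly (6 copied bytes, displacement at offset 7, 11 bytes in total).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the return jmp stored in the 11-byte landing.
// `src` is the hooked address, `landing` the address of the landing block.
inline std::int32_t return_disp(std::uint64_t src, std::uint64_t landing)
{
    // Same formula as disp_ret: a_src + 6 - (mem + offsetof(disp) + 4).
    // Assumes the two addresses are within rel32 range of each other.
    return static_cast<std::int32_t>(
        static_cast<std::int64_t>((src + 6) - (landing + 7 + 4)));
}

// Where the CPU ends up: end of the jmp instruction plus the displacement.
inline std::uint64_t jump_target(std::uint64_t landing, std::int32_t disp)
{
    return landing + 11 + static_cast<std::int64_t>(disp);
}
```

For a landing at 0x140000100 and the example function at 0x140889cb0, the displacement is 0x889BAB and the jmp lands on 0x140889cb6, the MOV instruction. Note that write_SafeBranch6 as written only range-checks disp, not disp_ret, so this model makes the same assumption that the return jump is in range.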