Proof of Concept (PoC) - 9elements/LinuxBootSMM GitHub Wiki
This page describes the PoC implementation of the MM payload. The design itself is described on a dedicated page; here we focus less on the flow and more on the implementation.
As mentioned on the design page, the loader has to:
- place the handler in the lower 4 GB of memory,
- perform relocations and calculate the offsets of the entry points,
- fill out the header with the offsets, the signature and the handler's size,
- and fire an SMI with the address at which the loader placed the handler.
The handler is compiled as a separate binary and later included in an object that is visible to the loader:
```asm
SYM_DATA_START(mm_blob)
	.incbin "drivers/firmware/google/mm_handler/handler.bin"
SYM_DATA_END_LABEL(mm_blob, SYM_L_GLOBAL, mm_blob_end)

SYM_DATA_START(mm_relocs)
	.incbin "drivers/firmware/google/mm_handler/handler.relocs"
SYM_DATA_END(mm_relocs)
```
This object will then be copied to the so-called shared buffer, an arbitrarily chosen address in the lower 4 GB of memory:
```c
blob_size = mm_payload_size_needed();
shared_buffer = (void *)__get_free_pages(GFP_DMA32, get_order(blob_size));
memcpy(shared_buffer, mm_blob, blob_size);
wbinvd();
```
The `blob_size` is calculated in a helper function and equals `mm_blob_end - mm_blob`. At this point the handler should be in the lower 4 GB of memory, and we can perform the 16-bit and 32-bit relocations (based on arch/x86/realmode/init.c):
```c
phys_base = __pa(shared_buffer);
real_mode_seg = phys_base >> 4;
rel = (u32 *)mm_relocs;

/* 16-bit segment relocations. */
count = *rel++;
while (count--) {
	u16 *seg = (u16 *)(shared_buffer + *rel++);
	*seg = real_mode_seg;
}

/* 32-bit linear relocations. */
count = *rel++;
while (count--) {
	u32 *ptr = (u32 *)(shared_buffer + *rel++);
	*ptr += phys_base;
}
```
Now we can fill in the header fields and calculate the offsets:
```c
mm_header = (struct mm_header *)shared_buffer;
mm_header->mm_signature = REALMODE_END_SIGNATURE;
mm_header->mm_blob_size = mm_payload_size_needed();

/*
 * At this point relocations are done and we can do some cool
 * pointer arithmetic to help coreboot determine the correct
 * entry point based on offsets.
 */
entry32_offset = mm_header->mm_entry_32 - (unsigned long)shared_buffer;
entry64_offset = mm_header->mm_entry_64 - (unsigned long)shared_buffer;
mm_header->mm_entry_32 = entry32_offset;
mm_header->mm_entry_64 = entry64_offset;
```
With all of this prepared, the loader can send an SMI to let coreboot know where to look for the payload's SMI handler. This is done using an inline assembly function, which currently works only if the kernel was compiled for x86_64 (although it would be fairly easy to write a `trigger_smi32` which would use 32-bit registers instead of 64-bit ones):
```c
static int trigger_smi(u64 cmd, u64 arg, u64 retry)
{
	u64 status;
	u16 apmc_port = 0xb2;

	asm volatile("movq %[cmd], %%rax\n\t"
		     "movq %%rax, %%rcx\n\t"
		     "movq %[arg], %%rbx\n\t"
		     "movq %[retry], %%r8\n\t"
		     ".trigger:\n\t"
		     "mov %[apmc_port], %%dx\n\t"
		     "outb %%al, %%dx\n\t"
		     "cmpq %%rcx, %%rax\n\t"
		     "jne .return_changed\n\t"
		     "pushq %%rcx\n\t"
		     "movq $10000, %%rcx\n\t"
		     "rep nop\n\t"
		     "popq %%rcx\n\t"
		     "cmpq $0, %%r8\n\t"
		     "je .return_not_changed\n\t"
		     "decq %%r8\n\t"
		     "jmp .trigger\n\t"
		     ".return_changed:\n\t"
		     "movq %%rax, %[status]\n\t"
		     "jmp .end\n\t"
		     ".return_not_changed:\n\t"
		     "movq %%rcx, %[status]\n\t"
		     ".end:\n\t"
		     : [status] "=r"(status)
		     : [cmd] "r"(cmd), [arg] "r"(arg), [retry] "r"(retry),
		       [apmc_port] "r"(apmc_port)
		     : "%rax", "%rbx", "%rdx", "%rcx", "%r8");

	if (status == cmd || status == PAYLOAD_MM_RET_FAILURE)
		status = PAYLOAD_MM_RET_FAILURE;
	else
		status = PAYLOAD_MM_RET_SUCCESS;

	return status;
}
```
The boot-time interface is responsible for verifying that the driver placed the handler in the given memory region and for copying the handler to SMRAM. It looks for a header at the received address, verifies the shared signature, and checks whether the payload's handler fits in the SMRAM region:
```c
struct mm_header *mm_header;

mm_header = (void *)mm_address;
if (mm_header->mm_signature != REALMODE_END_SIGNATURE) {
	printk(BIOS_WARNING, "MM signature mismatch! Bootloader: %x, Payload: %x.\n",
	       REALMODE_END_SIGNATURE, mm_header->mm_signature);
	return PAYLOAD_MM_RET_FAILURE;
}

if (mm_header->mm_blob_size > payload_mm_region_size) {
	printk(BIOS_WARNING, "Payload wants to reserve more space than is available!\n");
	return PAYLOAD_MM_RET_FAILURE;
}
```
The header has the following structure:
```c
struct mm_header {
	u32 text_start;
	u32 mm_entry_32;
	u32 mm_entry_64;
	u32 mm_signature;
	u32 mm_blob_size;
};
```
If the header was found, the handler will be copied to SMRAM:
```c
memcpy((void *)payload_mm_region_base, (void *)mm_address, payload_mm_blob_size);
printk(BIOS_DEBUG, "Payload MM loaded core module to 0x%lx.\n", payload_mm_region_base);

mm_entrypoint_address = payload_mm_get_entrypoint(payload_mm_region_base,
						  payload_mm_region_size);
if (!mm_entrypoint_address)
	return PAYLOAD_MM_RET_FAILURE;

return PAYLOAD_MM_RET_SUCCESS;
```
`payload_mm_get_entrypoint()` relies on the information from the header; however, to make sure that the handler was indeed copied to SMRAM, it looks for the header at the base address of the dedicated SMRAM region:
```c
struct mm_header *mm_header = (void *)region_base;

if (mm_header->mm_signature != REALMODE_END_SIGNATURE) {
	printk(BIOS_WARNING, "MM signature mismatch! Bootloader: %x, Payload: %x.\n",
	       REALMODE_END_SIGNATURE, mm_header->mm_signature);
	return 0;
}

if (!ENV_X86_64)
	return region_base + mm_header->mm_entry_32;

return region_base + mm_header->mm_entry_64;
```
The entry point is the base address of the SMRAM payload region plus either the 32-bit or the 64-bit offset calculated by the loader, depending on the bitness of coreboot.
On a normal SMI event, i.e. after boot has finished, the role of coreboot is reduced to calling the payload's handler with the appropriate arguments. The passed arguments have the following structure:
```c
struct lb_entry_context {
	uint16_t command;
	uint32_t pm1_cnt;
	uint16_t acpi_base;
};
```
These are then passed with the call:

```c
void (*mm_entrypoint)(struct lb_entry_context) =
	(void *)(uintptr_t)mm_entrypoint_address;

mm_entrypoint(lb_entry_context);
```
For the time being, the Linux-owned payload is written entirely in x86 assembly; combining it with C code would introduce unnecessary complexity which is not needed for the currently implemented features (see the porting page). Depending on the coreboot bitness, different entry points are called (see above). In both cases, the handler preserves the current state of the registers on the stack:
```asm
push %esp
// ebx, esi, edi and ebp are going to be preserved (see comment in smm_stub.S if target is x86_64)
push %ebx
push %esi
push %edi
push %ebp
push %eax
```
In the case of 64-bit coreboot it becomes more fun, since we preserve the control registers as well (which essentially sets up the ground for a call into C code):
```asm
pushq %rsp
pushq %rbp
pushq %rbx
pushq %r12
pushq %r13
pushq %r14
pushq %r15

movq %cr3, %rax
pushq %rax
movq %cr4, %rbx
pushq %rbx
or $0x640, %rbx
movq %rbx, %cr4
movq %cr0, %rbx
pushq %rbx
or $0x22, %rbx
mov %rbx, %cr0

movq %rsp, %r12
andq $~0xF, %rsp
subq $0x200, %rsp
fxsave64 (%rsp)
```
In both cases, the state must be restored before giving control back to coreboot.
Since both coreboot and Linux use the same ABI (System V), handling the arguments is straightforward.
Calling `mm_entry_32` from coreboot pushes `lb_entry_context` onto the stack, and the call itself pushes the return address (decrementing esp by 4), so now our stack looks like this:
```
| third arg      |
| second arg     |
| first arg      |
| return address | <- esp
```
Then we push all the registers (see above), so the stack now looks like this:
```
| third arg       |
| second arg      |
| first arg       |
| return address  |
| esp (preserved) |
| ebx             |
| esi             |
| edi             |
| ebp             |
| eax             | <- esp
```
So, to get the entries we need, we read `(9 * 4)(%esp)` for the third argument (the ACPI base address), `(8 * 4)(%esp)` for the second argument (the PM1_CNT value), and `(7 * 4)(%esp)` for the command (for more details see [1] Ch. 3, p. 5). Thus:
```asm
cmpl $MM_ACK, 28(%esp)
je ack32
cmpl $MM_ACPI_ENABLE, 28(%esp)
je acpi_enable32
cmpl $MM_ACPI_DISABLE, 28(%esp)
je acpi_disable32
cmpl $MM_STORE, 28(%esp)
je mm_store32
jmp restore_cb_state
```
In the case of 64-bit coreboot, the process is less complicated. As described in the System V ABI for x86_64 [2] Ch. 3.2.2, p. 20, function arguments are passed, from left to right, in rdi, rsi, rdx, rcx, r8 and r9. Looking at the struct that we pass along (see above), we only care about rdi and rsi: the first two members fit in rdi. Since `command` is a 16-bit integer and the first member, we know it will be in di. Thus:
```asm
cmp $MM_ACK, %di
je ack
cmp $MM_ACPI_DISABLE, %di
je acpi_disable
cmp $MM_ACPI_ENABLE, %di
je acpi_enable
cmp $MM_STORE, %di
je mm_store
jmp restore_cb_state64
```
Regardless of the bitness, enabling/disabling ACPI comes down to writing the `PM1_CNT & ~SCI_EN` value to the `ACPI_BASE_ADDRESS` port. These are SoC-dependent, which is described more in depth on the porting page. The difference here lies (again) in how the arguments are accessed. With 32-bit coreboot:
```asm
// PM1_CNT & ~SCI_EN
mov 32(%esp), %ax
add $MM_ACPI_ENABLE, %ax
// ACPI_BASE_ADDR
mov 36(%esp), %dx
out %ax, %dx
```
Since `command` is stored at 28(%esp), the next member (`pm1_cnt`) must be at (28 + 4)(%esp), and so on.
In case of 64-bit coreboot we have:
```asm
shr $32, %rdi
// PM1_CNT & ~SCI_EN
mov %di, %ax
add $MM_ACPI_ENABLE, %ax
// si contains ACPI_BASE_ADDR
mov %si, %dx
out %ax, %dx
```
Here we first discard the `command` from rdi by shifting rdi 32 bits to the right; after the shift we know that `pm1_cnt` is in di. As mentioned above, the third member of the struct arrives in rsi, and since it is a 16-bit integer, we know it is in si.
[1] System V Application Binary Interface, Intel386 Architecture Processor Supplement, Fourth Edition
[2] System V Application Binary Interface, AMD64 Architecture Processor Supplement, Draft Version 0.99.6