the BIOS picks a bootable device and reads its boot loader (e.g. the boot sector, sector 0, of an HDD) into memory
it then jumps to that memory (i.e. executes the boot loader) -- the boot loader in turn reads the kernel image into memory.
ELF
starts with fixed-length ELF header, followed by variable-length program header
program header lists each of program sections to be loaded
program sections are:
.text: The program's executable instructions.
.rodata: Read-only data, such as ASCII string constants produced by the C compiler. (We will not bother setting up the hardware to prohibit writing, however.)
.data: The data section holds the program's initialized data, such as global variables declared with initializers like int x = 5;.
When the linker computes the memory layout, it reserves space for uninitialized global vars (i.e. .bss) immediately after .data in memory.
LMA (load address) vs VMA (link address) in .text section
LMA: memory address at which the section should be loaded into memory
VMA: the memory address from which the section expects to execute
unless the code is PIC (position-independent code), which does not contain any absolute addresses, it must execute at its VMA, because the linker encodes the link address in the binary
typically, VMA and LMA are the same
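To make the header layout concrete, here is a minimal sketch that walks the program header table of an ELF image already sitting in memory and counts the loadable segments whose load address differs from their link address. The struct fields follow the standard 32-bit ELF layout; the helper name count_relocated_segments is hypothetical, and the magic-number check assumes a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_MAGIC     0x464C457FU  /* "\x7F" "ELF" read as a little-endian word */
#define ELF_PROG_LOAD 1U           /* p_type value for a loadable segment */

struct Elf {                 /* fixed-length ELF header (32-bit) */
    uint32_t e_magic;        /* must equal ELF_MAGIC */
    uint8_t  e_elf[12];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;        /* link address of the entry point */
    uint32_t e_phoff;        /* file offset of the program header table */
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;        /* number of program headers */
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

struct Proghdr {             /* one entry per segment to load */
    uint32_t p_type;
    uint32_t p_offset;       /* where the segment starts in the file */
    uint32_t p_va;           /* link address (VMA) */
    uint32_t p_pa;           /* load address (LMA) */
    uint32_t p_filesz;
    uint32_t p_memsz;        /* size in memory; may exceed p_filesz (.bss) */
    uint32_t p_flags;
    uint32_t p_align;
};

/* Hypothetical helper: count loadable segments with LMA != VMA.
 * Returns -1 if the image does not start with the ELF magic number. */
int count_relocated_segments(const uint8_t *image)
{
    struct Elf eh;
    int n = 0;

    assert(image != NULL);
    memcpy(&eh, image, sizeof eh);           /* copy out to avoid alignment issues */
    if (eh.e_magic != ELF_MAGIC)
        return -1;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        struct Proghdr ph;
        memcpy(&ph, image + eh.e_phoff + (size_t)i * sizeof ph, sizeof ph);
        if (ph.p_type == ELF_PROG_LOAD && ph.p_pa != ph.p_va)
            n++;
    }
    return n;
}
```

For a typical user program every loadable segment has p_pa == p_va, so the function returns 0; a kernel linked high but loaded low would report a nonzero count.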
How Boot Loader "LOADs" Kernel
void
bootmain(void)
{
    struct Proghdr *ph, *eph;

    // read 1st page off disk
    readseg((uint32_t) ELFHDR, SECTSIZE*8, 0);

    // is this a valid ELF?
    if (ELFHDR->e_magic != ELF_MAGIC)
        goto bad;

    // load each program segment (ignores ph flags)
    ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
    eph = ph + ELFHDR->e_phnum;
    for (; ph < eph; ph++)
        // p_pa is the load address of this segment (as well
        // as the physical address)
        readseg(ph->p_pa, ph->p_memsz, ph->p_offset);

    // call the entry point from the ELF header
    // note: does not return!
    ((void (*)(void)) (ELFHDR->e_entry))();

bad:
    outw(0x8A00, 0x8A00);
    outw(0x8A00, 0x8E00);
    while (1)
        /* do nothing */;
}
Kernel
load addresses and link addresses are not the same in the kernel
the kernel tells the boot loader to load it into memory at a low address (1MB), but it expects to execute from a high address
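As a rough illustration of the gap between the two addresses, the sketch below converts a kernel link address into the corresponding load address by subtracting a fixed base. The KERNBASE value (0xF0000000) is the one JOS uses but is not stated in these notes, so treat it, and the helper name link_to_load, as assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define KERNBASE 0xF0000000u   /* assumed high base of the kernel's link addresses */

/* Map a kernel link (virtual) address to its load (physical) address. */
uint32_t link_to_load(uint32_t va)
{
    assert(va >= KERNBASE);    /* only addresses in the kernel's linked range make sense */
    return va - KERNBASE;
}
```

Under this assumption, a kernel linked at 0xF0100000 is loaded at 0x00100000, i.e. the 1MB mark mentioned above.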
Before call: a function is executing and needs to call another function
| |
+-----------------+
| saved %ebp | <---- %ebp
+-----------------+
| local var |
+-----------------+
| local var |
+-----------------+
| callee-save Rs | <---- %esp STACK FRAME for current function
+-----------------+-------------------------------------------------------
| |
+-----------------+
CALLER: BEFORE the CALL
The caller pushes the caller-save registers and the arguments; arguments are pushed in reverse order, so the first argument is pushed last
caller:
pushl %ebp # make new call frame
movl %esp, %ebp
pushl $3 # push arguments (last argument first)
pushl $2
pushl $1
call callee # call callee
addl $12, %esp # remove arguments from frame
addl $5, %eax # use result (%eax contains return value)
popl %ebp # restore old call frame
ret # return
| |
+-----------------+
| saved %ebp | <---- %ebp
+-----------------+
| local var |
+-----------------+
| local var |
+-----------------+
| callee-save Rs | STACK FRAME for current function
+-----------------+ - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| arg n | STACK FRAME for current function
+-----------------+ extended for call
| ... |
+-----------------+
| arg 0 | <---- %esp STACK FRAME for current function
+-----------------+--------------------------------------------------------
CALLER: CALL instruction
When the CALL instruction executes, the x86 pushes %eip (i.e. the address of the instruction following "call foo") onto the stack and then jumps to the CALLEE
| |
+-----------------+
| saved %ebp | <---- %ebp
+-----------------+
| local var |
+-----------------+
| local var |
+-----------------+
| callee-save Rs | STACK FRAME for current function
+-----------------+ - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| arg n | STACK FRAME for current function
+-----------------+ extended for call
| ... |
+-----------------+
| arg 0 |
+-----------------+
| return %eip | <---- %esp STACK FRAME for current function
+-----------------+-------------------------------------------------------
CALLEE: CALLEE starts
The call has executed and %eip now points into the callee's code (text section); at this point the following contract between CALLER and CALLEE holds
at entry to a function (i.e. just after call):
%eip points at first instruction of function
%esp+4 points at first argument
%esp points at return address
after ret instruction:
%eip contains return address
%esp points at arguments pushed by caller
called function may have trashed arguments
%eax (and %edx, if return type is 64-bit) contains return value (or trash if function is void)
%eax, %edx (above), and %ecx may be trashed
%ebp, %ebx, %esi, %edi must contain contents from time of call
Terminology:
%eax, %ecx, %edx are "caller save" registers
%ebp, %ebx, %esi, %edi are "callee save" registers
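The contract can be checked against a toy model. The sketch below simulates a downward-growing stack in a small array and mimics what the caller, the call instruction, the prologue, and leave/ret do to %esp and %ebp. All names (struct cpu, sim_call, sim_prologue, sim_epilogue, ...) are hypothetical, and addresses are byte offsets into the simulated stack, not real machine addresses.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256u

struct cpu {
    uint32_t mem[STACK_BYTES / 4]; /* simulated stack, one 32-bit word per slot */
    uint32_t esp;                  /* byte offset of the top of stack (grows down) */
    uint32_t ebp;                  /* byte offset of the current frame base */
};

void cpu_init(struct cpu *c)
{
    c->esp = STACK_BYTES;          /* empty stack */
    c->ebp = STACK_BYTES;
}

uint32_t load_word(const struct cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return c->mem[addr / 4];
}

void stack_push(struct cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t stack_pop(struct cpu *c)
{
    uint32_t v = load_word(c, c->esp);
    c->esp += 4;
    return v;
}

/* Caller: push arguments last-to-first, then do what `call` does. */
void sim_call(struct cpu *c, const uint32_t *args, int nargs, uint32_t ret_addr)
{
    for (int i = nargs - 1; i >= 0; i--)
        stack_push(c, args[i]);
    stack_push(c, ret_addr);       /* `call` pushes the return %eip */
}

/* Callee prologue: pushl %ebp; movl %esp, %ebp; subl $local_bytes, %esp */
void sim_prologue(struct cpu *c, uint32_t local_bytes)
{
    stack_push(c, c->ebp);
    c->ebp = c->esp;
    assert(local_bytes % 4 == 0 && local_bytes <= c->esp);
    c->esp -= local_bytes;
}

/* Callee epilogue: leave; ret. Returns the popped return address. */
uint32_t sim_epilogue(struct cpu *c)
{
    c->esp = c->ebp;               /* movl %ebp, %esp */
    c->ebp = stack_pop(c);         /* popl %ebp */
    return stack_pop(c);           /* ret */
}
```

Right after sim_call, load_word(c, c->esp) is the return address and load_word(c, c->esp + 4) is the first argument; after sim_prologue the same two values sit at %ebp+4 and %ebp+8; after sim_epilogue %esp points at the arguments again and %ebp is restored.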
CALLEE: prologue
Function prologue: the callee does this upon entry
pushl %ebp # save frame pointer into stack
# so that new value can be set to %ebp
movl %esp, %ebp # set new frame pointer;
# current esp becomes ebp
# above two instructions = "enter $0, $0"
subl $80, %esp # allocate stack space (local vars)
pushl %edi # save callee-save registers
pushl %esi # save callee-save registers
pushl %ebx # (in case they are used in function)
Callee executes prologue; prologue updates the stack as follows
| |
+-----------------+
| saved %ebp | <---- [OLD %ebp]
+-----------------+
| local var |
+-----------------+
| local var |
+-----------------+
| callee-save Rs | STACK FRAME for current function
+-----------------+ - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| arg n | STACK FRAME for current function
+-----------------+ extended for call
| ... |
+-----------------+
| arg 0 |
+-----------------+
| return %eip | <---- [OLD %esp] STACK FRAME for current function
+-----------------+-------------------------------------------------------
| OLD %ebp | <---- %ebp (pushl OLD %ebp; and then curr %esp becomes %ebp)
+-----------------+
| local var |
+-----------------+
| local var |
+-----------------+
| callee-save Rs | <---- %esp
+-----------------+
CALLEE: epilogue
Function epilogue: the callee does this before returning
movl %edi, %eax # set up return value
popl %ebx # restore callee-save registers
popl %esi # restore callee-save registers
popl %edi # restore callee-save registers
movl %ebp, %esp # restore stack pointer
popl %ebp # restore frame pointer
# above two instructions = "leave"
ret # pop return address
CALLEE: before return
Callee executes the epilogue; the epilogue updates the stack as follows
pop callee-save registers
| |
+-----------------+
| saved %ebp | <---- [OLD %ebp]
+-----------------+
| local var |
+-----------------+
| local var | STACK FRAME for current function
+-----------------+ - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| arg n | STACK FRAME for current function
+-----------------+ extended
| ... |
+-----------------+
| arg 0 |
+-----------------+
| return %eip | <---- [OLD %esp] STACK FRAME for current function
+-----------------+-------------------------------------------------------
| OLD %ebp | <---- %ebp (pushl OLD %ebp; and then curr %esp becomes %ebp)
+-----------------+
| local var |
+-----------------+
| local var | <---- %esp
+-----------------+
"leave": restore stack pointer and frame pointer
| |
+-----------------+
| saved %ebp | <---- %ebp
+-----------------+
| local var |
+-----------------+
| local var | STACK FRAME for current function
+-----------------+ - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| arg n | STACK FRAME for current function
+-----------------+ extended
| ... |
+-----------------+
| arg 0 |
+-----------------+
| return %eip | <---- %esp
+-----------------+
ret: pop return %eip
| |
+-----------------+
| saved %ebp | <---- %ebp
+-----------------+
| local var |
+-----------------+
| local var | STACK FRAME for current function
+-----------------+ - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| arg n | STACK FRAME for current function
+-----------------+ extended
| ... |
+-----------------+
| arg 0 | <---- %esp
+-----------------+
CALLER: after return
adjust %esp for arguments
use return value inside %eax
Example
C code
int main(void) { return f(8)+1; }
int f(int x) { return g(x); }
int g(int x) { return x+3; }
uintptr_t: opaque virtual address; only virtual addresses can be dereferenced
this is because, "dereferencing" means going through MMU for address translation; physical addresses are OUTPUT of MMU not INPUT of MMU
physaddr_t: physical address; physical addresses can never be dereferenced
Page tables
a 32-bit address can reach up to 4GB of byte-addressable memory.
let the size of a page be 4KB
i.e. there are 4GB/4KB = 1M pages in 4GB memory
simple 1-dimensional page table:
for each of 2^20 = 1M pages, create one table entry.
there are 1M page table entries, each occupying 4B (4MB in total) -- each 4B entry holds the physical address where its page starts
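The arithmetic above can be written out as a quick check. This is only a sketch; the helper names num_pages and flat_table_bytes are hypothetical, and the 4KB page and 4B entry sizes are the ones assumed in these notes.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE   4096u   /* 4KB page */
#define PTE_SIZE 4u      /* 4B per page table entry */

/* Number of pages needed to cover an address space of the given size. */
uint64_t num_pages(uint64_t addr_space_bytes)
{
    assert(addr_space_bytes % PGSIZE == 0);
    return addr_space_bytes / PGSIZE;
}

/* Size in bytes of a flat, one-entry-per-page table for that address space. */
uint64_t flat_table_bytes(uint64_t addr_space_bytes)
{
    return num_pages(addr_space_bytes) * PTE_SIZE;
}
```

For a 4GB address space (1ull << 32 bytes) this gives 2^20 pages and a 4MB flat table, matching the numbers above.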
2-level paging scheme
fragment the giant 1M-entry page table into many page tables (1024 tables) and one page directory (1024 entries).
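With 1024 directory entries and 1024 entries per table, a 32-bit virtual address splits into a 10-bit directory index, a 10-bit table index, and a 12-bit page offset. A minimal sketch of that split follows; PDX and PTX are the names used later in these notes, while PGOFF and make_va are added here for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PDXSHIFT 22
#define PTXSHIFT 12

#define PDX(va)   (((uint32_t)(va) >> PDXSHIFT) & 0x3FFu)  /* page directory index */
#define PTX(va)   (((uint32_t)(va) >> PTXSHIFT) & 0x3FFu)  /* page table index */
#define PGOFF(va) ((uint32_t)(va) & 0xFFFu)                /* offset within the page */

/* Reassemble a virtual address from its three parts. */
uint32_t make_va(uint32_t pdx, uint32_t ptx, uint32_t off)
{
    assert(pdx < 1024 && ptx < 1024 && off < 4096);
    return (pdx << PDXSHIFT) | (ptx << PTXSHIFT) | off;
}
```

For example, the address 0xF0101234 splits into PDX 0x3C0, PTX 0x101, and offset 0x234.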
Why UVPT?
We'd like to get the giant conceptual page-table back in some way -- processes in JOS are going to look at it to figure out what's going on in their address space. But how?
Luckily, the paging hardware is great for precisely this -- putting together a set of fragmented pages into a contiguous address space. And it turns out we already have a table with pointers to all of our fragmented page tables: it's the page directory!
So, we can use the page directory as a page table to map our conceptual giant 2^22-byte page table (represented by 1024 pages) at some contiguous 2^22-byte range in the virtual address space. And we can ensure user processes can't modify their page tables by marking the PDE entry as read-only.
Puzzle: do we need to create a separate UVPD mapping too?
CR3 points at the page directory.
The PDX part of the address indexes into the page directory to give you a page table.
The PTX part indexes into the page table to give you a page, and then you add the low bits in.
But the processor has no concept of page directories, page tables, and pages being anything other than plain memory.
So there's nothing that says a particular page in memory can't serve as two or three of these at once.
The processor just follows pointers: pd = rcr3(); pt = *(pd+4*PDX); page = *(pt+4*PTX);
Diagrammatically, it starts at CR3, follows three arrows, and then stops.
If we put a pointer into the page directory that points back to itself at index V, then when we try to translate a virtual address with PDX and PTX both equal to V, following three arrows leaves us at the page directory. So that virtual page translates to the page holding the page directory. In JOS, V is 0x3BD, so the virtual address of the UVPD is (0x3BD<<22)|(0x3BD<<12).
Now, if we try to translate a virtual address with PDX = V but an arbitrary PTX != V, then following three arrows from CR3 ends one level up from usual (instead of two as in the last case), which is to say in the page tables. So the set of virtual pages with PDX=V form a 4MB region whose page contents, as far as the processor is concerned, are the page tables themselves. In JOS, V is 0x3BD so the virtual address of the UVPT is (0x3BD<<22).
So because of the "no-op" arrow we've cleverly inserted into the page directory, we've mapped the pages being used as the page directory and page table (which are normally virtually invisible) into the virtual address space.
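The self-mapping trick can be tried out on a toy model. The sketch below simulates a few pages of physical memory, builds a page directory with a self-referencing entry at index 0x3BD, and walks it the way the processor would. Everything except the index 0x3BD and the UVPT/UVPD formulas is made up for illustration: the memory layout, the names translate, build_example, read_phys, and write_phys, and the use of bit 0 as a "present" flag in the simulated entries.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NPAGES     8u
#define PGSIZE     4096u
#define PTE_P      0x1u                       /* "present" flag in simulated entries */
#define SELF_PDX   0x3BDu                     /* V: slot of the self-referencing entry */
#define UVPT       (SELF_PDX << 22)
#define UVPD       (UVPT | (SELF_PDX << 12))
#define NOT_MAPPED UINT32_MAX

static uint32_t physmem[NPAGES * PGSIZE / 4]; /* simulated physical memory */

uint32_t read_phys(uint32_t pa)
{
    assert(pa % 4 == 0 && pa < NPAGES * PGSIZE);
    return physmem[pa / 4];
}

void write_phys(uint32_t pa, uint32_t v)
{
    assert(pa % 4 == 0 && pa < NPAGES * PGSIZE);
    physmem[pa / 4] = v;
}

/* Follow the arrows: CR3 -> directory entry -> table entry -> page.
 * Returns the physical address, or NOT_MAPPED if an entry is not present. */
uint32_t translate(uint32_t cr3, uint32_t va)
{
    uint32_t pde = read_phys(cr3 + 4 * (va >> 22));
    if (!(pde & PTE_P))
        return NOT_MAPPED;
    uint32_t pte = read_phys((pde & ~0xFFFu) + 4 * ((va >> 12) & 0x3FFu));
    if (!(pte & PTE_P))
        return NOT_MAPPED;
    return (pte & ~0xFFFu) | (va & 0xFFFu);
}

/* Page 1: page directory; page 2: page table for PDX 0; page 3: a data page
 * mapped at virtual address 0x5000. Returns the CR3 value (directory address). */
uint32_t build_example(void)
{
    uint32_t pd = 1 * PGSIZE, pt = 2 * PGSIZE, data = 3 * PGSIZE;

    memset(physmem, 0, sizeof physmem);
    write_phys(pd + 4 * 0, pt | PTE_P);
    write_phys(pd + 4 * SELF_PDX, pd | PTE_P); /* the "no-op" arrow back to itself */
    write_phys(pt + 4 * 5, data | PTE_P);      /* va 0x5000 -> data page */
    return pd;
}
```

In this model, translating UVPD lands on the page directory itself, translating UVPT lands on the page table for PDX 0, and the entry that maps virtual address 0x5000 is readable at virtual address UVPT + 4*5.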