LAB Assignment 2 - seporaitis/xv6-public GitHub Wiki
In this lab, you will write the memory management code for your operating system. Memory management has two components.
The first component is a physical memory allocator for the kernel, so that the kernel can allocate memory and later free it. Your allocator will operate in units of 4096 bytes, called pages. Your task will be to maintain data structures that record which physical pages are free and which are allocated, and how many processes are sharing each allocated page. You will also write the routines to allocate and free pages of memory.
The second component of memory management is virtual memory, which maps the virtual addresses used by kernel and user software to addresses in physical memory. The x86 hardware's memory management unit (MMU) performs the mapping when instructions use memory, consulting a set of page tables. You will modify JOS to set up the MMU's page tables according to a specification we provide.
In the file `kern/pmap.c`, you must implement code for the following functions (probably in the order given):

- `boot_alloc()`
- `mem_init()` (only up to the call to `check_page_free_list(1)`)
- `page_init()`
- `page_alloc()`
- `page_free()`

`check_page_free_list()` and `check_page_alloc()` test your physical page allocator. You should boot JOS and see whether `check_page_alloc()` reports success. Fix your code so that it passes. You may find it helpful to add your own `assert()`s to verify that your assumptions are correct.
`boot_alloc()` trivially returns the current `nextfree` pointer and advances it past the `n` requested bytes, rounded up to a page boundary.
```c
static void *
boot_alloc(uint32_t n)
{
	static char *nextfree;	// virtual address of next byte of free memory
	char *result;

	// Initialize nextfree if this is the first time.
	// 'end' is a magic symbol automatically generated by the linker,
	// which points to the end of the kernel's bss segment:
	// the first virtual address that the linker did *not* assign
	// to any kernel code or global variables.
	if (!nextfree) {
		extern char end[];
		nextfree = ROUNDUP((char *) end, PGSIZE);
	}

	// Allocate a chunk large enough to hold 'n' bytes, then update
	// nextfree.  Make sure nextfree is kept aligned
	// to a multiple of PGSIZE.
	result = nextfree;
	if (n == 0) {
		return result;
	}
	nextfree = ROUNDUP(nextfree + n, PGSIZE);
	return result;
}
```
Straightforward allocation of the `struct PageInfo` array for `npages` pages, using `boot_alloc()`. Also, don't forget to comment out the `panic` gate.
```c
void
mem_init(void)
{
	// ...

	// Remove this line when you're ready to test this function.
	//panic("mem_init: This function is not finished\n");

	// ...

	//////////////////////////////////////////////////////////////////////
	// Allocate an array of npages 'struct PageInfo's and store it in 'pages'.
	// The kernel uses this array to keep track of physical pages: for
	// each physical page, there is a corresponding struct PageInfo in this
	// array.  'npages' is the number of physical pages in memory.  Use memset
	// to initialize all fields of each struct PageInfo to 0.
	pages = (struct PageInfo *) boot_alloc(sizeof(struct PageInfo) * npages);
	memset(pages, 0, sizeof(struct PageInfo) * npages);

	// ...
}
```
There are two parts to this:

- The starting implementation marks all pages as free by putting them into `page_free_list`. The task is therefore to exclude certain pages (page 0 and the IO hole) from being put into `page_free_list`.
- The kernel is located at the start of `EXTPHYSMEM`, so the code has to find out where it ends and skip those pages as well.
```c
void
page_init(void)
{
	// Page number of the first free page past the kernel and the
	// early boot_alloc() allocations (boot_alloc(0) returns the
	// current, page-aligned break).
	uint32_t kernend = (((uint32_t) boot_alloc(0)) - KERNBASE) >> PGSHIFT;
	size_t i;

	for (i = 0; i < npages; i++) {
		// Free: [1, npages_basemem) and [kernend, npages).
		// Excluded: page 0, the IO hole and the kernel image.
		// Note >= rather than >: page kernend itself is free.
		if ((i > 0 && i < npages_basemem) || i >= kernend) {
			pages[i].pp_ref = 0;
			pages[i].pp_link = page_free_list;
			page_free_list = &pages[i];
		}
	}
}
```
At this point, running `make qemu-nox` will do one of two things:

- fail with a triple fault or restart endlessly, which means there was an error in `page_init` that didn't set up `page_free_list` properly;
- a `panic:` message will appear in the output, giving a clue as to what should be implemented next (because one of the `assert`s in `check_page_alloc()` will be failing).
Picks the head of `page_free_list` to be returned, and moves the `page_free_list` pointer to the next element.
```c
struct PageInfo *
page_alloc(int alloc_flags)
{
	struct PageInfo *page;

	if (!page_free_list)
		return NULL;

	page = page_free_list;
	page_free_list = page->pp_link;
	page->pp_link = NULL;

	if (alloc_flags & ALLOC_ZERO) {
		memset(page2kva(page), '\0', PGSIZE);
	}

	return page;
}
```
Puts the `pp` argument back at the head of `page_free_list`.
```c
void
page_free(struct PageInfo *pp)
{
	if (pp->pp_ref > 0)
		panic("page_free: tried to free a page that has reference count > 0!\n");
	if (pp->pp_link != NULL)
		panic("page_free: tried to free a page that is still linked!\n");

	pp->pp_link = page_free_list;
	page_free_list = pp;
}
```
At this point `check_page_alloc() succeeded!` should be present in the `make qemu-nox` output. Below it, there will be a panic from `check_page()`, which is not part of this exercise.
Look at chapters 5 and 6 of the Intel 80386 Reference Manual, if you haven't done so already. Read the sections about page translation and page-based protection closely (5.2 and 6.4). We recommend that you also skim the sections about segmentation; while JOS uses paging for virtual memory and protection, segment translation and segment-based protection cannot be disabled on the x86, so you will need a basic understanding of it.
Notes on address translation, page translation, segment translation and protection.
Assuming that the following JOS kernel code is correct, what type should variable `x` have, `uintptr_t` or `physaddr_t`?

```c
mystery_t x;
char *value = return_a_pointer();
*value = 10;
x = (mystery_t) value;
```
It should be `uintptr_t`, because as the lab description suggests:

> The JOS kernel sometimes needs to read or modify memory for which it knows only the physical address. For example, adding a mapping to a page table may require allocating physical memory to store a page directory and then initializing that memory. However, the kernel, like any other software, cannot bypass virtual memory translation and thus cannot directly load and store to physical addresses. One reason JOS remaps all of physical memory starting from physical address 0 at virtual address `0xf0000000` is to help the kernel read and write memory for which it knows just the physical address. In order to translate a physical address into a virtual address that the kernel can actually read and write, the kernel must add `0xf0000000` to the physical address to find its corresponding virtual address in the remapped region. You should use `KADDR(pa)` to do that addition.
In the file `kern/pmap.c`, you must implement code for the following functions:

- `pgdir_walk()`
- `boot_map_region()`
- `page_lookup()`
- `page_remove()`
- `page_insert()`

`check_page()`, called from `mem_init()`, tests your page table management routines. You should make sure it reports success before proceeding.
Pretty straightforward routine:

- determines the page directory index (`pdx`) from the linear address (`va`);
- if the page directory entry does not exist, either returns `NULL` or allocates a page for a new page table;
- if it exists, loads it;
- returns a pointer to the corresponding page table entry (`&pte[ptx]`).
```c
pte_t *
pgdir_walk(pde_t *pgdir, const void *va, int create)
{
	uint32_t pdx = PDX(va);
	uint32_t ptx = PTX(va);
	struct PageInfo *pp;
	pte_t *pte;

	// Check the present bit rather than comparing against 0: a page
	// directory entry could in principle be non-zero yet not present.
	if (!(pgdir[pdx] & PTE_P)) {
		if (create) {
			pp = page_alloc(ALLOC_ZERO);
			if (!pp) {
				return NULL;
			}
			pp->pp_ref++;
			pgdir[pdx] = page2pa(pp) | PTE_W | PTE_U | PTE_P;
			pte = (pte_t *) page2kva(pp);
		} else {
			return NULL;
		}
	} else {
		pte = (pte_t *) KADDR(PTE_ADDR(pgdir[pdx]));
	}

	return &pte[ptx];
}
```
Just walk through `[va, va + size)` in `PGSIZE` increments, filling in page table entries.
```c
static void
boot_map_region(pde_t *pgdir, uintptr_t va, size_t size, physaddr_t pa, int perm)
{
	pte_t *pte;
	size_t offset;	// size_t, not int: avoids a signed/unsigned comparison

	for (offset = 0; offset < size; offset += PGSIZE) {
		pte = pgdir_walk(pgdir, (void *) (va + offset), 1);
		if (!pte)
			panic("boot_map_region: out of memory");
		*pte = (pa + offset) | perm | PTE_P;
	}
}
```
Retrieve the page table entry (`pte`) and get the page from its top 20 bits, which hold the physical page number.
```c
struct PageInfo *
page_lookup(pde_t *pgdir, void *va, pte_t **pte_store)
{
	pte_t *pte = pgdir_walk(pgdir, va, 0);

	// No page table, or the entry is not present: nothing is mapped.
	if (!pte || !(*pte & PTE_P))
		return NULL;
	if (pte_store)
		*pte_store = pte;
	return pa2page(PTE_ADDR(*pte));
}
```
Look up the page (`pp`) and its associated page table entry (`pte`). If the entry is valid, clear it, decrement the reference counter (freeing the page if it's no longer used) and invalidate the TLB entry.
```c
void
page_remove(pde_t *pgdir, void *va)
{
	pte_t *pte_store[1], *pte;
	struct PageInfo *pp = page_lookup(pgdir, va, pte_store);

	if (!pp)
		return;

	pte = pte_store[0];
	if (pte) {
		*pte = 0;
		page_decref(pp);
		tlb_invalidate(pgdir, va);
	}
}
```
First checks if the page table entry exists, and removes the currently mapped page if it's not the same page as the one being inserted. The tricky bit (for me, at least) was to increment the reference counter only when the page table entry is updated with a different page. If it did that for the same page (the mentioned corner case), it would double count and one of the asserts in `check_page` would fail.
```c
int
page_insert(pde_t *pgdir, struct PageInfo *pp, void *va, int perm)
{
	pte_t *pte;

	pte = pgdir_walk(pgdir, va, 0);
	if (pte) {
		// A mapping already exists; remove it unless it is the same page.
		if (*pte && PTE_ADDR(*pte) != page2pa(pp)) {
			page_remove(pgdir, va);
		}
	} else {
		pte = pgdir_walk(pgdir, va, 1);
		if (!pte)
			return -E_NO_MEM;
	}

	// Corner case: only take a reference when mapping a different page;
	// re-inserting the same page at the same va must not double count.
	if (PTE_ADDR(*pte) != page2pa(pp))
		pp->pp_ref++;

	*pte = page2pa(pp) | perm | PTE_P;
	tlb_invalidate(pgdir, va);
	return 0;
}
```
At this point `make qemu-nox` should show `check_page() succeeded!`.
Fill in the missing code in `mem_init()` after the call to `check_page()`. Your code should now pass the `check_kern_pgdir()` and `check_page_installed_pgdir()` checks.

If stuck, this can be reverse engineered by looking at the assertions in `check_kern_pgdir()`.
Also, it is possible to call `mon_backtrace(0, NULL, NULL)` right before the `panic` (e.g. in `pa2page`) if there's a need to check where the panic is coming from.
```c
void
mem_init(void)
{
	uint32_t cr0;
	size_t n, i;

	// ...

	//////////////////////////////////////////////////////////////////////
	// Map 'pages' read-only by the user at linear address UPAGES
	// Permissions:
	//    - the new image at UPAGES -- kernel R, user R
	//      (ie. perm = PTE_U | PTE_P)
	//    - pages itself -- kernel RW, user NONE
	// Your code goes here:
	boot_map_region(kern_pgdir,
			UPAGES,
			ROUNDUP(npages * sizeof(struct PageInfo), PGSIZE),
			PADDR(pages),
			PTE_U | PTE_P);

	//////////////////////////////////////////////////////////////////////
	// Use the physical memory that 'bootstack' refers to as the kernel
	// stack.  The kernel stack grows down from virtual address KSTACKTOP.
	// We consider the entire range from [KSTACKTOP-PTSIZE, KSTACKTOP)
	// to be the kernel stack, but break this into two pieces:
	//     * [KSTACKTOP-KSTKSIZE, KSTACKTOP) -- backed by physical memory
	//     * [KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE) -- not backed; so if
	//       the kernel overflows its stack, it will fault rather than
	//       overwrite memory.  Known as a "guard page".
	//     Permissions: kernel RW, user NONE
	// Your code goes here:
	boot_map_region(kern_pgdir,
			KSTACKTOP - KSTKSIZE,
			ROUNDUP(KSTKSIZE, PGSIZE),
			PADDR(bootstack),
			PTE_W | PTE_P);

	//////////////////////////////////////////////////////////////////////
	// Map all of physical memory at KERNBASE.
	// Ie.  the VA range [KERNBASE, 2^32) should map to
	//      the PA range [0, 2^32 - KERNBASE)
	// We might not have 2^32 - KERNBASE bytes of physical memory, but
	// we just set up the mapping anyway.
	// Permissions: kernel RW, user NONE
	// Your code goes here:
	// Note: (uint32_t) -KERNBASE wraps around to 2^32 - KERNBASE
	// (0x10000000), the exact size of the range; ROUNDUP(-KERNBASE-1,
	// PGSIZE) evaluates to the same page-aligned value.
	boot_map_region(kern_pgdir,
			KERNBASE,
			ROUNDUP(-KERNBASE - 1, PGSIZE),
			0,
			PTE_W | PTE_P);

	// ...
}
```
At this point `make qemu-nox` should show all checks succeeding and `make grade` should report `Score: 70/70`.
2. What entries (rows) in the page directory have been filled in at this point? What addresses do they map and where do they point? In other words, fill out this table as much as possible:
Based on `inc/memlayout.h` and inspecting `kern_pgdir` in `gdb`:

| Entry | Base Virtual Address | Points to (logically) |
|---|---|---|
| 1023 | 0xFFC00000 | Remapped physical memory (kernel space) |
| . | . | . |
| 960 | 0xF0000000 | Start of remapped physical memory (kernel image, pages array) |
| 959 | 0xEFC00000 | Kernel Stack |
| 958 | 0xEF800000 | MMIO region (`MMIOBASE`; empty at this point) |
| 957 | 0xEF400000 | Current Page Table (`UVPT`) |
| 956 | 0xEF000000 | Pages Array (`UPAGES`) |
| 955 | 0xEEC00000 | Empty |
| . | . | . |
| 2 | 0x00800000 | Empty |
| 1 | 0x00400000 | Empty |
| 0 | 0x00000000 | Empty |
3. We have placed the kernel and user environment in the same address space. Why will user programs not be able to read or write the kernel's memory? What specific mechanisms protect the kernel memory?
Because of CPU memory protection: kernel addresses do not have the user bit (`PTE_U`) set in their page directory and page table entries, so any access to them from user mode raises a page fault.
4. What is the maximum amount of physical memory that this operating system can support? Why?
`2^32` bytes, or 4 GBytes, because of the way 80386 page tables work:

> A page table of the second level addresses up to 1K pages. All the tables addressed by one page directory, therefore, can address 1M pages (2^20). Because each page contains 4K bytes (2^12 bytes), the tables of one page directory can span the entire physical address space of the 80386 (2^20 x 2^12 = 2^32).

In reality it is a little less because of overheads: page tables, the kernel stack, the pages array, kernel space, etc.
5. How much space overhead is there for managing memory, if we actually had the maximum amount of physical memory? How is this overhead broken down?
The page directory is 1 page (4 KB) and points to up to 1024 page tables; each page table also takes a page. So in total it is `4KB * 1024 + 4KB`, a little over 4 MBytes.
6. Revisit the page table setup in `kern/entry.S` and `kern/entrypgdir.c`. Immediately after we turn on paging, `EIP` is still a low number (a little over 1MB). At what point do we transition to running at an `EIP` above `KERNBASE`? What makes it possible for us to continue executing at a low `EIP` between when we enable paging and when we begin running at an `EIP` above `KERNBASE`? Why is this transition necessary?
We transition in `kern/entry.S`:

```asm
	# Now paging is enabled, but we're still running at a low EIP
	# (why is this okay?).  Jump up above KERNBASE before entering
	# C code.
	mov	$relocated, %eax
	jmp	*%eax
```
The initial, static kernel page directory in `kern/entrypgdir.c` enables us to do that, because it maps the same `[0, 4MB)` range of physical addresses at both `[0, 4MB)` and `[KERNBASE, KERNBASE + 4MB)` virtual:
```c
__attribute__((__aligned__(PGSIZE)))
pde_t entry_pgdir[NPDENTRIES] = {
	// Map VA's [0, 4MB) to PA's [0, 4MB)
	[0]
		= ((uintptr_t)entry_pgtable - KERNBASE) + PTE_P,
	// Map VA's [KERNBASE, KERNBASE+4MB) to PA's [0, 4MB)
	[KERNBASE>>PDXSHIFT]
		= ((uintptr_t)entry_pgtable - KERNBASE) + PTE_P + PTE_W
};
```
The transition is necessary because once the real page table is installed, low virtual addresses are no longer mapped. If the jump didn't happen, the kernel would crash as soon as the identity mapping was removed.
Challenge! Extend the JOS kernel monitor with commands to:
- Display in a useful and easy-to-read format all of the physical page mappings (or lack thereof) that apply to a particular range of virtual/linear addresses in the currently active address space. For example, you might enter 'showmappings 0x3000 0x5000' to display the physical page mappings and corresponding permission bits that apply to the pages at virtual addresses 0x3000, 0x4000, and 0x5000.
- Explicitly set, clear, or change the permissions of any mapping in the current address space.
- Dump the contents of a range of memory given either a virtual or physical address range. Be sure the dump code behaves correctly when the range extends across page boundaries!
- Do anything else that you think might be useful later for debugging the kernel. (There's a good chance it will be!)
Takes an input `start` address and, optionally, an `end` address, and prints mappings for `PGSIZE` chunks of memory.
```c
int
mon_showmappings(int argc, char **argv, struct Trapframe *tf)
{
	if (argc < 2) {
		cprintf("showmappings <start> [<end>]\n");
		return 1;
	}

	uint32_t start = (uint32_t) strtol(argv[1], NULL, 0);
	uint32_t end = start + PGSIZE;
	if (argc > 2)
		end = (uint32_t) strtol(argv[2], NULL, 0) + PGSIZE;

	cprintf("|%-5s|%-19s|%-19s|%-6s|%-11s|%-11s|\n",
		"PDX",
		"Virtual Address",
		"Physical Address",
		"User",
		"Writable",
		"Present");

	uint32_t i;
	pte_t *pte;
	for (i = start; i < end; i += PGSIZE) {
		pte = pgdir_walk(kern_pgdir, (void *) i, 0);
		if (pte == NULL) {
			cprintf("| %4d| 0x%08x | %-18s| %-5s| %-10s| %-10s|\n",
				PDX(i), i, "Unmapped", ".", ".", ".");
			continue;
		}
		cprintf("| %4d", PDX(i));
		cprintf("| 0x%08x ", i);
		cprintf("| 0x%08x ", PTE_ADDR(*pte));
		cprintf("| %-5s", (*pte & PTE_U) ? "Yes" : "No");
		cprintf("| %-10s", (*pte & PTE_W) ? "Yes" : "No");
		cprintf("| %-10s", (*pte & PTE_P) ? "Yes" : "No");
		cprintf("|\n");
	}

	return 0;
}
```
Sample output:

```
K> showmappings 0xefff7000 0xf0000000
|PDX  |Virtual Address    |Physical Address   |User  |Writable   |Present    |
|  959| 0xefff7000 | 0x00000000 | No   | No        | No        |
|  959| 0xefff8000 | 0x0010f000 | No   | Yes       | Yes       |
|  959| 0xefff9000 | 0x00110000 | No   | Yes       | Yes       |
|  959| 0xefffa000 | 0x00111000 | No   | Yes       | Yes       |
|  959| 0xefffb000 | 0x00112000 | No   | Yes       | Yes       |
|  959| 0xefffc000 | 0x00113000 | No   | Yes       | Yes       |
|  959| 0xefffd000 | 0x00114000 | No   | Yes       | Yes       |
|  959| 0xefffe000 | 0x00115000 | No   | Yes       | Yes       |
|  959| 0xeffff000 | 0x00116000 | No   | Yes       | Yes       |
|  960| 0xf0000000 | 0x00000000 | No   | Yes       | Yes       |
```
Explicitly sets flags on a page table entry:
```c
int
mon_setflags(int argc, char **argv, struct Trapframe *tf)
{
	// argv[1..4] are used, so five arguments are required.
	if (argc < 5) {
		cprintf("setflags <va> <user> <writable> <present>\n");
		return 1;
	}

	uint32_t va = (uint32_t) strtol(argv[1], NULL, 0);
	uint32_t perms = 0;
	if ((uint32_t) strtol(argv[2], NULL, 0) > 0) {
		perms |= PTE_U;
	}
	if ((uint32_t) strtol(argv[3], NULL, 0) > 0) {
		perms |= PTE_W;
	}
	if ((uint32_t) strtol(argv[4], NULL, 0) > 0) {
		perms |= PTE_P;
	}

	pte_t *pte = pgdir_walk(kern_pgdir, (void *) va, 0);
	if (pte == NULL) {
		cprintf("Page table entry for VA 0x%x does not exist.\n", va);
		return 1;
	}

	*pte = PTE_ADDR(*pte) | perms;
	return 0;
}
```
Prints the requested number of 32-bit words starting at a virtual or physical address.
```c
int
mon_dumpmemory(int argc, char **argv, struct Trapframe *tf)
{
	if (argc < 3) {
		cprintf("%s <addr> <amount>\n", argv[0]);
		return 1;
	}

	uint32_t *ptr = NULL;
	uint32_t amount = (uint32_t) strtol(argv[2], NULL, 0);
	if (strcmp(argv[0], "dumpphysical") == 0) {
		physaddr_t addr = (physaddr_t) strtol(argv[1], NULL, 0);
		ptr = KADDR(addr);
	} else if (strcmp(argv[0], "dumpvirtual") == 0) {
		ptr = (void *) strtol(argv[1], NULL, 0);
	} else {
		cprintf("Unknown command!\n");
		return 1;
	}

	uint32_t i;
	// i < amount, not i <= amount: print exactly 'amount' 32-bit words.
	for (i = 0; i < amount; i += 1) {
		cprintf("0x%08x: %08x\n", ptr + i, *(ptr + i));
	}

	return 0;
}
```
Prints page directory flags for specified linear address:
```c
int
mon_getpdflags(int argc, char **argv, struct Trapframe *tf)
{
	if (argc < 2) {
		cprintf("getpdflags <va>\n");
		return 1;
	}

	uint32_t va = (uint32_t) strtol(argv[1], NULL, 0);
	pde_t pde = kern_pgdir[PDX(va)];

	cprintf("Page directory flags for VA 0x%x:\n", va);
	cprintf(" - User: %s\n", (pde & PTE_U) ? "Yes" : "No");
	cprintf(" - Writable: %s\n", (pde & PTE_W) ? "Yes" : "No");
	cprintf(" - Present: %s\n", (pde & PTE_P) ? "Yes" : "No");
	return 0;
}
```
Sets page directory flags for specified linear address:
```c
int
mon_setpdflags(int argc, char **argv, struct Trapframe *tf)
{
	// argv[1..4] are used, so five arguments are required.
	if (argc < 5) {
		cprintf("setpdflags <va> <user> <writable> <present>\n");
		return 1;
	}

	uint32_t va = (uint32_t) strtol(argv[1], NULL, 0);
	uint32_t perms = 0;
	if ((uint32_t) strtol(argv[2], NULL, 0) > 0) {
		perms |= PTE_U;
	}
	if ((uint32_t) strtol(argv[3], NULL, 0) > 0) {
		perms |= PTE_W;
	}
	if ((uint32_t) strtol(argv[4], NULL, 0) > 0) {
		perms |= PTE_P;
	}

	// PTE_ADDR keeps the address bits and clears all twelve flag bits.
	pde_t *pde = &kern_pgdir[PDX(va)];
	*pde = PTE_ADDR(*pde) | perms;
	return 0;
}
```