VM - brown-cs1690/handout GitHub Wiki
Virtual Memory
Introduction
At this point, your Weenix contains a threading library, some thin wrappers around device drivers, and basic file system support with a caching layer. By the end of this assignment, Weenix will be a full operating system. With the addition of virtual memory, your kernel will start managing user address spaces, running user-level code, and servicing system calls.
This assignment is substantial, and also very prone to difficult bugs. Before you begin, make sure the rest of your kernel is functioning exactly as you expect. You will undoubtedly uncover bugs in old code throughout the course of this assignment, but minimizing the number you find before you start will be helpful. Make sure to start early and ask questions frequently; it is very easy to get lost in this assignment.
Remember to turn the VM project on in Config.mk and make clean your project before you try to run your changes.
Because VM bugs can spring up in code you wrote months ago, this is where you will probably find out whether or not your implementations of the previous assignments are up to par. We would like to point out that there are several Weenix- and OS-specific debugging tools and techniques in Appendix Debugging which will be extremely useful if you have not been using them so far.
For additional wiki information, also see the Files and Memory Wiki.
Finally, use the FAQ on Edstem (to be released).
Virtual Memory Maps
The first thing you should do in this assignment is write the code for managing a process’ virtual address space. The virtual address space for a process (also known as its “memory map”) is stored as a linked list of virtual memory areas (also referred to as “memory regions”), each of which corresponds to some memory object which provides pages of memory to the process on demand. As you have likely already realized, this means that everything from files to disks can be mapped into the address space of Weenix processes, and it should now make even more sense why we used memory objects extensively in the last assignment instead of reading and writing directly to disk. Of course, some memory areas will not correspond to existing data (the stack and heap, for instance). We will explain how that works in the section on Anonymous Objects.
In order to manage address spaces, you must maintain each process’ list of virtual memory areas. Each memory region is essentially a range of virtual page numbers, a memory object, and the page offset into that memory object that the first page of the virtual memory area corresponds to. These numbers are all stored at page resolution instead of byte (address) resolution because x86 paging manages memory at page granularity. Therefore, defining sub-page level permissions wouldn't make sense since they would not be enforced by the memory management unit (MMU).
For example, a memory region could span from page 2000 to 2005, be backed by a memory object corresponding to a file, and have offset 3. In this example, the virtual page 2000 (which is at address 2000*PAGE_SIZE) should contain the content of the file page at index 3, counting from zero (bytes 3*PAGE_SIZE to 4*PAGE_SIZE - 1).
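The arithmetic in this example can be sketched in a few lines of C. This is only an illustration: struct area_sketch and both functions are hypothetical names rather than the Weenix stencil types, the page size of 4096 is an assumption, and the sketch assumes the end of the range is exclusive (check the stencil comments for the actual convention).

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical, simplified stand-in for a virtual memory area. */
struct area_sketch {
    size_t start; /* first virtual page number in the area */
    size_t end;   /* one past the last virtual page number (assumption) */
    size_t off;   /* page offset into the backing object for page `start` */
};

/* Page index inside the backing object that holds virtual page `vfn`. */
size_t area_object_page(const struct area_sketch *a, size_t vfn)
{
    assert(a->start <= vfn && vfn < a->end);
    return a->off + (vfn - a->start);
}

/* Byte offset inside the backing object of the first byte of page `vfn`. */
size_t area_object_byte(const struct area_sketch *a, size_t vfn)
{
    return area_object_page(a, vfn) * SK_PAGE_SIZE;
}
```

With start = 2000 and off = 3, virtual page 2000 maps to object page 3 and virtual page 2004 maps to object page 7.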
You must keep the areas sorted by the start of their virtual page ranges and ensure that no two ranges overlap with each other. There will be several edge cases (which are better documented in the code) where you will have to unmap a section of a virtual memory area, which could require splitting an area or truncating the beginning or end of an area.
Note: While there is very little conceptually difficult code to write in this section of the assignment, off-by-one bugs are extremely common and become very difficult to track down later on, so unit-testing this code is a good idea (see kernel/test/vmtest.c).
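One way to keep the unmapping edge cases straight is to classify how the range being removed overlaps an existing area before touching any data structures. The following is a minimal sketch under stated assumptions: both ranges are half-open page ranges, and the enum and function names are made up for illustration, not taken from the stencil.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the ways a removed range can hit one area. */
enum overlap_case {
    OVERLAP_NONE,       /* the area is untouched */
    OVERLAP_SPLIT,      /* range lies strictly inside: area splits in two */
    OVERLAP_TRIM_END,   /* range covers the tail of the area */
    OVERLAP_TRIM_START, /* range covers the head of the area */
    OVERLAP_WHOLE       /* range covers the entire area */
};

/* Area is [start, end); range to remove is [lo, hi). All page numbers. */
enum overlap_case classify_overlap(size_t start, size_t end,
                                   size_t lo, size_t hi)
{
    assert(start < end && lo < hi);
    if (hi <= start || end <= lo)
        return OVERLAP_NONE;
    if (lo <= start && end <= hi)
        return OVERLAP_WHOLE;
    if (start < lo && hi < end)
        return OVERLAP_SPLIT;
    if (start < lo)
        return OVERLAP_TRIM_END;  /* lo is inside, hi reaches or passes end */
    return OVERLAP_TRIM_START;    /* lo is at or before start, hi is inside */
}
```

A unit test can then walk each boundary value (lo == start, hi == end, and their neighbours), which is where off-by-one bugs tend to hide.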
Page Fault Handler
After your memory maps are working, you will need a way to load the data into memory when a process attempts to access it. This is done by the page fault handler. The page fault handler is triggered by a processor interrupt when a process attempts to access an address for which it has no lookup entry in the page table or the permissions on that entry do not allow the type of access that is being attempted.
At this point in the project, any page faults that have occurred have resulted in a kernel panic. That is because those were kernel-level page faults, which Weenix does not support. Essentially, Weenix requires the entire kernel address space to reside in memory at all times. This functionality is written into a wrapper for the page fault handler you will write which short-circuits kernel page faults. More details on how this function works can most easily be found in the code.
The combination of the page fault handler and the virtual memory maps should be enough to get a very simple page fault to occur in a userland program. At this point, you can set up a userland program to run from inside the init process by running kernel_execve(), passing the path to any program on your (virtual) disk as an argument. Similar to the exec() system call, this will replace the memory map of the current process to set up another program to run, but it will be better than exec() in this case because the setup is done exclusively in kernel space, so it can be used before you have a fully functional userspace. When the program begins, it should cause a page fault to be generated. This is your first step towards having a functional userland.
A simple implementation of the page fault handler will be enough to start with, but eventually, this will be a relatively logic-heavy function. Make sure to check the stencil comments to fully familiarise yourself with all functionality. Note that in your implementation, given the right conditions, the page fault handler may kill the current process with an exit status of EFAULT, but if Weenix supported UNIX signals it would send a SIGSEGV signal instead.
If the page fault handler finds a virtual memory area, Weenix must search for the missing page and map it into the page table of the current process so that the access can be retried using the virtual address of the page that’s being added and the physical address where it resides. Fetching the missing page will require a lot of help from the page frame caching system, namely for looking up the page and dirtying it if the access is a write. This, in turn, will rely on two new types of memory objects you will need to implement (anon objects and shadow objects are covered in their separate sections below).
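The permission decision at the start of the handler can be sketched as a small pure function. Everything here is an assumption made for illustration: the SK_ flag names and values are invented, and the real cause bits and protection flags are defined by the stencil.

```c
#include <assert.h>

/* Hypothetical protection flags for a memory area. */
#define SK_PROT_READ  0x1
#define SK_PROT_WRITE 0x2
#define SK_PROT_EXEC  0x4

/* Hypothetical fault-cause flags. */
#define SK_FAULT_WRITE 0x1 /* the faulting access was a write */
#define SK_FAULT_EXEC  0x2 /* the faulting access was an instruction fetch */

/* Returns 1 if an area with protections `prot` permits the access
 * described by `cause`, and 0 if the process should be killed. */
int fault_allowed(int prot, int cause)
{
    assert((cause & ~(SK_FAULT_WRITE | SK_FAULT_EXEC)) == 0);
    if (cause & SK_FAULT_WRITE)
        return (prot & SK_PROT_WRITE) != 0;
    if (cause & SK_FAULT_EXEC)
        return (prot & SK_PROT_EXEC) != 0;
    return (prot & SK_PROT_READ) != 0; /* plain read */
}
```

Only when a check like this passes does it make sense to fetch the page and fill in the page table entry; otherwise the process exits with EFAULT as described above.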
The Memory Management Unit
In order to map the virtual address to its corresponding page of memory, you will need to use the page table functions. A good portion of memory management is done for you, but you will have to fill in page table entries when page faults occur, flush the translation lookaside buffer (TLB) when necessary, and manage copy-on-write pages yourself.
Anonymous Objects
So far, you have used the memory objects of your block device and files to fill page frames as you needed data from disk, but it does not make sense to back some virtual memory areas, such as a process’ stack, with data on disk. What you often want is objects which initialize pages by filling them with zeroes. These are known as anonymous objects since they are not backed by any persistent data (which would have a filename associated with it). Anonymous objects are relatively simple to implement, so look for a better description of how they work in the code comments.
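As a rough illustration of how little work filling such a page involves, here is a sketch with a hypothetical function name and an assumed page size of 4096; the real operation takes whatever arguments the stencil's memory object interface defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_PAGE_SIZE 4096 /* assumed page size for this sketch */

/* Fill a freshly allocated page frame with zeroes.
 * Returns 0 on success, mirroring the usual kernel convention. */
int anon_fill_sketch(void *page_frame_addr)
{
    assert(page_frame_addr != NULL);
    memset(page_frame_addr, 0, SK_PAGE_SIZE);
    return 0;
}
```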
Notably, anonymous objects cannot be paged out in Weenix. The designers chose not to implement this feature because memory pressure will rarely be an issue in your operating system, and implementing a swap space is not terribly interesting or vital to implement as a result.
System Calls
System calls are the only way user processes can communicate with the Weenix kernel directly. A user process makes a system call by generating a software interrupt (using the x86 int instruction) with the arguments to the system call stored in the registers or on the stack. This causes an interrupt in the kernel, where the number in an agreed-upon register designates which system call is being used. After the arguments have been parsed out of their registers, the corresponding system call function is called to handle the request. Most of the system calls have already been written for you; however, in order to give you some understanding of the process involved, you will need to write a few yourself.
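The dispatch step described above is commonly written as a table indexed by system call number. The sketch below is a toy: the SK_ numbers, the handler signature, and the placeholder handlers are all invented for illustration and do not match the Weenix syscall code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical system call numbers. */
enum { SK_SYS_READ = 0, SK_SYS_WRITE = 1, SK_SYS_MAX = 2 };

typedef long (*sk_syscall_fn)(long arg);

/* Placeholder handlers; real ones would copy arguments from user memory. */
long sk_sys_read(long arg)  { return arg + 100; }
long sk_sys_write(long arg) { return arg + 200; }

static const sk_syscall_fn sk_table[SK_SYS_MAX] = {
    [SK_SYS_READ]  = sk_sys_read,
    [SK_SYS_WRITE] = sk_sys_write,
};

/* `sysnum` plays the role of the value in the agreed-upon register. */
long sk_dispatch(long sysnum, long arg)
{
    if (sysnum < 0 || sysnum >= SK_SYS_MAX || sk_table[sysnum] == NULL)
        return -1; /* unknown system call */
    return sk_table[sysnum](arg);
}
```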
Kernel System Call Interface
You will need to implement the kernel targets for read(), write(), and getdents(). While most of what you need to do should be pretty self-explanatory after reading through other system call implementations, you must also write two helper functions to check accesses to user memory from within the kernel.
Accessing User Memory
The code to handle traps and access user memory from the kernel has been written for you. However, many of these functions need to check to see if a region of user memory is a valid section of the process’ address space. To check this, you will need to implement addr_perm() and range_perm(), which will rely on your virtual memory map code. These checks are performed in copy_from_user() and copy_to_user(), which is why you should use these functions instead of accessing user memory directly from the kernel.
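The idea behind the two helpers can be sketched over a plain array of areas. This is not the stencil's interface: the sk_ names, the array representation (the real map is a linked list), the flag values, and the half-open page ranges are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE  4096u
#define SK_PROT_READ  0x1
#define SK_PROT_WRITE 0x2

struct sk_area {
    size_t start; /* first page number */
    size_t end;   /* one past the last page number (assumption) */
    int prot;     /* bitwise OR of SK_PROT_* */
};

/* Find the area containing page `vfn`, or NULL if it is unmapped. */
const struct sk_area *sk_lookup(const struct sk_area *areas, size_t n,
                                size_t vfn)
{
    for (size_t i = 0; i < n; i++) {
        if (areas[i].start <= vfn && vfn < areas[i].end)
            return &areas[i];
    }
    return NULL;
}

/* 1 if the page containing `addr` is mapped with all bits of `perm`. */
int sk_addr_perm(const struct sk_area *areas, size_t n,
                 uintptr_t addr, int perm)
{
    const struct sk_area *a = sk_lookup(areas, n, addr / SK_PAGE_SIZE);
    return a != NULL && (a->prot & perm) == perm;
}

/* 1 if every page touched by [addr, addr + len) passes the check. */
int sk_range_perm(const struct sk_area *areas, size_t n,
                  uintptr_t addr, size_t len, int perm)
{
    if (len == 0)
        return 1;
    if (len - 1 > UINTPTR_MAX - addr)
        return 0; /* range wraps around the address space */
    size_t first = addr / SK_PAGE_SIZE;
    size_t last = (addr + len - 1) / SK_PAGE_SIZE;
    for (size_t vfn = first; vfn <= last; vfn++) {
        const struct sk_area *a = sk_lookup(areas, n, vfn);
        if (a == NULL || (a->prot & perm) != perm)
            return 0;
    }
    return 1;
}
```

Note that the last page is computed from addr + len - 1, not addr + len; using the latter is a typical off-by-one that rejects ranges ending exactly on a page boundary.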
Running Userland Programs
Once you have implemented the page fault handler, anonymous objects, and write(), you should be able to run a variety of simple user-level programs. Of course, the first you should try to get running should also be one of the simplest, so we recommend hello, which should print “Hello, world!” to the screen.
For this, you can use kernel_execve in initproc_run and pass in /usr/bin/hello, something like this:
char *argv[2] = {"hello", NULL};
char *envp[1] = {NULL};
kernel_execve("/usr/bin/hello", argv, envp);
To run correctly, this will require a mostly functional page fault handler to fill in pages as the process attempts to access them; otherwise, the operating system will probably go into an infinite loop, trying to access the same address over and over using the page fault handler, but never adding the correct entry to the page table. Some other simple programs that you should be able to run are args and uname.
If you are having trouble getting hello to run and suspect that your anonymous object or write() implementations might be at fault, you should try the segfault program instead and ensure that it exits with a status of EFAULT. If it doesn’t (if, for instance, you run into the infinite loop problem described above), this means the bug is probably in your page fault handler.
After getting hello or segfault running, congratulations! You’ve just gotten your first userland program working! Celebration techniques are myriad, but we recommend dancing around a bit, and maybe taking a shower.
At this point, it will be useful to look at the appendix covering how to inspect the progress of a user-level process using a debugger. Although you may not need it yet, we assume that you will want it very soon.
VM-Related System Calls
After you get some initial test programs running, you can start to think about implementing a variety of VM-related system calls. For the functions in this section, we recommend that you check out the documentation in the man pages for more information.
mmap() and munmap() are the simplest and most obvious of the functions in this category. They allow user processes to map files into memory, create private or shared memory regions, and remove areas of their address space. The majority of these functions will end up being error-checking, since you wrote the main logic for them in the virtual memory map code. Note that the Weenix memtests expect you to use the VMMAP_DIR_HILO flag.
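To give a feel for the error-checking involved, here is a sketch of a few argument checks that the mmap() man page describes (zero length, unaligned offset, neither or both of the sharing flags, unaligned fixed address). The SK_ flag values and the function name are hypothetical, and the exact set of checks Weenix expects is given in the stencil comments, not here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE   4096u
/* Hypothetical flag values for this sketch. */
#define SK_MAP_SHARED  0x1
#define SK_MAP_PRIVATE 0x2
#define SK_MAP_FIXED   0x4

/* Returns 0 if the arguments look sane, or a negative errno value. */
int sk_mmap_check(uintptr_t addr, size_t len, int flags, size_t off)
{
    int sharing = flags & (SK_MAP_SHARED | SK_MAP_PRIVATE);

    if (len == 0)
        return -EINVAL;
    if (off % SK_PAGE_SIZE != 0)
        return -EINVAL;
    /* Exactly one of shared/private must be requested. */
    if (sharing != SK_MAP_SHARED && sharing != SK_MAP_PRIVATE)
        return -EINVAL;
    if ((flags & SK_MAP_FIXED) && addr % SK_PAGE_SIZE != 0)
        return -EINVAL;
    return 0;
}
```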
brk() is similar in conceptual difficulty. Calling brk() changes the length of the memory region acting as the heap, but the pointer passed as an argument to brk() is not required to lie on a page boundary, and the beginning of the heap sometimes starts halfway through the last page of another memory region. This means that the edge cases for brk() can be a bit annoying, but there’s nothing conceptually difficult to grasp here. There are some robust user-level tests for much of this functionality, so rather than spending a lot of time getting it right before testing, we recommend starting with something naive and gradually fixing it to pass the tests after you can run them in userland.
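The page-boundary arithmetic is the part that tends to go wrong, so here is a small sketch of it. It assumes a page size of 4096 and assumes that a partial first page at the start of the heap is already covered by the neighbouring region, so the heap's own area begins at the next page boundary; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u

/* Number of the first page that starts at or after `addr` (round up). */
size_t sk_page_round_up(uintptr_t addr)
{
    return addr / SK_PAGE_SIZE + (addr % SK_PAGE_SIZE != 0);
}

/* Pages the heap's own memory area must cover for a given break, under
 * the assumption that the page containing an unaligned `start_brk`
 * already belongs to another region. */
size_t sk_heap_pages(uintptr_t start_brk, uintptr_t brk)
{
    assert(brk >= start_brk);
    return sk_page_round_up(brk) - sk_page_round_up(start_brk);
}
```

For a heap starting at 0x8001234, moving the break to 0x8001300 needs no new pages, while moving it to 0x8002001 needs exactly one.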
fork()
Although it is also a VM-related system call, fork() is an entirely different animal from mmap() and friends. A good implementation of the previous sections is essential; fork() is complicated enough without having to debug the rest of your VM code at the same time. The man pages, while useful as always, will not be as helpful for fork() as for the other system calls, so the core documentation for fork() is given in the stencil comments.
Bugs in the virtual memory portion of fork() tend to cause bizarre behavior: user process memory may not be what it ought to be, so almost anything can happen. The user process may end up executing what should be data, jumping into the middle of a random subroutine, etc. These sorts of bugs are very difficult to track down. For this reason, you should code more defensively than you may be used to. Assert everything you can, panic() at the first sign of trouble, and include apparently unnecessary sanity checks.
Above all, be sure you really understand the algorithm before you start coding. If you try to implement it before you understand what you are trying to do, you will write buggy code. In all likelihood, you will then forget that you have written buggy code, and waste time debugging code that you should have thrown away. We know this because it has happened to us.
Follow the steps in the stencil comments very carefully. In addition, you will need to modify previously written functions with some added VM functionality:
- When you create the child process, copy the vmmap_t from the parent process into the child using vmmap_clone(). Remember to increase the reference counts on the underlying memory objects.
- For each private mapping in the parent process, point the virtual memory areas of the child and parent processes to two new shadow objects, which in turn should point to the original underlying memory object. This is how you know that pages corresponding to this mapping are copy-on-write. Be careful with reference counts. Also note that for shared mappings, there is no need to make a shadow object.
- Unmap the userland page table entries and flush the TLB using pt_unmap_range() and tlb_flush_all(). This is necessary because the parent process might still have some entries marked as “writable”, but since we are implementing copy-on-write we would like access to these pages to cause a trap to our page fault handler so it can dirty the page, which will invoke the copy-on-write actions.
- Copy the file table of the parent into the child. Remember to use fref() here.
- Set the child’s working directory to point to the parent’s working directory. Once again, don’t forget reference counts.
- Make the child thread runnable, which will add it to the run queue.
- You should revisit your implementation of proc_cleanup() to make sure that your implementation is releasing all resources it should.
At this stage, your implementation of fork() is (unfortunately) still not fully correct. This is because memory objects will be shared by the child and parent processes, when what we want is for the child to execute completely independently of the parent. Still, this is ok as long as we don’t care what happens to whichever process (parent or child) wakes up last from the syscall.
To fix the issue with memory objects, we'll have to deal with a bit more memory management: shadow objects.
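The private-mapping step of fork() and its reference counting can be modelled with a toy structure. This is a sketch only: struct sk_obj, its plain integer refcount, and both functions are invented for illustration, and it assumes the child's area already holds its own reference to the original object after the map was cloned. The real shadow object interface is described in the stencil.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy memory object: a refcount and a link to the object it shadows. */
struct sk_obj {
    int refcount;
    struct sk_obj *shadowed; /* NULL for a non-shadow object */
};

/* Create a shadow on top of `under`; the shadow holds one reference. */
struct sk_obj *sk_shadow_create(struct sk_obj *under)
{
    struct sk_obj *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->refcount = 1;   /* owned by the area that will point at it */
    s->shadowed = under;
    under->refcount++; /* the shadow keeps the lower object alive */
    return s;
}

/* `parent_obj` and `child_obj` are the object pointers stored in the two
 * areas of one private mapping; both start out referring to the same
 * object. Returns 0 on success, -1 if allocation fails. */
int sk_fork_private(struct sk_obj **parent_obj, struct sk_obj **child_obj)
{
    struct sk_obj *orig = *parent_obj;
    assert(orig == *child_obj && orig->refcount >= 2);

    struct sk_obj *ps = sk_shadow_create(orig);
    struct sk_obj *cs = sk_shadow_create(orig);
    if (ps == NULL || cs == NULL) {
        if (ps != NULL) { orig->refcount--; free(ps); }
        if (cs != NULL) { orig->refcount--; free(cs); }
        return -1;
    }
    orig->refcount -= 2; /* the areas no longer reference orig directly */
    *parent_obj = ps;
    *child_obj = cs;
    return 0;
}
```

After the call, the original object is referenced only by the two shadows, and each process sees it through its own shadow.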
Shadow Objects
Anonymous objects are easy to implement; however, you will also need a much more sophisticated form of memory object called a shadow object to implement fork(). These will be used to implement copy-on-write for privately-mapped blocks that are accessed after forking. Because of how involved shadow objects must be, you should refer to lecture slides or the book for more general information about how and why they are used. The rest of this section will only cover how to implement them in Weenix.
To implement shadow objects, it will be extremely helpful to understand how the methods of memory objects are called during a page lookup or dirty operation. If you don’t remember this well from the last assignment, we recommend that you go back and either re-read the relevant sections of the last assignment or search through the code paths in question and draw a graph showing what functions in the page frame system call what functions in the related memory objects.
The main difference between shadow objects and other types of memory management objects is that shadow objects can be part of arbitrarily long chains of memory objects. Therefore, many calls to shadow objects will be rerouted to the object that is being shadowed, or occasionally to the root object in a tree of memory objects, which cannot be a shadow object. At a high level, this is similar to how file memory objects forward requests to the disk memory object, but in practice, it ends up seeming a lot more recursive when implementing shadow objects since there is no translation layer as there was between file block numbers and disk block numbers. However, shadow objects are still responsible for storing some data and, more importantly, causing copy-on-write to work after a fork() has taken place.
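The rerouting along a chain can be pictured with a toy model in which each object either holds its own copy of a page or defers to the object it shadows. The names, the fixed page count, and the use of string pointers to stand in for page contents are all assumptions of this sketch; it ignores page frames, locking, and reference counts entirely.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NPAGES 4

/* Toy object in a shadow chain. */
struct sk_mobj {
    struct sk_mobj *shadowed;     /* next object down; NULL at the bottom */
    const char *pages[SK_NPAGES]; /* NULL: no copy of this page held here */
};

/* Walk down from `top` and return the first object holding the page. */
struct sk_mobj *sk_find_holder(struct sk_mobj *top, size_t pagenum)
{
    assert(pagenum < SK_NPAGES);
    for (struct sk_mobj *o = top; o != NULL; o = o->shadowed) {
        if (o->pages[pagenum] != NULL)
            return o;
    }
    return NULL;
}

/* Reads may be served from anywhere in the chain. A write must end up
 * with a copy in `top`, leaving the lower objects untouched. */
const char *sk_get_page(struct sk_mobj *top, size_t pagenum, int for_write)
{
    struct sk_mobj *holder = sk_find_holder(top, pagenum);
    if (holder == NULL)
        return NULL;
    if (for_write && holder != top) {
        /* Stands in for copying the page contents into a new frame. */
        top->pages[pagenum] = holder->pages[pagenum];
        holder = top;
    }
    return holder->pages[pagenum];
}
```

The point of the model is that a write never modifies an object further down the chain, which is what keeps the other processes sharing that object isolated.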
One potential problem with shadow objects is that the chains must be cleaned up when the process that creates them exits to avoid temporary memory leaks. Ideally, this could happen at process exit. A process exiting might cause a shadow object’s refcount to drop to one, at which point the pages attached to the object could be reassigned to the single shadow object beneath it, and the object itself could be deallocated. However, this would require the shadow object to know what its remaining child is and, at the moment, shadow objects do not maintain a list of their children.
This apparent design flaw leaves another avenue for shadow chain cleanup. The method you will implement is to collapse shadow chains during fork(). This requires a relatively easy traversal of the forking process’ object chains, where you shift the pages from any objects with a single reference up to their child process’s memory object, and then deallocate the now-useless intermediate objects. Consider where this should be implemented inside your final fork logic.
Odds and Ends
Finally, there are a number of other functions which you might remember seeing in earlier assignments spread throughout the kernel which you need to find and either write or update. These functions are all fairly small, but if you miss one, some things will break. Two examples are chardev_file_mmap() and zero_mmap(). Once you get the last of these finished, you should be able to test your kernel with any binary file you find on the Weenix file system.
Testing
Testing your code at this point becomes rather difficult, since you must be able to create data and text in userland and execute it. This is an order of magnitude more difficult than creating kernel-mode threads as you have in past assignments. Thankfully, most of the gory details have been taken care of for you (take a look at kernel/api/elf32.c and user/ld-weenix/ if you are a masochist).
Userland Tests
Once you have functioning userland execution and a working fork() function, you are ready to complete your Weenix system by running the userland binaries we provide for you. All you need to do is call kernel_execve() in your init process. You should execute the binary /sbin/init, which should start 3 shells (one in each terminal window). These shells will allow you to execute any of the provided binaries (roughly in order of difficulty):
- /usr/bin/segfault - Even simpler than hello, this should just segfault on address 0x0. Good if you’re having a lot of trouble getting hello to run. When run through the userland shell, the shell should print "sh: child process accessed invalid memory". When run directly via kernel_execve(), no output is expected.
- /usr/bin/hello - A simple “Hello world!” test. Getting this to execute properly should be a big step in VM.
Note: You can test /usr/bin/segfault and /usr/bin/hello before getting /sbin/init to work.
For /usr/bin/segfault: set a breakpoint on do_exit. If segfault is working correctly, the address that caused the page fault should be 0x0, and the previous frame in the backtrace should be handle_pagefault.
For /usr/bin/hello: set a breakpoint on do_exit. You should be able to see "Hello, World!" being printed.
- /usr/bin/args - Prints command arguments.
- /usr/bin/forktest - Simple program which forks, and prints out everything of note. The waitpid called from the child process should return -1 and ERRNO=10 (ECHILD) since the child process won't have any child processes of its own.
- /bin/uname - Prints system information.
- /bin/stat - Prints information about a file.
- /usr/bin/kshell - Traps into the kernel and starts a kshell.
- /bin/ls - List the contents of a directory.
- /sbin/halt - Kills all processes and shuts the system down.
- /usr/bin/wc - Counts characters, words and lines.
- /bin/hd - Dumps input in hexadecimal.
- /bin/sh - The shell itself. Yay subshell fun!
- /usr/bin/spin - Executes “while(1);”. Without user-mode preemption, the process will never yield and spin forever. Note that you should still be able to type text as the line discipline code is driven by interrupts and thus is independent of the scheduler.
- /usr/bin/forkbomb - A forkbomb test which should theoretically run forever.
- /usr/bin/stress - A test to stress various parts of your system and then run a forkbomb.
- check - Contains checks for various test cases (this is a shell built-in command).
- /usr/bin/vfstest - Lots of VFS tests (error conditions, etc.).
- /usr/bin/memtest - Lots of memory management tests (mmap and brk).
- /usr/bin/eatinodes - Devours filesystem resources. Keeps on creating files until it fails with "could not open file %d: No space left on device.".
- /usr/bin/eatmem - Devours kernel memory.
- /bin/ed - ed is the standard text editor. This is not a text editor that is easy to use, nor does it display much useful input about the file you're editing. If this doesn't work as you would expect, see this.
The shell also has a bunch of builtins. Type help in a shell to see a list of them. In particular, repeat and parallel can be very useful for stress testing your kernel.
Note that in a default configuration, some tests in /usr/bin/s5fstest are expected to fail if DYNAMIC is disabled. This is because the statically linked binaries take up too much space on the disk. To be able to run this test, set DISK_BLOCKS=3072 in Config.mk.
Dynamic Linking
Once you feel everything is in good shape, enable the DYNAMIC variable in Config.mk and recompile the project from scratch. This will cause your userland libraries to be dynamically linked, which puts much more stress on your VM (especially mmap() code). This essentially adds another layer of indirection between the executable being run and the library calls it’s attempting to run, where ld-weenix can link the library calls that are used into the original binary at runtime. Unfortunately, this also makes it even more difficult to debug what the user process is doing; if you are interested in setting breakpoints in the user process with dynamic linking, check out the appendix for more information.
Turning dynamic linking on will make the above tests even more thorough. For instance, the test using forkbomb and eatmem simultaneously is notoriously hard to get right in the presence of dynamic linking.
Please note that if DYNAMIC is off, any tests validating that Weenix runs out of disk blocks will fail!
Conclusion
We hope you’ve enjoyed working on Weenix and that you learned a lot. Good luck finishing your project, and don’t forget to read the appendix if you run into problems or need more ideas about where to look!