Debugging - brown-cs1690/handout GitHub Wiki
As you begin to develop your kernel, you will undoubtedly find problems in the code you have written. We have collected some techniques here which we found useful or which are largely undocumented elsewhere because they are fairly advanced or are specific to working on Weenix in particular.
These are the simplest debugging techniques, which can be used at any stage of the project. They require the least amount of learning, but they also are not nearly as powerful as the debugging techniques listed later on.
By far the easiest technique for debugging is printing things out during
the execution of your program. Of course, printing inside the kernel
requires significant overhead (as you will learn when you write the TTY
driver). Luckily, inside the support code we have provided printing
methods which use the serial port of the virtual machine to print to a
terminal outside of the computer. You can think of this as being like a
printer connected to the serial port if you wish. The way to use this is
through the dbg or dbgq macro.
#define dbg(dbgmodes, printfargs)
#define dbgq(dbgmodes, printfargs)
The difference between the two is that dbg additionally displays the
file and line number information describing where the debug statement is
located, but dbgq displays only the debugging message.
Debug messages are organized into many debug modes, each of which can be
separately hidden or color-coded in the debug output. You can find a
list of debug modes in kernel/include/util/debug.h. The string names
next to each mode are used to configure which debug modes are displayed
(the default can be set via INIT_DBG_MODES in kernel/util/debug.c and updated
while Weenix is being debugged via the dbg command), and there is also a
special string name which refers to all debug modes. For example:
dbg(DBG_THR, "Creating a kernel thread "
"with stack at 0x%p\n", stack);
In the snippet above, DBG_THR is the mode macro for the threading
subsystem. This debug message would print something like:
kthread.c:123 kthread_stack_create(): Creating a kernel thread
with stack at 0x804800
It is also possible to define a debugging message which is part of multiple modes by taking the bitwise OR of their mode macros:
dbg(DBG_THR | DBG_VM, "Creating a kernel thread with stack at "
"0x%p\n", stack);
It’s not hard to add your own debugging modes, either. This is helpful if you decide to implement extra features and would like an easy way to separate their debug output from the normal subsystems without turning all of the debug modes off.
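To make the mode mechanism concrete, here is a minimal, self-contained sketch of how a mode-filtered debug macro could work in ordinary userland C. It is not the Weenix implementation: the names MY_DBG, MY_DBG_THR, MY_DBG_VM, my_active_modes, and my_dbg_enabled are all hypothetical, and the real definitions live in kernel/include/util/debug.h.

```c
#include <stdio.h>

/* Hypothetical mode bits; each mode owns one bit so modes can be OR-ed. */
#define MY_DBG_THR (1u << 0)
#define MY_DBG_VM  (1u << 1)

/* Hypothetical set of modes that are currently displayed. */
static unsigned my_active_modes = MY_DBG_THR;

/* A message is shown if it belongs to at least one active mode. */
static int my_dbg_enabled(unsigned modes)
{
    return (modes & my_active_modes) != 0;
}

/* Prints file, line, and function before the message, like dbg does. */
#define MY_DBG(modes, ...)                                        \
    do {                                                          \
        if (my_dbg_enabled(modes)) {                              \
            fprintf(stderr, "%s:%d %s(): ", __FILE__, __LINE__,   \
                    __func__);                                    \
            fprintf(stderr, __VA_ARGS__);                         \
        }                                                         \
    } while (0)
```

With only MY_DBG_THR active, MY_DBG(MY_DBG_THR | MY_DBG_VM, "stack at %p\n", ptr) prints its message, while MY_DBG(MY_DBG_VM, ...) stays silent. Adding a new mode in this sketch just means claiming another bit.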
There are several info functions in the kernel which are provided
exclusively for visualizing information in the debug console. To call
one of these, use dbginfo(), like so:
dbginfo(DBG_VMMAP, vmmap_mapping_info, curproc->p_vmmap);
You can substitute in other functions besides vmmap_mapping_info; there
are a bunch of functions ending with “_info” in Weenix which can
be passed here.
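As an illustration of the pattern, the sketch below shows one way an info function and a dbginfo-style macro could fit together: the info function formats a description of some object into a caller-supplied buffer, and the macro prints that buffer. The callback signature, the buffer size, and every name here (my_range, my_range_info, MY_DBGINFO) are assumptions made for this example, not the actual Weenix definitions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical object we want to visualize. */
struct my_range {
    unsigned long start;
    unsigned long end;
};

/* Hypothetical info function: describes `data` in `buf`, returns the
 * number of characters the full description needs. */
static size_t my_range_info(const void *data, char *buf, size_t size)
{
    const struct my_range *r = data;
    int n = snprintf(buf, size, "range [0x%lx, 0x%lx)\n", r->start, r->end);
    return n < 0 ? 0 : (size_t)n;
}

/* Hypothetical dbginfo-style macro: calls the info function on a
 * temporary buffer and sends the result to stderr. */
#define MY_DBGINFO(func, data)                       \
    do {                                             \
        char my_buf[256];                            \
        func((data), my_buf, sizeof(my_buf));        \
        fputs(my_buf, stderr);                       \
    } while (0)
```

A call such as MY_DBGINFO(my_range_info, &r) then plays the role that dbginfo(DBG_VMMAP, vmmap_mapping_info, curproc->p_vmmap) plays above, minus the debug-mode filtering.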
Another simple but widely-applicable approach is to intentionally crash your program when some condition is not met, which is known as “asserting” that condition’s truth. This allows you to know what the failure point was, which is sometimes helpful for figuring out what caused the failure in the first place, but mostly it’s helpful just for letting you know that there was a failure instead of failing silently.
We provide assert functionality with the KASSERT() macro, which tests
the condition you pass it and, on failure, prints an error message that
tells you the condition that failed and where in the codebase the
KASSERT() was. A trick you may find helpful is that a string literal
always evaluates to true in C (because it is just a non-null pointer), so
you can do things like this:
KASSERT(a == 1 && "A should be equal to 1");
This may be more helpful than not using a string, since the message will be printed when the assertion fails. Of course, you can also just put a comment next to the assertion in the code.
Because assertions are helpful for checking things that should always be true, we recommend placing them at the beginning of any function whose arguments must be of a certain form to ensure that the caller is checking and filtering all error cases correctly (this is especially helpful for syscalls). Don’t confuse this with error-checking that you need to do, though – if you use assertions to check for error conditions you are supposed to handle, you will needlessly cause your kernel to panic!
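The following userland sketch shows the string trick and the "assert at the top of the function" habit together. MY_KASSERT is a hypothetical stand-in written for this example (it is not the Weenix KASSERT() definition), and checked_div is an invented function whose caller is assumed to be responsible for filtering out a zero denominator.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for KASSERT(): prints the failed condition and
 * its location, then aborts. */
#define MY_KASSERT(cond)                                            \
    do {                                                            \
        if (!(cond)) {                                              \
            fprintf(stderr, "assertion failed: %s (%s:%d)\n",       \
                    #cond, __FILE__, __LINE__);                     \
            abort();                                                \
        }                                                           \
    } while (0)

/* The string literal is a non-null pointer, so `&& "..."` never changes
 * the result of the test; it only shows up in the failure message. */
static int checked_div(int num, int den)
{
    MY_KASSERT(den != 0 && "caller must filter out a zero denominator");
    return num / den;
}
```

Calling checked_div(6, 3) returns normally, while checked_div(6, 0) would abort and print the whole condition, message included. A zero denominator that can legitimately come from user input should be handled with an error return instead of an assertion.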
Although the techniques above are very useful, it will quickly become
apparent that a more robust way to debug the kernel is frequently
necessary. Luckily, debuggers provide us with an excellent way to do
this. gdb is the definitive debugger for C code on Unix systems, and
the rest of this appendix will center on its usage. Most of the
following information is applicable to all stages of Weenix development,
except where noted.
Before you read more advanced or Weenix-specific debugging
tips, make sure you can use the following gdb features (most important
ones listed first, more specialized ones later).
We have also compiled the most frequently used commands here:
- r: After running gdb <executable>, use r to run the program. If the program takes arguments, add them like so: r <args>.
- b [function name / filename:line number]: Sets a breakpoint to stop execution at the specified line or function. You can then step through execution using s and n. To resume execution until the next breakpoint or until the program terminates, use c.
- step or s: Executes one line of code. If the line is a function call, it steps into the function, as opposed to n.
- next or n: Also executes one line of code, but steps over any function calls.
- layout src: Displays the source code as you step, so you can see the execution happening.

If, rather than stopping at a particular point in the program, you want to track down why your program is terminating prematurely (due to a segfault, assertion failure, etc.), a helpful command is bt.

- bt: Prints out the stack trace at the current execution point. You can switch into a particular stack frame using the frame command.
- frame <stack id> or f <stack id>: Switches into a particular stack frame. The <stack id> argument is the number printed on the left side of each line of the stack trace from bt, in increasing order.

Once you have stopped execution at a point in time and want to examine the state more closely, you can print variables and addresses using p <whatever you would like to print>.
In order to run Weenix under gdb, you must use the following command: ./weenix -n -d gdb
The gdb features you should be comfortable with are:

- Compiling with symbols.
- Breakpoints, stepping, continuing.
- Viewing and traversing the call stack.
- Inspecting variables.
- Inspecting the contents at a particular memory address.
- Using watchpoints. [1]
- Conditional breakpoints. [2]
- Inspecting the contents of registers.
Once you understand these basics, you should be able to debug pretty much any simple user-level program that you have the code for. However, there are a few more specific topics that are helpful in some cases.
Sometimes, you find yourself repeatedly executing several commands to get into a particular state where you can debug something. If you find yourself typing the same thing over and over, you can temporarily add these commands to the end of the init.gdb file in the project root directory, and they will be run automatically every time you launch Weenix under gdb!
Although this does not relate directly to Weenix, you will frequently be
debugging multithreaded programs. Debugging in the presence of multiple
threads is usually the same as debugging single-threaded programs, but
the place where it differs is in cases like deadlock. If you are
debugging a program in gdb and it deadlocks, you can hit Ctrl-C to
pause the program. Then, you can inspect the stack trace using bt as usual.
If you need to check the other threads (perhaps the first one you were given
wasn’t the one in a deadlock, or maybe you want to see what other
threads are contributing to the deadlock) you can use the following commands:

- info threads: Lists the currently running threads. On the left side, gdb assigns each thread an ID that you can use with the thread command.
- thread [thread ID]: Switches into the context of the thread with the ID specified by gdb. You can then print out the stack trace of that particular thread.
- thread apply all bt: Produces a backtrace for all the threads in the program.
Another note about multiple threads (particularly if you might be
canceling them) is that you must be careful what calls you make during
critical sections of code. If you want to ensure that a request to cancel
some thread doesn’t take effect until after a certain section of code is
complete, keep in mind that a number of standard library calls make
system calls which can act as cancellation points. Look in the man page
for pthread_cancel() for these library calls (on Linux, the full list is
in pthreads(7)). Note that printf() may be one of them, so you might want
to use gdb to debug instead of printing out messages to tell you when
certain events happen.
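One way to keep a cancellation request from landing in the middle of a critical section, even if that section calls something like printf(), is to disable cancellation around it with pthread_setcancelstate(). This technique is not described above; it is shown here as a sketch of the general idea, and the names worker, critical_done, and run_cancel_demo are invented for the example.

```c
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static int critical_done = 0;

static void *worker(void *arg)
{
    int old_state, ignored;
    (void)arg;

    /* While cancellation is disabled, a cancel request stays pending. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    printf("in critical section\n"); /* cannot act as a cancellation point here */
    critical_done = 1;
    pthread_setcancelstate(old_state, &ignored);

    /* Outside the critical section: act on any pending cancel request. */
    for (;;) {
        pthread_testcancel();
        sched_yield();
    }
    return NULL;
}

/* Returns 1 if the thread was canceled only after finishing its
 * critical section, 0 otherwise, and -1 if the thread could not start. */
static int run_cancel_demo(void)
{
    pthread_t t;
    void *result;

    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    pthread_cancel(t);
    pthread_join(t, &result);
    return result == PTHREAD_CANCELED && critical_done == 1;
}
```

Whether the cancel request arrives before, during, or after the critical section, the thread only exits at the pthread_testcancel() call, so critical_done is always set by the time pthread_join() returns.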
During VFS, you'll likely come across ref counting errors, where either objects have too many references (and can never be cleaned up) or too few references (and are cleaned up too early, and their memory refers to junk). It's helpful to be able to assert pre and post-conditions about ref counts in certain functions that have tricky ref counting logic.
To this end, we've provided two macros, REF_CONTRACT_INIT and REF_CONTRACT_FINI(delta).
The former records (say, to some local variable x) the total number of ref counts the
current thread has made, and the latter checks that the current number of ref counts
made by the thread is x + delta. You can use these at the start and end of functions
to assert that ref counts are changing in the way that you expect them to.
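The sketch below imitates the idea in plain C so the contract is easy to see. Everything in it is hypothetical: MY_REF_CONTRACT_INIT and MY_REF_CONTRACT_FINI are simplified stand-ins, my_ref_count_total is a single global tally where the real macros track the current thread, and my_obj_t, my_ref, my_put, and lookup_and_hold are invented for the example.

```c
#include <assert.h>

/* Hypothetical running tally: +1 for every ref taken, -1 for every put. */
static int my_ref_count_total = 0;

typedef struct my_obj {
    int refcount;
} my_obj_t;

static void my_ref(my_obj_t *o) { o->refcount++; my_ref_count_total++; }
static void my_put(my_obj_t *o) { o->refcount--; my_ref_count_total--; }

/* Record the tally on entry... */
#define MY_REF_CONTRACT_INIT int my_contract_start = my_ref_count_total
/* ...and assert on exit that it changed by exactly `delta`. */
#define MY_REF_CONTRACT_FINI(delta) \
    assert(my_ref_count_total == my_contract_start + (delta))

/* Contract: returns `child` with exactly one extra reference overall. */
static my_obj_t *lookup_and_hold(my_obj_t *dir, my_obj_t *child)
{
    MY_REF_CONTRACT_INIT;
    my_ref(dir);   /* temporary reference while "searching" dir */
    my_ref(child); /* the reference handed back to the caller */
    my_put(dir);   /* forgetting this line would trip the contract */
    MY_REF_CONTRACT_FINI(1);
    return child;
}
```

If the my_put(dir) call were dropped, the net change would be 2 instead of 1 and the assertion would fire at the end of the function, which points directly at the function that leaked the reference.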
If you need to know what exact instructions are being run or where
exactly in your code you are executing, you should probably disassemble
your program. The easiest way to do this inside gdb is to use the
disas command, although you can also use a command like x/100i $eip
to print the next 100 instructions starting at the address pointed to by
your instruction pointer. You can also use the objdump tool (separate
from gdb) to disassemble the entire contents of an executable.
Running the command ./weenix -d qemu will launch the QEMU monitor. From
here, you have direct access to more system level information than you
would in gdb. Notably, the xp command will let you examine physical
addresses, while x still lets you examine virtual
addresses. This can be useful in VM debugging to ensure the contents of
a physical address are what they should be. Running the help command
produces a full list of commands.
Finally, there are some times when the approaches above are just not enough. These tips are for Weenix in particular, although they can easily be adapted for use in debugging many other kernels as well. These techniques will only be relevant when working on the virtual memory system of Weenix, since they are mainly for debugging problems in userland processes.
The most obvious thing to do when debugging a page fault is to look at the address the fault is happening on. However, this doesn’t tell you the context under which the page fault was generated, such as the stack or the section of code that was running. To find these, the easiest way is to inspect the stored context of the process that generated the pagefault.
To do this, use the following in gdb:
break dump_registers - when you start GDB
continue - run until you hit a crash
next - sometimes you may need to step into dump_registers
restore_regs regs - restore the registers from the saved context on the stack
bt - to see the backtrace of what led to the page fault
This will allow you to see the address of the instruction
which was running (and perhaps more importantly, the name of its
containing function), the stack pointer, and any other registers which
might be needed to figure out what the processor was doing, as if you froze
program execution right before the page fault. Running backtrace in gdb will also
provide the full function call stack, as you would expect. This trick
is especially useful when you also load in the symbol file for the
user-level process, as the next section will show.
It is a little bit like black magic, but you can actually debug user
processes from inside the kernel debugger if you load in the symbol file
for the user process into the correct location in memory.
A gdb script is provided to do this for you.
Running new-userland hello in gdb will load the symbols for
/usr/bin/hello and set a breakpoint at the main function.
To do this manually, you can use objdump (in your ordinary shell) to get
some information about the user process.
$ objdump --headers --section=".text" user/simple/hello
user/simple/hello: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
6 .text 00000064 08048208 08048208 00000208 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
The relevant information that this gives us is the starting address of
the text section, 0x08048208 (the VMA column). So, we can open up
Weenix with gdb attached, and add in the relevant symbol file for our
userland process.
Breakpoint 3, bootstrap (arg1=0, arg2=0x0) at main/kmain.c:121
121 {
(gdb) add-symbol-file user/simple/hello 0x08048208
add symbol table from file "user/simple/hello" at
.text_addr = 0x8048208
(y or n) y
Reading symbols from user/simple/hello...done.
(gdb) b main
Breakpoint 4 at 0x8048208: file ./hello.c, line 12.
(gdb) c
Continuing.
...
Breakpoint 4, main (argc=1, argv=0x8047eec) at ./hello.c:12
12 {
(gdb) list
7
8 #include <unistd.h>
9 #include <fcntl.h>
10
11 int main(int argc, char **argv)
12 {
13 open("/dev/tty0", O_RDONLY, 0);
14 open("/dev/tty0", O_WRONLY, 0);
15
16 write(1, "Hello, world!\n", 14);
Now, you should be able to set breakpoints in the userland process.
It is worth noting, however, that this does not prevent you from also
putting breakpoints in kernel code. gdb is intentionally dumb about
how breakpoints work: whenever your instruction pointer reaches the
specified address, gdb will pause, no matter what symbol files you’ve
added. Since the text of your kernel and the user process are both
loaded, you can place breakpoints in either one.
With dynamic linking enabled, it becomes a step more difficult to debug
userland processes from the kernel debugger, but certainly not
impossible. The main issue is that there is a bunch more code in the
ld-weenix shared library that you won’t be able to debug unless you
tell gdb to load the symbol file. The best way we know of to do this
is to print out the memory map of the process you’re trying to debug and
make an educated guess about which region might correspond to
ld-weenix. (It should be a shared region with execute permissions.)
Then, load the debugging symbols starting at the address which
corresponds to the beginning of that region.
[1] Watchpoints do not work with all the simulators which Weenix will run on.
[2] Conditional breakpoints do not work with all simulators which Weenix will run on. To get similar functionality, add an if statement to your code with the condition you care about and set a breakpoint inside there.
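The if-statement workaround for conditional breakpoints might look like the sketch below. The function names sum_to and my_break_here, the counter my_break_hits, and the condition i == 42 are all made up for illustration; the point is only that an ordinary breakpoint on my_break_here fires exactly when the condition holds.

```c
/* Counts how often the condition was met; also gives the helper a side
 * effect so the call is less likely to be optimized away. */
static volatile int my_break_hits = 0;

/* Set an ordinary (unconditional) breakpoint on this function. */
static void my_break_here(void)
{
    my_break_hits++;
}

static int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        if (i == 42)         /* the condition you care about */
            my_break_here(); /* gdb stops here only when it is true */
        total += i;
    }
    return total;
}
```

In gdb, b my_break_here followed by c stops on the 42nd iteration, and frame 1 (or up) then switches to sum_to so you can inspect total and i. Remember to remove the temporary if statement once you are done debugging.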