LAB Assignment 1 - seporaitis/xv6-public GitHub Wiki
First two exercises are to get acquainted with assembler and what BIOS is doing at boot time.
At what point does the processor start executing 32-bit code? What exactly causes the switch from 16- to 32-bit mode?
After enabling the protected mode in CR0 (line 49):
movl %cr0, %eax
orl $CR0_PE_ON, %eax # CR0_PE_ON = 0x1
movl %eax, %cr0
a long jump is executed to identity CS address, that serializes CPU for 32-bit mode (line 55)
ljmp $PROT_MODE_CSEG, $protcseg # PROT_MODE_CSEG = 0x8
First argument of LJMP instruction is set to CS register and second -
EIP register, therefore CS is 0x00000008
and EIP points to
0x00007c32
.
What is the last instruction of the bootloader executed?
(line 69)
call bootmain
What is the first instruction of the kernel it just loaded?
The C function preamble (line 69):
push %ebp
mov %esp, %ebp
push %esi
push %ebx
Where is the first instruction of the kernel?
CS: 0x00000008
EIP: 0x00007d15
How does the boot loader decide how many sectors it must read in order to fetch the entire kernel from disk? Where does it find this information?
Boot loader first reads a single page (512 * 8 = 4096 = 4K Bytes) off
of disk and interprets it as Elf
structure, tries to verify the ELF
header, and, if valid - reads full kernel process image:
#define ELFHDR ((struct Elf *) 0x10000)
/* ... */
struct Proghdr *ph, *eph;
// read 1st page off disk
readseg((uint32_t)ELFHDR, 512 * 8, 0);
// validate ELF hader
if (ELFHDR->e_magic != ELF_MAGIC)
goto bad
// point 'ph' to the first program header address.
ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
// point 'eph' to the address of the last program header address.
eph = ph + ELFHDR->e_phnum;
// increment 'ph' until last program header address is found.
for (; ph < eph; ph++)
// 'p_pa' is the physical load address of this segment.
readseg(ph->p_pa, ph->p_memsz, ph->p_offset);
/* ... */
In terms of sectors, the boot loader rounds the number of bytes to sector boundaries and loads them.
Read "The C Programming Language" to familiarize with pointers and addresses. Summary can be found here.
Trace through the first few instructions of the boot loader again and identify the first instruction that would "break" or otherwise do the wrong thing if you were to get the boot loader's link address wrong. Then change the link address in
boot/Makefrag
to something wrong, runmake clean
, recompile the lab withmake
, and trace into the boot loader again to see what happens. Don't forget to change the link address back andmake clean
again afterward!
Changed the start of .text
segment from:
$(V)$(LD) $(LDFLAGS) -N -e start -Ttext 0x7C00 -o [email protected] $^
to 0x7c01
$(V)$(LD) $(LDFLAGS) -N -e start -Ttext 0x7C01 -o [email protected] $^
And observed that BIOS could not start boot loader.
Examine the 8 words of memory at 0x00100000 at the point the BIOS enters the boot loader, and then again at the point the boot loader enters the kernel. Why are they different? What is there at the second breakpoint? (You do not really need to use QEMU to answer this question - just think)
0x00100000
is the extended memory, found after 0x000fffff
. At the
beginning it is all zeroes, because boot-loader operates in 16-bit
real-mode, which does not have access to memory beyond 1MBytes.
After switching to 32-bit protected-mode boot loader loads the kernel into this extended memory:
0x000000: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
0x100010: 0x34000004 0x0000b812 0x220f0011 0xc0200fd8
This can be verified by cross-referencing with obj/kern/kernel.asm
:
entry:
movw $0x1234,0x472 # warm boot
f0100000: 02 b0 ad 1b 00 00 add 0x1bad(%eax),%dh
f0100006: 00 00 add %al,(%eax)
f0100008: fe 4f 52 decb 0x52(%edi)
f010000b: e4 .byte 0xe4
f010000c <entry>:
f010000c: 66 c7 05 72 04 00 00 movw $0x1234,0x472
f0100013: 34 12
# sufficient until we set up our real page table in mem_init
# in lab 2.
# Load the physical address of entry_pgdir into cr3. entry_pgdir
# is defined in entrypgdir.c.
movl $(RELOC(entry_pgdir)), %eax
f0100015: b8 00 00 11 00 mov $0x110000,%eax
movl %eax, %cr3
f010001a: 0f 22 d8 mov %eax,%cr3
As visible the 0x1badb002
from debugger output matches the first
instructions of the kernel disassembly 02 b0 ad 1b
.
Use QEMU and GDB to trace into the JOS kernel and stop at the
movl %eax, %cr0
. Examine memory at0x00100000
and at0xf0100000
. Now, single step over that instruction using thestepi
GDB command. Again, examine memory at0x00100000
and at0xf0100000
. Make sure you understand what just happened.
Before movl %eax, %cr0
:
(gdb) x/4x 0x00100000
0x100000: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
(gdb) x/4x 0xf0100000
0xf0100000 <_start+4026531828>: 0x00000000 0x00000000 0x00000000 0x00000000
Before the instruction 0x00100000
and 0xf0100000
both are physical
addresses and the kernel is loaded at 0x00100000
, whereas there's
nothing at 0xf0100000
.
After movl %eax, %cr0
:
(gdb) x/4x 0x00100000
0x100000: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
(gdb) x/4x 0xf0100000
0xf0100000 <_start+4026531828>: 0x1badb002 0x00000000 0xe4524ffe 0x7205c766
Paging table is set up to map virtual addresses [KERNBASE; KERNBASE + 4MB)
to physical address [0; 4MB)
. That results in 0xf0100000
being mapped to 0x00100000
.
What is the first instruction after the new mapping is established that would fail to work properly if the mapping weren't in place? Comment out the
movl %eax, %cr0
inkern/entry.S
, trace into it, and see if you were right.
The first instruction that would not work after new mapping is the
mov $relocated, %eax
, because $relocated
points to address that
has 0's in it - no kernel mapped there.
mov $relocated, %eax
f0100028: b8 2f 00 10 f0 mov $0xf010002f,%eax
jmp *%eax
f010002d: ff e0 jmp *%eax
f010002f <relocated>:
relocated:
Commented the movl %eax, %cr0
instruction and QEMU failed:
qemu: fatal: Trying to execute code outside RAM or ROM at 0xf010002c
0xf010002c
is the address of relocated
in updated code:
mov $relocated, %eax
f0100025: b8 2c 00 10 f0 mov $0xf010002c,%eax
jmp *%eax
f010002a: ff e0 jmp *%eax
f010002c <relocated>:
relocated:
Read through
kern/printf.c
,lib/printfmt.c
, andkern/console.c
, and make sure you understand their relationship. It will become clear in later labs whyprintfmt.c
is located in the separatelib
directory.
We have omitted a small fragment of code - the code necessary to print octal numbers using patterns of the form "%o". Find and fill in this code fragment
// (unsigned) octal
case 'o':
putch('0', putdat);
num = getuint(&ap, lflag);
base = 8;
goto number;
1. Explain the interface between
printf.c
andconsole.c
. Specifically, what function doesconsole.c
export? How is this function used byprintf.c
.
printf.c
provides high level console output functions, whereas
console.c
- close to hardware, low level, building blocks.
console.c
exports cputchar
.
Since cprintf
interface requires function to return number of
characters printed - cputchar
is wrapped in putch
, which
increments counter every time it calls cputchar
.
2. Explain the following from
console.c
.
if (crt_pos >= CRT_SIZE) {
int i;
memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
crt_buf[i] = 0x0700 | ' ';
crt_pos -= CRT_COLS;
}
In short: if putting next char would result in going out of screen, move all the output buffer one line upward, clear the new bottom line and reset the position to the beginning of the line.
2. Trace the execution of the following code step by step:
int x = 1, y = 3, z = 4;
csprintf("x %d, y %x, z %d\n", x, y, z);
Disassembly:
void test_function(void) {
/* The C calling convention mandates that a subprogram first save the
value of EBP on the stack and then set EBP to be equal to ESP. This allows
ESP to change as data is pushed or popped off the stack without modifying
EBP. At the end of the subprogram, the original value of EBP must be
restored (this is why it is saved at the start of the subprogram.) */
/* save tha value of %ebp */
f0100778: 55 push %ebp
/* %ebp now points to the top of the stack */
f0100779: 89 e5 mov %esp,%ebp
/* allocate space on the stack for local variables (8 bytes???) */
f010077b: 83 ec 08 sub $0x8,%esp
int x = 1, y = 3, z = 4;
cprintf("x %d, y %x, z %d\n", x, y, z);
/* push value of z=4 to stack */
f010077e: 6a 04 push $0x4
/* push value of y=3 to stack */
f0100780: 6a 03 push $0x3
/* push value of x=1 to stack */
f0100782: 6a 01 push $0x1
/* push address of the format string to stack */
f0100784: 68 6e 1b 10 f0 push $0xf0101b6e
/* call cprintf */
f0100789: e8 8a 01 00 00 call f0100918 <cprintf>
}
/* clear the stack by directly manipulating stack pointer
more here: https://en.wikibooks.org/wiki/X86_Disassembly/Functions_and_Stack_Frames
clear the stack. hex(0x10) = dec(16), integer size is 4 bytes, so it
clears 4 variables: x, y, z and pointer to format string.
*/
f010078e: 83 c4 10 add $0x10,%esp
f0100791: c9 leave
f0100792: c3 ret
/* After the subprogram is over, the parameters that were pushed on the
stack must be removed. The C calling convention specifies that the caller
code must do this. */
z, y, x are pushed to stack followed by address where the format
string "x %d, y %x, z %d\n"
is stored, finally calling into address
f0100918
, where cprintf
is located.
In the call to
cprintf
, to what doesfmt
point to? To what doesap
point?
void test_function(void) {
f0100778: 55 push %ebp
f0100779: 89 e5 mov %esp,%ebp
f010077b: 83 ec 08 sub $0x8,%esp
int x = 1, y = 3, z = 4;
cprintf("x %d, y %x, z %d\n", x, y, z);
f010077e: 6a 04 push $0x4
f0100780: 6a 03 push $0x3
f0100782: 6a 01 push $0x1
f0100784: 68 6e 1b 10 f0 push $0xf0101b6e
f0100789: e8 8a 01 00 00 call f0100918 <cprintf>
}
f010078e: 83 c4 10 add $0x10,%esp
f0100791: c9 leave
f0100792: c3 ret
fmt
points to 0xf0101b6e which is address where the format string
starts.
ap
points to 0xf010ff54 which is where the function stack starts and
hold the rest of the arguments.
List (in order of execution) each call to
cons_putc
,va_arg
, andvcprintf
.
- For
cons_putc
, list its argument as well.- For
va_arg
, list whatap
points to before and after the call.- For
vcprintf
list the values of its two arguments.
vcprintf(fmt=0xf0101b6e, ap=0xf010ff54)
>>> x/s 0xf0101b6e: "x %d, y %x, z %d\n"
>>> x/3dw 0xf010ff54: 1 3 4
cons_putc(c=120)
>>> p/c c: 120 'x'
cons_putc(c=32)
>>> p/c c: 32 ' '
getint(ap=0xf010ff54, 0)
>>> p *0xf010ff54: 1
va_arg(*ap=1, int)
>>> p ap: 0xf010ff58
>>> p *0xf010ff58: 3
cons_putc(c=49)
>>> p/c c: 49 '1'
cons_putc(c=44)
>>> p/c c: 44 ','
cons_putc(c=32)
>>> p/c c: 32 ' '
cons_putc(c=121)
>>> p/c c: 121 'y'
cons_putc(c=32)
>>> p/c c: 32 ' '
getuint(ap=0xf010ff58, 0)
>>> p *0xf010ff58: 3
va_arg(*ap=3, unsigned int)
>>> p ap: 0xf010ff5c
>>> p *0xf010ff5c: 4
cons_putc(c=51)
>>> p/c c: 41 '3'
cons_putc(c=44)
>>> p/c c: 44 ','
cons_putc(c=32)
>>> p/c c: 32 ' '
cons_putc(c=122)
>>> p/c c: 122 'z'
cons_putc(c=32)
>>> p/c c: 32 ' '
getint(ap=0xf010ff5c, 0)
>>> p *0xf010ff5c: 4
va_arg(*ap=4, int)
>>> p ap: 0xf010ff60
>>> p *0xf010ff60: 4027645792
cons_putc(c=52)
>>> p/c c: 52 '4'
cons_putc(c=10)
>>> p/c c: 10 '\n'
cons_putc
prints a processed byte. If fmt
had %d
, the actual number is retrieved in getint
and that number is passed to cons_putc
(caveat, through printnum
).
va_arg
retrieves the current value from variable argument list and points ap
to the next one.
vcprintf
takes in the pointer to format string and variable argument list.
4. Run the following code.
unsigned int i = 0x00646c72;
cprintf("H%x Wo%s", 57616, &i);
What is the output? Explain how this output is arrived at in the step-by-step manner of the previous exercise.
The output is He110 World
.
e110
is hex representation of decimal 57616
.
0x00646c72
is hex representation of \0dlr
The output depends on that fact that the x86 is little-endian. If the x86 were instead big-endian what would you set
i
to in order to yield the same output? Would you need to change57616
to a different value?
On a big-endian machine value of i
would be 0x726c6400
, but the
57616
would not change, because it's hex representation is the same.
5. In the following code, what is going to be printed after 'y='? (note: the answer is not a specific value.) Why does this happen?
cprintf("x=%d y=%d", 3);
Output is x=3, y=-1
. At first ap
points to 3, then for the next
value it points to (&ap) + sizeof(int)
, which can be undefined, but
in this case is value -1
.
6. Let's say that GCC changed its calling convention so that it pushed arguments on the stack in declaration order, so that the last argument is pushed last. How would you have to change
cprintf
or its interface so that it would still be possible to pass it a variable number of arguments?
Currently stack looks like this:
bottom of the stack -> n'th parameter
n-1'th parameter
...
3'rd parameter (-)
2'nd parameter (3)
top of the stack -> 1'st parameter (fmt)
stack pointer -> return address
If the GCC changed its calling convention it would look like this:
bottom of the stack -> n'th parameter
n-1'th parameter
...
2'nd parameter (fmt)
top of the stack -> 1'st parameter (3)
stack pointer -> return address
So to get a similar effect, the interface should be changed to always
expect format string as the last argument, then the argument list
would start at format string address + sizeof(int)
(or whatever the
size of address) and would walk upward toward the top of the stack.
NOTE: this is very flaky answer; not sure how accurate it is, so might need to revisit. Tips are welcome.
Determine where the kernel initializes its stack, and exactly where in memory its stack is located. How does the kernel reserve space for its stack? And at which "end" of this reserved area is the stack pointer initialized to point to?
Virtual memory map documentation in (JOS) inc/memlayout.h
:
/*
* Virtual memory map: Permissions
* kernel/user
*
* 4 Gig --------> +------------------------------+
* | | RW/--
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* : . :
* : . :
* : . :
* |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
* | | RW/--
* | Remapped Physical Memory | RW/--
* | | RW/--
* KERNBASE, ----> +------------------------------+ 0xf0000000 --+
* KSTACKTOP | CPU0's Kernel Stack | RW/-- KSTKSIZE |
* | - - - - - - - - - - - - - - -| |
* | Invalid Memory (*) | --/-- KSTKGAP |
* +------------------------------+ |
* | CPU1's Kernel Stack | RW/-- KSTKSIZE |
* | - - - - - - - - - - - - - - -| PTSIZE
* | Invalid Memory (*) | --/-- KSTKGAP |
* +------------------------------+ |
* : . : |
* : . : |
* MMIOLIM ------> +------------------------------+ 0xefc00000 --+
* | Memory-mapped I/O | RW/-- PTSIZE
* ULIM, MMIOBASE --> +------------------------------+ 0xef800000
* | Cur. Page Table (User R-) | R-/R- PTSIZE
* UVPT ----> +------------------------------+ 0xef400000
* | RO PAGES | R-/R- PTSIZE
* UPAGES ----> +------------------------------+ 0xef000000
* | RO ENVS | R-/R- PTSIZE
* UTOP,UENVS ------> +------------------------------+ 0xeec00000
* UXSTACKTOP -/ | User Exception Stack | RW/RW PGSIZE
* +------------------------------+ 0xeebff000
* | Empty Memory (*) | --/-- PGSIZE
* USTACKTOP ---> +------------------------------+ 0xeebfe000
* | Normal User Stack | RW/RW PGSIZE
* +------------------------------+ 0xeebfd000
* | |
* | |
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* . .
* . .
* . .
* |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
* | Program Data & Heap |
* UTEXT --------> +------------------------------+ 0x00800000
* PFTEMP -------> | Empty Memory (*) | PTSIZE
* | |
* UTEMP --------> +------------------------------+ 0x00400000 --+
* | Empty Memory (*) | |
* | - - - - - - - - - - - - - - -| |
* | User STAB Data (optional) | PTSIZE
* USTABDATA ----> +------------------------------+ 0x00200000 |
* | Empty Memory (*) | |
* 0 ------------> +------------------------------+ --+
*
* (*) Note: The kernel ensures that "Invalid Memory" is *never* mapped.
* "Empty Memory" is normally unmapped, but user programs may map pages
* there if desired. JOS user programs map pages temporarily at UTEMP.
*/
/// ...
// All physical memory mapped at this address
#define KERNBASE 0xF0000000
/// ...
// Kernel stack.
#define KSTACKTOP KERNBASE
#define KSTKSIZE (8*PGSIZE) // size of a kernel stack
#define KSTKGAP (8*PGSIZE) // size of a kernel stack guard
/// ...
From above, kernel stack starts at 0xF0000000
.
In (JOS) kern/entry.S
:
.data
###################################################################
# boot stack
###################################################################
.p2align PGSHIFT # force page alignment
.globl bootstack
bootstack:
.space KSTKSIZE
.globl bootstacktop
bootstacktop:
GNU Assembler Manual documentation for .p2align directive:
Pad the location counter (in the current subsection) to a particular storage boundary. The first expression (which must be absolute) is the number of low-order zero bits the location counter must have after advancement. For example
.p2align 3
advances the location counter until it a multiple of 8. If the location counter is already a multiple of 8, no change is needed.
GNU Assembler Manual documentation for .space directive:
This directive emits size bytes, each of value fill. Both size and fill are absolute expressions. If the comma and fill are omitted, fill is assumed to be zero. This is the same as
.skip
.
GNU Assembler Manual documentation for .globl directive:
.global
makes the symbol visible told
. If you define symbol in your partial program, its value is made available to other partial programs that are linked with it. Otherwise, symbol takes its attributes from a symbol of the same name from another file linked into the same program.
As seen in (JOS) inc/mmu.h
:
#define PGSIZE 4096 // bytes mapped by a page
#define PGSHIFT 12 // log2(PGSIZE)
So as intended .p2align
pads the space until the page boundary, then
.space
directive allocates KTSTKSIZE
bytes (8*4096
) for the
stack, then it's exported as bootstacktop
symbol.
Then in (JOS) kern/entry.S
, just before calling into C function
i386_init
, the kernel stack pointer is initialized:
relocated:
# Clear the frame pointer register (EBP)
# so that once we get into debugging C code,
# stack backtraces will be terminated properly.
movl $0x0,%ebp # nuke frame pointer
# Set the stack pointer
movl $(bootstacktop),%esp
# now to C code
call i386_init
From above, esp
is initialized to point to bootstacktop
and grows
backwards towards page boundary bootstack
.
To become familiar with the C calling conventions on the x86, find the address of the
test_backtrace
function inobj/kern/kernel.asm
, set a breakpoint there, and examine what happens each time it gets called after the kernel starts.
test_backtrace
address from obj/kern/kernel.asm
:
f0100040 <test_backtrace>:
#include <kern/console.h>
// Test the stack backtrace function (lab 1 only)
void
test_backtrace(int x)
{
f0100040: 55 push %ebp
f0100041: 89 e5 mov %esp,%ebp
f0100043: 53 push %ebx
f0100044: 83 ec 14 sub $0x14,%esp
f0100047: 8b 5d 08 mov 0x8(%ebp),%ebx
cprintf("entering test_backtrace %d\n", x);
f010004a: 89 5c 24 04 mov %ebx,0x4(%esp)
f010004e: c7 04 24 20 19 10 f0 movl $0xf0101920,(%esp)
f0100055: e8 d7 08 00 00 call f0100931 <cprintf>
// ...
Some debug info for "easier following" of stack, to be able to answer the question:
- inside
i386_init
, beforetest_backtrace(5)
:
esp = 0xf010ffe0
eip = 0xf01000e5 (0xf01000ea)
ebp = 0xf010fff8
ebx = 0x00010094
>>> x/x 0xf010ffe0
0xf010ffd0: 0x00010094
0xf010ffd4: 0x00000000
0xf010ffd8: 0xf010fff8
0xf010ffdc: 0xf01000de
0xf010ffe0: 0x00000005 < esp
0xf010ffe4: 0x00001aac
0xf010ffe8: 0x00000644
0xf010ffec: 0x00000000
0xf010fff0: 0x00000000
0xf010fff4: 0x00000000
0xf010fff8: 0x00000000 < ebp
0xf010fffc: 0xf010003e
- inside
test_backtrace(5)
:
esp = 0xf010ffc0
eip = 0xf010004a (0xf010005a)
ebp = 0xf010ffd8
ebx = 0x00000005
>>> x/x 0xf010ffc0
0xf010ffac: 0x00000000
0xf010ffb0: 0x00000000
0xf010ffb4: 0x00000000
0xf010ffb8: 0xf010ffd8
0xf010ffbc: 0xf0100949
0xf010ffc0: 0xf0101957 < esp
0xf010ffc4: 0xf010ffe4 < edi
0xf010ffc8: 0x00000000 < local var 4 (edi?)
0xf010ffcc: 0x00010094 < local var 3 (esi?)
0xf010ffd0: 0x00010094 < local var 2 (esi?)
0xf010ffd4: 0x00010094 < local var 1 (esi?)
0xf010ffd8: 0xf010fff8 < ebp
0xf010ffdc: 0xf01000ea < return address
0xf010ffe0: 0x00000005 < x=5
0xf010ffe4: 0x00001aac
0xf010ffe8: 0x00000644
- inside
test_backtrace(5)
, beforetest_backtrace(4)
:
esp = 0xf010ffc0
eip = 0xf0100064 (0xf0100069)
ebp = 0xf010ffd8
ebx = 0x00000005
eax = 0x00000004
edi = 0x00000000
esi = 0x00010094
>>> x/x 0xf010ffc0
0xf010ffc0: 0x00000004 < esp
0xf010ffc4: 0x00000005 < x=5
0xf010ffc8: 0x00000000
0xf010ffcc: 0x00010094
0xf010ffd0: 0x00010094
0xf010ffd4: 0x00010094
0xf010ffd8: 0xf010fff8 < ebp
0xf010ffdc: 0xf01000ea < return address
0xf010ffe0: 0x00000005 < x=5
- inside
test_backtrace(4)
:
esp = 0xf010ffa0
eip = 0xf010004a (0xf010005a)
ebp = 0xf010ffb8
ebx = 0x00000004
eax = 0x00000004
edi = 0x00000000
esi = 0x00010094
>>> x/x 0xf010ffa0
0xf010ff90: 0xf01008eb
0xf010ff94: 0xf010ffac
0xf010ff98: 0xf010ffb8
0xf010ff9c: 0xf0100949
0xf010ffa0: 0xf0101920 < esp
0xf010ffa4: 0xf010ffc4
0xf010ffa8: 0x00000000
0xf010ffac: 0x00000000
0xf010ffb0: 0x00000000
0xf010ffb4: 0x00000005
0xf010ffb8: 0xf010ffd8 < ebp
0xf010ffbc: 0xf0100069 < return address
0xf010ffc0: 0x00000004 < x=4
0xf010ffc4: 0x00000005
0xf010ffc8: 0x00000000
- inside
test_backtrace(4)
, beforetest_backtrace(3)
:
esp = 0xf010ffa0
eip = 0xf0100064 (0xf0100069)
ebp = 0xf010ffb8
ebx = 0x00000004
eax = 0x00000003
edi = 0x00000000
esi = 0x00010094
>>> x/x 0xf010ffa0
0xf010ffa0: 0x00000003 < esp
- inside
test_backtrace(3)
:
esp = 0xf010ff80
eip = 0xf010004a
ebp = 0xf010ff98
ebx = 0x00000003
eax = 0x00000003
edi = 0x00000000
esi = 0x00010094
>>> x/x 0xf010ff80
0xf010ff70: 0xf01008eb
0xf010ff74: 0xf010ff8c
0xf010ff78: 0xf010ff98
0xf010ff7c: 0xf0100949
0xf010ff80: 0xf0101920 < esp
0xf010ff84: 0xf010ffa4
0xf010ff88: 0xf010ffb8
0xf010ff8c: 0x00000000
0xf010ff90: 0xf01008eb
0xf010ff94: 0x00000004
0xf010ff98: 0xf010ffb8 < ebp
0xf010ff9c: 0xf0100069 < return address
0xf010ffa0: 0x00000003 < x=3
0xf010ffa4: 0x00000004
0xf010ffa8: 0x00000000
- inside
test_backtrace(3)
, beforetest_backtrace(2)
:
esp = 0xf010ff80
eip = 0xf0100064 (0xf0100069)
ebp = 0xf010ff98
ebx = 0x00000003
eax = 0x00000002
edi = 0x00000000
esi = 0x00010094
>>> x/x 0xf010ff80
0xf010ff74: 0xf010ff8c
0xf010ff78: 0xf010ff98
0xf010ff7c: 0xf010005a < (previous return address for cprintf call)
0xf010ff80: 0x00000002 < esp
0xf010ff84: 0x00000003
0xf010ff88: 0xf010ffb8
0xf010ff8c: 0x00000000
0xf010ff90: 0xf01008eb
0xf010ff94: 0x00000004
0xf010ff98: 0xf010ffb8 < ebp
0xf010ff9c: 0xf0100069 < return address
0xf010ffa0: 0x00000003 < x=3
0xf010ffa4: 0x00000004
0xf010ffa8: 0x00000000
- inside
test_backtrace(2)
:
esp = 0xf010ff60
eip = 0xf010004a (0xf010005a)
ebp = 0xf010ff78
ebx = 0x00000002
eax = 0x00000002
edi = 0x00000000
esi = 0x00010094
>>> x/x 0xf010ff60
0xf010ff54: 0xf010ff6c
0xf010ff58: 0xf010ff78
0xf010ff5c: 0xf0100949
0xf010ff60: 0xf0101920 < esp
0xf010ff64: 0xf010ff84
0xf010ff68: 0xf010ff98
0xf010ff6c: 0x00000000
0xf010ff70: 0xf01008eb
0xf010ff74: 0x00000003
0xf010ff78: 0xf010ff98 < ebp
0xf010ff7c: 0xf0100069 < return address
0xf010ff80: 0x00000002 < x=2
0xf010ff84: 0x00000003
0xf010ff88: 0xf010ffb8
- inside
test_backtrace(2)
, beforetest_backtrace(1)
:
esp = 0xf010ff60
eip = 0xf0100064 (0xf0100069)
ebp = 0xf010ff78
ebx = 0x00000002
eax = 0x00000001
edi = 0x00000000
esi = 0x00010094
>>> x/x 0xf010ff60
0xf010ff54: 0xf010ff6c
0xf010ff58: 0xf010ff78
0xf010ff5c: 0xf010005a
0xf010ff60: 0x00000001 < esp
0xf010ff64: 0x00000002
0xf010ff68: 0xf010ff98
0xf010ff6c: 0x00000000
0xf010ff70: 0xf01008eb
0xf010ff74: 0x00000003
0xf010ff78: 0xf010ff98 < ebp
0xf010ff7c: 0xf0100069 < return address
0xf010ff80: 0x00000002 < x=2
0xf010ff84: 0x00000003
0xf010ff88: 0xf010ffb8
- inside
test_backtrace(1)
:
esp = 0xf010ff40
eip = 0xf010004a (0xf010005a)
ebp = 0xf010ff58
ebx = 0x00000001
eax = 0x00000001
edi = 0x00000000
esi = 0x00010094
>>> x/x 0xf010ff40
0xf010ff34: 0xf010ff4c
0xf010ff38: 0xf010ff58
0xf010ff3c: 0xf0100949
0xf010ff40: 0xf0101920 < esp
0xf010ff44: 0xf010ff64
0xf010ff48: 0xf010ff78
0xf010ff4c: 0x00000000
0xf010ff50: 0xf01008eb
0xf010ff54: 0x00000002
0xf010ff58: 0xf010ff78 < ebp
0xf010ff5c: 0xf0100069 < return address
0xf010ff60: 0x00000001 < x=1
0xf010ff64: 0x00000002
0xf010ff68: 0xf010ff98
- inside
test_backtrace(1)
, beforetest_backtrace(0)
:
esp = 0xf010ff40
eip = 0xf0100064 (0xf0100069)
ebp = 0xf010ff58
ebx = 0x00000001
eax = 0x00000000
edi = 0x00000000
esi = 0x00010094
>>> x/x 0xf010ff40
0xf010ff34: 0xf010ff4c
0xf010ff38: 0xf010ff58
0xf010ff3c: 0xf010005a
0xf010ff40: 0x00000000 < esp
0xf010ff44: 0x00000001
0xf010ff48: 0xf010ff78
0xf010ff4c: 0x00000000
0xf010ff50: 0xf01008eb
0xf010ff54: 0x00000002
0xf010ff58: 0xf010ff78 < ebp
0xf010ff5c: 0xf0100069 < return address
0xf010ff60: 0x00000001 < x=1
0xf010ff64: 0x00000002
0xf010ff68: 0xf010ff98
- inside
test_backtrace(0)
:
esp = 0xf010ff20
eip = 0xf010004a (0xf010005a)
ebp = 0xf010ff38
ebx = 0x00000000
eax = 0x00000000
edi = 0x00000000
esi = 0x00010094
>>> x/x 0xf010ff20
0xf010ff14: 0xf010ff2c
0xf010ff18: 0xf010ff38
0xf010ff1c: 0xf0100949
0xf010ff20: 0xf0101920 < esp
0xf010ff24: 0xf010ff44
0xf010ff28: 0xf010ff58
0xf010ff2c: 0x00000000
0xf010ff30: 0xf01008eb
0xf010ff34: 0x00000001
0xf010ff38: 0xf010ff58 < ebp
0xf010ff3c: 0xf0100069 < return address
0xf010ff40: 0x00000000 < x=0
0xf010ff44: 0x00000001
0xf010ff48: 0xf010ff78
- inside
test_backtrace(0)
, beforemon_backtrace(0, 0, 0)
:
esp = 0xf010ff20
eip = 0xf0100082 (0xf0100087)
ebp = 0xf010ff38
ebx = 0x00000000
eax = 0x00000000
edi = 0x00000000
esi = 0x00010094
>>> x/x 0xf010ff20
0xf010ff20: 0x00000000 < esp
0xf010ff24: 0x00000000
0xf010ff28: 0x00000000
0xf010ff2c: 0x00000000
0xf010ff30: 0xf01008eb
0xf010ff34: 0x00000001
0xf010ff38: 0xf010ff58 < ebp
0xf010ff3c: 0xf0100069 < return address
0xf010ff40: 0x00000000 < x=0
0xf010ff44: 0x00000001
0xf010ff48: 0xf010ff78
- inside
mon_backtrace(0, 0, 0)
:
esp = 0xf010ff18
eip = 0xf01007a2
ebp = 0xf010ff18
ebx = 0x00000000
eax = 0x00000000
>>> x/x 0xf010ff18
0xf010ff0c: 0x00000000
0xf010ff10: 0xf01008eb
0xf010ff14: 0xf010ff2c
0xf010ff18: 0xf010ff38 < esp, ebp
0xf010ff1c: 0xf0100087 < return address
0xf010ff20: 0x00000000 < argc=0
0xf010ff24: 0x00000000 < **argv=0
0xf010ff28: 0x00000000 < *tf=0
0xf010ff2c: 0x00000000
0xf010ff30: 0xf01008eb
0xf010ff34: 0x00000001
0xf010ff38: 0xf010ff58
How many 32-bit words does each recursive nesting level of
test_backtrace
push on the stack, and what are those words?
Following above debug output from gdb
and this calling convention
diagram:
Each recursive call to test_backtrace
pushes onto stack:
- function parameter
x
value; - return address;
- stored
ebp
value; - 5 local variables.
In total - eight 32-bit words.
The above exercise should give you the information you need to implement a stack backtrace function, which you should call
mon_backtrace()
. A prototype for this function is already waiting for you inkern/monitor.c
. You can do it entirely in C, but you may find theread_ebp()
function ininc/x86.h
useful. You'll also have to hook this new function into the kernel monitor's command list so that it can be invoked interactively by the user.
The backtrace function should display a listing of function call frames in the following format:
Stack backtrace:
ebp f0109e58 eip f0100a62 args 00000001 f0109e80 f0109e98 f0100ed2 00000031
ebp f0109ed8 eip f01000d6 args 00000000 00000000 f0100058 f0109f28 00000061
...
Implement the backtrace function as specified above.
int
mon_backtrace(int argc, char **argv, struct Trapframe *tf)
{
uint32_t ebp = read_ebp();
uint32_t *p = (uint32_t *)ebp;
int i;
cprintf("Stack backtrace:\n");
while (*p) {
cprintf(" ebp %08x eip %08x args", *p, *(p + 1));
for (i = 0; i < 5; i++) {
cprintf(" %08x", *(p + 2 + i));
}
cprintf("\n");
p = (uint32_t *)*p;
}
return 0;
}
Sample output:
6828 decimal is 15254 octal!
entering test_backtrace 5
entering test_backtrace 4
entering test_backtrace 3
entering test_backtrace 2
entering test_backtrace 1
entering test_backtrace 0
Stack backtrace:
ebp f010ff38 eip f0100087 args 00000000 00000000 00000000 00000000 f010094c
ebp f010ff58 eip f0100069 args 00000000 00000001 f010ff78 00000000 f010094c
ebp f010ff78 eip f0100069 args 00000001 00000002 f010ff98 00000000 f010094c
ebp f010ff98 eip f0100069 args 00000002 00000003 f010ffb8 00000000 f010094c
ebp f010ffb8 eip f0100069 args 00000003 00000004 00000000 00000000 00000000
ebp f010ffd8 eip f0100069 args 00000004 00000005 00000000 00010094 00010094
ebp f010fff8 eip f01000ea args 00000005 00001aac 00000644 00000000 00000000
leaving test_backtrace 0
leaving test_backtrace 1
leaving test_backtrace 2
leaving test_backtrace 3
leaving test_backtrace 4
leaving test_backtrace 5
Welcome to the JOS kernel monitor!
Type 'help' for a list of commands.
K>
Modify your stack backtrace function to display, for each
eip
, the function name, source file name, and line number corresponding to thateip
.
In
debuginfo_eip
, where do__STAB_*
come from?
kern/kernel.ld
defines two sections for debugging information:
.stab
and .stabstr
. As per
further
linker script documentation:
In some cases, it is desirable for a linker script to define a symbol only if it is referenced and is not defined by any object included in the link.
gcc
is instructed to produce debugging information by -gstabs
option.
So in this case, linker script puts __STAB_BEGIN
and __STAB_END
symbols before and after .stabs
section, so that it is possible to
reference the whole .stabs
section in the code by its boundaries.
gdb
stabs
documentation on line numbers says:
An
N_SLINE
symbol represents the start of a source line. Thedesc
field contains the line number and thevalue
contains the code address for the start of that source line.
Therefore the missing piece of code in debuginfo_eip
function looks
like this:
// Search within [lline, rline] for the line number stab.
// If found, set info->eip_line to the right line number.
// If not found, return -1.
//
// Hint:
// There's a particular stabs type used for line numbers.
// Look at the STABS documentation and <inc/stab.h> to find
// which one.
stab_binsearch(stabs, &lline, &rline, N_SLINE, addr);
if (lline == 0)
return -1;
info->eip_line = stabs[rline].n_desc;
Updated monitor.c
file:
static struct Command commands[] = {
{ "help", "Display this list of commands", mon_help },
{ "kerninfo", "Display information about the kernel", mon_kerninfo },
{ "backtrace", "Display function backtrace", mon_backtrace },
};
// ...
int
mon_backtrace(int argc, char **argv, struct Trapframe *tf)
{
uint32_t ebp = read_ebp();
uint32_t *p = (uint32_t *)ebp;
int i, len, fn_offset;
struct Eipdebuginfo info;
char fn_name[128] = {0};
cprintf("Stack backtrace:\n");
while (p) {
cprintf(" ebp %08x eip %08x args", p, *(p + 1));
for (i = 0; i < 5; i++) {
cprintf(" %08x", *(p + 2 + i));
}
cprintf("\n");
debuginfo_eip(*(p + 1), &info);
cprintf(" %s:%d: ", info.eip_file, info.eip_line);
cprintf("%.*s", info.eip_fn_namelen, info.eip_fn_name);
cprintf("+%d\n", *(p + 1) - info.eip_fn_addr);
p = (uint32_t *)*p;
}
return 0;
}
Sample output of backtrace
kernel command:
K> backtrace
Stack backtrace:
ebp f010ff68 eip f0100991 args 00000001 f010ff80 00000000 f010ffc8 f0112540
kern/monitor.c:145: monitor+259
ebp f010ffd8 eip f01000f6 args 00000000 00001aac 00000644 00000000 00000000
kern/init.c:43: i386_init+89
ebp f010fff8 eip f010003e args 00111021 00000000 00000000 00000000 00000000
kern/entry.S:83: <unknown>+0
Output of make grade
:
$ make grade
make clean
make[1]: Entering directory `/home/vagrant/jos'
rm -rf obj .gdbinit jos.in qemu.log
make[1]: Leaving directory `/home/vagrant/jos'
./grade-lab1
make[1]: Entering directory `/home/vagrant/jos'
make[1]: Leaving directory `/home/vagrant/jos'
make[1]: Entering directory `/home/vagrant/jos'
+ as kern/entry.S
+ cc kern/entrypgdir.c
+ cc kern/init.c
+ cc kern/console.c
+ cc kern/monitor.c
+ cc kern/printf.c
+ cc kern/kdebug.c
+ cc lib/printfmt.c
+ cc lib/readline.c
+ cc lib/string.c
+ ld obj/kern/kernel
+ as boot/boot.S
+ cc -Os boot/main.c
+ ld boot/boot
boot block is 380 bytes (max 510)
+ mk obj/kern/kernel.img
make[1]: Leaving directory `/home/vagrant/jos'
running JOS: (1.9s)
printf: OK
backtrace count: OK
backtrace arguments: OK
backtrace symbols: OK
backtrace lines: OK
Score: 50/50
This completes the lab.