# [Completed] MIT 6.828 lab 1: C, Assembly, Tools and Bootstrapping

## Part 1: PC Bootstrap

### The ROM BIOS

• What is the last instruction of the boot loader executed, and what is the first instruction of the kernel it just loaded?

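The last boot loader instruction executed is 0x7d61: call *0x10018, which corresponds to the C code ((void (*)(void)) (ELFHDR->e_entry))(); The first kernel instruction executed after loading is movw $0x1234,0x472.

• Where is the first instruction of the kernel?

The first kernel instruction is at address 0x10000c.

• How does the boot loader decide how many sectors it must read in order to fetch the entire kernel from disk? Where does it find this information?

The boot loader first reads a small part of the kernel: 8 sectors, i.e. one page. The corresponding code is readseg((uint32_t) ELFHDR, SECTSIZE*8, 0); The part that was read contains the ELF header and the program headers, which record how large each part of the kernel is. The layout of these structures is defined in inc/elf.h.

### Loading the Kernel

Exercise 4 says to get comfortable with C pointers, so I had a look at the recommended "The C Programming Language". It turns out to be a really good introductory book. I had assumed it was an intimidating tome like "Introduction to Algorithms", to be admired from afar. A pity I am no longer a beginner.

Exercise 4 gives a piece of code that uses C pointers. For the 5th output, watch out for endianness.

Before going on, it is worth looking closely at the contents of an ELF file.

### ELF files

An ELF file is divided into many sections. Typically the .data section holds initialized global/static variables, .text holds the code, .rodata holds string constants, and .bss holds uninitialized global/static variables. The .bss section has no stored contents, because uninitialized variables are required to default to 0, so there is no need to store them again. "Thus there is no need to store contents for .bss in the ELF binary; instead, the linker records just the address and size of the .bss section. The loader or the program itself must arrange to zero the .bss section."

The sections we care about most are .data, .text and .rodata.

objdump -h prints the section headers of an ELF file. Size is the size of the section. VMA (Virtual Memory Address, called the link address in 6.828) is the memory address from which the section expects to execute. LMA (Load Memory Address) is the address at which the section is loaded into memory. Usually these two addresses are the same.

The boot loader uses the program headers in the ELF file to decide how to load the sections. The program headers specify which parts of the ELF file must be loaded into memory and where in memory each part goes. objdump -x obj/kern/kernel prints all of the ELF headers.

Exercise 5. Trace through the first few instructions of the boot loader again and identify the first instruction that would "break" or otherwise do the wrong thing if you were to get the boot loader's link address wrong. Then change the link address in boot/Makefrag to something wrong, run make clean, recompile the lab with make, and trace into the boot loader again to see what happens. Don't forget to change the link address back and make clean again afterward!

I changed the boot loader's link address from 0x7c00 to 0x9c00 and single-stepped in gdb. The operand of lgdtw shows up as a negative number: [ 0:7c1e] => 0x7c1e: lgdtw -0x639c. Continuing to [ 0:7c2d] => 0x7c2d: ljmp $0x8,$0x9c32, the machine crashed.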
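Both symptoms come down to address arithmetic. The sketch below uses hypothetical helper names (nothing here is from the JOS sources) and assumes the BIOS always places the boot sector at 0x7c00, whatever link address was used. It reinterprets the displacement that GDB printed as negative, and maps a linked address back to where that byte really sits.

```c
#include <assert.h>
#include <stdint.h>

#define LOAD_BASE 0x7c00u  /* assumed: where the BIOS places the boot sector */

/* GDB printed the lgdtw operand as a signed 16-bit displacement.
 * This recovers the unsigned 16-bit address the instruction encodes. */
uint16_t as_unsigned16(int32_t disp)
{
    return (uint16_t)disp;
}

/* Map an absolute address computed by the linker (relative to link_base)
 * to the address where that byte actually sits after loading. */
uint32_t actual_addr(uint32_t linked_addr, uint32_t link_base)
{
    assert(linked_addr >= link_base);
    return LOAD_BASE + (linked_addr - link_base);
}
```

as_unsigned16(-0x639c) is 0x9c64, so lgdtw fetches its descriptor from 0x9c64, while actual_addr(0x9c64, 0x9c00) says the real one sits at 0x7c64. In the same way, the ljmp target 0x9c32 refers to code that is really at 0x7c32, which is consistent with the crash at that instruction.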

When execution is at 0x7c00, the 8 words at 0x00100000 are all 0…
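To make the sector counting from this part concrete, here is a minimal sketch modeled on a readseg-style loop. It assumes the start address is rounded down to a sector boundary and that whole 512-byte sectors are read until the end address is covered. The struct is a simplified stand-in for a program header entry, not the definition from inc/elf.h, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECTSIZE 512u

/* Number of disk sectors touched when loading `count` bytes to physical
 * address `pa`: round the start down to a sector boundary, then read
 * whole sectors until the end address is covered. */
uint32_t sectors_needed(uint32_t pa, uint32_t count)
{
    uint32_t end = pa + count;
    uint32_t start = pa & ~(SECTSIZE - 1);
    return (end - start + SECTSIZE - 1) / SECTSIZE;
}

/* Simplified stand-in for one ELF program header entry. */
struct seg {
    uint32_t p_pa;     /* physical load address */
    uint32_t p_memsz;  /* size of the segment in memory */
};

/* Total sectors read for all segments listed in the program headers. */
uint32_t total_sectors(const struct seg *segs, uint32_t n)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < n; i++)
        total += sectors_needed(segs[i].p_pa, segs[i].p_memsz);
    return total;
}
```

For the initial header read, sectors_needed(0x10000, SECTSIZE * 8) gives 8, matching the 8 sectors mentioned above (the call *0x10018 seen earlier suggests the header sits at 0x10000). The remaining sectors are then determined by the segment addresses and sizes found in that header.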

## Part 3: The Kernel

### Using virtual memory to work around position dependence

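An OS kernel usually prefers to run at a high virtual address, such as 0xf0100000, so that the low addresses are left for user programs. Some machines do not have that much memory, though, so the physical address 0xf0100000 does not exist. A mapping from virtual memory to physical memory is therefore needed. In this part of the lab we do not need to know how the address mapping works, only what its effect is.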

Exercise 7.  Use QEMU and GDB to trace into the JOS kernel and stop at the  movl %eax, %cr0. Examine memory at 0x00100000 and at 0xf0100000. Now, single step over that instruction using the stepi GDB command. Again, examine memory at 0x00100000 and at 0xf0100000. Make sure you understand what just happened.

What is the first instruction after the new mapping is established that would fail to work properly if the mapping weren’t in place? Comment out the  movl %eax, %cr0 in kern/entry.S, trace into it, and see if you were right.

Once CR0_PG is set, memory references are virtual addresses that get translated by the virtual memory hardware to physical addresses. entry_pgdir translates virtual addresses in the range 0xf0000000 through 0xf0400000 to physical addresses 0x00000000 through 0x00400000, as well as virtual addresses 0x00000000 through 0x00400000 to physical addresses 0x00000000 through 0x00400000.
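The effect of that mapping can be written down as a small function. This is only a model of the two ranges described above, with a hypothetical function name. It is not how the hardware or entry_pgdir is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KERNBASE 0xf0000000u
#define MAP_SIZE 0x00400000u  /* 4 MB covered by each mapped range */

/* Translate va according to the two ranges described in the text.
 * Returns 0 and stores the physical address in *pa on success,
 * or -1 if va lies outside both ranges (which would fault). */
int entry_translate(uint32_t va, uint32_t *pa)
{
    assert(pa != NULL);
    if (va < MAP_SIZE) {                               /* identity mapping */
        *pa = va;
        return 0;
    }
    if (va >= KERNBASE && va - KERNBASE < MAP_SIZE) {  /* high mapping */
        *pa = va - KERNBASE;
        return 0;
    }
    return -1;
}
```

Both 0x00100000 and 0xf0100000 translate to physical 0x00100000, which is why the two memory dumps in Exercise 7 show the same contents once paging is on.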

### Formatted Printing to the Console

Formatted output in the style of printf does not come for free. Start by reading the relevant source files: kern/printf.c, kern/console.c and lib/printfmt.c.

Exercise 8. We have omitted a small fragment of code – the code necessary to print octal numbers using patterns of the form “%o”. Find and fill in this code fragment.
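Assuming the "%o" case follows the same pattern as the other integer conversions (fetch the argument, pick a base, hand the value to a shared digit printer), the only new ingredient is base 8. The standalone sketch below shows base conversion on its own. format_unsigned is a hypothetical helper, not a function from lib/printfmt.c.

```c
#include <assert.h>
#include <stddef.h>

/* Write num in the given base (2..16) into out, most significant digit
 * first, and NUL-terminate it. Returns the number of digits written.
 * out must have room for the digits plus the terminator. */
size_t format_unsigned(unsigned long long num, unsigned base, char *out)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[64];  /* enough for a 64-bit value in base 2 */
    size_t n = 0;

    assert(base >= 2 && base <= 16);
    do {
        tmp[n++] = digits[num % base];
        num /= base;
    } while (num != 0);

    for (size_t i = 0; i < n; i++)  /* digits were produced in reverse */
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

With base 8, the value 8 comes out as "10", which is what a "%o" conversion should print for it.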

1. Explain the interface between printf.c and console.c. Specifically, what function does console.c export? How is this function used by printf.c?

The interface between printf.c and console.c is cputchar() in console.c, which prints a single character to the console. printf.c uses cputchar() in its putch() function.
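The layering can be sketched as a callback: the formatter only knows about a "put one character" function pointer, and that callback is the single place that calls into the console. Everything below is a mock-up under that assumption. mock_cputchar writes to a buffer instead of a real console, and emit_string is a hypothetical stand-in for the real formatter.

```c
#include <assert.h>
#include <stddef.h>

#define CONSOLE_CAP 128

/* Stand-in for the console: mock_cputchar appends to this buffer. */
char console_buf[CONSOLE_CAP];
size_t console_len;

void mock_cputchar(int c)
{
    assert(console_len < CONSOLE_CAP - 1);
    console_buf[console_len++] = (char)c;
    console_buf[console_len] = '\0';
}

/* Callback in the style of putch(): print one character, bump a counter. */
void mock_putch(int ch, void *cnt)
{
    mock_cputchar(ch);
    (*(int *)cnt)++;
}

/* Hypothetical trimmed-down formatter: it knows nothing about the
 * console and only talks to the callback it was handed. */
void emit_string(void (*put)(int, void *), void *putdat, const char *s)
{
    while (*s != '\0')
        put(*s++, putdat);
}
```

Calling emit_string(mock_putch, &cnt, "hi") with cnt starting at 0 leaves "hi" in console_buf and 2 in cnt. Keeping the console behind a callback is what lets the same formatting code be reused with a different output sink.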

2. Explain the following from console.c:

3. For the following questions you might wish to consult the notes for Lecture 2. These notes cover GCC’s calling convention on the x86.

Trace the execution of the following code step-by-step:

• In the call to  cprintf(), to what does  fmt point? To what does  ap point?
• List (in order of execution) each call to cons_putc, va_arg, and vcprintf. For cons_putc, list its argument as well. For va_arg, list what ap points to before and after the call. For vcprintf list the values of its two arguments.

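• In the call to cprintf, fmt points to "x %d, y %x, z %d\n", and ap points to the first variadic argument, i.e. the address of the variable x on the call stack.
• cons_putc is called in the following order:
  • cons_putc('x')
  • cons_putc(' ')
  • cons_putc('1')
  • cons_putc(',')
  • cons_putc(' ')
  • cons_putc('y')
  • cons_putc(' ')
  • cons_putc('3')
  • cons_putc(',')
  • cons_putc(' ')
  • cons_putc('z')
  • cons_putc(' ')
  • cons_putc('4')
  • cons_putc('\n')
• va_arg is called three times:
  • Before the first call, ap points to the address of argument x on the stack. After it, ap points to the address of argument y.
  • Before the second call, ap points to the address of argument y on the stack. After it, ap points to the address of argument z.
  • Before the third call, ap points to the address of argument z on the stack. After it, ap points to the address 4 bytes past argument z.
• The arguments to vcprintf are "x %d, y %x, z %d\n" and the address of argument x on the call stack.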
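The va_arg sequence traced above can be reproduced with a small variadic function. This is a sketch with a hypothetical name, and the values 1, 3 and 4 are inferred from the cons_putc calls listed in the answer. Note that the "ap advances by 4 bytes" picture holds for the 32-bit x86 build used in the lab. On other ABIs va_list is not necessarily a plain pointer into the stack.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Fetch n int arguments one at a time, the way the %d and %x cases
 * pull their values, and store them in out in the order fetched. */
void collect_ints(int *out, int n, ...)
{
    va_list ap;

    assert(out != NULL && n >= 0);
    va_start(ap, n);               /* ap now refers to the first variadic argument */
    for (int i = 0; i < n; i++)
        out[i] = va_arg(ap, int);  /* read one int and advance ap */
    va_end(ap);
}
```

Calling collect_ints(got, 3, 1, 3, 4) fills got with 1, 3, 4 in that order, one va_arg call per conversion.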

4. Run the following code.

What is the output? Explain how this output is arrived at in the step-by-step manner of the previous exercise. Here’s an ASCII table that maps bytes to characters.

The output depends on the fact that the x86 is little-endian. If the x86 were instead big-endian what would you set i to in order to yield the same output? Would you need to change 57616 to a different value?

57616 does not need to change: it is passed and formatted as an integer value, and converting an integer value is not affected by byte order.
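The handout's snippet is not reproduced above, so the following is a sketch under assumptions: that the code passes the address of an unsigned int to a "%s" conversion, and that the value is i = 0x00646c72. The helpers (hypothetical names) lay a 32-bit value out in memory in either byte order, which shows what a string reader would see on each kind of machine.

```c
#include <assert.h>
#include <stdint.h>

/* Store v into out[0..3] least significant byte first (x86 order). */
void store_le32(uint32_t v, unsigned char out[4])
{
    for (int k = 0; k < 4; k++)
        out[k] = (unsigned char)(v >> (8 * k));
}

/* Store v into out[0..3] most significant byte first (big-endian order). */
void store_be32(uint32_t v, unsigned char out[4])
{
    for (int k = 0; k < 4; k++)
        out[k] = (unsigned char)(v >> (8 * (3 - k)));
}
```

Under the assumed value, store_le32(0x00646c72, b) yields the bytes 'r', 'l', 'd', '\0'. A big-endian machine would need 0x726c6400 to produce the same byte sequence. The number 57616 is 0xe110 either way, because formatting it works on the value, not on its bytes in memory.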

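5. In the following code, what is going to be printed after 'y='? (note: the answer is not a specific value.) Why does this happen?

x prints as 3, and the value printed for y is a meaningless integer. The reason is that va_arg is still used to fetch the next argument when the va_list has no next argument. According to the specification of va_arg, the behaviour in that case is undefined.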

6. Let's say that GCC changed its calling convention so that it pushed arguments on the stack in declaration order, so that the last argument is pushed last. How would you have to change cprintf or its interface so that it would still be possible to pass it a variable number of arguments?

### The Stack

Exercise 9. Determine where the kernel initializes its stack, and exactly where in memory its stack is located. How does the kernel reserve space for its stack? And at which “end” of this reserved area is the stack pointer initialized to point to?

Exercise 10. To become familiar with the C calling conventions on the x86, find the address of the  test_backtrace function in obj/kern/kernel.asm, set a breakpoint there, and examine what happens each time it gets called after the kernel starts. How many 32-bit words does each recursive nesting level of  test_backtrace push on the stack, and what are those words?

Note that, for this exercise to work properly, you should be using the patched version of QEMU available on the tools page or on Athena. Otherwise, you’ll have to manually translate all breakpoint and memory addresses to linear addresses.

The entry address of test_backtrace is 0xf0100040. I set a breakpoint there, and the final output is as follows:

Exercise 11. Implement the backtrace function as specified above. Use the same format as in the example, since otherwise the grading script will be confused. When you think you have it working right, run make grade to see if its output conforms to what our grading script expects, and fix it if it doesn’t. After you have handed in your Lab 1 code, you are welcome to change the output format of the backtrace function any way you like.

If you use  read_ebp(), note that GCC may generate “optimized” code that calls  read_ebp() before mon_backtrace()’s function prologue, which results in an incomplete stack trace (the stack frame of the most recent function call is missing). While we have tried to disable optimizations that cause this reordering, you may want to examine the assembly of  mon_backtrace() and make sure the call to read_ebp() is happening after the function prologue.
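The core of a backtrace is a walk along the chain of saved frame pointers. The sketch below simulates that walk over an array instead of real memory, so it can run anywhere. It assumes the usual x86 frame layout (saved ebp at 0(%ebp), return eip at 4(%ebp)) and that the outermost frame has a saved ebp of 0. Array indices stand in for addresses, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk a chain of saved frame pointers in a simulated stack.
 * stack[ebp] holds the caller's ebp and stack[ebp + 1] holds the
 * return eip. ebp == 0 marks the end of the chain.
 * Stores up to max return addresses in eips, innermost first,
 * and returns the number of frames visited. */
size_t walk_frames(const uint32_t *stack, size_t len, uint32_t ebp,
                   uint32_t *eips, size_t max)
{
    size_t n = 0;

    while (ebp != 0 && n < max) {
        assert((size_t)ebp + 1 < len);
        eips[n++] = stack[ebp + 1];  /* return address of this frame */
        ebp = stack[ebp];            /* move to the caller's frame */
    }
    return n;
}
```

A real mon_backtrace would start from read_ebp() and dereference actual pointers, also printing the argument words that follow the return address. The loop structure stays the same.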

Exercise 12. Modify your stack backtrace function to display, for each eip, the function name, source file name, and line number corresponding to that eip.

In  debuginfo_eip, where do __STAB_* come from? This question has a long answer; to help you to discover the answer, here are some things you might want to do:

• look in the file kern/kernel.ld for __STAB_*
• run objdump -h obj/kern/kernel
• run objdump -G obj/kern/kernel
• run gcc -pipe -nostdinc -O2 -fno-builtin -I. -MD -Wall -Wno-format -DJOS_KERNEL -gstabs -c -S kern/init.c, and look at init.s.

Complete the implementation of  debuginfo_eip by inserting the call to  stab_binsearch to find the line number for an address.

Add a backtrace command to the kernel monitor, and extend your implementation of  mon_backtrace to call  debuginfo_eip and print a line for each stack frame of the form:

Each line gives the file name and line within that file of the stack frame’s eip, followed by the name of the function and the offset of the eip from the first instruction of the function (e.g., monitor+106 means the return eip is 106 bytes past the beginning of monitor).

Be sure to print the file and function names on a separate line, to avoid confusing the grading script.

Tip: printf format strings provide an easy, albeit obscure, way to print non-null-terminated strings like those in STABS tables. printf("%.*s", length, string) prints at most  length characters of  string. Take a look at the printf man page to find out why this works.
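A small example of the "%.*s" trick, using the standard snprintf so it can run outside the kernel. The string "monitor:F(0,1)" is only an illustrative name-then-colon entry, and stab_name is a hypothetical helper. The idea is that the name's length is computed separately and passed through the '*' precision.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the function name out of a "name:type" style string.
 * The name is not NUL-terminated where it sits, so its length is
 * computed and passed through the '*' precision. Returns that length. */
int stab_name(const char *stab_str, char *out, size_t out_size)
{
    const char *colon = strchr(stab_str, ':');
    int len = colon ? (int)(colon - stab_str) : (int)strlen(stab_str);

    assert(out_size > 0);
    snprintf(out, out_size, "%.*s", len, stab_str);
    return len;
}
```

stab_name("monitor:F(0,1)", name, sizeof name) returns 7 and leaves "monitor" in name. The precision caps how many characters of the string argument are printed, so everything from the colon onwards is never read into the output.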

You may find that some functions are missing from the backtrace. For example, you will probably see a call to monitor() but not to runcmd(). This is because the compiler in-lines some function calls. Other optimizations may cause you to see unexpected line numbers. If you get rid of the -O2 from GNUMakefile, the backtraces may make more sense (but your kernel will run more slowly).

The output of objdump -h obj/kern/kernel is: