diff --git a/.gitignore b/.gitignore index fe8af14..df1a3c9 100644 --- a/.gitignore +++ b/.gitignore @@ -1,7 +1,6 @@ all all.raw all.raw.gz -report uimage *.o *.cc diff --git a/report/making-of.tex b/report/making-of.tex new file mode 100644 index 0000000..771d286 --- /dev/null +++ b/report/making-of.tex @@ -0,0 +1,311 @@ +\documentclass{shevek} +\begin{document} +\title{Writing a kernel from scratch} +\author{Bas Wijnen} +\date{\today} +\maketitle +\begin{abstract} +This is a report of the process of writing a kernel from scratch for +the cheap (€150) Trendtac laptop. In a following report I shall write about +the operating system on top of it. It is written while writing the system, so +that no steps are forgotten. Choices are explained and problems (and their +solutions) are shown. After reading this, you should have a thorough +understanding of the kernel, and (with significant effort) be able to write a +similar kernel yourself. This document assumes a working Debian system with +root access (for installing packages), and some knowledge about computer +architectures. (If you lack that knowledge, you can try to read it anyway and +check other sources when you see something new.) +\end{abstract} + +\tableofcontents + +\section{Hardware details} +The first step in the process of writing an operating system is finding out +what system you're going to program for. While most of the work is +supposed to be platform-independent, some parts, especially in the beginning, +will depend very much on the actual hardware. So I searched the net and found: +\begin{itemize} +\item There's a \textbf{Jz4730} chip inside, which implements most +functionality. It has a MIPS core, an OHCI USB host controller (so no USB2), +an AC97 audio device, a TFT display controller, an SD card reader, a network +device, and lots of general-purpose I/O pins, which are used for the LEDs and +the keyboard. There are also two PWM outputs, one of which seems to be used +with the display. 
It also has some other features, such as a digital camera +controller, which are not used in the design. +\item There's a separate 4-port USB hub inside. +\item There's a serial port which is accessible with a tiny connector inside +the battery compartment. It uses TTL signals, so to use it with a PC serial +port, the signals must be converted with a MAX232. That is normal for these +boards, so I already have one handy. The main problem in this case is that the +connector is an unusual one, so it may take some time until I can actually +connect things to the serial port. +\end{itemize} + +The first problem is how to write code which can be booted. This seems easy: put a +file named \textbf{uimage} on the first partition on an SD card, which must be +formatted FAT or ext3, and hold down Fn, left shift and left control while +booting. The partition must also not be larger than 32 MB. + +The boot program is u-boot, which has good documentation on the web. Also, +there is a Debian package named uboot-mkimage, which has the mkimage executable +to create images that can be booted using u-boot. uimage should be in this +format. + +To understand at least something of addresses, it's important to understand the +memory model of the MIPS architecture: +\begin{itemize} +\item user-mode code cannot reference anything in the upper half of the memory (0x80000000 and above). If it tries, it receives a segmentation fault. +\item access in the lower half is paged and can be cached. This is called +kuseg when used from kernel code. It will access the same pages as non-kernel +code finds there. +\item the upper half is divided into 3 segments. +\item kseg0 runs from 0x80000000 to 0xa0000000. Access to this memory will +access physical memory from 0x00000000 to 0x20000000. It is cached, but not +mapped (meaning it accesses physical, not virtual, memory). +\item kseg1 runs from 0xa0000000 to 0xc0000000. It is identical to kseg0, +except that it is not cached. 
+\item kseg2 runs from 0xc0000000 to the top. It is mapped like user memory, +differently for each process, and can be cached. It is intended for +per-address-space kernel structures. I shall not use it in my kernel. +\end{itemize} +U-boot has some standard commands. It can load the image from the SD card at +0x80600000. Even though the Linux image seems to use a different address, I'll +go with this one for now. + +\section{Cross-compiler} +The next thing to do is to build a cross-compiler so it is possible to try out some +things. This shouldn't need to be very complex, but it is. I wrote a separate +document about how to do this. Please read that if you don't have a working +cross-compiler, or if you would like to install libraries for cross-building +more easily. + +\section{Making things run} +To be loaded, a program must be a binary executable with a header. The +header is inserted by mkimage. It needs a load address and an entry point. +Initially at least, the load address is 0x80600000. The entry point must be +computed from the executable. The easiest way to do this is by making sure +that it is the first byte in the executable. The file can then be linked as +binary, so without any headers. This is done by giving the +\verb+--oformat binary+ switch to ld. I think the image is loaded without the +header, so that can be completely ignored while building. However, the loaded +image might include the header. In that case, the entry point should be 0x40 higher, because +that's the size of the header. + +\section{The first version of the kernel} +This sounds better than it is. The first version will be able to boot, and +somehow show that it did that. Not too impressive at all, and certainly not +usable. It is meant to find out if everything I wrote above actually works. + +For this kernel I need several things: a program which can boot, and a way to +tell the user. As the way to tell the user, I decided to use the caps-lock +LED. 
The display is quite complex to program, I suppose, so I won't even try +at this stage. The LED should be easy, especially because Linux can use it +too. I copied the code from the Linux kernel patch that seemed to be about the +LED, and that gave me the macros \verb+__gpio_as_output+, \verb+__gpio_set_pin+ +and \verb+__gpio_clear_pin+. And of course there's \verb+CAPSLOCKLED_IO+, +which is the pin to set or clear. + +I used these macros in a function I called \verb+kernel_entry+. In an endless +loop, it switches the LED on 1000000 times, then off 1000000 times. If the +time required to set the LED is on the order of microseconds, the LED should be +blinking on the order of seconds. I tried with 1000 first, but that left the +LED on seemingly permanently, so it was apparently way too fast. + +This is the code I want to run, but it isn't quite ready for that yet. A C +function needs to have a stack when it is called. It is possible that u-boot +provides one, but it may also not do that. To be sure, it's best to use some +assembly as the real entry point, which sets up the stack and calls the +function. + +The symbol that ld will use as its entry point must be called \verb+__start+ +(on some other architectures with just one underscore). So I created a simple +assembly file which defines some stack space and does the setting up. It also +sets \$gp to the so-called \textit{global offset table}, and clears the .bss +section. This is needed to make compiler-generated code run properly. + +Now how to build the image file? This is a problem. The ELF format allows +paged memory, which means that simply loading the file may not put everything +at its proper address. ld has an option for this, \verb+--omagic+. This is +meant for the a.out format, which isn't supported by mipsel binutils, but that +doesn't matter. The result is still that the .text section (with the +executable code) is first in the file, immediately followed by the .data +section. 
That means that loading the file into memory at the right address +puts all parts of the file in the proper place. Adding +\verb+-Ttext 0x80600000+ makes everything right. However, the result is still +an ELF file. So I use objcopy with \verb+-Obinary+ to create a binary file +from it. At this point, I also extract the start address (the location of +\verb+__start+) from the ELF file, and use that for building uimage. That +way \verb+__start+ no longer needs to be at the first byte of the file. + +Booting from the SD card is as easy as it seemed, except that I first tried an +MMC card (which fits in the same slot, and usually works when SD is accepted) +and that didn't work. So you really need an SD card. + +\section{Context switching} +One very central thing in the kernel is context switching. That is, we need to +know how the registers and the memory are organized when a user program is +running. In order to understand that, we must know how paging is done. I +already found that it is done by coprocessor 0, so now I need to find out how +that works. + +On the net I found \textit{MIPS32 architecture for programmers}, volume 3 +of which is sub-titled \textit{the MIPS32 privileged resource architecture}. +It explains everything there is to know about things which are not accessible +from normal programs. In other words, it is exactly the right book for +programming a kernel or device driver using this processor. How nice. + +It explains that memory accesses to the lower 2GB are (almost always) mapped +through a TLB (translation lookaside buffer). This is an array of some records +where virtual to physical address mappings are stored. In case of a TLB-miss +(the virtual address cannot be found in the table), an exception is generated +and the kernel must insert the mapping into the TLB. + +This is very flexible, because I get to decide how I write the kernel. 
I shall +use something similar to the hardware implementation of the IBM PC: a page +directory which contains links to page tables, with each page table filled with +pointers to page information. It is useful to have a direct mapping from +virtual address to kernel data as well. There are several ways to achieve +this. The two simplest ones each have their own drawback: making a shadow +page directory with shadow page tables with links to the kernel structures +instead of the pages wastes some memory. Using only the shadow, and doing a +lookup of the physical address in the kernel structure (where it must be stored +anyway) wastes some CPU time during the lookup. At this moment I do not know +which is more expensive. I'll initially go for the approach that wastes CPU time. + +\section{Kernel entry} +Now that I have an idea of how a process looks in memory, I need to implement +kernel entry and exit. A process is preempted or makes a request, then the +kernel responds, and then a process (possibly the same) is started again. + +The main problem of kernel entry is to save all registers in the kernel +structure which is associated with the thread. In case of the MIPS processor, +there is a simple solution: there are two registers, k0 and k1, which cannot be +used by the thread. So they can be set before starting the thread, and will +still have their values when the kernel is entered again. By pointing one of +them to the place to save the data, it becomes easy to perform the save and +restore. + +As with the bootstrap process, this must be done in assembly. In this case +this is because the user stack must not be used, and a C function will use the +current stack. It will also mess up some registers before you can save them. + +The next problem is how to get the interrupt code at its address. I'll try to +load the thing at address 0x80000000. It seems to work, which is good. 
Linux +probably has some reason to do things differently, but if this works, it is the +easiest way. + +\section{Memory organization} +Now I've reached the point where I need to create some memory structures. To +do that, I first need to decide how to organize the memory. There's one very +simple rule in my system: everyone must pay for what they use. For memory, +this means that a process brings its own memory where the kernel can write +things about it. The kernel does not need its own allocation system, because +it always works for some process. If the process doesn't provide the memory, +the operation will fail. + +Memory will be organized hierarchically. It belongs to a container, which I +shall call \textit{memory}. Each memory is the property of another +memory, its parent. This is true for all but one, which is the top-level +memory. The top-level memory owns all memory in the system: some of it +directly, most of it through other memories. + +The kernel will have a list of unclaimed pages. For optimization, it actually +has two lists: one with pages containing only zeroes, one with pages containing +junk. When the system is idle, the junk pages can be filled with zeroes. + +Because the kernel starts at physical address 0, building up the list of pages is very +easy: starting from the first page above the top of the kernel, everything is +free space. Initially, all pages are added to the junk list. + +\section{The idle task} +When there is nothing to do, an endless loop should be waiting for interrupts. +This loop is called the idle task. I also use it to exit bootstrapping, by +enabling interrupts after everything is set up as if we're running the idle +task, and then jumping to it. + +There are two options for the idle task, again with their own drawbacks. The +idle task can run in kernel mode. This is easy: it doesn't need any paging +machinery then. 
However, this means that the kernel must read-modify-write the +status register of coprocessor 0, which contains the operating mode, on every +context switch. That's quite an expensive operation for such a critical path. + +The other option is to run it in user mode. The drawback there is that it +needs a page directory and a page table. However, since the code is completely +trusted, it may be possible to sneak that in through some unused space between +two interrupt handlers. That means there's no fault when accessing some memory +owned by others, but the idle task is so trivial that it can be assumed to run +without affecting them. + +\section{Intermezzo: some problems} +Some problems came up while working. First, I found that the code sometimes +didn't work and sometimes it did. It seemed that it had problems when the +functions I called became more complex. Looking at the disassembly, it appears +that I didn't fully understand the calling convention used by the compiler. +Apparently, it always needs to have register t9 set to the address of the called function. +In all compiled code, functions are called as \verb+jalr $t9+. It took quite +some time to figure this out, but setting t9 to the called function in my +assembly code does indeed solve the problem. + +The other problem is that the machine was still doing unexpected things. +Apparently, u-boot enables interrupts and handles them. This is not very nice +when I'm busy setting up interrupt handlers. So before doing anything else, I +first switch off all interrupts by writing 0 to the status register of CP0. + +This also reminded me that I need to flush the cache, so that I can be sure +everything is correct. For that reason, I need to start at 0xa0000000, not +0x80000000, so that the startup code is not cached. It should be fine to load +the kernel at 0x80000000, but jump in at the non-cached location anyway, if I +make sure the initial code, which clears the cache, can handle it. 
After that, +I jump to the cached region, and everything should be fine. For now, +however, I link the kernel at the non-cached address, so I don't need to +worry about it. + +Finally, I read in the books that k0 and k1 are in fact normal general-purpose +registers. So while they are by convention used for kernel purposes, and +compilers will likely not touch them, the kernel can't actually rely +on them not being changed by user code. So I'll need to use a different +approach for saving the processor state. The solution is trivial: use k1 as +before, but first load it from a fixed memory location. To be able to store k1 +itself, a page must be mapped in kseg3 (wired into the TLB), which can then be +accessed with a negative index to \$zero. + +At this point, I was completely startled by crashes depending on seemingly +irrelevant changes. After a lot of investigation, I saw that I had forgotten +that MIPS jumps have a delay slot, which is executed after the jump, before the +first new instruction is executed. I was executing random instructions, which +led to random behaviour. + +\section{Back to the idle task} +With all this out of the way, I continued to implement the idle task. I hoped +to be able to never write to the status register. However, this is not +possible. The idle task must be in user mode, and it must call wait. That +means it needs the coprocessor 0 usable bit set. This bit must not be set for +normal processes, however, or they would be able to change the TLB and all +protection would be lost. However, writing to the status register is not a +problem. First of all, it is only needed during a task switch, and those aren't +as frequent as context switches (every entry to the kernel is a context switch; +it is a task switch only when the task entered from the kernel differs from the +one that exited to the kernel). 
Furthermore, and more importantly, coprocessor 0 is +integrated into the CPU, and writing to it is actually a very fast operation and +not something to be avoided at all. + +So to switch to user mode, I set up the status register so that it looks like +it's handling an exception, set EPC to the address of the idle task, and use +eret to ``return'' to it. + +\section{Timer interrupts} +This worked well. Now I expected to get a timer interrupt soon after jumping +to the idle task. After all, I have set up the compare register, the timer +should be running and I enabled the interrupts. However, nothing happened. I +looked at the contents of the count register, and found that it was 0. This +means that it is not actually counting at all. The Linux sources +don't use this timer either, but instead use the CPU-external (but +integrated in the chip) timer. The documentation says that they have a +different reason for this than a non-functional CPU timer. Still, it means the +external timer can be used as an alternative. + +Having a timer is important for preemptive multitasking: a process needs to be +interrupted in order to be preempted, so there needs to be a periodic interrupt +source. + +\end{document}