add report

2025-04-21 12:27:27 +03:00 · 2009-05-25 21:52:44 +02:00
parent f800bc51be
commit 1a30189b1b
2 changed files with 311 additions and 1 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,7 +1,6 @@
 all
 all.raw
 all.raw.gz
-report
 uimage
 *.o
 *.cc
--- a/report/making-of.tex
+++ b/report/making-of.tex
@@ -0,0 +1,311 @@
+\documentclass{shevek}
+\begin{document}
+\title{Writing a kernel from scratch}
+\author{Bas Wijnen}
+\date{\today}
+\maketitle
+\begin{abstract}
+This is a report of the process of writing a kernel from scratch for
+the cheap (€150) Trendtac laptop.  In a following report I shall write about
+the operating system on top of it.  It is written while writing the system, so
+that no steps are forgotten.  Choices are explained and problems (and their
+solutions) are shown.  After reading this, you should have a thorough
+understanding of the kernel, and (with significant effort) be able to write a
+similar kernel yourself.  This document assumes a working Debian system with
+root access (for installing packages), and some knowledge about computer
+architectures.  (If you lack that knowledge, you can try to read it anyway and
+check other sources when you see something new.)
+\end{abstract}
+
+\tableofcontents
+
+\section{Hardware details}
+The first step in the process of writing an operating system is finding out
+what system you're going to program for.  While most of the work is
+supposed to be platform-independent, some parts, especially in the beginning,
+will depend very much on the actual hardware.  So I searched the net and found:
+\begin{itemize}
+\item There's a \textbf{Jz4730} chip inside, which implements most
+functionality.  It has a MIPS core, an OHCI USB host controller (so no USB2),
+an AC97 audio device, a TFT display controller, an SD card reader, a network
+device, and lots of general purpose I/O pins, which are used for the LEDs and
+the keyboard.  There are also two PWM outputs, one of which seems to be used
+with the display.  It also has some other features, such as a digital camera
+controller, which are not used in the design.
+\item There's a separate 4-port USB hub inside.
+\item There's a serial port which is accessible with a tiny connector inside
+the battery compartment.  It uses TTL signals, so to use it with a PC serial
+port, the signals must be converted with a MAX232.  That is normal for these
+boards, so I already have one handy.  The main problem in this case is that the
+connector is an unusual one, so it may take some time until I can actually
+connect things to the serial port.
+\end{itemize}
+
+The first problem is how to write code which can be booted.  This seems easy: put a
+file named \textbf{uimage} on the first partition on an SD card, which must be
+formatted FAT or ext3, and hold down Fn, left shift and left control while
+booting.  The partition must also not be larger than 32 MB.
+
+The boot program is u-boot, which has good documentation on the web.  Also,
+there is a Debian package named uboot-mkimage, which has the mkimage executable
+to create images that can be booted using u-boot.  uimage should be in this
+format.
+
+To understand at least something of addresses, it's important to understand the
+memory model of the MIPS architecture:
+\begin{itemize}
+\item usermode code cannot access anything in the upper half of the memory (0x80000000 and above).  If it tries, it receives a segmentation fault.
+\item access in the lower half is paged and can be cached.  This is called
+kuseg when used from kernel code.  It will access the same pages as non-kernel
+code finds there.
+\item the upper half is divided in 3 segments.
+\item kseg0 runs from 0x80000000 to 0xa0000000.  Access to this memory will
+access physical memory from 0x00000000 to 0x20000000.  It is cached, but not
+mapped (meaning it accesses physical, not virtual, memory).
+\item kseg1 runs from 0xa0000000 to 0xc0000000.  It is identical to kseg0,
+except that it is not cached.
+\item kseg2 runs from 0xc0000000 to the top.  It is mapped like user memory,
+differently for each process, and can be cached.  It is intended for
+per-address space kernel structures.  I shall not use it in my kernel.
+\end{itemize}
+U-boot has some standard commands.  It can load the image from the SD card at
+0x80600000.  Even though the Linux image seems to use a different address, I'll
+go with this one for now.
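+
+The address rules for the unmapped segments can be sketched in C.  This is only
+an illustration based on the ranges listed above; the function names are made
+up for this example and are not taken from the kernel.
+\begin{verbatim}
+#include <stdint.h>
+
+/* Physical address behind an unmapped kseg0 or kseg1 address. */
+static inline uint32_t unmapped_to_phys (uint32_t vaddr)
+{
+        return vaddr & 0x1fffffff;
+}
+
+/* Cached (kseg0) view of a physical address below 0x20000000. */
+static inline uint32_t phys_to_kseg0 (uint32_t paddr)
+{
+        return paddr | 0x80000000;
+}
+
+/* Uncached (kseg1) view of the same physical address. */
+static inline uint32_t phys_to_kseg1 (uint32_t paddr)
+{
+        return paddr | 0xa0000000;
+}
+\end{verbatim}
+With these, the load address 0x80600000 corresponds to physical address
+0x00600000, and its uncached alias is 0xa0600000.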
+
+\section{Cross-compiler}
+Next thing to do is build a cross-compiler so it is possible to try out some
+things.  This shouldn't need to be very complex, but it is.  I wrote a separate
+document about how to do this.  Please read that if you don't have a working
+cross-compiler, or if you would like to install libraries for cross-building
+more easily.
+
+\section{Making things run}
+For loading a program, it must be a binary executable with a header.  The
+header is inserted by mkimage.  It needs a load address and an entry point.
+Initially at least, the load address is 0x80600000.  The entry point must be
+computed from the executable.  The easiest way to do this is by making sure
+that it is the first byte in the executable.  The file can then be linked as
+binary, so without any headers.  This is done by giving the
+\verb+--oformat binary+ switch to ld.  I think the image is loaded without the
+header, so that can be completely ignored while building.  However, it might
+include it.  In that case, the entry point should be 0x40 higher, because
+that's the size of the header.
+
+\section{The first version of the kernel}
+This sounds better than it is.  The first version will be able to boot, and
+somehow show that it did that.  Not too impressive at all, and certainly not
+usable.  It is meant to find out if everything I wrote above actually works.
+
+For this kernel I need several things: a program which can boot, and a way to
+tell the user.  As the way to tell the user, I decided to use the caps-lock
+LED.  The display is quite complex to program, I suppose, so I won't even try
+at this stage.  The LED should be easy.  Especially because Linux can use it
+too.  I copied the code from the Linux kernel patch that seemed to be about the
+LED, and that gave me the macros \verb+__gpio_as_output+, \verb+__gpio_set_pin+
+and \verb+__gpio_clear_pin+.  And of course there's \verb+CAPSLOCKLED_IO+,
+which is the pin to set or clear.
+
+I used these macros in a function I called \verb+kernel_entry+.  In an endless
+loop, it switches the LED on 1000000 times, then off 1000000 times.  If the
+time required to set the LED is in the order of microseconds, the LED should be
+blinking in the order of seconds.  I tried with 1000 first, but that left the
+LED on seemingly permanently, so it was apparently way too fast.
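+
+A minimal sketch of such a function could look like the listing below.  It
+assumes that the three macros and \verb+CAPSLOCKLED_IO+ copied from the Linux
+patch each take a pin number and live in a header, here given the hypothetical
+name \verb+gpio.h+.  Whether setting or clearing the pin lights the LED depends
+on the wiring, so the listing does not say which phase is ``on''.
+\begin{verbatim}
+#include "gpio.h"       /* hypothetical: holds the copied macros */
+
+#define BLINK_COUNT 1000000
+
+void kernel_entry (void)
+{
+        unsigned i;
+        __gpio_as_output (CAPSLOCKLED_IO);
+        while (1) {
+                /* Writing the pin repeatedly doubles as a crude delay. */
+                for (i = 0; i < BLINK_COUNT; ++i)
+                        __gpio_set_pin (CAPSLOCKLED_IO);
+                for (i = 0; i < BLINK_COUNT; ++i)
+                        __gpio_clear_pin (CAPSLOCKLED_IO);
+        }
+}
+\end{verbatim}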
+
+This is the code I want to run, but it isn't quite ready for that yet.  A C
+function needs to have a stack when it is called.  It is possible that u-boot
+provides one, but it may also not do that.  To be sure, it's best to use some
+assembly as the real entry point, which sets up the stack and calls the
+function.
+
+The symbol that ld will use as its entry point must be called \verb+__start+
+(on some other architectures with just one underscore).  So I created a simple
+assembly file which defines some stack space and does the setting up.  It also
+sets \$gp to the so-called \textit{global offset table}, and clears the .bss
+section.  This is needed to make compiler-generated code run properly.
+
+Now how to build the image file?  This is a problem.  The ELF format allows
+paged memory, which means that simply loading the file may not put everything
+at its proper address.  ld has an option for this, \verb+--omagic+.  This is
+meant for the a.out format, which isn't supported by mipsel binutils, but that
+doesn't matter.  The result is still that the .text section (with the
+executable code) is first in the file, immediately followed by the .data
+section.  So that means that loading the file into memory at the right address
+results in all parts of the file in the proper place.  Adding
+\verb+-Ttext 0x80600000+ makes everything right.  However, the result is still
+an ELF file.  So I use objcopy with \verb+-Obinary+ to create a binary file
+from it.  At this point, I also extract the start address (the location of
+\verb+__start+) from the ELF file, and use that for building uimage.  That
+way \verb+__start+ no longer needs to be at the first byte of the file.
+
+Booting from the SD card is as easy as it seemed, except that I first tried an
+MMC card (which fits in the same slot, and usually works when SD is accepted)
+and that didn't work.  So you really need an SD card.
+
+\section{Context switching}
+One very central thing in the kernel is context switching.  That is, we need to
+know how the registers and the memory are organized when a user program is
+running.  In order to understand that, we must know how paging is done.  I
+already found that it is done by coprocessor 0, so now I need to find out how
+that works.
+
+On the net I found the \textit{MIPS32 architecture for programmers}, volume 3
+of which is sub-titled \textit{the MIPS32 privileged resource architecture}.
+It explains everything there is to know about things which are not accessible
+from normal programs.  In other words, it is exactly the right book for
+programming a kernel or device driver using this processor.  How nice.
+
+It explains that memory accesses to the lower 2GB are (almost always) mapped
+through a TLB (translation lookaside buffer).  This is an array of some records
+where virtual to physical address mappings are stored.  In case of a TLB-miss
+(the virtual address cannot be found in the table), an exception is generated
+and the kernel must insert the mapping into the TLB.
+
+This is very flexible, because I get to decide how I write the kernel.  I shall
+use something similar to the hardware implementation of the IBM PC: a page
+directory which contains links to page tables, with each page table filled with
+pointers to page information.  It is useful to have a direct mapping from
+virtual address to kernel data as well.  There are several ways to achieve
+this.  The two simplest ones each have their own drawback: making a shadow
+page directory with shadow page tables with links to the kernel structures
+instead of the pages wastes some memory.  Using only the shadow, and doing a
+lookup of the physical address in the kernel structure (where it must be stored
+anyway) wastes some cpu time during the lookup.  At this moment I do not know
+what is more expensive.  I'll initially go for the cpu time wasting approach.
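+
+The lookup for this approach can be sketched as follows.  The listing assumes
+4 kB pages and a 10/10/12 bit split of the virtual address, as on the IBM PC;
+all structure and function names are hypothetical.
+\begin{verbatim}
+#include <stddef.h>
+#include <stdint.h>
+
+/* Kernel structure describing one page. */
+struct page {
+        uint32_t physical;      /* physical address of the frame */
+};
+
+struct page_table {
+        struct page *pages[1024];
+};
+
+struct page_directory {
+        struct page_table *tables[1024];
+};
+
+/* Return the kernel structure for vaddr, or NULL if it is unmapped. */
+static struct page *find_page (struct page_directory *dir, uint32_t vaddr)
+{
+        struct page_table *table = dir->tables[vaddr >> 22];
+        if (!table)
+                return NULL;
+        return table->pages[(vaddr >> 12) & 0x3ff];
+}
+\end{verbatim}
+A TLB-miss handler built on this would call \verb+find_page+ and then read the
+\verb+physical+ member of the result.  That extra read is the cpu time this
+approach trades for the memory of a second set of tables.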
+
+\section{Kernel entry}
+Now that I have an idea of how a process looks in memory, I need to implement
+kernel entry and exit.  A process is preempted or makes a request, then the
+kernel responds, and then a process (possibly the same) is started again.
+
+The main problem of kernel entry is to save all registers in the kernel
+structure which is associated with the thread.  In case of the MIPS processor,
+there is a simple solution: there are two registers, k0 and k1, which cannot be
+used by the thread.  So they can be set before starting the thread, and will
+still have their values when the kernel is entered again.  By pointing one of
+them to the place to save the data, it becomes easy to perform the save and
+restore.
+
+As with the bootstrap process, this must be done in assembly.  In this case
+this is because the user stack must not be used, and a C function will use the
+current stack.  It will also mess up some registers before you can save them.
+
+The next problem is how to get the interrupt code at its address.  I'll try to
+load the thing at address 0x80000000.  It seems to work, which is good.  Linux
+probably has some reason to do things differently, but if this works, it is the
+easiest way.
+
+\section{Memory organization}
+Now I've reached the point where I need to create some memory structures.  To
+do that, I first need to decide how to organize the memory.  There's one very
+simple rule in my system: everyone must pay for what they use.  For memory,
+this means that a process brings its own memory where the kernel can write
+things about it.  The kernel does not need its own allocation system, because
+it always works for some process.  If the process doesn't provide the memory,
+the operation will fail.
+
+Memory will be organized hierarchically.  It belongs to a container, which I
+shall call \textit{memory}.  The entire memory is the property of another
+memory, its parent.  This is true for all but one, which is the top level
+memory.  The top level memory owns all memory in the system.  Some of it
+directly, most of it through other memories.
+
+The kernel will have a list of unclaimed pages.  For optimization, it actually
+has two lists: one with pages containing only zeroes, one with pages containing
+junk.  When idle, the junk pages can be filled with zeroes.
+
+Because the kernel starts at address 0, building up the list of pages is very
+easy: starting from the first page above the top of the kernel, everything is
+free space.  Initially, all pages are added to the junk list.
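+
+The two lists can be sketched in C as below.  The listing assumes 4 kB pages
+and keeps the list link in the first word of each free page; both choices, and
+all names, are assumptions made for this illustration.
+\begin{verbatim}
+#include <stddef.h>
+#include <stdint.h>
+
+#define PAGE_SIZE 0x1000
+
+struct free_page {
+        struct free_page *next;
+};
+
+static struct free_page *zero_pages, *junk_pages;
+
+/* Add every whole page in [first, end) to the junk list.
+   first must be page-aligned. */
+static void init_free_pages (uintptr_t first, uintptr_t end)
+{
+        uintptr_t p;
+        for (p = first; p + PAGE_SIZE <= end; p += PAGE_SIZE) {
+                struct free_page *page = (struct free_page *)p;
+                page->next = junk_pages;
+                junk_pages = page;
+        }
+}
+
+/* Called when idle: clear one junk page and move it to the zero list. */
+static void zero_one_page (void)
+{
+        struct free_page *page = junk_pages;
+        uint32_t *word = (uint32_t *)page;
+        size_t i;
+        if (!page)
+                return;
+        junk_pages = page->next;
+        for (i = 0; i < PAGE_SIZE / sizeof (uint32_t); ++i)
+                word[i] = 0;
+        /* The link word is not zero while the page is on the list;
+           it must be cleared again when the page is handed out. */
+        page->next = zero_pages;
+        zero_pages = page;
+}
+\end{verbatim}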
+
+\section{The idle task}
+When there is nothing to do, an endless loop should be waiting for interrupts.
+This loop is called the idle task.  I use it also to exit bootstrapping, by
+enabling interrupts after everything is set up as if we're running the idle
+task, and then jumping to it.
+
+There are two options for the idle task, again with their own drawbacks.  The
+idle task can run in kernel mode.  This is easy, it doesn't need any paging
+machinery then.  However, this means that the kernel must read-modify-write the
+status register of coprocessor 0, which contains the operating mode, on every
+context switch.  That's quite an expensive operation for such a critical path.
+
+The other option is to run it in user mode.  The drawback there is that it
+needs a page directory and a page table.  However, since the code is completely
+trusted, it may be possible to sneak that in through some unused space between
+two interrupt handlers.  That means there's no fault when accessing some memory
+owned by others, but the idle task is so trivial that it can be assumed to run
+without affecting them.
+
+\section{Intermezzo: some problems}
+Some problems came up while working.  First, I found that the code sometimes
+didn't work and sometimes it did.  It seemed that it had problems when the
+functions I called became more complex.  Looking at the disassembly, it appears
+that I didn't fully understand the calling convention used by the compiler.
+Apparently, it always needs to have register t9 set to the called function.
+In all compiled code, functions are called as \verb+jalr $t9+.  It took quite
+some time to figure this out, but setting t9 to the called function in my
+assembly code does indeed solve the problem.
+
+The other problem is that the machine was still doing unexpected things.
+Apparently, u-boot enables interrupts and handles them.  This is not very nice
+when I'm busy setting up interrupt handlers.  So before doing anything else, I
+first switch off all interrupts by writing 0 to the status register of CP0.
+
+This also reminded me that I need to flush the cache, so that I can be sure
+everything is correct.  For that reason, I need to start at 0xa0000000, not
+0x80000000, so that the startup code is not cached.  It should be fine to load
+the kernel at 0x80000000, but jump in at the non-cached location anyway, if I
+make sure the initial code, which clears the cache, can handle it.  After that,
+I jump to the cached region, and everything should be fine.  However, at this
+moment I first link the kernel at the non-cached address, so I don't need to
+worry about it.
+
+Finally, I read in the books that k0 and k1 are in fact normal general purpose
+registers.  So while they are by convention used for kernel purposes, and
+compilers will likely not touch them, the kernel can't actually rely
+on them not being changed by user code.  So I'll need to use a different
+approach for saving the processor state.  The solution is trivial: use k1 as
+before, but first load it from a fixed memory location.  To be able to store k1
+itself, a page must be mapped at the very top of the address space (in kseg3,
+wired into the tlb), which can then be accessed with a negative index to \$zero.
+
+At this point, I was completely startled by crashes depending on seemingly
+irrelevant changes.  After a lot of investigation, I saw that I had forgotten
+that MIPS jumps have a delay slot, which is executed after the jump, before the
+first new instruction is executed.  I was executing random instructions, which
+led to random behaviour.
+
+\section{Back to the idle task}
+With all this out of the way, I continued to implement the idle task.  I hoped
+to be able to never write to the status register.  However, this is not
+possible.  The idle task must be in user mode, and it must call wait.  That
+means it needs the coprocessor 0 usable bit set.  This bit may not be set for
+normal processes, however, or they would be able to change the tlb and all
+protection would be lost.  However, writing to the status register is not a
+problem.  First of all, it is only needed during a task switch, and they aren't
+as frequent as context switches (every entry to the kernel is a context switch;
+it is only a task switch when the task that the kernel returns to differs from
+the one that entered it).  Furthermore, and more importantly, coprocessor 0 is
+integrated into the cpu, and writing to it is actually a very fast operation and
+not something to be avoided at all.
+
+So to switch to user mode, I set up the status register so that it looks like
+it's handling an exception, set EPC to the address of the idle task, and use
+eret to ``return'' to it.
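+
+The status value for this can be sketched in C.  The bit positions are those
+of the MIPS32 status register; the constant and function names are made up for
+this example, and a real kernel would probably also set interrupt mask bits,
+which are left out here.
+\begin{verbatim}
+#include <stdint.h>
+
+#define STATUS_IE       (1u << 0)       /* interrupts enabled */
+#define STATUS_EXL      (1u << 1)       /* exception level */
+#define STATUS_UM       (1u << 4)       /* user mode once EXL is cleared */
+#define STATUS_CU0      (1u << 28)      /* coprocessor 0 usable */
+
+/* Status value to load before using eret to enter the idle task. */
+static inline uint32_t idle_status (void)
+{
+        return STATUS_CU0 | STATUS_UM | STATUS_EXL | STATUS_IE;
+}
+\end{verbatim}
+This evaluates to 0x10000013.  While EXL is set the processor stays in kernel
+mode; eret clears EXL, and only then do the user mode and interrupt enable bits
+take effect.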
+
+\section{Timer interrupts}
+This worked well.  Now I expected to get a timer interrupt soon after jumping
+to the idle task.  After all, I have set up the compare register, the timer
+should be running and I enabled the interrupts.  However, nothing happened.  I
+looked at the contents of the count register, and found that it was 0.  This
+means that it is not actually counting at all.  Looking at the Linux sources,
+they don't use this timer either, but instead use the cpu-external (but
+integrated in the chip) timer.  The documentation says that they have a
+different reason for this than a non-functional cpu timer.  Still, it means it
+can be used as an alternative.
+
+Having a timer is important for preemptive multitasking: a process needs to be
+interrupted in order to be preempted, so there needs to be a periodic interrupt
+source.
+
+\end{document}