2025-04-21 12:27:27 +03:00 · 2009-05-22 22:48:49 +02:00
parent 60ea570aaa
commit 12637f5695
18 changed files with 567 additions and 386 deletions
--- a/report/kernel.tex
+++ b/report/kernel.tex
@@ -1,311 +1,218 @@
 \documentclass{shevek}
 \begin{document}
-\title{Writing a kernel from scratch}
+\title{Overview of my kernel}
 \author{Bas Wijnen}
 \date{\today}
 \maketitle
 \begin{abstract}
-This is a report of the process of writing a kernel from scratch for
-the cheap (€150) Trendtac laptop.  In a following report I shall write about
-the operating system on top of it.  It is written while writing the system, so
-that no steps are forgotten.  Choices are explained and problems (and their
-solutions) are shown.  After reading this, you should have a thorough
-understanding of the kernel, and (with significant effort) be able to write a
-similar kernel yourself.  This document assumes a working Debian system with
-root access (for installing packages), and some knowledge about computer
-architectures.  (If you lack that knowledge, you can try to read it anyway and
-check other sources when you see something new.)
+This document briefly describes the inner workings of my kernel, including the
+reasons for the choices that were made.  It is meant to be understandable (with
+effort) for people who know nothing of operating systems.  On the other hand,
+it should also be readable for people who know about computer architecture, but
+want to know about this kernel.
 \end{abstract}

 \tableofcontents

-\section{Hardware details}
-The first step in the process of writing an operating system is finding out
-what the system is you're going to program for.  While most of the work is
-supposed to be platform--independant, some parts, especially in the beginning,
-will depend very much on the actual hardware.  So I searched the net and found:
+\section{Operating systems}
+This section describes what the purpose of an operating system is, and defines
+what I call an ``operating system''\footnote{Different people use very
+different definitions, so this is not as trivial as it sounds.}.  It also goes
+into some detail about microkernels and capabilities.  If you already know, you
+can safely skip this section.  It contains no information about my kernel.
+
+\subsection{The goal of an operating system}
+In the 1980s, a typical home computer could run only one program at a time.  When the
+program had finished, the next one could be started.  This follows the
+processor itself: it runs a program, from the beginning until the end, and
+can't run more than one program simultaneously\footnote{Multi-core processors
+technically can run multiple programs simultaneously, but I'm not talking about
+those here.}.  In those days, an \textit{operating system} was the program that
+allowed other programs to be started.  The best known operating systems were
+called \textit{Disk operating system}, or \textit{DOS} (of which there were
+several).
+
+At some point, there was a need for programs that would ``help'' other programs
+in some way.  For example, they could provide a calculator which would pop up
+when the user pressed a certain key combination.  Such programs were called
+\textit{terminate and stay resident} programs, or TSRs.  This name came from
+the fact that they terminated, in the sense that they would allow the next
+program to be run, but they would stay resident and do their job in the
+background.
+
+At some point, people wanted to do \textit{multitasking}.  That is, multiple
+``real'' programs should run concurrently, not just some helpers.  The easiest
+way to implement this is with \textit{cooperative multitasking}.  Every program
+returns control to the system every now and then.  The system switches between
+all the running programs.  The result is that every program runs for a short
+time, several times per second.  For the user, this looks like the programs are
+all running simultaneously, while in reality it is similar to a chess master
+playing simultaneously on many boards: he really plays on one board at a time,
+but switches a lot.  On such a system, the \textit{kernel} is the program that
+chooses which program to run next.  The \textit{operating system} is the kernel
+plus some support programs which allow the user to control the system.
+
+On a system where multiple programs all think they ``own'' the computer, there
+is another problem: if more than one program tries to access the same device,
+it is very likely that at least one of them, and probably all, will fail.  For
+this reason, \textit{device drivers} on a multitasking system must not only
+allow the device to be controlled, but they must also make sure that concurrent
+access doesn't fail.  The simplest way to achieve this is simply to disallow
+it (let all operations fail that don't come from the first program using the
+driver).  A better way, if the device can handle it, is to somehow make sure
+that all of them work.
+
+There is one problem with cooperative multitasking: when one program crashes,
+or for some other reason doesn't return control to the system, the other
+programs stop running as well.  The solution to this is \textit{preemptive
+multitasking}.  This means that every program is interrupted every now and
+then, without asking for it, and the system switches to a different program.
+This makes the kernel slightly more complex, because it must take care to store
+every aspect of the running programs.  After all, the program doesn't expect to
+be interrupted, so it can't expect its state to change either.  This shouldn't
+be a problem though.  It's just something to remember when writing the kernel.
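+
+As a rough illustration, a preemptive round-robin switch could look like the
+C sketch below.  This is not code from my kernel: the names \verb+struct task+
+and \verb+timer_tick+, and the number of registers, are made up for the
+example.  It only shows that the complete state is stored before the next
+program is chosen.
+\begin{verbatim}
+#include <stddef.h>
+
+#define NUM_REGS 32
+
+/* Hypothetical per-task record: everything the kernel must preserve. */
+struct task {
+    unsigned long regs[NUM_REGS]; /* general purpose registers */
+    unsigned long pc;             /* where to continue running */
+    struct task *next;            /* circular list of runnable tasks */
+};
+
+/* Must point to a valid task before the first timer tick. */
+static struct task *current;
+
+/* Called on every timer interrupt with the interrupted state. */
+struct task *timer_tick(const unsigned long *regs, unsigned long pc)
+{
+    for (size_t i = 0; i < NUM_REGS; i++)
+        current->regs[i] = regs[i];   /* store the full state */
+    current->pc = pc;
+    current = current->next;          /* round robin: take the next task */
+    return current;                   /* caller restores this task's state */
+}
+\end{verbatim}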
+
+In conclusion, every modern desktop kernel uses preemptive multitasking.  This
+requires a timer interrupt.  The operating system consists of this kernel, plus
+the support programs that allow the user to control the system.
+
+\subsection{Microkernel}
+Most modern kernels are so-called \textit{monolithic} kernels: they include
+most of the operating system.  In particular, they include the device drivers.
+This is useful, because the device drivers need special attention anyway, and
+they are very kernel-specific.  Modern processors allow the kernel to protect
+access to the hardware, so that programs can't interfere with each other.  A
+device driver which doesn't properly ask the kernel will simply not be allowed
+to control the device.
+
+However, adding device drivers and everything that comes with them
+(filesystems, for example) to the kernel makes it a very large program.
+Furthermore, it makes it an ever-changing program: as new devices are built,
+new drivers must be added.  Such a program can never become stable and
+bug-free.
+
+Conceptually much nicer is the microkernel.  It includes the minimum that is
+needed for a kernel, and nothing more.  It does include task switching and some
+method for tasks to communicate with each other.  It also ``handles'' hardware
+interrupts, but all it really does is pass them to the device driver, which
+is mostly a normal program.  Some microkernels don't do memory management
+(deciding which programs get how much and which memory), while others do.
+
+The drawback of a microkernel is that it requires much more communication
+between tasks.  Where a monolithic kernel can serve a driver request from a
+task directly, a microkernel must pass it to a device driver.  Usually there
+will be an answer, which must be passed back to the task.  This means more task
+switches.  This doesn't need to be a big problem, if task switching is
+optimized: because of the simpler structure of the microkernel, it can be much
+faster at this than a monolithic kernel.  And even if the end result is
+slightly slower, in my opinion the stability is still enough reason to prefer a
+microkernel over a monolithic one.
+
+Summarizing, a microkernel needs to do task switching and inter-process
+communication.  Because mapping memory into an address space is closely related
+to task switching, it is possible to include memory management as well.  The
+kernel must accept hardware interrupts, but doesn't handle them (except the
+timer interrupt).
+
+\subsection{Capabilities}
+Above I explained that the kernel must allow processes to communicate.  Many
+systems allow communication through the filesystem: one process writes to a
+file, and another process reads from it.  This implies that any process can
+communicate with any other process, if they only have a place to write in the
+filesystem, where the other can read.
+
+This is a problem because of security.  If a process cannot communicate with
+any part of the system, except the parts that it really needs to perform its
+operation, it cannot leak or damage the other parts of the system either.  The
+reason that this is relevant is not that users will run programs that try to
+ruin their system (although this may happen as well), but that programs may
+break and damage random parts of the system, or be taken over by crackers.  If
+the broken or malicious process has fewer rights, it will also do less damage
+to the system.
+
+This leads to the goal of giving each process as few rights as possible.
+For this, it is best to make rights very fine-grained.  Every
+operation of a driver (be it a hardware device driver, or just a shared program
+such as a file system) should have its own key, which can be given out without
+giving keys to the entire driver (or even multiple drivers).  Such a key is
+called a capability.
+
+Some operations are performed directly on the kernel itself.  For those, the
+kernel can provide its own capabilities.  Processes can create their own
+objects which can receive capability calls, and capabilities for those can be
+generated by them.  Processes can copy capabilities to other processes, if they
+have a channel to send them (using an existing capability).  This way, any
+operation of the process with the external world goes through a capability, and
+only one system call is needed, namely \textit{invoke}.
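+
+To make this more concrete, the following C sketch shows how a single
+\textit{invoke} call could deliver a message either to the kernel or to an
+object of another process.  It is only an illustration under assumptions: the
+structures, the queue size and \verb+kernel_call+ are invented here and are not
+the real interface of my kernel.
+\begin{verbatim}
+#define QUEUE_SIZE 8
+
+struct message { unsigned long data[4]; };
+
+/* Hypothetical object which receives capability calls. */
+struct receiver {
+    struct message queue[QUEUE_SIZE];
+    unsigned head, count;
+};
+
+enum target_kind { TARGET_KERNEL, TARGET_RECEIVER };
+
+/* The holder only has the key; it never inspects these fields. */
+struct capability {
+    enum target_kind kind;
+    struct receiver *receiver;   /* only used for TARGET_RECEIVER */
+};
+
+/* Placeholder for operations performed by the kernel itself. */
+static int kernel_call(const struct message *msg)
+{
+    (void)msg;
+    return 0;
+}
+
+/* The single system call: returns 0 on success, -1 on failure. */
+int invoke(struct capability *cap, const struct message *msg)
+{
+    if (cap->kind == TARGET_KERNEL)
+        return kernel_call(msg);
+    struct receiver *r = cap->receiver;
+    if (r->count == QUEUE_SIZE)
+        return -1;               /* queue is full */
+    r->queue[(r->head + r->count) % QUEUE_SIZE] = *msg;
+    r->count++;
+    return 0;
+}
+\end{verbatim}
+Because the caller cannot see where a capability points, a capability to a
+different receiver can be handed to it without any change to the calling code.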
+
+This has a very nice side-effect, namely that it becomes very easy to tap
+communication of a task you control.  This means that a user can redirect
+certain requests from programs which don't do exactly what is desired to do
+nicer things.  For example, a program can be prevented from opening pop-up
+windows.  In other words, it moves control of the computer from the programmer
+into the hands of the user (as far as allowed by the system administrator).
+This is a very good thing.
+
+\section{Kernel objects}
+This section describes all the kernel objects, and the operations that can be
+performed on them.
+
+\subsection{Memory}
+A memory object is a container for storing things.  All objects live inside a
+memory object.  A memory object can contain other memory objects, capabilities,
+receivers, threads and pages.
+
+A memory object is also an address space.  Pages can be mapped (and unmapped).
+Any thread in a memory object uses this address space while it is running.
+
+Every memory object has a limit.  When this limit is reached, no more pages can
+be allocated for it (including pages which it uses to store other objects).
+Using a new page in a memory object implies using it in all ancestor memory
+objects.  Consequently, if a limit is set higher than the parent's limit, the
+parent's limit applies anyway.
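+
+The way limits propagate through the ancestors can be sketched in C as
+follows.  The structure and the function \verb+memory_use_page+ are
+hypothetical names chosen for this example; counting in pages is an assumption
+as well.
+\begin{verbatim}
+struct memory {
+    struct memory *parent;  /* null pointer for the top level memory */
+    unsigned long limit;    /* maximum number of pages */
+    unsigned long used;     /* pages in use, including descendants */
+};
+
+/* Returns 1 and charges every ancestor on success, 0 on failure. */
+int memory_use_page(struct memory *mem)
+{
+    /* First check: no memory up to the top may be at its limit. */
+    for (struct memory *m = mem; m; m = m->parent)
+        if (m->used >= m->limit)
+            return 0;
+    /* Then charge the page to this memory and to all its ancestors. */
+    for (struct memory *m = mem; m; m = m->parent)
+        m->used++;
+    return 1;
+}
+\end{verbatim}
+With this check, a child whose limit is higher than that of its parent still
+fails as soon as the parent is full.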
+
+Operations on memory objects:
 \begin{itemize}
-\item There's a \textbf{Jz4730} chip inside, which implements most
-functionality.  It has a mips core, an OHCI USB host controller (so no USB2),
-an AC97 audio device, a TFT display controller, an SD card reader, a network
-device, and lots of general purpose I/O pins, which are used for the LEDs and
-the keyboard.  There are also two PWM outputs, one of which seems to be used
-with the display.  It also has some other features, such as a digital camera
-controller, which are not used in the design.
-\item There's a separate 4-port USB hub inside.
-\item There's a serial port which is accessible with a tiny connector inside
-the battery compartiment.  It uses TTL signals, so to use it with a PC serial
-port, the signals must be converted with a MAX232.  That is normal for these
-boards, so I already have one handy.  The main problem in this case is that the
-connector is an unusual one, so it may take some time until I can actually
-connect things to the serial port.
+\item
 \end{itemize}

-First problem is how to write code which can be booted.  This seems easy: put a
-file named \textbf{uimage} on the first partition on an SD card, which must be
-formatted FAT or ext3, and hold down Fn, left shift and left control while
-booting.  The partition must also not be larger than 32 MB.
+\subsection{Page}
+A page can be used to store user data.  It can be mapped into an address space (a memory object).  Threads can then use the data directly.

-The boot program is u-boot, which has good documentation on the web.  Also,
-there is a Debian package named uboot-mkimage, which has the mkimage executable
-to create images that can be booted using u-boot.  uimage should be in this
-format.
+A page has no operations of its own; mapping a page is achieved using an
+operation on a memory object.

-To understand at least something of addresses, it's important to understand the
-memory model of the mips architecture:
+\subsection{Receiver}
+A receiver object is used for inter-process communication.  Capabilities can be
+created from it.  When those are invoked, the receiver can be used to retrieve
+the message.
+
+Operations on receiver objects:
 \begin{itemize}
-\item usermode code will never reference anything in the upper half of the memory (above 0x80000000).  If it does, it receives a segmentation fault.
-\item access in the lower half is paged and can be cached.  This is called
-kuseg when used from kernel code.  It will access the same pages as non-kernel
-code finds there.
-\item the upper half is divided in 3 segments.
-\item kseg0 runs from 0x80000000 to 0xa0000000.  Access to this memory will
-access physical memory from 0x00000000 to 0x20000000.  It is cached, but not
-mapped (meaning it accesses physical, not virtual, memory)
-\item kseg1 runs from 0xa0000000 to 0xc0000000.  It is identical to kseg0,
-except that is is not cached.
-\item kseg2 runs from 0xc0000000 to the top.  It is mapped like user memory,
-differently for each process, and can be cached.  It is intended for
-per-address space kernel structures.  I shall not use it in my kernel.
+\item
 \end{itemize}
-U-boot has some standard commands.  It can load the image from the SD card at
-0x80600000.  Even though the Linux image seems to use a different address, I'll
-go with this one for now.

-\section{Cross-compiler}
-Next thing to do is build a cross-compiler so it is possible to try out some
-things.  This shouldn't need to be very complex, but it is.  I wrote a separate
-document about how to do this.  Please read that if you don't have a working
-cross-compiler, or if you would like to install libraries for cross-building
-more easily.
+\subsection{Capability}
+A capability object can be invoked to send a message to a receiver or the
+kernel.  The owner cannot see from the capability where it points.  This is
+important, because the user must be able to replace the capability with a
+different one, without the program noticing.

-\section{Making things run}
-For loading a program, it must be a binary executable with a header.  The
-header is inserted by mkimage.  It needs a load address and an entry point.
-Initially at least, the load address is 0x80600000.  The entry point must be
-computed from the executable.  The easiest way to do this is by making sure
-that it is the first byte in the executable.  The file can then be linked as
-binary, so without any headers.  This is done by giving the
-\verb+--oformat binary+ switch to ld.  I think the image is loaded without the
-header, so that can be completely ignored while building.  However, it might
-include it.  In that case, the entry point should be 0x40 higher, because
-that's the size of the header.
+Operations on capability objects:
+\begin{itemize}
+\item
+\end{itemize}

-\section{The first version of the kernel}
-This sounds better than it is.  The first version will be able to boot, and
-somehow show that it did that.  Not too impressive at all, and certainly not
-usable.  It is meant to find out if everything I wrote above actually works.
+\subsection{Thread}
+Thread objects hold the information about the current state of a thread.  This
+state is used to continue running the thread.  The address space is used to map
+the memory for the thread.  Different threads in the same address space have
+the same memory mapping.  All threads in one address space (often just one)
+together are called a process.

-For this kernel I need several things: a program which can boot, and a way to
-tell the user.  As the way to tell the user, I decided to use the caps-lock
-LED.  The display is quite complex to program, I suppose, so I won't even try
-at this stage.  The LED should be easy.  Especially because Linux can use it
-too.  I copied the code from the Linux kernel patch that seemed to be about the
-LED, and that gave me the macros \verb+__gpio_as_output+, \verb+__gpio_set_pin+
-and \verb+__gpio_clear_pin+.  And of course there's \verb+CAPSLOCKLED_IO+,
-which is the pin to set or clear.
-
-I used these macros in a function I called \verb+kernel_entry+.  In an endless
-loop, it switches the LED on 1000000 times, then off 1000000 times.  If the
-time required to set the led is in the order of microseconds, the LED should be
-blinking in the order of seconds.  I tried with 1000 first, but that left the
-LED on seemingly permanently, so it was appearantly way too fast.
-
-This is the code I want to run, but it isn't quite ready for that yet.  A C
-function needs to have a stack when it is called.  It is possible that u-boot
-provides one, but it may also not do that.  To be sure, it's best to use some
-assembly as the real entry point, which sets up the stack and calls the
-function.
-
-The symbol that ld will use as its entry point must be called \verb+__start+
-(on some other architectures with just one underscore).  So I created a simple
-assembly file which defines some stack space and does the setting up.  It also
-sets \$gp to the so-called \textit{global offset table}, and clears the .bss
-section.  This is needed to make compiler-generated code run properly.
-
-Now how to build the image file?  This is a problem.  The ELF format allows
-paged memory, which means that simply loading the file may not put everything
-at its proper address.  ld has an option for this, \verb+--omagic+.  This is
-meant for the a.out format, which isn't supported by mipsel binutils, but that
-doesn't matter.  The result is still that the .text section (with the
-executable code) is first in the file, immediately followed by the .data
-section.  So that means that loading the file into memory at the right address
-results in all parts of the file in the proper place.  Adding
-\verb+-Ttext 0x80600000+ makes everything right.  However, the result is still
-an ELF file.  So I use objcopy with \verb+-Obinary+ to create a binary file
-from it.  At this point, I also extract the start address (the location of
-\verb+__start+) from the ELF file, and use that for building uimage.  That
-way it is no longer needed that \_\_start is at the first byte of the file.
-
-Booting from the SD card is as easy as it seemed, except that I first tried an
-mmc card (which fits in the same slot, and usually works when SD is accepted)
-and that didn't work.  So you really need an SD card.
-
-\section{Context switching}
-One very central thing in the kernel is context switching.  That is, we need to
-know how the registers and the memory are organized when a user program is
-running.  In order to understand that, we must know how paging is done.  I
-already found that it is done by coprocessor 0, so now I need to find out how
-that works.
-
-On the net I found the \textit{MIPS32 architecture for developers}, version 3
-of which is sub-titled \textit{the MIPS32 priviledged resource architecture}.
-It explains everything there is to know about things which are not accessible
-from normal programs.  In other words, it is exactly the right book for
-programming a kernel or device driver using this processor.  How nice.
-
-It explains that memory accesses to the lower 2GB are (almost always) mapped
-through a TLB (translation lookaside buffer).  This is an array of some records
-where virtual to physical address mappings are stored.  In case of a TLB-miss
-(the virtual address cannot be found in the table), an exception is generated
-and the kernel must insert the mapping into the TLB.
-
-This is very flexible, because I get to decide how I write the kernel.  I shall
-use something similar to the hardware implementation of the IBM PC: a page
-directory which contains links to page tables, with each page table filled with
-pointers to page information.  It is useful to have a direct mapping from
-virtual address to kernel data as well.  There are several ways how this can be
-achieved.  The two simplest ones each have their own drawback: making a shadow
-page directory with shadow page tables with links to the kernel structures
-instead of the pages wastes some memory.  Using only the shadow, and doing a
-lookup of the physical address in the kernel structure (where it must be stored
-anyway) wastes some cpu time during the lookup.  At this moment I do not know
-what is more expensive.  I'll initially go for the cpu time wasting approach.
-
-\section{Kernel entry}
-Now that I have an idea of how a process looks in memory, I need to implement
-kernel entry and exit.  A process is preempted or makes a request, then the
-kernel responds, and then a process (possibly the same) is started again.
-
-The main problem of kernel entry is to save all registers in the kernel
-structure which is associated with the thread.  In case of the MIPS processor,
-there is a simple solution: there are two registers, k0 and k1, which cannot be
-used by the thread.  So they can be set before starting the thread, and will
-still have their values when the kernel is entered again.  By pointing one of
-them to the place to save the data, it becomes easy to perform the save and
-restore.
-
-As with the bootstrap process, this must be done in assembly.  In this case
-this is because the user stack must not be used, and a C function will use the
-current stack.  It will also mess up some registers before you can save them.
-
-The next problem is how to get the interrupt code at its address.  I'll try to
-load the thing at address 0x80000000.  It seems to work, which is good.  Linux
-probably has some reason to do things differently, but if this works, it is the
-easiest way.
-
-\section{Memory organization}
-Now I've reached the point where I need to create some memory structures.  To
-do that, I first need to decide how to organize the memory.  There's one very
-simple rule in my system: everyone must pay for what they use.  For memory,
-this means that a process brings its own memory where the kernel can write
-things about it.  The kernel does not need its own allocation system, because
-it always works for some process.  If the process doesn't provide the memory,
-the operation will fail.
-
-Memory will be organized hierarchically.  It belongs to a container, which I
-shall call \textit{memory}.  The entire memory is the property of another
-memory, its parent.  This is true for all but one, which is the top level
-memory.  The top level memory owns all memory in the system.  Some of it
-directly, most of it through other memories.
-
-The kernel will have a list of unclaimed pages.  For optimization, it actually
-has two lists: one with pages containing only zeroes, one with pages containing
-junk.  When idle, the junk pages can be filled with zeroes.
-
-Because the kernel starts at address 0, building up the list of pages is very
-easy: starting from the first page above the top of the kernel, everything is
-free space.  Initially, all pages are added to the junk list.
-
-\section{The idle task}
-When there is nothing to do, an endless loop should be waiting for interrupts.
-This loop is called the idle task.  I use it also to exit bootstrapping, by
-enabling interrupts after everything is set up as if we're running the idle
-task, and then jumping to it.
-
-There are two options for the idle task, again with their own drawbacks.  The
-idle task can run in kernel mode.  This is easy, it doesn't need any paging
-machinery then.  However, this means that the kernel must read-modify-write the
-status register of coprocessor 0, which contains the operating mode, on every
-context switch.  That's quite an expensive operation for such a critical path.
-
-The other option is to run it in user mode.  The drawback there is that it
-needs a page directory and a page table.  However, since the code is completely
-trusted, it may be possible to sneak that in through some unused space between
-two interrupt handlers.  That means there's no fault when accessing some memory
-owned by others, but the idle task is so trivial that it can be assumed to run
-without affecting them.
-
-\section{Intermezzo: some problems}
-Some problems came up while working.  First, I found that the code sometimes
-didn't work and sometimes it did.  It seemed that it had problems when the
-functions I called became more complex.  Looking at the disassembly, it appears
-that I didn't fully understand the calling convention used by the compiler.
-Appearantly, it always needs to have register t9 set to the called function.
-In all compiled code, functions are called as \verb+jalr $t9+.  It took quite
-some time to figure this out, but setting t9 to the called function in my
-assembly code does indeed solve the problem.
-
-The other problem is that the machine was still doing unexpected things.
-Appearantly, u-boot enables interrupts and handles them.  This is not very nice
-when I'm busy setting up interrupt handlers.  So before doing anything else, I
-first switch off all interrupts by writing 0 to the status register of CP0.
-
-This also reminded me that I need to flush the cache, so that I can be sure
-everything is correct.  For that reason, I need to start at 0xa0000000, not
-0x80000000, so that the startup code is not cached.  It should be fine to load
-the kernel at 0x80000000, but jump in at the non-cached location anyway, if I
-make sure the initial code, which clears the cache, can handle it.  After that,
-I jump to the cached region, and everything should be fine.  However, at this
-moment I first link the kernel at the non-cached address, so I don't need to
-worry about it.
-
-Finally, I read in the books that k0 and k1 are in fact normal general purpose
-registers.  So while they are by convention used for kernel purposes, and
-compilers will likely not touch them.  However, the kernel can't actually rely
-on them not being changed by user code.  So I'll need to use a different
-approach for saving the processor state.  The solution is trivial: use k1 as
-before, but first load it from a fixed memory location.  To be able to store k1
-itself, a page must be mapped in kseg3 (wired into the tlb), which can then be
-accessed with a negative index to \$zero.
-
-At this point, I was completely startled by crashes depending on seemingly
-irrelevant changes.  After a lot of investigation, I saw that I had forgotten
-that mips jumps have a delay slot, which is executed after the jump, before the
-first new instruction is executed.  I was executing random instructions, which
-lead to random behaviour.
-
-\section{Back to the idle task}
-With all this out of the way, I continued to implement the idle task.  I hoped
-to be able to never write to the status register.  However, this is not
-possible.  The idle task must be in user mode, and it must call wait.  That
-means it needs the coprocessor 0 usable bit set.  This bit may not be set for
-normal processes, however, or they would be able to change the tlb and all
-protection would be lost.  However, writing to the status register is not a
-problem.  First of all, it is only needed during a task switch, and they aren't
-as frequent as context switches (every entry to the kernel is a context switch,
-only when a different task is entered from the kernel than exited to the kernel
-is it a task switch).  Furthermore, and more importantly, coprocessor 0 is
-intgrated into the cpu, and writing to it is actually a very fast operation and
-not something to be avoided at all.
-
-So to switch to user mode, I set up the status register so that it looks like
-it's handling an exception, set EPC to the address of the idle task, and use
-eret to ``return'' to it.
-
-\section{Timer interrupts}
-This worked well.  Now I expected to get a timer interrupt soon after jumping
-to the idle task.  After all, I have set up the compare register, the timer
-should be running and I enabled the interrupts.  However, nothing happened.  I
-looked at the contents of the count register, and found that it was 0.  This
-means that it is not actually counting at all.  Looking at the Linux sources,
-they don't use this timer either, but instead use the cpu-external (but
-integrated in the chip) timer.  The documentation says that they have a
-different reason for this than a non-functional cpu timer.  Still, it means it
-can be used as an alternative.
-
-Having a timer is important for preemptive multitasking: a process needs to be
-interrupted in order to be preempted, so there needs to be a periodic interrupt
-source.
+Operations on thread objects:
+\begin{itemize}
+\item
+\end{itemize}

 \end{document}