diff --git a/doc/kernel.txt b/doc/kernel.txt
new file mode 100644
index 0000000..d7f2771
--- /dev/null
+++ b/doc/kernel.txt
@@ -0,0 +1,151 @@
+This file describes the kernel architecture. It does not go into detail on all
+the fields of structs; for that, refer to the source code.
+
+# Overview
+
+Iris is an operating system. The kernel should be called "the Iris kernel",
+but sometimes it is simply called "Iris". If there can be confusion, the terms
+"kernel" and "userspace" are used to clarify.
+
+Iris uses a capability-based microkernel. Being a microkernel means that most
+parts that would be part of a monolithic kernel are not part of the Iris
+kernel, but of the Iris userspace. Being capability-based means that there is
+no public dictionary of running processes; in order to communicate with another
+process, the caller must have received a capability to it.
+
+
+# First class objects
+
+First class objects are implemented by the kernel. They can be used through
+capabilities.
+
+- Cap: a single capability. Can be invoked and passed to others.
+- Caps: storage container for a fixed number of Cap objects. Every Thread has
+  at least one of these so it can communicate with the kernel, its parent, and
+  other processes.
+- Receiver: an object that is used to create Cap objects. When those are
+  invoked, the Receiver's listener receives the message.
+- Thread: an execution context. On creation, a number of slots is specified
+  and space is reserved for that many Caps pointers. Only Cap objects in those
+  Caps can be invoked from the thread.
+- Page: a single page of memory, always 4kB. A Page can be mapped in a Memory
+  and then accessed by a Thread.
+- Memory: Everything[*] needs a Memory object to be stored in. In addition to
+  storing first class objects, a Memory can own Page objects and map them. A
+  mapped page is accessible to running Threads that are stored in the Memory.
+- List: Helper for implementing a list of Cap objects, which are stored by the
+  caller. A List allows servers to keep a list of clients without paying for
+  its storage. This prevents a denial of service attack. Each item is stored
+  with a code that is set and only accessible by the List owner.
+- ListItem: an item in a List object.
+
+[*] There is of course one exception to the rule that everything is stored in a
+    Memory. Everything is a tree, with Memory objects as nodes and all other
+    objects as leaves. The root of the tree is not stored in anything. This
+    node is called the "top Memory".
+
+Example: A new process consists of a Memory with one or more mapped Page
+  objects that hold the code and data, a Receiver, a Thread, and a Caps
+  that contains a Cap for each of those objects, plus one for its parent
+  process. That Caps is stored in slot 0.
+
+Note that the kernel provides system calls through capabilities. If a thread
+doesn't hold the capability, it cannot make the system call. The parent Cap is
+used to request access to other processes, or devices. The Thread has no way
+to know if it is talking to the thing it requested, or something that simulates
+it. That is intentional; Threads should not be able to detect that they are
+being debugged.
+
+
+# Capability invocations
+
+When a Cap is invoked, a message is sent to the Receiver that created it (or,
+if it was created by the kernel, to the kernel). This message contains three
+64-bit numbers (which are usually treated as two 32-bit numbers each) and two
+Cap objects. Two of the numbers, named d0 and d1, are passed with the
+invocation; the third one is named protected_data and is defined when the Cap
+is created. The owner of the Cap cannot see or change protected_data; it is
+the target's way of recognizing who's sending the message.
+
+The Cap objects in the message are called arg and reply. By convention, a call
+that requires a reply passes a Cap for it, which will be invoked with the
+reply. However, this is only a convention; if a program wants, it can use both
+arg and reply as regular arguments if no reply is required. Normally, though,
+a Caps is passed in arg if more than one Cap must be sent.
+
+Cap objects can be passed around. The target of the invocation cannot see if
+the original recipient is calling, or some other process that was given access.
+The Receiver can revoke a Cap, however; after this, any invocation no longer
+sends a message to the Receiver. When sending a Cap, a flag specifies whether
+it is mapped (the default), or copied. A mapped Cap is revoked when its source
+is revoked; a copy is not. To give a Cap to another process and then drop it,
+it must be copied. Otherwise the new Cap is immediately revoked.
+
+
+# Interrupts
+
+Interrupts are handled by one or a few interrupt handlers. In a microkernel,
+it would be ideal to let userspace handle them, but that is not reasonable
+given the hardware architecture. However, it is possible for the kernel to
+find out who should handle an interrupt, and then pass it to userspace. In
+Linux terms: the top half is in kernel space, but the bottom half is not.
+(Note that those two halves are highly asymmetrical; the top half is very
+small, the bottom half can be very large.) So this is what Iris does.
+A process can register as an interrupt handler. When the interrupt arrives,
+the kernel masks it so it isn't immediately triggered again, enables all
+interrupts and sends a message to the registered process. That process will
+normally clear the interrupt condition and reregister itself as the interrupt
+handler. The reregistration is required to avoid queueing of interrupts;
+if a handler does not reregister, the interrupt is no longer handled.
+
+
+# Userspace
+
+When the system boots, the kernel is started with its first process. This
+process sets up userspace. Unlike Linux init systems, the first process does
+not continue running; it is hard to change (because the filesystem is not yet
+accessible) and so it must be as simple as possible.
+
+As part of the startup, drivers for built-in devices are started. These are
+regular userspace programs; most of them handle interrupts and all of them have
+access to memory-mapped I/O. Note that this means they are just as critical as
+the kernel; in a monolithic system, only the kernel needs to be ultimately
+trusted (if it is compromised, all is lost). With a microkernel, it's both the
+kernel and some parts of userspace. The total amount of trusted code is likely
+smaller in a microkernel design, because it is easier to split parts that don't
+need to be critical into their own process.
+
+A user session is a process which can start other processes and switch between
+them. For this, it contains the following components:
+
+- A bag of device Cap objects, which can be mapped to the active process (and
+  revoked when that process is deactivated). What's in the bag can change. For
+  example, if the user wants sound to continue playing while switching to
+  another user, the sound Cap must not be in the bag.
+- An interface for task switching: when the user makes a system request (using
+  some dedicated hardware, such as a button), the active process is
+  deactivated and the session itself (or a designated helper) is activated. It
+  allows switching to a different process, or starting a new one, or stopping
+  or ending running processes. The session can also allow communication
+  between certain processes. (The processes need to cooperate to actually make
+  the link; they ask the session for a link of a certain type (for example, a
+  file system) and the session responds with the Cap or an error.)
+- There is a list of things that can be started; an important one is a shell,
+  which allows control over the session. In other words, the shell is able to
+  start and end other processes, make and break communication links, and define
+  which programs can be started.
+
+
+# Multi user support
+
+For multi user support, a login manager is required which can start user
+sessions and switch between them. This is very similar to what a user session
+does, and so the same process is used for it. Just a few changes are required,
+and those can be implemented by choosing different helper programs.
+
+The login manager lets the user select an identity to log in as. The login
+program itself is run by the user session, so that users can change the way
+they log in without asking the administrator to set it up for them. For
+example, one user may set things up to allow logging in only with a physical
+crypto device, while a guest login may be set up that doesn't require
+credentials at all.
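
The mapped versus copied rule from the "Capability invocations" section can be easier to follow as a small model. The sketch below is a toy Python simulation and not Iris code: the class layout, the `send`, `drop` and `invoke` method names, and the `inbox` list are all assumptions made for illustration. Only the names d0, d1, protected_data, arg and reply come from the text. It models dropping a source Cap, and does not attempt to model revocation by the Receiver.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    """What the target sees: two caller-supplied numbers, the hidden
    protected_data, and the two optional Cap arguments."""
    d0: int
    d1: int
    protected_data: int
    arg: Optional["Cap"] = None
    reply: Optional["Cap"] = None


class Receiver:
    """Hypothetical stand-in for a Receiver; messages land in `inbox`."""

    def __init__(self) -> None:
        self.inbox: List[Message] = []

    def create_cap(self, protected_data: int) -> "Cap":
        # protected_data is fixed at creation time, as the text describes.
        return Cap(self, protected_data)


class Cap:
    def __init__(self, target: Receiver, protected_data: int,
                 source: Optional["Cap"] = None) -> None:
        self._target = target
        self._protected_data = protected_data  # no public accessor on purpose
        self._source = source                  # set only for mapped Caps
        self._dropped = False

    @property
    def revoked(self) -> bool:
        # A mapped Cap is revoked as soon as its source is; a copy has no source.
        if self._dropped:
            return True
        return self._source is not None and self._source.revoked

    def drop(self) -> None:
        """Assumed name for the holder giving up (revoking) this Cap."""
        self._dropped = True

    def send(self, mapped: bool = True) -> "Cap":
        """Hand out a new Cap to the same target; mapped is the default."""
        return Cap(self._target, self._protected_data,
                   source=self if mapped else None)

    def invoke(self, d0: int = 0, d1: int = 0,
               arg: Optional["Cap"] = None,
               reply: Optional["Cap"] = None) -> bool:
        """Deliver a message unless this Cap has been revoked."""
        if self.revoked:
            return False
        self._target.inbox.append(
            Message(d0, d1, self._protected_data, arg, reply))
        return True
```

In this model, a Cap obtained with `send()` stops delivering messages once the Cap it was mapped from is dropped, while one obtained with `send(mapped=False)` keeps working. That is why the text says a Cap must be copied if the sender intends to drop its own Cap afterwards.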