This file describes the kernel architecture. It does not go into detail on all the fields of structs; for that, refer to the source code.

# Overview

Iris is an operating system. The kernel should be called "the Iris kernel", but sometimes it is simply called "Iris". If there can be confusion, the terms "kernel" and "userspace" are used to clarify.

Iris uses a capability-based microkernel. Being a microkernel means that most parts that would be part of a monolithic kernel are not part of the Iris kernel, but of the Iris userspace. Being capability-based means that there is no public dictionary of running processes; in order to communicate with another process, the caller must have received a capability to it.

# First class objects

First class objects are implemented by the kernel. They can be used through capabilities.

- Cap: a single capability. Can be invoked and passed to others.
- Caps: storage container for a fixed number of Cap objects. Every Thread has at least one of these so it can communicate with the kernel, its parent, and other processes.
- Receiver: an object that can create Cap objects. When those are invoked, the Receiver's listener receives the message.
- Thread: an execution context. On creation, a number of slots is specified and space is reserved for that many Caps pointers. Only Cap objects in those Caps can be invoked from the Thread.
- Page: a single page of memory, always 4 kB. A Page can be mapped in a Memory and then accessed by a Thread.
- Memory: Everything[*] needs a Memory object to be stored in. In addition to storing first class objects, a Memory can own Page objects and map them. A mapped Page is accessible to running Threads that are stored in the Memory.
- List: helper for implementing a list of Cap objects, which are stored by the caller. A List allows servers to keep a list of clients without paying for its storage. This prevents a denial of service attack.
  Each item is stored with a code that is set by the List owner and accessible only to the List owner.
- ListItem: an item in a List object.

[*] There is of course one exception to the rule that everything is stored in a Memory. The objects form a tree, with Memory objects as nodes and all other objects as leaves. The root of the tree is not stored in anything. This node is called the "top Memory".

Example: a new process consists of a Memory with one or more mapped Page objects that hold the code and data, a Receiver, a Thread, and a Caps that contains a Cap for each of those objects, plus one for its parent process. That Caps is stored in slot 0.

Note that the kernel provides system calls through capabilities. If a Thread doesn't hold the capability, it cannot make the system call.

The parent Cap is used to request access to other processes or devices. The Thread has no way to know whether it is talking to the thing it requested or to something that simulates it. That is intentional; Threads should not be able to detect that they are being debugged.

# Capability invocations

When a Cap is invoked, a message is sent to the Receiver that created it (or, if it was created by the kernel, to the kernel). This message contains three 64-bit numbers (which are usually treated as two 32-bit numbers each) and two Cap objects. Two of the numbers, named d0 and d1, are passed with the invocation; the third one is named protected_data and is defined when the Cap is created. The owner of the Cap cannot see or change protected_data; it is the target's way of recognizing who is sending the message.

The Cap objects in the message are called arg and reply. By convention, a call that requires a reply passes a Cap for it, which will be invoked with the reply. However, this is only a convention; if no reply is required, a program can use both arg and reply as regular arguments. Normally, though, a Caps is passed in arg if more than one Cap should be sent.

Cap objects can be passed around.
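
The message layout described in this section can be modelled in a few lines of Python. This is only an illustrative sketch, not the kernel's interface: the field names d0, d1, protected_data, arg and reply come from the text, while `Message`, `Receiver.create_cap`, `Cap.invoke` and the use of a plain callable as the listener are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MASK64 = (1 << 64) - 1  # d0, d1 and protected_data are 64-bit numbers


@dataclass(frozen=True)
class Message:
    d0: int                        # chosen by the invoker
    d1: int                        # chosen by the invoker
    protected_data: int            # fixed when the Cap was created
    arg: Optional["Cap"] = None    # extra Cap (or a Caps, not modelled here)
    reply: Optional["Cap"] = None  # by convention: where the answer goes


class Receiver:
    """Hypothetical stand-in for a Receiver; the listener is a callable."""

    def __init__(self, listener: Callable[[Message], None]):
        self._listener = listener

    def create_cap(self, protected_data: int) -> "Cap":
        return Cap(self, protected_data & MASK64)

    def _deliver(self, message: Message) -> None:
        self._listener(message)


class Cap:
    """Hypothetical stand-in for a Cap; the holder only gets invoke()."""

    def __init__(self, receiver: Receiver, protected_data: int):
        self._receiver = receiver
        self._protected_data = protected_data  # not exposed to the holder

    def invoke(self, d0=0, d1=0, arg=None, reply=None) -> None:
        self._receiver._deliver(
            Message(d0 & MASK64, d1 & MASK64, self._protected_data, arg, reply)
        )


# A server hands out one Cap per client, each with its own protected_data.
inbox = []
server = Receiver(inbox.append)
alice = server.create_cap(protected_data=1)
bob = server.create_cap(protected_data=2)

# The client passes a reply Cap; the server answers by invoking it.
replies = []
reply_cap = Receiver(replies.append).create_cap(protected_data=0)
alice.invoke(d0=42, reply=reply_cap)

request = inbox[0]
request.reply.invoke(d0=request.d0 + 1)
```

In this model the server tells its clients apart only by the protected_data value in the incoming message, which matches the role the text gives to that field.
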
The target of the invocation cannot tell whether the original recipient is calling or some other process that was given access. The Receiver can revoke a Cap, however; after that, invoking it no longer sends a message to the Receiver.

When sending a Cap, a flag specifies whether it is mapped (the default) or copied. A mapped Cap is revoked when its source is revoked; a copy is not. To give a Cap to another process and then drop it, it must be copied. Otherwise the new Cap is immediately revoked.

# Interrupts

Interrupts are handled by one or a few interrupt handlers. In a microkernel, it would be ideal to let userspace handle them, but that is not reasonable given the hardware architecture. However, it is possible for the kernel to find out who should handle an interrupt, and then pass it to userspace. In Linux terms: the top half is in kernel space, but the bottom half is not. (Note that those two halves are highly asymmetrical; the top half is very small, the bottom half can be very large.)

So this is what Iris does. A process can register as an interrupt handler. When the interrupt arrives, the kernel masks it so that it isn't immediately triggered again, enables all interrupts, and sends a message to the registered process. That process will normally clear the interrupt condition and reregister itself as the interrupt handler. The reregistration is required to avoid queueing of interrupts; if a handler is not reregistered, its interrupt is no longer handled.

# Userspace

When the system boots, the kernel is started with its first process. This process sets up userspace. Unlike Linux init systems, the first process does not continue running; it is hard to change (because the filesystem is not yet accessible) and so it must be as simple as possible.

As part of the startup, drivers for built-in devices are started. These are regular userspace programs; most of them handle interrupts and all of them have access to memory-mapped I/O.
Note that this means they are just as critical as the kernel; in a monolithic system, only the kernel needs to be ultimately trusted (if it is compromised, all is lost). With a microkernel, it's both the kernel and some parts of userspace. The total amount of trusted code is likely smaller in a microkernel design, because it is easier to split parts that don't need to be critical into their own process.

A user session is a process which can start other processes and switch between them. For this, it contains the following components:

- A bag of device Cap objects, which can be mapped to the active process (and revoked when it is deactivated). What's in the bag can change. For example, if the user wants sound to continue playing while switching to another user, the sound Cap must not be in the bag.
- An interface for task switching: when the user makes a system request (which is some dedicated hardware, such as a button), the active process is deactivated and the session itself (or a designated helper) is activated. It allows switching to a different process, starting a new one, or stopping or ending running processes. The session can also allow communication between certain processes. (The processes need to cooperate to actually make the link; they ask the session for a link of a certain type (for example, a file system) and the session responds with the Cap or an error.)
- A list of things that can be started; an important one is a shell, which allows control over the session. In other words, the shell is able to start and end other processes, make and break communication links, and define which programs can be started.

# Multi user support

For multi user support, a login manager is required which can start user sessions and switch between them. This is very similar to what a user session does, and so the same process is used for it. Just a few changes are required, and those can be implemented by choosing different helper programs.
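
Because the login manager reuses the session's switching mechanism, the device bag behaviour described for user sessions applies to it as well. The following Python sketch illustrates that behaviour under simplifying assumptions: `Session`, `DeviceCap` and `activate` are hypothetical names that do not come from Iris, processes are identified by plain strings, and revocation is modelled as a boolean flag.

```python
class DeviceCap:
    """Hypothetical stand-in for a Cap that gives access to one device."""

    def __init__(self, device):
        self.device = device
        self.revoked = False


class Session:
    """Hypothetical session that maps the Caps in its bag to the active process."""

    def __init__(self, bag):
        self.bag = set(bag)   # devices that follow the active process
        self.active = None    # name of the active process, if any
        self._granted = []    # Caps currently mapped to the active process

    def activate(self, process):
        # Deactivating the previous process revokes the Caps it got from the bag.
        for cap in self._granted:
            cap.revoked = True
        # The newly active process gets a fresh Cap for every device in the bag.
        self.active = process
        self._granted = [DeviceCap(device) for device in sorted(self.bag)]
        return self._granted


session = Session(bag={"display", "keyboard"})

# Sound is handed out outside the bag, so switching does not revoke it.
sound = DeviceCap("sound")

editor_caps = session.activate("editor")
player_caps = session.activate("player")
```

After the second `activate` call, the Caps given to the first process are revoked, the second process holds fresh ones, and the sound Cap is untouched because it was never in the bag.
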
The login manager lets the user select an identity to log in as. The login program itself is run by the user session, so that users can change the way they log in without asking the administrator to set it up for them. For example, one user may allow logging in only with a physical crypto device, while a guest login may be set up that doesn't require credentials at all.