iris/doc/kernel.txt

This file describes the kernel architecture.  It does not go into detail on all
the fields of structs; for that, refer to the source code.

# Overview

Iris is an operating system.  The kernel should be called "the Iris kernel",
but sometimes it is simply called "Iris".  If there can be confusion, the terms
"kernel" and "userspace" are used to clarify.

Iris uses a capability based microkernel.  Being a microkernel means that most
parts that would be part of a monolithic kernel are not part of the Iris
kernel, but of the Iris userspace.  Being capability based means that there is
no public dictionary of running processes; in order to communicate with another
process, the caller must have received a capability to it.


# First class objects

First class objects are implemented by the kernel.  They can be used through
capabilities.

- Cap: a single capability.  Can be invoked and passed to others.
- Caps: storage container for a fixed number of Cap objects.  Every Thread has
  at least one of these so it can communicate with the kernel, its parent, and
  other processes.
- Receiver: an object that can create Cap objects.  When those are
  invoked, the Receiver's listener receives the message.
- Thread: an execution context.  On creation, a number of slots is specified
  and space is reserved for that many Caps pointers.  Only Cap objects in those
  Caps can be invoked from the thread.
- Page: a single page of memory, always 4kB.  A Page can be mapped in a Memory
  and then accessed by a Thread.
- Memory: Everything[*] needs a Memory object to be stored in.  In addition to
  storing first class objects, a Memory can own Page objects and map them.  A
  mapped page is accessible to running Threads that are stored in the Memory.
- List: Helper for implementing a list of Cap objects, which are stored by the
  caller.  A List allows servers to keep a list of clients without paying for
  its storage.  This prevents a denial of service attack.  Each item is stored
  with a code that is set and only accessible by the List owner.
- ListItem: an item in a List object.

[*] There is of course one exception to the rule that everything is stored in a
    Memory.  Everything is a tree, with Memory objects as nodes and all other
    objects as leaves.  The root of the tree is not stored in anything.  This
    node is called the "top Memory".
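
The tree described above can be sketched as follows.  This is only an
illustration, not the kernel's actual data structure: the names Object, Kind,
add and depth are made up for this example, and only a subset of the object
types is listed.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical subset of the first class object types.
enum class Kind { Memory, Thread, Page, Receiver, Caps };

// Hypothetical node type: every object records the Memory that stores it.
struct Object {
    Kind kind;
    Object *owner;  // nullptr only for the top Memory
    std::vector<std::unique_ptr<Object>> children;  // used only by Memory nodes
};

// Store a new object in a Memory.  Only Memory objects can be inner nodes.
Object *add(Object &memory, Kind kind) {
    assert(memory.kind == Kind::Memory);
    memory.children.push_back(
        std::unique_ptr<Object>(new Object{kind, &memory, {}}));
    return memory.children.back().get();
}

// Number of Memory objects between an object and the top Memory.
int depth(const Object &object) {
    int d = 0;
    for (const Object *p = object.owner; p != nullptr; p = p->owner)
        ++d;
    return d;
}
```

In this sketch the top Memory is simply an Object whose owner is nullptr; every
other object is reachable from it by following children.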

Example: A new process consists of a Memory with one or more mapped Page
	objects that hold the code and data, a Receiver, a Thread, and a Caps
	that contains a Cap for each of those objects, plus one for its parent
	process.  That Caps is stored in slot 0.

Note that the kernel provides system calls through capabilities.  If a thread
doesn't hold the capability, it cannot make the system call.  The parent Cap is
used to request access to other processes, or devices.  The Thread has no way
to know if it is talking to the thing it requested, or something that simulates
it.  That is intentional; Threads should not be able to detect that they are
being debugged.


# Capability invocations

When a Cap is invoked, a message is sent to the Receiver that created it (or,
if it was created by the kernel, to the kernel).  This message contains three
64 bit numbers (which are usually treated as two 32 bit numbers each) and two
Cap objects.  Two of the numbers, named d0 and d1, are passed with the
invocation, the third one is named protected_data and is defined when the Cap
is created.  The owner of the Cap cannot see or change protected_data; it is
the target's way of recognizing who's sending the message.

The Cap objects in the message are called arg and reply.  By convention, a call
that requires a reply passes a Cap for it, which will be invoked with the
reply.  However, this is only a convention; if a program wants, it can use both
arg and reply as regular arguments if no reply is required.  If more than one
Cap should be sent, though, a Caps is normally passed in arg.
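
As an illustration, the message described above could be laid out like this.
The field names d0, d1, protected_data, arg and reply come from this document;
the types CapRef and Halves and the functions split and join are assumptions
made for the sketch, and the real kernel structures may look different.

```cpp
#include <cstdint>

// Hypothetical way to refer to a Cap: which Caps slot of the Thread, and
// which entry inside that Caps.
struct CapRef {
    std::uint32_t slot;
    std::uint32_t index;
};

// One message as delivered to a Receiver (or to the kernel).
struct Message {
    std::uint64_t d0;              // supplied by the invoker
    std::uint64_t d1;              // supplied by the invoker
    std::uint64_t protected_data;  // fixed when the Cap was created
    CapRef arg;                    // Cap argument
    CapRef reply;                  // by convention, invoked with the reply
};

// Helper for treating one 64 bit number as two 32 bit numbers.
struct Halves {
    std::uint32_t low;
    std::uint32_t high;
};

inline Halves split(std::uint64_t value) {
    return Halves{static_cast<std::uint32_t>(value & 0xffffffffu),
                  static_cast<std::uint32_t>(value >> 32)};
}

inline std::uint64_t join(std::uint32_t low, std::uint32_t high) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

Because protected_data is set at creation time and is not visible to the
invoker, a server could for example store a client identifier in it.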

Cap objects can be passed around.  The target of the invocation cannot see if
the original recipient is calling, or some other process that was given access.
The Receiver can, however, revoke a Cap; after this, invoking it no longer
sends a message to the Receiver.  When sending a Cap, a flag specifies whether
it is mapped (the default), or copied.  A mapped Cap is revoked when its source
is revoked; a copy is not.  To give a Cap to another process and then drop it,
it must be copied.  Otherwise the new Cap is immediately revoked.
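
The difference between mapping and copying can be modelled with a small
sketch.  It assumes that each Cap remembers the Caps that were mapped from it;
the names Cap, SendMode, send and revoke are invented for this example and do
not describe the kernel's real interface.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical model of a Cap for illustrating revocation only.
struct Cap {
    bool valid = true;
    std::vector<std::shared_ptr<Cap>> mapped;  // Caps mapped from this one
};

using CapPtr = std::shared_ptr<Cap>;

enum class SendMode { Map, Copy };

// Give a Cap to someone else.  A mapped Cap stays tied to its source; a
// copied Cap is independent of it.
CapPtr send(const CapPtr &source, SendMode mode) {
    assert(source && source->valid);
    CapPtr fresh = std::make_shared<Cap>();
    if (mode == SendMode::Map)
        source->mapped.push_back(fresh);
    return fresh;
}

// Revoking (or dropping) a Cap also revokes everything mapped from it.
void revoke(const CapPtr &cap) {
    cap->valid = false;
    for (const CapPtr &child : cap->mapped)
        revoke(child);
    cap->mapped.clear();
}
```

In this model, a sender that maps a Cap and then drops its own Cap takes the
recipient's Cap down with it, which is why a copy is needed in that case.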


# Interrupts

Interrupts are handled by one or a few interrupt handlers.  In a microkernel,
it would be ideal to let userspace handle them, but that is not reasonable
given the hardware architecture.  However, it is possible for the kernel to
find out who should handle an interrupt, and then pass it to userspace.  In
Linux terms: the top half is in kernel space, but the bottom half is not.
(Note that those two halves are highly asymmetrical; the top half is very
small, the bottom half can be very large.)  So this is what Iris does.

A process can register as an interrupt handler.  When the interrupt arrives,
the kernel masks it so it isn't immediately triggered again, enables all
interrupts and sends a message to the registered process.  That process will
normally clear the interrupt condition and reregister itself as the interrupt
handler.  The reregistration is required to avoid queueing of interrupts; an
interrupt whose handler is not reregistered is no longer handled.
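
The register, deliver, reregister cycle can be sketched as a small simulation.
The names InterruptController, register_handler and raise, the number of
interrupt lines, and the choice to unmask on registration are all assumptions
for this example; the kernel's real interface goes through capabilities.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

constexpr std::size_t kNumIrqs = 8;  // hypothetical number of interrupt lines

using ProcessId = int;

struct InterruptController {
    std::array<std::optional<ProcessId>, kNumIrqs> handler{};
    std::array<bool, kNumIrqs> masked{};
    // Messages sent to userspace, as (process, irq) pairs.
    std::vector<std::pair<ProcessId, std::size_t>> delivered;

    // Register (or reregister) a handler.  Assumed to unmask the interrupt.
    void register_handler(std::size_t irq, ProcessId pid) {
        assert(irq < kNumIrqs);
        handler[irq] = pid;
        masked[irq] = false;
    }

    // The hardware raises an interrupt.  Returns true if a message was sent.
    bool raise(std::size_t irq) {
        assert(irq < kNumIrqs);
        if (masked[irq] || !handler[irq])
            return false;
        masked[irq] = true;  // keep it from triggering again immediately
        delivered.emplace_back(*handler[irq], irq);
        handler[irq].reset();  // registration is used up; no queueing
        return true;
    }
};
```

A handler that forgets to call register_handler again simply stops receiving
that interrupt, which matches the behaviour described above.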


# Userspace

When the system boots, the kernel is started with its first process.  This
process sets up userspace.  Unlike Linux init systems, the first process does
not continue running; it is hard to change (because the filesystem is not yet
accessible) and so it must be as simple as possible.

As part of the startup, drivers for built in devices are started.  These are
regular userspace programs; most of them handle interrupts and all of them have
access to memory mapped I/O.  Note that this means they are just as critical as
the kernel; in a monolithic system, only the kernel needs to be ultimately
trusted (if it is compromised, all is lost).  With a microkernel, it's both the
kernel and some parts of userspace.  The total amount of trusted code is likely
smaller in a microkernel design, because it is easier to split parts that don't
need to be critical into their own process.

A user session is a process which can start other processes and switch between
them.  For this, it contains the following components:

- A bag of device Cap objects, which can be mapped to the active process (and
  revoked when it is deactivated).  What's in the bag can change.  For
  example, if the user wants sound to continue playing while switching to
  another user, the sound Cap must not be in the bag.
- An interface for task switching: when the user makes a system request (which
  is done through dedicated hardware, such as a button), the active process is
  deactivated and the session itself (or a designated helper) is activated.  It
  allows switching to a different process, or starting a new one, or stopping
  or ending running processes.  The session can also allow communication
  between certain processes.  (The processes need to cooperate to actually make
  the link; they ask the session for a link of a certain type, for example a
  file system, and the session responds with the Cap or an error.)
- There is a list of things that can be started; an important one is a shell,
  which allows control over the session.  In other words, the shell is able to
  start and end other processes, make and break communication links, and define
  which programs can be started.
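
The handling of the device bag during a task switch can be sketched as
follows.  This is a simplified model under assumptions: devices are represented
by names instead of Cap objects, processes by integers, and the names Session,
bag, held and activate are invented for the example.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical model of a user session's device bag.
struct Session {
    std::set<std::string> bag;                  // devices that follow the focus
    std::map<int, std::set<std::string>> held;  // devices each process can use
    int active = -1;                            // -1: no active process

    // Deactivate the current process and activate another one.
    void activate(int pid) {
        if (active >= 0)
            held[active].clear();  // revoke the bag from the old process
        active = pid;
        held[pid] = bag;           // map the bag to the new process
    }
};
```

A device that should keep working across switches, such as sound in the
example above, would be given to the process directly instead of being placed
in the bag.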


# Multi user support

For multi user support, a login manager is required which can start user
sessions and switch between them.  This is very similar to what a user session
does, and so the same process is used for it.  Just a few changes are required,
and those can be implemented by choosing different helper programs.

The login manager lets the user select an identity to log in.  The login
program itself is run by the user session, so that users can change the way
they log in without asking the administrator to set it up for them.  For
example, one user may set up their login to only accept a physical crypto
device, while a guest login may be set up that doesn't require credentials at
all.