
% Iris: micro-kernel for a capability-based operating system.
% kernel.tex: Description of Iris.
% Copyright 2009 Bas Wijnen <wijnen@debian.org>
%
% This program is free software: you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation, either version 3 of the License, or
% (at your option) any later version.
%
% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
% GNU General Public License for more details.
%
% You should have received a copy of the GNU General Public License
% along with this program.  If not, see <http://www.gnu.org/licenses/>.

\documentclass{shevek}
\begin{document}
\title{Overview of Iris}
\author{Bas Wijnen}
\date{\today}
\maketitle
\begin{abstract}
This document briefly describes the inner workings of my kernel, Iris,
including the reasons for the choices that were made.  It is meant to be
understandable (with effort) for people who know nothing of operating systems.
On the other hand, it should also be readable for people who know about
computer architecture, but want to know about this kernel.  It is probably
better suited for the latter category.
\end{abstract}

\tableofcontents

\section{Operating systems}
This section describes what the purpose of an operating system is, and defines
what I call an ``operating system''\footnote{Different people use very
different definitions, so this is not as trivial as it sounds.}.  It also goes
into some detail about microkernels and capabilities.  If you already know
about these topics, you can safely skip this section.  It contains no
information about Iris.

\subsection{The goal of an operating system}
In the 1980s, a typical personal computer could only run one program at a time.  When the
program had finished, the next one could be started.  This follows the
processor itself: it runs a program, from the beginning until the end, and
can't run more than one program simultaneously\footnote{Multi-core processors
technically can run multiple programs simultaneously, but I'm not talking about
those here.}.  In those days, an \textit{operating system} was the program that
allowed other programs to be started.  The best known operating systems were
called \textit{Disk operating system}, or \textit{DOS} (of which there were
several).

At some point, there was a need for programs that would ``help'' other programs
in some way.  For example, they could provide a calculator which would pop up
when the user pressed a certain key combination.  Such programs were called
\textit{terminate and stay resident} programs, or TSRs.  This name came from
the fact that they terminated, in the sense that they would allow the next
program to be run, but they would stay resident and do their job in the
background.

At some point, people wanted to do \textit{multitasking}.  That is, multiple
``real'' programs should run concurrently, not just some helpers.  The easiest
way to implement this is with \textit{cooperative multitasking}.  Every program
returns control to the system every now and then.  The system switches between
all the running programs.  The result is that every program runs for a short
time, several times per second.  For the user, this looks like the programs are
all running simultaneously, while in reality it is similar to a chess master
playing simultaneously on many boards: he really plays on one board at a time,
but switches a lot.  On such a system, the \textit{kernel} is the program that
chooses which program to run next.  The \textit{operating system} is the kernel
plus some support programs which allow the user to control the system.
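
As a rough illustration of cooperative multitasking, here is a toy model in
Python.  It is not code from any real system; the names \texttt{program} and
\texttt{run\_cooperative} are made up for this sketch.  Each program is a
generator which does a bit of work and then yields, and the ``kernel'' simply
runs the programs in turn.

```python
from collections import deque

def program(name, steps, log):
    """A toy program: does one step of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # voluntarily return control to the system

def run_cooperative(programs):
    """Toy kernel: round-robin over the programs until all have finished."""
    ready = deque(programs)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the program yields
            ready.append(current)  # still alive: back of the queue
        except StopIteration:
            pass                   # the program has finished

log = []
run_cooperative([program("a", 2, log), program("b", 3, log)])
```

After the run, \texttt{log} contains the steps of both programs interleaved:
a0, b0, a1, b1, b2.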

On a system where multiple programs all think they ``own'' the computer, there
is another problem: if more than one program tries to access the same device,
it is very likely that at least one of them, and probably both, will fail.  For
this reason, \textit{device drivers} on a multitasking system must not only
allow the device to be controlled, but they must also make sure that concurrent
access doesn't fail.  The simplest way to achieve this is simply to disallow
it (let all operations fail that don't come from the first program using the
driver).  A better way, if the device can handle it, is to somehow make sure
that both work.
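
The simplest policy can be sketched as follows.  This is a toy model in Python
under my own assumptions, not a real driver interface: the class
\texttt{ExclusiveDriver} and its methods are invented for the illustration, and
programs are identified by a plain string.

```python
class ExclusiveDriver:
    """Toy driver: only the first program to use the device may keep using it."""

    def __init__(self):
        self.owner = None     # the program currently using the device
        self.written = []     # stands in for the device itself

    def write(self, program, data):
        if self.owner is None:
            self.owner = program   # the first user claims the device
        if self.owner != program:
            return False           # operations from anyone else fail
        self.written.append(data)
        return True

    def release(self, program):
        """Give up the device, so that another program can claim it."""
        if self.owner == program:
            self.owner = None
```

A driver following the better approach would instead accept requests from all
programs and order them in a way that the device can handle.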

There is one problem with cooperative multitasking: when one program crashes,
or for some other reason doesn't return control to the system, the other
programs stop running as well.  The solution to this is \textit{preemptive
multitasking}.  This means that every program is interrupted every now and
then, without asking for it, and the system switches to a different program.
This makes the kernel slightly more complex, because it must take care to store
every aspect of the running programs.  After all, a program doesn't expect to
be interrupted, so it relies on its state being unchanged when it resumes.  This
shouldn't
be a problem though.  It's just something to remember when writing the kernel.
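
The state saving can be sketched with a toy model in Python.  It is only an
illustration under simplifying assumptions: a ``program'' is a list of tiny
instructions working on a register dictionary, the timer interrupt is modelled
as a fixed number of instructions per turn, and all names are invented for this
sketch.

```python
def add(n):
    """Build a toy instruction which adds n to register x."""
    def instruction(regs):
        regs["x"] = regs.get("x", 0) + n
    return instruction

def make_task(code):
    # The saved state of a task: its program counter and its registers.
    return {"code": code, "pc": 0, "regs": {}}

def run_preemptive(tasks, slice_len):
    """Run all tasks to completion, switching after slice_len instructions."""
    cpu = {"pc": 0, "regs": {}}            # the single, shared processor
    while any(t["pc"] < len(t["code"]) for t in tasks):
        for t in tasks:
            # Restore the complete saved state of this task.
            cpu["pc"], cpu["regs"] = t["pc"], dict(t["regs"])
            for _ in range(slice_len):     # runs until the "timer interrupt"
                if cpu["pc"] >= len(t["code"]):
                    break
                t["code"][cpu["pc"]](cpu["regs"])
                cpu["pc"] += 1
            # Save every aspect of the state before switching away.
            t["pc"], t["regs"] = cpu["pc"], dict(cpu["regs"])

tasks = [make_task([add(1)] * 3), make_task([add(10)] * 3)]
run_preemptive(tasks, slice_len=2)
```

Although both tasks use the same register name and are interrupted halfway,
each one ends with the value it would have had when running alone (3 and 30).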

In conclusion, every modern desktop kernel uses preemptive multitasking.  This
requires a timer interrupt.  The operating system consists of this kernel, plus
the support programs that allow the user to control the system.

\subsection{Microkernel}
Most modern kernels are so-called \textit{monolithic} kernels: they include
most of the operating system.  In particular, they include the device drivers.
This is useful, because the device drivers need special attention anyway, and
they are very kernel-specific.  Modern processors allow the kernel to protect
access to the hardware, so that programs can't interfere with each other.  A
device driver which doesn't properly ask the kernel will simply not be allowed
to control the device.

However, adding device drivers and everything that comes with them
(filesystems, for example) to the kernel makes it a very large program.
Furthermore, it makes it an ever-changing program: as new devices are built,
new drivers must be added.  Such a program can never become stable and
bug-free.

Conceptually much nicer is the microkernel.  It includes the minimum that is
needed for a kernel, and nothing more.  It does include task switching and some
method for tasks to communicate with each other.  It also ``handles'' hardware
interrupts, but all it really does is pass them to the device driver, which
is mostly a normal program.  Some microkernels don't do memory management
(deciding which programs get how much and which memory), while others do.

The drawback of a microkernel is that it requires much more communication
between tasks.  Where a monolithic kernel can serve a driver request from a
task directly, a microkernel must pass it to a device driver.  Usually there
will be an answer, which must be passed back to the task.  This means more task
switches.  This doesn't need to be a big problem, if task switching is
optimized: because of the simpler structure of the microkernel, it can be much
faster at this than a monolithic kernel.  And even if the end result is
slightly slower, in my opinion the stability is still enough reason to prefer a
microkernel over a monolithic one.

Summarizing, a microkernel needs to do task switching and inter-process
communication.  Because mapping memory into an address space is closely related
to task switching, it is possible to include memory management as well.  The
kernel must accept hardware interrupts, but doesn't handle them (except the
timer interrupt).

\subsection{Capabilities}
Above I explained that the kernel must allow processes to communicate.  Many
systems allow communication through the filesystem: one process writes to a
file, and another process reads from it.  This implies that any process can
communicate with any other process, if they only have a place to write in the
filesystem, where the other can read.

This is a problem because of security.  If a process cannot communicate with
any part of the system, except the parts that it really needs to perform its
operation, it cannot leak or damage the other parts of the system either.  The
reason that this is relevant is not that users will run programs that try to
ruin their system (although this may happen as well), but that programs may
break and damage random parts of the system, or be taken over by
crackers\footnote{Crackers are better known by the public as ``hackers''.
However, I use the word ``hacker'' to describe people who like to play with
software (or sometimes also with other things).  Therefore the malicious people
who use hacking skills for evil need a different name.}.  If the broken or malicious
process has fewer rights, it will also do less damage to the system.

This leads to the goal of giving each process as few rights as possible.
For this, it is best to have rights in a very fine-grained way.  Every
operation of a driver (be it a hardware device driver, or just a shared program
such as a file system) should have its own key, which can be given out without
giving keys to the entire driver (or even multiple drivers).  Such a key is
called a capability.  For example, a capability can allow the holder to access
a single file, or to use one specific network connection, or to see what keys
are typed by the user.

Some operations are performed directly on the kernel itself.  For those, the
kernel can provide its own capabilities.  Processes can create their own
objects which can receive capability calls, and capabilities for those can be
generated by them.  Processes can copy capabilities to other processes, if they
have a channel to send them (using an existing capability).  This way, any
operation of the process with the external world goes through a capability, and
only one system call is needed: \textit{invoke}.
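
A minimal sketch of this idea, as a toy model in Python: a process holds
capabilities in numbered slots, and the only thing it can do is invoke one of
them.  The slot table, the \texttt{Process} class and the file names are
assumptions made for this illustration; they are not Iris's data structures.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.caps = {}   # slot number -> capability (hypothetical layout)

def invoke(process, slot, *args):
    """The single system call: invoke the capability in one of the caller's slots."""
    target = process.caps.get(slot)
    if target is None:
        raise PermissionError(f"{process.name} holds no capability in slot {slot}")
    return target(*args)

files = {"notes.txt": "hello", "secret.txt": "key"}

def file_reader(name):
    """Create a capability for reading one specific file, and nothing else."""
    return lambda: files[name]

viewer = Process("viewer")
viewer.caps[0] = file_reader("notes.txt")   # the only right this process gets
```

Here \texttt{invoke(viewer, 0)} returns the contents of the one file, while the
viewer has no way to even name the other file.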

This has a very nice side-effect, which is that it becomes very easy to tap
communication of a task you control.  This means that a user can redirect
certain requests from programs which don't do exactly what is desired to do
nicer things.  For example, a program can be prevented from opening pop-up
windows.  In other words, it moves control of the computer from the programmer
into the hands of the user (as far as allowed by the system administrator).
This is a very good thing.
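
The pop-up example can be sketched with a toy model in Python, in which a
capability is just a callable.  The functions \texttt{window\_server},
\texttt{no\_popups} and \texttt{program} are invented for this illustration.

```python
def window_server(request):
    """Stand-in for a real window server (hypothetical)."""
    return f"opened {request}"

def no_popups(target):
    """Wrap a capability: forward everything except pop-up requests."""
    def filtered(request):
        if request == "popup":
            return "refused"
        return target(request)
    return filtered

def program(windows):
    """The program only sees 'a capability'; it cannot tell which one it got."""
    return [windows("main"), windows("popup")]
```

The user decides whether the program is started with \texttt{window\_server}
itself or with \texttt{no\_popups(window\_server)}; the program's code is the
same in both cases.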

\section{Communication}
This section briefly describes how communication between threads is performed
by Iris.  More details about the kernel structures follow below; this section
just explains which steps are taken.

Iris doesn't hold any state about the communication, other than the state that
it holds for threads on request of the threads (in the memory paid for by the
threads).  For Iris, there is no such thing as a \textit{conversation}.  There
are messages.  When there is a conversation, Iris just sees several messages
going both ways.  For Iris these are not connected\footnote{This is not
entirely true; Iris has call capabilities as an optimization feature.  They do
implement some conversation aspects.  But they are only an optimization: Iris
doesn't require them to be used.}.

So understanding communication between threads boils down to understanding the
transfer of a single message.  A message is short: four 32-bit words of data,
plus four capabilities.  Besides that, a 64-bit protected value is sent.  This
value is set by the creator of the capability, usually the server, and cannot
be changed by the invoker.
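
The shape of such a message can be written down as a small Python model.  This
is a sketch for illustration only: the field names and the byte layout used by
\texttt{payload} are my assumptions here, not the layout used by Iris.

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    data: tuple        # four 32-bit words
    caps: tuple        # four capability slots, None when unused
    protected: int     # 64-bit value, set by the creator of the capability

    def __post_init__(self):
        if len(self.data) != 4 or len(self.caps) != 4:
            raise ValueError("a message has exactly four words and four capabilities")
        if any(not 0 <= w < 2**32 for w in self.data):
            raise ValueError("data words are 32 bits")
        if not 0 <= self.protected < 2**64:
            raise ValueError("the protected value is 64 bits")

    def payload(self):
        """Pack the non-capability part (assumed layout): 4 * 4 + 8 = 24 bytes."""
        return struct.pack("=4IQ", *self.data, self.protected)
```

Anything larger than this has to be transferred in some other way, for example
by sending a capability to an object which holds the data.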

Sending a message between threads is mostly about a Receiver object.  The
server has a Receiver, for which it creates a capability (with the mentioned
protected data).  If a client wants to contact the server, it must get this
capability.

The client then invokes the capability with four data words and four
capabilities (possibly set to 0).  The message is queued by the receiver.  The
capabilities are stored into a Caps object.

When the server is ready for it, it queries the receiver for new messages.  It
then gets the protected data, the four data words and copies of the
capabilities.  The ones in the receiver's Caps are invalidated and can be
reused after that.  Note that it does not get a capability of the sender,
unless the sender sends it.  There is no way for the server to know who is
sending the message, only which capability was used (through the protected
data).
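
These steps can be sketched with a toy model in Python.  It is an illustration
under simplifying assumptions, not Iris's interface: capabilities are plain
callables, the Caps object is not modelled separately, and the method names are
invented.

```python
from collections import deque

class Receiver:
    def __init__(self):
        self.queue = deque()

    def create_capability(self, protected):
        """Return an invocable capability tagged with a protected value."""
        def capability(data, caps=(None, None, None, None)):
            # The invoker supplies data and capabilities, never the protected value.
            self.queue.append((protected, tuple(data), tuple(caps)))
        return capability

    def next_message(self):
        """Called by the server when it is ready; None if nothing is queued."""
        return self.queue.popleft() if self.queue else None

receiver = Receiver()
for_alice = receiver.create_capability(protected=1)
for_bob = receiver.create_capability(protected=2)
for_bob((9, 0, 0, 0))
for_alice((7, 0, 0, 0))
```

The server sees the protected values 2 and then 1, so it knows which capability
was used for each message, but nothing else about the senders.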

\section{Kernel objects}
This section describes all kernel objects of Iris, and the operations that can
be performed on them.  One operation is possible on any kernel object (except a
message and reply and call Capabilities).  This operation is \textit{degrade}.
It creates a copy of the capability with some rights removed.  This can be
useful when giving away a capability.
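
As an illustration, degrading can be modelled in Python with a set of rights
stored as bit flags.  The rights \texttt{READ}, \texttt{WRITE} and
\texttt{DESTROY} are placeholders chosen for this sketch; the actual rights
differ per object type and are defined in the source.

```python
from enum import Flag, auto

class Rights(Flag):
    READ = auto()
    WRITE = auto()
    DESTROY = auto()

class Capability:
    def __init__(self, target, rights):
        self.target = target
        self.rights = rights

    def degrade(self, remove):
        """Return a copy with some rights removed; rights can never be added."""
        return Capability(self.target, self.rights & ~remove)

full = Capability("some file", Rights.READ | Rights.WRITE | Rights.DESTROY)
read_only = full.degrade(Rights.WRITE | Rights.DESTROY)
```

The original capability keeps all its rights; only the copy which is given away
is restricted.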

\subsection{Memory}
A Memory object is a container for storing things.  All objects live inside a
Memory object.  A Memory object can contain other Memory objects, Capabilities,
Receivers, Threads, Pages and Cappages.

A Memory object is also an address space.  Pages can be mapped (and unmapped).
Any Thread in a Memory object uses this address space while it is running.

Every Memory object has a limit.  When this limit is reached, no more Pages can
be allocated for it (including Pages which it uses to store other objects).
Using a new Page in a Memory object implies using it in all ancestor Memory
objects.  This means that if a limit is set higher than the parent's limit,
the parent's limit applies anyway.
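
The effect of nested limits can be shown with a toy model in Python.  The class
below is a sketch for illustration only (counting Pages, nothing else); its
names are not taken from the Iris source.

```python
class Memory:
    def __init__(self, limit, parent=None):
        self.limit = limit     # maximum number of Pages
        self.parent = parent
        self.used = 0

    def allocate_page(self):
        """Use one Page here and in every ancestor, or fail without side effects."""
        chain = []
        node = self
        while node is not None:
            if node.used >= node.limit:
                return False   # this Memory or an ancestor is full
            chain.append(node)
            node = node.parent
        for node in chain:
            node.used += 1
        return True

root = Memory(limit=2)
child = Memory(limit=5, parent=root)
results = [child.allocate_page() for _ in range(3)]
```

The child's own limit is 5, but the third allocation fails because the parent
only allows 2.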

Operations on Memory objects:
\begin{itemize}
\item Create a new item of type Receiver, Memory, Thread, Page, or Cappage.
\item Destroy an item of any type, which is owned by the Memory.
\item List items owned by the Memory, Pages mapped in it, and messages in the
queues of owned Receivers.
\item Map a Page at an address.
\item Get the Page which is mapped at a certain address.
\item Get and set the limit, which is checked when allocating pages for this
Memory or any sub-structure.
\item Drop a capability.  This can only be done by Threads owned by the Memory,
because only they can present capabilities owned by it.\footnote{Iris checks if
presented capabilities are owned by the Thread's Memory.  If they aren't, no
capability is passed instead.  The destroy operation destroys an object that a
capability points to.  Drop destroys the capability itself.  If a Thread from
another Memory tried to drop a capability, Iris would refuse to send it in
the message, or it would not be dropped because it would be owned by a
different Memory.}
\end{itemize}

\subsection{Receiver}
A receiver object is used for inter-process communication.  Capabilities can be
created from it.  When those are invoked, the receiver can be used to retrieve
the message.

Operations on Receiver objects:
\begin{itemize}
\item Set the owner.  This is the Thread that messages will be sent to when
they arrive.  Messages are stored in the receiver until the owner is ready to
accept them.  If the owner is already waiting when a message arrives, the
message is delivered immediately.
\item Create a capability.  The new capability should be given to Threads who
need to send a message to the receiver.
\item Create a call capability.  This is an optimization for \textit{calls},
which happen a lot: a capability is created and sent in a message, then a reply
is sent over this new capability, and then it is dropped.  A call capability
does all of this at once.  The call capability is invoked instead of
the target, and the target is specified where the reply capability should be.
The message is sent to the call capability (which is handled by the Receiver in
the kernel).  It creates a new reply capability and sends the message to the
target with it.  When the reply capability is invoked, the message is sent to
the owner, and the capability is dropped.  This approach reduces the number of
kernel calls from four (create, call, reply, drop) to two (call, reply).
\end{itemize}
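
The saving in kernel calls can be made concrete with a toy model in Python.
This is a sketch under my own assumptions, not Iris's interface: the class
\texttt{Kernel} only counts calls, capabilities are plain Python callables, and
the server simply doubles the number it is sent.

```python
class Kernel:
    """Toy kernel which only counts kernel calls."""

    def __init__(self):
        self.calls = 0

    def create(self, inbox):
        """Kernel call: create a reply capability which delivers into inbox."""
        self.calls += 1
        return inbox.append

    def invoke(self, cap, *args):
        """Kernel call: invoke a capability."""
        self.calls += 1
        return cap(*args)

    def drop(self, cap):
        """Kernel call: drop a capability (nothing to clean up in this model)."""
        self.calls += 1

    def call(self, target, request, inbox):
        """Kernel call on a call capability: the kernel creates the reply
        capability itself and passes it along with the request."""
        self.calls += 1
        target(self, request, self._one_shot(inbox))

    def _one_shot(self, inbox):
        used = []
        def reply(value):
            if used:
                raise RuntimeError("reply capability was already dropped")
            used.append(True)      # dropped automatically after one use
            inbox.append(value)
        return reply

def server(kernel, request, reply_cap):
    kernel.invoke(reply_cap, request * 2)   # replying is one kernel call

# Without a call capability: create, call, reply, drop.
k = Kernel()
inbox = []
cap = k.create(inbox)
k.invoke(server, k, 21, cap)
k.drop(cap)

# With a call capability: call, reply.
k2 = Kernel()
inbox2 = []
k2.call(server, 21, inbox2)
```

Both variants deliver the same reply; the first counts four kernel calls and the
second counts two.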

\subsection{Thread}
Thread objects hold the information about the current state of a thread.  This
state is used to continue running the thread.  The address space is used to map
the memory for the Thread.  Different Threads in the same address space have
the same memory mapping.  All Threads in one address space (often there is only
one) together are called a process.

Because all threads have a capability to their own Thread object (for claiming
Receivers), this is also used to make some calls which don't actually need an
object.  The reason that these are not operations on some fake object which
every process implicitly owns, is that for debugging it may be useful to see
every action of a process.  In that case, all its capabilities must be pointing
to the watcher, which will send them through to the actual target (or not).
With such an implicit capability, it would be impossible to intercept these
calls.

Operations on Thread objects:
\begin{itemize}
\item Get information about the thread.  Details of this are
architecture-specific.  Standard ways are defined for getting and setting some
flags (whether the process is running or waiting for a message; setting these
flags is a way to control this for other Threads), the program counter and the
stack pointer.  This call is also used to get the contents of processor
registers and possibly other information which is different per Thread.
\item Let Iris schedule the next process.  This is not thread-specific.
\item Get the top Memory object.  This is not thread-specific.  Most Threads
are not allowed to perform this operation.  It is given to the initial Threads.
They can pass it on to Threads that need it (if any).
\item In the same category, register a Receiver for an interrupt.  Upon
registration, the interrupt is enabled.  When the interrupt arrives, the
registered Receiver gets a message from Iris and the interrupt is disabled
again.  After the Thread has handled the interrupt, it must reregister it in
order to enable it again.
\item Allocate a range of contiguous physical memory.  This is only relevant
for device drivers whose device will directly access the storage, such as the
display driver.  The result of this call is that the memory is counted as used
by the Thread, and it is reserved, but it is not returned.  Instead, the
address of physical memory is returned, and the pages need to be retrieved with
the next operation.  This capability is not present in normally created
threads.
\item Allocate a page of physical memory.  This is used in combination with the
previous operation to reserve a block of physical memory, and by device drivers
to map I/O memory into their address space.  There is a flag indicating whether
this memory should be freed (ranges) or not (I/O).  Users of this operation are
trusted to handle it properly; no checks are done to ensure that no kernel
memory is leaked, or that the allocated memory isn't used by other threads or
the kernel.  Of course, this capability is not present in normally created
threads.
\item Get the physical address of a page.  Only device drivers need to know the
physical address of their pages, so this operation is not available on normal
threads.
\item And similarly, allow these privileged operations (or some of them) in
another thread.  This is a property of the caller, because the target thread
normally doesn't have the permission to do this (otherwise the call would not
be needed).  The result of this operation is a new Thread capability with all
specified rights set.  Normally this is inserted in a privileged process's
address space during setup, before it is run (instead of the capability which
is obtained during Thread creation).
\end{itemize}
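
The interrupt registration in the list above works as a one-shot mechanism,
which can be sketched with a toy model in Python.  The class and method names
are invented for this illustration, the Receiver is modelled as a plain list,
and an interrupt which fires while disabled is simply ignored here, which is a
simplification.

```python
class InterruptController:
    def __init__(self):
        self.registered = {}   # interrupt number -> Receiver (a list here)

    def register(self, irq, receiver):
        """Registering a Receiver also enables the interrupt."""
        self.registered[irq] = receiver

    def enabled(self, irq):
        return irq in self.registered

    def fire(self, irq):
        """The hardware raised irq: deliver one message and disable it again."""
        receiver = self.registered.pop(irq, None)
        if receiver is None:
            return False       # disabled: nothing is delivered in this model
        receiver.append(("interrupt", irq))
        return True
```

A driver written against this model registers its Receiver, waits for the
message, handles the device, and then registers again before waiting for the
next interrupt.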

\subsection{Page and Cappage}
A Page can be used to store user data.  It can be mapped into an address space
(a Memory object).  Threads can then use the data directly.  A Cappage is very
similar, in that it is owned by the user.  However, the user cannot see its
contents directly.  It contains a frame with Capabilities.  They can be invoked
like other owned capabilities.  The main feature of a Cappage, however, is that
it can be shared.  It is a fast way to copy many capabilities to a different
address space.  Capabilities in a Cappage are not directly owned by the Memory,
and thus cannot be dropped.

Operations on Page and Cappage objects:
\begin{itemize}
\item Copy or move the frame to a different Page, which is usually in a
different Memory.  This way, large amounts of data can be copied between
address spaces without needing to really copy it.
\item Set or get flags, which contain information on whether the page is
shared, is writable, has a frame allocated, and is paying for the frame.  Not
all flags can be set in all cases.
\item Cappages can also set a capability in the frame (pointed to with an index).
\end{itemize}
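
Copying and moving a frame without copying the data can be sketched with a toy
model in Python.  This is an illustration only: the method names are invented,
flags and payment for the frame are left out, and the frame size of 4096 bytes
is an assumption.

```python
class Page:
    def __init__(self):
        self.frame = None      # the actual storage, or None if not allocated

    def allocate(self, size=4096):
        self.frame = bytearray(size)

    def copy_frame_to(self, other):
        """Share the frame: both Pages now refer to the same storage."""
        other.frame = self.frame

    def move_frame_to(self, other):
        """Hand the frame over; this Page no longer has one."""
        other.frame, self.frame = self.frame, None

source = Page()
source.allocate()
source.frame[0] = 7
shared = Page()
source.copy_frame_to(shared)   # no bytes are copied, only the reference
```

Because \texttt{source} and \texttt{shared} refer to the same frame, the cost of
the operation does not depend on the amount of data in it.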

\subsection{Capability}
A capability object can be invoked to send a message to a receiver or to Iris
itself.  The owner cannot see from the capability where it points.  This is
important, because the user must be able to substitute the capability for a
different one, without the program noticing.  In some cases, it is necessary to
refer to capabilities themselves.  For example, a Memory can list the Capabilities
owned by it.  In such a case, the list consists of Capabilities which point to
other Capabilities.  These capabilities can also be used to destroy the target
capability (using an operation on the owning Memory object), for example.

Operations on Capability objects:
\begin{itemize}
\item Get a copy of the capability.
\end{itemize}

\section{Interface classes}
Around Iris is a set of programs which together form the operating system.  These
include the device drivers.  While Iris itself needs no specific interfaces
from them, some interface classes are defined, which are used by the default
environment.  By defining classes, it is possible to let a program use any
device of that type without needing changes to its code.

These definitions are in the source.  A copy of the information here would only
lead to it getting outdated.

\end{document}