% Iris: micro-kernel for a capability-based operating system.
% kernel.tex: Description of Iris.
% Copyright 2009 Bas Wijnen <wijnen@debian.org>
%
% This program is free software: you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation, either version 3 of the License, or
% (at your option) any later version.
%
% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
% GNU General Public License for more details.
%
% You should have received a copy of the GNU General Public License
% along with this program. If not, see <http://www.gnu.org/licenses/>.

\documentclass{shevek}
\begin{document}
\title{Overview of Iris}
\author{Bas Wijnen}
\date{\today}
\maketitle
\begin{abstract}
This document briefly describes the inner workings of my kernel, Iris,
including the reasons for the choices that were made. It is meant to be
understandable (with effort) for people who know nothing of operating systems.
On the other hand, it should also be readable for people who know about
computer architecture, but want to know about this kernel. It is probably
better suited for the latter category.
\end{abstract}

\tableofcontents
\section{Operating systems}
This section describes what the purpose of an operating system is, and defines
what I call an ``operating system''\footnote{Different people use very
different definitions, so this is not as trivial as it sounds.}. It also goes
into some detail about microkernels and capabilities. If you already know these
topics, you can safely skip this section. It contains no information about Iris.

\subsection{The goal of an operating system}
In the 1980s, a typical personal computer could only run one program at a time.
When the program had finished, the next one could be started. This follows the
processor itself: it runs a program, from the beginning until the end, and
can't run more than one program simultaneously\footnote{Multi-core processors
technically can run multiple programs simultaneously, but I'm not talking about
those here.}. In those days, an \textit{operating system} was the program that
allowed other programs to be started. The best known operating systems were
called \textit{Disk operating system}, or \textit{DOS} (of which there were
several).

At some point, there was a need for programs that would ``help'' other programs
in some way. For example, they could provide a calculator which would pop up
when the user pressed a certain key combination. Such programs were called
\textit{terminate and stay resident} programs, or TSRs. This name came from
the fact that they terminated, in the sense that they would allow the next
program to be run, but they would stay resident and do their job in the
background.

Later, people wanted to do \textit{multitasking}. That is, multiple
``real'' programs should run concurrently, not just some helpers. The easiest
way to implement this is with \textit{cooperative multitasking}. Every program
returns control to the system every now and then. The system switches between
all the running programs. The result is that every program runs for a short
time, several times per second. For the user, this looks like the programs are
all running simultaneously, while in reality it is similar to a chess master
playing simultaneously on many boards: he really plays on one board at a time,
but switches a lot. On such a system, the \textit{kernel} is the program that
chooses which program to run next. The \textit{operating system} is the kernel
plus some support programs which allow the user to control the system.

On a system where multiple programs all think they ``own'' the computer, there
is another problem: if more than one program tries to access the same device,
it is very likely that at least one of them, and probably both, will fail. For
this reason, \textit{device drivers} on a multitasking system must not only
allow the device to be controlled, but they must also make sure that concurrent
access doesn't fail. The simplest way to achieve this is simply to disallow
it (let all operations fail that don't come from the first program using the
driver). A better way, if the device can handle it, is to somehow make sure
that both work.

There is one problem with cooperative multitasking: when one program crashes,
or for some other reason doesn't return control to the system, the other
programs stop running as well. The solution to this is \textit{preemptive
multitasking}. This means that every program is interrupted every now and
then, without asking for it, and the system switches to a different program.
This makes the kernel slightly more complex, because it must take care to store
every aspect of the running programs. After all, the program doesn't expect to
be interrupted, so it can't expect its state to change either. This shouldn't
be a problem though. It's just something to remember when writing the kernel.

In conclusion, every modern desktop kernel uses preemptive multitasking. This
requires a timer interrupt. The operating system consists of this kernel, plus
the support programs that allow the user to control the system.
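The cooperative scheme described in this subsection can be illustrated with a
small sketch. It is not taken from Iris; it is a toy model in Python, where each
``program'' is a generator and the \texttt{yield} statement stands for returning
control to the system. The names \texttt{program}, \texttt{run} and
\texttt{log} are made up for this illustration.

```python
from collections import deque

def program(name, steps, log):
    """A toy program: does one unit of work, then returns control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative: hand control back to the kernel

def run(programs):
    """Toy kernel: round-robin over the programs until all have finished."""
    ready = deque(programs)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run until the program yields
            ready.append(current)  # still alive: back of the queue
        except StopIteration:
            pass                   # program finished; forget about it

log = []
run([program("a", 2, log), program("b", 3, log)])
print(log)
```

A program that never reaches its \texttt{yield} would keep \texttt{run} waiting
forever, which is exactly the weakness that preemptive multitasking (and its
timer interrupt) removes.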
\subsection{Microkernel}
Most modern kernels are so-called \textit{monolithic} kernels: they include
most of the operating system. In particular, they include the device drivers.
This is useful, because the device drivers need special attention anyway, and
they are very kernel-specific. Modern processors allow the kernel to protect
access to the hardware, so that programs can't interfere with each other. A
device driver which doesn't properly ask the kernel will simply not be allowed
to control the device.

However, adding device drivers and everything that comes with them
(filesystems, for example) to the kernel makes it a very large program.
Furthermore, it makes it an ever-changing program: as new devices are built,
new drivers must be added. Such a program can never become stable and
bug-free.

Conceptually much nicer is the microkernel. It includes the minimum that is
needed for a kernel, and nothing more. It does include task switching and some
method for tasks to communicate with each other. It also ``handles'' hardware
interrupts, but all it really does is pass them to the device driver, which
is mostly a normal program. Some microkernels don't do memory management
(deciding which programs get how much and which memory), while others do.

The drawback of a microkernel is that it requires much more communication
between tasks. Where a monolithic kernel can serve a driver request from a
task directly, a microkernel must pass it to a device driver. Usually there
will be an answer, which must be passed back to the task. This means more task
switches. This doesn't need to be a big problem, if task switching is
optimized: because of the simpler structure of the microkernel, it can be much
faster at this than a monolithic kernel. And even if the end result is
slightly slower, in my opinion the stability is still enough reason to prefer a
microkernel over a monolithic one.

Summarizing, a microkernel needs to do task switching and inter-process
communication. Because mapping memory into an address space is closely related
to task switching, it is possible to include memory management as well. The
kernel must accept hardware interrupts, but doesn't handle them (except the
timer interrupt).
\subsection{Capabilities}
Above I explained that the kernel must allow processes to communicate. Many
systems allow communication through the filesystem: one process writes to a
file, and another process reads from it. This implies that any process can
communicate with any other process, if they only have a place to write in the
filesystem, where the other can read.

This is a problem because of security. If a process cannot communicate with
any part of the system, except the parts that it really needs to perform its
operation, it cannot leak or damage the other parts of the system either. The
reason that this is relevant is not that users will run programs that try to
ruin their system (although this may happen as well), but that programs may
break and damage random parts of the system, or be taken over by
crackers\footnote{Crackers are better known by the public as ``hackers''.
However, I use the word ``hacker'' to describe people who like to play with
software (or sometimes also with other things). Therefore the malicious people
who use hacking skills for evil need a different name.}. If the broken or
malicious process has fewer rights, it will also do less damage to the system.

This leads to the goal of giving each process as few rights as possible.
For this, it is best to make rights very fine-grained. Every
operation of a driver (be it a hardware device driver, or just a shared program
such as a file system) should have its own key, which can be given out without
giving keys to the entire driver (or even multiple drivers). Such a key is
called a capability. For example, a capability can allow the holder to access
a single file, or to use one specific network connection, or to see what keys
are typed by the user.

Some operations are performed directly on the kernel itself. For those, the
kernel can provide its own capabilities. Processes can create their own
objects which can receive capability calls, and capabilities for those can be
generated by them. Processes can copy capabilities to other processes, if they
have a channel to send them (using an existing capability). This way, any
operation of the process with the external world goes through a capability, and
only one system call is needed: \textit{invoke}.

This has a very nice side-effect, which is that it becomes very easy to tap
communication of a task you control. This means that a user can redirect
certain requests from programs which don't do exactly what is desired to do
nicer things. For example, a program can be prevented from opening pop-up
windows. In other words, it moves control of the computer from the programmer
into the hands of the user (as far as allowed by the system administrator).
This is a very good thing.
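The tapping of communication described above can be sketched as follows. This
is a toy model in Python, not Iris code: a capability is reduced to an object
with a single \texttt{invoke} method, and the request names
(\texttt{open-window}, \texttt{open-popup}) as well as the helper
\texttt{make\_filter} are invented for the example.

```python
class Capability:
    """Toy capability: an opaque handle that can only be invoked."""
    def __init__(self, handler):
        self._handler = handler

    def invoke(self, *data):
        return self._handler(*data)

def window_server(request, title):
    """Stand-in for a real service behind a capability."""
    return f"{request} {title}: done"

def make_filter(target, blocked_requests, seen):
    """Wrap target in a new capability that records and filters requests."""
    def handler(request, *rest):
        seen.append(request)
        if request in blocked_requests:
            return "refused"
        return target.invoke(request, *rest)
    return Capability(handler)

def application(display):
    # The program only ever sees 'display'; it cannot tell what is behind it.
    return [display.invoke("open-window", "main"),
            display.invoke("open-popup", "advert")]

real = Capability(window_server)
seen = []
filtered = make_filter(real, {"open-popup"}, seen)
direct_result = application(real)
filtered_result = application(filtered)
```

The application code is identical in both runs; only the capability that the
user hands to it differs.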
\section{Communication}
This section briefly describes how communication between threads is performed
by Iris. More details about the kernel structures follow below; this section
just explains which steps are taken.

Iris doesn't hold any state about the communication, other than the state that
it holds for threads on request of the threads (in the memory paid for by the
threads). For Iris, there is no such thing as a \textit{conversation}. There
are messages. When there is a conversation, Iris just sees several messages
going both ways. For Iris these are not connected\footnote{This is not
entirely true; Iris has call capabilities as an optimization feature. They do
implement some conversation aspects. But they are only an optimization: Iris
doesn't require them to be used.}.

So understanding communication between threads boils down to understanding the
transfer of a single message. A message is short: four 32-bit words of data,
plus four capabilities. Besides that, a 64-bit protected value is sent. This
value is set by the creator of the capability, usually the server, and cannot
be changed by the invoker.

Sending a message between threads is mostly about a Receiver object. The
server has a Receiver, for which it creates a capability (with the mentioned
protected data). If a client wants to contact the server, it must get this
capability.

The client then invokes the capability with four data words and four
capabilities (possibly set to 0). The message is queued by the receiver. The
capabilities are stored into a Caps object.

When the server is ready for it, it queries the receiver for new messages. It
then gets the protected data, the four data words and copies of the
capabilities. The ones in the receiver's Caps are invalidated and can be
reused after that. Note that it does not get a capability of the sender,
unless the sender sends it. There is no way for the server to know who is
sending the message, only which capability was used (through the protected
data).
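The steps of this section can be put together in a small model. It is only a
sketch in Python under several assumptions: the class and method names
(\texttt{Message}, \texttt{create\_capability}, \texttt{next\_message}) are not
the names Iris uses, unused capability slots are represented by \texttt{None}
rather than 0, and data words that are too large are simply truncated to 32
bits.

```python
from collections import deque
from dataclasses import dataclass

WORD_MASK = 0xFFFFFFFF                # data words are 32 bits
PROTECTED_MASK = 0xFFFFFFFFFFFFFFFF   # the protected value is 64 bits

@dataclass(frozen=True)
class Message:
    protected: int   # set by the capability's creator, not by the sender
    data: tuple      # exactly four 32-bit words
    caps: tuple      # exactly four capability slots (None = unused)

class Receiver:
    def __init__(self):
        self._queue = deque()

    def create_capability(self, protected):
        return Capability(self, protected & PROTECTED_MASK)

    def _enqueue(self, message):
        self._queue.append(message)

    def next_message(self):
        """Return the oldest queued message, or None if there is none."""
        return self._queue.popleft() if self._queue else None

class Capability:
    def __init__(self, receiver, protected):
        self._receiver = receiver
        self._protected = protected

    def invoke(self, data=(0, 0, 0, 0), caps=(None,) * 4):
        if len(data) != 4 or len(caps) != 4:
            raise ValueError("a message has four words and four capabilities")
        words = tuple(w & WORD_MASK for w in data)  # assumption: truncate
        self._receiver._enqueue(Message(self._protected, words, tuple(caps)))

server = Receiver()
alice = server.create_capability(1)   # the server hands these out
bob = server.create_capability(2)
bob.invoke((10, 20, 30, 40))
alice.invoke((1, 0, 0, 0x1FFFFFFFF))
first = server.next_message()
second = server.next_message()
```

The server can tell the two senders apart only because it chose different
protected values when it created their capabilities; nothing in the message
itself identifies the sender.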
\section{Kernel objects}
This section describes all kernel objects of Iris, and the operations that can
be performed on them. One operation is possible on any kernel object (except a
message, and reply and call Capabilities). This operation is \textit{degrade}.
It creates a copy of the capability with some rights removed. This can be
useful when giving away a capability.
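As an illustration of \textit{degrade}, here is a minimal sketch in Python. It
assumes that rights can be modelled as a set of flags; the three rights shown
(\texttt{READ}, \texttt{WRITE}, \texttt{DESTROY}) are hypothetical and are not
the rights that Iris defines.

```python
from enum import Flag, auto

class Rights(Flag):
    # Hypothetical rights; the real set of rights is defined by Iris.
    READ = auto()
    WRITE = auto()
    DESTROY = auto()

class Capability:
    def __init__(self, target, rights):
        self.target = target
        self.rights = rights

    def degrade(self, keep):
        """Return a copy whose rights are the intersection with keep."""
        return Capability(self.target, self.rights & keep)

full = Capability("some object", Rights.READ | Rights.WRITE | Rights.DESTROY)
read_only = full.degrade(Rights.READ)
# Asking for more than the capability has cannot add rights back.
still_read_only = read_only.degrade(Rights.READ | Rights.WRITE)
```

Because the new rights are an intersection, a degraded capability can never be
turned back into a stronger one by degrading it again.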
\subsection{Memory}
A Memory object is a container for storing things. All objects live inside a
Memory object. A Memory object can contain other Memory objects, Capabilities,
Receivers, Threads, Pages and Cappages.

A Memory object is also an address space. Pages can be mapped (and unmapped).
Any Thread in a Memory object uses this address space while it is running.

Every Memory object has a limit. When this limit is reached, no more Pages can
be allocated for it (including Pages which it uses to store other objects).
Using a new Page in a Memory object implies using it in all ancestor Memory
objects. Consequently, if a limit is set higher than the parent's limit, the
parent's limit applies anyway.

Operations on Memory objects:
\begin{itemize}
\item Create a new item of type Receiver, Memory, Thread, Page, or Cappage.
\item Destroy an item of any type, which is owned by the Memory.
\item List items owned by the Memory, Pages mapped in it, and messages in owned
Receivers' queues.
\item Map a Page at an address.
\item Get the Page which is mapped at a certain address.
\item Get and set the limit, which is checked when allocating pages for this
Memory or any sub-structure.
\item Drop a capability. This can only be done by Threads owned by the Memory,
because only they can present capabilities owned by it.\footnote{Iris checks if
presented capabilities are owned by the Thread's Memory. If they aren't, no
capability is passed instead. The destroy operation destroys an object that a
capability points to. Drop destroys the capability itself. If a Thread from
another Memory were to try to drop a capability, Iris would refuse to send it in
the message, or it would not be dropped because it would be owned by a
different Memory.}
\end{itemize}
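The way limits of nested Memory objects interact can be shown with a short
sketch. This is a simplified model in Python, assuming that usage is counted in
whole Pages and that an allocation is charged to the Memory itself and to every
ancestor; the method name \texttt{allocate\_page} is made up for the example.

```python
class Memory:
    def __init__(self, limit, parent=None):
        self.limit = limit
        self.used = 0
        self.parent = parent

    def _self_and_ancestors(self):
        node = self
        while node is not None:
            yield node
            node = node.parent

    def allocate_page(self):
        """Account for one Page here and in every ancestor, or fail."""
        chain = list(self._self_and_ancestors())
        if any(m.used >= m.limit for m in chain):
            return False          # some limit on the path is already reached
        for m in chain:
            m.used += 1
        return True

top = Memory(limit=3)
child = Memory(limit=10, parent=top)   # higher than the parent's limit
results = [child.allocate_page() for _ in range(4)]
```

The fourth allocation fails although the child has used only 3 of its 10 Pages,
because the parent's limit of 3 applies as well.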
\subsection{Receiver}
A receiver object is used for inter-process communication. Capabilities can be
created from it. When those are invoked, the receiver can be used to retrieve
the message.

Operations on Receiver objects:
\begin{itemize}
\item Set the owner. This is the Thread that messages will be sent to when
they arrive. Messages are stored in the receiver until the owner is ready to
accept them. If it is waiting while the message arrives, it is immediately
delivered.
\item Create a capability. The new capability should be given to Threads who
need to send a message to the receiver.
\item Create a call capability. This is an optimization for \textit{calls},
which happen a lot: a capability is created and sent in a message, then a reply
is sent over this new capability, and then it is dropped. This whole sequence
can be done using a call capability. The call capability is invoked instead of
the target, and the target is specified where the reply capability should be.
The message is sent to the call capability (which is handled by the Receiver in
the kernel). It creates a new reply capability and sends the message to the
target with it. When the reply capability is invoked, the message is sent to
the owner, and the capability is dropped. This approach reduces the number of
kernel calls from four (create, call, reply, drop) to two (call, reply).
\end{itemize}
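The saving that a call capability gives can be made concrete by counting kernel
entries in a toy model. The following Python sketch is not how Iris implements
it; the \texttt{Kernel} class only counts calls, a capability is modelled as a
plain callable, and all names are invented for the illustration.

```python
class Kernel:
    """Toy kernel that only counts how many times it is entered."""
    def __init__(self):
        self.calls = 0

    def create(self, inbox):
        self.calls += 1
        return inbox.append          # invoking this "capability" stores a reply

    def invoke(self, cap, *args):
        self.calls += 1
        cap(*args)

    def drop(self, cap):
        self.calls += 1              # nothing else to do in this toy model

    def call(self, target, inbox, *args):
        # A single kernel entry: create the reply capability and deliver the
        # message. The reply capability is dropped when it is invoked.
        self.calls += 1
        target(self, inbox.append, *args)

def server(kernel, reply, x):
    kernel.invoke(reply, 2 * x)      # the server's reply is one kernel call

def request_by_hand(kernel, x):
    inbox = []
    reply = kernel.create(inbox)             # 1: create
    kernel.invoke(server, kernel, reply, x)  # 2: call (3: reply, in server)
    kernel.drop(reply)                       # 4: drop
    return inbox[0]

def request_with_call(kernel, x):
    inbox = []
    kernel.call(server, inbox, x)            # 1: call (2: reply, in server)
    return inbox[0]

by_hand, with_call = Kernel(), Kernel()
answer_a = request_by_hand(by_hand, 21)
answer_b = request_with_call(with_call, 21)
```

Both variants deliver the same answer; they differ only in the number of times
the kernel is entered.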
\subsection{Thread}
Thread objects hold the information about the current state of a thread. This
state is used to continue running the thread. The address space is used to map
the memory for the Thread. Different Threads in the same address space have
the same memory mapping. All Threads in one address space (often there is only
one) together are called a process.

Because all threads have a capability to their own Thread object (for claiming
Receivers), this is also used to make some calls which don't actually need an
object. The reason that these are not operations on some fake object which
every process implicitly owns, is that for debugging it may be useful to see
every action of a process. In that case, all its capabilities must be pointing
to the watcher, which will send them through to the actual target (or not).
With such an implicit capability, it would be impossible to intercept these
calls.

Operations on Thread objects:
\begin{itemize}
\item Get information about the thread. Details of this are
architecture-specific. Standard ways are defined for getting and setting some
flags (whether the process is running or waiting for a message; setting these
flags is a way to control this for other Threads), the program counter and the
stack pointer. This call is also used to get the contents of processor
registers and possibly other information which is different per Thread.
\item Let Iris schedule the next process. This is not thread-specific.
\item Get the top Memory object. This is not thread-specific. Most Threads
are not allowed to perform this operation. It is given to the initial Threads.
They can pass it on to Threads that need it (if any).
\item In the same category, register a Receiver for an interrupt. Upon
registration, the interrupt is enabled. When the interrupt arrives, the
registered Receiver gets a message from Iris and the interrupt is disabled
again. After the Thread has handled the interrupt, it must reregister it in
order to enable it again.
\item Allocate a range of contiguous physical memory. This is only relevant
for device drivers whose device will directly access the storage, such as the
display driver. The result of this call is that the memory is counted as used
by the Thread, and it is reserved, but it is not returned. Instead, the
address of physical memory is returned, and the pages need to be retrieved with
the next operation. This capability is not present in normally created
threads.
\item Allocate a page of physical memory. This is used in combination with the
previous operation to reserve a block of physical memory, and by device drivers
to map I/O memory into their address space. There is a flag indicating whether
this memory should be freed (ranges) or not (I/O). Users of this operation are
trusted to handle it properly; no checks are done to ensure that no kernel
memory is leaked, or that the allocated memory isn't used by other threads or
the kernel. Of course, this capability is not present in normally created
threads.
\item Get the physical address of a page. Only device drivers need to know the
physical address of their pages, so this operation is not available on normal
threads.
\item And similarly, allow these privileged operations (or some of them) in
another thread. This is a property of the caller, because the target thread
normally doesn't have the permission to do this (otherwise the call would not
be needed). The result of this operation is a new Thread capability with all
specified rights set. Normally this is inserted in a privileged process's
address space during setup, before it is run (instead of the capability which
is obtained during Thread creation).
\end{itemize}
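The interrupt registration described in the list above is a one-shot scheme:
each registration is good for exactly one delivery. A minimal sketch in Python
follows. It is a toy model with invented names; in particular, an interrupt
that fires while disabled is simply ignored here, which is an assumption of the
sketch and not a statement about what the hardware or Iris does.

```python
class InterruptController:
    def __init__(self):
        self._registered = {}   # interrupt number -> receiver queue (a list)

    def register(self, irq, receiver):
        """Registering a receiver enables the interrupt."""
        self._registered[irq] = receiver

    def enabled(self, irq):
        return irq in self._registered

    def fire(self, irq):
        """The interrupt arrives: deliver one message, then disable it."""
        receiver = self._registered.pop(irq, None)
        if receiver is None:
            return False          # disabled: nothing is delivered
        receiver.append(("interrupt", irq))
        return True

controller = InterruptController()
inbox = []
controller.register(5, inbox)
first = controller.fire(5)      # delivered, interrupt is now disabled
second = controller.fire(5)     # not delivered
controller.register(5, inbox)   # the driver re-registers after handling
third = controller.fire(5)      # delivered again
```

The driver thus never receives a second interrupt message before it has
finished with the first one and asked for more.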
\subsection{Page and Cappage}
A Page can be used to store user data. It can be mapped into an address space
(a Memory object). Threads can then use the data directly. A Cappage is very
similar, in that it is owned by the user. However, the user cannot see its
contents directly. It contains a frame with Capabilities. They can be invoked
like other owned capabilities. The main feature of a Cappage, however, is that
it can be shared. This is a fast way to copy many capabilities to a different
address space. Capabilities in a Cappage are not directly owned by the Memory,
and thus cannot be dropped.

Operations on Page and Cappage objects:
\begin{itemize}
\item Copy or move the frame to a different Page, which is usually in a
different Memory. This way, large amounts of data can be copied between
address spaces without needing to really copy it.
\item Set or get flags, which contain information on whether the page is
shared, is writable, has a frame allocated, and is paying for the frame. Not
all flags can be set in all cases.
\item Cappages can also set a capability in the frame (pointed to with an index).
\end{itemize}
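Copying or moving a frame without copying the data can be modelled by letting
several Pages refer to one frame. The sketch below is in Python and is only an
illustration: the frame size of 4096 bytes and the method names
\texttt{share\_to} and \texttt{move\_to} are assumptions, and flags such as
``writable'' or ``paying'' are left out.

```python
class Frame:
    """A block of physical memory; passed around by reference, never copied."""
    def __init__(self, size=4096):   # assumed page size
        self.data = bytearray(size)

class Page:
    def __init__(self):
        self.frame = None            # a Page may exist without a frame

    def allocate(self):
        self.frame = Frame()

    def share_to(self, other):
        """'Copy': both Pages now refer to the same frame."""
        other.frame = self.frame

    def move_to(self, other):
        """Move: the target gets the frame and this Page loses it."""
        other.frame, self.frame = self.frame, None

src, shared, moved = Page(), Page(), Page()
src.allocate()
src.frame.data[0:5] = b"hello"
src.share_to(shared)                 # no bytes are copied here
same_frame = shared.frame is src.frame
src.move_to(moved)
```

After these steps \texttt{shared} and \texttt{moved} both see the data that was
written through \texttt{src}, while \texttt{src} no longer has a frame.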
\subsection{Capability}
A capability object can be invoked to send a message to a receiver or to Iris
itself. The owner cannot see from the capability where it points. This is
important, because the user must be able to substitute the capability for a
different one, without the program noticing. In some cases, it is necessary to
refer to capabilities themselves. For example, a Memory can list the
Capabilities owned by it. In such a case, the list consists of Capabilities
which point to other Capabilities. These capabilities can also be used to
destroy the target capability (using an operation on the owning Memory object),
for example.

Operations on Capability objects:
\begin{itemize}
\item Get a copy of the capability.
\end{itemize}
\section{Interface classes}
Around Iris is a set of programs that together make up the operating system.
These include the device drivers. While Iris itself needs no specific interfaces
from them, some interface classes are defined, which are used by the default
environment. By defining classes, it is possible to let a program use any
device of that type without needing changes to its code.

These definitions are in the source. A copy of the information here would only
lead to it getting outdated.

\end{document}