527 lines
24 KiB
C
527 lines
24 KiB
C
// vi:ft=c
|
|
/**
|
|
\mainpage Fiasco.OC & L4 Runtime Environment (L4Re)
|
|
|
|
\section l4re_concepts_preface Preface
|
|
|
|
The intention of this document is to provide a birds eye overview
|
|
about L4Re and about the environment in which typical
|
|
applications and servers run. We highlight here the principled
|
|
functionality of the servers in the environment but do not
|
|
discuss their specific interfaces. Detailed documentation about
|
|
these interface is available in the modules section.
|
|
|
|
The document is meant as a general overview repeating many design
|
|
concepts of L4-based systems and capability systems in general.
|
|
We do though assume familiarity with C++ and an idea on the
|
|
general concepts and terms of L4: threads --- as an abstraction
|
|
for execution ---, tasks --- holding the capabilities to kernel
|
|
objects that are accessible by the threads executing in this task
|
|
---, and \link l4_ipc_api IPC\endlink over
|
|
\link l4_kernel_object_gate_api IPC-gates\endlink to send messages
|
|
and to transfer capabilities between tasks.
|
|
|
|
\todo A more detailed introduction to these concepts is given in the User Manual.
|
|
\todo _XXX REF TO APPROPRIATE SPEC / BIG TODO / L4-Ref
|
|
|
|
\section l4re_concepts_structure General System Structure
|
|
|
|
The system has a multi-tier architecture consisting of the
|
|
following layers depicted in the figure below:
|
|
|
|
\li <b>Microkernel</b> The microkernel is the component at the lowest level of
|
|
the software stack. It is the only piece of software that is running in the
|
|
privileged mode of the processor.
|
|
|
|
\li <b>Tasks</b> Tasks are the basic containers (address spaces) in which system
|
|
services and applications are executed. They run in the processor's deprivileged
|
|
user mode.
|
|
|
|
\image html l4re-basic.png "Basic Structure of an L4Re based system"
|
|
\image latex l4re-basic.pdf "Basic Structure of an L4Re based system"
|
|
|
|
In terms of functionality, the system is structured as follows:
|
|
|
|
\li <b>Microkernel</b> The kernel provides primitives to execute programs in tasks,
|
|
to enforce isolation among them, and to provide means of secure communication in
|
|
order to let them cooperate. As the kernel is the most privileged, security-critical
|
|
software component in the system, it is a general design goal to make it as small
|
|
as possible in order to reduce its attack surface. It provides only a minimal set of
|
|
mechanisms that are necessary to support applications.
|
|
|
|
\li <b>Runtime Environment</b> The small kernel offers a concise set of interfaces,
|
|
but these are not necessarily suited for building applications directly on top of
|
|
it. The L4 Runtime Environment aims at providing more convenient abstractions for
|
|
application development. It comprises low-level software components that interface
|
|
directly with the microkernel. The root pager \em sigma0 and the root task \em Moe
|
|
are the most basic components of the runtime environment. Other services (e.g., for
|
|
device enumeration) use interfaces provided by them.
|
|
|
|
\li \b Applications Applications run on top of the system and use services
|
|
provided by the Runtime Environment -- or by other applications. There may be
|
|
several types of applications in the system and even virtual machine monitors
|
|
and device drivers are considered applications in the terminology used in this
|
|
document. They are running alongside other applications on the system.
|
|
|
|
Lending terminology from the distributed systems area, applications offering
|
|
services to other applications are usually called \em servers, whereas
|
|
applications using those services are named \em clients. Being in both
|
|
roles is also common, for instance, a file system server may be viewed as a
|
|
server with respect to clients using the file system, while the server itself
|
|
may also act as a client of a hard disk driver.
|
|
|
|
In the following sections, we discuss the basic concepts of our microkernel and
|
|
its runtime environment in more depth.
|
|
|
|
\section l4re_concepts_fiasco The Fiasco.OC Microkernel
|
|
|
|
The Fiasco.OC microkernel is the lowest-level piece of software running in an
|
|
L4-based system. The microkernel is the only program that runs in privileged
|
|
processor mode. It does not include complex services such as program loading,
|
|
device drivers, or file systems; those are implemented in user-level programs on
|
|
top of it (a basic set these services and abstractions is provided by the L4
|
|
Runtime Environment).
|
|
|
|
Fiasco.OC kernel services are implemented in kernel objects. Tasks hold
|
|
references to kernel objects in their respective \em "object space", which is a
|
|
kernel-protected table.
|
|
These references are called \em capabilities. Fiasco system calls are function
|
|
invocations on kernel objects through the corresponding capabilities. These
|
|
can be thought of as function invocations on object references in an
|
|
object-oriented programming environment. Furthermore, if a task owns a
|
|
capability, it may grant other tasks the same (or fewer) rights on this object
|
|
by passing the capability from its own to the other task's object space.
|
|
|
|
From a design perspective, capabilities are a concept that enables flexibility
|
|
in the system structure. A thread that invokes an object through a capability
|
|
does not need to care about where this object is implemented. In fact, it is
|
|
possible to implement all objects either in the kernel or in a user-level
|
|
server and replace one implementation with the other transparently for clients.
|
|
|
|
\subsection l4re_concepts_fiasco_ipc Communication
|
|
|
|
The basic communication mechanism in L4-based systems is called
|
|
\em "Inter Process Communication (IPC)". It is always synchronous, i.e. both
|
|
communication partners need to actively rendezvous for IPC. In addition to
|
|
transmitting arbitrary data between threads, IPC is also used to resolve
|
|
hardware exceptions, faults and for virtual memory management.
|
|
|
|
\subsection l4re_concepts_fiasco_kobjects Kernel Objects
|
|
|
|
The following list gives a short overview of the kernel objects provided by
|
|
the Fiasco.OC microkernel:
|
|
|
|
\li <b>Task</b> A task comprises a memory address space (represented by the task's page table), an object space (holding the kernel protected capabilities), and on X86 an IO-port address space.
|
|
\li <b>Thread</b> A thread is bound to a task and executes code. Multiple
|
|
threads can coexist in one task and are scheduled by the Fiasco scheduler.
|
|
\li <b>Factory</b> A factory is used by applications to create new kernel
|
|
objects. Access to a factory is required to create any new kernel object.
|
|
Factories can control and restrict object creation.
|
|
\li <b>IPC Gate</b> An IPC gate is used to create a secure communication
|
|
channel between different tasks. It embeds a label (kernel protected payload) that securely identifies
|
|
the gate through which a message is received. The gate label is not visible
|
|
to and cannot be altered by the sender.
|
|
\li <b>IRQ</b> IRQ objects provide access to hardware interrupts. Additionally,
|
|
programs can create new virtual interrupt objects and trigger them. This
|
|
allows to implement a signaling mechanism. The receiver cannot decide whether
|
|
the interrupt is a physical or virtual one.
|
|
\li <b>Vcon</b> Provides access to the in-kernel debugging console (input and output).
|
|
There is only one such object in the kernel and it is only available, if the
|
|
kernel is built with debugging enabled. This object is typically interposed through a user-level service or without debugging in the kernel can be completely based on user-level services.
|
|
\li <b>Scheduler</b> Implements scheduling policy and assignment of threads
|
|
to CPUs, including CPU statistics.
|
|
|
|
\section main_l4re_sec L4 Runtime Environment (L4Re)
|
|
|
|
The L4 Runtime Environment (L4Re) provides a basic set of services and
|
|
abstractions, which are useful to implement and run user-level applications on
|
|
top of the Fiasco.OC microkernel.
|
|
|
|
L4Re consists of a set of libraries and servers. Libraries as well as server
|
|
interfaces are completely object oriented. They implement prototype
|
|
implementations for the classes defined by the L4Re specification.
|
|
|
|
A minimal L4Re-based application needs 3 components to be booted beforehand:
|
|
the Fiasco microkernel, the root pager (Sigma0), and the root task (Moe). The
|
|
Sigma0 root pager initially owns all system resources, but is usually used
|
|
only to resolve page faults for the Moe root task. Moe provides the essential
|
|
services to normal user applications such as an initial program loader, a
|
|
region-map service for virtual memory management, and a memory (data space)
|
|
allocator.
|
|
|
|
|
|
\section l4re_concepts Introduction to L4Re's concepts
|
|
|
|
This section introduces basic concepts used by L4Re. Understanding of these
|
|
concepts is a fundamental requirement to understand the inner workings of
|
|
L4Re's software components and can dramatically help developers in
|
|
efficiently developing L4Re-based software.
|
|
|
|
\if WORKING_SUBPAGES
|
|
\section l4re_concepts_details Further Details
|
|
|
|
- \subpage l4re_concepts_naming
|
|
- \subpage l4re_concepts_env_and_start
|
|
- \subpage l4re_concepts_ds_rm
|
|
- \subpage l4re_concepts_env
|
|
- \subpage l4re_concepts_stdio
|
|
- \subpage l4re_concepts_memalloc
|
|
- \subpage l4re_concepts_apps_svr
|
|
\endif
|
|
|
|
|
|
\if WORKING_SUBPAGES
|
|
\page l4re_concepts_ds_rm Memory management - Data Spaces and the Region Map
|
|
\else
|
|
\section l4re_concepts_ds_rm Memory management - Data Spaces and the Region Map
|
|
\endif
|
|
|
|
\subsection l4re_concept_pagers User-level paging
|
|
|
|
Memory management in L4-based systems is done by user-level applications, the
|
|
role is usually called \em pager. Tasks can give other tasks full or
|
|
restricted access rights to parts of their own memory. The kernel offers means
|
|
to grant the memory in a secure way, often referred to as \em memory mapping.
|
|
|
|
The described mechanism can be used to construct a memory hierarchy among
|
|
tasks. The root of the hierarchy is \em sigma0, which initially gets all
|
|
system resources and hands them out once on a first-come-first-served basis.
|
|
Memory resources can be mapped between tasks at a page-size granularity. This
|
|
size is predetermined by the CPU's memory management unit and is commonly set
|
|
to 4 kB.
|
|
|
|
\subsection l4re_concept_data_spaces Data spaces
|
|
|
|
A data space is the L4Re abstraction for objects which may be
|
|
accessed in a memory mapped fashion (i.e., using normal memory
|
|
read and write instructions). Examples include the sections of a
|
|
binary which the loader attaches to the application's address
|
|
space, files in the ROM or on disk provided by a file server, the
|
|
registers of memory-mapped devices and anonymous memory such as
|
|
the heap or the stack.
|
|
|
|
Anonymous memory data spaces in particular (but in general all
|
|
data spaces except memory mapped IO) can either be constructed
|
|
entirely from a portion of the RAM or the current working set may
|
|
be multiplexed on some portion of the RAM. In the first case it
|
|
is possible to eagerly insert all pages (more precisely
|
|
page-frame capabilities) into the application's address space
|
|
such that no further page faults occur when this data space is
|
|
accessed. In general, however, only the pages for the some
|
|
portion are provided and further pages are inserted by the pager
|
|
as a result of page faults.
|
|
|
|
\subsection l4re_concept_regions Virtual Memory Handling
|
|
|
|
The virtual memory of each task is constructed from data spaces, backing virtual memory regions (VMRs).
|
|
The management of the VMRs is provided by an object called \em region \em map. A dedicated region-map object is associated with each task,
|
|
it allows to attach and detach data spaces to an
|
|
address space as well as to reserve areas of virtual memory. Since the region-map
|
|
object possesses all knowledge about virtual memory layout of a task, it also serves as
|
|
an application's default pager.
|
|
|
|
\subsection l4re_concept_mem_alloc Memory Allocation
|
|
|
|
Operating systems commonly use anonymous memory for implementing dynamic
|
|
memory allocation (e.g., using \em malloc or \em new). In an
|
|
L4Re-based system, each task gets assigned a memory allocator providing
|
|
anonymous memory using data spaces.
|
|
|
|
|
|
\see \ref api_l4re_dataspace and \ref api_l4re_rm.
|
|
|
|
|
|
\if WORKING_SUBPAGES
|
|
\page l4re_concepts_naming Capabilities and Naming
|
|
\else
|
|
\section l4re_concepts_naming Capabilities and Naming
|
|
\endif
|
|
|
|
The L4Re system is a capability based system which uses and offers
|
|
capabilities to implement fine-grained access control.
|
|
|
|
Generally, owning a capability means to be allowed to communicate with the
|
|
object the capability points to. All user-visible kernel objects, such as
|
|
tasks, threads, and IRQs, can be accessed only through a capability.
|
|
Please refer to the \link l4_kernel_object_api Kernel Objects\endlink
|
|
documentation for details. Capabilities are stored in per-task capability
|
|
tables (the object space) and are referenced by capability selectors or
|
|
object flex pages. In a simplified view, a capability selector is a natural
|
|
number indexing into the capability table of the current task.
|
|
|
|
As a matter of fact, a system designed solely based on capabilities, uses
|
|
so-called 'local names', because each task can only access those objects made
|
|
available to this task. Other objects are not visible to and accessible by the
|
|
task.
|
|
|
|
\image html l4-caps-basic.png "Capabilities and Local Naming in L4"
|
|
\image latex l4-caps-basic.pdf "Capabilities and Local Naming in L4
|
|
|
|
So how does an application get access to service?
|
|
In general all applications are started with an initial set of objects
|
|
available. This set of objects is predetermined by the creator of a new
|
|
application process and granted directly to into the new task before starting
|
|
the first application thread. The application can then use these initial objects
|
|
to request access to further objects or to transfer capabilities to own objects
|
|
to other applications. A central L4Re object for exchanging capabilities at
|
|
runtime is the name-space object, implementing a store of named capabilities.
|
|
|
|
From a security perspective, the set of initial capabilities (access rights to
|
|
objects) completely define the execution environment of an application.
|
|
Mandatory security policies can be defined by well known properties of the
|
|
initial objects and carefully handled access rights to them.
|
|
|
|
|
|
\if WORKING_SUBPAGES
|
|
\page l4re_concepts_env_and_start Initial Environment and Application Bootstrapping
|
|
\else
|
|
\section l4re_concepts_env_and_start Initial Environment and Application Bootstrapping
|
|
\endif
|
|
|
|
New applications that are started by a loader conforming to L4Re get
|
|
provided an \link api_l4re_env initial environment\endlink. This environment
|
|
comprises a set of capabilities to initial L4Re objects that are
|
|
required to bootstrap and run this application. These
|
|
capabilities include:
|
|
|
|
\li A capability to an initial memory allocator for obtaining memory in the
|
|
form of data spaces
|
|
\li A capability to a factory which can be used to create additional kernel
|
|
objects
|
|
\li A capability to a Vcon object for debugging output and maybe input
|
|
\li A set of named capabilities to application specific objects
|
|
|
|
During the bootstrapping of the application, the loader establishes data
|
|
spaces for each individual region in the ELF binary. These include data spaces
|
|
for the code and data sections, and a data space backed with RAM for the stack
|
|
of the program's first thread.
|
|
|
|
One loader implementation is the \em Moe root task. Moe usually starts an \em
|
|
init process that is responsible for coordinating the further boot
|
|
process. The default \em init process is \em Ned, which implements a
|
|
script-based configuration and startup of other processes. Ned uses Lua
|
|
(http://www.lua.org) as its scripting language, see \ref l4re_servers_ned
|
|
"Ned Script example" for more details.
|
|
|
|
|
|
\subsection l4re_ns_config Configuring an application before startup
|
|
|
|
The default L4Re init process (Ned) provides a Lua script based configuration
|
|
of initial capabilities and application startup. Ned itself also has a set of
|
|
initial objects available that can be used to create the environment for an
|
|
application. The most important object is a kernel object factory that allows
|
|
creation of kernel objects such as IPC gates (communication channels), tasks,
|
|
threads, etc. Ned uses Lua tables (associative arrays) to represent sets of
|
|
capabilities that shall be granted to application processes.
|
|
|
|
\code
|
|
local caps = {
|
|
name = some_capability
|
|
}
|
|
\endcode
|
|
|
|
|
|
The 'L4' Lua package in Ned also has support functions to create application
|
|
tasks, region-map objects, etc. to start an ELF binary in a new task.
|
|
The package also contains Lua bindings for basic L4Re objects, for example, to
|
|
generic factory objects, which are used to create kernel objects and also
|
|
user-level objects provided by user-level servers.
|
|
|
|
\code
|
|
L4.default_loader:start({ caps = { some_service = service } }, "rom/program --arg");
|
|
\endcode
|
|
|
|
|
|
\subsection l4re_config_connection Connecting clients and servers
|
|
|
|
In general, a connection between a client and a server is represented by a
|
|
communication channel (IPC gate). That is available to the client and the
|
|
server. You can see the simplest connection between a client and a server
|
|
in the following example.
|
|
|
|
\code
|
|
local loader = L4.default_loader; -- which is Moe
|
|
local svc = loader:new_channel(); -- create an IPC gate
|
|
loader:start({ caps = { service = svc:full() }}, "rom/my_server");
|
|
loader:start({ caps = { service = svc:m("rw") }}, "rom/my_client");
|
|
\endcode
|
|
|
|
As you can see in the snippet, the first action is to create a new channel
|
|
(IPC gate) using \c loader:new_channel(). The capability to the gate is stored
|
|
in the variable \c svc. Then the binary \c my_server is started in a new task,
|
|
and full (\c :full()) access to the IPC gate is granted to the server as initial
|
|
object. The gate is accessible to the server application as "service" in the set of
|
|
its initial capabilities. Virtually in parallel a second task, running the client
|
|
application, is started and also given access to the IPC gate with less rights
|
|
(\c :m("rw"), note, this is essential). The server can now receive messages via the
|
|
IPC gate and provide some service and the client can call operations on the IPC gate
|
|
to communicate with the server.
|
|
|
|
Services that keep client specific state need to implement per-client server
|
|
objects. Usually it is the responsibility of some authority (e.g., Ned) to
|
|
request such an object from the service via a generic factory object that the
|
|
service provides initially.
|
|
|
|
\code
|
|
local loader = L4.default_loader; -- which is Moe
|
|
local svc = loader:new_channel():m("rws"); -- create an IPC gate with rws rights
|
|
loader:start({ caps = { service = svc:full() } }, "rom/my-service");
|
|
loader:start({ caps = { foo_service = svc:create(object_to_create, "param") }}, "rom/client");
|
|
\endcode
|
|
|
|
This example is quite similar to the first one, however, the difference is that
|
|
Ned itself calls the create method on the factory object provided by the server and
|
|
passes the returned capability of that request as "foo_service" to the client process.
|
|
|
|
\note The \c svc:create(..) call blocks on the server. This means the script execution
|
|
blocks until the my-service application handles the create request.
|
|
|
|
|
|
\if WORKING_SUBPAGES
|
|
\page l4re_concepts_stdio Program Input and Output
|
|
\else
|
|
\section l4re_concepts_stdio Program Input and Output
|
|
\endif
|
|
|
|
|
|
The initial environment provides a Vcon capability used as the standard
|
|
input/output stream. Output is usually connected to the parent of the
|
|
program and displayed as debugging output. The standard output is also used
|
|
as a back end to the C-style printf functions and the C++ streams.
|
|
|
|
Vcon services are implemented in Moe and the loader as well as by the Fiasco
|
|
kernel and connected either to the serial line or to the screen if
|
|
available.
|
|
|
|
\see \ref l4_vcon_api
|
|
|
|
|
|
\if WORKING_SUBPAGES
|
|
\page l4re_concepts_memalloc Initial Memory Allocator and Factory
|
|
\else
|
|
\section l4re_concepts_memalloc Initial Memory Allocator and Factory
|
|
\endif
|
|
|
|
The purpose of the memory allocator and of the factory is to provide
|
|
the application with the means to allocate memory (in the form of data spaces)
|
|
and kernel objects respectively.
|
|
An initial memory allocator and an initial factory are accessible via the
|
|
allocation L4Re environment.
|
|
|
|
\see \ref api_l4re_mem_alloc
|
|
|
|
|
|
The factory is a kernel object that provides the ability to create new
|
|
kernel objects dynamically. A factory imposes a resource limit for
|
|
kernel memory, and is thus a means to prevent denial of service attacks on
|
|
kernel resources. A factory can also be used to create new factory objects.
|
|
|
|
\see \ref l4_factory_api
|
|
|
|
|
|
|
|
\if WORKING_SUBPAGES
|
|
\page l4re_concepts_apps_svr Application and Server Building Blocks
|
|
\else
|
|
\section l4re_concepts_apps_svr Application and Server Building Blocks
|
|
\endif
|
|
|
|
So far we have discussed the environment of applications in which a single
|
|
thread runs and which may invoke services provided through their initial objects.
|
|
In the following we describe some building blocks to extend the
|
|
application in various dimensions and to eventually implement a server which
|
|
implements user-level objects that may in turn be accessed by other
|
|
applications and servers.
|
|
|
|
\if WORKING_SUBPAGES
|
|
\section l4re_concepts_app_thread Creating Additional Application Threads
|
|
\else
|
|
\subsection l4re_concepts_app_thread Creating Additional Application Threads
|
|
\endif
|
|
|
|
To create application threads, one must allocate a stack on which
|
|
this thread may execute, create a thread kernel object and setup
|
|
the information required at startup time (instruction pointer,
|
|
stack pointer, etc.). In L4Re this functionality is encapsulated in the
|
|
pthread library.
|
|
|
|
\if WORKING_SUBPAGES
|
|
\section l4re_concepts_service Providing a Service
|
|
\else
|
|
\subsection l4re_concepts_service Providing a Service
|
|
\endif
|
|
|
|
In capability systems, services are typically provided by
|
|
transferring a capability to those applications that are
|
|
authorised to access the object to which the capability refers to.
|
|
|
|
Let us discuss an example to illustrate how two parties can communicate with
|
|
each other:
|
|
Assume a simple file server, which implements an interface for accessing
|
|
individual files: read(pos, buf, length) and write(pos, data, length).
|
|
|
|
L4Re provides support for building servers based on the class
|
|
L4::Server_object. L4::Server_object provides an abstract interface to be
|
|
used with the L4::Server class. Specific server objects such as, in our
|
|
case, files inherit from L4::Server_object. Let us call this class
|
|
File_object. When invoked upon receiving a message, the L4::Server will
|
|
automatically identify the corresponding server object based on the
|
|
capability that has been provided to its clients and invoke this object's
|
|
\em dispatch function with the incoming message as a parameter. Based on
|
|
this message, the server must then decide which of the protocols it
|
|
implements was invoked (if any). Usually, it will evaluate a protocol
|
|
specific opcode that clients are required to transmit as one of the first
|
|
words in the message. For example, assume our server assigns the following
|
|
opcodes: Read = 0 and Write = 1. The \em dispatch function calls the
|
|
corresponding server function (i.e., \em File_object::read() or \em
|
|
File_object::write()), which will in turn parse additional
|
|
parameters given to the function. In our case, this would be the position
|
|
and the amount of data to be read or written. In case the write function was
|
|
called the server will now update the contents of the file with the data
|
|
supplied. In case of a read it will store the requested part of the file in
|
|
the message buffer. A reply to the client finishes the client request.
|
|
|
|
*/
|
|
|
|
|
|
/* This is some text we currently do not use:
|
|
|
|
\link api_l4re_dataspace Data spaces\endlink and the purpose of the \link
|
|
api_l4re_rm Region Map\endlink are explained in more detail in the following
|
|
section.
|
|
|
|
|
|
In Fiasco.OC capabilities are addressed in two different
|
|
ways.
|
|
|
|
A capability can be addressed with the help of a capability
|
|
descriptor \XXX Ref which identifies the position of one single
|
|
capability in the application's address space.
|
|
|
|
The second means to address a bunch of capabilities at once are
|
|
flexpages. A flexpage describes a region of the application's
|
|
address space that is of a power 2 size and size aligned. Thus
|
|
the name flexpage. When capabilities are to be transferred (see
|
|
IPC / MapItem) the flexpage decleared by the sender --- the send
|
|
flexpage --- specifies which capabilities are to be transferred.
|
|
These are at most those capabilities that are located within the
|
|
region described by the flexpage and precisely those in the
|
|
region that results from adjusting the flexpage with a possibly
|
|
smaller flexpage on the receiver side (see \XXX for more details
|
|
on how sender and receiver declared flexpages are adjusted). The
|
|
receiver declared flexpage --- the receive flexpage --- defines
|
|
where in the address space of the application capabilities are to
|
|
be received.
|
|
|
|
The key insight here is that applications are able to restrict
|
|
an invoked server such that it can only modify a part of the
|
|
applications address space --- the receive flexpage.
|
|
|
|
When invoking servers and when creating new objects one is faced
|
|
with the task to find not yet used parts in the address space of
|
|
the application at which the kernel or other servers may insert
|
|
capabilities. L4Re assists this task with the help of a capability
|
|
allocator.
|
|
|
|
*/
|