176 changes: 127 additions & 49 deletions doc/source/architecture.rst
@@ -1,54 +1,132 @@
.. _architecture:

SO3 Architecture
################

SO3 follows a conventional operating-system organisation with a **user space**
and a **kernel space**. It is a *monolithic* kernel, like Linux: all subsystems
and device drivers live in the kernel.

.. figure:: img/so3_architecture.png
   :width: 100%

   Overview of the SO3 environment (standalone configuration).

The user space is a small set of lightweight applications (``init``, the shell,
``ls``, ``cat``, ``more``, ``ping``, …) built against the **MUSL** C library and
stored in a root filesystem. The kernel provides the usual subsystems —
process/thread management and scheduling, memory management, a virtual
filesystem, IPC, networking — plus a device-tree-driven device and driver model.

Source tree
===========

The kernel source lives under ``so3/`` and is organised by subsystem::

   so3/
   ├── arch/       # architecture-specific code (arm32, arm64)
   │   └── arm64/  # head/boot, exceptions, MMU, context switch, traps
   ├── kernel/     # processes, threads, scheduler, syscalls, time
   ├── mm/         # page frame allocator, kernel heap
   ├── fs/         # VFS, FAT, devfs, ELF loader
   ├── ipc/        # signals, pipes, semaphores, completions
   ├── net/        # networking, lwIP TCP/IP stack
   ├── devices/    # device model + drivers (irq, timer, serial, mmc, fb, …)
   ├── dts/        # device trees (*.dts → *.dtb)
   ├── avz/        # the AVZ hypervisor (built with CONFIG_AVZ)
   ├── soo/        # the SOO framework / SO3 capsules (CONFIG_SOO)
   ├── configs/    # defconfig files
   └── lib/        # in-kernel helper libraries (libfdt, libroxml, …)

The user space lives under ``usr/`` and the surrounding tooling (bootloader,
emulator, root filesystem, deployment scripts) at the repository root — see
:ref:`build_system` and :ref:`user_space`.

Exception levels
================

The overall architecture of SO3 follows a conventional operating system organization with a user space and a kernel space.
The user space represents all user files and applications; everything is accessible via a *standard* directory tree
through a root filesystem.

The set of directories and sub-directories is generated by `buildroot <buildroot_>`__, a well-known tool used to build
embedded Linux systems through cross-compilation. It provides a fully configurable user environment and can produce
a minimal-sized root filesystem.

.. figure:: img/SO3_Architecture.png
   :scale: 50 %

   Overview of the SO3 environment with user and kernel space

As depicted in this figure, SO3 is a monolithic OS like Linux, where all subsystems and drivers reside in the kernel space.
The major subsystems (Inter-Process Communication (IPC), scheduling, filesystem, networking, etc.) and the device drivers
provide the basic functionality. The user space is made of a very simple set of lightweight applications (*ls*,
*more*, *echo*, *cat*, etc.).

SO3 Kernel Space
----------------

The kernel reads the device tree to obtain the hardware configuration, such as the RAM base address and
size, device-related information, and various system properties.

SO3 User Space
--------------

All user space files are located in the ``usr/`` directory. Applications and the *libc* are in separate
directories.


The MUSL libc user space library
--------------------------------

SO3 integrates the `MUSL library <MUSL_libc_>`__ as the *libc* for its user space applications.

Not all functions are available in SO3; functions are enabled as the need for them arises.
Furthermore, more complex functions, such as those that manage ``pthreads``, are not activated, since
only a minimal set of features is present.

Please do not hesitate to start a discussion thread or to request a new ``musl`` feature when it becomes
necessary.



.. _MUSL_libc: https://musl.libc.org
.. _buildroot: https://buildroot.org


On ARM64, SO3 uses the architectural exception levels as follows.

.. figure:: img/so3_exception_levels.png
   :width: 95%

   Exception levels in the standalone and AVZ configurations.

* **Standalone**: user processes run at **EL0**, the SO3 kernel at **EL1**.
  EL2 is unused. The kernel owns ``VBAR_EL1`` (exception vectors) and the
  ``TTBR0_EL1`` / ``TTBR1_EL1`` translation tables.
* **AVZ**: the **AVZ hypervisor** runs at **EL2** (``VBAR_EL2``, stage-2
  translation via ``VTTBR_EL2``, ``HCR_EL2`` configured to trap and inject
  interrupts). Each guest (the agency and any capsule) runs its own SO3 kernel
  at EL1 with its user space at EL0, isolated by per-domain stage-2 tables.

The exact level a given build targets is selected by ``CONFIG_AVZ``. The same
C and assembly files implement both; EL2-specific code is guarded with
``#ifdef CONFIG_AVZ`` (for example the TLB-maintenance instructions in
``arch/arm64/cache.S`` and the vector handlers in ``arch/arm64/exception.S``).
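
As a minimal sketch of how such a build-time guard can be paired with a run-time check, the helper below decodes the exception level from a ``CurrentEL`` register value (the level sits in bits [3:2]). The function names are illustrative and are not taken from the SO3 sources; reading the register itself (``mrs x0, CurrentEL``) is left out so that the sketch compiles on any host.

```c
#include <assert.h>
#include <stdint.h>

/* CurrentEL encodes the exception level in bits [3:2]; other bits are reserved. */
static inline unsigned int sketch_el_from_currentel(uint64_t currentel)
{
    return (unsigned int)((currentel >> 2) & 0x3u);
}

/* Exception level this image is built for, selected by CONFIG_AVZ. */
static inline unsigned int sketch_expected_el(void)
{
#ifdef CONFIG_AVZ
    return 2; /* AVZ hypervisor build */
#else
    return 1; /* standalone SO3 kernel build */
#endif
}

/* Hypothetical early-boot sanity check: the image must run at its own level. */
static inline void sketch_check_boot_el(uint64_t currentel)
{
    assert(sketch_el_from_currentel(currentel) == sketch_expected_el());
}
```

A check of this kind fails fast when an image built for one level is started at the other, instead of faulting later on the first level-specific instruction.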

Virtual address space
=====================

Each process owns a private set of page tables. On ARM64 the low half of the
address space (``TTBR0_EL1``) maps the current user process, and the high half
(``TTBR1_EL1``) maps the kernel.

.. figure:: img/so3_memory.png
   :width: 100%

   ARM64 virtual address space (standalone, EL1).

The kernel base address is configured by ``CONFIG_KERNEL_VADDR``:

.. flat-table::
   :header-rows: 1
   :widths: 30 40 30

   * - Configuration
     - ``CONFIG_KERNEL_VADDR``
     - Notes
   * - standalone (EL1)
     - ``0xFFFF800000000000``
     - kernel in the high (TTBR1) half
   * - AVZ (EL2)
     - ``0x0000100000000000``
     - hypervisor address space

The ``__pa()`` / ``__va()`` macros (``include/memory.h``) translate between
kernel virtual addresses and physical addresses using ``CONFIG_KERNEL_VADDR``
and the physical RAM base discovered from the device tree. The initial user
program is mapped at ``USER_SPACE_VADDR`` (``0x1000``). See :ref:`kernel` for
the page-frame allocator and the MMU code.
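
The translation can be pictured as a constant offset, assuming the kernel is mapped linearly from the physical RAM base. The sketch below is not the actual ``__pa()`` / ``__va()`` implementation: the function names are illustrative, and ``sketch_ram_base`` is a placeholder for the value SO3 reads from the device tree.

```c
#include <assert.h>
#include <stdint.h>

/* Standalone (EL1) kernel base, as listed in the table above. */
#define SKETCH_KERNEL_VADDR 0xFFFF800000000000ULL

/* Placeholder: in SO3 the physical RAM base comes from the device tree. */
static uint64_t sketch_ram_base = 0x40000000ULL;

/* Kernel virtual address -> physical address (linear mapping assumed). */
static inline uint64_t sketch_pa(uint64_t vaddr)
{
    assert(vaddr >= SKETCH_KERNEL_VADDR);
    return vaddr - SKETCH_KERNEL_VADDR + sketch_ram_base;
}

/* Physical address -> kernel virtual address (inverse of sketch_pa). */
static inline uint64_t sketch_va(uint64_t paddr)
{
    assert(paddr >= sketch_ram_base);
    return paddr - sketch_ram_base + SKETCH_KERNEL_VADDR;
}
```

With these placeholder values, a kernel address 0x1000 bytes above the base maps to physical ``0x40001000``, and the two helpers are inverses of each other.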

Boot flow
=========

SO3 is started by **U-Boot**, which loads a **FIT image** (``.itb``) containing
the kernel, the device-tree blob and the root filesystem. In the AVZ
configuration the FIT also contains the hypervisor and the agency guest.

.. figure:: img/so3_boot.png
   :width: 100%

   Boot flow from U-Boot to the interactive shell.

1. **U-Boot** loads the FIT image from the boot medium and jumps to the entry
   point of the first component.
2. In the **AVZ** configuration, control enters the hypervisor
   (``avz_start()`` at EL2); AVZ sets up the stage-2 tables, *loads the guest
   domain* and ``eret``\ s into the agency at EL1. In the **standalone**
   configuration U-Boot jumps straight to the SO3 kernel entry
   (``__start`` → ``kernel_start()``) at EL1.
3. The kernel brings itself up: ``memory_init()`` (frame table + MMU),
   ``devices_init()`` (device-tree probe, GIC, timer, serial), ``timer_init()``
   and ``calibrate_delay()``, ``scheduler_init()``, then interrupts are enabled.
4. ``rest_init()`` runs as the first kernel thread and calls
   ``create_root_process()``, which maps the ``__root_proc`` trampoline at
   ``0x1000`` and enters EL0.
5. The root process ``execve()``\ s ``init.elf`` (the *SO3 Init Program*), which
   in turn launches the shell (``sh.elf``) and its familiar ``so3%`` prompt.

The individual steps are described in :ref:`kernel`; the packaging of the FIT
image and the U-Boot/QEMU setup are covered in :ref:`build_system` and
:ref:`user_guide`.
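
The ordering in step 3 matters because each stage depends on the previous one (the device probe needs the heap, the delay calibration needs a running timer). The sketch below only illustrates that ordering: the functions are stand-ins that record their own names, and their ``void`` signatures are an assumption, not the real SO3 prototypes.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_STEPS 8

static const char *sketch_boot_log[SKETCH_MAX_STEPS];
static int sketch_boot_steps;

/* Append one step name to the boot log. */
static void sketch_record(const char *name)
{
    assert(sketch_boot_steps < SKETCH_MAX_STEPS);
    sketch_boot_log[sketch_boot_steps++] = name;
}

/* Stand-ins for the real initialisation routines (signatures assumed). */
static void memory_init(void)     { sketch_record("memory_init"); }
static void devices_init(void)    { sketch_record("devices_init"); }
static void timer_init(void)      { sketch_record("timer_init"); }
static void calibrate_delay(void) { sketch_record("calibrate_delay"); }
static void scheduler_init(void)  { sketch_record("scheduler_init"); }
static void rest_init(void)       { sketch_record("rest_init"); }

/* Illustrative bring-up sequence following steps 3 and 4 above. */
static void sketch_kernel_start(void)
{
    memory_init();      /* frame table + MMU */
    devices_init();     /* device-tree probe, GIC, timer, serial */
    timer_init();
    calibrate_delay();
    scheduler_init();
    /* interrupts would be enabled here */
    rest_init();        /* in SO3 this runs as the first kernel thread */
}
```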
179 changes: 179 additions & 0 deletions doc/source/avz.rst
@@ -0,0 +1,179 @@
.. _avz:

AVZ Hypervisor
##############

**AVZ** (*Agency VirtualiZer*) is the type-1 hypervisor that ships inside the
SO3 tree. Built with ``CONFIG_AVZ``, the very same code base runs at **EL2** on
ARM64 and hosts guest *domains* at EL1. AVZ is small and Xen-inspired: it
provides stage-2 memory isolation, a domain scheduler, hypercalls, event
channels, grant tables and a virtual GIC, and nothing more.

AVZ is also the foundation of the :ref:`SO3 capsule <capsules>` model: an agency
domain owns the hardware while one or more lightweight capsule (S3C) guests run
beside it.

.. note::

   In the demonstration shipped with this repository the agency is an **SO3**
   kernel (enough to exercise the hypervisor). The full SO3 Capsule setup uses a
   **Linux** agency, which, together with the SOO framework, lives in a
   :ref:`separate repository <capsules>`.

.. figure:: img/so3_avz.png
   :width: 100%

   AVZ: domains isolated by stage-2 tables, and the EL2 services beneath them.

The code lives under ``so3/avz/`` (kernel, memory, scheduler, hypercalls, grant
tables, capsule build/inject) together with the EL2-specific parts of
``arch/arm64`` (``head.S`` MMU setup, ``exception.S`` EL2 vectors,
``context.S`` stage-2 switch, ``cache.S`` EL2 TLB ops) and the virtual GIC in
``devices/irq/``.

Boot and guest loading
======================

The hypervisor entry point is ``avz_start()`` (``avz/kernel/setup.c``). After
early CPU, memory and device initialisation it prints its banner and *loads the
guest domain*: it parses the FIT image provided by U-Boot, places the agency's
kernel and device tree in RAM, builds the agency's **stage-2** page tables and
sets the guest entry point. AVZ then ``eret``\ s to EL1, and the agency boots as
an ordinary SO3 kernel (``kernel_start()``). The console trace looks like::

   ********** Smart Object Oriented technology - AVZ Hypervisor **********
   ...
   Now bootstraping the hypervisor kernel ...
   ***************** Loading Guest Domain *****************
   ...
   ********** Smart Object Oriented SO3 Operating System **********

Guest memory is organised in **memory slots** (``avz/include/avz/memslot.h``):
slot 0 is AVZ itself, slot 1 the agency, and the remaining slots are capsules.
Each slot maps a guest *intermediate physical address* (IPA) range to real
physical memory; ``ipa_to_pa()`` / ``pa_to_ipa()`` convert between them.
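
A slot can be thought of as a base-and-size window, so the conversion reduces to a bounds check plus an offset. The sketch below assumes exactly that; the structure layout and the function signatures are hypothetical and do not reproduce ``memslot.h``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot descriptor: one contiguous IPA window backed by RAM. */
struct sketch_memslot {
    uint64_t ipa_base; /* start of the guest-visible range */
    uint64_t pa_base;  /* start of the backing physical range */
    uint64_t size;     /* length of the window in bytes */
};

/* Translate a guest IPA to a physical address; returns -1 if out of range. */
static int sketch_ipa_to_pa(const struct sketch_memslot *slot,
                            uint64_t ipa, uint64_t *pa)
{
    assert(slot != NULL && pa != NULL);
    if (ipa < slot->ipa_base || ipa - slot->ipa_base >= slot->size)
        return -1;
    *pa = slot->pa_base + (ipa - slot->ipa_base);
    return 0;
}

/* Translate a physical address back to the guest IPA; -1 if out of range. */
static int sketch_pa_to_ipa(const struct sketch_memslot *slot,
                            uint64_t pa, uint64_t *ipa)
{
    assert(slot != NULL && ipa != NULL);
    if (pa < slot->pa_base || pa - slot->pa_base >= slot->size)
        return -1;
    *ipa = slot->ipa_base + (pa - slot->pa_base);
    return 0;
}
```

Rejecting out-of-range addresses, rather than translating them blindly, is what keeps one guest from naming memory that belongs to another slot.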

Domains
=======

A **domain** (``struct domain``, ``avz/include/avz/domain.h``) is a guest
instance: its virtual CPU state, its event-channel table, a pointer to the
shared info page and its scheduling metadata. Well-known identifiers
(``avz/include/avz/uapi/avz.h``):

.. flat-table::
   :header-rows: 1
   :widths: 35 65

   * - Identifier
     - Meaning
   * - ``DOMID_AGENCY`` (0)
     - the primary agency guest (owns the devices)
   * - ``DOMID_AGENCY_RT`` (1)
     - optional real-time agency subdomain
   * - slots 2 …
     - capsule domains
   * - ``MAX_CAPSULE_DOMAINS``
     - ``2 + 5``: up to five capsules alongside the agencies

Each domain shares a page with the hypervisor — the ``avz_shared`` structure —
carrying its domain id, event-channel pending bits, the upcall state and the
guest's device-tree address.
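
The identifier layout above can be sketched as a simple classification helper. The constant values are the ones listed in the table; the helper itself is illustrative and assumes that capsule identifiers run from 2 up to (but not including) ``MAX_CAPSULE_DOMAINS``.

```c
#include <assert.h>
#include <stdbool.h>

#define DOMID_AGENCY        0
#define DOMID_AGENCY_RT     1
#define MAX_CAPSULE_DOMAINS (2 + 5)

/* True for the two agency identifiers. */
static inline bool sketch_is_agency(unsigned int domid)
{
    return domid == DOMID_AGENCY || domid == DOMID_AGENCY_RT;
}

/* True for capsule identifiers (assumed range: 2 .. MAX_CAPSULE_DOMAINS - 1). */
static inline bool sketch_is_capsule(unsigned int domid)
{
    return domid > DOMID_AGENCY_RT && domid < MAX_CAPSULE_DOMAINS;
}

/* Number of capsule identifiers available under that assumption. */
static inline unsigned int sketch_capsule_count(void)
{
    assert(MAX_CAPSULE_DOMAINS > 2);
    return MAX_CAPSULE_DOMAINS - 2;
}
```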

Hypercalls
==========

Guests call into AVZ with the ``hvc`` instruction, which traps to the EL2
synchronous handler (``el12_sync_handler`` in ``arch/arm64/exception.S``) and is
dispatched by ``avz/kernel/hypercalls.c``. The generic hypercalls
(``avz/include/avz/uapi/avz.h``) are:

* ``AVZ_EVENT_CHANNEL_OP`` — allocate / bind / send / close event channels;
* ``AVZ_CONSOLE_IO_OP`` — console output for guests;
* ``AVZ_DOMAIN_CONTROL_OP`` — domain control (pause / unpause a capsule, …).

The capsule-management operations (inject, kill, read/write snapshot) used by the
SOO framework are built on top of these — see :ref:`capsules`.
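
A dispatcher of this kind is typically a bounds-checked table lookup. The sketch below shows the shape only: the operation numbers, the handler signature and the handler bodies are all hypothetical, and the real numbering lives in ``avz/include/avz/uapi/avz.h``.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical operation numbers (the real values are defined by AVZ). */
enum sketch_hypercall {
    SKETCH_EVENT_CHANNEL_OP = 0,
    SKETCH_CONSOLE_IO_OP,
    SKETCH_DOMAIN_CONTROL_OP,
    SKETCH_NR_HYPERCALLS
};

typedef long (*sketch_handler_t)(void *arg);

/* Per-operation call counters so the effect of a dispatch is observable. */
static int sketch_calls[SKETCH_NR_HYPERCALLS];

static long sketch_evtchn_op(void *arg)  { (void)arg; sketch_calls[0]++; return 0; }
static long sketch_console_op(void *arg) { (void)arg; sketch_calls[1]++; return 0; }
static long sketch_domctl_op(void *arg)  { (void)arg; sketch_calls[2]++; return 0; }

static const sketch_handler_t sketch_handlers[SKETCH_NR_HYPERCALLS] = {
    [SKETCH_EVENT_CHANNEL_OP]  = sketch_evtchn_op,
    [SKETCH_CONSOLE_IO_OP]     = sketch_console_op,
    [SKETCH_DOMAIN_CONTROL_OP] = sketch_domctl_op,
};

/* Route a trapped hvc to its handler; unknown numbers are rejected. */
static long sketch_do_hypercall(unsigned long op, void *arg)
{
    if (op >= SKETCH_NR_HYPERCALLS)
        return -ENOSYS;
    assert(sketch_handlers[op] != NULL);
    return sketch_handlers[op](arg);
}
```

Validating the operation number before indexing the table matters here because the value comes straight from an untrusted guest register.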

Domain scheduling
=================

AVZ runs each domain on a CPU according to its role. The agency uses the
``sched_agency`` policy; capsules are scheduled by ``sched_flip``
(``avz/kernel/sched_flip.c``), a lightweight round-robin over the capsule
domains. A per-CPU ``current_domain`` pointer tracks the running guest; switching
domains saves and restores the EL1 register banks and reprograms ``VTTBR_EL2``
through ``__mmu_switch_vttbr()`` (``arch/arm64/context.S``).
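
The round-robin choice among capsules can be sketched as a circular scan starting just after the current domain. This is an illustration of the policy as described, not the code of ``sched_flip.c``; the ``runnable`` array and the function name are assumptions, and the register save/restore and ``VTTBR_EL2`` switch are omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_FIRST_CAPSULE 2
#define SKETCH_MAX_DOMAINS   (2 + 5)

/*
 * Pick the next runnable capsule after `current`, wrapping around.
 * runnable[d] tells whether domain d may run. Returns the chosen domain id
 * (possibly `current` itself) or -1 if no capsule is runnable.
 */
static int sketch_flip_next(const bool runnable[SKETCH_MAX_DOMAINS], int current)
{
    const int n = SKETCH_MAX_DOMAINS - SKETCH_FIRST_CAPSULE;

    assert(current >= SKETCH_FIRST_CAPSULE && current < SKETCH_MAX_DOMAINS);

    for (int step = 1; step <= n; step++) {
        int cand = SKETCH_FIRST_CAPSULE +
                   (current - SKETCH_FIRST_CAPSULE + step) % n;
        if (runnable[cand])
            return cand;
    }
    return -1;
}
```

Starting the scan one position after the current domain gives every runnable capsule a turn before any of them runs twice.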

Inter-domain communication
==========================

Event channels
--------------

Each domain has ``NR_EVTCHN`` (128) event-channel ports. A port can be
*unbound* (waiting for a peer), *interdomain* (bound to a remote domain's port)
or bound to a *virtual IRQ*. Sending an event sets a pending bit in the remote
domain's ``avz_shared`` page and, if needed, injects a virtual interrupt so the
guest is woken. Event channels are the signalling half of the split-driver
model.
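
The send path can be sketched as a bitmap update followed by a decision on whether to wake the guest. The structure below is a hypothetical stand-in for the relevant part of ``avz_shared`` (the field names are assumptions), and the injection itself is reduced to a boolean return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_EVTCHN        128
#define SKETCH_WORD_BITS 64

/* Hypothetical subset of the shared page: pending bitmap + upcall flag. */
struct sketch_shared {
    uint64_t evtchn_pending[NR_EVTCHN / SKETCH_WORD_BITS];
    bool upcall_pending;
};

/*
 * Mark `port` pending in the remote domain's shared page.
 * Returns true when the caller should inject a virtual interrupt, i.e. the
 * port was not already pending and no upcall is outstanding.
 */
static bool sketch_evtchn_send(struct sketch_shared *remote, unsigned int port)
{
    assert(remote != NULL && port < NR_EVTCHN);

    uint64_t mask = 1ULL << (port % SKETCH_WORD_BITS);
    uint64_t *word = &remote->evtchn_pending[port / SKETCH_WORD_BITS];
    bool was_pending = (*word & mask) != 0;

    *word |= mask;
    if (was_pending || remote->upcall_pending)
        return false;

    remote->upcall_pending = true;
    return true;
}
```

Skipping the injection when an upcall is already outstanding means several events can be coalesced into one guest wake-up.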

Grant tables
------------

Grant tables (``avz/kernel/gnttab.c``) let one domain share specific memory
pages with another in a controlled way. A domain reserves a small set of grant
IPA pages; a peer maps a granted page by reference. This is how the shared rings
of the frontend/backend drivers and the capsule framebuffer are set up.
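
The grant-by-reference idea can be sketched as a small table in which the owner records which peer may map which page. Everything below (structure, table size, function names) is hypothetical and only illustrates the access check; the real ``gnttab.c`` also has to install the stage-2 mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_GRANTS 8

/* Hypothetical grant entry: one page offered to one peer domain. */
struct sketch_grant {
    uint64_t ipa_page;       /* page being shared */
    unsigned int peer_domid; /* only this domain may map it */
    bool in_use;
};

static struct sketch_grant sketch_grants[SKETCH_NR_GRANTS];

/* Owner side: publish a page for `peer`; returns a grant reference or -1. */
static int sketch_grant_page(uint64_t ipa_page, unsigned int peer)
{
    for (int ref = 0; ref < SKETCH_NR_GRANTS; ref++) {
        if (!sketch_grants[ref].in_use) {
            sketch_grants[ref].ipa_page = ipa_page;
            sketch_grants[ref].peer_domid = peer;
            sketch_grants[ref].in_use = true;
            return ref;
        }
    }
    return -1; /* table full */
}

/* Peer side: resolve a reference; refused unless granted to `caller`. */
static int sketch_map_grant(int ref, unsigned int caller, uint64_t *ipa_page)
{
    assert(ipa_page != NULL);
    if (ref < 0 || ref >= SKETCH_NR_GRANTS || !sketch_grants[ref].in_use)
        return -1;
    if (sketch_grants[ref].peer_domid != caller)
        return -1;
    *ipa_page = sketch_grants[ref].ipa_page;
    return 0;
}
```

The peer only ever handles the small integer reference, so it cannot name a page the owner did not explicitly offer to it.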

Virtual GIC
===========

Because guests must not touch the physical interrupt controller directly, AVZ
provides a **virtual GIC**.

* ``HCR_EL2.IMO`` routes all Group-1 physical IRQs to **EL2**. The EL2 IRQ
  handler (``avz_el2_irq_handle()`` for GICv3, ``irq_handle()`` for GICv2)
  decides what to do with each interrupt.
* The hypervisor's own interrupts, namely the EL2 timer (CNTHP, PPI 26) and the
  vGIC **maintenance** interrupt (PPI 25), are handled locally.
* All other interrupts destined for a guest are **injected** through the GIC
  *list registers* (``ICH_LR*_EL2`` on GICv3, the GICH MMIO frame on GICv2). The
  injected entry is *hardware-backed* (``HW = 1``) so that the physical
  interrupt is deactivated automatically when the guest writes its own
  end-of-interrupt, which keeps hypervisor overhead minimal.
* The physical GIC **distributor** is not mapped in the guest stage-2 tables;
  guest accesses to it trap to EL2 and are emulated by the vGIC
  (``devices/irq/vgic.c``), which forwards most register accesses and translates
  SGI (software-generated interrupt) requests into AVZ's targeted-IPI helper.
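
For GICv3, a hardware-backed injection amounts to composing a 64-bit list-register value. The sketch below follows the ``ICH_LR<n>_EL2`` field layout (virtual INTID in the low 32 bits, physical INTID from bit 32, priority in bits [55:48], Group at bit 60, HW at bit 61, state in bits [63:62]). The macro and function names are illustrative, not the ones used in the AVZ sources, and the ``msr`` write to the register is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_LR_PINTID_SHIFT 32           /* physical INTID field */
#define SKETCH_LR_PRIO_SHIFT   48           /* priority, bits [55:48] */
#define SKETCH_LR_GROUP1       (1ULL << 60) /* Group 1 interrupt */
#define SKETCH_LR_HW           (1ULL << 61) /* hardware-backed entry */
#define SKETCH_LR_PENDING      (1ULL << 62) /* state = 0b01 (pending) */

/*
 * Build a pending, Group-1, hardware-backed list-register value that
 * presents `vintid` to the guest and ties it to physical interrupt `pintid`.
 */
static inline uint64_t sketch_make_hw_lr(uint32_t vintid, uint32_t pintid,
                                         uint8_t priority)
{
    assert(pintid < 1020); /* SGI/PPI/SPI range only */
    return SKETCH_LR_PENDING | SKETCH_LR_HW | SKETCH_LR_GROUP1 |
           ((uint64_t)priority << SKETCH_LR_PRIO_SHIFT) |
           ((uint64_t)pintid << SKETCH_LR_PINTID_SHIFT) |
           (uint64_t)vintid;
}
```

Because the entry carries the physical INTID with ``HW = 1``, the guest's deactivation of the virtual interrupt also deactivates the physical one, so no extra trap to EL2 is needed on completion.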

EL2 vs EL1 in the shared code
=============================

Because the standalone and AVZ builds share the same files, a handful of
low-level operations differ by exception level and are guarded with
``#ifdef CONFIG_AVZ``:

.. flat-table::
   :header-rows: 1
   :widths: 30 35 35

   * - Operation
     - Standalone (EL1)
     - AVZ (EL2)
   * - TLB maintenance (``cache.S``)
     - ``tlbi vmalle1`` / ``vae1is``
     - ``tlbi alle2`` / ``vae2is``
   * - MMU setup (``head.S``)
     - ``ttbr0/1_el1``, ``sctlr_el1``
     - ``ttbr0_el2``, ``tcr_el2``, ``sctlr_el2``
   * - GIC CPU interface
     - ``EOImode = 0``
     - ``EOImode = 1`` (split priority-drop / deactivate)
   * - sync/IRQ vectors (``exception.S``)
     - ``el01_sync`` / ``el01_1_irq``
     - ``el12_sync`` / ``el12_2_irq``

Getting these guards right is essential: an EL2-only instruction executed at EL1
(or vice versa) faults immediately at boot. See :ref:`debugging` for how such
issues are diagnosed under QEMU/GDB.