virtio-blk: blockdev-mirroring for live storage migration#175

Draft
Coffeeri wants to merge 32 commits into cyberus-technology:gardenlinux from Coffeeri:blockdev-mirror-cyberus-synchronous-completions

Conversation

@Coffeeri

Note

This PR is still in draft while I work on the last TODOs. It is open now to gather early feedback on the current implementation.

TLDR
Add blockdev-mirroring as live storage migration for the virtio-blk device. Without stopping the guest, the operator starts a mirror of a disk onto a destination file, waits for it to reach an in-sync state, then switches the VM to the destination. The whole flow runs over four REST endpoints.

Motivation

Some operators serve VM disk images from e.g. NFS shares mounted on the host. A share can fill up, and the operator then needs to migrate one disk image to another share with free space, without stopping the VM. Because the shares are mounted on the host, the VMM has filesystem-level access to both the source and the destination file.

We implement this in the VMM rather than in a separate process. The mirror has to coordinate with VMM-level state: while it runs, the VMM must reject operations that would disturb the disk (see Design), and only the VMM can gate those.
A vhost-user-blk process could run the mirror itself and keep the swap transparent to the VMM, but it would still depend on the VMM for that gating, so we keep the mirror and the gating in one binary, which also keeps the libvirt integration simple. The vhost-user control interface is also too restricted to carry the start, progress, complete, and cancel commands and the mirror's state.

Design

  • CopyWorker: a background thread copies the source to the destination in 512 KiB blocks. All-zero blocks are punched as holes so sparse images stay sparse. Note: the granularity is somewhat arbitrarily chosen and needs to be discussed, especially regarding a finer granularity for hole punching.
  • MirroringAsyncIo: each virtqueue worker's AsyncIo backend is swapped for one that forwards reads to the source and every mutating op to both disks, and waits for both completions before acknowledging the write to the guest. A destination error degrades that queue to source-only and fails the mirror, so the guest never reads corrupted data.
  • RangeLockManager: exclusive per-range locks shared between the copy worker and the guest writes, so the background copy and a concurrent guest write never race on the same range and the destination stays consistent.

The mirror's phase is shared between the copy worker and the per-queue backends:

stateDiagram-v2
    direction LR
    [*] --> running: start
    running --> ready: copy done
    ready --> completing: complete
    completing --> completed: switched
    completed --> [*]
    running --> failed: I/O error
    ready --> failed: I/O error
    running --> cancelling: cancel
    ready --> cancelling: cancel
    failed --> cancelling: cancel
    cancelling --> [*]

A mirror can be cancelled at any time before completion, which reverts every virtqueue worker to the source and keeps the VM on the source disk. After completion it cannot be undone: by then some virtqueue workers may have switched to the destination and written there only, so there is no consistent state to roll back to without losing acknowledged writes.
A mirror that fails on a destination I/O error stays in failed, keeps the VM on the source disk, and must be cancelled to clear it.
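
The edges of the diagram above can be written down as a small transition check. This is a minimal sketch, not the code in this PR: `can_transition_to` and `transition` are hypothetical names, and only the edges drawn in the diagram are modelled (retry edges mentioned in the commit messages are left out).

```rust
/// Lifecycle phases, named after the states in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MirrorPhase {
    Running,
    Ready,
    Completing,
    Completed,
    Cancelling,
    Failed,
}

impl MirrorPhase {
    /// Returns true if the diagram has an edge `self -> next`.
    /// (Hypothetical helper; assumes the diagram lists every edge.)
    pub fn can_transition_to(self, next: MirrorPhase) -> bool {
        use MirrorPhase::*;
        matches!(
            (self, next),
            (Running, Ready)
                | (Ready, Completing)
                | (Completing, Completed)
                | (Running, Failed)
                | (Ready, Failed)
                | (Running, Cancelling)
                | (Ready, Cancelling)
                | (Failed, Cancelling)
        )
    }
}

/// Applies the edge if it is allowed, otherwise returns the rejected
/// (current, requested) pair and leaves the phase untouched.
pub fn transition(
    current: &mut MirrorPhase,
    next: MirrorPhase,
) -> Result<(), (MirrorPhase, MirrorPhase)> {
    if current.can_transition_to(next) {
        *current = next;
        Ok(())
    } else {
        Err((*current, next))
    }
}
```

With this table, a cancel request that arrives in `Completing` is rejected, which matches the rule that a mirror cannot be undone once completion has started.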

While a mirror is active, the VMM rejects operations that would disturb the disk or the mirror's state:

  • snapshotting the VM
  • live-migrating the VM
  • resizing the mirrored disk
  • removing (hot-unplugging) the device
  • rebooting (vm.reboot), shutting down, or deleting the VM

Pausing the VM is allowed during an active mirror, but starting, completing, or cancelling a mirror is rejected while the device is paused (MirrorDevicePaused). An orderly guest reboot or shutdown resets the virtio-blk device, which cancels the mirror and reverts the queues to the source disk. The guest then restarts or powers off, with the mirror dropped rather than completed.

This approach is analogous to QEMU's blockdev-mirror with sync=full and copy-mode=write-blocking: a full background copy plus synchronous propagation of every guest write to the destination. Unlike QEMU it keeps no dirty bitmap or convergence loop, because with every write already current on the destination a single linear pass reaches a consistent state and a deterministic in-sync point. The trade-off is added write latency, which is fine for storage rebalancing.

API

All endpoints are PUT on the VMM API socket.

  • vm.disk-mirror-start - begin mirroring disk id onto destination_path.
  • vm.disk-mirror-status - report the current phase and copy progress.
  • vm.disk-mirror-complete - switch the VM to the destination (accepted only from ready).
  • vm.disk-mirror-cancel - abort and keep the VM on the source.

See docs/disk_mirroring.md for the operator workflow, failure handling, and full design.

Missing / TODO

  • ch-remote subcommands for the disk-mirror endpoints (currently API-only).
  • Expose destination handling on vm.disk-mirror-start (create a new disk vs. reuse an existing one), defaulting to requiring an existing destination.
  • Verify compatibility with block devices using direct I/O.
  • Verify that qcow2 backing files do not hinder the current implementation.
  • Add libvirt NixOS tests.

@Coffeeri Coffeeri requested review from phip1611 and scholzp June 26, 2026 10:29
@Coffeeri Coffeeri force-pushed the blockdev-mirror-cyberus-synchronous-completions branch 3 times, most recently from de5cd45 to fd7094f Compare June 26, 2026 11:02
Blockdev-mirroring for virtio-blk needs a home for its new types. Add
the mirror module with the lifecycle state, ahead of the logic that
fills it in.

MirrorPhase  the lifecycle: Running, Ready, Completing, Completed,
             Cancelling, Failed
MirrorState  the phase behind a Mutex, shared via Arc, with a guarded
             transition_to_phase that applies only the documented edges

Follow-up commits add the range lock for copy/write exclusion, the
AsyncIo wrapper that fans writes to both backends, the background copy
worker, and the virtio-blk and REST integration.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
@Coffeeri Coffeeri force-pushed the blockdev-mirror-cyberus-synchronous-completions branch from fd7094f to a307282 Compare June 26, 2026 11:18
Coffeeri added 5 commits June 26, 2026 13:35
Mirroring needs a per-queue AsyncIo that the virtio device can install
in place of the plain backend. Add the type now so later commits
introducing the shard locks and the write fan-out have something to
reference.

Every method delegates to source. alignment() is the exception
and returns max of source and dest. The request handler reads
alignment per request to choose bounce-buffer placement, and the
same iovec is later submitted to both backends, so the stricter
requirement has to win even before fan-out lands.

submit_batch_requests is left unimplemented and
batch_requests_enabled returns false.

Follow-ups add Shard-based mutual exclusion between the copy
worker and mirror writes, then rewrite write_vectored,
punch_hole, write_zeroes, fsync, and next_completed_request to
fan out to destination and pair completions via a synthetic
dest-side user_data.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
The copy worker and the virtqueue workers can both target the same
destination bytes during a mirror. Without coordination a destination
block can mix bytes from both. Add a primitive both will use to
serialise overlapping ranges.

RangeLockManager wraps a Mutex<BTreeMap<start, end_excl>> and a
Condvar.

    lock_range blocks while any held range overlaps.
    lock_iovecs locks the contiguous span covered by the iovecs, using lock_range.

Wired into the `AsyncIo` impl of `MirroringAsyncIo` and the copy worker
in follow-up commits.
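
For illustration, a range lock of this shape could look as follows. This is a sketch under assumptions, not the PR's implementation: the guard type borrowing the manager, the `try_lock_range` helper, and the linear overlap scan are all choices made here for brevity.

```rust
use std::collections::BTreeMap;
use std::sync::{Condvar, Mutex};

/// Held ranges keyed by start, value is the exclusive end.
pub struct RangeLockManager {
    held: Mutex<BTreeMap<u64, u64>>,
    released: Condvar,
}

/// Releases its range when dropped (hypothetical guard shape).
pub struct RangeGuard<'a> {
    mgr: &'a RangeLockManager,
    start: u64,
}

impl RangeLockManager {
    pub fn new() -> Self {
        Self { held: Mutex::new(BTreeMap::new()), released: Condvar::new() }
    }

    /// True if [start, end) intersects any held range.
    fn overlaps(held: &BTreeMap<u64, u64>, start: u64, end: u64) -> bool {
        held.iter().any(|(&s, &e)| s < end && start < e)
    }

    /// Blocks while any held range overlaps [start, end).
    pub fn lock_range(&self, start: u64, end: u64) -> RangeGuard<'_> {
        assert!(start < end, "range must be non-empty");
        let mut held = self.held.lock().unwrap();
        while Self::overlaps(&held, start, end) {
            held = self.released.wait(held).unwrap();
        }
        held.insert(start, end);
        RangeGuard { mgr: self, start }
    }

    /// Non-blocking variant, added here only to make the sketch easy to test.
    pub fn try_lock_range(&self, start: u64, end: u64) -> Option<RangeGuard<'_>> {
        assert!(start < end, "range must be non-empty");
        let mut held = self.held.lock().unwrap();
        if Self::overlaps(&held, start, end) {
            return None;
        }
        held.insert(start, end);
        Some(RangeGuard { mgr: self, start })
    }
}

impl Drop for RangeGuard<'_> {
    fn drop(&mut self) {
        // Remove the range first, then wake every waiter so each can re-check.
        self.mgr.held.lock().unwrap().remove(&self.start);
        self.mgr.released.notify_all();
    }
}
```

Adjacent ranges such as [0, 512) and [512, 1024) do not conflict because the end is exclusive. Binding the guard to a named variable matters: `let _ = mgr.lock_range(..)` would drop it immediately.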

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
The copy worker and the per-queue mirror writes need to block until an
AsyncIo backend's notifier eventfd has a completion to read. Every
backend creates its notifier with EFD_NONBLOCK, so it needs to be
polled, and the virtio-block seccomp filter allows epoll_* but not
poll/ppoll.

Add EpollWaiter, a wrapper around vmm_sys_util::epoll::Epoll that
registers one fd for readability at construction. wait() blocks until
the fd is readable, retrying on EINTR. next_completion() loops over a
backend's completions, blocking on wait() and draining the eventfd
between rounds.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Mirroring moves a virtio-blk disk to a new backend path while the guest
keeps running.
Guest writes during the move have to land on both disks. A background
copy worker streams the existing data across. The guest write path and
the copy worker can target the same byte range at the same time.

Mirror mutating guest requests to both backends via virtqueues, and
hold a range lock for each request so neither side touches the same
bytes at once.
Read requests are not mirrored; they go to the source disk only.
The guest sees the source's result. A destination failure moves
MirrorPhase to Failed.

Follow-ups: the copy worker that takes the same lock, and the
coordinator that handles start, cancel, complete, and rollback on
Failed.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
The blockdev-mirror replicates guest writes to the destination, but the
destination still misses everything that was on the source before the
mirror virtqueues started.
A background worker has to copy those existing bytes while the guest
keeps running.
The mirror's per-queue writers and this worker can target the same byte
range, so the worker takes the same range lock the writers hold.

CopyWorker submits reads and writes through AsyncIo so it works with any
disk format the trait supports. Each block is held under a RangeGuard
during a sequential read on source and write on destination.
Completions are awaited via EpollWaiter on the non-blocking notifier
eventfd.

On success the phase transitions to Ready. An I/O error or a spawn
failure transitions to `Failed` via the internal state machine.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
@Coffeeri Coffeeri force-pushed the blockdev-mirror-cyberus-synchronous-completions branch from a307282 to 7f2e7ba Compare June 26, 2026 11:39
Coffeeri added 17 commits June 26, 2026 15:59
Blockdev-mirroring needs to install a MirroringAsyncIo on each virtqueue
worker without restarting the threads. Restarting would be guest-visible
(a device reset) and would need to drain in-flight I/O first. Per-queue
worker state cannot be mutated from another thread, so the swap has to
happen on the worker's own thread in response to a signal.

Add the receiving side. Each virtqueue gets a BlockQueueCommandReceiver
holding a single-command slot and an eventfd. The API thread fills the
slot (from a future Block::start_mirror) and writes the eventfd. The
worker wakes on BLOCK_COMMAND_EVENT, takes the command, and applies it
via apply_block_queue_command: swap disk_image and re-register the
completion notifier on the worker's epoll set.

A BlockQueueCommand carries its kind (InstallMirror,
CompleteToDestination, CancelToSource), the replacement AsyncIo, and an
acknowledgement channel. The acknowledgement stays unused here and is
wired up when Block::start_mirror lands.

cmd_receiver is Option<BlockQueueCommandReceiver> on BlockEpollHandler,
None at construction, so the non-mirror path is unchanged: no event is
registered and no new branches fire.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
The previous commit added the receiving side of the queue command
channel to BlockEpollHandler. For Block::start_mirror to fill a
slot and write an eventfd, those handles have to exist at
activation time and be reachable from both the virtqueue worker
and Block itself.

Build a BlockQueueCommandReceiver per virtqueue when the device is
activated. A clone of the slot Arc and a clone of the eventfd are
stored on the new Block.queue_cmd_senders field. The receiver with
its eventfd clone is handed to BlockEpollHandler. The slot starts
empty and the eventfd is silent, so BLOCK_COMMAND_EVENT does not
fire and behaviour is unchanged.

queue_cmd_senders is a Vec indexed by virtqueue. It is cleared at
the start of every activation and re-populated for that
activation's virtqueues.

Follow-up: Block::start_mirror that fills each slot with an
InstallMirror command carrying a MirroringAsyncIo, writes each
evt, and spawns the copy worker.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
The device side needs something to retain while a blockdev-mirror is
active: the shared mirror state for status queries and the copy worker
handle so the thread is joined on drop.

Add BlockMirrorHandle bundling Arc<MirrorState> and CopyWorkerHandle.

Follow-up: Block::start_mirror that wires these up.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Building a MirroringAsyncIo for a virtqueue takes more than the
AsyncFullDiskFile trait offers: it needs the shared MirrorState to pair
with the copy worker, and it sets up its own waiters on the source and
destination notifiers so a mirrored write can wait for both completions
inside the write call.

Add MirroringAsyncIo::create, an associated function on the type it
builds. The caller passes source and destination as &dyn
AsyncFullDiskFile along with the MirrorState and ring depth, and the
function returns the boxed AsyncIo. The destination notifier is read
only inside MirroringAsyncIo, so the virtqueue worker keeps watching
just the source notifier.

BlockMirrorHandle gains a destination field: Block.disk_image stays the
source for the lifetime of the mirror, so the destination disk is owned
by the handle.

Follow-up: Block::start_mirror that wires up the new API.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Each virtqueue has its own AsyncIo, so mirroring needs one
MirroringAsyncIo per virtqueue, installed through the per-queue command
channel added earlier. start_mirror drives that handover and returns a
BlockMirrorHandle for the device manager to own.

It rejects a destination smaller than the source, then builds all
per-virtqueue InstallMirror commands before sending any, so a
construction failure leaves the device unchanged. It sends the commands,
waits for all acknowledgements with a timeout, and spawns the copy
worker, so a failure cannot leak a thread whose Drop blocks on join. The
worker side that acknowledges each swap after draining its old backend
lands in a follow-up commit.

On any failure after the first command was sent, the queues are reverted
to plain AsyncIo on the source disk via CancelToSource commands. A
failed revert is logged and does not mask the install error.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
After starting the blockdev-mirror, the operator must be able to observe
the current progress of the copy worker to decide whether to complete to
the destination disk or to react to a failure.

Introduce a Block::mirror_handle field and a dedicated MirrorStatus
struct, to be used in the upcoming API endpoint.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Start exposes the lifecycle entrypoint that the upcoming REST endpoint
will route to. The destination file must not exist yet. It is created
with the same image format and backend flags as the source disk so the
mirror can fan writes out to a backend that behaves identically.

Status surfaces the shared mirror state through the device manager so
operators can poll and observe progress.

Adds three DeviceManagerError variants for the new failure modes and
a BlockErrorKind::AlreadyExists for the create_disk pre-condition.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Wire disk mirroring through to the HTTP API so a running VM can be told
to start mirroring a disk onto a destination path.

Add the /vm.disk-mirror-start endpoint and its request handler, the
vm_disk_mirror_start dispatch on the VMM, and the Vm::mirror_disk
wrapper that locks the device manager and maps its error into the
vmm error type. A new DiskMirrorStart error covers the case where no
VM owns the device manager.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Expose the disk mirror status operation as a REST entrypoint so
operators can poll progress and detect terminal phases. The endpoint
returns the current phase, copied bytes, total bytes, and a failure
reason when the mirror is in the failed phase.

The PutHandler maps unknown disk id and inactive mirror to 404 so
management layers can distinguish operator errors from server faults.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
The virtio-blk queue worker calls submit_batch_requests unconditionally
on the disk image, ignoring batch_requests_enabled. The previous stub
panicked, which crashed the VM as soon as a mirrored disk processed a
batched read or write.

Dispatch In and Out to the existing read_vectored and write_vectored
methods, which already fan out to source and destination. Other request
types do not reach this path under the current request pipeline.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Complete and cancel swap a virtqueue worker's disk_image and need to
ensure no inflight request and completion pairs are pending.
Expose whether the implementation still holds request pairings the
worker has to wait on before the swap.
This is only relevant for `MirroringAsyncIo`, as it needs to
serialize the requests and completions to two child `AsyncIo`s.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
A virtqueue worker may still hold source and destination write-pairs
when complete or cancel arrives. Swapping the wrapper out at that moment
would orphan the pending completions, leaving the guest waiting on
writes that will never be acked.

Stage the incoming BlockQueueCommand in pending_block_queue_command and
apply it only once neither the handler nor the backend reports in-flight
requests. The submit path is gated for the duration of the drain,
otherwise sustained guest writes would keep the in-flight count from
ever reaching zero.
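
The staging rule can be modelled with a counter and an optional pending command. This is a simplified sketch: `QueueWorker`, its fields, and its method names are hypothetical, and the real handler tracks in-flight state in both the handler and the backend.

```rust
/// Command kinds named in the earlier commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueueCommand {
    InstallMirror,
    CompleteToDestination,
    CancelToSource,
}

/// Toy model of one virtqueue worker (hypothetical type).
#[derive(Default)]
pub struct QueueWorker {
    pending: Option<QueueCommand>,
    in_flight: usize,
    /// Commands applied so far, kept only so the sketch is observable.
    pub applied: Vec<QueueCommand>,
}

impl QueueWorker {
    /// Stage a command; it is applied at once only if nothing is in flight.
    pub fn stage(&mut self, cmd: QueueCommand) {
        self.pending = Some(cmd);
        self.try_apply();
    }

    /// Try to submit a guest request. Returns false while a command is
    /// staged, so the in-flight count can only fall during the drain.
    pub fn submit(&mut self) -> bool {
        if self.pending.is_some() {
            return false;
        }
        self.in_flight += 1;
        true
    }

    /// Record one completion and apply the staged command once drained.
    pub fn complete_one(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
        self.try_apply();
    }

    fn try_apply(&mut self) {
        if self.in_flight == 0 {
            if let Some(cmd) = self.pending.take() {
                self.applied.push(cmd);
            }
        }
    }
}
```

Without the gate in `submit`, a guest that writes continuously could keep `in_flight` above zero indefinitely and the command would never be applied.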

After applying the command the worker sends the acknowledgement
and processes the avail ring directly: while the command was
pending, QUEUE_AVAIL_EVENT handling consumed the guest's kicks
without submitting, and the guest will not kick again for
descriptors it already queued.

The same protocol is reused by the upcoming complete and cancel
endpoints.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
After the copy worker reports the mirror ready, the operator needs to
switch the device to the destination disk so the source can be
detached.

Completion is allowed from MirrorPhase `Ready`, or from `Completing` as
a retry. A plain destination AsyncIo per virtqueue is pre-built before
any command is sent, so a create_async_io failure leaves the device
unchanged and the operator can retry. The state transitions to
`Completing` before the first CompleteToDestination command goes out:
from that point a queue may already write to the destination only, so
the source stops being a safe fallback and cancel is no longer allowed.
The drain protocol from the previous commit makes each worker finish
its in-flight pairings before swapping.

Failures of the command send or the acknowledgement wait panic instead
of returning an error. A partial completion splits the queues between
destination-only writers and source readers, which can serve stale
reads to the guest. There is no revert that does not lose acknowledged
writes, so we prefer the panic.

After all acknowledgements the state becomes `Completed`, the handle is
dropped and the control plane disk_image is replaced by the destination.

Two BlockErrorKind variants cover the operator-visible preconditions.
Device manager and REST plumbing follow.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Operators trigger the complete stage of blockdev-mirroring through this
entrypoint. The endpoint switches the device to the destination disk
after the copy worker reports the mirror ready.

The PutHandler maps device manager errors to HTTP status codes so
management layers can distinguish operator errors (404 for unknown disk
or no active mirror, 400 for not-yet-ready) from server faults.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Cancel reverts every virtqueue worker to a plain AsyncIo on the source
disk, transitions the mirror to Cancelling and drops the handle,
joining the copy worker and releasing the destination. The copy worker
now exits before the next block once the phase is terminal instead of
copying the remainder.

Cancel is rejected once a completion was attempted: a queue may already
write to the destination only, so reverting would lose acknowledged
guest writes. A guest-initiated device reset cancels an active mirror
before VirtioCommon::reset tears down the virtqueue workers, which must
still be alive to acknowledge the revert.

The REST plumbing for cancel comes in a follow-up commit.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Resizing or snapshotting the disk, removing the device, shutting down,
rebooting or deleting the VM and starting a live migration all
invalidate an active mirror: the destination silently falls behind or
the mirror state is lost, since it is not migratable. Reject these
operations while a mirror is active so the operator has to complete or
cancel first.

DeviceManager::active_block_mirrors lists the active mirrors and backs
the new Vm::any_active_block_mirrors check.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Block::cancel_mirror was only reachable through a guest-initiated device
reset. Give the operator a way to abort a mirror and keep the guest on
the source disk.

Wire the call through the layers:
DeviceManager::mirror_disk_cancel resolves the device and maps errors,
Vm and the RequestHandler forward the call, and a new VmDiskMirrorCancel
action backs the PUT /vm.disk-mirror-cancel endpoint.
Unknown device ids and inactive mirrors map to 404, a cancel after an
attempted completion maps to 400, and revert failures surface as
internal errors.
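
The status mapping described here and for the complete endpoint can be summarised in one function. This is a sketch only: the `MirrorApiError` enum and its variant names are hypothetical, and 500 is assumed as the code behind "internal errors".

```rust
/// Hypothetical error kinds, one per case named in the commit messages.
#[derive(Debug)]
pub enum MirrorApiError {
    /// The disk id does not match any device.
    UnknownDisk,
    /// The device exists but has no active mirror.
    NoActiveMirror,
    /// Complete was requested before the mirror reached ready.
    NotReady,
    /// Cancel was requested after a completion was attempted.
    CompletionAttempted,
    /// Reverting the queues to the source failed.
    RevertFailed(String),
}

/// Maps an error to the HTTP status the endpoint would return.
pub fn http_status(err: &MirrorApiError) -> u16 {
    use MirrorApiError::*;
    match err {
        // Operator named something that does not exist.
        UnknownDisk | NoActiveMirror => 404,
        // Request is well-formed but not valid in the current phase.
        NotReady | CompletionAttempted => 400,
        // Server-side fault (status code assumed).
        RevertFailed(_) => 500,
    }
}
```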

A failed cancel keeps the mirror handle and leaves the mirror in the
Cancelling phase. Cancel accepts that phase as a retry, so the request
can simply be retried. CancelToSource commands are idempotent per
virtqueue.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Coffeeri added 9 commits June 26, 2026 15:59
During blockdev-mirroring, the CopyWorker reads each block from the
source disk and writes it to the destination. It used to write
zero-filled blocks as well, which allocates storage on the
destination for regions that hold no data.

When the destination supports sparse operations, we now check whether
a block is all zeros. If it is, we call punch_hole instead of
write_vectored. This keeps the destination as sparse as the source.

The check currently looks at the full MIRROR_BLOCK_SIZE block, so only
all-zero blocks become holes. A smaller granularity would also punch
holes inside partly-zero blocks and save more space, at the cost of
more compute per block. We leave this for a later change.
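
The per-block decision can be sketched as below. The names `CopyAction`, `is_all_zero`, and `choose_action` are hypothetical, and the constant's value assumes MIRROR_BLOCK_SIZE equals the 512 KiB block size given in the PR description.

```rust
/// Assumed value: the PR description states 512 KiB copy blocks.
pub const MIRROR_BLOCK_SIZE: usize = 512 * 1024;

/// What the copy worker does with a block it has read from the source.
#[derive(Debug, PartialEq, Eq)]
pub enum CopyAction {
    /// Deallocate the range on the destination instead of writing zeros.
    PunchHole,
    /// Write the block contents to the destination.
    Write,
}

/// True if every byte of the block is zero.
pub fn is_all_zero(block: &[u8]) -> bool {
    block.iter().all(|&b| b == 0)
}

/// Punch a hole only when the destination supports sparse operations
/// and the whole block is zero; otherwise write the data.
pub fn choose_action(block: &[u8], dest_supports_sparse: bool) -> CopyAction {
    if dest_supports_sparse && is_all_zero(block) {
        CopyAction::PunchHole
    } else {
        CopyAction::Write
    }
}
```

A single non-zero byte anywhere in the block forces a full write, which is the space cost of checking at whole-block granularity.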

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Cover the range-lock and passthrough behaviour of MirroringAsyncIo with
a mock AsyncIo, so the synchronization invariants are checked without
real disk I/O.

The tests cover:
- overlapping mirror writes complete in order under the range lock
- a copy-worker range hold blocks an overlapping guest write until it is
  released
- reads pass through to the source only
- a destination submit failure degrades the mirror to source passthrough

A watchdog thread fails a test on timeout, so a locking regression
surfaces as a failure rather than a hang.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
The synchronous mirrored write holds its range guard from acquisition
through both the source and destination completions. Nothing else pins
that lifetime, so a regression to dropping the guard early (`let _`
instead of `let _guard`) would let an overlapping lock_range acquire
while the write is still in flight, the exact race the range lock exists
to prevent.

Add guard_is_held_across_submit_and_wait and a GatedMockAsyncIo backend
whose destination completion is withheld until released from another
thread. The write parks in wait_for_completions holding its guard while
the test asserts an overlapping lock_range blocks, then acquires only
once the completion is released and the write drops the guard.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
A paused virtqueue worker is parked on its pause barrier and never
reaches its epoll loop, so it cannot pick up a staged BlockQueueCommand.
start_mirror, complete_mirror, and cancel_mirror staged the command
anyway and blocked in wait_for_mirror_queue_command_acks until the ack
timeout, then returned an error while the command lingered in the slot
and was applied late once the VM resumed, leaving the mirror
half-installed. complete_mirror additionally panicked on that timeout.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Cloud Hypervisor holds the disk image lock process-wide, so re-opening
the destination here would not trip the lock. Compare canonicalized
paths instead and refuse a destination that already backs one of the
VM's disks.
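
A path comparison of this kind could be sketched with `std::fs::canonicalize`. The function name `destination_in_use` is hypothetical, and the sketch assumes every path exists when the check runs: `canonicalize` returns an error for a missing file, so a not-yet-created destination would need different handling.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns Ok(true) if `dest` resolves to the same file as any path in
/// `vm_disks` after symlinks and `..` components are resolved.
pub fn destination_in_use(dest: &Path, vm_disks: &[PathBuf]) -> io::Result<bool> {
    let dest = fs::canonicalize(dest)?;
    for disk in vm_disks {
        if fs::canonicalize(disk)? == dest {
            return Ok(true);
        }
    }
    Ok(false)
}
```

Comparing the raw strings would miss aliases such as a symlinked mount point or a path that goes through `..`, which is why both sides are canonicalized first.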

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
A virtio-blk device activated by the guest, and the blockdev mirror
rebuilding a backend on a guest-initiated reset, set up io_uring rings
and eventfds on the activating vcpu thread, after that thread's seccomp
filter is installed. Allow io_uring_setup and io_uring_register plus
eventfd2 for the notifier, matching what the vmm thread already permits,
so the backend is not killed with SIGSYS.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Because submit_batch_requests serializes each entry and queues a
completion per write, a submit failure mid-batch must still return Ok
with one completion per entry. Otherwise the virtqueue worker, which
records the batch as in-flight only on Ok, strands the completions
already queued for earlier entries and dies with MissingEntryRequestList.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Test the MirrorState phase state machine (allowed transitions, rejected
ones, terminal Completed, and Failed keeping its first reason), the
tracked-vs-barrier fsync split, write_zeroes mirroring, and that a
degraded mirror passes every op through to the source alone.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
Document the operator workflow (start, status, complete, cancel, failure
handling, unrecoverable errors, and conflicting operations) and the
design behind it: the CopyWorker, the MirroringAsyncIo write fan-out,
and the range lock.

On-behalf-of: SAP leander.kohler@sap.com
Signed-off-by: Leander Kohler <leander.kohler@cyberus-technology.de>
@Coffeeri Coffeeri force-pushed the blockdev-mirror-cyberus-synchronous-completions branch from 7f2e7ba to 9a65511 Compare June 26, 2026 14:13