refactor: one Array class with sync+async APIs#4049
Conversation
Alternative to zarr-developers#4034 that delivers every zarr-developers#4027 goal except removing the public AsyncArray class. - Array gains a user-friendly constructor: Array(metadata, store_path, config=...) builds the wrapped AsyncArray internally; Array(async_array) still works for back-compat. - Array gains the full public async surface (*_async twins for every sync method), so callers no longer reach into the private _async_array. - Each operation has a single implementation: the async twin holds the logic and the sync method delegates via sync(). No drift between the pairs; a parity test guards their signatures. - No Runner protocol and no _AsyncArrayView/_rebind_state shared-state machinery: Array still wraps one AsyncArray, so async_array is the held object and mutation coherence is free. - getitem_async/setitem_async mirror the full basic-selection parameter surface (out=, fields=, prototype=). Tests: integrates the async/sync boundary suite (shared-state coupling, codec-pipeline-built-once, selection parity, out= zero-copy, full parameter surface). The loop-affinity test is xfail (strict) as a design-independent limitation of mixing zarr's background loop with a user-owned loop on stores with event-loop affinity. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4049 +/- ##
==========================================
+ Coverage 93.53% 93.55% +0.02%
==========================================
Files 88 88
Lines 11894 11939 +45
==========================================
+ Hits 11125 11170 +45
Misses 769 769
🚀 New features to boost your workflow:
|
mkitti
left a comment
There was a problem hiding this comment.
I think we have the pattern backwards here. If the objective is to lower overhead for the synchronous API,
- Instead of calling
syncon the asynchronous API ... - Use
asyncio.to_threadto call a synchronous function.
This way the synchronous API has as little overhead as possible. Meanwhile, we still have an asynchronous API that is still asynchronous.
| 6 | ||
| """ | ||
| return sync(self.async_array.nchunks_initialized()) | ||
| return sync(self.nchunks_initialized_async()) |
There was a problem hiding this comment.
If all the "synchronous" calls just call sync on their asynchronous versions, then we still incur all the overhead of scheduling an async task.
| delete_outside_chunks : bool, optional | ||
| If True (default), chunks that fall entirely outside the new array | ||
| shape are deleted from the underlying store. If False, those chunks | ||
| are left in place. |
There was a problem hiding this comment.
Should we be exposing new keyword arguments in the synchronous API at this time? Could we defer this change to a later pull request?
|
thanks for your review @mkitti! I should've marked this PR as a draft. It's one of a few possible solutions related to #4027, including #4034 and generated sync code. I'd be curious for your overall take on the different options. |
|
My understand of the situation is there is a technical design flaw currently in zarr-python in that it uses asynchronous tasks excessively creating a bottleneck in the scheduler. The objective to resolve this is to reduce the overhead of asynchronous calls by providing a low-overhead synchronous API. As this StackOverflow answer demonstrates the act of simply creating an asynchronous task has some significant overhead, especially when the asynchronous task is small. In the case of the SO example, the async task is just a If we want to eliminate code duplication, rather than implementing an asynchronous function and calling In the medium term, we would want to aggregate async functionality into larger tasks in order to create less burden on the async scheduler. In the long term, I recommend exploring how the API could be best adopted for true free threading. As free threading becomes more popular, using a thread pool via |
Alternative to #4034 that delivers every #4027 goal except removing the public
AsyncArrayclass.Array(metadata, store_path, config=...)builds the wrappedAsyncArrayinternally;Array(async_array)still works for back-compat.Arraygains the full public async surface (*_asynctwins for every sync method), so callers no longer reach into the private _async_array.sync(). No drift between the pairs; a parity test guards their signatures.Runnerprotocol and no_AsyncArrayView/_rebind_stateshared-state machinery:Arraystill wraps oneAsyncArray, soasync_arrayis the held object and mutation coherence is free.getitem_async/setitem_asyncmirror the full basic-selection parameter surface (out=,fields=,prototype=).Tests: integrates the async/sync boundary suite (shared-state coupling, codec-pipeline-built-once, selection parity, out= zero-copy, full parameter surface). The loop-affinity test is xfail (strict) as a design-independent limitation of mixing zarr's background loop with a user-owned loop on stores with event-loop affinity.
TODO:
docs/user-guide/*.mdchanges/