Parcels-code · willirath · Jun 17, 2026
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,6 @@
+*.parquet
+*.zarr/
+
+# local agent/editor state
+.claude/
+__pycache__/
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -31,10 +31,6 @@ repos:
     rev: v3.8.3
     hooks:
       - id: prettier
-  - repo: https://github.com/kynan/nbstripout
-    rev: 0.9.1
-    hooks:
-      - id: nbstripout
   - repo: https://github.com/ComPWA/taplo-pre-commit
     rev: v0.9.3
     hooks:

diff --git a/cmems_global/.gitattributes b/cmems_global/.gitattributes
@@ -0,0 +1,2 @@
+# SCM syntax highlighting & preventing 3-way merges
+pixi.lock merge=binary linguist-language=YAML linguist-generated=true -diff
diff --git a/cmems_global/.gitignore b/cmems_global/.gitignore
@@ -0,0 +1,10 @@
+# pixi environments
+.pixi/*
+!.pixi/config.toml
+
+# python bytecode
+__pycache__/
+
+# regenerable simulation outputs
+*.parquet
+*.zarr/
diff --git a/cmems_global/README.md b/cmems_global/README.md
@@ -0,0 +1,218 @@
+# cmems_global
+
+> **Disclaimer:** Most of the code in this directory was written by Claude
+> (Anthropic's Claude Code), under human direction.
+
+CMEMS global ocean experiments driven by [OceanParcels](https://github.com/parcels-code/Parcels).
+
+## Layout
+
+```
+cmems_global/
+├── pixi.toml / pixi.lock     # one pixi env per parcels git rev (see below)
+├── scripts/                  # env setup (run from the cmems_global/ dir)
+│   ├── configure_pixi.sh
+│   └── register_kernels.sh
+├── sandbox/                  # throwaway exploration / benchmarks (not prod)
+│   └── parallel_exploration/ # numba-JIT kernel + parallelism experiments
+└── notebooks/                # jupytext-paired .py / .md / .ipynb
+    ├── 01_retrieve_data.*
+    ├── 01a_zarr_v2_copy.*
+    ├── 02a_run_parcels.*
+    ├── 02b_run_parcels.*
+    ├── 02c_run_parcels.*
+    ├── 02d_run_parcels.*
+    ├── 02e_run_parcels.*
+    └── 02f_run_parcels.*
+```
+
+- `notebooks/01_retrieve_data` — pull CMEMS global `uo`/`vo` via
+  `copernicusmarine`, fill land NaNs with 0, and store as unpacked `float32`
+  zarr (so the raw-zarr reader in `02c` sees real velocities, not packed int16).
+  Writes `cmems_uovo_2001.zarr` in **zarr format 3** (the modern stack default).
+- `notebooks/01a_zarr_v2_copy` — write a **zarr format 2** copy
+  (`cmems_uovo_2001_zarr2.zarr`) of the field store. Needed because `02f`'s
+  parcels-v3 env is pinned to `zarr < 3`, which cannot read the zarr-v3 original.
+  Runs on the `main` env (zarr 3 reads the v3 source and writes a v2 copy); the
+  original is left untouched, so `02a`–`02e` are unaffected.
+- `notebooks/02a_run_parcels` — advect 1000 particles on the `main` build (plain
+  `FieldSet`).
+- `notebooks/02b_run_parcels` — advect 100k particles on the windowed-array build
+  (PR #2671), via `fieldset.to_windowed_arrays(...)`.
+- `notebooks/02c_run_parcels` — advect 1000 particles on the raw-zarr build
+  (PR #2668), loading the store with `parcels.open_raw_zarr` behind a zarr
+  `CacheStore` (in-memory, dask-free). Mirrors the `zarr-with-cache` mode of
+  `raw_zarr_testing/raw_zarr_profiling.py` on the `raw_zarr_profiling` branch.
+- `notebooks/02d_run_parcels` — advect **1M** particles on the windowed-array
+  build (PR #2671), but replace the single-threaded parcels kernel with a
+  `numba.njit(parallel=True)` fused `AdvectionRK4` over all cores. The windowed
+  fieldset is used only as the IO layer (load each time-level slab once per
+  window); the JIT kernel does the index search + trilinear interp + RK4 combine.
+  Driver-level only (no parcels patch); specific to this regular A-grid.
+- `notebooks/02e_run_parcels` — same JIT kernel as `02d`, but with the `02c`
+  zarr-`CacheStore` IO layer (PR #2668) instead of windowed arrays — so the two
+  IO strategies can be compared under the identical fast kernel.
+- `notebooks/02f_run_parcels` — advect **1M** particles using **native parcels
+  v3 JIT** (`parcels.JITParticle` + `parcels.AdvectionRK4`, v3's own JIT-compiled
+  C kernel), on a `FieldSet.from_xarray_dataset` whose fields are **eager-loaded
+  once** up front (top 2 depth levels, all times) so there is no IO during the
+  run. The idiomatic-v3 reference point for the series (no custom kernel, no
+  driver loop). Reads the **zarr-v2** copy from `01a` (its env pins `zarr < 3`).
+  Output is buffered in an in-memory `zarr.MemoryStore` during the run and dumped
+  to the on-disk `.zarr` in a single `zarr.copy_store` (Lustre-friendly: written
+  once, not streamed).
+
+The notebooks are jupytext-paired (`.py` / `.md` / `.ipynb`); the `.py`
+(py:percent) is the source of truth — see [Notebooks](#notebooks-jupytext) below.
+
+## Timings — 9-day advection, dt 2 h (one Levante node, 28 cores)
+
+Wall time of the **advection run** in each notebook — the `pset.execute` /
+integration loop only, excluding the one-time field load and plotting. The numba
+notebooks (`02d`/`02e`) use all 28 cores via `njit(parallel=True)`; every other
+row is single-threaded.
+
+| nb    | particles | kernel                         | IO layer                           | run wall                  |
+| ----- | --------- | ------------------------------ | ---------------------------------- | ------------------------- |
+| `02a` | 1,000 ¹   | parcels v4 native (Python)     | plain eager `FieldSet` (`main`)    | 311 s (5:11)              |
+| `02b` | 1,000,000 | parcels v4 native (Python)     | windowed array (PR #2671)          | 540 s (9:00)              |
+| `02c` | 1,000,000 | parcels v4 native (Python)     | raw zarr + `CacheStore` (PR #2668) | 974 s (16:14)             |
+| `02d` | 1,000,000 | **numba** `njit(parallel)` (C) | windowed array                     | **20.7 s** (kernel 7.5 s) |
+| `02e` | 1,000,000 | **numba** `njit(parallel)` (C) | raw zarr + `CacheStore`            | **15.4 s** (kernel 8.2 s) |
+| `02f` | 1,000,000 | parcels **v3 native JIT** (C)  | eager full-load                    | 152 s (2:32)              |
+
+¹ `02a` runs only 1,000 particles, so it is not comparable to the 1M rows; the
+~5 min is dominated by per-step overhead that is largely independent of particle
+count (the `np.stack` field re-gather in v4's interpolation).
+
+At 1M particles the parcels **v4 native** Python kernel takes 9–16 min (IO-layer
+dependent); parcels **v3's native C JIT** does it in ~2.5 min; and a **numba
+`njit(parallel)`** kernel across 28 cores brings the advection itself to ~15–21 s
+— the kernel compute alone is ~8 s, the rest is field IO. (Single runs on one
+node; indicative, not rigorous benchmarks.)
+
+## Pixi environments — one per parcels git rev
+
+This workspace keeps a single shared conda stack (Python, xarray, dask,
+copernicusmarine, jupyterlab, …) and layers several pinned **parcels** builds on
+top of it as separate pixi environments. This lets us compare parcels revisions
+side by side from one directory, each as its own JupyterHub kernel.
+
+| pixi env                | parcels rev                                                                                                                                                                              | SHA (resolved 2026-06-18) |
+| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------- |
+| `main`                  | `parcels-code/Parcels` `main`                                                                                                                                                            | `481decc`                 |
+| `pr2671-windowed-array` | PR [#2671](https://github.com/parcels-code/Parcels/pull/2671) "Issue 2656 windowed array" head                                                                                           | `8136bf5`                 |
+| `pr2668-open-raw-zarr`  | PR [#2668](https://github.com/parcels-code/Parcels/pull/2668) "Add `open_raw_zarr` helper" head                                                                                          | `97c3324`                 |
+| `v3`                    | conda-forge `parcels` v3 release (`>=3.1,<4`) — native v3 JIT reference for `02f`. **Self-contained** (`no-default-feature`): pins `zarr<3` + Python 3.12, the combo parcels 3.1 targets | `3.1.0` (installed)       |
+
+The shared deps live in pixi's implicit `default` feature, which is merged into
+every environment automatically (see `pixi.toml`). The bare `default`
+environment carries no parcels and gets no kernel (`register_kernels.sh` skips
+it); `pixi install --all` does still build it, but it is otherwise unused.
+
+## Setup on DKRZ Levante
+
+Run from the `cmems_global/` directory, in order (each step is safe to re-run):
+
+```bash
+bash scripts/configure_pixi.sh     # 1. global pixi config: envs on $HOME, cache on /scratch
+pixi install --all                 # 2. install every named env (uses the /scratch cache)
+bash scripts/register_kernels.sh   # 3. register one JupyterHub kernel per env
+```
+
+- `scripts/configure_pixi.sh` parks per-project envs on `$HOME` (VAST) and points
+  the default package cache at `/scratch` (DKRZ purges it; `$HOME` quota stays
+  clean). Global and idempotent.
+- `pixi install --all` installs every environment in `pixi.toml` into its
+  detached `$HOME` prefix. No wrapper script is needed once the cache lives on
+  `/scratch` — the tarballs land there and get purged on DKRZ's schedule, while
+  the installed envs on `$HOME` are self-contained and survive a cache purge.
+- `scripts/register_kernels.sh` writes one `kernel.json` per environment, each
+  launching `pixi run --environment <env>` so pixi activation applies; `PATH` is
+  pinned in the spec because the DKRZ spawner does not source `~/.bashrc`. It
+  resolves the workspace root from its own location, so `$PWD` doesn't matter.
+
+In JupyterHub, the kernels appear as:
+
+- `Pixi: cmems_global (main)`
+- `Pixi: cmems_global (pr2671-windowed-array)`
+- `Pixi: cmems_global (pr2668-open-raw-zarr)`
+
+Pick the kernel matching the parcels rev you want to run a notebook against; the `v3` env registers as `Pixi: cmems_global (v3)`.
+
+To launch a kernel or open a shell in a specific env outside JupyterHub:
+
+```bash
+pixi run --environment pr2671-windowed-array python -m ipykernel_launcher ...
+# or drop into a shell:
+pixi shell --environment main
+```
+
+## Notebooks (jupytext)
+
+Each notebook exists in three jupytext-paired forms; **edit the `.py`** (it is
+the source of truth) and re-sync — never hand-edit the `.md` or `.ipynb`:
+
+- `<nb>.py` — py:percent source of truth (plain Python; lint/run it directly).
+- `<nb>.md` — markdown rendering for readable diffs (generated).
+- `<nb>.ipynb` — Jupyter/JupyterHub execution form (generated). Execution outputs
+  are kept (the repo's `nbstripout` pre-commit hook was removed).
+
+Each notebook pins its kernel in the `.py` frontmatter: `01`/`01a`/`02a` use
+`cmems_global-main`, `02b`/`02d` use `cmems_global-pr2671-windowed-array`,
+`02c`/`02e` use `cmems_global-pr2668-open-raw-zarr`, and `02f` uses
+`cmems_global-v3` (the conda-forge `parcels` v3 release `3.1.0`). The `02d`/`02e`
+JIT notebooks reach into parcels _private_ internals (windowed-array cache; the
+raw zarr handle), so each pins the exact parcels commit it was verified against
+in a note near the top. `02f` uses only the **public** parcels v3 API
+(`FieldSet.from_xarray_dataset`, `JITParticle`, `AdvectionRK4`, `ParticleFile`),
+so it pins the conda-forge release version rather than a git commit.
+
+The `02*` notebooks read the input store through a papermill `parameters`-tagged
+cell — `data_dir = "/work/bk1450/b381575/elphe-hackathon_data"` (absolute) — so
+the path can be overridden per run without editing the body.
+
+```bash
+# after editing a .py, propagate to .md and .ipynb (no execution):
+pixi run -e main jupytext --sync notebooks/02a_run_parcels.py
+
+# run a quick headless sanity check (writes its own outputs, e.g. 02a_trajectories.parquet):
+MPLBACKEND=Agg pixi run -e main python notebooks/02a_run_parcels.py
+```
+
+`jupytext` is part of the shared conda deps, so it is available in every env.
+Sanity-check each notebook on its own env, e.g.
+`pixi run -e pr2671-windowed-array python notebooks/02b_run_parcels.py` or
+`pixi run -e pr2668-open-raw-zarr python notebooks/02c_run_parcels.py`.
+
+## Adding or bumping a parcels rev
+
+1. Resolve the rev to a full SHA (reproducible even after a branch moves):
+
+   ```bash
+   git ls-remote https://github.com/parcels-code/Parcels.git refs/heads/main
+   git ls-remote https://github.com/parcels-code/Parcels.git refs/pull/<N>/head
+   ```
+
+2. In `pixi.toml`, add/update a `[feature.<name>.pypi-dependencies]` block
+   pinning `parcels` to that SHA, and add a matching entry under
+   `[environments]`.
+3. Re-run `pixi install --all` then `bash scripts/register_kernels.sh` to
+   re-solve the lock, install the env, and register its kernel.
+
+## Background / references
+
+Why envs live on `$HOME` and the cache on `/scratch`:
+
+- DKRZ recommends `$HOME` (VAST) for conda-style envs and discourages `/work`
+  (Lustre): <https://docs.dkrz.de/doc/levante/code-development/python.html>,
+  <https://docs.dkrz.de/doc/levante/file-systems.html>
+- `/scratch` has a 14-day purge, so an ephemeral per-run cache avoids stale
+  mtime/atime issues and `$HOME` quota use:
+  <https://docs.dkrz.de/doc/levante/containers/singularity.html>
+- JupyterHub kernels on DKRZ:
+  <https://docs.dkrz.de/doc/software&services/jupyterhub/kernels.html>
+- Reference parcels DKRZ setup:
+  <https://github.com/geomar-od-lagrange/2025_dkrz_setup>
+- pixi config knobs (`detached-environments`, `cache.root`):
+  <https://pixi.sh/latest/reference/pixi_configuration/>
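As an illustration of the `register_kernels.sh` step described in the README above, the following Python sketch writes one `kernel.json` per environment. It is not the script from this PR (that one is a shell script and is not shown here). The function names, the `--manifest-path` argument, and the example paths are assumptions made for illustration; only the kernel names (`cmems_global-<env>`), the display names, and the `pixi run --environment <env>` launch are taken from the README and the notebook metadata.

```python
import json
from pathlib import Path

WORKSPACE = "cmems_global"  # workspace name as it appears in the kernel names


def kernel_spec(env: str, manifest_dir: str, path_env: str) -> dict:
    """Build a Jupyter kernel spec that starts ipykernel through pixi.

    `manifest_dir` and `path_env` are hypothetical inputs: the directory
    holding pixi.toml and the PATH value to pin in the spec.
    """
    return {
        "argv": [
            "pixi", "run",
            "--manifest-path", f"{manifest_dir}/pixi.toml",
            "--environment", env,
            "python", "-m", "ipykernel_launcher",
            "-f", "{connection_file}",  # placeholder filled in by Jupyter
        ],
        "display_name": f"Pixi: {WORKSPACE} ({env})",
        "language": "python",
        # PATH is pinned because the spawner may not source ~/.bashrc.
        "env": {"PATH": path_env},
    }


def write_kernel_specs(envs, manifest_dir, kernels_root, path_env):
    """Write <kernels_root>/<workspace>-<env>/kernel.json for each env."""
    written = []
    for env in envs:
        kernel_dir = Path(kernels_root) / f"{WORKSPACE}-{env}"
        kernel_dir.mkdir(parents=True, exist_ok=True)
        spec_file = kernel_dir / "kernel.json"
        spec = kernel_spec(env, manifest_dir, path_env)
        spec_file.write_text(json.dumps(spec, indent=2))
        written.append(spec_file)
    return written
```

Calling `write_kernel_specs(["main", "v3"], "/path/to/cmems_global", "/tmp/kernels", "/usr/bin")` would create `cmems_global-main/kernel.json` and `cmems_global-v3/kernel.json` under `/tmp/kernels`. Rewriting the files on every call keeps the step safe to re-run, which matches how the README describes the setup scripts.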
diff --git a/cmems_global/notebooks/01_retrieve_data.ipynb b/cmems_global/notebooks/01_retrieve_data.ipynb
@@ -0,0 +1,102 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "3d96fd30",
+   "metadata": {},
+   "source": [
+    "# Retrieve CMEMS global fields\n",
+    "\n",
+    "Pull daily global `uo`/`vo` (2001-01-01..2001-01-10) from CMEMS via\n",
+    "`copernicusmarine` and write them to a local zarr store.\n",
+    "\n",
+    "Land NaNs are filled with 0 and the fields are stored as plain `float32`\n",
+    "(`drop_encoding` removes the source int16 packing) so that `02c`, which reads\n",
+    "the store raw via `parcels.open_raw_zarr` (no CF-decoding), sees real\n",
+    "velocities rather than packed integers."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c81cf681",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pathlib import Path\n",
+    "\n",
+    "import copernicusmarine"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f2041965",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "output_path = \"/work/bk1450/b381575/elphe-hackathon_data\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "931bc280",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ds = copernicusmarine.open_dataset(dataset_id=\"cmems_mod_glo_phy_my_0.083deg_P1D-m\")\n",
+    "ds"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fee65309",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ds = ds[[\"uo\", \"vo\"]].sel(time=slice(\"2001-01-01\", \"2001-01-10\"))\n",
+    "ds"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "72a0f87c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ds = ds.fillna(0.0)\n",
+    "ds[\"uo\"] = ds[\"uo\"].astype(\"float32\")\n",
+    "ds[\"vo\"] = ds[\"vo\"].astype(\"float32\")\n",
+    "ds.drop_encoding().to_zarr(Path(output_path) / \"cmems_uovo_2001.zarr/\", mode=\"w\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "jupytext": {
+   "cell_metadata_filter": "-all",
+   "formats": "py:percent,md,ipynb"
+  },
+  "kernelspec": {
+   "display_name": "Pixi: cmems_global (main)",
+   "language": "python",
+   "name": "cmems_global-main"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.14.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
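The notebook above stores plain `float32` so that a raw reader does not see packed integers. A minimal sketch of why that matters, assuming the source uses the CF `scale_factor`/`add_offset` packing convention: the `SCALE_FACTOR` and `ADD_OFFSET` values below are hypothetical and are not read from the CMEMS dataset.

```python
# CF-style packing: an integer is stored on disk, and a decoding reader
# recovers the physical value as packed * scale_factor + add_offset.
SCALE_FACTOR = 0.001  # hypothetical value, for illustration only
ADD_OFFSET = 0.0      # hypothetical value, for illustration only


def pack(value: float) -> int:
    """Return the integer that would be stored for a physical value."""
    return round((value - ADD_OFFSET) / SCALE_FACTOR)


def unpack(packed: int) -> float:
    """Return the physical value a CF-decoding reader would recover."""
    return packed * SCALE_FACTOR + ADD_OFFSET


velocity = 0.25            # physical velocity in m/s
stored = pack(velocity)    # what a raw, non-decoding reader would see
decoded = unpack(stored)   # what a CF-decoding reader would see

print(f"stored integer: {stored}")
print(f"decoded value:  {decoded:.3f} m/s")
```

A reader that skips CF decoding would treat the stored integer as the velocity itself, which is off by a factor of `1 / SCALE_FACTOR`. Writing unpacked `float32` avoids this at the cost of a larger store.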