Skip to content

Version 3 with cached cross chunk edges#454

Draft
akhileshh wants to merge 351 commits into
mainfrom
pcgv3
Draft

Version 3 with cached cross chunk edges#454
akhileshh wants to merge 351 commits into
mainfrom
pcgv3

Conversation

@akhileshh

@akhileshh akhileshh commented Aug 6, 2023

Copy link
Copy Markdown
Contributor
  • Adds a new column family for cached cross chunks edges.
  • Adds MaxAgeGCRule for previous column family with supervoxel cross chunk edges; only needed during ingest and they get deleted eventually.
  • Edits make use of cached cross chunk edges.

Summary of changes in pychunkedgraph.ingest:

  • Layer 2 creation is mostly unchanged; stores cross chunk edges with supervoxels
    • The column family used to store these edges now has a max age garbage collection rule
    • During ingest, these edges can be used to cache higher layer cross chunk edges; will be deleted eventually by BigTable's garbage collection routines.
  • When ingesting layer 3, cross edges for children (layer 2) get updated and "lifted" by using the previously mentioned supervoxel cross chunk edges, these have a different column family so they're retained forever.
    • At the same time, cross edges for parents at layer 3 will get created by merging cross edges of their children, these are intermediate and will be lifted when ingesting the next parent layer.
  • For each layer > 3 until root layer:
    • Update children cross chunk edges by "lifting" the edges created during the previous layer ingest.
    • Add parent cross chunk edges by merging children cross chunk edges; they will be updated when ingesting the next layer.

This assumes all chunks at lower layer have been created before creating the current layer so we can no longer queue parent chunk jobs automatically when its children chunks are complete.

We must now ingest/create one layer at a time.

Summary of changes in pychunkedgraph.graph.edits:

  • Edits are expected to be faster now; going to layer 2 to extract cross chunk edges is no longer necessary since they're cached at each layer.
  • During an edit, these cached cross chunk edges must be updated from both directions - to and from the newly created nodes and its existing neighbors.
    • Most changes in this module are to handle this step.
    • Caching these edges has also made the edits logic simpler and cleaner.
    • When updating new cross edges, we need to ensure descendants get replaced by the highest parent.
    • For splits, we need to filter out inactive cross edges after the local graph is read from bucket storage.

@akhileshh akhileshh requested a review from sdorkenw August 6, 2023 20:04
@akhileshh akhileshh changed the title WIP WIP V3 Aug 11, 2023
@akhileshh akhileshh marked this pull request as ready for review August 23, 2023 22:56
@akhileshh akhileshh changed the title WIP V3 Version 3 with cached cross chunk edges Aug 23, 2023
@akhileshh akhileshh requested a review from fcollman August 24, 2023 00:43
Comment thread pychunkedgraph/graph/cache.py Outdated
return cross_edges_decorated(node_id)

def parents_multiple(self, node_ids: np.ndarray, *, time_stamp: datetime = None):
node_ids = np.array(node_ids, dtype=NODE_ID)

@nkemnitz nkemnitz Sep 6, 2023

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw this here (and some other places) - same as in #458: np.array will by default create a copy. np.asarray will avoid copies, if the requirements are already met.

@sdorkenw sdorkenw left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good besides the one point - a tricky one though - that I marked

Comment thread pychunkedgraph/graph/edits.py Outdated
new_cx_edges_d[layer] = edges
assert np.all(edges[:, 0] == new_id)
cg.cache.cross_chunk_edges_cache[new_id] = new_cx_edges_d
entries = _update_neighbor_cross_edges(

@sdorkenw sdorkenw Sep 8, 2023

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this here can introduce problems if a neighboring node is a neighbor to multiple new_l2_ids.

_update_neighbor_cross_edges looks right to me. It writes a complete new set of L2 edges for a node. But if the same node is updated multiple times, then only the last update is reflected. Maybe the logic here takes care of this somehow but then it still introduces multiple unnecessary writes.

So, if I am correct about this, the solution would be to consolidate this call across all new_l2_ids to only make one call per neighboring node id.

Comment thread pychunkedgraph/graph/edits.py Outdated
new_cx_edges_d[layer] = edges
assert np.all(edges[:, 0] == new_id)
cg.cache.cross_chunk_edges_cache[new_id] = new_cx_edges_d
entries = _update_neighbor_cross_edges(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same issue as above

akhileshh and others added 30 commits June 7, 2026 22:14
The consumer multiplies vol in place by the mask and background
voxels are already 0, so `(vol != 0) &` adds no information.
Also swap the final np.unique reduction in seg_unique to
fastremap.unique for consistency with the per-chunk path.
… conda

The slim base ships its own /usr/local/lib/libpython, which the loader
otherwise pairs with conda-built C extensions and segfaults in
PyObject_Hash on first import.
uwsgi master checks RSS in uwsgi_close_request() — the response is
fully flushed and the worker is between requests when the check
fires. Bounds per-worker allocator residue from heavy edit ops
without affecting in-flight work in preforking sync mode.
Edit operations dump {WATERSHED}/graphene_errors/{cg.graph_id}/{op_id}.json
on AssertionError/RuntimeError/unknown Exception with op type, user,
inputs, exception class+message, traceback. err_dump.read_err_artifact
reads it back. Assertion messages across cutting, edits, edges/stale,
sv_split/edges, sv_split/edits, and operation now carry the values that
disagreed (root→l2_count, parents, duplicates, new_id vs got, chunk
mismatch pairs, stale nodes) and the broken positional logger.error in
CreateParentNodes is rewritten as an f-string so id/parent/root actually
reach the log.
… future work

NOTES.md captures the seg-read-union + subgraph-union dedups (opt-in
when n_tasks > 1, byte-equal single-rep path) so the plan stays
discoverable in-repo. README §9 surfaces the new
{WATERSHED}/graphene_errors/{cg.graph_id}/{op_id}.json artifact path
and the read_err_artifact helper.
Add pychunkedgraph/pipeline/: a workload-agnostic core (grid scatter, per-chunk
Bigtable lock, exit-code contract, worker harness) shared by ingest and meshing
subpackages. Ingest builds L2/parent chunks under a per-chunk lock; meshing runs
marching cubes / sharded stitching, idempotent, plus one-shot mesh-metadata setup.
Self-contained except the chunk-compute functions; dispatch.py is the only
branch-specific shim. Entrypoints: python -m pychunkedgraph.pipeline.{ingest,meshing}[.setup].
Adapt ingest dispatch/setup to pcgv3 chunk builders + graph classes; add migrate +
migrate_cleanup (--clean) workloads. Upgrade clean is a function arg, earliest_ts
reads cached meta (set in migrate setup); no CLEAN_CHUNKS/EARLIEST_TS env.
The bigtable.data client leaves a non-daemon channel-refresh thread that atexit
join()s forever, so workers hung after finishing a batch until the pod grace
period SIGKILLed them (exit 137, breaking exit-42 FailIndex). os._exit with the
real return code once stdio is flushed.

Co-Authored-By: Claude <noreply@anthropic.com>
lock.py used the classic bigtable client (conditional_row) which the data
client doesn't have -> AttributeError on acquire. Rewrite acquire/renew/release
on kvdbclient lock_by_row_key, keep the done marker via mutate_row/_read_byte_row;
worker passes cg.client.

Co-Authored-By: Claude <noreply@anthropic.com>
The package handler propagated to root, so entrypoints with a root
handler printed every record twice; chunk coords rendered as np.int64.

Co-Authored-By: Claude <noreply@anthropic.com>
The root-layer pod runs the sanity suite as its final step, so every
ingest ends verified; existence() now raises instead of only printing
diagnostics. A failed check fails the pod without re-opening the chunk.

Co-Authored-By: Claude <noreply@anthropic.com>
The .setup one-shots returned normally, so graph I/O left the bigtable.data
channel thread hanging the pod (mesh-meta stalled ~20m). Add run_and_exit in
pipeline/__init__ — main() code or 0, SystemExit code, else traceback+1, then
flush + os._exit — and call it from all six worker/setup entrypoints, replacing
the three inline copies.

Co-Authored-By: Claude <noreply@anthropic.com>
setup's cg.create() raises ValueError on an existing table; --exist-ok
catches it and skips (resume-safe), without it the error surfaces.

Co-Authored-By: Claude <noreply@anthropic.com>
Drop the vendored grid/harness/lock/exit_codes for the shared
cave_pipeline.distribution package so the operator and every worker compute
the same chunk-scatter bijection from one source; workers inject cg_factory
and layer_bounds into the generic harness.

Co-Authored-By: Claude <noreply@anthropic.com>
Picks up the 1-byte chunk-done marker the ingest workers write.

Co-Authored-By: Claude <noreply@anthropic.com>
meta resolution/bounds derive from the watershed info JSON; sv lookup
and the seg fallback read voxels via a neuroglancer_precomputed handle.
cloud-volume stays a lazy ws_cv hatch (meshing/diagnostics).

Co-Authored-By: Claude <noreply@anthropic.com>
nested imports (graph_tool via a _graph_tool shim) keep graph_tool,
scipy, pandas, networkx, and cloudfiles off the cold import path; first
use pays the load. bump kvdbclient to 0.7.1 (drops its cloud-volume).

Co-Authored-By: Claude <noreply@anthropic.com>
…ds_by_label

fastremap 1.20.0 emits 6-conn boundary voxels per label natively, so the
`_label_boundary_mask` axial-diff pass and the `vol *= mask` mutation
both go away. The point_cloud output (and therefore downstream KDTree
min-distance queries) is unchanged.

Co-Authored-By: Claude <noreply@anthropic.com>
ws_ts_scale(mip) reads the target scale (non-OCDBT mip>0 read mip 0).
Mesh block size derives per-axis from the watershed pyramid, so
chunk_size leaves mesh_config; setup rejects an out-of-range mip.
Tests mock the tensorstore watershed reads.

Co-Authored-By: Claude <noreply@anthropic.com>
The pipeline entrypoint carried a verbatim copy of setup_mesh_meta and
MeshConfig that still required mesh_config.chunk_size, so mesh-meta
failed once chunk_size was dropped from the dataset yaml. Import the
single source of truth from pychunkedgraph.meshing instead.

Co-Authored-By: Claude <noreply@anthropic.com>
PCG isn't on PyPI, but the image must report the pushed tag. The image
pip-installs the package (--no-deps) with the tag fed to setuptools_scm
via a cloudbuild build-arg; the hand-bumped literal + bumpversion are gone.

Co-Authored-By: Claude <noreply@anthropic.com>
Non-meshing modules (app routes, sv-split profiler, pipeline/ingest
entrypoints) imported meshing eagerly, pulling cloudvolume on import.
Nest those imports so cv loads only when meshing actually runs.

Co-Authored-By: Claude <noreply@anthropic.com>
Meshing splits initial from edited roots at a timestamp boundary, but
sampling one root is unreliable: skip connections spread root creation
times across layers. Instead, stamp earliest_ts when the root layer is
written — the explicit cell timestamp shared by every root, lifted +500ms
so the boundary sits strictly above them. get_earliest_timestamp returns
it pre-edit; derive_initial_ts consumes it. Migrate no longer clobbers an
ingest-stamped value.

Co-Authored-By: Claude <noreply@anthropic.com>
setuptools_scm rejects non-PEP-440 strings, so pushing a build-label tag
(not a semver) failed the image build. Pass the version build-arg only for
version-like tags; other tags build with the Dockerfile default untouched.

Co-Authored-By: Claude <noreply@anthropic.com>
A table copied/restored under a new graph_id must not let its meshes
alias the source's. Hinge on dynamic_mesh_dir: an explicit graph-suffixed
value shares initial meshes, so re-derive only the dynamic subdir; a bare
"dynamic" or unset value gives the copy a private per-graph top-level dir.
Move the whole rewrite into ChunkedGraphMeta.for_copied_graph so the graph
class makes a single call.

Co-Authored-By: Claude <noreply@anthropic.com>
Replace setuptools_scm (built 0.0.0 without a reachable semver tag) with a
committed _version.py the release workflow bumps, commits, and tags in
lockstep; setup.py reads it and the image build drops the version arg. Gate
the workflow's Helm-chart update behind an opt-in input and add a workflows
README. Versioning is per branch: main 2.x, pcgv3 3.x.

Co-Authored-By: Claude <noreply@anthropic.com>
At chunk_layer + 1 == layer_count the parent column is rank-1 and the
existing layer_agreement / np.where path returns chunk_layer instead of
the root. Short-circuit to layer_count so meshes traversing up from the
second-to-top layer reach the root.

Co-Authored-By: Claude <noreply@anthropic.com>
Supervoxel splitting with base+fork support and locks
scipy 1.17.1's Python 3.14 wheel ships without the scipy._external
subpackage, so `from scipy import ndimage` raises ModuleNotFoundError
and every meshing/sv-split test errors during collection.

Co-Authored-By: Claude <noreply@anthropic.com>
numpy is C-ABI-bound to conda's graph-tool-base, yet pip-compile also pinned it,
so the image ran conda's latest against pip's downgraded lock — a broken mixed
install. Pin it once in requirements.yml; [tool.pip-tools] unsafe-package keeps
pip-compile from re-adding it to requirements.txt.

Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants