#290/#293: cross-filtered facet counts (facet_tree_cross_filter cube) + multi-tree filter collapse #298
Merged
rdhyee merged 3 commits on Jun 18, 2026
Replace N AND-ed membership subqueries (one per active tree dim) with a single read_parquet scan using the relational-division pattern: OR within the scan, then GROUP BY pid HAVING COUNT(DISTINCT facet_type) = <#active tree dims> to enforce AND across dims. Semantics are identical (parts are AND-ed at the call site); single-dim collapses to the same one scan as before.

Helps narrow/specific multi-selections; broad multi-tree selections still hit the WASM data-scale wall (real fix is the precomputed facet_tree_cross_filter cube, tracked for isamplesorg#293/isamplesorg#290).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l cross-filtered tree counts

Precompute a single-active-filter cross-filter COUNT cube over the 3 SKOS trees (material/context/object_type, concept_uri keys, subtree semantics via membership) + the flat source dim. For every single active filter it stores COUNT(DISTINCT pid) for every OTHER dim's node/value, plus a baseline. Schema mirrors facet_cross_filter (~1k rows).

Fixes isamplesorg#290's global cross-filtered tree counts: describeCrossFilters zeroes tree-dim selections at global view (to avoid the live membership near-full scan that hits the DuckDB-WASM data-scale wall, isamplesorg#293). The explorer now reads the precomputed cube for a single active filter at global view instead: the effective single filter is read directly from the controls (so it sees tree nodes even when zeroed), ahead of the baseline early-return. Any miss/error (incl. cube not yet published on R2) falls through to existing paths unchanged.

Builder: build_facet_tree_cross_filter() (self-join of membership ∪ source).
Validator: AI-free cross-file gate that re-derives the cube from the written membership + facets and diffs symmetrically; baseline == tree_summaries.
Tests: tree fixture (vocab + samples) asserting explicit known cube counts (catches builder-logic bugs) + validator-gate-bites-on-corruption. 16 pass.

Verified against deployed isamples_202608 data: cube == live isamplesorg#290 formula (exact), 1,018 rows, validator green, corruption caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
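As a rough illustration of the "self-join of membership ∪ source" idea described in this commit, the sketch below builds a tiny single-active-filter cube. It is not the repository's build_facet_tree_cross_filter(): it uses Python's stdlib sqlite3 in place of DuckDB, and the table layout (one `membership` table holding pid, facet_type, facet_value, with source stored as just another facet_type) and the four-column cube key are assumptions made for the example.

```python
import sqlite3

# Hypothetical layout: one row per (pid, facet_type, facet_value).
# Source is stored as just another facet_type (the "membership UNION source" idea).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE membership (pid TEXT, facet_type TEXT, facet_value TEXT)")
conn.executemany(
    "INSERT INTO membership VALUES (?, ?, ?)",
    [
        ("p1", "material", "rock"), ("p1", "object_type", "artifact"), ("p1", "source", "SESAR"),
        ("p2", "material", "rock"), ("p2", "object_type", "specimen"), ("p2", "source", "SESAR"),
        ("p3", "material", "soil"), ("p3", "object_type", "artifact"), ("p3", "source", "OPENCONTEXT"),
    ],
)

# Self-join: f is the single active filter, t is every OTHER dim's value.
# The second SELECT adds the unfiltered baseline with a NULL filter.
CUBE_SQL = """
SELECT f.facet_type, f.facet_value, t.facet_type, t.facet_value,
       COUNT(DISTINCT t.pid)
FROM membership AS f
JOIN membership AS t
  ON t.pid = f.pid AND t.facet_type <> f.facet_type
GROUP BY f.facet_type, f.facet_value, t.facet_type, t.facet_value
UNION ALL
SELECT NULL, NULL, facet_type, facet_value, COUNT(DISTINCT pid)
FROM membership
GROUP BY facet_type, facet_value
"""

# Key: (filter_type, filter_value, facet_type, facet_value) -> distinct pid count
cube = {(r[0], r[1], r[2], r[3]): r[4] for r in conn.execute(CUBE_SQL)}
```

A lookup such as `cube[("material", "rock", "source", "SESAR")]` then answers "how many SESAR samples remain when Material=rock is the only active filter" without scanning membership at query time. Combinations with zero samples are simply absent from the cube in this sketch.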
…cube

P1 (regression): the explorer cube fast-path now runs ONLY when all tree dims are rendered as trees (TREE_DIM_KEYS.every(treeActive)). In flat mode (?facets=flat) the cube's subtree-membership semantics are wrong for a flat dim and a flat-mode selection isn't representable, so defer to the flat-cube/slow paths entirely (no flat-count regression).

P2 (builder): --only facet_tree_cross_filter silently built nothing (hierarchy guard omitted it). Added to the build guard + explicit-vocab requirement set; tests for both --only success and the no-vocab loud failure.

P3 (validator): EXCEPT is set-semantics so a DOUBLED cube passed. Added a grain/uniqueness gate over (all filter cols, facet_type, facet_value); test proves it bites on a doubled cube.

19 tests pass; all 3 cube gates green on deployed isamples_202608 data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
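The P3 point can be reproduced in miniature. The sketch below shows why a symmetric EXCEPT diff alone lets a doubled table through, and how a grain/uniqueness check catches it. It uses stdlib sqlite3 rather than DuckDB, and the table names (`cube_expected`, `cube_doubled`) and the simplified single filter_type/filter_value column pair are hypothetical, not the validator's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = [
    ("material", "rock", "source", "SESAR", 2),
    ("object_type", "artifact", "material", "rock", 1),
]
for name in ("cube_expected", "cube_doubled"):
    conn.execute(
        f"CREATE TABLE {name} (filter_type TEXT, filter_value TEXT, "
        "facet_type TEXT, facet_value TEXT, n INTEGER)"
    )
conn.executemany("INSERT INTO cube_expected VALUES (?, ?, ?, ?, ?)", rows)
# Corrupt the second table by writing every row twice.
conn.executemany("INSERT INTO cube_doubled VALUES (?, ?, ?, ?, ?)", rows + rows)


def symmetric_diff_count(conn, a, b):
    """Rows in a but not b, plus rows in b but not a (EXCEPT works on sets)."""
    one_way = "SELECT COUNT(*) FROM (SELECT * FROM {x} EXCEPT SELECT * FROM {y})"
    a_minus_b = conn.execute(one_way.format(x=a, y=b)).fetchone()[0]
    b_minus_a = conn.execute(one_way.format(x=b, y=a)).fetchone()[0]
    return a_minus_b + b_minus_a


def duplicate_grain_count(conn, table):
    """Number of grain keys that appear more than once in the table."""
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT filter_type, filter_value, facet_type, facet_value
            FROM {table}
            GROUP BY filter_type, filter_value, facet_type, facet_value
            HAVING COUNT(*) > 1
        )
    """
    return conn.execute(sql).fetchone()[0]


diff = symmetric_diff_count(conn, "cube_expected", "cube_doubled")
dupes = duplicate_grain_count(conn, "cube_doubled")
```

Here `diff` is 0 even though the second table is corrupt, because EXCEPT deduplicates before comparing; `dupes` is 2, which is the signal the uniqueness gate relies on.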
This was referenced Jun 18, 2026
rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request on Jun 20, 2026
… global view via mask index

The single-filter cube (isamplesorg#298) serves correct cross-filtered counts for exactly ONE active filter at global view. With 2+ filters, effectiveSingleFilter() returns null, the cube is skipped, and updateCrossFilteredCounts falls to the legacy early-return that reverts every tree dim to the UNFILTERED baseline (explorer.qmd). This is the isamplesorg#304 bug (e.g. Material=anthropogenic + Object=artifact showed 261,086 artifacts instead of the cross-filtered 145,770).

Fix: route the global / full-tree-mode / no-search / multi-filter case through the complete per-pid index (sample_facet_index, isamplesorg#305/isamplesorg#306) with the node_bits bitmask predicate:

- directFilterSnapshot(): reads ALL selections from the controls WITHOUT zeroing tree dims at global view (that zeroing in describeCrossFilters is the root cause).
- maskIndexWhere(): per-dim cross-filter predicate (OR within a dim via the bit mask, AND across dims, exclude-self); source-impossible zeroes tree targets but not source's own histogram (the plan's special case).
- applyMaskIndexCounts(): one columnar bitmask query per dim, applied ATOMICALLY after a single facetCountsReqId stale check; COUNT(*) over the one-row-per-pid index == distinct pids.
- HONESTY RULE: on query failure → 'unavailable' (a "(—)" dash, .count-unavailable), NEVER the baseline. If the index/bitmap isn't usable yet (unpublished) → 'fallthrough' to the legacy paths unchanged, so this is safe to ship first.
- facetIndexReady preflight: index present + schema_version==1 + build_id membership-half == node_bits generation; else stays unavailable, not wrong.

Single-filter cube kept as a pure optimization. Viewport + search still take the live membership path (Phase 3).

Verified on real 202608 data: index covers 6,026,242 located pids vs masks' 5,996,325, exactly the 29,917 isamplesorg#306 samples recovered; full validator passes; Eric's case returns 145,770 (cross-filtered) not 261,086 (baseline).
Refs isamplesorg#305 isamplesorg#304 isamplesorg#306
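To make the bitmask predicate in the commit above concrete, here is a minimal pure-Python sketch of "OR within a dim via the bit mask, AND across dims, exclude-self" over a one-row-per-pid index. The row layout (one integer bitmask per dim, bit i set when the sample belongs to node i or its subtree) and the function name are assumptions for illustration; the real sample_facet_index / node_bits schema is not shown in this thread, and the source-impossible special case is omitted.

```python
def cross_filtered_counts(index_rows, selections, target_dim, n_nodes):
    """Count rows per node of target_dim under the OTHER dims' selections.

    index_rows: list of dicts, one per pid, mapping dim -> node bitmask.
    selections: dict mapping dim -> bitmask of selected nodes (0 = no filter).
    """
    counts = [0] * n_nodes
    for row in index_rows:
        # AND across dims; a non-zero (row & mask) means at least one selected
        # node matches, which is OR within the dim. The target dim's own
        # selection is skipped (exclude-self).
        passes = all(
            row[dim] & mask
            for dim, mask in selections.items()
            if dim != target_dim and mask
        )
        if not passes:
            continue
        # One row per pid, so counting rows is counting distinct pids.
        for bit in range(n_nodes):
            if (row[target_dim] >> bit) & 1:
                counts[bit] += 1
    return counts


# Hypothetical 4-sample index with two nodes per dim.
index_rows = [
    {"material": 0b01, "object_type": 0b01},
    {"material": 0b01, "object_type": 0b10},
    {"material": 0b10, "object_type": 0b01},
    {"material": 0b01, "object_type": 0b01},
]
selections = {"material": 0b01}

object_counts = cross_filtered_counts(index_rows, selections, "object_type", 2)
material_counts = cross_filtered_counts(index_rows, selections, "material", 2)
```

With only Material node 0 selected, `object_counts` reflects the three matching samples, while `material_counts` is computed over all four rows because a dim's own selection does not filter its own histogram.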
Closes #296 (facet values/counts now recompute under a filter at global view).
What
Promotes two #293-track changes, verified green on rdhyee staging:
1. facet_tree_cross_filter cube (explorer: live viewport/cross-filtered counts for the Material facet tree (#281 follow-up) #290): precomputed single-active-filter cross-filter COUNT cube over the 3 SKOS trees (material/context/object_type, concept_uri keys, subtree semantics via membership) + the flat source dim. The explorer reads it at global view for a single active filter, so selecting e.g. Specimen Type "Artifact" now updates the other facets' counts instantly instead of leaving them at the unfiltered baseline. This is exactly the pre-cache-single-value-per-facet design Eric proposed (2026-04-08), with on-the-fly fallback for uncached (multi-value/zoomed/search) cases.
   - Builder: build_facet_tree_cross_filter() (self-join of membership ∪ source).
   - Validator: re-derives the cube from the written membership + facets and diffs symmetrically; grain-uniqueness gate; baseline == tree_summaries.
   - Tests: fixture cube counts, validator corruption + --only orchestration. 19 pass.
   - isamples_202608_facet_tree_cross_filter.parquet already published to R2 (1,018 rows, validated == live formula on the deployed data).
2. Multi-tree filter collapse (explorer: multi-tree facet filtering does N membership scans (slow in WASM at scale); combine into one scan / cube #293): collapse N AND-ed membership subqueries into ONE scan (relational division: OR within, GROUP BY pid HAVING COUNT(DISTINCT facet_type) = N across). Identical semantics; helps narrow multi-selections. Broad multi-tree map filtering still hits the WASM data-scale wall; the real fix there (a map/h3 cross-filter artifact) is a separate phase.
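The relational-division collapse can be sketched as follows. This is an illustrative stand-in, not the explorer's code: it runs against stdlib sqlite3 instead of a DuckDB read_parquet scan, and the `membership` table, its columns, and the helper name are hypothetical. It assumes at least one active dim and at least one value per dim.

```python
import sqlite3


def pids_matching_all_dims(conn, selections):
    """selections: dict of facet_type -> non-empty list of facet_value."""
    if not selections:
        raise ValueError("need at least one active dim")
    clauses, params = [], []
    for facet_type, values in selections.items():
        placeholders = ", ".join("?" for _ in values)
        # OR within the scan: a row qualifies if it matches any active dim.
        clauses.append(f"(facet_type = ? AND facet_value IN ({placeholders}))")
        params.append(facet_type)
        params.extend(values)
    sql = (
        "SELECT pid FROM membership WHERE "
        + " OR ".join(clauses)
        # AND across dims: keep pids that matched every active dim.
        + " GROUP BY pid HAVING COUNT(DISTINCT facet_type) = ? ORDER BY pid"
    )
    params.append(len(selections))
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE membership (pid TEXT, facet_type TEXT, facet_value TEXT)")
conn.executemany(
    "INSERT INTO membership VALUES (?, ?, ?)",
    [
        ("p1", "material", "rock"), ("p1", "object_type", "artifact"),
        ("p2", "material", "rock"), ("p2", "object_type", "specimen"),
        ("p3", "material", "soil"), ("p3", "object_type", "artifact"),
    ],
)

both = pids_matching_all_dims(conn, {"material": ["rock"], "object_type": ["artifact"]})
single = pids_matching_all_dims(conn, {"material": ["rock"]})
```

With one active dim the HAVING clause reduces to `= 1`, so the query degenerates to the same single scan as before, matching the "single-dim collapses to the same one scan" note in the commit message.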
Scope / honesty
- Multi-value, viewport-scoped, and active-search cases fall through to the existing live path (correct, just not precomputed).
- Any miss/error falls through to the prior baseline/flat-cube/slow paths (no regression; flat mode ?facets=flat explicitly deferred).
Verification
🤖 Generated with Claude Code