
#290/#293: cross-filtered facet counts (facet_tree_cross_filter cube) + multi-tree filter collapse #298

Merged
rdhyee merged 3 commits into isamplesorg:main from rdhyee:promote/293-tree-cross-filter-cube
Jun 18, 2026

Conversation

rdhyee commented Jun 18, 2026

Contributor

Closes #296 (facet values/counts now recompute under a filter at global view).

What

Promotes two #293-track changes, verified green on rdhyee staging:

  1. facet_tree_cross_filter cube (#290, a #281 follow-up): a precomputed
    single-active-filter cross-filter COUNT cube over the 3 SKOS trees
    (material/context/object_type, concept_uri keys, subtree semantics via
    membership) plus the flat source dim. The explorer reads it at global view
    when exactly one filter is active, so selecting e.g. Specimen Type "Artifact"
    now updates the other facets' counts immediately instead of leaving them at
    the unfiltered baseline. This matches the pre-cache-single-value-per-facet
    design Eric proposed (2026-04-08), with on-the-fly fallback for uncached
    (multi-value/zoomed/search) cases.

    • Builder: build_facet_tree_cross_filter() (self-join of membership ∪ source).
    • Validator: AI-free cross-file gate — re-derives the cube from the written
      membership + facets and diffs symmetrically; grain-uniqueness gate; baseline
      == tree_summaries.
    • Tests: tree fixture asserting explicit known cube counts + gate-bites-on-
      corruption + --only orchestration. 19 pass.
    • Data: isamples_202608_facet_tree_cross_filter.parquet already published to
      R2 (1,018 rows, validated == live formula on the deployed data).
  2. Multi-tree filter collapse (#293): collapses N AND-ed membership subqueries
    into ONE scan (relational division: OR within the scan, then GROUP BY pid
    HAVING COUNT(DISTINCT facet_type) = N to enforce AND across dims). Semantics
    are identical; this helps narrow multi-selections. Broad multi-tree map
    filtering still hits the WASM data-scale wall; the real fix there (a map/h3
    cross-filter artifact) is a separate phase.
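
For readers new to the relational-division pattern in item 2, here is a minimal sketch. It uses Python's stdlib sqlite3 as a stand-in for the DuckDB-WASM read_parquet scan; the table name, the column layout, and the sample rows are illustrative assumptions, not the production schema or the explorer's actual query.

```python
import sqlite3

# Hypothetical miniature membership table: one row per (pid, facet_type, concept_uri).
ROWS = [
    ("p1", "material", "mat:rock"),
    ("p1", "object_type", "obj:artifact"),
    ("p2", "material", "mat:rock"),
    ("p2", "object_type", "obj:specimen"),
    ("p3", "material", "mat:soil"),
    ("p3", "object_type", "obj:artifact"),
    ("p4", "material", "mat:rock"),
    ("p4", "material", "mat:soil"),
]


def matching_pids(conn, selections):
    """selections: {facet_type: [concept_uri, ...]}.

    OR within the single scan, then HAVING COUNT(DISTINCT facet_type) = N
    enforces AND across the N active dims.
    """
    if not selections:
        return [r[0] for r in conn.execute(
            "SELECT DISTINCT pid FROM membership ORDER BY pid")]
    clauses, params = [], []
    for facet_type, uris in selections.items():
        marks = ",".join("?" for _ in uris)
        clauses.append(f"(facet_type = ? AND concept_uri IN ({marks}))")
        params += [facet_type, *uris]
    sql = (
        "SELECT pid FROM membership WHERE " + " OR ".join(clauses)
        + " GROUP BY pid HAVING COUNT(DISTINCT facet_type) = ? ORDER BY pid"
    )
    params.append(len(selections))
    return [r[0] for r in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE membership (pid TEXT, facet_type TEXT, concept_uri TEXT)")
conn.executemany("INSERT INTO membership VALUES (?, ?, ?)", ROWS)

two_dims = matching_pids(
    conn, {"material": ["mat:rock"], "object_type": ["obj:artifact"]})
print(two_dims)
```

Note that p4 matches two material values but still counts as one distinct facet_type, so it is correctly excluded when two dims are active; that is why the HAVING clause counts DISTINCT facet_type rather than rows.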

Scope / honesty

  • The cube covers ONE value per dim at global view (matches Eric's stated design);
    multi-value, viewport-scoped, and active-search cases fall through to the
    existing live path (correct, just not precomputed).
  • The explorer fast-path is defensive: any cube miss/error falls through to the
    prior baseline/flat-cube/slow paths (no regression; flat mode ?facets=flat
    explicitly deferred).
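
The fall-through behaviour described above can be sketched roughly as follows. This is an illustrative Python rendering, not the explorer's JavaScript: ViewState, facet_counts, and the demo lookup functions are hypothetical names, and the eligibility check simply restates the conditions listed in this section (one active value, global view, tree mode, no search).

```python
from dataclasses import dataclass, field


@dataclass
class ViewState:
    selections: dict = field(default_factory=dict)  # dim -> list of selected values
    global_view: bool = True
    search_active: bool = False
    tree_mode: bool = True  # False stands for ?facets=flat


def effective_single_filter(state):
    """Return (dim, value) when exactly one value is selected overall, else None."""
    active = [(dim, v) for dim, values in state.selections.items() for v in values]
    return active[0] if len(active) == 1 else None


def facet_counts(state, cube_lookup, live_counts):
    """Try the precomputed cube first; any miss or error defers to the live path."""
    single = effective_single_filter(state)
    if single and state.global_view and state.tree_mode and not state.search_active:
        try:
            hit = cube_lookup(*single)
            if hit is not None:
                return "cube", hit
        except Exception:
            pass  # e.g. cube not yet published: fall through unchanged
    return "live", live_counts(state)


# Hypothetical stand-ins for the cube reader and the existing live query.
def demo_cube(dim, value):
    table = {("object_type", "obj:artifact"): {"material": {"mat:rock": 1}}}
    return table.get((dim, value))


def broken_cube(dim, value):
    raise RuntimeError("cube not published")


def demo_live(state):
    return {"material": {}}


one = ViewState({"object_type": ["obj:artifact"]})
print(facet_counts(one, demo_cube, demo_live)[0])
```

The point of the shape is that the cube is purely additive: every branch that does not return a cube hit ends in the same live call that existed before.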

Verification

🤖 Generated with Claude Code

rdhyee and others added 3 commits June 18, 2026 11:58
Replace N AND-ed membership subqueries (one per active tree dim) with a
single read_parquet scan using the relational-division pattern:
OR within the scan, then GROUP BY pid HAVING COUNT(DISTINCT facet_type)
= <#active tree dims> to enforce AND across dims.

Semantics are identical (parts are AND-ed at the call site); single-dim
collapses to the same one scan as before. Helps narrow/specific
multi-selections; broad multi-tree selections still hit the WASM
data-scale wall (real fix is the precomputed facet_tree_cross_filter
cube, tracked for isamplesorg#293/isamplesorg#290).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l cross-filtered tree counts

Precompute a single-active-filter cross-filter COUNT cube over the 3 SKOS
trees (material/context/object_type, concept_uri keys, subtree semantics via
membership) + the flat source dim. For every single active filter it stores
COUNT(DISTINCT pid) for every OTHER dim's node/value, plus a baseline. Schema
mirrors facet_cross_filter (~1k rows).
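
A toy version of the self-join, to make the cube's shape concrete. This is a sketch under assumptions: it uses stdlib sqlite3 rather than the real builder, a long-form (filter_type, filter_value, facet_type, facet_value, n) layout that may differ from the published schema, and a membership table that is assumed to already contain the source dim and any ancestor rows needed for subtree semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE membership (pid TEXT, facet_type TEXT, facet_value TEXT);
INSERT INTO membership VALUES
  ('p1','material','mat:rock'), ('p1','object_type','obj:artifact'), ('p1','source','SRC_A'),
  ('p2','material','mat:rock'), ('p2','object_type','obj:specimen'), ('p2','source','SRC_A'),
  ('p3','material','mat:soil'), ('p3','object_type','obj:artifact'), ('p3','source','SRC_B');
""")

# f = the single active filter, t = every OTHER dim's value for the same pid.
# The second SELECT adds the unfiltered baseline with NULL filter columns.
CUBE_SQL = """
SELECT f.facet_type AS filter_type, f.facet_value AS filter_value,
       t.facet_type, t.facet_value, COUNT(DISTINCT t.pid) AS n
FROM membership AS f
JOIN membership AS t
  ON t.pid = f.pid AND t.facet_type <> f.facet_type
GROUP BY f.facet_type, f.facet_value, t.facet_type, t.facet_value
UNION ALL
SELECT NULL, NULL, facet_type, facet_value, COUNT(DISTINCT pid)
FROM membership
GROUP BY facet_type, facet_value
"""

# Key each cube row by its grain so lookups read like the explorer's question:
# "with filter X active, how many samples fall under node Y?"
cube = {row[:4]: row[4] for row in conn.execute(CUBE_SQL)}

print(cube[("object_type", "obj:artifact", "material", "mat:rock")])
print(cube[(None, None, "material", "mat:rock")])
```

The join condition t.facet_type <> f.facet_type is what makes this a cross-filter: a filter never narrows the counts of its own dim.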

Fixes isamplesorg#290's global cross-filtered tree counts: describeCrossFilters zeroes
tree-dim selections at global view (to avoid the live membership near-full
scan that hits the DuckDB-WASM data-scale wall, isamplesorg#293). The explorer now reads
the precomputed cube for a single active filter at global view instead — the
effective single filter is read directly from the controls (so it sees tree
nodes even when zeroed), ahead of the baseline early-return. Any miss/error
(incl. cube not yet published on R2) falls through to existing paths unchanged.

Builder: build_facet_tree_cross_filter() (self-join of membership ∪ source).
Validator: AI-free cross-file gate — re-derives the cube from the written
  membership + facets and diffs symmetrically; baseline == tree_summaries.
Tests: tree fixture (vocab + samples) asserting explicit known cube counts
  (catches builder-logic bugs) + validator-gate-bites-on-corruption. 16 pass.

Verified against deployed isamples_202608 data: cube == live isamplesorg#290 formula
(exact), 1,018 rows, validator green, corruption caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cube

P1 (regression): the explorer cube fast-path now runs ONLY when all tree dims
  are rendered as trees (TREE_DIM_KEYS.every(treeActive)). In flat mode
  (?facets=flat) the cube's subtree-membership semantics are wrong for a flat
  dim and a flat-mode selection isn't representable — so defer to the
  flat-cube/slow paths entirely (no flat-count regression).
P2 (builder): --only facet_tree_cross_filter silently built nothing (hierarchy
  guard omitted it). Added to the build guard + explicit-vocab requirement set;
  tests for both --only success and the no-vocab loud failure.
P3 (validator): EXCEPT is set-semantics so a DOUBLED cube passed. Added a
  grain/uniqueness gate over (all filter cols, facet_type, facet_value); test
  proves it bites on a doubled cube.
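
Why P3 needed a separate gate, as a minimal sketch: EXCEPT compares sets of rows, so a cube with every row written twice diffs clean in both directions. The table names and the reduced three-column grain below are illustrative assumptions (stdlib sqlite3, not the real validator).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expected (filter_value TEXT, facet_type TEXT, facet_value TEXT, n INTEGER);
CREATE TABLE cube     (filter_value TEXT, facet_type TEXT, facet_value TEXT, n INTEGER);
INSERT INTO expected VALUES
  ('obj:artifact', 'material', 'mat:rock', 1),
  ('obj:artifact', 'material', 'mat:soil', 1);
-- Simulated corruption: the cube holds every expected row twice.
INSERT INTO cube SELECT * FROM expected;
INSERT INTO cube SELECT * FROM expected;
""")


def symmetric_diff_rows(conn):
    """Rows in expected-but-not-cube plus cube-but-not-expected (set semantics)."""
    sql = """
    SELECT COUNT(*) FROM (
      SELECT * FROM (SELECT * FROM expected EXCEPT SELECT * FROM cube)
      UNION ALL
      SELECT * FROM (SELECT * FROM cube EXCEPT SELECT * FROM expected)
    )
    """
    return conn.execute(sql).fetchone()[0]


def duplicate_grains(conn):
    """Number of grain keys that appear more than once in the cube."""
    sql = """
    SELECT COUNT(*) FROM (
      SELECT filter_value, facet_type, facet_value
      FROM cube
      GROUP BY filter_value, facet_type, facet_value
      HAVING COUNT(*) > 1
    )
    """
    return conn.execute(sql).fetchone()[0]


diff = symmetric_diff_rows(conn)   # the EXCEPT gate sees nothing wrong
dupes = duplicate_grains(conn)     # the grain gate catches both doubled keys
print(diff, dupes)
```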

19 tests pass; all 3 cube gates green on deployed isamples_202608 data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rdhyee merged commit 3fc324e into isamplesorg:main Jun 18, 2026
3 checks passed
rdhyee added a commit to rdhyee/isamplesorg.github.io that referenced this pull request Jun 20, 2026
… global view via mask index

The single-filter cube (isamplesorg#298) serves correct cross-filtered counts for exactly
ONE active filter at global view. With 2+ filters, effectiveSingleFilter()
returns null, the cube is skipped, and updateCrossFilteredCounts falls to the
legacy early-return that reverts every tree dim to the UNFILTERED baseline
(explorer.qmd) — the isamplesorg#304 bug (e.g. Material=anthropogenic + Object=artifact
showed 261,086 artifacts instead of the cross-filtered 145,770).

Fix: route the global / full-tree-mode / no-search / multi-filter case through
the complete per-pid index (sample_facet_index, isamplesorg#305/isamplesorg#306) with the node_bits
bitmask predicate:
- directFilterSnapshot(): reads ALL selections from the controls WITHOUT zeroing
  tree dims at global view (that zeroing in describeCrossFilters is the root cause).
- maskIndexWhere(): per-dim cross-filter predicate (OR within a dim via the bit
  mask, AND across dims, exclude-self); source-impossible zeroes tree targets but
  not source's own histogram (the plan's special case).
- applyMaskIndexCounts(): one columnar bitmask query per dim, applied ATOMICALLY
  after a single facetCountsReqId stale check; COUNT(*) over the one-row-per-pid
  index == distinct pids.
- HONESTY RULE: on query failure → 'unavailable' (a "(—)" dash, .count-unavailable),
  NEVER the baseline. If the index/bitmap isn't usable yet (unpublished) →
  'fallthrough' to the legacy paths unchanged, so this is safe to ship first.
- facetIndexReady preflight: index present + schema_version==1 + build_id
  membership-half == node_bits generation; else stays unavailable, not wrong.
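
The mask predicate (OR within a dim, AND across dims, exclude-self) can be illustrated with plain integers. This is a sketch only: the bit assignments, the in-memory INDEX rows, and the function names are hypothetical, the real path runs as a columnar query over sample_facet_index, and the source-impossible special case is not modelled here.

```python
from collections import Counter

# Hypothetical node -> bit position per dim (stand-in for node_bits).
NODE_BITS = {
    "material": {"mat:rock": 0, "mat:soil": 1, "mat:anthropogenic": 2},
    "object_type": {"obj:artifact": 0, "obj:specimen": 1},
}

# Hypothetical one-row-per-pid index; each dim holds a bitmask of matching nodes.
INDEX = [
    {"pid": "p1", "material": 0b101, "object_type": 0b01},
    {"pid": "p2", "material": 0b100, "object_type": 0b01},
    {"pid": "p3", "material": 0b010, "object_type": 0b10},
    {"pid": "p4", "material": 0b001, "object_type": 0b01},
]


def selection_mask(dim, uris):
    """OR the bits of every selected node in one dim."""
    mask = 0
    for uri in uris:
        mask |= 1 << NODE_BITS[dim][uri]
    return mask


def cross_filtered_counts(index, selections, target_dim):
    """Count pids per node of target_dim under all OTHER dims' selections."""
    masks = {
        dim: selection_mask(dim, uris)
        for dim, uris in selections.items()
        if dim != target_dim and uris  # exclude-self
    }
    counts = Counter()
    for row in index:
        # AND across dims: every other active dim must share at least one bit.
        if all(row[dim] & mask for dim, mask in masks.items()):
            for uri, bit in NODE_BITS[target_dim].items():
                if (row[target_dim] >> bit) & 1:
                    counts[uri] += 1
    return dict(counts)


selections = {"material": ["mat:anthropogenic"], "object_type": ["obj:artifact"]}
object_counts = cross_filtered_counts(INDEX, selections, "object_type")
material_counts = cross_filtered_counts(INDEX, selections, "material")
print(object_counts, material_counts)
```

Because the index has one row per pid, counting rows is the same as counting distinct pids, which is the property the commit message relies on for COUNT(*).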

Single-filter cube kept as a pure optimization. Viewport + search still take the
live membership path (Phase 3).

Verified on real 202608 data: index covers 6,026,242 located pids vs masks'
5,996,325 — exactly the 29,917 isamplesorg#306 samples recovered; full validator passes;
Eric's case returns 145,770 (cross-filtered) not 261,086 (baseline).

Refs isamplesorg#305 isamplesorg#304 isamplesorg#306

Development

Successfully merging this pull request may close these issues.

Facet values and counts need to update with filters