
interactive: port the data model from flat [i64] to a Value ADT + Term scalar language#760

Merged
frankmcsherry merged 6 commits into TimelyDataflow:master-next from frankmcsherry:value-next on Jun 16, 2026


@frankmcsherry

Member

Replaces the flat [i64]/FieldExpr data model with a Value ADT (Int/Tuple/Variant/List) and a Term scalar language, on top of master-next's scope-tree IR +
substrate-generic backend, and brings the explanation rewrite back online over the new model. Four reviewable commits:

  1. Value data model — Value + the tree-walking Term interpreter in ir.rs; Projection becomes {key: Term, val: Term}; Reducer gains Collect, Expr/LinearOp gain FlatMap;
    both parsers parse the full Term grammar (tuples/lists/spread, proj, inject/case, fold, builtins, named constructors). backend/vec.rs evaluates Terms over Value rows.
    Existing programs verified (reach → 4 reachable; scc → 3 cycle edges).
  2. ADT example programs — unnest (flatmap/collect round-trip), binders (fold + named case binders), adt (constructors/case), congruence + eqsat (variable-arity e-node
    congruence and the full equality-saturation fixpoint), cse_tree.
  3. Explain back online — implements the decoupled RowModel/Dataflow traits for Value/Term. The demand envelope is a flat value tuple [V | chain | q] matching the host
    lift; time_le/strip are inlined and folded.rs is retired. All sufficiency tests pass, including the --ignored sweeps (scc 100/110, the join partner-time regression at
    1000/1100, tc/reach fuzz).
  4. Explain CLI — dump_explain and ddir_vec --explain restored.
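For orientation, here is a minimal sketch of a Value ADT with a tree-walking Term evaluator and a `{key, val}` projection. Only the four Value shapes (Int/Tuple/Variant/List) come from this PR. The Term variants shown, the `u32` constructor tag, and the `eval`/`Projection::apply` signatures are illustrative assumptions, not the crate's ir.rs.

```rust
// Hypothetical sketch: a Value ADT plus a tree-walking Term evaluator.
// The Term variants below are illustrative, not the crate's ir.rs definitions.

#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Int(i64),
    Tuple(Vec<Value>),
    /// Sum-type value: constructor tag plus payload (tag type assumed).
    Variant(u32, Box<Value>),
    List(Vec<Value>),
}

#[derive(Clone, Debug)]
pub enum Term {
    /// The whole input row.
    Row,
    /// A constant.
    Lit(Value),
    /// Field `i` of a tuple-valued term.
    Proj(Box<Term>, usize),
    /// Build a tuple from component terms.
    Tuple(Vec<Term>),
    /// Integer addition.
    Add(Box<Term>, Box<Term>),
    /// Wrap a payload in constructor `tag`.
    Inject(u32, Box<Term>),
}

/// Evaluate `term` against one input row by walking the term tree.
pub fn eval(term: &Term, row: &Value) -> Value {
    match term {
        Term::Row => row.clone(),
        Term::Lit(v) => v.clone(),
        Term::Proj(t, i) => match eval(t, row) {
            Value::Tuple(fields) => fields.into_iter().nth(*i).expect("field index out of range"),
            other => panic!("proj on non-tuple {:?}", other),
        },
        Term::Tuple(ts) => Value::Tuple(ts.iter().map(|t| eval(t, row)).collect()),
        Term::Add(a, b) => match (eval(a, row), eval(b, row)) {
            (Value::Int(x), Value::Int(y)) => Value::Int(x + y),
            (x, y) => panic!("add on non-ints {:?} + {:?}", x, y),
        },
        Term::Inject(tag, t) => Value::Variant(*tag, Box::new(eval(t, row))),
    }
}

/// A projection computes a (key, val) pair from a row; each side is a Term.
pub struct Projection {
    pub key: Term,
    pub val: Term,
}

impl Projection {
    pub fn apply(&self, row: &Value) -> (Value, Value) {
        (eval(&self.key, row), eval(&self.val, row))
    }
}
```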

Deferred: the columnar substrate (backend::col/ddir_col) needs a Columnar story for Value.

🤖 Generated with Claude Code

frankmcsherry and others added 6 commits June 15, 2026 08:00
Replace the flat [i64]/FieldExpr data model with the Value ADT (Int/Tuple/
Variant/List) and the Term scalar language, on master-next's scope-tree IR +
substrate-generic backend.

- ir.rs: Value + the tree-walking Term interpreter (eval); LinearOp gains
  FlatMap, Filter/EnterAt now carry Term. Drops RowLike/FieldExpr eval and the
  arity transfer functions (those were explain-only).
- parse: Projection is now {key: Term, val: Term}; Reducer gains Collect; Expr
  gains FlatMap. Both front-ends parse the full Term grammar (tuples/lists/
  spread, proj, inject/case, fold, builtins) plus named constructors + pattern
  `case` (pipe), reconciled with master-next's import/export syntax.
- backend/vec.rs: Row = Value; render_linear/join/reduce evaluate Terms;
  adds the Collect (NEST) reducer. Value derives serde (ExchangeData bound).
- gen_row produces (Tuple[Int;arity], unit); ddir_vec gains EDGES_FILE input.
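A sketch of how a linear operator chain, now including FlatMap, can run over Value rows. Closures stand in for Terms here, and `LinearOp`'s variants and `apply_linear` are hypothetical names; the real backend evaluates Terms inside a dataflow rather than over a `Vec`.

```rust
// Hypothetical sketch: a LinearOp chain, including FlatMap, over Value rows.
// Closures stand in for Terms; all names are illustrative only.

#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Tuple(Vec<Value>),
    List(Vec<Value>),
}

pub enum LinearOp {
    /// One row in, one row out.
    Map(Box<dyn Fn(&Value) -> Value>),
    /// Keep only rows satisfying the predicate.
    Filter(Box<dyn Fn(&Value) -> bool>),
    /// One row in, zero or more rows out (UNNEST).
    FlatMap(Box<dyn Fn(&Value) -> Vec<Value>>),
}

/// Apply each operator in order to the whole batch of rows.
pub fn apply_linear(ops: &[LinearOp], rows: Vec<Value>) -> Vec<Value> {
    ops.iter().fold(rows, |rows, op| match op {
        LinearOp::Map(f) => rows.iter().map(|r| f(r)).collect(),
        LinearOp::Filter(p) => rows.into_iter().filter(|r| p(r)).collect(),
        LinearOp::FlatMap(f) => rows.iter().flat_map(|r| f(r)).collect(),
    })
}
```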

Deferred to later stages: explain + its folded helper (need RowModel for
Value/Term), and the col substrate (needs a Columnar story for Value).

Verified: lib tests pass; reach.ddp (root 0, chain 0-1-2-3) -> 4 reachable;
scc.ddp (cycle 0-1-2 + trivial 3-4) -> 3 cycle edges.
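Both sanity checks can be reproduced outside the dataflow in plain Rust. This sketch assumes the chain 0-1-2-3 is the directed edges 0->1, 1->2, 2->3 and that the trivial component 3-4 is a single edge 3->4. `reach` and `cycle_edges` are illustrative helpers, not the .ddp programs: an edge (u, v) lies on a cycle exactly when u is reachable from v. Under these assumptions, `reach(&[0], ...)` returns 4 nodes and `cycle_edges` returns 3 edges.

```rust
use std::collections::BTreeSet;

/// Nodes reachable from `roots` along directed `edges`, iterated to a fixpoint.
pub fn reach(roots: &[u32], edges: &[(u32, u32)]) -> BTreeSet<u32> {
    let mut seen: BTreeSet<u32> = roots.iter().copied().collect();
    loop {
        // Destinations of edges leaving the current set that are not yet in it.
        let next: Vec<u32> = edges
            .iter()
            .filter(|(s, _)| seen.contains(s))
            .map(|&(_, d)| d)
            .filter(|d| !seen.contains(d))
            .collect();
        if next.is_empty() {
            return seen;
        }
        seen.extend(next);
    }
}

/// Edges (u, v) that lie on a cycle, i.e. where u is reachable again from v.
pub fn cycle_edges(edges: &[(u32, u32)]) -> Vec<(u32, u32)> {
    edges
        .iter()
        .copied()
        .filter(|&(u, v)| reach(&[v], edges).contains(&u))
        .collect()
}
```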

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port the Value/ADT example programs onto the scope-tree base (old `result …;`
-> `export "result" = …;`), exercising the new scalar language end-to-end:

- unnest.ddp  — flatmap (UNNEST) / collect (NEST) list round-trip
- binders.ddp — fold with named pattern-`case` binders
- adt.ddp     — named constructors + pattern `case`
- congruence.ddp / eqsat.ddp — variable-arity e-node congruence and the full
  equality-saturation fixpoint
- cse_tree.ddp — common-subexpression sharing over expression trees
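A plain-Rust sketch of the unnest/collect round-trip that unnest.ddp exercises, assuming integer keys and elements and a (key, pos, elem) layout for the unnested rows; `unnest` and `nest` are hypothetical helpers. Ordering each collected list by `pos` rather than by arrival is what keeps the round-trip position-ordered even when the unnested rows come back shuffled.

```rust
use std::collections::BTreeMap;

/// UNNEST: (key, [e0, e1, ..]) becomes rows (key, pos, elem).
pub fn unnest(rows: &[(i64, Vec<i64>)]) -> Vec<(i64, usize, i64)> {
    rows.iter()
        .flat_map(|(k, xs)| xs.iter().enumerate().map(move |(pos, &x)| (*k, pos, x)))
        .collect()
}

/// NEST: group (key, pos, elem) rows back into per-key lists, ordered by pos.
pub fn nest(rows: &[(i64, usize, i64)]) -> Vec<(i64, Vec<i64>)> {
    let mut groups: BTreeMap<i64, BTreeMap<usize, i64>> = BTreeMap::new();
    for &(k, pos, x) in rows {
        groups.entry(k).or_default().insert(pos, x);
    }
    groups
        .into_iter()
        .map(|(k, by_pos)| (k, by_pos.into_values().collect()))
        .collect()
}
```

In this sketch a key with an empty list produces no unnested rows and so does not reappear after `nest`, and two input rows sharing a key would be merged; both are properties of the sketch, not claims about the .ddp program.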

Verified on master-next: eqsat reproduces both scenarios (pure congruence
5~1 then mul(5,2)~mul(1,2); and the a~b cascade collapsing all three muls);
unnest round-trips position-ordered; adt yields the same 98/102 buckets.
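The congruence step behind eqsat can be sketched imperatively with union-find. This is a stand-in for the program's dataflow fixpoint, not its implementation: e-nodes with the same operator and pairwise-equivalent children, of any arity, are merged until a pass makes no change. `UnionFind` and `congruence` are hypothetical names. With leaves 5, 1, 2 and nodes mul(5,2), mul(1,2), calling `union` on 5 and 1 and then `congruence` places both mul nodes in one class.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Union-find over e-node ids, with path compression.
pub struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    pub fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    pub fn find(&mut self, x: usize) -> usize {
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        // Point every node on the path directly at the root.
        let mut cur = x;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root;
            cur = next;
        }
        root
    }

    /// Merge two classes; returns true if they were previously distinct.
    pub fn union(&mut self, a: usize, b: usize) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        self.parent[ra.max(rb)] = ra.min(rb);
        true
    }
}

/// E-node `i` is `nodes[i] = (op, children)`, with any number of children.
/// Merge nodes whose op matches and whose children are pairwise equivalent,
/// repeating until a pass makes no new merges.
pub fn congruence(nodes: &[(&str, Vec<usize>)], uf: &mut UnionFind) {
    loop {
        let mut canon: HashMap<(&str, Vec<usize>), usize> = HashMap::new();
        let mut changed = false;
        for (id, (op, kids)) in nodes.iter().enumerate() {
            let key = (*op, kids.iter().map(|&k| uf.find(k)).collect::<Vec<_>>());
            match canon.entry(key) {
                Entry::Occupied(e) => changed |= uf.union(id, *e.get()),
                Entry::Vacant(e) => {
                    e.insert(id);
                }
            }
        }
        if !changed {
            return;
        }
    }
}
```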

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Re-enable the explanation rewrite on the Value data model by implementing the
decoupled `RowModel`/`Dataflow` traits for `Value`/`Term`.

- explain/mod.rs: a `Val` RowModel whose demand envelope is a flat value tuple
  `[V | chain (innermost-first) | q]` — matching the host lift's `append_iter`.
  Each rule builds `Term`-based projections/predicates over field indices
  (replacing the flat `[i64]` `FieldExpr` column ranges); `time_le`/`strip` are
  inlined (the `folded` algebra), and a `Spread`-bounding `expand_value_fields`
  keeps bare-row refs from pulling in chain coords. `Sb`'s `Dataflow` predicate
  is now `Term`. The clone/resolve/shape machinery is unchanged; the shape pass
  is `Term`-arity.
- Count now yields a one-field tuple `(count)`, keeping "a value is a tuple" so
  `$1[0]` and the explain envelope hold uniformly.
- decouple.rs: drop the flat executable contract; the `nested_contract`
  model-agnostic proof remains the runnable spec. `folded.rs` retired.
- tests/explain.rs restored, ported to Value rows + the flat query envelope.
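A minimal sketch of the flat envelope layout. It assumes chain coordinates are stored as Int fields and that V's arity and the scope depth are known when unpacking; `pack_envelope` and `unpack_envelope` are hypothetical. The sketch also shows why a bare-row reference must be bounded: the row is only the first |V| fields, and everything after that is chain or query.

```rust
// Hypothetical sketch of the flat demand envelope [V | chain | q].
// Encoding chain coordinates as Int fields is an assumption.

#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Tuple(Vec<Value>),
}

/// Pack row fields, the iteration chain (innermost scope first), and query
/// fields into one flat tuple by appending them in that order.
pub fn pack_envelope(v: &[Value], chain: &[i64], q: &[Value]) -> Value {
    let mut fields = v.to_vec();
    fields.extend(chain.iter().map(|&t| Value::Int(t)));
    fields.extend_from_slice(q);
    Value::Tuple(fields)
}

/// Split an envelope into (V, chain, q). A bare reference to "the row" must
/// stop after the first `v_arity` fields, or it would pick up chain coordinates.
pub fn unpack_envelope(env: &Value, v_arity: usize, depth: usize) -> (&[Value], &[Value], &[Value]) {
    match env {
        Value::Tuple(fields) => {
            let (v, rest) = fields.split_at(v_arity);
            let (chain, q) = rest.split_at(depth);
            (v, chain, q)
        }
        other => panic!("envelope must be a tuple, got {:?}", other),
    }
}
```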

Verified: all 8 sufficiency tests pass, plus the heavy --ignored sweeps
(scc 100/110, the join partner-time regression at 1000/1100, tc/reach fuzz).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- dump_explain: re-enabled (prints the scope-tree IR before/after the rewrite);
  it has no data-model dependencies and works as-is now that explain is online.
- ddir_vec --explain / --query=K:V[,q] / --debug-demand: re-enabled. The query
  input is seeded with the flat demand envelope `(key ; val ++ q)`; demand
  collections can be tapped with --debug-demand.

The CLI assigns every source the uniform shape (arity, 0), so --explain is for
single-input-arity programs (e.g. scc); mixed-arity programs (reach's arity-1
roots) need explicit per-input shapes, as the integration tests use. Verified:
scc.ddp --explain demands the cycle edges that produced the queried output.

The columnar substrate (ddir_col / backend::col) stays deferred — it needs a
Columnar story for Value.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…(stage 5)

Restore a unit-level by-example spec for the reverse rules — but over the model
the crate actually evaluates. The removed `[i64]` `contract` tested `Flat` via
`eval_fields`/`eval_condition`; this `value_contract` runs the same six specs on
real `Value` rows in `Val`'s flat envelope `[V | chain | q]`, through an
in-memory `Value` dataflow against `explain::Val`.

`nested_contract` (a different, nested layout over a toy model) stays as the
proof that the *rules* are model-agnostic; `value_contract` pins the *model*
the backend runs, closing the unit-coverage gap the deletion opened.
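The shape of a by-example spec for a reverse rule can be sketched on one toy operator: apply the reverse rule to a demanded output, replay the forward operator on only the demanded inputs, and check that the demanded output comes back. `forward`, `reverse`, and `sufficient` are hypothetical, the operator is a toy filter-and-project, and plain sets stand in for the in-memory Value dataflow.

```rust
use std::collections::BTreeSet;

// Hypothetical by-example spec; not the crate's value_contract.
type Row = (i64, i64);

/// Operator under test: keep rows whose value is even, then map (k, v) -> (k, 10 * v).
pub fn forward(input: &BTreeSet<Row>) -> BTreeSet<Row> {
    input.iter().filter(|(_, v)| v % 2 == 0).map(|&(k, v)| (k, v * 10)).collect()
}

/// Hand-written reverse rule: invert the map, then keep only rows actually in the input.
pub fn reverse(input: &BTreeSet<Row>, demand: &BTreeSet<Row>) -> BTreeSet<Row> {
    demand.iter().map(|&(k, w)| (k, w / 10)).filter(|r| input.contains(r)).collect()
}

/// Sufficiency: replaying the operator on only the demanded inputs must
/// reproduce every demanded output.
pub fn sufficient(input: &BTreeSet<Row>, demand: &BTreeSet<Row>) -> bool {
    let needed = reverse(input, demand);
    needed.is_subset(input) && demand.is_subset(&forward(&needed))
}
```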

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Consolidate the front-end language docs into one reference, on the `pipe`
module (the .ddp front-end): the collection language (sources, pipe operators
incl. flatmap/collect, statements, `con` decls) and the scalar `Term` language
(row/field access, arithmetic, products/lists/sums, named constructors,
pattern `case`, `fold` with `^0`/`^1`, binders, `if`). Doc-only; previously
this had to be teased out of the `Term` variants, `build_builtin`, and example
programs. `Term`'s doc now points here for the concrete syntax.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
frankmcsherry marked this pull request as ready for review June 16, 2026 15:16
frankmcsherry merged commit a32f866 into TimelyDataflow:master-next Jun 16, 2026
12 checks passed
frankmcsherry mentioned this pull request Jun 17, 2026
frankmcsherry added a commit that referenced this pull request Jun 17, 2026
* decouple: re-ground with the universal-backstop flatmap test

Re-land the proof-of-concept (removed from #760 as PR-scoped) on the follow-up
branch where it belongs: the universal backstop reverses `flatmap` — the op the
live rewrite still panics on — via the existing Dataflow primitives (forward
pair table, join on the output, REFORM the whole input). Grounds the inverse
work before the real rule + wiring.
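The backstop's three steps (forward pair table, join on the output, recover the whole input) can be sketched generically over any row-to-rows operator. `backstop` is a hypothetical name; `BTreeMap` lookups stand in for the Dataflow join, and iteration time is ignored.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical universal backstop: invert any row-to-rows operator by running it forward.
/// 1. Build the (output -> input) pair table by applying `op` to every input row.
/// 2. Join the demanded outputs against that table.
/// 3. Return the whole input rows that produced them.
pub fn backstop<I, O, F>(input: &[I], demand: &BTreeSet<O>, op: F) -> BTreeSet<I>
where
    I: Clone + Ord,
    O: Ord,
    F: Fn(&I) -> Vec<O>,
{
    let mut pairs: BTreeMap<O, BTreeSet<I>> = BTreeMap::new();
    for row in input {
        for out in op(row) {
            pairs.entry(out).or_default().insert(row.clone());
        }
    }
    demand
        .iter()
        .filter_map(|o| pairs.get(o))
        .flatten()
        .cloned()
        .collect()
}
```

Because the pair table keeps the whole input row, the non-unique-key problem of a plain flatmap does not arise here; the cost is materializing every (output, input) pair.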

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* explain(value): reverse FlatMap (UNNEST) — close the explainability regression

Replace the `panic!` on `LinearOp::FlatMap` in the reverse walk with a real
rule, `emit_lookup_flatmap`. FlatMap is same-depth (it doesn't touch iteration
time) and its list rides as one opaque value, so no envelope change is needed:
build the (output -> input) pair table by running the op forward on the input
side, join the demand on the packed output, and recover the whole input (the
`None`-inverse endpoint). The one wrinkle — a plain flatmap drops the source
row and the key isn't unique — is handled by re-keying the input by itself
before exploding (the source rides through in the join key) and re-projecting
to (k, pos, elem) after; a chain_in <= chain_out filter keeps it sound in
iterating scopes. No new primitive; uses the existing project/flatmap/join.
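A simplified sketch of the re-keying trick. It assumes a single integer iteration time in place of the chain and uses nested loops in place of the join; `reverse_flatmap`, `Src`, and `Out` are hypothetical. Because each exploded row carries its whole source row, two sources with the same key but different lists stay distinguishable, and the time comparison mirrors the chain_in <= chain_out filter.

```rust
use std::collections::BTreeSet;

/// A source row: key, list payload, and a simplified single-level iteration time.
type Src = (i64, Vec<i64>, u32);
/// A demanded flatmap output: (key, pos, elem) plus its time.
type Out = (i64, usize, i64, u32);

/// Hypothetical sketch of reversing flatmap by re-keying the input by itself.
pub fn reverse_flatmap(input: &[Src], demand: &[Out]) -> BTreeSet<Src> {
    // 1. Re-key each source row by itself, then explode: (source, (k, pos, elem)).
    let exploded: Vec<(&Src, (i64, usize, i64))> = input
        .iter()
        .flat_map(|src| src.1.iter().enumerate().map(move |(pos, &e)| (src, (src.0, pos, e))))
        .collect();
    // 2. Join the demand on the packed output; 3. recover the source that rode
    //    along in the key, subject to time_in <= time_out.
    let mut needed = BTreeSet::new();
    for &(k, pos, elem, t_out) in demand {
        for (src, out) in &exploded {
            if *out == (k, pos, elem) && src.2 <= t_out {
                needed.insert((*src).clone());
            }
        }
    }
    needed
}
```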

With this, `writable => explainable` is restored for flatmap programs.
Verified: a flatmap sufficiency test passes, and the full suite — incl. the
heavy --ignored relational sweeps — stays green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* explain(value): confirm Collect reverses via the non-min keyed path

Collect (NEST) is a Reducer, so the reverse walk already routes it through the
non-min keyed lookup ("demand all same-key inputs") — which is exactly the
demand for a collected list (all its members). A sufficiency test over a
`| collect` program confirms the existing path handles a List-valued reducer
output; no new rule needed.
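Why "demand all same-key inputs" suffices for Collect can be seen in a small sketch: a collected list depends on every member under its key, so the keyed lookup returns exactly the rows needed to rebuild it. `reverse_keyed` and `collect_key` are hypothetical helpers over integer (key, value) rows.

```rust
use std::collections::BTreeSet;

// Hypothetical sketch of the non-min keyed reverse lookup, applied to Collect.

/// Collect (NEST) for one key: every value under `key`, as a sorted list.
pub fn collect_key(input: &[(i64, i64)], key: i64) -> Vec<i64> {
    let mut vs: Vec<i64> = input.iter().filter(|(k, _)| *k == key).map(|&(_, v)| v).collect();
    vs.sort();
    vs
}

/// Non-min keyed lookup: a demanded output at a key demands every input row with that key.
pub fn reverse_keyed(input: &[(i64, i64)], demanded_keys: &BTreeSet<i64>) -> BTreeSet<(i64, i64)> {
    input.iter().copied().filter(|(k, _)| demanded_keys.contains(k)).collect()
}
```

Replaying `collect_key` on the rows returned by `reverse_keyed` yields the same list as on the full input for each demanded key.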

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>