feat(gfql #1650): structured flattened whole-entity returns #1656

Merged
lmeyerov merged 4 commits into master from dev/gfql-1650-structured-returns
Jun 29, 2026
@lmeyerov

Summary

Closes the core of #1650. Terminal Cypher RETURN a (whole node/edge) previously emitted one column of Cypher display strings built row-wise (({id: 51, val: 51, kind: 'a'})). That string is a presentation format (it matches the cypher-shell / TCK oracle), not data — callers had to re-parse it to use it, and constructing it is O(rows).

This flattens whole-entity returns into structured {alias}.{field} columns (a.id, a.val, a.kind, ...) by default. The per-field columns already exist on the working frame before projection, so this is "stop collapsing", not "rebuild": near-free, lossless, directly usable, and it survives JSON / CSV / Parquet / Arrow serialization and plot().
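
As a rough illustration of "stop collapsing", the sketch below keeps the per-field columns that already sit on a working frame instead of folding them into one string. The frame, the `scratch` column, and the `flatten_entity` helper are hypothetical and hand-built for this example; they are not graphistry internals.

```python
import io

import pandas as pd

# Hypothetical working frame: the per-field columns for alias `a` already
# exist before projection (names chosen for illustration only).
working = pd.DataFrame({
    "a.id": [51, 52],
    "a.val": [51, 7],
    "a.kind": ["a", "b"],
    "scratch": [0, 1],  # unrelated intermediate column
})


def flatten_entity(frame: pd.DataFrame, alias: str) -> pd.DataFrame:
    """Select the existing `{alias}.{field}` columns; nothing is rebuilt."""
    cols = [c for c in frame.columns if c.startswith(f"{alias}.")]
    return frame[cols]


flat = flatten_entity(working, "a")
print(list(flat.columns))

# Flat columns pass through CSV without any display-string parsing.
round_trip = pd.read_csv(io.StringIO(flat.to_csv(index=False)))
print(round_trip.to_dict("records"))
```

The selection is a column lookup, so its cost does not depend on rendering each row, which is the property the paragraph above relies on.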

Performance (dgx-spark, median-of-7, RETURN a vs old text form)

| | pandas @10k | pandas @100k | cuDF @10k | cuDF @100k |
|---|---|---|---|---|
| flat (#1650) | 20.5 ms | 32.1 ms | 19.8 ms | 26.6 ms |
| old text form | 41.9 ms | 204.2 ms | 53.3 ms | 113.5 ms |
| speedup | 2.0× | 6.4× | 2.7× | 4.3× |

The win grows with row count: rendering text is O(rows), while the flat path is close to free.

Design

  • apply_result_projection(..., structured=True) emits flat columns for whole-entity returns; structured=False keeps the legacy single Cypher-display-string column. The OPTIONAL-MATCH null-fill / projection row-guard paths (which still consume a single-column entity value for row alignment) opt out via this flag and are unchanged.
  • A synthesized null/absent-entity row (top-level OPTIONAL-MATCH miss or OPTIONAL WITH-reentry no-match, built by _apply_empty_result_row as a single {alias: None} column) has no field columns to flatten, so it falls back to the single-column text form — rendering to None and preserving the shape the OPTIONAL / reentry machinery consumes for identity recovery and no-match detection. Real rows always carry flat fields and flatten.
  • Text is now presentation-only: render_entity_text(result, alias) reconstructs the Cypher display string on demand (used by the conformance/TCK driver and any caller wanting the human-readable form). The structured data path never pays the render cost.
  • The entity-projection meta ids snapshot (.copy()) is retained — bounded reentry recovers carried node identities from it and must not alias the live frame ([BUG] Cypher reentry path mis-handles OPTIONAL prefix MATCH on no-match fixtures #1356).
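
To make the "text is presentation-only" point concrete, here is a minimal sketch of rebuilding a display string from flat columns only when a caller asks for it. `render_node_text` and `cypher_literal` are hypothetical stand-ins written for this note. They are not `render_entity_text`, and they ignore labels, edges, escaping, and nulls.

```python
import pandas as pd


def cypher_literal(value) -> str:
    """Simplified value formatting: strings single-quoted, the rest via str()."""
    if isinstance(value, str):
        return "'" + value + "'"
    return str(value)


def render_node_text(frame: pd.DataFrame, alias: str) -> pd.Series:
    """Rebuild `({field: value, ...})` strings from `{alias}.{field}` columns.

    Illustrative only; assumes at least one flat field column exists.
    """
    prefix = f"{alias}."
    fields = [c for c in frame.columns if c.startswith(prefix)]
    names = [c[len(prefix):] for c in fields]
    # tolist() yields native Python values, column by column.
    columns = [frame[c].tolist() for c in fields]
    texts = [
        "({" + ", ".join(f"{n}: {cypher_literal(v)}" for n, v in zip(names, row)) + "})"
        for row in zip(*columns)
    ]
    return pd.Series(texts, index=frame.index, name=alias)


flat = pd.DataFrame({"a.id": [51], "a.val": [51], "a.kind": ["a"]})
print(render_node_text(flat, "a").tolist())
```

Only callers that want the human-readable form run the row-wise loop; the structured path never touches it.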

Behavior change

Callers that previously read the rendered Cypher display string from a terminal RETURN a column now receive flattened a.* columns. Documented under [Development] › Changed in CHANGELOG. No programmatic consumer of the display string was found (graphistry server / louie serialize to flat JSON/CSV/Parquet).
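
For a caller that used to parse the display string, one possible migration is to regroup the flat columns into per-entity records. The `entity_records` helper below is a hypothetical convenience for illustration, not part of the library, and the frame is hand-built.

```python
import pandas as pd


def entity_records(frame: pd.DataFrame, alias: str) -> list:
    """Turn `{alias}.{field}` columns into one dict per row, keyed by field."""
    prefix = f"{alias}."
    fields = [c for c in frame.columns if c.startswith(prefix)]
    renamed = frame[fields].rename(columns=lambda c: c[len(prefix):])
    return renamed.to_dict("records")


# Hand-built stand-in for the frame a terminal `RETURN a` now yields.
result = pd.DataFrame({
    "a.id": [51, 52],
    "a.val": [51, 7],
    "a.kind": ["a", "b"],
})
records = entity_records(result, "a")
print(records)
```

No string parsing is involved, so typed values (ints, strings) arrive as they were stored.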

Tests

  • Whole-entity text assertions migrated to an entity_text_records shim that renders flat → text for comparison against the pre-#1650 Cypher-text oracle.
  • Grouping / connected-optional / null_fill paths (still single-column text) keep direct text assertions.
  • Flat-shape + render-helper + projection-meta tests added.
  • Full graphistry/tests/compute/gfql/ on dgx-spark (cuDF 25.12 container): 2509 passed, 16 skipped, 15 xfailed. The only 2 failures are pre-existing container-environment artifacts unrelated to this change (image's stale baked setup.py; a cugraph test that asserts "without cugraph" in an image that has cugraph).
  • ruff-clean; mypy-clean for changed code.

Follow-ups (separate PRs, after this lands)

  • tck-gfql conformance adapter: reconstruct the Cypher display string from flattened a.* columns at the comparison hook (paired-contract).
  • pyg-bench probes: durable RETURN a structured-vs-text + where_rows micro-probes.
  • Optional polish: unify the intermediate grouping/reentry value-eval onto structured (flat everywhere). Not required for the perf win; tracked as deferred.

🤖 Generated with Claude Code

@lmeyerov lmeyerov force-pushed the dev/gfql-1650-structured-returns branch 5 times, most recently from 12db9a3 to eca4217 Compare June 28, 2026 20:00
lmeyerov added a commit that referenced this pull request Jun 28, 2026
…tructured returns

Squashed reconciliation of the native lazy Polars GFQL engine (was #1648's 28
commits; full history preserved at tag bak/1648) restacked onto the colleague's
#1656 structured whole-entity returns + #1657 parse_expr memo.

Engine: native polars hop/chain (semi/anti joins), native cypher row pipeline
(select/where/order_by/group_by/unwind/projection), lazy single-hop collect-once
with CPU/GPU execution targets (gfql/lazy/). NO pandas bridge — native or honest
NotImplementedError (plan.md NO-CHEATING).

Reconciliation with #1650 structured returns: apply_result_projection now threads
`structured` to the polars path (apply_result_projection_polars). Whole-entity
RETURN a flattens to {alias}.{field} columns natively (mirrors the pandas
_flat_entity_field_names selection exactly), which — unlike the legacy entity-text
expr — works for ANY dtype (float/temporal/nested just become columns), so polars
structured == pandas structured across the board. structured=False still renders
the native Cypher display string for int/string/bool single-entity nodes.
_include_numeric_id_as_property is now polars-aware so id flattens identically.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lmeyerov added a commit that referenced this pull request Jun 28, 2026
Per the #1656 author's handoff: the elif-structured single-column text fallback
in _apply_result_projection_pandas looks redundant but fixes two regressions
(top-level OPTIONAL-MATCH miss; OPTIONAL-WITH-reentry no-match). Mark DO NOT
REMOVE so a later 'tidy' doesn't reintroduce them. Our polars structured-returns
reconciliation touched this file; verified the fallback is preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lmeyerov lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from c40b69c to 7d77ce7 Compare June 28, 2026 20:43
@lmeyerov lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from 7d77ce7 to bd75bf9 Compare June 28, 2026 20:52
@lmeyerov lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from bd75bf9 to a83a59b Compare June 28, 2026 21:59
@lmeyerov lmeyerov force-pushed the dev/gfql-1650-structured-returns branch 2 times, most recently from 231526a to b8c169e Compare June 29, 2026 00:16
lmeyerov and others added 4 commits June 28, 2026 18:25
Terminal Cypher `RETURN a` (whole node/edge) previously emitted one column of
Cypher display strings (`({id: 51, val: 51, kind: 'a'})`) built row-wise. The
string is a *presentation* format (it matches the cypher-shell / TCK oracle),
not data — callers had to re-parse it to use it, and constructing it is O(rows).

This flattens whole-entity returns into structured `{alias}.{field}` columns
(`a.id, a.val, a.kind`, ...) by default. The per-field columns already exist on
the working frame before projection, so this is "stop collapsing", not
"rebuild": near-free, lossless, directly usable, and it survives JSON / CSV /
Parquet / Arrow serialization and `plot()`.

Measured (dgx-spark, median-of-7, RETURN a vs old text form):
  pandas @100k 32 vs 204 ms (6.4x); cuDF @100k 27 vs 114 ms (4.3x). Win grows
  with row count (text render is O(rows); flat is ~free).

Design:
- `apply_result_projection(..., structured=True)` emits flat columns for
  whole-entity returns; `structured=False` keeps the legacy single
  Cypher-display-string column. The OPTIONAL-MATCH null-fill / projection
  row-guard paths (which still consume a single-column entity value for row
  alignment) opt out via this flag and are unchanged.
- A synthesized null/absent-entity row (top-level OPTIONAL-MATCH miss or
  OPTIONAL WITH-reentry no-match, built by `_apply_empty_result_row` as a
  single `{alias: None}` column) has no field columns to flatten, so it falls
  back to the single-column text form — rendering to None and preserving the
  shape the OPTIONAL / reentry machinery consumes for identity recovery and
  no-match detection. Real rows always carry flat fields and flatten.
- Text is now presentation-only: `render_entity_text(result, alias)`
  reconstructs the Cypher display string on demand (used by the conformance /
  TCK driver and any caller wanting the human-readable form). The structured
  data path never pays the render cost.
- The entity-projection meta `ids` snapshot (`.copy()`) is retained — bounded
  reentry recovers carried node identities from it and must not alias the live
  frame (#1356).

Tests: whole-entity text assertions migrated to an `entity_text_records` shim
that renders flat -> text for comparison against the pre-#1650 Cypher-text
oracle; grouping / connected-optional / null_fill paths (still single-column
text) keep direct text assertions; flat-shape + render-helper + meta tests
added. gfql/cypher + row suites: 1646 passed, 15 xfailed (only the unrelated
in-container networkx setup.py packaging artifact fails).

Cross-repo follow-ups (separate, after this lands): tck-gfql conformance
adapter (structured -> text at the comparison hook) and pyg-bench probes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rm (#1650)

Structured whole-entity returns (#1650) changed terminal RETURN a from the display
string '(:person)' to flattened a.* columns. Two test_gfql.py tests still asserted
the old display-string form (missed when the behavior landed); update them to the
flattened columns. The behavior itself is correct + documented in CHANGELOG.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on; doc + tests + cuDF lane

- I1 (bug): `RETURN a, a.val` emitted a duplicate `a.val` column (the whole-entity flatten
  shares the `{alias}.{field}` namespace with the explicit property projection). Duplicate
  column names break selection and silently drop data on `to_dict`/serialization. De-dup the
  output columns (identical data — dotted aliases are rejected), keeping first occurrence.
- I2 (boundary): document that a whole entity with no flattenable field (no id binding, no
  props, no type — in practice only an edge with no edge-id binding) falls back to the single
  Cypher-display-text column (value correct, e.g. `[]`); nodes always carry an id and flatten.
  Pinned by a test; nodes-immune noted.
- Docs: new "Whole-Entity RETURN Output Shape" section in cypher.rst (flat columns,
  render_entity_text helper, dedup + no-field boundary).
- Tests: dup-column + no-field regression tests; cuDF lane for the edges-only/policy fast-path
  shapes test (chain.py is on the cuDF-pairing list).
- CHANGELOG: I1 Fixed entry; structured-returns entry notes the no-field boundary; edges-only
  Fixed wording corrected (the fast path isn't gated on edge-id synthesis).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
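
The I1 duplicate-column case described in the commit above can be reproduced in plain pandas. This is a sketch under the assumption that the flattened entity and the explicit projection are simply concatenated column-wise; the frames are hand-built and the actual code path may differ.

```python
import pandas as pd

# `RETURN a, a.val`: both the flattened entity and the explicit property
# projection contribute an `a.val` column (illustrative frames).
flattened = pd.DataFrame({"a.id": [51, 52], "a.val": [51, 7]})
explicit = pd.DataFrame({"a.val": [51, 7]})
combined = pd.concat([flattened, explicit], axis=1)
print(list(combined.columns))

# With a duplicated name, selecting by label returns a DataFrame, not a Series.
print(type(combined["a.val"]).__name__)

# Keep the first occurrence of each column name (the data is identical).
deduped = combined.loc[:, ~combined.columns.duplicated()]
print(list(deduped.columns))
```

`Index.duplicated()` marks every occurrence after the first by default, so the boolean mask keeps the leftmost copy of each name.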
Code-quality pass on the structured-returns layer:
- render_entity_text reads result._nodes directly (typed Plottable attr) instead of
  getattr duck-typing.
- Condense verbose comments to terse one/two-liners: structured/absent emission branch,
  the dedup rationale, the temporal dtype gate, the OPTIONAL opt-out note, and the
  edges-only node-binding rebuild.

No behavior change; mypy + ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lmeyerov lmeyerov force-pushed the dev/gfql-1650-structured-returns branch from b8c169e to 47c6fa7 Compare June 29, 2026 01:29
@lmeyerov lmeyerov changed the base branch from master to dev/gfql-opt-base June 29, 2026 01:29
@lmeyerov lmeyerov deleted the branch master June 29, 2026 02:08
@lmeyerov lmeyerov closed this Jun 29, 2026
@lmeyerov lmeyerov reopened this Jun 29, 2026
@lmeyerov lmeyerov changed the base branch from dev/gfql-opt-base to master June 29, 2026 02:10
@lmeyerov lmeyerov merged commit 000d296 into master Jun 29, 2026
103 of 104 checks passed
@lmeyerov lmeyerov deleted the dev/gfql-1650-structured-returns branch June 29, 2026 02:11
@lmeyerov lmeyerov mentioned this pull request Jun 29, 2026