GFQL: cuDF cross-engine result divergences (list-literal order, toString(float), min_hops seed hop-label, group_by Series-truthiness) #1663

## Summary

Four cuDF-vs-pandas result divergences in GFQL, all found while building the native polars-engine
differential-conformance matrix (`graphistry/tests/compute/gfql/test_engine_polars_conformance_matrix.py`).
All are **cuDF-engine** issues: `g.gfql(query, engine='cudf')` returns a different result than `engine='pandas'` (the oracle).
They are **orthogonal to the polars engine** (on each, polars either reaches parity or declines with an honest NIE) and are currently
scoped out of the 4-engine conformance `_assert_invariant` (dedicated pandas-vs-polars tests cover the
polars intent), so they don't block the polars work. Each is still a real cuDF correctness gap.

Repro pattern: run the same query on `engine='pandas'` and `engine='cudf'`, then compare the results value by value.
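A minimal sketch of that pattern, kept engine-agnostic so it runs without a GPU. `diff_engines`, `run_query`, and `fake_run` are hypothetical names, not part of graphistry; the runner is assumed to wrap `g.gfql(query, engine=engine)` and convert the result frame into a list of row dicts in a deterministic order.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def diff_engines(
    run_query: Callable[[str], List[Row]],
    oracle: str = "pandas",
    candidate: str = "cudf",
) -> List[Tuple[int, Optional[Row], Optional[Row]]]:
    """Run one query on two engines; return (index, expected, actual) per differing row."""
    expected = run_query(oracle)
    actual = run_query(candidate)
    mismatches = []
    for i in range(max(len(expected), len(actual))):
        exp = expected[i] if i < len(expected) else None
        act = actual[i] if i < len(actual) else None
        if exp != act:
            mismatches.append((i, exp, act))
    return mismatches


def fake_run(engine: str) -> List[Row]:
    # Stand-in for g.gfql(query, engine=engine) plus a frame-to-rows conversion.
    # The data is made up for illustration; it is not captured cuDF output.
    rows = [{"id": 0, "xs": [0, 1, 99]}, {"id": 1, "xs": [1, 2, 99]}]
    if engine == "cudf":
        rows[1] = {"id": 1, "xs": [99, 1, 2]}
    return rows


mismatches = diff_engines(fake_run)
```

Rows are compared positionally with plain `!=`, so a real harness would also need to normalize missing values first: `float("nan") != float("nan")` is `True` in Python and would otherwise be reported as a mismatch.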

---

### 1. cuDF reorders list-literal `[a, b, c]` elements vs pandas
A Cypher list literal materialized into a column (e.g. a row-pipeline expr building `[n.num, n.num+1, 99]`)
comes back with the list ELEMENTS permuted under cuDF; pandas preserves construction order.
- Expected: element order matches pandas (construction order). Severity: wrong-answer for list-valued projections.
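When triaging this finding it can help to separate "same elements, different order" from "different elements". The helper below is a hypothetical sketch (the name `classify_list_cell` is not from the codebase) that classifies one list-valued cell against the pandas oracle.

```python
from collections import Counter
from typing import Any, Sequence


def classify_list_cell(expected: Sequence[Any], actual: Sequence[Any]) -> str:
    """Return 'match', 'permuted', or 'different' for one list-valued cell."""
    if list(expected) == list(actual):
        return "match"
    # Count by repr() so unhashable elements (nested lists, dicts) still work.
    # Caveat: repr(1) != repr(1.0), so an int/float dtype change counts as 'different'.
    if Counter(map(repr, expected)) == Counter(map(repr, actual)):
        return "permuted"
    return "different"
```

For example, `classify_list_cell([1, 2, 99], [99, 1, 2])` returns `"permuted"`, which is the shape of divergence described above, while `classify_list_cell([1, 2, 99], [1, 2, 98])` returns `"different"`.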

### 2. cuDF formats `toString(float)` differently than pandas
`toString(n.f)` over a float column yields a different string representation under cuDF than pandas
(precision / trailing zeros / exponent style).
- Expected: match pandas `str(float)` formatting (or document a canonical format). polars declines this as
  honest-NIE (it also can't match pandas float-repr), so cuDF is the one engine that diverges silently here.
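To illustrate how precision, trailing zeros, and exponent style can each change the string for the same value, the sketch below renders one float with several standard Python format specs. It uses Python's built-in `str(float)` as a stand-in for the pandas target (an assumption), and none of the alternative renderings is claimed to be what cuDF actually emits. `float_renderings` is a hypothetical helper name.

```python
from typing import Dict


def float_renderings(x: float) -> Dict[str, str]:
    """Render one float under several format conventions for side-by-side comparison."""
    return {
        "str": str(x),               # shortest round-trip repr, e.g. '2.0'
        "fixed6": format(x, ".6f"),  # fixed precision with trailing zeros
        "general": format(x, "g"),   # drops a trailing '.0'
        "sci": format(x, "e"),       # always uses exponent notation
    }


small = float_renderings(2.0)
large = float_renderings(1e16)
```

Here `small` is `{'str': '2.0', 'fixed6': '2.000000', 'general': '2', 'sci': '2.000000e+00'}`, and for `1e16` the `str` form switches to `'1e+16'` while the fixed form stays `'10000000000000000.000000'`. Any one of these differences is enough to fail a value-level string comparison.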

### 3. cuDF multi-hop `min_hops>1` labels the SEED node's hop wrong
For `e_forward(min_hops=2, max_hops=3)` etc., the SEED node appears with `__gfql_output_node_hop__` =
`max_hops` under cuDF but `None`/`NaN` under pandas. (Secondary: `num` comes back int under cuDF vs float
under pandas for the same result.)
- Repro: `[n({"id":[0]}), e_forward(min_hops=2, max_hops=3), n()]` on a small attributed graph; compare the
  seed row's `__gfql_output_node_hop__`. Found via the NA-hardened conformance signature.
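The sketch below shows one way an NA-hardened row signature could work; it is an assumption about the idea, not the conformance matrix's actual implementation. `None` and `NaN` collapse to one sentinel and integral floats collapse to ints, so the secondary int-vs-float difference in `num` is ignored while a real hop-label difference still surfaces. The names `normalize_cell` and `row_signature` and the `num` value are hypothetical; the hop value 3 corresponds to `max_hops=3` in the repro.

```python
import math
from typing import Any, Dict, Sequence, Tuple

_NA = "<NA>"


def normalize_cell(value: Any) -> Any:
    """Map None/NaN to one sentinel and integral floats to int."""
    if value is None:
        return _NA
    if isinstance(value, float):
        if math.isnan(value):
            return _NA
        if value.is_integer():
            return int(value)
    return value


def row_signature(row: Dict[str, Any], columns: Sequence[str]) -> Tuple[Any, ...]:
    """Build a comparable tuple for one row over the chosen columns."""
    return tuple(normalize_cell(row.get(c)) for c in columns)


HOP = "__gfql_output_node_hop__"
pandas_seed = {"id": 0, "num": 1.0, HOP: float("nan")}  # illustrative values
cudf_seed = {"id": 0, "num": 1, HOP: 3}                  # hop == max_hops, as reported

same_without_hop = row_signature(pandas_seed, ["id", "num"]) == row_signature(cudf_seed, ["id", "num"])
same_with_hop = row_signature(pandas_seed, ["id", "num", HOP]) == row_signature(cudf_seed, ["id", "num", HOP])
```

With this normalization `same_without_hop` is `True` and `same_with_hop` is `False`, which isolates the seed hop label as the only divergence in that row.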

### 4. cuDF `group_by` row-op raises "truth value of a Series is ambiguous"
`call("group_by", {"keys":[k], "aggregations":[("c","count"),("s","sum",col)]})` on a row table that carries
EXTRA non-key/non-aggregated columns (e.g. float `f` + string `name` alongside grouped `flag`/`num`) raises:
`GFQLTypeError: [invalid-node-reference] Error executing 'group_by': The truth value of a Series is ambiguous.`
- Repro: a 5-col node frame (id/num/f/name/flag) + the group_by above; pandas+polars return `[flag,c,s]`,
  cuDF raises. A minimal 3-col graph does NOT trigger it, so the extra columns appear to drive an `if <series>:` path
  (should be `.any()`/`.all()`). Likely in the GFQL group_by handler `graphistry/compute/gfql/row/pipeline.py`.
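The class of bug suspected above can be shown in isolation. This sketch uses pandas as a stand-in because it runs without a GPU; the offending line in the handler has not been located, and the exact exception type cuDF raises is not verified here. It shows only that using a Series as an `if` condition raises the quoted message, and that an explicit reduction avoids it.

```python
import pandas as pd

mask = pd.Series([True, False, True])

# Buggy shape: a Series used directly as a condition.
try:
    if mask:
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Fixed shape: reduce to a single bool, choosing the semantics explicitly.
any_true = bool(mask.any())  # True if at least one element is truthy
all_true = bool(mask.all())  # True only if every element is truthy
```

Which of `.any()` or `.all()` is correct depends on what the condition is meant to test, so the fix needs a look at the intent of the branch rather than a mechanical substitution.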

---

Findings 1–2 were known from earlier sessions; 3–4 were found 2026-06-30. None blocks the polars engine PRs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

