Summary
Four cuDF-vs-pandas result divergences in GFQL, all found while building the native polars-engine
differential-conformance matrix (graphistry/tests/compute/gfql/test_engine_polars_conformance_matrix.py).
All are cuDF-engine issues: g.gfql(query, engine='cudf') differs from engine='pandas' (the oracle).
They are orthogonal to the polars engine (on each case polars either matches pandas or raises an honest
NotImplementedError, "NIE") and are currently scoped out of the 4-engine conformance _assert_invariant
(dedicated pandas-vs-polars tests cover the polars intent). They therefore don't block the polars work, but each is a real cuDF correctness gap.
Repro pattern: run the same query with engine='pandas' and engine='cudf' and compare the results value by value.
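A minimal sketch of the value-level comparison, assuming each engine's result has already been converted to a list of plain-Python row dicts (for cuDF, e.g. by calling `.to_pandas()` first). The helper names `normalize`, `canonical` and `engines_agree` are hypothetical, not part of graphistry. Row order is ignored, None and NaN both count as missing, and ints compare equal to floats, but list elements are compared in order.

```python
import math
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]  # one result row as {column: plain-Python value}


def normalize(value: Any) -> Any:
    """Map engine-specific encodings of the same value onto one form (hypothetical helper)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None  # None and NaN both mean "missing"
    if isinstance(value, (int, float)):
        return float(value)  # 3 (int) and 3.0 (float) compare equal at value level
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]  # element order is part of the value
    return value


def canonical(rows: List[Row]) -> List[Tuple[Tuple[str, Any], ...]]:
    """Normalize every cell, then sort rows so that row order does not matter."""
    normed = [tuple(sorted((k, normalize(v)) for k, v in row.items())) for row in rows]
    return sorted(normed, key=repr)


def engines_agree(oracle: List[Row], candidate: List[Row]) -> bool:
    """True when the candidate engine's rows match the pandas oracle value by value."""
    return canonical(oracle) == canonical(candidate)
```

Coercing ints to floats deliberately hides the secondary int-vs-float difference noted under finding 3; a separate dtype check would be needed to catch that.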
1. cuDF reorders list-literal [a, b, c] elements vs pandas
A cypher list literal materialized into a column (e.g. a row-pipeline expr building [n.num, n.num+1, 99])
comes back with the list ELEMENTS permuted under cuDF; pandas preserves construction order.
- Expected: element order matches pandas (construction order). Severity: wrong-answer for list-valued projections.
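A sketch of a targeted regression check for this finding, assuming the projection is the literal [n.num, n.num+1, 99] and the result rows are plain dicts. The column names "num" and "xs" and the function names are hypothetical. The check flags rows whose list holds the right elements in the wrong order, which separates this permutation bug from a plain wrong-value bug.

```python
from typing import Any, Dict, List


def expected_list(num: int) -> List[int]:
    """Construction order of the literal [n.num, n.num+1, 99]."""
    return [num, num + 1, 99]


def permuted_rows(rows: List[Dict[str, Any]], num_col: str = "num", list_col: str = "xs") -> List[int]:
    """Return indices of rows whose list has the expected elements but in a different order."""
    bad = []
    for i, row in enumerate(rows):
        got, want = list(row[list_col]), expected_list(row[num_col])
        if got != want and sorted(got) == sorted(want):
            bad.append(i)  # same multiset of elements, wrong order
    return bad
```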
2. cuDF formats toString(float) differently than pandas
toString(n.f) over a float column yields a different string representation under cuDF than pandas
(precision / trailing zeros / exponent style).
- Expected: match pandas str(float) formatting (or document a canonical format). polars declines this
with an honest NIE (it also can't match the pandas float repr), so cuDF is the engine that diverges silently here.
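A sketch of edge cases worth pinning in a regression test, assuming (as the "Expected" line suggests) that the pandas oracle's toString reduces to Python's str(float). The case list and `format_mismatches` are hypothetical; cuDF's actual output is not shown here, and the f"{x:.6f}" formatter in the test is only a stand-in for "some other formatter".

```python
from typing import Callable, List, Tuple

# Python's str(float): shortest round-trip digits, keeps a trailing ".0", and uses
# exponent form for nonzero magnitudes below 1e-4 or at least 1e16.
ORACLE_CASES: List[Tuple[float, str]] = [
    (1.0, "1.0"),                   # trailing ".0" is kept
    (0.1, "0.1"),                   # shortest repr, not 0.1000000000000000055...
    (1 / 3, "0.3333333333333333"),  # precision: 16 significant digits here
    (0.0001, "0.0001"),             # still fixed-point
    (2.5e-05, "2.5e-05"),           # switches to exponent form
    (1e15, "1000000000000000.0"),   # still fixed-point
    (1e16, "1e+16"),                # switches to exponent form
    (-0.0, "-0.0"),                 # sign of zero is preserved
]


def format_mismatches(fmt: Callable[[float], str]) -> List[Tuple[float, str, str]]:
    """Return (value, expected, got) for every case where fmt disagrees with str()."""
    return [(v, want, fmt(v)) for v, want in ORACLE_CASES if fmt(v) != want]
```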
3. cuDF multi-hop min_hops>1 labels the SEED node's hop wrong
For e_forward(min_hops=2, max_hops=3) etc., the SEED node appears with __gfql_output_node_hop__ =
max_hops under cuDF but None/NaN under pandas. (Secondary: num comes back int under cuDF vs float
under pandas for the same result.)
- Repro: [n({"id":[0]}), e_forward(min_hops=2, max_hops=3), n()] on a small attributed graph; compare the
seed row's __gfql_output_node_hop__. Found via the NA-hardened conformance signature.
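A sketch of an NA-hardened check on the seed row, assuming result rows are plain dicts with an "id" column. `HOP_COL` is the column name from the finding; `is_missing` and `seed_hop_problems` are hypothetical helpers. Treating both None and NaN as "missing" keeps the check independent of how each engine encodes nulls.

```python
import math
from typing import Any, Dict, List

HOP_COL = "__gfql_output_node_hop__"


def is_missing(value: Any) -> bool:
    """NA-hardened missing check: None and float NaN both count as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def seed_hop_problems(rows: List[Dict[str, Any]], seed_id: Any) -> List[str]:
    """Report seed rows whose hop label is set, where the pandas oracle leaves it missing."""
    problems = []
    for row in rows:
        if row.get("id") == seed_id and not is_missing(row.get(HOP_COL)):
            problems.append(f"seed {seed_id!r} has {HOP_COL}={row.get(HOP_COL)!r}")
    return problems
```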
4. cuDF group_by row-op raises "truth value of a Series is ambiguous"
call("group_by", {"keys":[k], "aggregations":[("c","count"),("s","sum",col)]}) on a row table that carries
EXTRA non-key/non-aggregated columns (e.g. float f + string name alongside grouped flag/num) raises:
GFQLTypeError: [invalid-node-reference] Error executing 'group_by': The truth value of a Series is ambiguous.
- Repro: a 5-col node frame (id/num/f/name/flag) plus the group_by above; pandas and polars both return
[flag, c, s], while cuDF raises. A minimal 3-col graph does NOT trigger it: the extra columns drive an
if <series>: path (should be .any()/.all()). Likely in the GFQL group_by handler, graphistry/compute/gfql/row/pipeline.py.
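A sketch of the bug class behind this error, using a hypothetical `SeriesLike` stand-in so it runs without pandas or cuDF. It is not the code in pipeline.py, and the function names are illustrative. pandas raises ValueError whenever a Series is used in a boolean context; the stand-in mimics that, and the fix is to reduce the Series explicitly.

```python
from typing import Iterable, List


class SeriesLike:
    """Hypothetical stand-in for a boolean pandas/cuDF Series (not a real library type)."""

    def __init__(self, values: Iterable[bool]) -> None:
        self.values: List[bool] = list(values)

    def __bool__(self) -> bool:
        # Mimics pandas: implicit truthiness of a Series is refused.
        raise ValueError("The truth value of a Series is ambiguous.")

    def any(self) -> bool:
        return any(self.values)

    def all(self) -> bool:
        return all(self.values)


def has_extra_columns_buggy(is_extra: SeriesLike) -> bool:
    if is_extra:  # bug: a Series in a boolean context raises ValueError
        return True
    return False


def has_extra_columns_fixed(is_extra: SeriesLike) -> bool:
    return is_extra.any()  # fix: explicit reduction to a single bool
```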
Findings 1–2 were known from earlier sessions; 3–4 were found on 2026-06-30. None blocks the polars engine PRs.
🤖 Generated with Claude Code