fix: include computed columns in this.* wildcard expansion#6045
fix: include computed columns in this.* wildcard expansion#6045prql-bot wants to merge 5 commits into
this.* wildcard expansion#6045Conversation
`this.*` resolved `this._self` through an input redirect to the first input's `_self`, enumerating only that input's columns and dropping computed columns. Keep a module's own `_self` when it has one, and flatten the per-input nested tuples so transforms like `sort` receive scalar columns. Closes #6044 Co-Authored-By: Claude <noreply@anthropic.com>
for more information, see https://pre-commit.ci
This is a Claude Code local-only config (bypassPermissions) that should not be tracked. Add it to .gitignore to prevent recurrence. Co-Authored-By: Claude <noreply@anthropic.com>
| let r = lookup_in(self, redirect.clone() + ident.clone()); | ||
| log::trace!("... result of redirect {redirect}: {r:?}"); | ||
| if !r.is_empty() { | ||
| if !r.is_empty() && !keep_direct_self { |
There was a problem hiding this comment.
keep_direct_self is constant through the whole for redirect in &self.redirects loop. If it is true, that has the effect that redirects would never be followed (since the purpose of this loop is to update the returned HashSet<Ident> with a found redirect).
It would be clearer (and incrementally more efficient) to change the PR such that if keep_direct_self is true, the function returns early.
There was a problem hiding this comment.
Good catch — you're right that keep_direct_self is loop-invariant, so the guard inside the loop never let it do anything useful. Refactored to return early before the redirect loop when it's true, and dropped the now-redundant per-iteration guard. Pushed in f9fc18d.
There was a problem hiding this comment.
At the risk of stretching this conversation too far, I wonder whether the more fundamental issue is in the application of redirects after the initial call to lookup_in().
My unclear understanding of redirects is the idea that a name like this is potentially a reference to the source relation, so that the following PRQL
from foo
select {x, y, z}
select {y}
select {y, this.z}
would still work, even though the second select has essentially dropped this.z as a column. It serves as a replacement for the more explicit this.foo.z syntax.
However, it appears that in the current implementation, if a redirect is followed and resolved, that will always shadow any name that is defined in the base module. I'm not sure this is the desired end goal.
Prior to #5875, the combination of a name present in the base module plus a successful redirect would result in an "ambiguous name" response. #5875 clarifies this but potentially in the wrong direction.
Could we drop the special-case handling of keep_direct_self and instead return early from the lookup() function if the first call to lookup_in() results in a hit? This seems like it would be a more general solution to the problem while also improving namespace lookup semantics.
There was a problem hiding this comment.
I should add that after some experimentation, the example PRQL above does not work the way I thought it did. The select {y, this.z} statement fails with an "Unknown name" error, and no other combination of references (this.foo.z, foo.z) work either.
There was a problem hiding this comment.
Good call — adopted in 7208350. Dropped keep_direct_self entirely; lookup() now returns early whenever the initial lookup_in() finds anything, and only falls back to redirects when there's no direct match. That makes "a direct hit wins over redirects" the general rule rather than a _self-specific patch, and the dead res.remove(ident) in the redirect loop goes away too.
This passes the full prqlc suite (lib + integration, including the #6044 cases) with no snapshot changes, so nothing in the existing tests was relying on a redirect shadowing a direct match.
On the redirect semantics you sketched — I checked the this.z-after-drop case and it doesn't currently resolve via redirect, with either approach:
from foo
select {x, y, z}
select {y}
select {y, this.z}
errors with Unknown name this.z`` both before and after this change (and on the PR's previous keep_direct_self commit). So restoring that "reference the dropped source column" behavior would be a separate, larger change to how redirects expose columns — this PR doesn't regress it, but it doesn't add it either. Happy to file a follow-up issue for that if it's behavior we want.
`keep_direct_self` is loop-invariant, so when it's true the redirect loop can never modify `res`. Return early instead of guarding each iteration, per review feedback from @kgutwin. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the narrow `keep_direct_self` special-case with a general rule: a direct hit in a module always wins over its redirects, so `lookup()` returns early when the initial `lookup_in()` finds anything. Redirects are only followed as a fallback when there is no direct match. This is the more general fix suggested in review for #6044 and passes the full prqlc test suite. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Problem
this.*reflected the columns of the original source relation rather than the current columns of the pipeline. Given a computed column, it silently dropped it, while barethiskept it:Root cause:
resolve_ident_wildcardlooks upthis._selfto find the module whose columns the wildcard should enumerate.Module::lookuplets a redirect into an input's_selfshadow the module's own_self(ares.remove/res.extendintroduced in #5875), sothis._selfresolved tothis.foo._self. The wildcard then enumerated only the first input's sub-module (a,b), never seeing the computed columnc, which lives at the top level of thethisframe.Solution
Module::lookup, keep a module's own_selfwhen it has a direct match, rather than letting an input redirect replace it._selfis a per-level marker, so redirecting it into an input is never what a wildcard wants.thisframe yields per-input columns grouped into nested, aliased tuples. A wildcard needs a flat list of column references (transforms likesortlower each key as a scalar and reject tuples), soresolve_ident_wildcardnow recursively flattens those tuples. Leaf columns carry their full path and inferred wildcards carry atarget_id, so the input association survives the flattening — qualifieda.*on a join still emitsa.*.As a bonus,
sort this.*now works with computed columns too (it previously errored on any nested frame).Testing
test_resolve_this_wildcard) and end-to-end SQL tests (test_this_wildcard_computed_column) coveringselect this.*andsort this.*with a computed column.prqlcsuite passes (563 tests), including the pre-existingtest_select_this_wildcard/test_sort_this_wildcardand joina.*cases.cargo clippyandcargo fmt --checkare clean.Closes #6044 — automated triage