Skip to content

fix: include computed columns in this.* wildcard expansion#6045

Open
prql-bot wants to merge 5 commits into
mainfrom
fix/issue-6044
Open

fix: include computed columns in this.* wildcard expansion#6045
prql-bot wants to merge 5 commits into
mainfrom
fix/issue-6044

Conversation

@prql-bot

Copy link
Copy Markdown
Collaborator

Problem

this.* reflected the columns of the original source relation rather than the current columns of the pipeline. Given a computed column, it silently dropped it, while bare this kept it:

from foo
select { a, b, c = a + b }
select this.*   # -> SELECT a, b   (dropped `c`)
select this     # -> SELECT a, b, a + b AS c

Root cause: resolve_ident_wildcard looks up this._self to find the module whose columns the wildcard should enumerate. Module::lookup lets a redirect into an input's _self shadow the module's own _self (a res.remove/res.extend introduced in #5875), so this._self resolved to this.foo._self. The wildcard then enumerated only the first input's sub-module (a, b), never seeing the computed column c, which lives at the top level of the this frame.

Solution

  • In Module::lookup, keep a module's own _self when it has a direct match, rather than letting an input redirect replace it. _self is a per-level marker, so redirecting it into an input is never what a wildcard wants.
  • Enumerating the whole this frame yields per-input columns grouped into nested, aliased tuples. A wildcard needs a flat list of column references (transforms like sort lower each key as a scalar and reject tuples), so resolve_ident_wildcard now recursively flattens those tuples. Leaf columns carry their full path and inferred wildcards carry a target_id, so the input association survives the flattening — qualified a.* on a join still emits a.*.

As a bonus, sort this.* now works with computed columns too (it previously errored on any nested frame).

Testing

  • Added a resolver-level snapshot (test_resolve_this_wildcard) and end-to-end SQL tests (test_this_wildcard_computed_column) covering select this.* and sort this.* with a computed column.
  • Full prqlc suite passes (563 tests), including the pre-existing test_select_this_wildcard / test_sort_this_wildcard and join a.* cases. cargo clippy and cargo fmt --check are clean.

Closes #6044 — automated triage

`this.*` resolved `this._self` through an input redirect to the first
input's `_self`, enumerating only that input's columns and dropping
computed columns. Keep a module's own `_self` when it has one, and flatten
the per-input nested tuples so transforms like `sort` receive scalar
columns.

Closes #6044

Co-Authored-By: Claude <noreply@anthropic.com>
pre-commit-ci Bot and others added 2 commits June 26, 2026 18:00
This is a Claude Code local-only config (bypassPermissions) that should
not be tracked. Add it to .gitignore to prevent recurrence.

Co-Authored-By: Claude <noreply@anthropic.com>
Comment thread prqlc/prqlc/src/semantic/module.rs Outdated
let r = lookup_in(self, redirect.clone() + ident.clone());
log::trace!("... result of redirect {redirect}: {r:?}");
if !r.is_empty() {
if !r.is_empty() && !keep_direct_self {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep_direct_self is constant through the whole for redirect in &self.redirects loop. If it is true, that has the effect that redirects would never be followed (since the purpose of this loop is to update the returned HashSet<Ident> with a found redirect).

It would be clearer (and incrementally more efficient) to change the PR such that if keep_direct_self is true, the function returns early.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — you're right that keep_direct_self is loop-invariant, so the guard inside the loop never let it do anything useful. Refactored to return early before the redirect loop when it's true, and dropped the now-redundant per-iteration guard. Pushed in f9fc18d.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the risk of stretching this conversation too far, I wonder whether the more fundamental issue is in the application of redirects after the initial call to lookup_in().

My unclear understanding of redirects is the idea that a name like this is potentially a reference to the source relation, so that the following PRQL

from foo
select {x, y, z}
select {y}
select {y, this.z}

would still work, even though the second select has essentially dropped this.z as a column. It serves as a replacement for the more explicit this.foo.z syntax.

However, it appears that in the current implementation, if a redirect is followed and resolved, that will always shadow any name that is defined in the base module. I'm not sure this is the desired end goal.

Prior to #5875, the combination of a name present in the base module plus a successful redirect would result in an "ambiguous name" response. #5875 clarifies this but potentially in the wrong direction.

Could we drop the special-case handling of keep_direct_self and instead return early from the lookup() function if the first call to lookup_in() results in a hit? This seems like it would be a more general solution to the problem while also improving namespace lookup semantics.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add that after some experimentation, the example PRQL above does not work the way I thought it did. The select {y, this.z} statement fails with an "Unknown name" error, and no other combination of references (this.foo.z, foo.z) work either.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — adopted in 7208350. Dropped keep_direct_self entirely; lookup() now returns early whenever the initial lookup_in() finds anything, and only falls back to redirects when there's no direct match. That makes "a direct hit wins over redirects" the general rule rather than a _self-specific patch, and the dead res.remove(ident) in the redirect loop goes away too.

This passes the full prqlc suite (lib + integration, including the #6044 cases) with no snapshot changes, so nothing in the existing tests was relying on a redirect shadowing a direct match.

On the redirect semantics you sketched — I checked the this.z-after-drop case and it doesn't currently resolve via redirect, with either approach:

from foo
select {x, y, z}
select {y}
select {y, this.z}

errors with Unknown name this.z`` both before and after this change (and on the PR's previous keep_direct_self commit). So restoring that "reference the dropped source column" behavior would be a separate, larger change to how redirects expose columns — this PR doesn't regress it, but it doesn't add it either. Happy to file a follow-up issue for that if it's behavior we want.

prql-bot and others added 2 commits June 26, 2026 18:41
`keep_direct_self` is loop-invariant, so when it's true the redirect
loop can never modify `res`. Return early instead of guarding each
iteration, per review feedback from @kgutwin.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the narrow `keep_direct_self` special-case with a general rule:
a direct hit in a module always wins over its redirects, so `lookup()`
returns early when the initial `lookup_in()` finds anything. Redirects
are only followed as a fallback when there is no direct match.

This is the more general fix suggested in review for #6044 and passes the
full prqlc test suite.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inconsistency between this and this.*

2 participants