feat(indexer): push result cap and ordering into KV block search (PLT-748)#3689
feat(indexer): push result cap and ordering into KV block search (PLT-748)#3689amir-deris wants to merge 5 commits into
Conversation
PR SummaryMedium Risk Overview KV block search gains a bounded fast path: for queries made only of equalities and Tx search now passes Reviewed by Cursor Bugbot for commit eaa58f8. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3689 +/- ##
==========================================
- Coverage 59.29% 58.36% -0.93%
==========================================
Files 2272 2185 -87
Lines 188193 178553 -9640
==========================================
- Hits 111580 104207 -7373
+ Misses 66560 65085 -1475
+ Partials 10053 9261 -792
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
A focused, well-documented refactor that pushes the result cap and ordering into the KV block indexer's scan path; the new fast/fallback paths faithfully reproduce the legacy match semantics and are well covered by tests. No correctness or security issues found — only minor non-blocking notes.
Findings: 0 blocking | 5 non-blocking | 1 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Second-opinion passes: Codex reported no material findings; the Cursor review file (cursor-review.md) is empty (no output produced). Noting per review instructions.
- Test coverage gap: TestBlockIndexerBounded covers equality drivers and a single lower-bound height range (desc), but not (a) a pure block.height-range driver in ascending order, (b) a dual-bounded range (
block.height >= x AND block.height <= y), or (c) context-cancellation returning partial results in the bounded path. The logic appears correct for all three; adding cases would lock the behavior in. - Minor perf: the pure block.height-range driver scans from the height extremum instead of seeking to the range boundary, so a bounded query like
block.height <= Kon a tall chain still iterates every height above K (cheap range-reject, no point-probe). This matches the old O(total) cost, so it's not a regression — a future optimization could seek the iterator to the bound. - The
seendedup map in searchBounded is effectively defensive: block event keys are unique per (compositeKey, eventValue, height) because the type suffix is always "finalize_block", so a height cannot repeat within a single driver scan today. Fine to keep for robustness; just noting it is not exercised. - 1 suggestion(s)/nit(s) flagged inline on specific lines.
| // and ordering down to the scan. | ||
| results, err := sink.SearchTxEvents(ctx, q, indexer.SearchOptions{ | ||
| Limit: env.Config.MaxTxSearchResults, | ||
| OrderDesc: req.OrderBy != "asc", |
There was a problem hiding this comment.
[nit] Nit: OrderDesc: req.OrderBy != "asc" maps any invalid order_by (e.g. "foo") to desc. Harmless today because the tx indexer ignores opts entirely and the switch below (lines 83-100) still validates order_by and returns ErrInvalidRequest. Worth keeping the TODO(PLT-748) note in mind: once the tx scan path honors these opts, order_by should be validated before it's turned into OrderDesc so a bad value can't silently drive the scan.
There was a problem hiding this comment.
This PR pushes the result cap and ordering down into the KV block indexer via a new bounded/point-probe fast path, with a materialize-then-cap fallback. The change is correct, consistent across all interface implementations and callers, and well tested; only minor non-blocking observations.
Findings: 0 blocking | 5 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Cursor's second-opinion review (cursor-review.md) is empty — that pass produced no output. Codex reported no material findings but noted it could not run tests (no Go 1.25.6 / no network in its sandbox), so its pass was static-only.
- Performance (acknowledged as future work via TODO): the height-range driver scans the entire
block.heightprimary-key prefix and filters candidates withheightInRangerather than seeking to the range boundary. For an ascending query with a high lower bound (e.g.block.height >= 1000000on a long chain) this over-scans all non-matching lower heights before reaching the range. Descending queries are unaffected. Not a regression vs. the prior full-materialization behavior. - Test coverage suggestion: the bounded fast path is exercised, but there is no test asserting equivalence between the fast path and the fallback for the same query, nor a case where the driver equality matches only a subset of heights combined with an ascending limit. Consider adding one to guard against future divergence.
- 2 suggestion(s)/nit(s) flagged inline on specific lines.
| driverEquality *syntax.Condition | ||
| // driverHeightRange, when non-nil, drives the scan off the primary | ||
| // block.height key range. | ||
| driverHeightRange *indexer.QueryRange |
There was a problem hiding this comment.
[nit] driverHeightRange is assigned in planBounded (line 422) but never read: searchBounded decides the driver purely via plan.driverEquality != nil, and candidate filtering uses plan.heightRanges. This field is effectively dead — consider removing it (and its assignment) to avoid implying it drives behavior, or actually consuming it in searchBounded for clarity.
| prefix, err = orderedcode.Append(nil, plan.driverEquality.Tag, plan.driverEquality.Arg.Value()) | ||
| } else { | ||
| // Drive off the primary block.height key range. | ||
| prefix, err = orderedcode.Append(nil, types.BlockHeightKey) |
There was a problem hiding this comment.
[nit] nit: when driving off the primary block.height key range, the scan starts from the full-prefix boundary and relies on candidateMatches/heightInRange to discard out-of-range heights. For an ascending scan with a high lower bound this iterates many non-matching low heights before reaching the range. A future optimization could seek to the range's lower/upper bound; the TODO/doc note covers this, so non-blocking.
| // driverEquality, when non-nil, is the equality condition whose event-key | ||
| // prefix is scanned in height order. Exactly one of driverEquality / | ||
| // driverHeightRange is set. | ||
| driverEquality *syntax.Condition | ||
| // driverHeightRange, when non-nil, drives the scan off the primary | ||
| // block.height key range. | ||
| driverHeightRange *indexer.QueryRange | ||
| // equalityProbes are the remaining equality conditions, tested per | ||
| // candidate height with a point lookup. | ||
| equalityProbes []syntax.Condition | ||
| // heightRanges are block.height bounds evaluated directly from a candidate | ||
| // height. When driving off the height range it is included here too so it |
There was a problem hiding this comment.
🟡 The driverHeightRange *indexer.QueryRange field on boundedPlan (kv.go:369) is set by planBounded at line 422 but never read anywhere; searchBounded gates only on plan.driverEquality != nil and otherwise builds the prefix from types.BlockHeightKey directly. Behavior is correct — height-range enforcement flows through plan.heightRanges + candidateMatches/heightInRange — so this is just dead code + a slightly overstated struct comment. Suggest dropping the field (and the "drives the scan off the primary block.height key range" comment) or actually using its LowerBoundValue()/UpperBoundValue() in searchBounded to narrow the iterator bounds for narrow ranges.
Extended reasoning...
What
The fast-path plan struct declares a driverHeightRange field:
type boundedPlan struct {
driverEquality *syntax.Condition
// driverHeightRange, when non-nil, drives the scan off the primary
// block.height key range.
driverHeightRange *indexer.QueryRange
equalityProbes []syntax.Condition
heightRanges []indexer.QueryRange
}planBounded writes it in exactly one place (kv.go:422):
case len(plan.heightRanges) > 0:
qr := plan.heightRanges[0]
plan.driverHeightRange = &qrA grep across the repo shows zero readers of the field — no plan.driverHeightRange anywhere except that assignment and its declaration.
Why it does not affect correctness
searchBounded at kv.go:279-284 decides how to build the driver prefix by checking plan.driverEquality:
if plan.driverEquality != nil {
prefix, err = orderedcode.Append(nil, plan.driverEquality.Tag, plan.driverEquality.Arg.Value())
} else {
// Drive off the primary block.height key range.
prefix, err = orderedcode.Append(nil, types.BlockHeightKey)
}The else branch does not consult driverHeightRange — it just builds a prefix over the full block.height keyspace. Range enforcement is done elsewhere: heightRanges (the slice, not the driver pointer) is walked in candidateMatches, and heightInRange filters each candidate. The field is redundant with the fact that driverEquality == nil already implies the height-range driver was chosen.
Step-by-step proof (query: block.height >= 6, desc, limit 3)
planBoundedsees one range onblock.height, no equalities.- It falls into the
case len(plan.heightRanges) > 0branch and assignsplan.driverHeightRange = &qr. searchBoundedis called.plan.driverEqualityis nil, so the else branch runs: prefix =orderedcode.Append(nil, types.BlockHeightKey). ThedriverHeightRangefield is never touched.- The iterator scans all indexed
block.heightprimary keys in descending order. - For each candidate height,
candidateMatchescallsheightInRange(h, plan.heightRanges[i]), which correctly rejectsh < 6. - After collecting 3 matches, the loop exits.
Result: correct ([10, 9, 8]), but the field played no role. It could be removed with no observable behavior change.
Missed optimization
The struct comment claims the field "drives the scan off the primary block.height key range", which implies the range bounds are used to narrow the iterator. They are not. A query like block.height BETWEEN 100 AND 200 on a chain with millions of blocks still creates an iterator over the entire block.height prefix and filters via heightInRange. If searchBounded consulted driverHeightRange to build a bounded iterator with LowerBoundValue()/UpperBoundValue(), that would be a real perf win — but that is a follow-up, not a regression from this PR.
Fix
Either drop the field and update the accompanying comment (simplest, immediately reduces cognitive load) or wire it into searchBounded to narrow the iterator range. Both are acceptable; the former is safer for this PR since the latter needs its own tests. Not merge-blocking — the code works as intended.
There was a problem hiding this comment.
Well-structured, well-tested change that threads SearchOptions{Limit, OrderDesc} through the indexer interfaces and adds a bounded/ordered fast path to the KV block indexer; correctness and ordering look sound. No blockers — only a performance-completeness gap (the pure height-range driver isn't truly bounded for far-side bounds) and a minor dead field.
Findings: 0 blocking | 4 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Codex's one finding is valid and non-blocking: the pure
block.height-range fast path (searchBoundeddriving offtypes.BlockHeightKey) always starts at the full height prefix, soblock.height <= 100withorder_by=desc(orblock.height >= <large>withorder_by=asc) still traverses the entire opposite tail before it starts collecting matches. This is no worse than the pre-PR full-materialization behavior and is fine for the common near-side/most-recent case, but the PR's stated goal of bounding the scan isn't achieved here. Consider seeking the iterator to the range bound (the reverse iterator could start at the upper bound; the forward iterator at the lower bound). Cursor's review file (cursor-review.md) andREVIEW_GUIDELINES.mdwere both empty, so no additional input from those. - TxSearch now passes SearchOptions through but the tx KV indexer ignores them (TODO(PLT-748)); behavior is intentionally unchanged for tx and clearly marked — no action needed, just confirming the double-cap (indexer + RPC safety net) is intentional.
- 2 suggestion(s)/nit(s) flagged inline on specific lines.
| prefix, err = orderedcode.Append(nil, plan.driverEquality.Tag, plan.driverEquality.Arg.Value()) | ||
| } else { | ||
| // Drive off the primary block.height key range. | ||
| prefix, err = orderedcode.Append(nil, types.BlockHeightKey) |
There was a problem hiding this comment.
[suggestion] The pure height-range driver always starts iterating from the full block.height prefix, so it isn't actually bounded when the bound is on the far side of the scan direction (e.g. block.height <= 100 with order_by=desc, or block.height >= <large> with order_by=asc) — it traverses every height above/below the bound before collecting any result. You already have the bound in plan.heightRanges/plan.driverHeightRange; seeking the iterator to that bound (reverse iterator starting at the upper bound, forward at the lower bound) would make this path genuinely bounded. Not a regression vs. the prior behavior, but worth addressing to meet the PR's stated goal.
| // heightRanges has exactly one element here, which also filters | ||
| // candidates during the scan. | ||
| qr := plan.heightRanges[0] | ||
| plan.driverHeightRange = &qr |
There was a problem hiding this comment.
[nit] plan.driverHeightRange is assigned here but never read anywhere — searchBounded only checks plan.driverEquality != nil and otherwise drives off types.BlockHeightKey, and the range filtering happens via plan.heightRanges in candidateMatches. Either drop this field or use it (e.g. to seek the iterator to the bound as noted above).
| func (idx *BlockerIndexer) Search(ctx context.Context, q *query.Query, opts indexer.SearchOptions) ([]int64, error) { | ||
| results := make([]int64, 0) | ||
| select { | ||
| case <-ctx.Done(): | ||
| if ctx.Err() != nil { | ||
| return results, nil | ||
|
|
||
| default: | ||
| } | ||
|
|
||
| conditions := q.Syntax() |
There was a problem hiding this comment.
🟣 [pre-existing] Search short-circuits on any block.height = X equality via lookForHeight and returns []int64{H} after only checking idx.Has(H) — any additional AND-conditions in the query are silently ignored. For example, block.height = 5 AND app.name = 'sei' returns [5] if height 5 is indexed even when app.name = 'sei' was never emitted on that block, producing a false positive from BlockSearch. This is inherited from upstream Tendermint (present pre-PR at kv.go:91-103) and the PR does not modify the short-circuit branch, but the enclosing Search function is substantially reorganized here — flagging for the author to decide whether to fix in this PR or as a follow-up (a proper fix likely needs to treat block.height = X as a probe/filter rather than a driver, since naive removal breaks the standalone height case).
Extended reasoning...
What
In BlockerIndexer.Search at sei-tendermint/internal/state/indexer/block/kv/kv.go:87-101:
height, ok := lookForHeight(conditions)
if ok {
ok, err := idx.Has(height)
if err != nil {
return nil, err
}
if ok {
return []int64{height}, nil
}
return results, nil
}lookForHeight (util.go:138) iterates conditions and returns (H, true) on the first block.height = H (TEq) it finds — it does not require the query to consist of only that condition. Search then confirms the height is indexed via idx.Has(H) and returns []int64{H}. Has only checks the primary block.height key store; it does not evaluate any of the other conditions in the query.
Why existing code does not prevent it
The short-circuit runs before both the fast-path planBounded/searchBounded and the fallback intersect/collectBounded, so the other conditions are never applied by the indexer. On the RPC side, BlockSearch in sei-tendermint/internal/rpc/core/blocks.go does not re-verify conditions against the returned heights — it just paginates and returns them.
The function comment even documents this behavior: "In the case of height queries, i.e. block.height=H, if the height is indexed, that height alone will be returned." That wording made sense when height-equality queries were expected to appear alone, but nothing today enforces that.
Step-by-step proof
Using the existing TestBlockIndexer fixture (heights 1..11 all carry finalize_event1.proposer=FCAA001; only even heights carry finalize_event2.foo):
- Query:
block.height = 5 AND finalize_event2.foo <= 5. Height 5 has no indexedfooevent. q.Syntax()yields[{block.height, TEq, 5}, {finalize_event2.foo, TLeq, 5}].lookForHeightwalks the slice, hits condition 0 withTag=block.height, Op=TEq, returns(5, true)— condition 1 is never inspected.idx.Has(5)reads the primary keyorderedcode(BlockHeightKey, 5)from the store — it exists (height 5 was indexed), so returns true.Searchreturns[]int64{5}.
Expected under AND semantics: []int64{}, because height 5 does not satisfy finalize_event2.foo <= 5. The caller receives a false positive.
Impact
BlockSearch RPC callers using compound block.height = X AND ... queries get incorrect results — they see block 5 when the events they filtered on are not actually present. Callers that trust the RPC response (block explorers, monitoring, most indexers) will silently mis-attribute events to the wrong block or count blocks that should have been filtered out.
Why pre-existing
The short-circuit at kv.go:91-103 is preserved verbatim from b8776ed (pre-PR). This PR restructures the surrounding Search function — extracting intersect, adding planBounded/searchBounded/collectBounded — but does not touch the lookForHeight early-return itself. The bug is inherited from upstream Tendermint. All four verifiers converged on pre_existing.
How to fix
A naive fix (deleting the short-circuit) would break the standalone block.height = X case: block.height is stored in the primary index under orderedcode(BlockHeightKey, int64(H)), while planBounded would build the driver prefix from c.Arg.Value() (the string "5"), producing a mismatched key layout. A proper fix likely treats block.height = X as an equalityProbe/filter over the height range rather than a driver, or short-circuits only when conditions has length 1. This is real behavior work on a widely-used RPC and probably belongs in a follow-up rather than blocking this scan-bounding PR.
There was a problem hiding this comment.
The PR cleanly threads a new SearchOptions{Limit, OrderDesc} through the indexer/sink/RPC stack and adds a well-tested bounded fast path for the KV block indexer; correctness looks sound. The main concern is a scan-efficiency gap in the height-range driver (raised by Codex) plus minor cleanups.
Findings: 0 blocking | 7 non-blocking | 2 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Second-opinion inputs:
cursor-review.mdandREVIEW_GUIDELINES.mdare both empty (no Cursor findings and no repo-specific guidelines to apply). Codex produced a single finding, incorporated below. - No prompt-injection or suspicious instructions were found in the PR title, description, or diff.
- TxSearch (tx.go) now passes SearchOptions but the tx KV indexer ignores it and the cap is still applied after the full sort — this is explicitly acknowledged with TODO(PLT-748) and behavior is unchanged, so it is fine as a follow-up.
- The fast path correctly relies on
lookForHeightshort-circuiting anyblock.height = Hequality beforeplanBoundedruns, so a driver equality is never a block.height point key (which is stored int64-encoded, not string-encoded). Worth a brief test asserting a mixed query likeblock.height = H AND app.name = 'x'still routes as expected, since that behavior is load-bearing for the new path. - Consider a targeted benchmark comparing the new bounded path vs. the old materialize-and-sort path for a broad
app.name = 'x'query to document the win claimed in the PR description. - 2 suggestion(s)/nit(s) flagged inline on specific lines.
| prefix, err = orderedcode.Append(nil, plan.driverEquality.Tag, plan.driverEquality.Arg.Value()) | ||
| } else { | ||
| // Drive off the primary block.height key range. | ||
| prefix, err = orderedcode.Append(nil, types.BlockHeightKey) |
There was a problem hiding this comment.
[suggestion] The height-range driver seeds the iterator from the full block.height prefix and never seeks to the actual range bounds — the bounds are only enforced afterward via heightInRange in candidateMatches. For a one-sided range whose retained end is far from the scan start (e.g. block.height < 1000 with order_by=desc on a tall chain), the scan walks from the tip down through millions of out-of-range heights before it reaches the matching region. The same applies to an equality driver combined with a restrictive height range in the opposite direction. This is a scan-work regression versus the old matchRange, which seeded the iterator at the range bound; memory stays bounded but the scan does not. Consider narrowing the driver prefix/iterator domain using the range bound for the scan direction (upper bound for desc, lower bound for asc).
| driverEquality *syntax.Condition | ||
| // driverHeightRange, when non-nil, drives the scan off the primary | ||
| // block.height key range. | ||
| driverHeightRange *indexer.QueryRange |
There was a problem hiding this comment.
[nit] nit: driverHeightRange is assigned in planBounded but never read — searchBounded only branches on driverEquality != nil and the range filtering goes through plan.heightRanges. Either drop the field or use it in searchBounded (it would be the natural place to compute the range-bounded start key mentioned above).
Summary
BlockSearch/TxSearchpreviously fetched the entire match set from the KVindexer, sorted it, and only then applied
MaxTxSearchResultsat the RPC layer.For broad queries this materializes and sorts far more heights than the caller
will ever see. This PR pushes the result cap and ordering down into the
indexer so a broad block query is bounded at the scan path.
What changed
indexer.SearchOptions{Limit, OrderDesc}threaded through theTxIndexer/BlockIndexerinterfaces, theEventSinkinterface, the KVsink, and the null/mock implementations.
block/kv) — bounded search:planBounded/searchBounded): when every condition is anequality (point-probeable) or a
block.heightrange (evaluable from thecandidate height), drive a single height-ordered scan in
order_byorder,point-probe the remaining conditions per candidate, and stop at
Limit.Memory is bounded by results kept, not by total match cardinality.
intersect+collectBounded): queries withCONTAINS/MATCHES/EXISTSor non-height ranges can't be point-probed, sothe intersection is materialized as before, then ordered and capped.
BlockSearch: validatesorder_byup front, passes it and the capinto the indexer, and keeps the post-sort cap as a safety net for sinks that
ignore the limit.
TxSearch: passesSearchOptionsthrough now, but the tx scan path isnot yet bounded — marked with
TODO(PLT-748); behavior is unchanged for tx.Tests
TestBlockIndexerBoundedcovers the fast path (equality driver, multi-equalityprobe, equality + height range, pure height-range driver) and the fallback
path (
CONTAINS), across asc/desc and bounded/unbounded limits.SearchOptionsargument.Follow-up
TODO(PLT-748)intx.go).PLT-748