Skip to content

BE-607: HashQL: Refactor ModuleRegistry into builder/immutable split with allocator-generic stdlib construction#8887

Open
indietyp wants to merge 12 commits into
bm/be-538-hashql-permission-system-integrationfrom
bm/be-607-hashql-remove-smallvec-from-the-standardlibrary-for-module
Open

BE-607: HashQL: Refactor ModuleRegistry into builder/immutable split with allocator-generic stdlib construction#8887
indietyp wants to merge 12 commits into
bm/be-538-hashql-permission-system-integrationfrom
bm/be-607-hashql-remove-smallvec-from-the-standardlibrary-for-module

Conversation

@indietyp

@indietyp indietyp commented Jun 19, 2026

Copy link
Copy Markdown
Member

🌟 What is the purpose of this PR?

Replaces the StandardLibrary type (12.9KB on the stack due to nested SmallVec) with a flat ModuleCache backed by MaybeUninit slots and a FiniteBitSet, and restructures ModuleRegistry construction around a builder pattern (PartialModuleRegistry). Reduces stdlib construction cost by ~20% in instructions and ~60% in L1D cache misses.

🔍 What does this change?

Module cache and construction:

  • Replaced nested SmallVec<(TypeId, ModuleDef)> in StandardLibrary with a flat ModuleCache using Box<[MaybeUninit<ModuleDef>]> and FiniteBitSet<CacheId, u64> for initialization tracking
  • Split StandardLibrary into StandardLibrary (coordinator), StandardLibraryContext (passed to define methods), and ModuleCache (shared across modules)
  • Added CacheId enum with one variant per stdlib module in preorder traversal order, replacing runtime TypeId linear scan with compile-time indices
  • ModuleDef tracks emitted_len for exact output capacity when building item slices
  • Direct match in build() instead of [Option<ItemKind>; 2].flatten() for item emission
  • Item slices allocated directly on the heap (bypassing InternSet) since module item slices are provably unique by ModuleId ownership

Registry restructuring:

  • Introduced PartialModuleRegistry as a mutable builder that collects modules during construction
  • ModuleRegistry is now immutable after construction, backed by a contiguous IdSlice instead of InternMap
  • Removed RwLock from the root namespace (mutable access during construction, shared reference after)
  • Added ModuleRegistry::builder() test helper and PartialModuleRegistry::insert_root_module() for tests that inject custom modules

Symbol and type cleanup:

  • Replaced all heap.intern_symbol("...") calls in stdlib modules with pre-interned sym:: constants
  • Replaced all raw string .opaque("::...") calls with sym::path:: nested constants
  • Interned::empty() for zero-generic decl! expansions and special form types
  • ModuleDef parameterized by allocator S for bump-scope 2.x support

Benchmark results (Apple Silicon, bump-scope 2.x):

Counter Before After Delta
Instructions 440.0K 351.2K -20.2%
L1D cache misses 1,213 487 -59.9%
Branch mispredictions 420 347 -17.4%
Walltime 23.49 us 19.57 us -16.7%

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

The changes in this PR:

  • are internal and do not require a docs change

🕸️ Does this require a change to the Turbo Graph?

The changes in this PR:

  • do not affect the execution graph

🐾 Next steps

Potential further optimizations identified but not implemented:

  • Cache primitive TypeIds (number, boolean, etc.) on TypeBuilder to avoid repeated intern round-trips
  • Hoist repeated monomorphic TypeDefs in modules like math and special_form
  • Pre-size hash maps for the finite stdlib construction
  • Batch generic declaration construction in the decl! macro

🛡 What tests cover this?

  • Existing hashql-core test suite (880 unit + 326 integration + 20 doc tests) all pass
  • The module::namespace and module::resolver tests exercise custom module construction through the new PartialModuleRegistry builder API

❓ How to test this?

  1. cargo test --package hashql-core
  2. Verify all 1226 tests pass

@vercel

vercel Bot commented Jun 19, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hash Error Error Jun 30, 2026 12:49pm
petrinaut Ready Ready Preview Jun 30, 2026 12:49pm
1 Skipped Deployment
Project Deployment Actions Updated (UTC)
hashdotdesign-tokens Ignored Ignored Preview Jun 30, 2026 12:49pm

@cursor

cursor Bot commented Jun 19, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Touches core module resolution and stdlib construction paths used on every compile, but the public ModuleRegistry::new flow and resolver behavior are intended to stay equivalent with broad test coverage.

Overview
Refactors how HashQL builds and stores the global module graph, mainly to cut stdlib construction cost and make the finished registry read-only.

Registry construction moves to PartialModuleRegistry (provision_moduleinsert_moduleregisterfinish), producing an immutable ModuleRegistry backed by a contiguous IdSlice instead of InternMap/InternSet. Root package lookup no longer uses an RwLock; module items are &'heap [Item] slices allocated at build time. Resolution and namespace code index modules with registry.modules[id] and iterate item slices directly; search_by_name tracks visited modules with DenseBitSet instead of a hash set.

Standard library registration replaces runtime TypeId scans and nested SmallVec module defs with a flat ModuleCache (MaybeUninit entries + FiniteBitSet), StandardLibraryContext, and per-module CacheId indices. ModuleDef is allocator-generic, tracks emitted_len for exact item buffers, and cross-module type references use cache.request instead of manifest. Stdlib sources switch to sym:: / sym::path:: constants and Interned::empty() where generics are absent.

API cleanup: Item::absolute_path is removed; diagnostics format paths via absolute_path_rev (see expander format_absolute_path). Tests inject modules through ModuleRegistry::builder and insert_root_module. Unrelated: darwin-kperf drops as5-1 from M5 CPU name matching.

Reviewed by Cursor Bugbot for commit 6696c2c. Bugbot is set up for automated code reviews on this repo. Configure here.

indietyp commented Jun 19, 2026

Copy link
Copy Markdown
Member Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@indietyp indietyp force-pushed the bm/be-607-hashql-remove-smallvec-from-the-standardlibrary-for-module branch from 00cf591 to 85a5c9a Compare June 23, 2026 09:41
@indietyp indietyp force-pushed the bm/be-538-hashql-permission-system-integration branch from 81f4f86 to 0f60303 Compare June 23, 2026 09:41

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 85a5c9a. Configure here.

Comment thread libs/darwin-kperf/events/src/lib.rs

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 1 comment.

Comment thread libs/@local/hashql/core/src/module/std_lib/mod.rs
@indietyp indietyp force-pushed the bm/be-607-hashql-remove-smallvec-from-the-standardlibrary-for-module branch from 85a5c9a to ceb05c9 Compare June 23, 2026 09:52
@indietyp indietyp force-pushed the bm/be-538-hashql-permission-system-integration branch from 0f60303 to 5a0812f Compare June 23, 2026 09:52
Copilot AI review requested due to automatic review settings June 23, 2026 10:02
@indietyp indietyp force-pushed the bm/be-607-hashql-remove-smallvec-from-the-standardlibrary-for-module branch from ceb05c9 to df8bde2 Compare June 23, 2026 10:02

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 1 comment.

Comment on lines 59 to 63
b"a15" => Some(Self::M2),
b"a16" | b"as1" | b"as2" | b"as3" => Some(Self::M3),
b"as4" | b"as4-1" | b"as4-2" => Some(Self::M4),
b"as5" | b"as5-1" | b"as5-2" => Some(Self::M5),
b"as5" | b"as5-2" => Some(Self::M5),
_ => None,
@indietyp indietyp force-pushed the bm/be-607-hashql-remove-smallvec-from-the-standardlibrary-for-module branch from df8bde2 to fccf37b Compare June 30, 2026 09:56
@indietyp indietyp force-pushed the bm/be-538-hashql-permission-system-integration branch from d74c496 to 267b920 Compare June 30, 2026 09:56
@indietyp indietyp force-pushed the bm/be-607-hashql-remove-smallvec-from-the-standardlibrary-for-module branch from fccf37b to b14202a Compare June 30, 2026 12:44
@indietyp indietyp force-pushed the bm/be-538-hashql-permission-system-integration branch from 267b920 to b2e4747 Compare June 30, 2026 12:44
Copilot AI review requested due to automatic review settings June 30, 2026 12:45
@indietyp indietyp force-pushed the bm/be-607-hashql-remove-smallvec-from-the-standardlibrary-for-module branch from b14202a to 6696c2c Compare June 30, 2026 12:45
@indietyp indietyp force-pushed the bm/be-538-hashql-permission-system-integration branch from b2e4747 to 1e921af Compare June 30, 2026 12:45

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 2 comments.

b"a16" | b"as1" | b"as2" | b"as3" => Some(Self::M3),
b"as4" | b"as4-1" | b"as4-2" => Some(Self::M4),
b"as5" | b"as5-1" | b"as5-2" => Some(Self::M5),
b"as5" | b"as5-2" => Some(Self::M5),
Comment on lines 322 to 324
/// This method requires allocation for the traversal state:
/// - A vector for the module exploration stack
/// - A hash set for tracking visited modules
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/libs Relates to first-party libraries/crates/packages (area) type/eng > backend Owned by the @backend team

Development

Successfully merging this pull request may close these issues.

2 participants