Profile recipes 2 #81

Open
RembrandtK wants to merge 15 commits into main from profile-recipes-2

Replaces (resolving review concerns of): #75

Recipe overlay for composable network configs

The stack used to be configured by a single committed .env, with ad-hoc export FOO=… dances when a flow needed a different shape (DIPs on, mock REO off, alternate image pins, extra compose profiles). That made it hard to support more than one profile cleanly or run different test scenarios effectively.

This PR introduces a recipe model. A recipe is a small JSON file in recipes/ that declares the env fragments under config/ it wants merged, plus optional inline env and compose-file overrides. scripts/resolve-recipe.sh merges those fragments into a generated .env. Docker Compose picks that file up automatically, so once a recipe is resolved, bare docker compose … and the existing just targets work without extra setup.

Fragments are split by concern (ports/pins, image tags, accounts, REO mode, IP overlay), so a new profile is composed by listing the fragments it needs rather than forking the whole env. Selection follows a clear precedence — $RECIPE → .recipe.local (gitignored dev override) → .recipe (committed per-branch default) → baseline — which lets a branch declare its intended shape, a developer override locally, and tests pin a recipe.
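
The selection precedence above can be sketched as follows. This is an illustrative Python rendering, not the logic of scripts/resolve-recipe.sh itself; it assumes that .recipe.local and .recipe each hold a single recipe name, and the function name `select_recipe` is hypothetical.

```python
import os
from pathlib import Path


def select_recipe(repo_root, environ=None):
    """Pick a recipe name: $RECIPE, then .recipe.local, then .recipe, then baseline."""
    environ = os.environ if environ is None else environ

    # 1. An explicit $RECIPE wins over everything else.
    name = environ.get("RECIPE", "").strip()
    if name:
        return name

    # 2. .recipe.local (gitignored dev override), then
    # 3. .recipe (committed per-branch default).
    # Assumption: each file contains just the recipe name.
    for filename in (".recipe.local", ".recipe"):
        path = Path(repo_root) / filename
        if path.is_file():
            name = path.read_text().strip()
            if name:
                return name

    # 4. Nothing selected anywhere: fall back to baseline.
    return "baseline"
```

With this ordering a test can pin its shape by exporting RECIPE, while a branch's committed .recipe only applies when neither $RECIPE nor .recipe.local is present.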

Two recipes included: baseline (GIP-0088 contracts, no IP services, mock REO on) and indexing-payments (baseline + the DIPs overlay: dipper, dips-fork indexer-rs, IP profile).

Implications for use

Day-to-day: just resolve [recipe] once, then the usual just up / docker compose flow. just recipes lists what's available; just recipe-active shows what's currently resolved.

Per-branch defaults live in committed .recipe; personal overrides go in .recipe.local (gitignored).

Testing: each test scenario can pin its recipe explicitly, so the stack shape under test is declared in-repo rather than carried in a shell session. Diverging shapes (mock vs real REO, IP on/off) stop fighting over a single .env.

The old committed .env is gone; first run needs just resolve before docker compose will find variables.

Switch core services from build-from-source Dockerfiles to pre-built
image tags pinned by ${SERVICE_VERSION} env vars:

- gateway: pulled from ghcr.io/edgeandnode/graph-gateway:${GATEWAY_VERSION}
- eligibility-oracle-node: pulled from ghcr.io/edgeandnode/eligibility-oracle-node
- subgraph-deploy: copies indexing-payments subgraph from its per-branch image
  built in graphprotocol/indexing-payments-subgraph

docker-compose.yaml: rename ${SERVICE_COMMIT} build args to ${SERVICE_VERSION};
add cross-stack network so per-test indexer compose projects can reach the
shared chain/ipfs services; add recipe-resolution sentinel so docker compose
halts with a clear pointer to "just resolve" if .env is missing.

shared/lib.sh: kafka_topic() helper that suffixes ${KAFKA_TOPIC_ENVIRONMENT}
when set, mirroring gateway's kafka_topic_environment config.

Extend the contracts service to deploy the GIP-0088 upgrade on top of the
horizon base: RewardsEligibilityOracleA + RewardsEligibilityOracleB,
IssuanceAllocator, and RecurringAgreementManager. RewardsManager picks
REO-A as its providerEligibilityOracle when REO_MOCK=0; when REO_MOCK=1
(default) MockRewardsEligibilityOracle is wired instead so tests bypass
the eligibility gate.

subgraph-deploy: add deploy_indexing_payments() that builds and deploys
the indexing-payments subgraph (sources copied from the per-branch image)
alongside graph-network and block-oracle. Reassigns the deployment so
dipper's chain_listener doesn't stall on an unassigned subgraph.

indexer-service-rs: emit a [dips] config block when INDEXING_PAYMENTS_ENABLED=1
so the dips-fork build of indexer-rs (pinned via INDEXER_SERVICE_RS_VERSION
in the indexing-payments overlay) recognises the DIPs schema. Also wire
[subgraphs.escrow] (schema-required in Horizon mode) and the [horizon]
block for hybrid V1/V2 receipt handling.

eligibility-oracle-node: align config with the new [[blockchain.contracts]]
and [[blockchain.chains]] sections; topic names go through kafka_topic so
KAFKA_TOPIC_ENVIRONMENT suffixing matches gateway/IISA.
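
The suffixing rule that kafka_topic applies can be sketched in Python as below. The real helper is a shell function in shared/lib.sh; the underscore separator and the example topic name are assumptions made for illustration, not taken from the gateway config.

```python
import os


def kafka_topic(base, environ=None, separator="_"):
    """Return the topic name, suffixed with KAFKA_TOPIC_ENVIRONMENT when it is set."""
    environ = os.environ if environ is None else environ
    suffix = environ.get("KAFKA_TOPIC_ENVIRONMENT", "")
    # Assumption: the separator is "_"; the actual helper may join differently.
    return f"{base}{separator}{suffix}" if suffix else base


# Hypothetical topic name, shown with and without the environment suffix.
print(kafka_topic("example_topic", {}))
print(kafka_topic("example_topic", {"KAFKA_TOPIC_ENVIRONMENT": "local"}))
```

Routing every topic name through one helper keeps producers and consumers in agreement on the suffix without each service re-implementing it.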

graph-tally-escrow-manager, tap-agent: track upstream argument shape.

graph-contracts/Dockerfile: drop the reintroduced data-edge clone so the
contracts image builds against the pinned commit.

tests/network_state: drop the obsolete assertion that referenced the old
graph-contracts surface.

Replace the ACCOUNT0/1/X402/RECEIVER scheme with role-named secrets that
say what they're for:

  ACCOUNT0_*  → DEPLOYER_*    (gateway payer, dipper signer)
  ACCOUNT1_*  → GOVERNOR_*    (RewardsManager governance, Controller setProxy)
  RECEIVER_*  → INDEXER_*     (indexer identity for staking/allocation)
  ACCOUNT_X402_ADDRESS dropped — the x402 gateway block isn't used locally

Containers and scripts that consumed the old names track the renames.
Test docs reference INDEXER_SECRET instead of RECEIVER_SECRET. The four
new REO role secrets (OPERATOR/ORACLE/PAUSE_ADMIN/SUBGRAPH_AVAILABILITY)
are added by the test-infra commit.

Replace the single committed .env with a recipe-driven resolver. A recipe
is a JSON file in recipes/ listing env fragments under config/ to merge,
in order, into a generated .env. docker compose picks up the generated
.env automatically, so bare compose commands work after `just resolve`.

Recipe selection (highest precedence wins): $RECIPE → .recipe.local →
.recipe (committed per-branch default) → "baseline".

Recipes shipped:
  - baseline:           base.env + services.env + accounts-role-named.env + mock-reo.env
  - indexing-payments:  baseline + indexing-payments.env  (overlays the DIPs
                         components: dipper, dips-fork indexer-rs, IP profile)

Config fragments by concern:
  - base.env                   port assignments, contract pins, mnemonics
  - services.env               image-tag pins (graph-node, gateway, dipper, etc.)
  - accounts-role-named.env    deterministic test secrets per role
  - mock-reo.env               REO_MOCK toggle (default 1)
  - indexing-payments.env      DIPs overlay: dipper image, dips-fork indexer-rs

justfile: `just resolve`, `just recipes`, `just recipe-active`; `just up`
takes an optional recipe arg; `just reset` force-removes per-test stacks
before volume cleanup so leftover container refs don't silently skip the
wipe.

scripts/resolve-recipe.sh: the resolver itself. Recipe → fragment list →
merged .env with provenance comments. Idempotent; fails fast if a referenced
fragment is missing.
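
A minimal Python sketch of the resolver's merge step, for illustration only. The actual resolver is a shell script; the JSON keys `fragments` and `env`, the `# from ...` comment format, and the function name `resolve_recipe` are assumptions, and compose-file overrides are left out.

```python
import json
from pathlib import Path


def resolve_recipe(recipe_path, config_dir):
    """Return .env text built from a recipe's fragments, merged in order."""
    recipe = json.loads(Path(recipe_path).read_text())
    fragments = recipe.get("fragments", [])  # hypothetical key name
    inline_env = recipe.get("env", {})       # hypothetical key name

    # Fail fast: refuse to emit anything if a referenced fragment is missing.
    missing = [name for name in fragments
               if not (Path(config_dir) / name).is_file()]
    if missing:
        raise FileNotFoundError("missing fragments: " + ", ".join(missing))

    merged = {}  # variable -> (value, source); later sources overwrite earlier
    for name in fragments:
        for raw in (Path(config_dir) / name).read_text().splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            merged[key.strip()] = (value.strip(), name)
    for key, value in inline_env.items():
        merged[key] = (str(value), "recipe inline env")

    # Provenance goes on its own comment line above each variable.
    lines = []
    for key, (value, source) in merged.items():
        lines.append(f"# from {source}")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

Because the output depends only on the recipe and its fragments, resolving the same recipe twice yields byte-identical text, which is what makes the step idempotent.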

graph-tally-aggregator/run.sh: post-rebase alignment with main's argument
shape.
…en tests #[ignore]

Replace the shared-indexer test pattern with a per-test IndexerHandle so
allocation tests don't race the indexer-agent auto-reconciler running in
the main stack. Each #[serial(test_indexer)] test gets its own compose
project (`local-network-test-<test_name>`) with a dedicated indexer-agent
+ indexer-service + start-indexing wired against the shared chain/ipfs/
graph-node via the cross-stack network.

Pieces:

  - compose/test-indexer.yaml: per-test compose project; reuses main-project
    images (no rebuild) and attaches to the cross-stack network.
  - tests/src/indexer.rs: IndexerHandle fixture — spins up the per-test
    stack, exposes its INDEXER_ADDRESS/SECRET/MNEMONIC, tears down on drop.
  - indexer-agent/run.sh: capture TEST_INDEXER_* before sourcing .env so
    per-test identity overrides the production-default INDEXER_* values.
    Also force-syncs protocol-infra subgraphs (indexing-payments,
    block-oracle) so the reconciler doesn't pause them.
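
The capture-then-source ordering in indexer-agent/run.sh can be sketched as below. This is a Python illustration of the precedence only; the exact TEST_INDEXER_* variable names and the function name `indexer_identity` are assumptions.

```python
def indexer_identity(process_env, dotenv):
    """Resolve INDEXER_* values, letting TEST_INDEXER_* overrides win over .env."""
    fields = ("ADDRESS", "SECRET", "MNEMONIC")

    # Capture per-test overrides first, before .env can clobber anything.
    # Assumption: overrides are named TEST_INDEXER_<FIELD>.
    overrides = {
        field: process_env[f"TEST_INDEXER_{field}"]
        for field in fields
        if f"TEST_INDEXER_{field}" in process_env
    }

    # "Sourcing" .env: its assignments replace what the process inherited.
    env = dict(process_env)
    env.update(dotenv)

    # Re-apply the captured overrides on top of the .env defaults.
    for field, value in overrides.items():
        env[f"INDEXER_{field}"] = value

    return {field: env.get(f"INDEXER_{field}") for field in fields}
```

Fields without a per-test override keep the value from .env, so a test only has to supply the identity pieces it wants to change.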

REO admin signing keys: four new role-named secrets (OPERATOR/ORACLE/
PAUSE_ADMIN/SUBGRAPH_AVAILABILITY) wired through TestNetwork. cast.rs
gains rm_provider_eligibility_oracle() / is_mock_reo_live() / oracle-
signed variants of cast_send for tests that exercise REO-A's renewal
mechanics.

MockRewardsEligibilityOracle wiring: TestNetwork.contracts.reo_mock
exposes the deployed mock address; is_mock_reo_live() reads the live
RewardsManager binding so REO_MOCK toggles are picked up at runtime.

Allocation/eligibility/reward/denial test suites migrate to IndexerHandle.
indexer_handle_smoke.rs covers the fixture itself.

Tests left ignored under default config:

  reo_governance.rs (3 tests): require the non-mock REO_MOCK=0 recipe;
    default recipes wire MockRewardsEligibilityOracle via mock-reo.env so
    most tests bypass the eligibility gate. Gated on a future REO-real recipe.

  provision_management.rs (1 test): provision_lifecycle failed on this
    branch but passed on main. Real test failure, not a config gap.
    Marked ignored pending triage.

reo_governance.rs::pause_blocks_writes also rewrites its assertions: the
new REO contract has no whenNotPaused guards, so writes succeed while
paused. The test verifies the new behaviour instead of the old.
…k reset

Three host-side helpers for working with stack state:

  scripts/dump-state.sh
    Captures container statuses/health/logs, chain state, indexer-agent
    management API state, per-test compose projects, and a SUMMARY.md
    into _dumps/<UTC-timestamp>/. Best-effort — missing pieces don't
    fail the dump. Useful for offline debugging.

  scripts/bake-snapshot.sh
    Snapshots all compose-declared named volumes as zstd tarballs plus
    a manifest.json (recipe, git SHA, image digests) into
    _snapshots/current/. Briefly stops services to capture consistent
    volume state. The recipe + .env are saved alongside so the same
    recipe restores cleanly.

  scripts/restore-snapshot.sh
    Wipes the named volumes, restores them from a snapshot, restores
    the recipe + .env, and brings the stack back up via the resolver.
    Anvil's chain time after restore is whatever it was at bake time; tests
    sensitive to wall-clock alignment can re-sync separately.

`just dump-state`, `just bake-snapshot`, `just restore-snapshot` wrap them.

Bump freshness_threshold from the upstream default of 10 to 10_000.

Default is easy to trip under heavy test load when many blocks get mined
between DataEdge submissions: block-oracle goes into a 2s-cooldown spiral
and the epoch subgraph lags the chain. Indexer-agent and tests then time
out waiting for epoch sync — particularly painful for per-test stacks
that share the host block-oracle.

10_000 is comfortably above anything a local test run produces between
submissions while staying well below the real-network threshold.

README rewritten to lead with the recipe flow (`just recipes`, `just up
[recipe]`, .recipe pinning) and the baseline/indexing-payments split,
since the recipe system is now the only way to bring the stack up.
Shell snippets use bash code fences.

Sweep the docs/ tree. Removed:

  - docs/indexing-payments/* — safe-based IP planning docs +
    RecurringCollectorDeployment notes; pre-date the in-tree dipper +
    indexing-payments subgraph wiring.
  - docs/eligibility-oracle/{Goal,Status}.md — captured the REO local-network
    integration work that's now in graph-contracts/run.sh + eligibility-oracle-node/.
  - docs/testing/reo/{Goal,Status,CurationSignal}.md, docs/testing/TestFramework.md —
    the testing-layers + Rust-vs-bash decision is implemented; live state is
    in the tests/ crate and tests/README.md.
  - docs/explorer/Goal.md — Graph Explorer integration is still outstanding
    work but not active; captured externally with the UI→contract reference
    table preserved for future test scripts.
  - docs/flows/IndexingPaymentsTesting.md — pre-chain-listener dipper flow.
  - docs/flows/EligibilityOracleTesting.md — uses the old
    `RewardsEligibilityOracle` name and pre-recipe COMPOSE_PROFILES editing.
    scripts/test-reo-eligibility.sh has the same staleness; both need a
    revalidation pass against the GIP-0088 REO-A/B contract surface,
    captured externally.

What stays:

  - docs/flows/IndexerAgentTesting.md — describes scripts/test-indexer-agent.sh,
    still useful for upstream-source indexer-agent development.
  - docs/README.md, docs/flows/README.md — trimmed to match the new shape.
…-4140439

Replace the clone-at-build graph-contracts image with a thin wrapper over
the published ghcr.io/graphprotocol/contracts workspace image, selected
via CONTRACTS_VERSION (sha-pinned for reproducibility, 'local' for a
locally-built workspace image). Restore upstream helper scripts pruned in
error.

Squash of: 3e834fd, 86d7899, 6d9a516
…e 7 tests

- preserve anvil historical state across periodic dumps (archive access
  for snapshot/restore and reward queries)
- rename per-test compose services to kill the cross-stack DNS collision
  that made parallel IndexerHandle stacks flaky
- serialize indexer_handle_smoke into the alloc nextest group
- fix close-then-create wait, retry transient close errors
- un-ignore the 7 alloc tests these fixes repair

Squash of: 651c3ce, 2d73ba2, c21d040, bd07e24, 6880753

Add the reo-live recipe (REO_MOCK=0, RewardsManager wired to REO-A) and
runtime-gate the RM-dependent REO tests on it. Make eligibility tests
robust: correct pause_blocks_writes signer, advance chain time before
renewals, poll for network-subgraph indexing instead of fixed waits,
tolerate chain-time jumps in eligibility_expires_after_period.
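
Polling instead of a fixed wait can be sketched as below. The tests themselves live in the Rust tests/ crate; this Python helper is illustrative only, and its name, parameters, and defaults are hypothetical.

```python
import time


def poll_until(check, timeout_s=60.0, interval_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns a truthy value or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        result = check()
        if result:
            # Return as soon as the condition holds, not after a fixed delay.
            return result
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        sleep(interval_s)
```

A fixed wait is either too short under load or wastes time when the subgraph catches up quickly; polling returns on the first successful check and only fails at the deadline.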

Squash of: 39293e7, db14a8e, f0d0654, 06ffac1, 7c833c1, 86bfaa8

Add the 'main' recipe as a pre-GIP-0088 surrogate (Phase 3 skipped) and
make test selection recipe-driven via nextest profiles instead of
self-skips. Per-test stack hygiene: compose down -v before per-test
bring-up; slow-timeout floor and fail-safe reo group membership;
un-ignore eligibility_lifecycle under reo-live serialization.

Squash of: a685794, 36f6016, e5a92e9, f40ff3f, 6c5720b, aec94e3

just test-set: resolve recipe → restore its baked snapshot → nextest with
the recipe's test_profile, so every run starts from identical clean
state. Baselines are keyed by a content fingerprint of the resolved
recipe env plus container/compose/script inputs (FRESH/STALE/MISSING
resolution). Capture stack state on unexpected test failure; restore
snapshots on the snapshot's own recipe; realign chain time on restore
past baked contract timestamps; assorted test robustness fixes
(chain-time skew, eligibility-period close race, just-created-allocation
pending assert).
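
One way to picture the fingerprint keying is the sketch below. The hash choice (SHA-256), the input layout, and the meaning assigned to FRESH/STALE/MISSING are assumptions for illustration; the commit only names the three states.

```python
import hashlib


def fingerprint(resolved_env, input_files):
    """Hash the resolved recipe env plus the named input files' contents."""
    digest = hashlib.sha256()
    digest.update(resolved_env.encode())
    # Sort by name so the result does not depend on dict ordering.
    for name in sorted(input_files):
        digest.update(b"\0" + name.encode() + b"\0")
        digest.update(input_files[name])
    return digest.hexdigest()


def baseline_status(current, baked):
    """Classify a baked baseline against the current fingerprint."""
    if baked is None:
        return "MISSING"  # assumed: no snapshot has been baked yet
    # Assumed: matching fingerprints mean the snapshot is reusable.
    return "FRESH" if baked == current else "STALE"
```

Keying on content rather than on the recipe name means a change to a compose file or script invalidates the baseline even when the recipe itself is unchanged.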
Agent accepts and collects RecurringCollector agreements on-chain off the
indexing-payments subgraph; dips_lifecycle E2E covers the flow under the
indexing-payments recipe.

Restructure recipes so the default represents what is actually deployed:

- baseline (default): mainnet-deployed contract image
  (sha-920db0e = contracts 4a1c24a8 + image packaging only), Phase 3
  skipped, profiles block-oracle+explorer. Absorbs and supersedes the
  'main' recipe, whose deferred plan this implements.
- gip-0088 (new, was baseline): post-audit image (sha-4140439) with the
  full GIP-0088 deployment, via the new config/gip-0088.env fragment.
- indexing-payments / reo-live: now layer gip-0088.env.
- nextest: default profile is the deployed-surface smoke set; the full
  set moves to profile.gip-0088; the GIP-0088 reclaim test is excluded
  from baseline (no reclaim functions on the deployed RewardsManager).
- baseline-surface fixes: skip the indexing-payments subgraph when no
  RecurringCollector is deployed; indexer-agent requires it only under
  INDEXING_PAYMENTS_ENABLED=1; guard address-book symlinks per package;
  idempotent governor fixup wiring subgraphAvailabilityOracle to the
  role-named account (the old protocol config hardcodes a stale ganache
  mnemonic address).

Validated from clean bring-ups: baseline 16/16, gip-0088 43/43,
indexing-payments 43/43, reo-live 3/3.

Squash of: 3ae1629, daa2920, 880d68a, 7eef9bc, 7686c40, 08e324e

Also folds main's documentation and dev-flow improvements over the
recipes rewrite: quick-start command set, the .env consumers note, the
compose/dev binary-mount section (with the gateway build-env dev-builder
stage), .env.local layering on the just compose invocations, and main's
query-gateway.sh → query-subgraph.sh rename.

RembrandtK mentioned this pull request Jun 12, 2026