
feat(agent): sellable smoke-test agent — read-only probes, GitHub reports, on-chain verdicts#633

Draft
bussyjd wants to merge 1 commit into main from feat/sell-smoke-test-agent


@bussyjd commented Jun 12, 2026

Contributor

What

A sellable agent service that smoke-tests another Obol Stack's public surface and leaves a verifiable trail: report committed to a public GitHub repo, verdict recorded as an ERC-8004 ValidationRegistry response.

Three pieces:

  1. Embedded smoke-test skill (internal/embed/skills/smoke-test/):
    • scripts/smoke.py — strictly read-only probes against a target base URL: /skill.md (200 + non-empty), /api/services.json (200 + valid catalog shape), per advertised service a 402-shape check (valid x402 accepts[]: scheme/network/payTo/asset/amount), /.well-known/agent-registration.json (informational). Never sends X-PAYMENT, never signs, bodies capped at 1 MiB. Emits report.md + machine-readable results.json (score 0–100, sha256 of the report bytes).
    • scripts/gh_post.py: commits the report to a seller-owned public repo via the GitHub contents API. The token is read only from the environment and is never logged, a no-redirect handler prevents the Bearer header from ever following a redirect to another host, each run makes at most 2 writes, and Retry-After backoff is bounded.
  2. obol smoke calldata — derives the ValidationRegistry validationResponse calldata for the run (requestHash = keccak256("obol/smoke-test/v1|<target>|<runId>"), golden-tested; selector 0x3d659a96 pinned). The operator submits with their own wallet — the agent and controller never sign chain transactions. This PR carries the additive internal/erc8004/validation.go calldata builders it needs.
  3. Provisioning: stock machinery — obol agent new <name> --skills smoke-test, then obol sell agent <name> to gate it behind x402. GitHub credentials ride the existing optional hermes-env Secret (already whitelisted by the admission policy and RBAC) — this PR adds zero render/RBAC/admission changes.
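
The per-service 402-shape check in piece 1 can be sketched as follows. This is a minimal illustration, not the code in scripts/smoke.py: the function name `check_accepts` is hypothetical, and it assumes each of the five fields named above must be a non-empty string, which the description does not state.

```python
# Field names come from the PR description; everything else is illustrative.
REQUIRED_FIELDS = ("scheme", "network", "payTo", "asset", "amount")


def check_accepts(body):
    """Return a list of problems; an empty list means the shape check passed."""
    if not isinstance(body, dict):
        return ["body is not a JSON object"]
    accepts = body.get("accepts")
    if not isinstance(accepts, list) or not accepts:
        return ["accepts[] missing or empty"]
    problems = []
    for i, entry in enumerate(accepts):
        if not isinstance(entry, dict):
            problems.append(f"accepts[{i}] is not an object")
            continue
        for field in REQUIRED_FIELDS:
            value = entry.get(field)
            # Assumption: every required field is a non-empty string.
            if not isinstance(value, str) or not value:
                problems.append(f"accepts[{i}].{field} missing or empty")
    return problems
```

The check only inspects a parsed 402 response body, so it stays consistent with the read-only constraint: nothing is signed and no payment header is built.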

Why

Buyers paying for a test run shouldn't have to trust the agent's word. The trail is tamper-evident at three layers: the report's sha256 is in results.json, the same bytes are committed to a public repo (independently timestamped), and the same hash lands on-chain in the validation response. Either side rewriting history becomes detectable.
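
A buyer-side check of the first layer might look like the sketch below: recompute the sha256 of the report bytes and compare it with the value recorded in results.json. The key name `report_sha256` and the function name `report_matches` are assumptions for illustration; the actual results.json schema is not shown in this PR description.

```python
import hashlib
import json


def report_matches(report_bytes, results_json, key="report_sha256"):
    """True if results.json records the sha256 of exactly these report bytes."""
    data = json.loads(results_json)
    if not isinstance(data, dict):
        return False
    recorded = data.get(key)  # hypothetical key name
    if not isinstance(recorded, str):
        return False
    actual = hashlib.sha256(report_bytes).hexdigest()
    # Tolerate an optional 0x prefix and upper-case hex digits.
    return recorded.lower().removeprefix("0x") == actual
```

The same digest can then be compared against the bytes fetched from the public repo and against the hash in the on-chain validation response, so a mismatch at any layer shows up.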

v0 deliberately posts to the seller's report repo — no buyer token handoff, no third-party repo access to reason about. Buyer-repo posting is a follow-up with an explicit access-grant handshake.

Validation

  • Full unit suite green (golden request-hash + calldata tests, CLI flag validation, redirect-guard regression test in tests/).
  • Live smoke on a fresh k3d cluster: agent provisioned via agent new --skills smoke-test with a local Ollama model, skill materialized in-pod, in-pod self-probe of the stack's own public surface → 3/3 checks, score 100/100, well-formed report + results, and obol smoke calldata produced the correct registry calldata for the run. (The per-service 402 check exercised a live paid offer end-to-end.)
  • GitHub posting path covered by unit test + degrades gracefully to local-report-only when no token is configured.
  • New flows/flow-20-smoke-agent.sh (cluster/GitHub gated, skips clean) + docs/guides/smoke-test-agent.md (includes GitHub App vs fine-grained PAT guidance and rate-limit/AUP notes).
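
The redirect guard and the no-token fallback exercised above could be built on the standard library roughly as follows. This is a sketch under assumptions, not the contents of scripts/gh_post.py: the names `NoRedirectHandler`, `make_opener`, and `token_from_env` are hypothetical, and it assumes the token is read from `GITHUB_TOKEN` as the commit message suggests.

```python
import os
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse every redirect so the Authorization header never leaves the original host."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None declines the redirect; urllib then surfaces the
        # 3xx response as an HTTPError instead of following it.
        return None


def make_opener():
    # Passing a subclass of HTTPRedirectHandler replaces the default one.
    return urllib.request.build_opener(NoRedirectHandler)


def token_from_env(env=os.environ):
    """Return the token, or None so the caller can fall back to a local-only report."""
    token = env.get("GITHUB_TOKEN", "").strip()
    return token or None
```

A caller would skip the GitHub write entirely when `token_from_env()` returns None, and otherwise send its request through `make_opener().open(...)` so that a 3xx response becomes an error rather than a followed redirect.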

Known v0 limitations

  • Results are self-reported by the agent; the verifiable trail makes lying detectable after the fact, not impossible. Pairing runs with independent re-execution is the planned hardening.
  • GitHub 422-on-concurrent-create is not retried (single-writer assumption per report repo).
  • flow-20 is standalone, not yet wired into release-smoke.

…orts, ValidationRegistry verdict calldata

- internal/embed/skills/smoke-test: SKILL.md + smoke.py (read-only x402/catalog
  probes, report.md + results.json, score 0-100) + gh_post.py (seller-owned
  public report repo, contents API, no-redirect token guard, Retry-After backoff)
- internal/erc8004: SmokeTestRequestHash ("obol/smoke-test/v1|<target>|<runId>",
  golden-tested) reusing the existing validationResponse encoder
- cmd/obol: 'obol smoke calldata' mirroring the bounty calldata UX (operator
  submits; agent never signs); GITHUB_TOKEN rides the existing optional
  hermes-env Secret — zero render/RBAC/admission changes
- flows/flow-20-smoke-agent.sh (cluster/GitHub gated, skips clean) +
  docs/guides/smoke-test-agent.md
- review: high finding (Bearer across redirects) fixed; dots-only run-id
  rejected post-review
@bussyjd bussyjd force-pushed the feat/sell-smoke-test-agent branch from e4ad0a2 to 25691e3 Compare June 12, 2026 13:34
@OisinKyne

Contributor

I don't think obol smoke is a good verb; maybe obol test. I think this is still a bit confusingly framed, and the why isn't easy enough to understand. I guess the short description is 'Pay an agent to test your sold services and publish a report about them', and the longer description is 'This command sends a third-party agent enough fees + data to test a given service you have on offer, publishing a report on the test to a permanent URL, allowing you to use it as an Agent sale verification (ERC8004). Use this service if you want to improve the legitimacy of your service for discovery.'

And tbh, I'm not sure we need such a feature yet. IDK if any key registries use the 8004 format for verification; do you know of any?

@bussyjd bussyjd marked this pull request as draft June 12, 2026 14:20
2 participants