feat(agent): sellable smoke-test agent — read-only probes, GitHub reports, on-chain verdicts#633
Draft
bussyjd wants to merge 1 commit into
…orts, ValidationRegistry verdict calldata
- internal/embed/skills/smoke-test: SKILL.md + smoke.py (read-only x402/catalog
probes, report.md + results.json, score 0-100) + gh_post.py (seller-owned
public report repo, contents API, no-redirect token guard, Retry-After backoff)
- internal/erc8004: SmokeTestRequestHash ("obol/smoke-test/v1|<target>|<runId>",
golden-tested) reusing the existing validationResponse encoder
- cmd/obol: 'obol smoke calldata' mirroring the bounty calldata UX (operator
submits; agent never signs); GITHUB_TOKEN rides the existing optional
hermes-env Secret — zero render/RBAC/admission changes
- flows/flow-20-smoke-agent.sh (cluster/GitHub gated, skips clean) +
docs/guides/smoke-test-agent.md
- review: high finding (Bearer across redirects) fixed; dots-only run-id
rejected post-review
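The no-redirect token guard and bounded Retry-After backoff mentioned in the commit message could look roughly like the sketch below. It is not the code from `gh_post.py`: the names `NoRedirectHandler`, `build_no_redirect_opener`, `authorized_request`, and `bounded_retry_after`, as well as the 30-second cap and 1-second default, are assumptions made for illustration. Only `GITHUB_TOKEN` comes from the PR itself.

```python
import os
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Refuse every redirect so the Authorization header never reaches another host."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect; the 3xx
        # response then surfaces to the caller as an HTTPError.
        return None


def build_no_redirect_opener():
    # Passing a subclass of HTTPRedirectHandler replaces urllib's default one.
    return urllib.request.build_opener(NoRedirectHandler)


def authorized_request(url):
    # The token is read from the environment only and is never printed or logged.
    token = os.environ.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("GITHUB_TOKEN is not set")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def bounded_retry_after(header_value, cap_seconds=30, default_seconds=1):
    """Turn a Retry-After header into a sleep duration clamped to [0, cap_seconds].

    Only the delta-seconds form is handled; a missing header or an HTTP-date
    value falls back to default_seconds (an assumption of this sketch).
    """
    try:
        delay = int(header_value)
    except (TypeError, ValueError):
        return default_seconds
    return min(max(delay, 0), cap_seconds)
```

A caller would send `authorized_request(url)` through the opener returned by `build_no_redirect_opener()` and treat any 3xx `HTTPError` as a hard failure rather than retrying against the redirect target.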
e4ad0a2 to 25691e3
Contributor
Tbh, I'm not sure we need such a feature yet. IDK if any key registries use the 8004 format for verification; do you know of any?
What
A sellable agent service that smoke-tests another Obol Stack's public surface and leaves a verifiable trail: report committed to a public GitHub repo, verdict recorded as an ERC-8004 ValidationRegistry response.
Three pieces:
- `smoke-test` skill (`internal/embed/skills/smoke-test/`):
  - `scripts/smoke.py` runs strictly read-only probes against a target base URL: `/skill.md` (200 + non-empty), `/api/services.json` (200 + valid catalog shape), per advertised service a 402-shape check (valid x402 `accepts[]`: scheme/network/payTo/asset/amount), and `/.well-known/agent-registration.json` (informational). Never sends `X-PAYMENT`, never signs, bodies capped at 1 MiB. Emits `report.md` + machine-readable `results.json` (score 0–100, sha256 of the report bytes).
  - `scripts/gh_post.py` commits the report to a seller-owned public repo via the GitHub contents API. Token only from env, a no-redirect handler prevents the Bearer header from ever following a redirect cross-host, ≤2 writes per run, bounded Retry-After backoff, token never logged.
- `obol smoke calldata` derives the ValidationRegistry `validationResponse` calldata for the run (`requestHash = keccak256("obol/smoke-test/v1|<target>|<runId>")`, golden-tested; selector `0x3d659a96` pinned). The operator submits with their own wallet; the agent and controller never sign chain transactions. This PR carries the additive `internal/erc8004/validation.go` calldata builders it needs.
- `obol agent new <name> --skills smoke-test`, then `obol sell agent <name>` to gate it behind x402. GitHub credentials ride the existing optional `hermes-env` Secret (already whitelisted by the admission policy and RBAC); this PR adds zero render/RBAC/admission changes.

Why
Buyers paying for a test run shouldn't have to trust the agent's word. The trail is tamper-evident at three layers: the report's sha256 is in `results.json`, the same bytes are committed to a public repo (independently timestamped), and the same hash lands on-chain in the validation response. Either side rewriting history becomes detectable.

v0 deliberately posts to the seller's report repo: no buyer token handoff, no third-party repo access to reason about. Buyer-repo posting is a follow-up with an explicit access-grant handshake.
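A buyer could check the first layer of that trail with a few lines of Python. This is a minimal sketch, not code from the PR: the `report_sha256` key name inside `results.json` and the helper names are assumptions, since the description only says the file carries the sha256 of the report bytes.

```python
import hashlib
import json


def report_sha256(report_bytes: bytes) -> str:
    """Hex-encoded SHA-256 of the exact report bytes."""
    return hashlib.sha256(report_bytes).hexdigest()


def report_matches_results(report_bytes: bytes, results_json: str,
                           field: str = "report_sha256") -> bool:
    """Compare the report's hash with the one recorded in results.json.

    `field` is a hypothetical key name; adjust it to the real schema.
    """
    recorded = json.loads(results_json).get(field)
    if not isinstance(recorded, str):
        return False
    # Tolerate an optional 0x prefix and upper-case hex digits.
    recorded = recorded.lower()
    if recorded.startswith("0x"):
        recorded = recorded[2:]
    return recorded == report_sha256(report_bytes)


if __name__ == "__main__":
    report = b"# Smoke test report\nscore: 100\n"
    results = json.dumps({"score": 100, "report_sha256": report_sha256(report)})
    print(report_matches_results(report, results))          # untouched report
    print(report_matches_results(report + b"x", results))   # tampered report
```

The same digest can then be compared against the copy committed to the public repo and the hash carried in the on-chain validation response; a mismatch at any layer indicates that one of the copies was changed after the run.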
Validation
- tests/).
- `agent new --skills smoke-test` with a local Ollama model, skill materialized in-pod, in-pod self-probe of the stack's own public surface → 3/3 checks, score 100/100, well-formed report + results, and `obol smoke calldata` produced the correct registry calldata for the run. (The per-service 402 check exercised a live paid offer end-to-end.)
- `flows/flow-20-smoke-agent.sh` (cluster/GitHub gated, skips clean) + `docs/guides/smoke-test-agent.md` (includes GitHub App vs fine-grained PAT guidance and rate-limit/AUP notes).

Known v0 limitations