Release: main → prod#23

Merged: m-messer merged 10 commits into prod from main on Jun 15, 2026

Conversation

m-messer commented Jun 15, 2026

Member

Summary

  • Fix flaky test synchronization (replace time.After with m.Shutdown)
  • Fix referenceSolution content extraction to match µEd spec
  • µEd error handling improvements
  • µEd versioning support
  • Sandboxed workers
  • µEd OpenAPI validation middleware
  • Build workflow: latest image tag only on tag refs
  • Release workflow setup

Test plan

  • All CI checks pass on main before merging
  • Confirm GitHub Actions release workflow triggers on merge and creates a new tag/release

m-messer and others added 10 commits May 25, 2026 13:21
* Added GitHub Actions release workflow and updated build workflow to trigger on version tags

* Triggered evaluation-function-base release from release workflow
* Added OpenAPI request/response validation middleware and integrated OpenAPI specification

* Add embedded µEd OpenAPI specification

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Move µEd OpenAPI spec into runtime/schema

Relocates the spec from api/ into runtime/schema/ alongside the existing
JSON schema files, and renames it to mued_v0.1.0.yml to make the version
explicit. Removes the api/ package; embed is now owned by runtime/schema.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Ignore .idea/ directory

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Make OpenAPI response validation strict for µEd routes

Previously, responses that failed spec validation were only logged as
warnings and forwarded anyway. Now a failed µEd response validation
returns 500 to the caller. The legacy / route is unaffected — it has
no matching path in the spec so the middleware passes it through
unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update Go version to 1.25 in Dockerfile for builder stage

* Support OpenAPI 3.1.0 spec in router validation

Pass IsOpenAPI31OrLater and AllowExtraSiblingFields options to the
legacy router so description/summary siblings on $ref objects (valid
in 3.1.0) don't fail validation. Also propagate errors from
OpenAPIMiddleware and NewHttpServer instead of ignoring them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Refactor error responses and improve OpenAPI middleware robustness

Use `writeJSONError` helper for consistent JSON error responses in µEd handler. Enhance OpenAPI response validation to prevent buffer drainage during snapshot handling.

* Add health status response to µEd handler based on test results

* Update µEd test assertion to verify "status" field instead of "tests_passed" field

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
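
The strict response validation described in this commit can be sketched with the standard library alone. This is a minimal illustration, not the middleware from this PR: `strictResponses`, `bufferedWriter`, `requireJSONObject`, and `statusFor` are hypothetical names, and the JSON-object check stands in for real OpenAPI validation. It buffers the handler's response, returns 500 when validation fails, and passes through any path that is not in the spec (such as the legacy `/` route).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// validateFunc is a hypothetical stand-in for OpenAPI response validation.
type validateFunc func(status int, body []byte) error

// bufferedWriter captures a response so it can be checked before it is sent.
type bufferedWriter struct {
	header http.Header
	status int
	body   bytes.Buffer
}

func (w *bufferedWriter) Header() http.Header         { return w.header }
func (w *bufferedWriter) WriteHeader(code int)        { w.status = code }
func (w *bufferedWriter) Write(p []byte) (int, error) { return w.body.Write(p) }

// strictResponses validates responses for paths listed in specPaths and
// forwards every other path unchanged.
func strictResponses(specPaths map[string]bool, validate validateFunc, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !specPaths[r.URL.Path] {
			next.ServeHTTP(w, r) // no matching path in the spec: pass through
			return
		}
		buf := &bufferedWriter{header: http.Header{}, status: http.StatusOK}
		next.ServeHTTP(buf, r)
		if err := validate(buf.status, buf.body.Bytes()); err != nil {
			// Strict mode: an invalid response becomes a 500 instead of a logged warning.
			http.Error(w, "response failed validation", http.StatusInternalServerError)
			return
		}
		for k, v := range buf.header {
			w.Header()[k] = v
		}
		w.WriteHeader(buf.status)
		w.Write(buf.body.Bytes())
	})
}

// requireJSONObject accepts only bodies that decode as a JSON object.
func requireJSONObject(_ int, body []byte) error {
	var obj map[string]any
	return json.Unmarshal(body, &obj)
}

// statusFor sends one request through the middleware and returns the status code.
func statusFor(path string) int {
	bad := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Write([]byte("not json"))
	})
	mw := strictResponses(map[string]bool{"/evaluate": true}, requireJSONObject, bad)
	rec := httptest.NewRecorder()
	mw.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/evaluate"), statusFor("/")) // prints "500 200"
}
```

Buffering the whole response is what makes the 500 possible: once any bytes have reached the client, the status code can no longer be changed.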
* Added Linux-only nsjail-based sandboxing for worker processes, including CLI support, configuration, and testing.

* Added validation for `Content-Length` in `headerPrefixPipe` and tests for oversized and negative values

* Enhanced `build.yml` to compile and install nsjail from source instead of using system package.

* Switched nsjail mode from "once" to "exec" for direct command execution with inherited stdio.

* Replaced `--time_limit` with `--rlimit_cpu` in nsjail arguments to ensure compatibility in containers without cgroupv2.

* Updated sandbox test to replace `--time_limit` with `--rlimit_cpu` and adjusted workflow to run integration tests with elevated permissions.
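
The `Content-Length` validation mentioned in this commit might look roughly like the sketch below. The function name `parseContentLength` and the 10 MiB limit are assumptions made for illustration; the PR does not state the real limit or how `headerPrefixPipe` is structured.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// maxContentLength is a hypothetical upper bound, not taken from the PR.
const maxContentLength = 10 << 20 // 10 MiB

var errInvalidContentLength = errors.New("invalid Content-Length")

// parseContentLength rejects non-numeric, negative, and oversized values.
func parseContentLength(raw string) (int64, error) {
	n, err := strconv.ParseInt(strings.TrimSpace(raw), 10, 64)
	if err != nil {
		return 0, fmt.Errorf("%w: %q is not an integer", errInvalidContentLength, raw)
	}
	if n < 0 {
		return 0, fmt.Errorf("%w: negative value %d", errInvalidContentLength, n)
	}
	if n > maxContentLength {
		return 0, fmt.Errorf("%w: %d exceeds limit %d", errInvalidContentLength, n, maxContentLength)
	}
	return n, nil
}

func main() {
	for _, raw := range []string{"42", "-1", "99999999999"} {
		n, err := parseContentLength(raw)
		fmt.Println(n, err)
	}
}
```

Checking the value before allocating or reading the body keeps a malformed or hostile header from driving an oversized read.
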
* Added `MuEdHandler` to handle `/evaluate` and `/evaluate/health` endpoints with authentication and runtime integration, along with associated tests

* Added `workflow_dispatch` trigger to GitHub Actions build workflow

* Removed `NewCommandRoute` and corrected route definitions for `/evaluate` and `/evaluate/health`

* Added `NormalizePath` middleware to canonicalize `/evaluate` and `/evaluate/health` paths across server and lambda integrations

* Added API versioning support for `/evaluate` and `/evaluate/health` endpoints with header validation, default version handling, and capability reporting

* Added OpenAPI request/response validation middleware and integrated OpenAPI specification

* Add embedded µEd OpenAPI specification

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Move µEd OpenAPI spec into runtime/schema

Relocates the spec from api/ into runtime/schema/ alongside the existing
JSON schema files, and renames it to mued_v0.1.0.yml to make the version
explicit. Removes the api/ package; embed is now owned by runtime/schema.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Ignore .idea/ directory

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Make OpenAPI response validation strict for µEd routes

Previously, responses that failed spec validation were only logged as
warnings and forwarded anyway. Now a failed µEd response validation
returns 500 to the caller. The legacy / route is unaffected — it has
no matching path in the spec so the middleware passes it through
unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Simplify µEd response encoding by removing unnecessary "status" field logic

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
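
The versioning behaviour described above (header validation with a default version) can be sketched as follows. The header name `X-Api-Version` appears elsewhere in this PR; the default of `0.1.0` is an assumption based on the spec file name `mued_v0.1.0.yml`, and `resolveVersion` and `supportedVersions` are hypothetical names.

```go
package main

import "fmt"

// defaultVersion is an assumption inferred from the spec file name.
const defaultVersion = "0.1.0"

// supportedVersions is a hypothetical set of versions the server accepts.
var supportedVersions = map[string]bool{"0.1.0": true}

// resolveVersion takes the raw header value, for example
// r.Header.Get("X-Api-Version"), and returns the version to use.
// An empty header falls back to the default; an unknown version is an error.
func resolveVersion(raw string) (string, error) {
	if raw == "" {
		return defaultVersion, nil
	}
	if !supportedVersions[raw] {
		return "", fmt.Errorf("unsupported API version %q", raw)
	}
	return raw, nil
}

func main() {
	for _, raw := range []string{"", "0.1.0", "9.9.9"} {
		v, err := resolveVersion(raw)
		fmt.Printf("header=%q version=%q err=%v\n", raw, v, err)
	}
}
```

A handler would typically translate the error into a 4xx JSON response rather than silently falling back, so clients learn that they asked for a version the server does not speak.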
* Added `MuEdHandler` to handle `/evaluate` and `/evaluate/health` endpoints with authentication and runtime integration, along with associated tests

* Added `workflow_dispatch` trigger to GitHub Actions build workflow

* Removed `NewCommandRoute` and corrected route definitions for `/evaluate` and `/evaluate/health`

* Added `NormalizePath` middleware to canonicalize `/evaluate` and `/evaluate/health` paths across server and lambda integrations

* Added API versioning support for `/evaluate` and `/evaluate/health` endpoints with header validation, default version handling, and capability reporting

* Refactored `/evaluate` and `/evaluate/health` error handling to standardize JSON responses with `writeMuEdError` and included `X-Api-Version` header validation and degraded health status support.

* Added OpenAPI request/response validation middleware and integrated OpenAPI specification

* Add embedded µEd OpenAPI specification

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Move µEd OpenAPI spec into runtime/schema

Relocates the spec from api/ into runtime/schema/ alongside the existing
JSON schema files, and renames it to mued_v0.1.0.yml to make the version
explicit. Removes the api/ package; embed is now owned by runtime/schema.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Ignore .idea/ directory

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Make OpenAPI response validation strict for µEd routes

Previously, responses that failed spec validation were only logged as
warnings and forwarded anyway. Now a failed µEd response validation
returns 500 to the caller. The legacy / route is unaffected — it has
no matching path in the spec so the middleware passes it through
unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update µEd handler to use dynamic status codes for responses

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The spec defines task.referenceSolution as a plain object with
additionalProperties, not a typed Submission wrapper. Change
MuEdTask.ReferenceSolution from *MuEdSubmission to map[string]any
and extract its content directly using the submission's type to
determine the expected key.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
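
A minimal sketch of the extraction this commit describes, treating `referenceSolution` as a plain `map[string]any` and choosing the key from the submission's type. The type names and keys in `contentKeys` are hypothetical placeholders; the actual values come from the µEd spec and are not listed in this PR.

```go
package main

import "fmt"

// contentKeys maps a submission type to the key expected inside
// referenceSolution. These entries are illustrative assumptions only.
var contentKeys = map[string]string{
	"TEXT": "text",
	"CODE": "code",
}

// referenceContent pulls the reference content out of the untyped map,
// using the submission's type to decide which key to read.
func referenceContent(ref map[string]any, submissionType string) (any, error) {
	key, ok := contentKeys[submissionType]
	if !ok {
		return nil, fmt.Errorf("unknown submission type %q", submissionType)
	}
	content, ok := ref[key]
	if !ok {
		return nil, fmt.Errorf("referenceSolution has no %q key", key)
	}
	return content, nil
}

func main() {
	ref := map[string]any{"text": "42", "note": "extra fields are allowed"}
	content, err := referenceContent(ref, "TEXT")
	fmt.Println(content, err)
}
```

Because the map accepts additional properties, unrelated keys such as `note` are ignored rather than rejected.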
…ation

Replace the 1ms timing-based wait with pool.Close() via m.Shutdown,
consistent with all other tests in the file that rely on the same
background goroutine pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
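
The synchronization change can be illustrated with a small worker pool whose `Close` blocks until the background goroutine has finished. This is a sketch of the general pattern under assumed names (`pool`, `Submit`, `runAndCount`); it is not the pool or the `m.Shutdown` implementation from this repository.

```go
package main

import (
	"fmt"
	"sync"
)

// pool runs submitted jobs on one background goroutine.
type pool struct {
	jobs chan func()
	wg   sync.WaitGroup
}

func newPool() *pool {
	p := &pool{jobs: make(chan func(), 16)}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		for job := range p.jobs {
			job()
		}
	}()
	return p
}

// Submit queues a job; it may run after Submit returns.
func (p *pool) Submit(job func()) { p.jobs <- job }

// Close stops accepting jobs and waits for the worker to drain the queue.
func (p *pool) Close() {
	close(p.jobs)
	p.wg.Wait()
}

// runAndCount submits n jobs and reads the result only after Close returns,
// so no sleep or timer is needed to observe the final count.
func runAndCount(n int) int {
	p := newPool()
	count := 0
	for i := 0; i < n; i++ {
		p.Submit(func() { count++ })
	}
	p.Close()
	return count
}

func main() {
	fmt.Println(runAndCount(5)) // prints 5
}
```

Waiting on `Close` gives the test a happens-before guarantee, whereas a fixed 1ms wait only works when the scheduler happens to run the goroutine in time.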
Keep main's version of release.yml which includes the
Trigger evaluation-function-base release step.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@m-messer m-messer merged commit de0d81f into prod Jun 15, 2026
9 checks passed