v1.3.9 — Model evaluation SDK & CLI

@leeclemnet released this 29 May 20:28 (commit e4c30e1)

Model evaluations SDK & CLI

Wraps the public /{workspace}/model-evals REST surface so users can read evaluation results — mAP, confidence sweep, per-class performance, confusion matrix, vector clusters, per-image stats, recommendations — from Python and from the CLI without hitting the API directly.

SDK

  • Workspace.evals(project=None, version=None, model=None, status=None, limit=None) — list evals as ModelEval instances pre-populated with metadata from the list response.
  • Workspace.eval(eval_id) — fetch a single eval (returns a ModelEval with .summary populated when status is done).
  • ModelEval.refresh() — re-fetch the eval header.
  • ModelEval.map_results(), .confidence_sweep(), .performance_by_class(split=None), .confusion_matrix(split=None, confidence=None), .vector_analysis(confidence=None), .image_predictions(split=None, confidence=None, limit=None, offset=None), .recommendations() — one method per panel; each returns the raw JSON dict.
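As a rough illustration of how the SDK methods listed above fit together, the sketch below gathers several panels for one eval into a single dict. The helper name `fetch_eval_panels` is hypothetical and not part of the SDK; it uses only the method names and keyword arguments given in these notes, and it assumes `workspace` is an object exposing `eval(eval_id)` as described.

```python
def fetch_eval_panels(workspace, eval_id, split=None, confidence=None):
    """Collect several evaluation panels for one eval into a single dict.

    `workspace` is assumed to expose `eval(eval_id)` as described in the
    release notes. This helper is illustrative and not part of the SDK.
    """
    model_eval = workspace.eval(eval_id)
    return {
        # `.summary` is populated by Workspace.eval() when status is done.
        "summary": model_eval.summary,
        # Each panel method returns the raw JSON dict for that panel.
        "map_results": model_eval.map_results(),
        "confidence_sweep": model_eval.confidence_sweep(),
        "performance_by_class": model_eval.performance_by_class(split=split),
        "confusion_matrix": model_eval.confusion_matrix(
            split=split, confidence=confidence
        ),
        "recommendations": model_eval.recommendations(),
    }
```

With the real package, `workspace` would typically come from `roboflow.Roboflow(api_key=...).workspace()`. Panel calls on an eval that has not finished are expected to fail with the not-done error class described in these notes, so callers may want to check the eval's status first.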

CLI

  • roboflow eval list [--project P] [--version V] [--model M] [--status S] [--limit N]
  • roboflow eval get <eval_id>
  • roboflow eval map-results <eval_id>
  • roboflow eval confidence-sweep <eval_id>
  • roboflow eval performance-by-class <eval_id> [--split S]
  • roboflow eval confusion-matrix <eval_id> [--split S] [--confidence N]
  • roboflow eval vector-analysis <eval_id> [--confidence N]
  • roboflow eval image-predictions <eval_id> [--split S] [--confidence N] [--limit N] [--offset N]
  • roboflow eval recommendations <eval_id>

Exit codes are stable per error class so shell scripts and AI agents can react without parsing message strings: 3 for model_eval_not_found (404), 4 for model_eval_not_done (409), 5 for invalid_split / invalid_confidence (400). Every command supports --json for structured output.
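A script could branch on these exit codes roughly as follows. This is a minimal sketch, not an official wrapper: the helper names are hypothetical, treating exit code 0 as success is an assumption based on the usual shell convention, and placing `--json` at the end of the command is also an assumption, since the notes only say that every command supports the flag.

```python
import subprocess

# Exit codes documented in the release notes, keyed to their error class.
EXIT_CODE_MEANINGS = {
    3: "model_eval_not_found",         # HTTP 404
    4: "model_eval_not_done",          # HTTP 409
    5: "invalid_split_or_confidence",  # HTTP 400
}


def classify_exit_code(code):
    """Map a CLI exit code to a short label a script can branch on."""
    if code == 0:
        # Assumption: 0 means success, per the usual shell convention.
        return "ok"
    return EXIT_CODE_MEANINGS.get(code, "other_error")


def build_eval_get_command(eval_id):
    """Build the argv for `roboflow eval get`.

    Assumption: `--json` is accepted after the positional argument.
    """
    return ["roboflow", "eval", "get", eval_id, "--json"]


def run_eval_get(eval_id):
    """Run the CLI and return (label, stdout) without parsing messages."""
    proc = subprocess.run(
        build_eval_get_command(eval_id), capture_output=True, text=True
    )
    return classify_exit_code(proc.returncode), proc.stdout
```

A caller might retry later on `model_eval_not_done` and give up immediately on `model_eval_not_found`, without inspecting any message text.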

Low-level (roboflow.adapters.rfapi)

  • list_model_evals, get_model_eval, get_model_eval_map_results, get_model_eval_confidence_sweep, get_model_eval_performance_by_class, get_model_eval_confusion_matrix, get_model_eval_vector_analysis, get_model_eval_image_predictions, get_model_eval_recommendations.
  • New typed exceptions ModelEvalNotFoundError, ModelEvalNotDoneError, InvalidSplitError, InvalidConfidenceError (all subclasses of RoboflowError) so callers can distinguish "eval doesn't exist" from "eval still running" from "bad argument" without parsing strings.
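The typed exceptions make a simple polling pattern possible. In the sketch below, the exception classes are local stand-ins defined only so the example runs on its own; in real code they would be imported from the package (the exact import path is not stated in these notes). The helper `wait_for_panel` and its parameters are hypothetical.

```python
import time


# Stand-ins mirroring the hierarchy described in the release notes.
# In real code, import the actual classes from the roboflow package.
class RoboflowError(Exception):
    pass


class ModelEvalNotFoundError(RoboflowError):
    pass


class ModelEvalNotDoneError(RoboflowError):
    pass


def wait_for_panel(fetch_panel, attempts=5, delay_seconds=10.0, sleep=time.sleep):
    """Call `fetch_panel()` until it stops raising ModelEvalNotDoneError.

    `fetch_panel` is any zero-argument callable that wraps one of the
    low-level getters. "Not found" and other errors propagate at once;
    only "still running" is retried.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch_panel()
        except ModelEvalNotDoneError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the not-done error
            sleep(delay_seconds)
```

Retrying only on the not-done class keeps a mistyped eval ID from triggering a long, pointless wait.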

The endpoints require the model-eval:read scope.

Fixed

  • rf-detr model upload: when extracting class names, accept checkpoints whose args entry is a plain dict (e.g. EMA checkpoints) instead of raising a TypeError from vars().
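The failure mode behind this fix is easy to reproduce: `vars()` raises `TypeError` when given a plain dict, because a dict has no `__dict__` attribute. The sketch below shows one way to normalise both shapes. It illustrates the idea and is not the library's actual code, and the helper name `args_to_dict` is hypothetical.

```python
import argparse


def args_to_dict(args):
    """Return checkpoint args as a dict, whatever shape they arrived in.

    Illustrative helper, not the library's implementation.
    """
    if isinstance(args, dict):
        # Plain dict (e.g. from an EMA checkpoint): vars() would raise here.
        return dict(args)
    # Namespace-like object: vars() returns its attribute dict.
    return dict(vars(args))


# Both shapes now yield the same mapping.
from_namespace = args_to_dict(argparse.Namespace(num_classes=3))
from_dict = args_to_dict({"num_classes": 3})
```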

Changed

  • Pin typer<0.26 and declare click explicitly: typer 0.26 vendors its own click and drops the external dependency, which broke the CLI and its type checks.

Full Changelog: v1.3.8...v1.3.9