perf: reduce tracker cold-start and concurrent measurement overhead #1246

Open
davidberenstein1957 wants to merge 7 commits into master from davidberenstein1957/codecarbon-api-speed-test

davidberenstein1957 (Collaborator) commented Jun 17, 2026

Summary

This PR reduces CodeCarbon measurement launch latency and improves concurrent-run throughput while preserving existing behavior. Changes focus on deferring work until it is needed, caching hardware detection within a process, and slimming import paths — no benchmark tooling or optimization logs are included.

Performance results — cold launch (offline Mac ARM)

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Tracker __init__ | ~15.7 s | ~94 ms | ~99% faster |
| start() (cold) | ~1.0 s | ~194 ms | ~81% faster |
| First sample (cold) | ~18.2 s | ~288 ms | ~98% faster (~63×) |
| Warm lifecycle (init+start+stop, same process) | | ~6 ms | |
| CLI monitor subprocess overhead (codecarbon-monitor … sleep 2) | ~1.5 s | ~820 ms | ~45% faster |

Cold-path numbers are the first tracker in a fresh process; warm-path numbers reuse cached hardware within the same process.
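
For readers checking the Improvement column, the short sketch below recomputes it from the Before and After figures in the cold-launch table. The helper names `percent_faster` and `speedup` are illustrative only and are not part of CodeCarbon.

```python
def percent_faster(before_s: float, after_s: float) -> float:
    """Share of the original latency that was removed, in percent."""
    return (before_s - after_s) / before_s * 100.0


def speedup(before_s: float, after_s: float) -> float:
    """How many times shorter the new latency is."""
    return before_s / after_s


# Figures copied from the cold-launch table, converted to seconds.
rows = {
    "Tracker __init__": (15.7, 0.094),
    "start() (cold)": (1.0, 0.194),
    "First sample (cold)": (18.2, 0.288),
    "CLI monitor subprocess overhead": (1.5, 0.820),
}

for name, (before, after) in rows.items():
    print(
        f"{name}: {percent_faster(before, after):.1f}% faster, "
        f"{speedup(before, after):.1f}x"
    )
```

Rounded to whole numbers, these reproduce the ~99%, ~81%, ~98% (~63×), and ~45% entries.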

Performance results — run throughput (offline, warm, same process)

Repeated OfflineEmissionsTracker(output_methods=[]) lifecycles (init → start → stop) in one Python process:

| Mode | Before | After | Improvement |
| --- | --- | --- | --- |
| Sequential runs / min | ~926 | ~1,695 | ~1.8× |
| Parallel runs / min (8 threads) | ~7,268 | ~11,260 | ~1.5× |
| Warm run latency (p50) | ~62 ms | ~6 ms | ~10× |

Before = hardware-cache milestone (2026-06-17); after = final warm-lifecycle + connection-reuse optimizations. Parallel benchmark: 8 worker threads, 30 s sustained load.
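
The benchmark scripts themselves are not part of this PR, so the following is only a rough sketch of how a runs-per-minute and p50 figure could be collected. `lifecycle` is a hypothetical stand-in for one init → start → stop cycle, and `runs_per_minute` is an illustrative name, not the harness used for the numbers above.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def lifecycle() -> None:
    """Hypothetical stand-in for one init -> start -> stop tracker cycle."""
    time.sleep(0.001)


def timed_run(work: Callable[[], None]) -> float:
    """Return the wall-clock duration of a single call to `work`, in seconds."""
    t0 = time.perf_counter()
    work()
    return time.perf_counter() - t0


def runs_per_minute(
    work: Callable[[], None], duration_s: float, workers: int = 1
) -> Tuple[float, float]:
    """Run `work` in a loop on `workers` threads for about `duration_s` seconds.

    Returns (completed runs per minute, median latency in seconds).
    """
    deadline = time.perf_counter() + duration_s

    def worker() -> List[float]:
        latencies = []
        while time.perf_counter() < deadline:
            latencies.append(timed_run(work))
        return latencies

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(worker) for _ in range(workers)]
        latencies = [lat for f in futures for lat in f.result()]
    elapsed = time.perf_counter() - started
    return len(latencies) / elapsed * 60.0, statistics.median(latencies)
```

Calling `runs_per_minute(lifecycle, duration_s=30, workers=8)` would mirror the 8-thread, 30 s setup described above, with the real tracker lifecycle substituted for the stand-in.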

API write-path improvements

Client and server changes reduce overhead on the hot path (POST /runs, POST /emissions):

| Area | Change | Effect |
| --- | --- | --- |
| Client | Shared requests.Session per API base URL (ApiClient, HTTPOutput) | HTTP keep-alive: avoids per-upload TCP handshakes when posting emissions repeatedly |
| Client | Lazy POST /runs, deferred until first emission upload | Tracker construction no longer blocks on run registration |
| Server | Docker entrypoint: SQL readiness poll instead of fixed sleep 5; skip alembic when already at head | Faster, more reliable API container startup |
| Server | Skip create_all when core tables already exist | Warm restarts avoid redundant schema work |

Local ponytail load tests (POST /emissions at rising concurrency) require a running carbonserver + Postgres stack; production health baseline is ~12 req/s (GET /). The run-throughput table above is the primary quantified gain for high-frequency tracker use.
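
The Docker entrypoint change listed in the table replaces a fixed sleep with a readiness poll. The entrypoint is presumably a shell script; the Python sketch below only illustrates the general pattern, and `wait_until_ready` and its `probe` argument are hypothetical names rather than code from this PR.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            # A failing probe (for example, connection refused) means "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Compared with a fixed `sleep 5`, the poll returns as soon as the database answers and still fails visibly if it never does.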

What changed (functional only)

Tracker lifecycle

  • Lazy-import heavy modules (Emissions, ResourceTracker, output handlers, geography) in emissions_tracker.py
  • Defer hardware probing, system metadata, cloud/geo validation, and emissions engine construction until first use
  • Skip 1 Hz power monitor scheduler when output_methods=[]
  • Skip redundant measurement on stop()/flush() when a sample was just taken
  • Prime cpu_percent once per process, globally, so that later reads are non-blocking
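
As a rough illustration of two of the ideas in this list (deferring hardware probing until first use, and skipping a redundant measurement on stop()), here is a minimal sketch. `LazyTracker`, `probe_hardware`, and `min_sample_gap_s` are hypothetical names and do not reflect the actual structure of emissions_tracker.py.

```python
import time

_UNSET = object()  # sentinel so that a probe returning None is still cached


class LazyTracker:
    """Hypothetical tracker that postpones expensive setup until it is needed."""

    def __init__(self, probe_hardware, min_sample_gap_s: float = 1.0):
        self._probe_hardware = probe_hardware  # expensive callable, not run here
        self._hardware = _UNSET
        self._min_sample_gap_s = min_sample_gap_s
        self._last_sample_at = None
        self.samples = 0

    @property
    def hardware(self):
        # The first access pays the probing cost; later accesses reuse the result.
        if self._hardware is _UNSET:
            self._hardware = self._probe_hardware()
        return self._hardware

    def _measure(self) -> None:
        _ = self.hardware  # triggers the deferred probe on the first sample
        self.samples += 1
        self._last_sample_at = time.monotonic()

    def start(self) -> None:
        self._measure()

    def stop(self) -> None:
        # Skip the final measurement if a sample was taken very recently.
        recent = (
            self._last_sample_at is not None
            and time.monotonic() - self._last_sample_at < self._min_sample_gap_s
        )
        if not recent:
            self._measure()
```

Constructing the object does no probing at all, which is the property that keeps `__init__` cheap in this sketch.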

Hardware detection

  • New process-level hardware setup cache (codecarbon/core/hardware_cache.py) — CPU/GPU/RAM detection reused across tracker instances
  • Platform-aware CPU backend order (Linux→RAPL first; Mac ARM skips PowerGadget)
  • Mac ARM prefers fast cpu_load before powermetrics sudo probe; falls back to powermetrics when cpu_load unavailable
  • Cached GPU probe results, shared cpu.TDP() lookup, and global detect_cpu_model() cache
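
A process-level cache of this kind can be sketched as a lock-protected dictionary. This is an assumption about the general shape, not the contents of codecarbon/core/hardware_cache.py; `get_or_detect` and `clear_cache` are illustrative names.

```python
import threading
from typing import Any, Callable, Dict

_lock = threading.Lock()
_cache: Dict[str, Any] = {}


def get_or_detect(key: str, detect: Callable[[], Any]) -> Any:
    """Return the cached value for `key`, running `detect` at most once per process."""
    # The lock is held while detecting, so concurrent callers wait
    # instead of probing the same hardware twice.
    with _lock:
        if key not in _cache:
            _cache[key] = detect()
        return _cache[key]


def clear_cache() -> None:
    """Drop all cached results, for example between tests."""
    with _lock:
        _cache.clear()
```

Holding the lock during detection trades some parallelism for a guarantee that each probe runs once; a per-key lock would be a finer-grained alternative.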

Imports & I/O

  • Lazy pandas/CSV loading in input.py and core/emissions.py
  • Lazy DataSource in cpu.py; cache PowerGadget/powermetrics availability probes
  • Slimmer public import chain in __init__.py
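
One common way to make an import lazy is to wrap it in a loader that runs on first call. The sketch below uses the standard-library `csv` module as a stand-in for a heavy dependency such as pandas; `lazy_module` is a hypothetical helper, not something defined in this PR.

```python
import importlib
from types import ModuleType
from typing import Callable, Optional


def lazy_module(name: str) -> Callable[[], ModuleType]:
    """Return a zero-argument loader that imports `name` on its first call."""
    module: Optional[ModuleType] = None

    def load() -> ModuleType:
        nonlocal module
        if module is None:
            module = importlib.import_module(name)
        return module

    return load


# Defining the loader imports nothing; `csv` stands in for a heavy module.
get_csv = lazy_module("csv")


def count_rows(lines) -> int:
    """Only callers that actually parse CSV data pay the import cost."""
    return sum(1 for _ in get_csv().reader(lines))
```

A package `__init__.py` can get a similar effect for its public names with a module-level `__getattr__` (PEP 562).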

CLI

  • Lazy imports in cli/main.py for auth/API commands
  • New lightweight codecarbon-monitor entry point (cli/monitor_main.py) for monitor -- <command> without pulling auth/questionary
  • Lazy tracker imports in run_and_monitor; strip leaked subcommand/-- tokens from ctx.args
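
The exact cleanup applied to ctx.args is not shown in this description. As a hedged illustration of what stripping a leaked subcommand name and `--` separator might look like, consider the hypothetical helper below.

```python
from typing import List, Sequence


def strip_leaked_tokens(args: Sequence[str], subcommand: str = "monitor") -> List[str]:
    """Drop a leading subcommand name and a leading `--` separator, if present."""
    cleaned = list(args)
    if cleaned and cleaned[0] == subcommand:
        cleaned = cleaned[1:]
    if cleaned and cleaned[0] == "--":
        cleaned = cleaned[1:]
    return cleaned
```

Only leading tokens are removed, so a `--` that belongs to the monitored command's own arguments is left untouched.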

API client & server

  • Shared requests.Session per API base URL for client + HTTP output
  • Carbonserver: skip create_all when tables exist; Docker entrypoint skips alembic when already at head
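
Sharing one session per base URL can be sketched as a small registry. This is a minimal sketch under stated assumptions: `get_session` and its `factory` parameter are hypothetical, and the real ApiClient and HTTPOutput wiring may differ. The default factory uses `requests.Session`, imported lazily.

```python
import threading
from typing import Any, Callable, Dict

_sessions: Dict[str, Any] = {}
_sessions_lock = threading.Lock()


def _default_session_factory() -> Any:
    # Imported lazily so this module loads even when `requests` is not needed.
    import requests

    return requests.Session()


def get_session(
    base_url: str, factory: Callable[[], Any] = _default_session_factory
) -> Any:
    """Return one shared session object per normalized API base URL."""
    key = base_url.rstrip("/")  # treat "http://host/" and "http://host" alike
    with _sessions_lock:
        if key not in _sessions:
            _sessions[key] = factory()
        return _sessions[key]
```

Because a `requests.Session` pools connections, repeated posts through the same session can reuse an open connection instead of opening a new one per upload.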

Test plan

  • CODECARBON_ALLOW_MULTIPLE_RUNS=True pytest --ignore=tests/test_viz_data.py -m 'not integ_test' tests/ (508 passed)
  • pytest tests/test_emissions_tracker.py tests/test_resource_tracker.py tests/test_gpu.py tests/test_input.py tests/test_powermetrics.py tests/test_core_util.py (106 passed)
  • Manual: codecarbon monitor --offline --country-iso-code FRA -- sleep 1
  • Manual: codecarbon-monitor monitor --offline --country-iso-code FRA -- sleep 1
  • Manual: in-process warm lifecycle (init → start → stop twice in same Python session)

Notes

Benchmark scripts (scripts/benchmark_*.py, scripts/optimization_log.py, scripts/profile_optimization.py) were not included in this PR — only production code and test updates. Throughput numbers above were captured during development on offline Mac ARM (2026-06-17).

Made with Cursor

Defer heavy imports and hardware probing until first use, cache hardware
setup per process, and add a lightweight codecarbon-monitor CLI entry point
so measurement launch and parallel runs stay fast without changing behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
davidberenstein1957 and others added 4 commits June 17, 2026 23:24
Skip the slow powermetrics sudo probe on Apple Silicon when cpu_load
setup succeeds, strip leaked subcommand tokens from monitor ctx.args,
and update tests for lazy tracker imports in run_and_monitor.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use class-name hardware cache serialization to survive module reloads in
tests, lazy-import get_datetime_with_timezone in config CLI, add probe cache
clear helpers, and update tests for lazy imports and get_cached_tdp.

Co-authored-by: Cursor <cursoragent@cursor.com>
Provide harnesses to measure cold-start, throughput, and API latency during
optimization so regressions can be caught and logged consistently.

Co-authored-by: Cursor <cursoragent@cursor.com>
Remove local-only harnesses used during optimization; the library perf
changes and their tests are sufficient for review without dev tooling.

Co-authored-by: Cursor <cursoragent@cursor.com>
codecov Bot commented Jun 17, 2026


Codecov Report

❌ Patch coverage is 88.94349% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.75%. Comparing base (58acafa) to head (fe75a1e).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| codecarbon/core/resource_tracker.py | 71.42% | 14 Missing ⚠️ |
| codecarbon/core/api_client.py | 67.74% | 10 Missing ⚠️ |
| codecarbon/core/hardware_cache.py | 94.64% | 6 Missing ⚠️ |
| codecarbon/core/powermetrics.py | 64.70% | 6 Missing ⚠️ |
| codecarbon/emissions_tracker.py | 96.15% | 3 Missing ⚠️ |
| codecarbon/output_methods/http.py | 77.77% | 2 Missing ⚠️ |
| codecarbon/cli/main.py | 93.33% | 1 Missing ⚠️ |
| codecarbon/cli/monitor_main.py | 96.15% | 1 Missing ⚠️ |
| codecarbon/core/config.py | 66.66% | 1 Missing ⚠️ |
| codecarbon/core/cpu.py | 92.30% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1246      +/-   ##
==========================================
- Coverage   89.17%   88.75%   -0.43%     
==========================================
  Files          47       49       +2     
  Lines        4510     4810     +300     
==========================================
+ Hits         4022     4269     +247     
- Misses        488      541      +53     

davidberenstein1957 and others added 2 commits June 17, 2026 23:50
Apply formatter/linter fixes, extract platform CPU backend selection to
satisfy flake8 complexity, stabilize the force_cpu_power load test with a
mocked cpu_percent, and add hardware_cache/monitor_main coverage tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
Avoid isinstance checks across module reload boundaries and mock
AppleSiliconChip rebuild so powermetrics is not required on non-macOS runners.

Co-authored-by: Cursor <cursoragent@cursor.com>
3 participants