perf: reduce tracker cold-start and concurrent measurement overhead #1246
Open
davidberenstein1957 wants to merge 7 commits into
Defer heavy imports and hardware probing until first use, cache hardware setup per process, and add a lightweight codecarbon-monitor CLI entry point so measurement launch and parallel runs stay fast without changing behavior. Co-authored-by: Cursor <cursoragent@cursor.com>
Skip the slow powermetrics sudo probe on Apple Silicon when cpu_load setup succeeds, strip leaked subcommand tokens from monitor ctx.args, and update tests for lazy tracker imports in run_and_monitor. Co-authored-by: Cursor <cursoragent@cursor.com>
Use class-name hardware cache serialization to survive module reloads in tests, lazy-import get_datetime_with_timezone in config CLI, add probe cache clear helpers, and update tests for lazy imports and get_cached_tdp. Co-authored-by: Cursor <cursoragent@cursor.com>
Provide harnesses to measure cold-start, throughput, and API latency during optimization so regressions can be caught and logged consistently. Co-authored-by: Cursor <cursoragent@cursor.com>
Remove local-only harnesses used during optimization; the library perf changes and their tests are sufficient for review without dev tooling. Co-authored-by: Cursor <cursoragent@cursor.com>
Codecov Report
❌ Patch coverage is …
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1246      +/-   ##
==========================================
- Coverage   89.17%   88.75%   -0.43%
==========================================
  Files          47       49       +2
  Lines        4510     4810     +300
==========================================
+ Hits         4022     4269     +247
- Misses        488      541      +53
Apply formatter/linter fixes, extract platform CPU backend selection to satisfy flake8 complexity, stabilize the force_cpu_power load test with a mocked cpu_percent, and add hardware_cache/monitor_main coverage tests. Co-authored-by: Cursor <cursoragent@cursor.com>
Avoid isinstance checks across module reload boundaries and mock AppleSiliconChip rebuild so powermetrics is not required on non-macOS runners. Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
This PR reduces CodeCarbon measurement launch latency and improves concurrent-run throughput while preserving existing behavior. Changes focus on deferring work until it is needed, caching hardware detection within a process, and slimming import paths — no benchmark tooling or optimization logs are included.
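As an illustration of the "defer work until it is needed" idea, here is a minimal sketch of a lazily imported module. The `LazyModule` class is hypothetical and is not taken from this PR; `statistics` merely stands in for a heavy dependency.

```python
import importlib


class LazyModule:
    """Import a module the first time one of its attributes is accessed."""

    def __init__(self, name):
        self._name = name
        self._module = None

    @property
    def loaded(self):
        return self._module is not None

    def __getattr__(self, attr):
        # Reached only for names not found on the wrapper itself,
        # i.e. attributes that belong to the wrapped module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# "statistics" is a stand-in for a heavy dependency (assumption for the demo).
stats = LazyModule("statistics")
print(stats.loaded)            # nothing has been imported by the wrapper yet
print(stats.mean([1, 2, 3]))   # first attribute access triggers the import
print(stats.loaded)
```

The trade-off is that import errors surface at first use rather than at startup, so this pattern suits optional or rarely used dependencies best.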
Performance results — cold launch (offline Mac ARM)
Table rows (timings not preserved): __init__; start() (cold); codecarbon-monitor … sleep 2
Cold-path numbers are for the first tracker in a fresh process; warm-path numbers reuse cached hardware within the same process.
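The cold versus warm distinction can be reproduced with a small per-process cache. This is only a sketch: `detect_hardware` is a hypothetical stand-in with a simulated delay, and the PR's actual cache in `codecarbon/core/hardware_cache.py` may work quite differently.

```python
import functools
import time


@functools.lru_cache(maxsize=None)
def detect_hardware():
    """Hypothetical slow probe: runs once per process, then is served from cache."""
    time.sleep(0.05)  # stands in for real CPU/GPU/RAM detection
    return {"cpu": "unknown", "gpu_count": 0}


def timed(fn):
    """Call fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


_, cold = timed(detect_hardware)  # first call pays the probe cost
_, warm = timed(detect_hardware)  # second call is a cache hit
print(f"cold={cold * 1000:.1f} ms  warm={warm * 1000:.3f} ms")
```

Note that the cached dictionary is shared between callers, so code following this pattern should treat the result as read-only or copy it.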
Performance results — run throughput (offline, warm, same process)
Repeated OfflineEmissionsTracker(output_methods=[]) lifecycles (init → start → stop) in one Python process (table values not preserved).
Before = hardware-cache milestone (2026-06-17); after = final warm-lifecycle + connection-reuse optimizations. Parallel benchmark: 8 worker threads, 30 s sustained load.
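A parallel throughput measurement of this shape could be sketched as below. The harness is hypothetical (the PR's benchmark scripts are not included), and `lifecycle` is a placeholder workload rather than a real tracker; the 8 workers mirror the figure quoted above, while the duration is shortened for the demo.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def lifecycle():
    """Placeholder for one init -> start -> stop cycle (assumed workload)."""
    time.sleep(0.001)


def measure_throughput(work, workers=8, duration_s=0.2):
    """Run `work` repeatedly in `workers` threads for about `duration_s` seconds.

    Returns the number of completed calls per second across all threads.
    """
    deadline = time.perf_counter() + duration_s

    def loop():
        done = 0
        while time.perf_counter() < deadline:
            work()
            done += 1
        return done

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(loop) for _ in range(workers)]
        total = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - start
    return total / elapsed


print(f"{measure_throughput(lifecycle):.0f} lifecycles/s")
```

To approximate the setup described above, one would pass a function that creates, starts, and stops a tracker, and set `duration_s=30`.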
API write-path improvements
Client and server changes reduce overhead on the hot path (POST /runs, POST /emissions):
- requests.Session per API base URL (ApiClient, HTTPOutput)
- POST /runs: deferred until first emission upload
- … sleep 5; skip alembic when already at head
- … create_all when core tables already exist

Local ponytail load tests (POST /emissions at rising concurrency) require a running carbonserver + Postgres stack; production health baseline is ~12 req/s (GET /). The run-throughput table above is the primary quantified gain for high-frequency tracker use.

What changed (functional only)
Tracker lifecycle
- Deferred heavy imports (Emissions, ResourceTracker, output handlers, geography) in emissions_tracker.py
- … output_methods=[]
- … stop()/flush() when a sample was just taken
- cpu_percent prime once per process (non-blocking reads afterward)

Hardware detection
- Per-process hardware cache (codecarbon/core/hardware_cache.py): CPU/GPU/RAM detection reused across tracker instances
- Apple Silicon: try cpu_load before the powermetrics sudo probe; fall back to powermetrics when cpu_load is unavailable
- … cpu.TDP() lookup, and global detect_cpu_model() cache

Imports & I/O
- … input.py and core/emissions.py
- … DataSource in cpu.py; cache PowerGadget/powermetrics availability probes
- … __init__.py

CLI
- … cli/main.py for auth/API commands
- Lightweight codecarbon-monitor entry point (cli/monitor_main.py) for monitor -- <command> without pulling auth/questionary
- Lazy tracker imports in run_and_monitor; strip leaked subcommand/-- tokens from ctx.args

API client & server
- requests.Session per API base URL for client + HTTP output
- … create_all when tables exist; Docker entrypoint skips alembic when already at head

Test plan
- CODECARBON_ALLOW_MULTIPLE_RUNS=True pytest --ignore=tests/test_viz_data.py -m 'not integ_test' tests/ (508 passed)
- pytest tests/test_emissions_tracker.py tests/test_resource_tracker.py tests/test_gpu.py tests/test_input.py tests/test_powermetrics.py tests/test_core_util.py (106 passed)
- codecarbon monitor --offline --country-iso-code FRA -- sleep 1
- codecarbon-monitor monitor --offline --country-iso-code FRA -- sleep 1

Notes
Benchmark scripts (scripts/benchmark_*.py, scripts/optimization_log.py, scripts/profile_optimization.py) were not included in this PR; it contains only production code and test updates. Throughput numbers above were captured during development on offline Mac ARM (2026-06-17).

Made with Cursor