Skip to content

fix: exclude paused sections from instrumentation measurement#44

Open
not-matthias wants to merge 4 commits into
mainfrom
cod-2033-codspeed-cpp-includes-setupteardown-in-instrumentation-mode
Open

fix: exclude paused sections from instrumentation measurement#44
not-matthias wants to merge 4 commits into
mainfrom
cod-2033-codspeed-cpp-includes-setupteardown-in-instrumentation-mode

Conversation

@not-matthias
Copy link
Copy Markdown
Member

PauseTiming/ResumeTiming now stop/start Callgrind instrumentation in
CODSPEED_ANALYSIS mode, ensuring setup/teardown code is not included
in benchmark measurements.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Mar 13, 2026

Merging this PR will improve performance by ×2.9

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 24 improved benchmarks
❌ 14 regressed benchmarks
✅ 424 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation BM_Capture_int_string[int_string_test] 60.8 ns 90 ns -32.41%
Simulation BM_Capture[int_string_test] 61.4 ns 90.6 ns -32.21%
Simulation BM_Capture[int_test] 61.4 ns 90.6 ns -32.21%
Simulation BM_Capture_int[int_test] 61.4 ns 90.6 ns -32.21%
Simulation BM_Template1_Capture[int_string_test] 61.4 ns 90.6 ns -32.21%
Simulation BM_Template1_Capture[two_type_test, int, double] 61.4 ns 90.6 ns -32.21%
Simulation BM_Template1[int] 61.7 ns 90.8 ns -32.11%
Simulation BM_Template2[int, double] 61.9 ns 91.1 ns -32.01%
Simulation BM_custom_args[1000] 92.2 ns 121.4 ns -24.03%
Simulation BM_custom_args[100] 92.2 ns 121.4 ns -24.03%
Simulation BM_with_args[1000] 92.2 ns 121.4 ns -24.03%
Simulation BM_with_args[100] 92.2 ns 121.4 ns -24.03%
WallTime BM_memcpy[8192] 472.7 ns 573.4 ns -17.57%
WallTime BM_RLE_Decode[10000] 21.7 µs 24.2 µs -10.28%
Simulation BM_large_setup_teardown 13,179,436.1 ns 551.9 ns ×24,000
Simulation BM_large_setup 13,179,310.3 ns 576.7 ns ×23,000
Simulation BM_large_setup 10,561,210.6 ns 607.5 ns ×17,000
Simulation BM_large_setup_teardown 10,561,302.5 ns 612.5 ns ×17,000
Simulation BM_Capture[int_string_test] 89.7 ns 60.6 ns +48.17%
Simulation BM_Template1_Capture[int_string_test] 89.7 ns 60.6 ns +48.17%
... ... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing cod-2033-codspeed-cpp-includes-setupteardown-in-instrumentation-mode (4dd226e) with main (f5a917f)

Open in CodSpeed

@avalanche-staging
Copy link
Copy Markdown

avalanche-staging Bot commented Mar 16, 2026

Congrats! CodSpeed is installed 🎉

🆕 468 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks


ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Open in CodSpeed

@not-matthias not-matthias force-pushed the cod-2033-codspeed-cpp-includes-setupteardown-in-instrumentation-mode branch from 5b52b9e to 646ed4c Compare March 16, 2026 18:17
PauseTiming/ResumeTiming now stop/start Callgrind instrumentation in
CODSPEED_ANALYSIS mode, ensuring setup/teardown code is not included
in benchmark measurements.
@not-matthias not-matthias force-pushed the cod-2033-codspeed-cpp-includes-setupteardown-in-instrumentation-mode branch from 646ed4c to 696eaa1 Compare March 18, 2026 09:52
@not-matthias not-matthias marked this pull request as ready for review March 18, 2026 09:52
Replace the per-pause CALLGRIND_STOP/START_INSTRUMENTATION calls with
CALLGRIND_TOGGLE_COLLECT. Toggling instrumentation flushes callgrind's
simulated cache, so every benchmark measured an artificial cold-cache
warmup phase and the cache-warming repetition in RunAnalysis was wasted,
regressing 76 benchmarks. Toggling collection excludes paused sections
without touching simulator state.

Gate the toggles to explicit user PauseTiming()/ResumeTiming() calls:
TOGGLE_COLLECT is parity-based and collection starts enabled, so the
implicit StartKeepRunning()/FinishKeepRunning() bracket must not toggle
or it would invert the collection state for the whole run.

Bump instrument-hooks for the new callgrind_toggle_collect helper.

Fixes COD-2033
…-includes-setupteardown-in-instrumentation-mode

# Conflicts:
#	core/CMakeLists.txt
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Jun 5, 2026

Greptile Summary

This PR refines the CODSPEED_ANALYSIS (Callgrind) instrumentation strategy by replacing the previous codspeed_in_benchmark_loop_ guard with a steady-state toggle model using CodSpeedCollectInit and CALLGRIND_TOGGLE_COLLECT.

  • benchmark.cc introduces the CODSPEED_TOGGLE_COLLECT() macro and the CodSpeedCollectInit static-init sentinel that establishes "collection OFF" as the out-of-loop steady state before main() runs.
  • PauseTiming(), ResumeTiming(), SkipWithMessage(), and SkipWithError() are updated to toggle the Callgrind collection state, keeping setup/teardown and harness code outside measured sections without flushing the simulated CPU cache.
  • core/instrument-hooks submodule is bumped from b9ddb5b to b6c4a75.

Confidence Score: 4/5

Safe to merge for standard benchmarks; benchmarks ending their last iteration with a user PauseTiming() leave the toggle in an inverted state, causing the next benchmark to measure zero instructions silently.

The steady-state toggle design is correct for the common no-PauseTiming case, but the toggle-parity problem from the prior review round remains unresolved in the new implementation.

google_benchmark/src/benchmark.cc — specifically the interaction between FinishKeepRunning()'s internal PauseTiming() and any user-issued PauseTiming() that ends the last loop iteration.

Important Files Changed

Filename Overview
google_benchmark/src/benchmark.cc Core change: adds CodSpeedCollectInit static object and CODSPEED_TOGGLE_COLLECT macro to manage Callgrind collection state; measurement.hpp is now included redundantly for CODSPEED_ANALYSIS.
core/CMakeLists.txt Bumps instrument-hooks GIT_TAG from b9ddb5b to b6c4a75; no logic change.
core/instrument-hooks Submodule pointer advanced to b6c4a75, matching the CMakeLists.txt GIT_TAG bump.

Reviews (2): Last reviewed commit: "ref: exclude benchmark harness from meas..." | Re-trigger Greptile

Comment thread google_benchmark/src/benchmark.cc Outdated
Comment thread google_benchmark/include/benchmark/benchmark.h Outdated
…-state

Toggle callgrind collection off once at process start and make the
PauseTiming()/ResumeTiming() toggles unconditional. Collection is now
only enabled inside the benchmark loop, so State setup, timer reads and
instrument-hooks zero/dump requests no longer appear in the measurement,
and the codspeed_in_benchmark_loop_ gating flag is no longer needed.

The toggle is inlined via CALLGRIND_TOGGLE_COLLECT directly (instead of
calling the instrument-hooks wrapper) so no toggle frame shows up in
flamegraphs; the counted boundary shrinks to the ResumeTiming() epilogue
(~6 instructions). SkipWithMessage/SkipWithError restore the toggle
parity when a benchmark is skipped mid-loop without pausing.

Refs COD-2033
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants