Skip to content

Add instrumentation for memory profiling in Python SDK#38853

Open
tvalentyn wants to merge 23 commits into
apache:masterfrom
tvalentyn:profiler_rebase
Open

Add instrumentation for memory profiling in Python SDK#38853
tvalentyn wants to merge 23 commits into
apache:masterfrom
tvalentyn:profiler_rebase

Conversation

@tvalentyn

@tvalentyn tvalentyn commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

This change adds instrumentation launch Python SDK Harness with custom profiling agents and adds memory Profiling capabilities with off-the-shelf profilers.

Design: https://s.apache.org/beam-python-memory-profiling

Requires these dependencies in the runtime environment (included in Beam default containers):

To enable memory profiling with Memray:

python pipeline.py  --runner=DataflowRunner   --profiler_agent=memray

By default, binary profiles are all uploaded to the <TEMP_LOCATION>/profiles and we are attempting to do basic profile postprocessing on the worker by creating memray flamegraphs.

Additional optional params:

  • To use native mode (for profiling extensions) or other memray run options, pass them like so: --profiler_extra_arg="--native". For more information, see https://bloomberg.github.io/memray/run.html.
  • To use aggregated mode, cap profiler duration (sdk process will be terminated+restarted, in-flight bundles will fail and be restarted on the runner): --profiler_extra_arg="--aggregate" --profiler_stop_after_sec=600
  • To use custom location for saving profiles: --profile_location=gs://your-bucket/profiles
  • To configure profile uploading and post-processing intervals: --profile_upload_interval_sec=60 --profile_postprocess_interval_sec=60

Other considerations:

  • Profiles take up disk space. Provision your workers to include more disk space.
  • Postprocessing competes for memory and CPU cycles. To disable, pass --profile_postprocess_interval_sec=0. You will then need to post-process/analyze profiles later, possibly using the same or equivalent environment.
  • Profilers may add some instability. To stop profiling after a first process crash, pass: --profiler_stop_after_crash .

To enable memory profiling with TCMalloc:

python pipeline.py --runner=DataflowRunner --profiler_agent=tcmalloc

To supply additional envirionment variables that configure tcmalloc, pass them individually with this option: --profiler_extra_env_vars="HEAP_PROFILE_TIME_INTERVAL=600". For more information, see: https://gperftools.github.io/gperftools/heapprofile.html

To upload profiles using GCS FUSE instead of gcloud (Dataflow runner only):

python pipeline.py --runner=DataflowRunner --profiler_agent=memray --experiments="gcsfuse_buckets=your-bucket-name:rw" --profile_temp_location=/var/opt/google/gcs/your-bucket-name/profiles --profile_upload_interval_sec=0 . Note: this still requires additional local disk space for storing profiles since Memray uses out-of-order writes, but allows to upload profiles without requiring gcloud cli.

fixes: #20298

@codecov

codecov Bot commented Jun 9, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 255 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.71%. Comparing base (d9117f4) to head (d1f3b01).
⚠️ Report is 11 commits behind head on master.

Files with missing lines Patch % Lines
sdks/python/container/profiler.go 0.00% 199 Missing ⚠️
sdks/python/container/boot.go 0.00% 56 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #38853      +/-   ##
============================================
- Coverage     57.77%   57.71%   -0.06%     
  Complexity    12969    12969              
============================================
  Files          2509     2510       +1     
  Lines        260525   260768     +243     
  Branches      10658    10658              
============================================
- Hits         150516   150507       -9     
- Misses       104318   104570     +252     
  Partials       5691     5691              
Flag Coverage Δ
go 28.64% <0.00%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@tvalentyn

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces profiling capabilities to the Apache Beam Python SDK worker harness, adding pipeline options to configure profiling agents (like memray and tcmalloc) and implementing background tasks in the Go bootloader to manage profiling, GCS uploads, and post-processing. Feedback on these changes highlights several critical improvements: replacing a potentially failing syscall.Kill with Go's standard process signaling, resolving an ARM64 compatibility issue by avoiding hardcoded absolute paths for tcmalloc, executing memray post-processing sequentially to prevent worker OOMs, implementing a cleanup mechanism for local profile files to avoid disk exhaustion, and fixing a goroutine/ticker leak by handling context cancellation.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread sdks/python/container/boot.go
Comment thread sdks/python/container/profiler.go
Comment thread sdks/python/container/profiler.go Outdated
Comment thread sdks/python/container/profiler.go
Comment thread sdks/python/container/profiler.go Outdated
@tvalentyn tvalentyn changed the title Add memory profiler for Python Add instrumentation for memory profiling for Python SDK Jun 10, 2026
@tvalentyn tvalentyn changed the title Add instrumentation for memory profiling for Python SDK Add instrumentation for memory profiling in Python SDK Jun 10, 2026
@tvalentyn tvalentyn marked this pull request as ready for review June 10, 2026 02:09
@gemini-code-assist

Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@tvalentyn

Copy link
Copy Markdown
Contributor Author

R: @jrmccluskey

@tvalentyn tvalentyn assigned tvalentyn and jrmccluskey and unassigned tvalentyn Jun 10, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

@jrmccluskey jrmccluskey left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few random questions and go things, the core logic makes sense though

Comment thread sdks/python/container/profiler.go Outdated
Comment thread sdks/python/container/profiler.go Outdated
Comment thread sdks/python/container/boot.go Outdated

if err := cmd.Wait(); err != nil {
var timer *time.Timer
var profilingTimedOut atomic.Bool

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically you'd want to define an atomic value outside of the scope of the goroutine so it is shared amongst the goroutines. Defined within the goroutine function I would suspect each one would have its own copy, which makes the atomic unnecessary as best I can tell. I think it comes down to the behavior you want here: do you want each goroutine to individually stop profiling on a timeout, or would you want one timeout to cause all of the goroutines to stop profiling?

@tvalentyn tvalentyn Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the ai review flagged this to me out of concern that the timer coroutine modifies the value concurrently.

@tvalentyn

Copy link
Copy Markdown
Contributor Author

thanks for a quick review! PTAL.

…y value later anyway. Users can pass --profile_upload_interval_sec=0 to disable uploads.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve memory profiling for users of Portable Beam Python

2 participants