Skip to content

fix: [SDK-4794] prewarm dispatchers at cold-start entry points#2664

Merged
fadi-george merged 6 commits into
mainfrom
fix/SDK-4794-prewarm-entry-points
Jun 23, 2026
Merged

fix: [SDK-4794] prewarm dispatchers at cold-start entry points#2664
fadi-george merged 6 commits into
mainfrom
fix/SDK-4794-prewarm-entry-points

Conversation

@abdulraqeeb33

@abdulraqeeb33 abdulraqeeb33 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Summary

Call OneSignalDispatchers.prewarm() at every cold-start entry point so the IO/Default/SerialIO dispatcher + executor + scope construction happens on the low-priority OneSignal-prewarm daemon instead of on the thread that handles the wake-up. Entry points covered:

  • Receivers: FCMBroadcastReceiver, NotificationDismissReceiver, BootUpReceiver, UpgradeReceiver (before goAsync())
  • ADM: ADMMessageHandler, ADMMessageHandlerJob (each callback)
  • HMS: OneSignalHmsEventBridge.onNewToken / onMessageReceived
  • Notification-open trampolines: NotificationOpenedActivityBase, NotificationOpenedActivityHMS
  • Already present (unchanged): OneSignalImp.initWithContext / initWithContextSuspend, SyncJobService.onStartJob

prewarm() is idempotent and fire-and-forget, so the calls are cheap. The prewarm() KDoc is reworded to describe it as a best-effort head start, list the full entry-point table, and tell future contributors to prewarm any new entry point.

Why this approach over the two alternatives

Three options were considered for SDK-4507/SDK-4794 cold-start ANRs (the production OTel showed the first launchOnIO/launchOnSerialIO constructing the ThreadPoolExecutor synchronously on the main thread):

1. Wide — prewarm inside the ThreadUtils helpers (suspendifyOnIO, launchOnIO, …). One chokepoint, automatic for all background work.
Rejected: prewarm() is fire-and-forget, so a helper that prewarms and then immediately dispatches can still win the lazy-init race and construct the executor on the caller thread. It made the race more likely won but didn't actually fix it, and coupled every background dispatch to prewarm.

2. Durable — make the dispatcher defer cold construction off the caller thread (queue the first dispatch and replay it on the daemon so construction can never land on a caller thread, regardless of timing).
Rejected for now: it's a guaranteed fix, but it adds ~70 lines of lock/queue/CompletableJob concurrency code to the most safety-critical file in the SDK, with subtle serial-ordering / join() / cancellation semantics and a permanent review + maintenance cost. That is a large charge to close a gap we have not proven still fires.

3. Chosen — prewarm at the cold-start entry points. The expensive construction can only first hit a thread at a small, stable, enumerable set of doorways (the receivers / jobs / activities / bridges above); deep callers run after init on background threads. Prewarming at each doorway — where there is real lead time before the first dispatch (a goAsync() handoff or initWithContext work) — covers the actual production surface (the OTel ANRs were all at entry points) with ~10 one-line calls and zero new concurrency code.

Trade-off accepted: it is best-effort (a pathologically slow device could still lose the race), and a new entry point must remember to call prewarm() — both documented in the prewarm() KDoc. If telemetry later shows the race still firing, option #2 remains available as a follow-up.

Test plan

  • SyncJobServiceTestsonStartJob calls prewarm() before suspendifyOnIO
  • FCMBroadcastReceiverTests — prewarm fires for real FCM payloads, skipped for google.com/iid token-update intents
  • PrewarmEntryPointTestsBootUpReceiver / UpgradeReceiver / NotificationDismissReceiver prewarm before dispatch
  • OneSignalDispatchersTests — existing off-thread + idempotent prewarm coverage still passes
  • Note: ADMMessageHandler(Job) and OneSignalHmsEventBridge have no unit-test path (the Amazon/Huawei base classes aren't on the test classpath); those calls are verified by inspection.

Made with Cursor

@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

📊 Diff Coverage Report

Diff Coverage Report (Changed Lines Only)

Gate: aggregate coverage on changed executable lines must be ≥ 80% (JaCoCo line data for lines touched in the diff).

Changed Files Coverage

  • OneSignalDispatchers.kt: 11/17 touched executable lines (64.7%) (57 touched lines in diff)
    • 6 uncovered touched lines in this file
  • OneSignalImp.kt: 11/18 touched executable lines (61.1%) (44 touched lines in diff)
    • 7 uncovered touched lines in this file
  • NotificationOpenedActivityHMS.kt: 0/1 touched executable lines (0.0%) (4 touched lines in diff)
    • 1 uncovered touched lines in this file
  • NotificationOpenedActivityBase.kt: 0/1 touched executable lines (0.0%) (4 touched lines in diff)
    • 1 uncovered touched lines in this file
  • OneSignalHmsEventBridge.kt: 0/2 touched executable lines (0.0%) (5 touched lines in diff)
    • 2 uncovered touched lines in this file
  • BootUpReceiver.kt: 0/1 touched executable lines (0.0%) (5 touched lines in diff)
    • 1 uncovered touched lines in this file
  • FCMBroadcastReceiver.kt: 0/1 touched executable lines (0.0%) (6 touched lines in diff)
    • 1 uncovered touched lines in this file
  • NotificationDismissReceiver.kt: 0/1 touched executable lines (0.0%) (5 touched lines in diff)
    • 1 uncovered touched lines in this file
  • UpgradeReceiver.kt: 0/1 touched executable lines (0.0%) (5 touched lines in diff)
    • 1 uncovered touched lines in this file
  • ADMMessageHandler.kt: 0/3 touched executable lines (0.0%) (5 touched lines in diff)
    • 3 uncovered touched lines in this file
  • ADMMessageHandlerJob.kt: 0/3 touched executable lines (0.0%) (5 touched lines in diff)
    • 3 uncovered touched lines in this file
  • ⚠️ IOMockHelper.kt: Not in coverage report (may not be compiled/tested)

Overall (aggregate gate)

22/49 touched executable lines covered (44.9% — requires ≥ 80%)

Per-file detail (informational; gate is aggregate above):

  • OneSignalDispatchers.kt: 64.7% (6 uncovered touched lines)

  • OneSignalImp.kt: 61.1% (7 uncovered touched lines)

  • NotificationOpenedActivityHMS.kt: 0.0% (1 uncovered touched lines)

  • NotificationOpenedActivityBase.kt: 0.0% (1 uncovered touched lines)

  • OneSignalHmsEventBridge.kt: 0.0% (2 uncovered touched lines)

  • BootUpReceiver.kt: 0.0% (1 uncovered touched lines)

  • FCMBroadcastReceiver.kt: 0.0% (1 uncovered touched lines)

  • NotificationDismissReceiver.kt: 0.0% (1 uncovered touched lines)

  • UpgradeReceiver.kt: 0.0% (1 uncovered touched lines)

  • ADMMessageHandler.kt: 0.0% (3 uncovered touched lines)

  • ADMMessageHandlerJob.kt: 0.0% (3 uncovered touched lines)

❌ Coverage Check Failed

Aggregate coverage on touched lines is 44.9% (minimum 80%).

📥 View workflow run

fadi-george
fadi-george previously approved these changes Jun 22, 2026
@fadi-george

Copy link
Copy Markdown
Contributor

@claude review

@fadi-george fadi-george dismissed their stale review June 22, 2026 18:03

Taking a deeper look

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — mechanical extension of the existing idempotent prewarm() to the common ThreadUtils entry points.

Extended reasoning...

Overview

This PR adds a single OneSignalDispatchers.prewarm() call to six ThreadUtils entry points (suspendifyOnMain, suspendifyOnSerialIO, suspendifyWithCompletion, suspendifyWithErrorHandling, launchOnIO, launchOnDefault) and one explicit call at FCMBroadcastReceiver.onReceive before goAsync(). KDoc on prewarm() is updated to document the explicit early callsites. Tests are added/updated in ThreadUtilsFeatureFlagTests, SyncJobServiceTests, and a new FCMBroadcastReceiverTests.

Security risks

None. This is a performance/startup optimization that shifts dispatcher cold-init off the main thread. No auth, crypto, permissions, IPC parsing, or data exposure surface is touched. The FCM receiver continues to short-circuit on google.com/iid token-update intents before the prewarm call, matching prior behavior.

Level of scrutiny

Low. Although the file touches threading infrastructure (which would normally warrant high scrutiny), each individual edit is a one-line invocation of a function that already exists, is documented and verified idempotent (@Volatile prewarmStarted guard + double-checked lock), is fire-and-forget (work runs on a separate OneSignal-prewarm daemon thread at NORM_PRIORITY - 2), and swallows failures with a logged fallback. The behavioral change is just "warm dispatchers a bit earlier in more code paths."

Other factors

  • The only bug surfaced is a test-consistency Nit about pre-existing BackgroundManagerTests not stubbing the new prewarm() call on mockkObject(OneSignalDispatchers) — the inline comment notes the race is impractical and CI passes; not blocking.
  • The diff-coverage gate failed at 57.1% (FF-off branches of the new ThreadUtils sites aren't directly verified, and the FCM-receiver explicit prewarm shares behavior with the ThreadUtils prewarm it duplicates by design). The uncovered lines are low-risk one-liners.
  • New test coverage is solid for the FF-on contract on each entry point and for the FCM receiver's filter behavior on token-update intents.

@fadi-george

Copy link
Copy Markdown
Contributor

Thanks for chasing this. I think there’s one core issue to address before merge: the new ThreadUtils-level prewarm() calls are immediately followed by OneSignalDispatchers.launchOn*, which touches the same lazy dispatcher/scope chain that prewarm() is trying to initialize on the daemon thread.

Since prewarm() is fire-and-forget, the caller can still win the lazy-init race, or block behind the prewarm thread’s synchronized lazy initialization. In those cases the first cold suspendifyOnIO / suspendifyOnSerialIO caller can still pay the dispatcher construction cost on the main thread, so this may not reliably fix the cold-start ANR path.

Can we either:

  • keep prewarm only at genuinely early call sites where there is meaningful lead time before first dispatch, or
  • change the implementation so first dispatch cannot initialize the lazy scopes on the caller thread?

I’d also like to see coverage or trace evidence that the first cold helper call no longer initializes IOScope / DefaultScope / SerialIOScope on the caller thread.

Small follow-ups:

  • Remove the SDK-4794 reference from source KDoc.
  • Clarify the FCM test/comment, since IOMockHelper mocks suspendifyOnIO; the test verifies the explicit receiver prewarm call, not the real ThreadUtils prewarm path or end-to-end cold-init behavior.

@nan-li nan-li self-requested a review June 23, 2026 02:39
@abdulraqeeb33 abdulraqeeb33 changed the title fix: [SDK-4794] prewarm dispatchers from ThreadUtils entry points fix: [SDK-4794] prewarm dispatchers at cold-start entry points Jun 23, 2026
Production OTel showed the first launchOnIO/launchOnSerialIO on a cold start
constructing the ThreadPoolExecutor + dispatcher + scope synchronously on the
main thread (Activity trampoline, JobService.onStartJob, FCM goAsync), blocking
it for seconds (SDK-4507 family). prewarm() shifts that construction onto the
low-priority OneSignal-prewarm daemon.

Call prewarm() at the cold-start entry points so the daemon gets a head start
before the first dispatch:

- receivers: FCMBroadcastReceiver, NotificationDismissReceiver, BootUpReceiver,
  UpgradeReceiver (before goAsync())
- ADM: ADMMessageHandler, ADMMessageHandlerJob
- HMS: OneSignalHmsEventBridge.onNewToken / onMessageReceived
- notification-open trampolines: NotificationOpenedActivityBase,
  NotificationOpenedActivityHMS

These join the existing prewarm() calls in OneSignalImp.initWithContext* and
SyncJobService.onStartJob. prewarm() is idempotent and fire-and-forget, so the
calls are cheap. It is a best-effort head start, not a hard guarantee: a caller
that dispatches immediately can still win the lazy-init race, so prewarm is
placed where there is real lead time before the first dispatch.

- reword prewarm() KDoc as a best-effort head start with the entry-point table
- stub OneSignalDispatchers.prewarm() in IOMockHelper so mockkObject specs don't
  spawn the daemon
- add PrewarmEntryPointTests, SyncJobServiceTests and FCMBroadcastReceiverTests
  coverage for the explicit prewarm calls

Co-authored-by: Cursor <cursoragent@cursor.com>
@abdulraqeeb33 abdulraqeeb33 force-pushed the fix/SDK-4794-prewarm-entry-points branch from d3bbefd to c51092c Compare June 23, 2026 15:15
@fadi-george

Copy link
Copy Markdown
Contributor

A few things to fix before merge:

  • Tests only verify prewarm() was called, not that it happened before the first dispatch. Please assert ordering.
  • afterAny { unmockkAll() } can tear down IOMockHelper’s spec-scoped mocks after the first test. Clean up only test-owned mocks or move cleanup to spec teardown.
  • prewarm() catches failures inside the spawned thread, but not Thread(...), priority assignment, or start() failures. Please wrap startup too.
  • Minor: remove internal ticket IDs from source/test comments and names.

…p guard)

Addresses fadi-george's pre-merge review on the prewarm cold-start PR:

- prewarm(): wrap Thread construction, priority assignment, and start() in a
  try/catch (Throwable) so a daemon-startup failure (e.g. OOM "unable to create
  new native thread", SecurityManager, InternalError) can never propagate onto a
  cold-start entry point's caller thread. Reset the started guard on failure so a
  later entry point can retry; dispatchers keep their lazy fallbacks. Also widen
  the in-thread catch to Throwable.
- Tests now assert ordering, not just that prewarm() was called: prewarm() must
  run before the first suspendifyOnIO dispatch (verifyOrder) in
  PrewarmEntryPointTests, FCMBroadcastReceiverTests, and SyncJobServiceTests.
- Replace afterAny { unmockkAll() } with teardown of only the spec-owned mock
  (unmockkObject(OneSignal)). unmockkAll() was stripping IOMockHelper's
  spec-scoped static/object mocks after the first test. Per-test prewarm() call
  counts are isolated via clearMocks(OneSignalDispatchers, answers = false).
- Remove internal ticket IDs from source/test comments and test names.
- Reword the stale IOMockHelper comment that claimed ThreadUtils helpers call
  prewarm() (the wide approach was rejected; only entry points call it).

Co-authored-by: Cursor <cursoragent@cursor.com>
@abdulraqeeb33 abdulraqeeb33 requested a review from sherwinski June 23, 2026 18:25
AR Abdul Azeez and others added 2 commits June 24, 2026 00:14
processIntent() now calls OneSignalDispatchers.prewarm() at this entry point.
The Robolectric spec drives setup() on the test thread without IOMockHelper, so
the real prewarm() ran — spawning the OneSignal-prewarm daemon and lazy-initing
the IO/Default/SerialIO executors + scopes, which persist across the JVM for
later test classes. Register listener(IOMockHelper) so prewarm() is stubbed to a
no-op, matching the pattern already used in FCMBroadcastReceiverTests and
PrewarmEntryPointTests.

Co-authored-by: Cursor <cursoragent@cursor.com>
@fadi-george

Copy link
Copy Markdown
Contributor

A few remaining things to fix/consider:

  • OneSignalDispatchers.prewarm() now catches Throwable, but the new catches are unsuppressed and likely trip detekt’s TooGenericExceptionCaught. Please add the suppression or update the baseline.
  • If the daemon starts but lazy construction fails inside the daemon body, prewarmStarted stays true, so later entry points can’t retry. Please reset the guard there too, or track started/completed separately.
  • We now test ordering for receivers and SyncJobService, but not activities/HMS/ADM. At least activity prewarm() before suspendifyOnDefault coverage seems worth adding.
  • The KDoc should mention that suspendifyOnIO/Default bypass OneSignalDispatchers while ThreadingMode.useBackgroundThreading is false, so prewarm is limited in that state.

Address the 12 weighted issues failing :OneSignal:core:detekt:
- Add KDoc to IFeatureManager/isEnabled, FeatureFlag.isEnabledIn, and the
  public JwtTokenStore listener/getter functions (UndocumentedPublicClass/Function).
- Extract notInitializedMessage() and awaitInitCompletion() from
  waitUntilInitInternal() to get under the LongMethod threshold.
- Suppress TooGenericExceptionCaught on prewarm(), where catching Throwable is
  intentional so a prewarm failure never propagates onto a cold-start caller.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@fadi-george fadi-george merged commit 860b08b into main Jun 23, 2026
5 of 6 checks passed
@fadi-george fadi-george deleted the fix/SDK-4794-prewarm-entry-points branch June 23, 2026 20:26
abdulraqeeb33 added a commit that referenced this pull request Jun 26, 2026
Co-authored-by: AR Abdul Azeez <abdul@onesignal.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Fadi George <fadii925@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants