Skip to content

fix: resolve TOCTOU race on the current user in user executor callbacks#1679

Draft
nan-li wants to merge 2 commits into
mainfrom
nan/sdk-4818
Draft

fix: resolve TOCTOU race on the current user in user executor callbacks#1679
nan-li wants to merge 2 commits into
mainfrom
nan/sdk-4818

Conversation

@nan-li

@nan-li nan-li commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Description

One Line Summary

Read _user once and confirm it still owns the request before mutating it (or logging out) in user executor network-response callbacks, closing a check-then-deref race.

Details

Motivation

Across the user executors, a network-response callback calls OneSignalUserManagerImpl.isCurrentUser(...) and then separately dereferences/mutates _user?.<model> (or calls _logout()).

  • isCurrentUser(_ identityModel:) checks the identity model store; it says nothing about _user.
  • _user is a bare, unsynchronized var, reassigned by start(), _logout() (_user = nil), and setNewInternalUser() — all reachable via login/logout/identify.

Because the check is not atomic with the later _user access, a concurrent login/logout can swap _user inside that window, so the callback applies one user's data/action to a different user — cross-user contamination, or an erroneous logout / data loss. This is the same class of bug as the recently merged stale-FetchUser fix (#1672), which added an isCurrentUser guard that is itself still check-then-deref.

Approach

Add a single atomic accessor to OneSignalUserManagerImpl that eliminates the check-then-deref idiom by capturing _user once and confirming it owns the request, then operating on that captured instance:

  • currentUser(matching modelId:) -> OSUserInternal? — returns the current user only if _user.identityModel.modelId == modelId, else nil.
  • withCurrentUser(matching modelId:, _ body:) — runs body with that captured instance, or no-ops.

No lock is held across the closure body: the model mutations there (hydrate, clearData) fire model-store listeners into the operation repo, so holding a lock across them risks deadlock. Capturing the single _user read into a local is the tactical fix (mirrors the capture-once + compare-modelId pattern).

Sites routed through the accessor:

  • HIGH OSUserExecutor.parseFetchUserResponse — properties + email/SMS hydration (the shared contamination point reached from both the fetch-user and create-user paths).
  • HIGH OSUserExecutor.executeFetchUserRequest success — clearUserData(user) + re-hydrate now operate on one captured user (clearUserData(_:) overload added).
  • MEDIUM _logout() failure/.missing branches: OSUserExecutor.executeFetchUserRequest, OSUserExecutor.executeIdentifyUserRequest, OSPropertyOperationExecutor, OSSubscriptionOperationExecutor, OSIdentityOperationExecutor — gated on currentUser(matching:) != nil immediately before logout.

Scope

  • Behavior is unchanged on the happy path; the guards only change which user a callback acts on when a concurrent user swap has occurred (it now correctly no-ops instead of touching the wrong user).
  • Clearly-safe sites are left alone: create/identify success paths that only enqueue fetchUser, and the identify-conflict branch that already snapshots _user.
  • No public API changes — the new accessors are internal.

Testing

Unit testing

Added deterministic unit tests in OneSignalUserTests for the accessor: the closure runs on the captured current user when the model ID matches, and is skipped (the accessor returns nil) when there is no current user or when a concurrent swap has changed the current user's model ID.

Manual testing

Not done on-device: this is an internal concurrency fix and the race is inherently nondeterministic to reproduce on a device. It is covered by the new unit tests and by the existing user-executor/switch-user/concurrency suites, which continue to pass.

Ran xcodebuild test -scheme OneSignalUserTests on the iPhone 17 Pro simulator. The new tests and the rest of the suite pass. Two pre-existing, timing-based flakes (testBasicCombiningUserUpdateDeltas_resultsInOneRequest and OneSignalUserObjcTests.testSendPurchases, both onlyOneRequest-after-a-fixed-wait assertions) intermittently fail under parallel-clone resource pressure and a simulator launch hiccup, but pass deterministically when re-run in isolation; neither touches the changed behavior.

Affected code checklist

  • Notifications
    • Display
    • Open
    • Push Processing
    • Confirm Deliveries
  • Outcomes
  • Sessions
  • In-App Messaging
  • REST API requests
  • Public API changes

Checklist

Overview

  • I have filled out all REQUIRED sections above
  • PR does one thing
  • Any Public API changes are explained in the PR details and conform to existing APIs (there are none; the new accessors are internal)

Testing

  • I have included test coverage for these changes, or explained why they are not needed
  • All automated tests pass, or I explained why that is not possible (two pre-existing timing flakes noted above)
  • I have personally tested this on my device, or explained why that is not possible (see Manual testing)

Final pass

  • Code is as readable as possible.
  • I have reviewed this PR myself, ensuring it meets each checklist item

Follow-up identified while reviewing #1678.

Made with Cursor

nan-li and others added 2 commits June 25, 2026 17:12
Network-response callbacks across the user executors checked the current user
via isCurrentUser(...) and then separately dereferenced _user (or called
_logout()). A concurrent login/logout could swap _user in that window, so the
callback applied one user's data or action to a different user (cross-user
contamination or an erroneous logout / data loss).

Add withCurrentUser(matching:) and currentUser(matching:) to
OneSignalUserManagerImpl, which read _user once and confirm it still owns the
request's identity model before operating on that single captured instance.
Route through them:
- parseFetchUserResponse properties + email/SMS hydration (the shared fetch and
  create-user contamination point)
- the clearUserData + re-hydrate in the FetchUser success path
- the _logout() failure paths in FetchUser, IdentifyUser, and the property,
  subscription, and identity update executors

No lock is held across the closure: the model mutations there fire model-store
listeners into the operation repo, so holding one could deadlock.

Co-authored-by: Cursor <cursoragent@cursor.com>
Add deterministic unit tests for withCurrentUser/currentUser: the closure runs
on the captured current user when the model ID matches, and is skipped (the
accessor returns nil) when there is no current user or a concurrent swap has
changed the current user's model ID.

Co-authored-by: Cursor <cursoragent@cursor.com>
@nan-li nan-li marked this pull request as draft June 26, 2026 00:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant