fix: resolve TOCTOU race on the current user in user executor callbacks#1679
Draft
nan-li wants to merge 2 commits into
Draft
fix: resolve TOCTOU race on the current user in user executor callbacks#1679nan-li wants to merge 2 commits into
nan-li wants to merge 2 commits into
Conversation
Network-response callbacks across the user executors checked the current user via isCurrentUser(...) and then separately dereferenced _user (or called _logout()). A concurrent login/logout could swap _user in that window, so the callback applied one user's data or action to a different user (cross-user contamination or an erroneous logout / data loss). Add withCurrentUser(matching:) and currentUser(matching:) to OneSignalUserManagerImpl, which read _user once and confirm it still owns the request's identity model before operating on that single captured instance. Route through them: - parseFetchUserResponse properties + email/SMS hydration (the shared fetch and create-user contamination point) - the clearUserData + re-hydrate in the FetchUser success path - the _logout() failure paths in FetchUser, IdentifyUser, and the property, subscription, and identity update executors No lock is held across the closure: the model mutations there fire model-store listeners into the operation repo, so holding one could deadlock. Co-authored-by: Cursor <cursoragent@cursor.com>
Add deterministic unit tests for withCurrentUser/currentUser: the closure runs on the captured current user when the model ID matches, and is skipped (the accessor returns nil) when there is no current user or a concurrent swap has changed the current user's model ID. Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
One Line Summary
Read
_useronce and confirm it still owns the request before mutating it (or logging out) in user executor network-response callbacks, closing a check-then-deref race.Details
Motivation
Across the user executors, a network-response callback calls
OneSignalUserManagerImpl.isCurrentUser(...)and then separately dereferences/mutates_user?.<model>(or calls_logout()).isCurrentUser(_ identityModel:)checks the identity model store; it says nothing about_user._useris a bare, unsynchronizedvar, reassigned bystart(),_logout()(_user = nil), andsetNewInternalUser()— all reachable vialogin/logout/identify.Because the check is not atomic with the later
_useraccess, a concurrent login/logout can swap_userinside that window, so the callback applies one user's data/action to a different user — cross-user contamination, or an erroneous logout / data loss. This is the same class of bug as the recently merged stale-FetchUser fix (#1672), which added anisCurrentUserguard that is itself still check-then-deref.Approach
Add a single atomic accessor to
OneSignalUserManagerImplthat eliminates the check-then-deref idiom by capturing_useronce and confirming it owns the request, then operating on that captured instance:currentUser(matching modelId:) -> OSUserInternal?— returns the current user only if_user.identityModel.modelId == modelId, elsenil.withCurrentUser(matching modelId:, _ body:)— runsbodywith that captured instance, or no-ops.No lock is held across the closure body: the model mutations there (
hydrate,clearData) fire model-store listeners into the operation repo, so holding a lock across them risks deadlock. Capturing the single_userread into a local is the tactical fix (mirrors the capture-once + compare-modelIdpattern).Sites routed through the accessor:
OSUserExecutor.parseFetchUserResponse— properties + email/SMS hydration (the shared contamination point reached from both the fetch-user and create-user paths).OSUserExecutor.executeFetchUserRequestsuccess —clearUserData(user)+ re-hydrate now operate on one captured user (clearUserData(_:)overload added)._logout()failure/.missingbranches:OSUserExecutor.executeFetchUserRequest,OSUserExecutor.executeIdentifyUserRequest,OSPropertyOperationExecutor,OSSubscriptionOperationExecutor,OSIdentityOperationExecutor— gated oncurrentUser(matching:) != nilimmediately before logout.Scope
fetchUser, and the identify-conflict branch that already snapshots_user.Testing
Unit testing
Added deterministic unit tests in
OneSignalUserTestsfor the accessor: the closure runs on the captured current user when the model ID matches, and is skipped (the accessor returnsnil) when there is no current user or when a concurrent swap has changed the current user's model ID.Manual testing
Not done on-device: this is an internal concurrency fix and the race is inherently nondeterministic to reproduce on a device. It is covered by the new unit tests and by the existing user-executor/switch-user/concurrency suites, which continue to pass.
Ran
xcodebuild test -scheme OneSignalUserTestson the iPhone 17 Pro simulator. The new tests and the rest of the suite pass. Two pre-existing, timing-based flakes (testBasicCombiningUserUpdateDeltas_resultsInOneRequestandOneSignalUserObjcTests.testSendPurchases, bothonlyOneRequest-after-a-fixed-wait assertions) intermittently fail under parallel-clone resource pressure and a simulator launch hiccup, but pass deterministically when re-run in isolation; neither touches the changed behavior.Affected code checklist
Checklist
Overview
Testing
Final pass
Follow-up identified while reviewing #1678.
Made with Cursor