
fix(ocr): sanitize Graph validationToken echo to prevent reflected XSS#250

Open
gnjoseph wants to merge 1 commit into user/dluces/sample_app_test_plan from user/gnjoseph/pr-248-ocr-xss-fix

gnjoseph (Collaborator) commented Jul 3, 2026

Purpose

Fixes the CodeQL js/reflected-xss (high severity) alert on #248, which is the check currently failing on that PR (mergeStateStatus: UNSTABLE). This PR targets user/dluces/sample_app_test_plan so the fix flows into #248 after review (same pattern as #249).

The vulnerability

AI/ocr/server/onReceiptAdded.ts echoed the Microsoft Graph subscription validationToken (from req.query) straight into the HTTP response body:

const validationToken = req.query['validationToken'];
if (validationToken) {
  // Reflected XSS sink: the query value is echoed without sanitization.
  res.status(200).type('text/plain').send(String(validationToken));
}

The value is attacker-influenceable and reflected verbatim. Even with Content-Type: text/plain, CodeQL treats this as a reflected-XSS sink: the value is user-controlled, and without X-Content-Type-Options: nosniff a browser that MIME-sniffs may interpret the body as HTML. The issue was introduced when the OCR backend was ported from restify to express (b4b864f).

The fix

Graph requires the opaque, URL-safe validationToken to be echoed back verbatim to complete the subscription handshake, so the echo itself has to stay. Instead, I strip any character outside the token's known-safe set (base64url/base64: A–Z a–z 0–9 . _ ~ + / = - and space) before reflecting it, and add X-Content-Type-Options: nosniff:

const sanitizeValidationToken = (value: unknown): string =>
  String(value).replace(/[^A-Za-z0-9._~+/=\- ]/g, '');
...
res.status(200).type('text/plain').set('X-Content-Type-Options', 'nosniff').send(safeValidationToken);

This is a no-op for legitimate tokens (so the Graph handshake still works) while removing the characters needed for XSS (<, >, ", etc.).
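For reviewers who want to reason about the behavior without spinning up express, here is a minimal, framework-free sketch of the same logic. The `PlainTextResponse` type and `buildValidationResponse` helper are hypothetical names used only for illustration; they are not part of onReceiptAdded.ts. Only the regular expression and the nosniff header are taken from this PR.

```typescript
// Hypothetical, framework-free model of the handshake response (not the real handler).
interface PlainTextResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Same allow-list as the PR: every character outside it is dropped.
const sanitizeValidationToken = (value: unknown): string =>
  String(value).replace(/[^A-Za-z0-9._~+/=\- ]/g, '');

// Returns the echo response, or null when no validationToken was supplied.
// The truthiness check mirrors the `if (validationToken)` guard in the handler.
function buildValidationResponse(
  query: Record<string, unknown>,
): PlainTextResponse | null {
  const raw = query['validationToken'];
  if (!raw) {
    return null;
  }
  return {
    status: 200,
    headers: {
      'Content-Type': 'text/plain',
      // Defense in depth: tell browsers not to MIME-sniff the body.
      'X-Content-Type-Options': 'nosniff',
    },
    body: sanitizeValidationToken(raw),
  };
}
```

Keeping the sanitizer as a pure function makes the allow-list easy to unit test separately from the HTTP layer.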

Verification (Windows, Node 24)

  • npm run build:backend compiles clean.
  • validate-sample.ps1 OCR backend smoke passes.
  • Live handshake test against the running backend:
    • Legit token abc123-XYZ_.~token== → echoed verbatim (MATCH: True), Content-Type: text/plain, X-Content-Type-Options: nosniff.
    • <script>alert(1)</script> → returned as scriptalert1/script (angle brackets and parentheses stripped).
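The two live handshake checks above could also be captured as a small table-driven regression check. This is a sketch only: `cases` and `runSanitizerChecks` are hypothetical names, the sanitizer is copied from this PR so the block is self-contained, and no such test exists in the branch yet.

```typescript
// Copied from the PR so this sketch runs on its own.
const sanitizeValidationToken = (value: unknown): string =>
  String(value).replace(/[^A-Za-z0-9._~+/=\- ]/g, '');

// Inputs and expected outputs taken from the manual verification above.
const cases: Array<{ input: string; expected: string }> = [
  { input: 'abc123-XYZ_.~token==', expected: 'abc123-XYZ_.~token==' },
  { input: '<script>alert(1)</script>', expected: 'scriptalert1/script' },
];

// Returns a description of every failing case; an empty array means all passed.
function runSanitizerChecks(): string[] {
  const failures: string[] = [];
  for (const { input, expected } of cases) {
    const actual = sanitizeValidationToken(input);
    if (actual !== expected) {
      failures.push(`${JSON.stringify(input)}: expected ${expected}, got ${actual}`);
    }
  }
  return failures;
}
```

New payloads that CodeQL or reviewers raise later can be added as extra rows in `cases`.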

CodeQL will re-run on this PR to confirm the alert is resolved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

CodeQL js/reflected-xss (high) flagged AI/ocr/server/onReceiptAdded.ts:
the Microsoft Graph subscription validationToken from req.query was echoed
straight into the HTTP response body. Although the response is sent as
text/plain, a user-controlled value reflected verbatim is a reflected XSS
sink (browsers can MIME-sniff, and the value is attacker influenceable).

Graph requires the opaque, URL-safe validationToken to be echoed back to
complete the subscription handshake, so strip any character outside the
token's known-safe set (base64url/base64) before reflecting it. This is a
no-op for legitimate tokens while removing the characters needed for XSS.
Also set X-Content-Type-Options: nosniff as defense in depth.

Verified on Windows (Node 24): backend builds; validate-sample.ps1 backend
smoke passes; a legitimate token echoes verbatim (handshake intact) while
'<script>alert(1)</script>' is returned as 'scriptalert1/script' (angle
brackets and parentheses stripped).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>