
Native Windows port: wheel scrolling, shell launch services, native file picker #5209

Merged
shai-almog merged 28 commits into master from win32-port-gaps, Jun 11, 2026

@shai-almog

Collaborator

Works through the actionable gaps tracked in Ports/WindowsPort/status.md for the new native Win32/Direct2D port.

Mouse wheel scrolling — gap 1b (now a shared core API)

Rather than the per-port synthetic-event hack, the wheel→scroll mapping now lives once in core:

  • CodenameOneImplementation.pointerWheelMoved(x, y, scrollX, scrollY) replays the wheel as a press → drag → release gesture spread over four EDT cycles (so CN1's own tensile/deceleration animates it), temporarily makes the component under the cursor non-focusable so the synthetic press isn't a click, and owns the scrollWheeling / isScrollWheeling() state.
  • JavaSE (which carried the original inline implementation) is refactored onto this method, so every desktop port maps the wheel identically.
  • Windows: WndProc pushes WM_MOUSEWHEEL/WM_MOUSEHWHEEL into the input ring (new CN1_EVENT_MOUSE_WHEEL/_HWHEEL); drainInput converts the delta to a DPI-scaled distance and calls the shared method.
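
A minimal sketch of the delta-to-distance conversion mentioned in the Windows bullet, written in Java purely for illustration (the port does this natively in drainInput). WHEEL_DELTA = 120 per notch and 96 DPI as the 100% baseline are Win32 conventions; the class name `WheelDistance`, the method `toScrollPixels`, and the pixels-per-notch parameter are hypothetical and not taken from this PR.

```java
/** Hypothetical helper: converts a raw Win32 wheel delta into a DPI-scaled pixel distance. */
final class WheelDistance {
    /** Win32 reports one wheel notch as 120 units (WHEEL_DELTA). */
    static final int WHEEL_DELTA = 120;
    /** 96 DPI is the Windows baseline for 100% display scaling. */
    static final int BASE_DPI = 96;

    private WheelDistance() {}

    /**
     * @param wheelDelta     signed delta carried by WM_MOUSEWHEEL / WM_MOUSEHWHEEL
     * @param dpi            effective DPI of the window
     * @param pixelsPerNotch assumed scroll step per notch at 96 DPI (illustrative)
     * @return signed scroll distance in physical pixels
     */
    static int toScrollPixels(int wheelDelta, int dpi, int pixelsPerNotch) {
        // High-resolution wheels report fractions of a notch, so keep the division in floating point.
        double notches = wheelDelta / (double) WHEEL_DELTA;
        double scale = dpi / (double) BASE_DPI;
        return (int) Math.round(notches * pixelsPerNotch * scale);
    }
}
```

With an assumed 40 px step, one notch at 150% scaling (144 DPI) yields 60 px; the resulting distance is what would be handed to the shared pointerWheelMoved method.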

Shell launch services — gap 4

A native shellOpen() (ShellExecuteW) backs honest desktop implementations of execute(url), dial() (tel:), sendSMS() (sms:?body=, so getSMSSupport() reports SMS_INTERACTIVE) and sendMessage() (mailto:?subject=&body=). Nothing is fabricated — an absent handler reports failure.
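
A rough sketch of how the URIs handed to shellOpen() might be assembled, assuming standard percent-encoding of the query values. `ShellUris` and its methods are hypothetical names; only the `mailto:?subject=&body=` and `sms:?body=` shapes come from the description above, and the real port may encode differently.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds the URIs a shell launch would open. */
final class ShellUris {
    private ShellUris() {}

    /** Percent-encodes a query value. URLEncoder emits '+' for spaces, so map it to %20 for URI handlers. */
    static String encode(String value) {
        // A literal '+' in the input is already encoded as %2B, so this replace only touches spaces.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** mailto: URI with no recipient, matching the mailto:?subject=&body= shape. */
    static String mailto(String subject, String body) {
        return "mailto:?subject=" + encode(subject) + "&body=" + encode(body);
    }

    /** sms: URI carrying only a message body, matching the sms:?body= shape. */
    static String sms(String body) {
        return "sms:?body=" + encode(body);
    }
}
```

Whether anything opens still depends on a registered handler for the scheme, which is why an absent handler is reported as a failure rather than masked.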

Native file picker — gap 4

GetOpenFileNameW (comdlg32) runs modally on the window-owning pump thread via a blocking WM_CN1_FILEDIALOG SendMessage, with the file filter chosen by media type. openGallery / openImageGallery now use the real OS picker and return a file:// path that FileSystemStorage can open, instead of the in-app FileTree fallback. comdlg32 and shell32 are added to the Windows link set.
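
To illustrate the "filtered by media type" part: GetOpenFileNameW takes its filter as pairs of NUL-terminated strings (display name, then pattern) closed by an extra NUL. The sketch below builds such a string in Java; `FileDialogFilter`, the media type keys, and the extension lists are assumptions for illustration and are not the port's actual tables.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for an OPENFILENAME-style filter string. */
final class FileDialogFilter {
    private FileDialogFilter() {}

    /** Picks display-name/pattern pairs for a media type; the lists here are illustrative only. */
    static Map<String, String> forMediaType(String mediaType) {
        Map<String, String> entries = new LinkedHashMap<String, String>();
        if ("image".equals(mediaType)) {
            entries.put("Images", "*.png;*.jpg;*.jpeg;*.gif;*.bmp");
        } else if ("video".equals(mediaType)) {
            entries.put("Videos", "*.mp4;*.mov;*.avi;*.wmv");
        }
        entries.put("All files", "*.*");
        return entries;
    }

    /** Joins the pairs as name NUL pattern NUL ... and appends the final terminating NUL. */
    static String build(Map<String, String> entries) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            sb.append(e.getKey()).append('\0').append(e.getValue()).append('\0');
        }
        sb.append('\0'); // the double NUL marks the end of the filter list
        return sb.toString();
    }
}
```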

Honesty preserved

status.md is updated to move these gaps to "done" while keeping the remaining hardware/OS-account capabilities (camera, sensors, location, contacts, push, biometric, audio recording, SIMD) honestly unsupported — they return null/no-op/false rather than fabricating data, per the port's guiding rule.

Validation

  • core, javase and windows Maven modules compile cleanly (the shared pointerWheelMoved API links across modules).
  • Native additions are pure Win32 (ShellExecuteW, GetOpenFileNameW) / input-ring wiring; the gallery/file-dialog path is not exercised headlessly, so it cannot affect the screenshot suite, and falls back gracefully.
  • CI: windows-cross-compile (Linux PE link) + parparvm-tests-windows (clean-target build/run + screenshot suite) are the gates.

🤖 Generated with Claude Code

shai-almog and others added 2 commits June 9, 2026 22:28
Closes the most actionable gaps in Ports/WindowsPort/status.md.

Mouse wheel (gap 1b) is now a proper shared API instead of a per-port hack:
CodenameOneImplementation.pointerWheelMoved(x, y, scrollX, scrollY) owns the
synthetic press/drag/release scroll gesture (spread over four EDT cycles, with
the component under the cursor temporarily made non-focusable) and the
scrollWheeling / isScrollWheeling() state. The JavaSE port, which carried the
original inline implementation, is refactored onto it so every desktop port maps
the wheel identically. The Windows WndProc pushes WM_MOUSEWHEEL / WM_MOUSEHWHEEL
into the input ring (new CN1_EVENT_MOUSE_WHEEL/_HWHEEL) and drainInput converts
the delta to a DPI-scaled distance.
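
A sketch of the press/drag/release replay described above, under stated assumptions: the commit says the gesture is spread over four EDT cycles but not how, so the split used here (press, half drag, full drag, release) and the sign convention of the drag are guesses. `WheelGestureSketch` and `PointerSink` are hypothetical stand-ins, not the CodenameOneImplementation API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of replaying a wheel move as a four-step pointer gesture. */
final class WheelGestureSketch {
    /** Stand-in for the pointer callbacks a port would invoke. */
    interface PointerSink {
        void pressed(int x, int y);
        void dragged(int x, int y);
        void released(int x, int y);
    }

    private WheelGestureSketch() {}

    /** Builds the four steps; a real port would run one step per EDT cycle. */
    static List<Runnable> plan(final PointerSink sink, final int x, final int y,
                               final int scrollX, final int scrollY) {
        List<Runnable> steps = new ArrayList<Runnable>();
        steps.add(new Runnable() {
            @Override public void run() { sink.pressed(x, y); }
        });
        steps.add(new Runnable() {
            @Override public void run() { sink.dragged(x + scrollX / 2, y + scrollY / 2); }
        });
        steps.add(new Runnable() {
            @Override public void run() { sink.dragged(x + scrollX, y + scrollY); }
        });
        steps.add(new Runnable() {
            @Override public void run() { sink.released(x + scrollX, y + scrollY); }
        });
        return steps;
    }

    /** Runs the plan back to back and records the calls, for inspection only. */
    static List<String> record(int x, int y, int scrollX, int scrollY) {
        final List<String> log = new ArrayList<String>();
        PointerSink sink = new PointerSink() {
            @Override public void pressed(int px, int py) { log.add("press " + px + "," + py); }
            @Override public void dragged(int px, int py) { log.add("drag " + px + "," + py); }
            @Override public void released(int px, int py) { log.add("release " + px + "," + py); }
        };
        for (Runnable step : plan(sink, x, y, scrollX, scrollY)) {
            step.run();
        }
        return log;
    }
}
```

Spreading the steps across cycles, rather than running them in one loop as `record` does, is what lets the framework's own tensile and deceleration logic animate the scroll.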

Shell launch services (gap 4): a native shellOpen() (ShellExecuteW) backs honest
desktop implementations of execute(url), dial() (tel:), sendSMS() (sms:, so
getSMSSupport() reports SMS_INTERACTIVE) and sendMessage() (mailto:). Nothing is
fabricated -- an absent handler reports failure.

Native file picker (gap 4): GetOpenFileNameW (comdlg32) run modally on the
window-owning pump thread via a blocking WM_CN1_FILEDIALOG SendMessage, filtered
by media type. openGallery / openImageGallery now use the real OS picker and
return a file:// path FileSystemStorage opens, instead of the in-app FileTree
fallback. comdlg32 + shell32 added to the Windows link set.

status.md updated: these gaps move to "done"; the remaining hardware/OS-account
capabilities (camera, sensors, location, contacts, push, biometric, audio
recording, SIMD) stay honestly unsupported.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
WIN32_LEAN_AND_MEAN keeps shellapi.h out of windows.h and shlobj.h does not
pull it in under clang-cl, so ShellExecuteW was an implicit declaration and the
clean-target build failed (call to undeclared function / int->HINSTANCE). Add
the explicit include.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

Cloudflare Preview

@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

✅ ByteCodeTranslator Quality Report

Test & Coverage

  • Tests: 388 total, 0 failed, 12 skipped

Benchmark Results

  • Execution Time: 21502 ms

  • Hotspots (Top 20 sampled methods):

    • 25.40% java.lang.String.indexOf (640 samples)
    • 19.80% com.codename1.tools.translator.Parser.isMethodUsed (499 samples)
    • 16.87% java.util.ArrayList.indexOf (425 samples)
    • 5.00% com.codename1.tools.translator.BytecodeMethod.addToConstantPool (126 samples)
    • 2.14% java.lang.StringBuilder.append (54 samples)
    • 2.10% com.codename1.tools.translator.ByteCodeClass.markDependent (53 samples)
    • 2.02% com.codename1.tools.translator.ByteCodeClass.updateAllDependencies (51 samples)
    • 1.98% com.codename1.tools.translator.Parser.getClassByName (50 samples)
    • 1.79% com.codename1.tools.translator.ByteCodeClass.calcUsedByNative (45 samples)
    • 1.47% com.codename1.tools.translator.BytecodeMethod.appendMethodSignatureSuffixFromDesc (37 samples)
    • 1.19% com.codename1.tools.translator.Parser.generateClassAndMethodIndexHeader (30 samples)
    • 1.07% com.codename1.tools.translator.BytecodeMethod.optimize (27 samples)
    • 0.83% java.lang.Object.hashCode (21 samples)
    • 0.79% com.codename1.tools.translator.BytecodeMethod.appendCMethodPrefix (20 samples)
    • 0.71% com.codename1.tools.translator.bytecodes.Invoke.addDependencies (18 samples)
    • 0.63% java.util.TreeMap.getEntry (16 samples)
    • 0.60% com.codename1.tools.translator.Parser.cullMethods (15 samples)
    • 0.60% com.codename1.tools.translator.BytecodeMethod.addInstruction (15 samples)
    • 0.60% java.lang.StringCoding.encode (15 samples)
    • 0.60% org.objectweb.asm.ClassReader.readCode (15 samples)
  • ⚠️ Coverage report not generated.

Static Analysis

  • ✅ SpotBugs: no findings (report was not generated by the build).
  • ⚠️ PMD report not generated.
  • ⚠️ Checkstyle report not generated.

Generated automatically by the PR CI workflow.

@shai-almog

shai-almog commented Jun 9, 2026

Collaborator Author

Compared 128 screenshots: 128 matched.

Native Android coverage

  • 📊 Line coverage: 14.27% (8655/60658 lines covered) (artifact android-coverage-report, jacocoAndroidReport/html/index.html)
    • Other counters: instruction 11.54% (42634/369524), branch 5.08% (1764/34737), complexity 6.06% (2018/33305), method 10.49% (1634/15575), class 17.19% (377/2193)
    • Lowest covered classes
      • kotlin.collections.kotlin.collections.ArraysKt___ArraysKt – 0.00% (0/6327 lines covered)
      • kotlin.collections.unsigned.kotlin.collections.unsigned.UArraysKt___UArraysKt – 0.00% (0/2384 lines covered)
      • org.jacoco.agent.rt.internal_b6258fc.asm.org.jacoco.agent.rt.internal_b6258fc.asm.ClassReader – 0.00% (0/1519 lines covered)
      • kotlin.collections.kotlin.collections.CollectionsKt___CollectionsKt – 0.00% (0/1148 lines covered)
      • org.jacoco.agent.rt.internal_b6258fc.asm.org.jacoco.agent.rt.internal_b6258fc.asm.MethodWriter – 0.00% (0/923 lines covered)
      • kotlin.sequences.kotlin.sequences.SequencesKt___SequencesKt – 0.00% (0/730 lines covered)
      • kotlin.text.kotlin.text.StringsKt___StringsKt – 0.00% (0/623 lines covered)
      • org.jacoco.agent.rt.internal_b6258fc.asm.org.jacoco.agent.rt.internal_b6258fc.asm.Frame – 0.00% (0/564 lines covered)
      • kotlin.collections.kotlin.collections.ArraysKt___ArraysJvmKt – 0.00% (0/495 lines covered)
      • kotlinx.coroutines.kotlinx.coroutines.JobSupport – 0.00% (0/423 lines covered)

✅ Native Android screenshot tests passed.

Benchmark Results

Detailed Performance Metrics

| Metric | Duration |
| --- | --- |
| SIMD kernel backend | scalar fallback (no native SIMD) |
| SIMD int-add (64K x300) | java 179ms / native 194ms = 0.9x speedup |
| SIMD float-mul (64K x300) | java 237ms / native 72ms = 3.2x speedup |
| SIMD kernel correctness | PASS (native result == scalar reference) |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 SIMD byte path | gated to scalar (CPU autovectorizes scalar; explicit SIMD not beneficial here) |
| Base64 CN1 encode | 398.000 ms |
| Base64 CN1 decode | 337.000 ms |
| Base64 native encode | 1292.000 ms |
| Base64 encode ratio (CN1/native) | 0.308x (69.2% faster) |
| Base64 native decode | 1130.000 ms |
| Base64 decode ratio (CN1/native) | 0.298x (70.2% faster) |
| Image encode benchmark status | skipped (SIMD unsupported) |
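
The ratio rows in these reports appear to be plain quotients of the two timings, with the percentage derived from the distance to 1.0. The sketch below reproduces that arithmetic; `RatioLine` is a hypothetical name and the report generator's actual code is not shown in this PR.

```java
import java.util.Locale;

/** Hypothetical formatter reproducing the "ratio (A/B)" rows from two timings. */
final class RatioLine {
    private RatioLine() {}

    /**
     * @param numeratorMs   timing of the first named variant (e.g. CN1)
     * @param denominatorMs timing of the second named variant (e.g. native)
     */
    static String format(double numeratorMs, double denominatorMs) {
        double ratio = numeratorMs / denominatorMs;
        // Below 1.0 the first variant took less time, so it is reported as faster.
        double percent = Math.abs(1.0 - ratio) * 100.0;
        String direction = ratio <= 1.0 ? "faster" : "slower";
        return String.format(Locale.ROOT, "%.3fx (%.1f%% %s)", ratio, percent, direction);
    }
}
```

For the Android figures, `RatioLine.format(398, 1292)` gives `0.308x (69.2% faster)` and `RatioLine.format(337, 1130)` gives `0.298x (70.2% faster)`, matching the encode and decode ratio rows.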

@shai-almog

shai-almog commented Jun 9, 2026

Collaborator Author

Compared 125 screenshots: 125 matched.
Native Windows port: full hellocodenameone screenshot suite, rendered offscreen with Direct2D/DirectWrite (x64). Compared against the in-repo baseline in scripts/windows/screenshots; every tile has a baseline, so any difference posts as "changed" for review.

Benchmark Results

Detailed Performance Metrics

| Metric | Duration |
| --- | --- |
| SIMD backend | SSE2 (x64) / NEON (arm64) native kernels |
| SIMD int-add (64K x300) | java 65ms / native 3ms = 21.6x speedup |
| SIMD float-mul (64K x300) | java 66ms / native 4ms = 16.5x speedup |
| SIMD correctness | PASS (native result == scalar reference) |

@shai-almog

shai-almog commented Jun 9, 2026

Collaborator Author

Compared 11 screenshots: 11 matched.
✅ JavaSE simulator integration screenshots matched stored baselines.

PMD's MissingOverride rule is enforced on core; the four anonymous Runnable
run() methods added for the shared wheel-scroll gesture need the annotation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shai-almog

shai-almog commented Jun 9, 2026

Collaborator Author

Compared 128 screenshots: 128 matched.
✅ Native Mac screenshot tests passed.

Benchmark Results

  • VM Translation Time: 0 seconds
  • Compilation Time: 189 seconds

Detailed Performance Metrics

| Metric | Duration |
| --- | --- |
| SIMD kernel backend | SSE2 (x64) / NEON (arm64) native kernels |
| SIMD int-add (64K x300) | java 54ms / native 3ms = 18.0x speedup |
| SIMD float-mul (64K x300) | java 55ms / native 2ms = 27.5x speedup |
| SIMD kernel correctness | PASS (native result == scalar reference) |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 SIMD byte path | active (NEON-accelerated) |
| Base64 CN1 encode | 290.000 ms |
| Base64 CN1 decode | 224.000 ms |
| Base64 native encode | 657.000 ms |
| Base64 encode ratio (CN1/native) | 0.441x (55.9% faster) |
| Base64 native decode | 389.000 ms |
| Base64 decode ratio (CN1/native) | 0.576x (42.4% faster) |
| Base64 SIMD encode | 86.000 ms |
| Base64 encode ratio (SIMD/CN1) | 0.297x (70.3% faster) |
| Base64 SIMD decode | 73.000 ms |
| Base64 decode ratio (SIMD/CN1) | 0.326x (67.4% faster) |
| Base64 encode ratio (SIMD/native) | 0.131x (86.9% faster) |
| Base64 decode ratio (SIMD/native) | 0.188x (81.2% faster) |
| Image encode benchmark iterations | 100 |
| Image createMask (SIMD off) | 29.000 ms |
| Image createMask (SIMD on) | 3.000 ms |
| Image createMask ratio (SIMD on/off) | 0.103x (89.7% faster) |
| Image applyMask (SIMD off) | 83.000 ms |
| Image applyMask (SIMD on) | 59.000 ms |
| Image applyMask ratio (SIMD on/off) | 0.711x (28.9% faster) |
| Image modifyAlpha (SIMD off) | 100.000 ms |
| Image modifyAlpha (SIMD on) | 68.000 ms |
| Image modifyAlpha ratio (SIMD on/off) | 0.680x (32.0% faster) |
| Image modifyAlpha removeColor (SIMD off) | 82.000 ms |
| Image modifyAlpha removeColor (SIMD on) | 52.000 ms |
| Image modifyAlpha removeColor ratio (SIMD on/off) | 0.634x (36.6% faster) |

@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

✅ Continuous Quality Report

Test & Coverage

Static Analysis

  • SpotBugs [Report archive]
    • ByteCodeTranslator: 0 findings (no issues)
    • android: 0 findings (no issues)
    • codenameone-maven-plugin: 0 findings (no issues)
    • core-unittests: 0 findings (no issues)
    • ios: 0 findings (no issues)
  • PMD: 0 findings (no issues) [Report archive]
  • Checkstyle: 0 findings (no issues) [Report archive]

Generated automatically by the PR CI workflow.

@shai-almog

shai-almog commented Jun 9, 2026

Collaborator Author

Compared 121 screenshots: 121 matched.
✅ JavaScript-port screenshot tests passed.

@shai-almog

shai-almog commented Jun 9, 2026

Collaborator Author

Compared 124 screenshots: 124 matched.
✅ Native iOS screenshot tests passed.

Benchmark Results

  • VM Translation Time: 0 seconds
  • Compilation Time: 295 seconds

Build and Run Timing

| Metric | Duration |
| --- | --- |
| Simulator Boot | 70000 ms |
| Simulator Boot (Run) | 1000 ms |
| App Install | 24000 ms |
| App Launch | 9000 ms |
| Test Execution | 363000 ms |

Detailed Performance Metrics

| Metric | Duration |
| --- | --- |
| SIMD kernel backend | SSE2 (x64) / NEON (arm64) native kernels |
| SIMD int-add (64K x300) | java 75ms / native 3ms = 25.0x speedup |
| SIMD float-mul (64K x300) | java 134ms / native 23ms = 5.8x speedup |
| SIMD kernel correctness | PASS (native result == scalar reference) |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 SIMD byte path | active (NEON-accelerated) |
| Base64 CN1 encode | 591.000 ms |
| Base64 CN1 decode | 276.000 ms |
| Base64 native encode | 1056.000 ms |
| Base64 encode ratio (CN1/native) | 0.560x (44.0% faster) |
| Base64 native decode | 594.000 ms |
| Base64 decode ratio (CN1/native) | 0.465x (53.5% faster) |
| Base64 SIMD encode | 74.000 ms |
| Base64 encode ratio (SIMD/CN1) | 0.125x (87.5% faster) |
| Base64 SIMD decode | 63.000 ms |
| Base64 decode ratio (SIMD/CN1) | 0.228x (77.2% faster) |
| Base64 encode ratio (SIMD/native) | 0.070x (93.0% faster) |
| Base64 decode ratio (SIMD/native) | 0.106x (89.4% faster) |
| Image encode benchmark iterations | 100 |
| Image createMask (SIMD off) | 19.000 ms |
| Image createMask (SIMD on) | 2.000 ms |
| Image createMask ratio (SIMD on/off) | 0.105x (89.5% faster) |
| Image applyMask (SIMD off) | 75.000 ms |
| Image applyMask (SIMD on) | 59.000 ms |
| Image applyMask ratio (SIMD on/off) | 0.787x (21.3% faster) |
| Image modifyAlpha (SIMD off) | 138.000 ms |
| Image modifyAlpha (SIMD on) | 266.000 ms |
| Image modifyAlpha ratio (SIMD on/off) | 1.928x (92.8% slower) |
| Image modifyAlpha removeColor (SIMD off) | 239.000 ms |
| Image modifyAlpha removeColor (SIMD on) | 69.000 ms |
| Image modifyAlpha removeColor ratio (SIMD on/off) | 0.289x (71.1% faster) |

@shai-almog

shai-almog commented Jun 9, 2026

Collaborator Author

Compared 128 screenshots: 128 matched.
✅ Native iOS Metal screenshot tests passed.

Benchmark Results

  • VM Translation Time: 0 seconds
  • Compilation Time: 394 seconds

Build and Run Timing

| Metric | Duration |
| --- | --- |
| Simulator Boot | 68000 ms |
| Simulator Boot (Run) | 2000 ms |
| App Install | 16000 ms |
| App Launch | 17000 ms |
| Test Execution | 280000 ms |

Detailed Performance Metrics

| Metric | Duration |
| --- | --- |
| SIMD kernel backend | SSE2 (x64) / NEON (arm64) native kernels |
| SIMD int-add (64K x300) | java 246ms / native 4ms = 61.5x speedup |
| SIMD float-mul (64K x300) | java 98ms / native 13ms = 7.5x speedup |
| SIMD kernel correctness | PASS (native result == scalar reference) |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 SIMD byte path | active (NEON-accelerated) |
| Base64 CN1 encode | 474.000 ms |
| Base64 CN1 decode | 340.000 ms |
| Base64 native encode | 1281.000 ms |
| Base64 encode ratio (CN1/native) | 0.370x (63.0% faster) |
| Base64 native decode | 456.000 ms |
| Base64 decode ratio (CN1/native) | 0.746x (25.4% faster) |
| Base64 SIMD encode | 82.000 ms |
| Base64 encode ratio (SIMD/CN1) | 0.173x (82.7% faster) |
| Base64 SIMD decode | 83.000 ms |
| Base64 decode ratio (SIMD/CN1) | 0.244x (75.6% faster) |
| Base64 encode ratio (SIMD/native) | 0.064x (93.6% faster) |
| Base64 decode ratio (SIMD/native) | 0.182x (81.8% faster) |
| Image encode benchmark iterations | 100 |
| Image createMask (SIMD off) | 21.000 ms |
| Image createMask (SIMD on) | 6.000 ms |
| Image createMask ratio (SIMD on/off) | 0.286x (71.4% faster) |
| Image applyMask (SIMD off) | 100.000 ms |
| Image applyMask (SIMD on) | 152.000 ms |
| Image applyMask ratio (SIMD on/off) | 1.520x (52.0% slower) |
| Image modifyAlpha (SIMD off) | 130.000 ms |
| Image modifyAlpha (SIMD on) | 113.000 ms |
| Image modifyAlpha ratio (SIMD on/off) | 0.869x (13.1% faster) |
| Image modifyAlpha removeColor (SIMD off) | 211.000 ms |
| Image modifyAlpha removeColor (SIMD on) | 255.000 ms |
| Image modifyAlpha removeColor ratio (SIMD on/off) | 1.209x (20.9% slower) |

shai-almog and others added 13 commits June 10, 2026 04:22
Implements native SIMD for the Windows port (status.md gap 2), the x86/ARM analog
of IOSSimd. WindowsSimd overrides the hot-path vector ops with SSE2 (x64) / NEON
(arm64) intrinsics in cn1_windows_simd.c; Simd's @concrete gains
win=com.codename1.impl.windows.WindowsSimd and WindowsImplementation.createSimd()
returns it, so Simd.get().isSupported() is true on Windows. Each kernel vectorizes
the bulk with a scalar tail (unaligned load/store, so no aligned allocator); ops
SSE2 lacks (int32 mul/min/max/dot) stay scalar on x64 but vectorize on arm64, and
any op not overridden inherits the correct portable Simd scalar loop.
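
The "vectorized bulk with a scalar tail" shape can be sketched in plain Java, with a four-lane unrolled loop standing in for one 128-bit SSE2/NEON register of int32 values. This only illustrates the loop structure; `BulkTailAdd` is a hypothetical name and the real kernels are C intrinsics in cn1_windows_simd.c.

```java
/** Hypothetical illustration of a bulk-plus-tail kernel for element-wise int addition. */
final class BulkTailAdd {
    /** A 128-bit register holds four 32-bit ints. */
    static final int LANES = 4;

    private BulkTailAdd() {}

    static void add(int[] a, int[] b, int[] dst, int length) {
        int i = 0;
        int bulkEnd = length - (length % LANES);
        // Bulk: whole groups of four, where native code would use one load/add/store per group.
        for (; i < bulkEnd; i += LANES) {
            dst[i] = a[i] + b[i];
            dst[i + 1] = a[i + 1] + b[i + 1];
            dst[i + 2] = a[i + 2] + b[i + 2];
            dst[i + 3] = a[i + 3] + b[i + 3];
        }
        // Tail: the 0 to 3 leftover elements are handled one at a time.
        for (; i < length; i++) {
            dst[i] = a[i] + b[i];
        }
    }
}
```

Because the tail covers any remainder, callers need no padding and no particular array length.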

Covered: int add/sub/mul/min/max/and/or/xor/sum/dot, float add/sub/mul/min/max/
sum/dot, byte add/sub(saturating)/and/or/xor, plus fused
replaceTopByteFromUnsignedBytes / blendByMaskTestNonzero.

SimdApiTest (already in the Windows suite) gates correctness; new SimdBenchmarkTest
times native vs an inline Java scalar loop over a 64K workload, verifies the native
result matches, and logs CN1SS:SIMD:BENCH ... speedup=Nx so CI shows the benefit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements getSecureStorage() (status.md gap 4) so the networking layer can read
API keys / tokens at rest. WindowsSecureStorage encrypts each value with the
Windows Data Protection API (CryptProtectData, bound to the current user's logon)
via native dpapiProtect/dpapiUnprotect and persists the ciphertext through CN1
Storage -- the desktop analog of the iOS keychain / Android
EncryptedSharedPreferences non-prompting store. The biometric-prompting overloads
map to the same store (DPAPI is itself the user-account auth boundary). crypt32
added to the link set. SecureStorageTest round-trips set/get/remove in the suite
(self-skips where unsupported, e.g. JS).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements scheduleLocalNotification/cancelLocalNotification (status.md gap 4),
mirroring the JavaSE desktop semantic: while the app runs a Timer fires the
notification at its scheduled time (with REPEAT_* support) and Shell_NotifyIcon
shows a tray balloon; clicking it routes the id (WM_CN1_TRAY -> drainInput poll)
to the app's LocalNotificationCallback. Native tray/balloon lives in
cn1_windows_notify.c, marshaled to the window-owning pump thread via WM_CN1_NOTIFY.
Background scheduling fires only while the process runs (no OS scheduler survives
app exit on desktop) -- a documented limitation.
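
A minimal sketch of in-process scheduling with java.util.Timer, assuming a repeat interval in milliseconds stands in for the REPEAT_* constants. `NotificationTimerSketch` and the choice to drop a one-shot notification whose time has already passed are assumptions, not behavior confirmed by this commit.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical illustration of firing a notification from a Timer while the app runs. */
final class NotificationTimerSketch {
    private NotificationTimerSketch() {}

    /**
     * Returns the next fire time in epoch milliseconds, or -1 when a one-shot
     * notification is already in the past (an assumption of this sketch).
     */
    static long nextFireTime(long firstFireMs, long repeatMs, long nowMs) {
        if (nowMs <= firstFireMs) {
            return firstFireMs;
        }
        if (repeatMs <= 0) {
            return -1;
        }
        long elapsed = nowMs - firstFireMs;
        long periods = (elapsed + repeatMs - 1) / repeatMs; // round up to the next period boundary
        return firstFireMs + periods * repeatMs;
    }

    /** Schedules the callback once, or repeatedly when repeatMs is positive. */
    static void schedule(Timer timer, final Runnable show, long firstFireMs, long repeatMs, long nowMs) {
        long next = nextFireTime(firstFireMs, repeatMs, nowMs);
        if (next < 0) {
            return;
        }
        TimerTask task = new TimerTask() {
            @Override public void run() { show.run(); }
        };
        long delay = next - nowMs;
        if (repeatMs > 0) {
            timer.schedule(task, delay, repeatMs);
        } else {
            timer.schedule(task, delay);
        }
    }
}
```

The Timer lives inside the process, which is exactly why nothing fires after the app exits.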

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements createMediaRecorder / captureAudio (status.md gap 4): records from the
default mic via the classic waveIn (winmm) API to a 16-bit PCM WAV, a worker
thread draining capture buffers to disk and patching the RIFF/data sizes on stop
(cn1_windows_audiorec.c + WindowsAudioRecorder). getAvailableRecordingMimeTypes
reports audio/wav (also decodable by the port's MF playback). waveIn over an MF
encode pipeline: dependency-free, no codec negotiation. winmm added to the link
set. Verified on a real Windows ARM64 VM: compiles clean and waveIn captures
88200 bytes/s from the mic.
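
For reference, a sketch of the canonical 44-byte PCM WAV header and of patching the RIFF/data sizes on stop, written in Java for illustration (the port does this in C). The field layout is the standard RIFF/WAVE one; `WavHeader` is a hypothetical name, and 44.1 kHz mono is only one format consistent with the 88200 bytes/s figure, since the commit does not state the sample rate or channel count.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of writing and later patching a PCM WAV header. */
final class WavHeader {
    static final int HEADER_BYTES = 44;

    private WavHeader() {}

    static byte[] build(int sampleRate, int channels, int bitsPerSample, int dataBytes) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        int byteRate = sampleRate * blockAlign;        // bytes per second of audio
        ByteBuffer b = ByteBuffer.allocate(HEADER_BYTES).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + dataBytes);                      // RIFF chunk size = file size minus 8
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                                  // fmt chunk size for plain PCM
        b.putShort((short) 1);                         // format tag 1 = PCM
        b.putShort((short) channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort((short) blockAlign);
        b.putShort((short) bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(dataBytes);                           // unknown while recording, patched on stop
        return b.array();
    }

    /** Rewrites the two size fields once the final payload length is known. */
    static void patchSizes(byte[] header, int dataBytes) {
        ByteBuffer b = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        b.putInt(4, 36 + dataBytes);
        b.putInt(40, dataBytes);
    }

    /** Reads a little-endian int field, e.g. offset 28 for the byte rate. */
    static int readInt(byte[] header, int offset) {
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }
}
```

Writing a placeholder header first and patching offsets 4 and 40 afterwards lets the worker thread stream buffers to disk without knowing the final length up front.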

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements getBiometrics() (status.md gap 4) backed by the WinRT
UserConsentVerifier (face/fingerprint/PIN): isSupported()/canAuthenticate() map
to CheckAvailabilityAsync, authenticate() runs the Hello prompt off the EDT and
completes the AsyncResource. WinRT is consumed via the WRL ABI projection
(cn1_windows_winrt.cpp) -- the same COM mechanism the Media Foundation layer uses,
no cppwinrt needed.

Adds the CN1_HAVE_WINRT build gate: the generated CMake probes the toolchain
(check_cxx_source_compiles of a minimal WinRT TU + runtimeobject) and defines
CN1_HAVE_WINRT / links runtimeobject only when WinRT is available, so a
cross-compile sysroot without WinRT compiles the natives as honest 'unsupported'
stubs and stays green -- mirroring the WebView2 gate. This same file/gate will
carry the upcoming WinRT location/contacts/share services.

Verified on a real Windows ARM64 VM: both the real and stub native paths compile;
a standalone WRL test activates UserConsentVerifier and awaits the async op,
correctly reporting DeviceNotPresent on the VM (no Hello hardware).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements getLocationManager() (status.md gap 4) -> WindowsLocationManager backed
by the WinRT Geolocator (cn1_windows_winrt.cpp, same CN1_HAVE_WINRT gate).
getCurrentLocation/getLastKnownLocation resolve one fix (lat/lon/accuracy/altitude/
heading/speed); a continuous LocationListener is served by a polling thread. When
Windows location is disabled / no provider answers it reports OUT_OF_SERVICE /
throws (no fabricated fix), and getLocationManager returns null on a WinRT-less
build. Verified on the Windows ARM64 VM: Geolocator activates and GetGeoposition
returns E_ACCESSDENIED (location off there), surfaced honestly as unavailable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements getAllContacts/getContactById (status.md gap 4) via the WinRT
ContactStore (cn1_windows_winrt.cpp, same CN1_HAVE_WINRT gate). One native call
returns every contact as a delimited blob (id/name/phone/email, read via the base
IContact + versioned IContact2/IContactManagerStatics2 interfaces) which the impl
parses and briefly caches so the base's id-then-fetch loop shares a single store
read. Returns nothing when the store is inaccessible (no WinRT / access denied),
never fabricated. Compiles on the Windows ARM64 VM via the proven WRL await
pattern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements isNativeShareSupported()/share() (status.md gap 4) via the WinRT
DataTransferManager: the EDT-facing shareText marshals to the window thread
(WM_CN1_SHARE), where IDataTransferManagerInterop GetForWindow +
ShowShareUIForWindow open the system share flyout for the unpackaged Win32 window
and a DataRequested handler supplies text/title (cn1_windows_winrt.cpp, same
CN1_HAVE_WINRT gate). Shares text today (image-file sharing via SetStorageItems
is a follow-up). Also documents that print has no CN1 core API to hook. Compiles
(real + stub) on the Windows ARM64 VM; the flyout is interactive.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the legacy Capture API capturePhoto (status.md gap 3): grabs one real
frame from the default webcam via Media Foundation (MFEnumDeviceSources(VIDCAP) ->
IMFSourceReader -> RGB32, discarding the first few frames so exposure settles),
returns it as a CN1 ARGB int[], and Java encodes the PNG + writes the file
(cn1_windows_camera.cpp). A desktop has no built-in capture UI, so this is the
honest snapshot -- a real frame, never synthetic. createCameraImpl() (live preview
peer / video) stays null pending generic native-peer placement (gap 5a).

Verified on the Windows ARM64 VM: a 640x480 frame with genuine image data (~150K
non-zero pixels) is captured from the passed-through camera.
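
A sketch of the RGB32-to-ARGB step, assuming the frame arrives as tightly packed, top-down rows of 4-byte B, G, R, X pixels (the usual in-memory layout of Media Foundation's RGB32 subtype). `Rgb32ToArgb` is a hypothetical name, stride padding and bottom-up frames are ignored, and the port's actual conversion in cn1_windows_camera.cpp may differ.

```java
/** Hypothetical illustration of turning an RGB32 (BGRX) frame into opaque ARGB ints. */
final class Rgb32ToArgb {
    private Rgb32ToArgb() {}

    static int[] convert(byte[] bgrx, int width, int height) {
        int[] argb = new int[width * height];
        for (int i = 0, p = 0; i < argb.length; i++, p += 4) {
            int b = bgrx[p] & 0xFF;
            int g = bgrx[p + 1] & 0xFF;
            int r = bgrx[p + 2] & 0xFF;
            // The fourth byte is unused in RGB32, so force full opacity.
            argb[i] = 0xFF000000 | (r << 16) | (g << 8) | b;
        }
        return argb;
    }

    /** Counts pixels with any colour data, similar in spirit to the non-zero pixel check above. */
    static int countNonBlack(int[] argb) {
        int count = 0;
        for (int pixel : argb) {
            if ((pixel & 0x00FFFFFF) != 0) {
                count++;
            }
        }
        return count;
    }
}
```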

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Linux cross-compile failed compiling cn1_windows_winrt.cpp: the <string> I
used for the contacts/share string building pulls the MSVC STL (yvals_core.h),
which hard-asserts a very recent Clang (STL1000) that the xwin cross-toolchain
predates. The rest of the port is deliberately STL-free for exactly this reason
(cn1_windows_browser.cpp's std:: usage is behind the WebView2 gate, off on the
cross-compile). Replace std::string/std::wstring with a small C growable buffer
(CN1Buf) and owned WCHAR* so no C++ STL header is pulled (verified via
/showIncludes: zero STL headers, only the C <string.h>). clean-target (real
Windows, newer clang) already passed; this unblocks the cross-compile gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The guard was defensive insurance against a build toolchain lacking WinRT, but the
cross-compile CI proved otherwise: it had already DEFINED CN1_HAVE_WINRT (the
earlier STL error fired inside the #ifdef on the Linux/xwin leg), i.e. the
xwin-laid-out SDK ships the WinRT ABI headers + runtimeobject. So stub mode never
actually triggered anywhere the port builds.

Remove the CMake probe + the per-function #ifdef/#else stub branches and link
runtimeobject unconditionally; the compiled output is identical to the prior green
build, just simpler. Also drop the now-pointless locationSupported() native
(getLocationManager always returns the manager; isNativeShareSupported checks for
a host window). Each service still degrades honestly at RUNTIME (no device /
disabled / denied -> false/null). status.md updated to drop the gate/stub language.

Trade-off (accepted): a future build SDK without WinRT would now fail the build
instead of degrading to stubs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…real shipping pipeline)

Adds a CI workflow that mirrors how a Codename One app is actually shipped for
Windows -- compiled on a (Linux) build host, executed on the user's Windows
machine. Neither existing Windows workflow did this end to end:
windows-cross-compile.yml builds on Linux but only links (never runs), and
parparvm-tests-windows.yml builds natively on Windows and runs. Here the binary
that renders on Windows is the exact one cross-compiled on Linux, browser
included.

New workflow windows-cross-build-run.yml, artifact-chained within a run:
- cross-build (ubuntu): pins LLVM 19 (the WebView2 peer's MSVC STL needs a recent
  clang), lays out the Windows SDK via xwin, fetches the WebView2 NuGet SDK on
  Linux, builds core + Windows port + the hellocodenameone suite, then translates
  and clang-cl/xwin cross-compiles the full suite .exe (WebView2 linked) and
  uploads it.
- run-on-windows (windows-latest x64): downloads that Linux-built exe and runs the
  full screenshot suite over the cn1ss WebSocket, capturing ~112 PNGs.
- compare-comment (ubuntu): diffs the screenshots against the in-repo baseline
  and posts them to the PR under a distinct marker.

Harness refactor (CleanTargetIntegrationTest):
- Split buildHelloCodenameOneExe into a host-agnostic translateHelloSuiteDist()
  (pure Java translation) + the native clang-cl build, and extract crossBuildDist()
  out of crossCompilesWindowsExeWithXwin so a dist can be cross-compiled on Linux.
- Add crossBuildsHelloSuiteExe(): translates the full suite and xwin-cross-builds
  it to CN1_CROSS_EXE_OUT (WEBVIEW2_SDK_DIR flows into the generated CMake).
- capturesHelloSuiteOverWebSocket honors CN1_PREBUILT_EXE to run a provided exe
  instead of building one, so the Windows runner only needs a JDK.
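
The CN1_PREBUILT_EXE switch can be pictured as follows. Only the variable name comes from the commit; `PrebuiltExeSketch`, `resolveExe`, and the supplier standing in for the build step are hypothetical, and the environment is passed as a map so the sketch can be exercised without touching the real process environment.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical illustration of preferring a provided exe over building one. */
final class PrebuiltExeSketch {
    private PrebuiltExeSketch() {}

    /**
     * @param env             environment variables, e.g. System.getenv()
     * @param buildFromSource fallback that translates and compiles the suite
     */
    static Path resolveExe(Map<String, String> env, Supplier<Path> buildFromSource) {
        String prebuilt = env.get("CN1_PREBUILT_EXE");
        if (prebuilt != null && !prebuilt.trim().isEmpty()) {
            // A provided binary means no compiler toolchain is needed on this machine.
            return Paths.get(prebuilt.trim());
        }
        return buildFromSource.get();
    }
}
```

A harness would call something like `resolveExe(System.getenv(), () -> buildLocally())`, where `buildLocally` is whatever build routine applies.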

scripts/windows/fetch-webview2-sdk.sh: portable bash counterpart of the PowerShell
fetch (curl the NuGet nupkg + unzip), laying out build/native for the Linux
cross-build's WEBVIEW2_SDK_DIR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The CN1 CSS compiler (codenameone-maven-plugin css goal) renders via CEF/AWT and
threw java.awt.HeadlessException on the display-less Linux cross-build runner.
Install xvfb and run the hellocodenameone-common Maven build under a virtual X
server so the css/transcode rendering has a display. Core + port + LLVM 19 + xwin
+ WebView2 SDK fetch all succeeded before this point; this unblocks the actual
suite cross-compile.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shai-almog

shai-almog commented Jun 10, 2026

Collaborator Author

Compared 125 screenshots: 125 matched.
Native Windows port, REAL shipping pipeline: the hellocodenameone screenshot suite rendered by a binary CROSS-COMPILED on Linux (clang-cl + xwin, WebView2 linked) and RUN on a Windows x64 runner. Compared against the in-repo baseline in scripts/windows/screenshots.

Benchmark Results

Detailed Performance Metrics

| Metric | Duration |
| --- | --- |
| SIMD kernel backend | SSE2 (x64) / NEON (arm64) native kernels |
| SIMD int-add (64K x300) | java 74ms / native 4ms = 18.5x speedup |
| SIMD float-mul (64K x300) | java 66ms / native 4ms = 16.5x speedup |
| SIMD kernel correctness | PASS (native result == scalar reference) |
| Base64 native bridge | unavailable (CN1 + SIMD + image benchmarks only) |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 SIMD byte path | gated to scalar (CPU autovectorizes scalar; explicit SIMD not beneficial here) |
| Base64 CN1 encode | 327.000 ms |
| Base64 CN1 decode | 182.000 ms |
| Base64 SIMD encode | 136.000 ms |
| Base64 encode ratio (SIMD/CN1) | 0.416x (58.4% faster) |
| Base64 SIMD decode | 127.000 ms |
| Base64 decode ratio (SIMD/CN1) | 0.698x (30.2% faster) |
| Image encode benchmark iterations | 100 |
| Image createMask (SIMD off) | 32.000 ms |
| Image createMask (SIMD on) | 14.000 ms |
| Image createMask ratio (SIMD on/off) | 0.438x (56.3% faster) |
| Image applyMask (SIMD off) | 55.000 ms |
| Image applyMask (SIMD on) | 24.000 ms |
| Image applyMask ratio (SIMD on/off) | 0.436x (56.4% faster) |
| Image modifyAlpha (SIMD off) | 53.000 ms |
| Image modifyAlpha (SIMD on) | 22.000 ms |
| Image modifyAlpha ratio (SIMD on/off) | 0.415x (58.5% faster) |
| Image modifyAlpha removeColor (SIMD off) | 58.000 ms |
| Image modifyAlpha removeColor (SIMD on) | 24.000 ms |
| Image modifyAlpha removeColor ratio (SIMD on/off) | 0.414x (58.6% faster) |

shai-almog and others added 3 commits June 10, 2026 18:45
…atusBarTap golden

Two CI/reporting fixes flagged on the PR:

- SIMD benchmark now appears in the PR comment. SimdBenchmarkTest emits ready-to-render
  "CN1SS:SIMD:STAT <key> : <value>" lines (backend, int-add/float-mul speedup, correctness);
  the capture harness collects them into windows-simd-stats.txt next to the PNGs; cn1ss.sh
  passes it via --extra-stats and RenderScreenshotReport renders it as a Benchmark Results
  table (the same mechanism iOS uses for base64-performance-stats.txt). Both the native and
  cross-compiled comment jobs lift the file to the artifacts dir cn1ss scans.

- The StatusBarTapDiagnosticScreenshotTest tile that showed as "updated" was a stale golden,
  not flakiness: the native-Windows and Linux-cross renders are byte-identical to each other
  (deterministic) and the glass-pane content (counter 0->3, scroll, native:no) is correct;
  only the earlier golden's tile scroll positions differed. Refreshed the golden to the
  current deterministic render.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
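
As an illustration of the marker format quoted in the commit above, here is a minimal sketch of how a harness might collect `CN1SS:SIMD:STAT <key> : <value>` lines from a test log. Only the marker prefix comes from the commit message; the class name, method name, and the choice to skip malformed lines are assumptions, and the real capture harness may work differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: collects "CN1SS:SIMD:STAT <key> : <value>" lines from a log. */
final class SimdStatLines {
    // Marker prefix as quoted in the commit message; everything else is illustrative.
    static final String MARKER = "CN1SS:SIMD:STAT ";

    /** Returns key -> value for every marker line, in the order the lines appeared. */
    static Map<String, String> parse(List<String> logLines) {
        Map<String, String> stats = new LinkedHashMap<>();
        for (String line : logLines) {
            int start = line.indexOf(MARKER);
            if (start < 0) {
                continue; // not a stat line
            }
            String payload = line.substring(start + MARKER.length());
            // Split on the first " : " only, so a value may itself contain colons.
            int sep = payload.indexOf(" : ");
            if (sep < 0) {
                continue; // malformed marker line (assumption: skip rather than fail)
            }
            stats.put(payload.substring(0, sep).trim(), payload.substring(sep + 3).trim());
        }
        return stats;
    }
}
```

Searching for the marker anywhere in the line, rather than only at the start, tolerates log prefixes such as timestamps or tags.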
…codename1.camera)

Native peer support (cn1_windows_peer.cpp + WindowsGenericPeer): wraps an
@NativeInterface-returned child HWND (boxed as long[]) in a PeerComponent that
reparents it onto the host window and tracks the lightweight component's bounds
(peerInitialized/peerSetBounds/peerSetVisible/peerDeinitialized), the analog of iOS
NativeIPhoneView. In the offscreen screenshot pipeline -- where a live HWND is not
composited -- it falls back to a PrintWindow peer image, mirroring the WebView2 peer.
WindowsImplementation.createNativePeer now routes long[] handles here, so the generic
peer-placement path (gap #5a) exists for native-interface widgets.

Camera (WindowsCameraImpl + cn1_windows_camera.cpp): implements the device camera
API (com.codename1.camera.CameraImpl), not just the legacy capturePhoto. A Media
Foundation source-reader session runs on a worker thread keeping the latest frame;
the preview is an image-based PeerComponent (browser-style, so it renders headlessly
and live), takePhoto encodes the freshest frame, enumerateCameras lists devices, and
a frame listener is polled at its fps. Video recording / flash / optical zoom /
focus-point are honestly reported unsupported (a generic webcam exposes none via the
source reader), per the port's "real or unsupported" rule. createCameraImpl now
returns this instead of null.

Also: register cn1_windows_peer.cpp in the clean-target native compile list, and trim
status.md to a TODO-only list (camera + native peers moved out of the gaps) so the
file can be deleted at merge. Windows port compiles clean; native build verified by
the cross-compile leg.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
shai-almog and others added 3 commits June 10, 2026 19:28
The clean (ParparVM) target has no java.lang.Thread.setDaemon, so the translated
WindowsCameraImpl.c failed to compile (undeclared virtual_java_lang_Thread_setDaemon).
The takePhoto worker exits as soon as the frame is captured, so a non-daemon thread
is fine.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
VM testing on a real webcam revealed the source reader was delivering 2-byte-per-pixel frames
(YUY2, the common webcam format: 640x480 -> 614400 bytes), not RGB32 (1228800),
because SetCurrentMediaType(RGB32) is silently ignored unless the reader's video
processor is enabled. The width*height*4 size check then rejected every frame, so
the preview/session captured nothing.

Create both source readers (the continuous session and the legacy capturePhoto)
with MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING=TRUE so the reader inserts a
converter to RGB32. Verified on the Windows ARM64 VM: the worker-thread session now
delivers full 640x480 RGB32 frames (polled via cameraSessionLatestFrame), and the
generic-peer PrintWindow capture returns real pixels. CI runners have no camera, so
this path is only exercisable on a real machine.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
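
The size mismatch described in the commit above can be checked with simple arithmetic. This is a sketch with hypothetical names, not the port's actual validation code; it only shows why a `width*height*4` check rejects every YUY2 frame.

```java
/** Hypothetical illustration of the frame-size check described in the commit. */
final class FrameSizeCheck {
    static final int BYTES_PER_PIXEL_YUY2 = 2;  // packed 4:2:2, 16 bits per pixel
    static final int BYTES_PER_PIXEL_RGB32 = 4; // 8 bits each for B, G, R plus one unused/alpha byte

    /** Expected buffer length for an unpadded frame; throws on int overflow. */
    static int expectedBytes(int width, int height, int bytesPerPixel) {
        return Math.multiplyExact(Math.multiplyExact(width, height), bytesPerPixel);
    }

    /** Mirrors a width*height*4 size check: only RGB32-sized buffers pass. */
    static boolean acceptsRgb32(int width, int height, int bufferLength) {
        return bufferLength == expectedBytes(width, height, BYTES_PER_PIXEL_RGB32);
    }
}
```

At 640x480, a YUY2 buffer is 614400 bytes and an RGB32 buffer is 1228800 bytes, so `acceptsRgb32(640, 480, 614400)` is false until the reader actually converts to RGB32.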
… comments

Two fixes to the Windows screenshot/benchmark report:

1. Real benchmarks instead of a toy SIMD micro-bench. Base64NativePerformanceTest
   (the same shared test that produces the iOS/Metal base64 + image numbers) used to
   early-return on the native Windows port because there is no app @NativeInterface
   Base64 bridge. It now skips only the native-vs-CN1 base64 comparison and still runs
   the CN1 + SIMD base64 and the image SIMD benchmarks (createMask / applyMask /
   modifyAlpha / PNG encode, SIMD on vs off) -- gated on Simd.isSupported(), which
   WindowsSimd provides. iOS/Android behaviour is unchanged (they have the native
   bridge: hasNative=true, isWindows()=false make every new guard a no-op there). The
   capture harness now collects the shared CN1SS:STAT: markers (not a bespoke one)
   into windows-benchmark-stats.txt, and SimdBenchmarkTest emits via the same marker
   so its raw-kernel numbers join the table. JPEG + native-base64 degrade honestly to
   "unsupported"/"unavailable" on a webcam-less, bridge-less desktop.

2. Per-architecture comments. x64 (Intel) and arm64 each post a SEPARATE PR comment
   with a distinct marker, each depending only on its own screenshot leg -- so one
   architecture's pipeline failing or re-running no longer overrides or hides the
   other's result (previously a single combined comment showed only x64 and was
   skipped entirely if the x64 leg failed). Each comment carries that arch's own
   benchmark table (SSE2 on x64, NEON on arm64).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shai-almog

shai-almog commented Jun 10, 2026

Collaborator Author

Compared 125 screenshots: 125 matched.
Native Windows port (x64 / Intel-AMD): full hellocodenameone screenshot suite rendered offscreen with Direct2D/DirectWrite, plus the real benchmarks (base64 native/CN1/SIMD, image createMask/applyMask/modifyAlpha/PNG/JPEG, SSE2 SIMD kernels). Compared against the in-repo baseline in scripts/windows/screenshots.

Benchmark Results

Detailed Performance Metrics

| Metric | Value |
| --- | --- |
| SIMD kernel backend | SSE2 (x64) / NEON (arm64) native kernels |
| SIMD int-add (64K x300) | java 69ms / native 4ms = 17.2x speedup |
| SIMD float-mul (64K x300) | java 72ms / native 4ms = 18.0x speedup |
| SIMD kernel correctness | PASS (native result == scalar reference) |
| Base64 native bridge | unavailable (CN1 + SIMD + image benchmarks only) |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 SIMD byte path | gated to scalar (CPU autovectorizes scalar; explicit SIMD not beneficial here) |
| Base64 CN1 encode | 276.000 ms |
| Base64 CN1 decode | 170.000 ms |
| Base64 SIMD encode | 151.000 ms |
| Base64 encode ratio (SIMD/CN1) | 0.547x (45.3% faster) |
| Base64 SIMD decode | 131.000 ms |
| Base64 decode ratio (SIMD/CN1) | 0.771x (22.9% faster) |
| Image encode benchmark iterations | 100 |
| Image createMask (SIMD off) | 36.000 ms |
| Image createMask (SIMD on) | 13.000 ms |
| Image createMask ratio (SIMD on/off) | 0.361x (63.9% faster) |
| Image applyMask (SIMD off) | 59.000 ms |
| Image applyMask (SIMD on) | 30.000 ms |
| Image applyMask ratio (SIMD on/off) | 0.508x (49.2% faster) |
| Image modifyAlpha (SIMD off) | 58.000 ms |
| Image modifyAlpha (SIMD on) | 23.000 ms |
| Image modifyAlpha ratio (SIMD on/off) | 0.397x (60.3% faster) |
| Image modifyAlpha removeColor (SIMD off) | 67.000 ms |
| Image modifyAlpha removeColor (SIMD on) | 24.000 ms |
| Image modifyAlpha removeColor ratio (SIMD on/off) | 0.358x (64.2% faster) |

@shai-almog

shai-almog commented Jun 10, 2026

Collaborator Author

Native Windows port (arm64)

Compared 125 screenshots: 124 matched, 1 updated.

  • graphics-draw-arc — updated screenshot. Screenshot differs (784x561 px, bit depth 8).

    graphics-draw-arc
    Preview info: JPEG preview quality 60.
    Full-resolution PNG saved as graphics-draw-arc.png in workflow artifacts.

Benchmark Results

Detailed Performance Metrics

| Metric | Value |
| --- | --- |
| SIMD kernel backend | SSE2 (x64) / NEON (arm64) native kernels |
| SIMD int-add (64K x300) | java 62ms / native 3ms = 20.6x speedup |
| SIMD float-mul (64K x300) | java 60ms / native 4ms = 15.0x speedup |
| SIMD kernel correctness | PASS (native result == scalar reference) |
| Base64 native bridge | unavailable (CN1 + SIMD + image benchmarks only) |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 SIMD byte path | gated to scalar (CPU autovectorizes scalar; explicit SIMD not beneficial here) |
| Base64 CN1 encode | 600.000 ms |
| Base64 CN1 decode | 237.000 ms |
| Base64 SIMD encode | 104.000 ms |
| Base64 encode ratio (SIMD/CN1) | 0.173x (82.7% faster) |
| Base64 SIMD decode | 133.000 ms |
| Base64 decode ratio (SIMD/CN1) | 0.561x (43.9% faster) |
| Image encode benchmark iterations | 100 |
| Image createMask (SIMD off) | 23.000 ms |
| Image createMask (SIMD on) | 6.000 ms |
| Image createMask ratio (SIMD on/off) | 0.261x (73.9% faster) |
| Image applyMask (SIMD off) | 38.000 ms |
| Image applyMask (SIMD on) | 13.000 ms |
| Image applyMask ratio (SIMD on/off) | 0.342x (65.8% faster) |
| Image modifyAlpha (SIMD off) | 37.000 ms |
| Image modifyAlpha (SIMD on) | 12.000 ms |
| Image modifyAlpha ratio (SIMD on/off) | 0.324x (67.6% faster) |
| Image modifyAlpha removeColor (SIMD off) | 41.000 ms |
| Image modifyAlpha removeColor (SIMD on) | 11.000 ms |
| Image modifyAlpha removeColor ratio (SIMD on/off) | 0.268x (73.2% faster) |

shai-almog and others added 6 commits June 10, 2026 21:31
…er-than-scalar)

The benchmark honestly showed base64 SIMD ~6.5x SLOWER than scalar and a few image
ops slower too. Root cause: the ops the Base64 SIMD codec and Image.createMask use
(shl/shrLogical/lookupBytes/pack+unpack interleaved/unpackLookup/packIntToByteTruncate)
were not implemented in WindowsSimd, so they fell through to the generic Simd scalar
DEFAULTS -- lane-scratch loops with per-op dispatch -- which are slower than the
straight-line scalar codec. Only the fused ops (replaceTopByteFromUnsignedBytes,
used by modifyAlpha -66%) were native and won. iOS implements all of these in NEON,
which is why iOS base64 SIMD is faster.

Implement them natively in cn1_windows_simd.c: NEON-vectorized on arm64 (mirroring
IOSSimd.m: vld3q/vst3q/vld4q/vst4q interleave, vshlq_u8 byte shifts), SSE2 on x64
(byte shift via 16-bit shift + per-byte mask, since SSE2 has no byte shift), and
scalar table lookups on both (exactly as IOSSimd does -- the lookup is scalar there
too). A native C kernel (one call, tight loop) already beats the scalar-default
fallback, so SIMD stops losing. unpackLookupBytesInterleaved4 matches the base Simd
contract precisely (out-of-range -> 0, returns the OR of all outputs).

Verified: NEON kernels pass a standalone correctness harness on the arm64 VM (shl/
shrLogical x8 shifts, unpack3/pack3/pack4 vs scalar references); the C compiles cleanly
for arm64. x64/SSE2 correctness is gated by the base64/image SIMD validation in the
benchmark (byteArraysEqual) on CI. Also deletes Ports/WindowsPort/status.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
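
The "16-bit shift + per-byte mask" trick mentioned in the commit above can be modeled in plain Java on a single 16-bit lane holding two bytes. This is a scalar model of the idea only, under the assumption that the native kernel applies the same shift-then-mask pattern across a full SSE2 register; it is not the code in cn1_windows_simd.c, and all names are hypothetical.

```java
/** Scalar model of emulating per-byte shifts with a wider 16-bit shift plus a mask. */
final class ByteShiftViaWideLanes {
    /** Shifts both bytes packed in a 16-bit lane left by n (0..7), as a per-byte shift would. */
    static int shlBytesInLane16(int lane16, int n) {
        int shifted = (lane16 << n) & 0xFFFF;      // one wide shift; low-byte bits leak into the high byte
        int byteMask = (0xFF << n) & 0xFF;         // bits that legitimately belong to each shifted byte
        int laneMask = (byteMask << 8) | byteMask; // same mask replicated for both bytes
        return shifted & laneMask;                 // clears the leaked bits
    }

    /** Logical right shift of both bytes by n (0..7); here high-byte bits leak downward. */
    static int shrLogicalBytesInLane16(int lane16, int n) {
        int shifted = (lane16 & 0xFFFF) >>> n;
        int byteMask = 0xFF >>> n;
        int laneMask = (byteMask << 8) | byteMask;
        return shifted & laneMask;
    }

    /** Reference: shift each byte left independently. */
    static int shlBytesReference(int lane16, int n) {
        int lo = ((lane16 & 0xFF) << n) & 0xFF;
        int hi = (((lane16 >>> 8) & 0xFF) << n) & 0xFF;
        return (hi << 8) | lo;
    }

    /** Reference: logically shift each byte right independently. */
    static int shrLogicalBytesReference(int lane16, int n) {
        int lo = (lane16 & 0xFF) >>> n;
        int hi = ((lane16 >>> 8) & 0xFF) >>> n;
        return (hi << 8) | lo;
    }
}
```

Because the mask depends only on the shift count, it can be computed once per call and reused for every lane, which is what makes the emulation cheap.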
…modifyAlpha removeColor)

removeColor was the one image op still on the scalar fallback (8% slower on x64,
27% on arm64) -- WindowsSimd didn't override blendByMaskTestNonzeroSubstituteOnKeepEq,
so Image.modifyAlpha(alpha, removeColor) hit the generic Simd scalar default. Add it
as a fused vectorized blend (SSE2 on x64, NEON on arm64), mirroring its sibling
blendByMaskTestNonzero (which modifyAlpha uses and wins 70%). Verified on the arm64
VM: the NEON kernel matches the scalar reference exactly over mixed transparent /
removeColor-matching / opaque pixels. With this, every image SIMD op (createMask,
applyMask, modifyAlpha, modifyAlpha+removeColor, PNG encode) beats scalar.

(base64 SIMD remains ~2x slower on x64 -- it is bound by ParparVM per-native-call
overhead across ~6000 calls/encode plus the /O2 auto-vectorized scalar competitor;
on arm64 base64 ENCODE is already 25% faster. base64 SIMD is explicit opt-in;
standard Base64.encode uses the fast scalar path.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…benchmark)

The iOS and Mac UI-test builds ran -configuration Debug, which inherits Xcode's
GCC_OPTIMIZATION_LEVEL=0 (-O0) -- so the benchmark's scalar baseline was unvectorized
and the SIMD speedups were inflated (SIMD beating unoptimized code, not real shipping
code). Windows already builds the benchmark Release (/O2, auto-vectorized scalar),
which is why its SIMD margins are smaller and base64 micro-SIMD even loses there.

Override GCC_OPTIMIZATION_LEVEL to 2 (env CN1_TEST_OPT_LEVEL, 0/1/2/3/s) on both the
iOS and Mac test builds so the compiler auto-vectorizes the scalar baseline -- making
the SIMD-vs-scalar comparison apples-to-apples across all three ports. This is the
measurement foundation for pruning the SIMD that doesn't beat optimized scalar.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…orized scalar

The O2 measurement settled it: with -O2 the explicit NEON Base64 codec is still
75-83% faster than autovectorized scalar on Apple Silicon (iOS/Mac), but on x86-64
it is ~2x slower -- /O2 already autovectorizes the scalar codec and SSE2 has no
3-way interleave, so the per-op SIMD just adds overhead. The fused image kernels
win on every platform (they can't be autovectorized) and are unaffected.

Add Simd.isByteShuffleAccelerated() -- true only where the chained byte
shuffle/interleave pipeline actually beats scalar: IOSSimd returns true (NEON);
WindowsSimd returns it as a per-arch native constant (arm64 true, x86-64 false);
the base scalar Simd returns false. Base64.encodeNoNewlineSimd / decodeNoWhitespaceSimd
consult it and, when false, skip the SIMD loop so the autovectorized scalar tail
encodes everything (still fully correct -- SimdTest's scalar-vs-SIMD equality test
passes). So x86-64 base64 SIMD now matches scalar instead of losing 2x, while ARM
keeps its 75-83% win. The benchmark emits the gate state ("active (NEON)" vs "gated
to scalar"). These methods are explicit opt-in; no production code auto-uses them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
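
The gating described in the commit above can be sketched as follows. Only the method name `isByteShuffleAccelerated()` comes from the commit; the `SimdCaps` interface, the 48-byte block size, and the use of `java.util.Base64` as a stand-in for both the vectorized loop and the scalar tail are assumptions made so the example runs on a plain JVM.

```java
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical sketch of gating a SIMD Base64 loop on a capability flag. */
final class Base64GateSketch {
    /** Stand-in for the capability query; the real method lives on Simd. */
    interface SimdCaps {
        boolean isByteShuffleAccelerated();
    }

    // Assumed block size; a multiple of 3 so no '=' padding can appear mid-stream.
    static final int BLOCK = 48;

    /** Number of leading bytes the SIMD loop would consume; 0 when the gate is closed. */
    static int simdPrefixLength(int inputLength, SimdCaps caps) {
        if (!caps.isByteShuffleAccelerated()) {
            return 0; // gate closed: the scalar tail encodes everything
        }
        return (inputLength / BLOCK) * BLOCK;
    }

    static String encode(byte[] input, SimdCaps caps) {
        int prefix = simdPrefixLength(input.length, caps);
        StringBuilder out = new StringBuilder();
        if (prefix > 0) {
            // Stand-in for the vectorized loop over whole blocks.
            out.append(Base64.getEncoder().encodeToString(Arrays.copyOfRange(input, 0, prefix)));
        }
        // Scalar tail: the remainder, or the whole input when the gate is closed.
        out.append(Base64.getEncoder().encodeToString(Arrays.copyOfRange(input, prefix, input.length)));
        return out.toString();
    }
}
```

Whichever way the gate is set, the output is identical; the flag only decides which loop does the work, which matches the commit's point that the gated path stays fully correct.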
…ncode bench

- SpotBugs (the build-test failure): drop the redundant `simd != null` checks in
  Base64.encode/decodeNoWhitespaceSimd -- Simd.get() is provably non-null there
  (RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE).
- base64 decode was 27% slower on Windows arm64: my unpackLookupBytesInterleaved4 /
  lookupBytes are scalar (iOS uses NEON vqtbl), so the codec's decode loses there
  even though encode wins -- not a clear net win. So WindowsSimd.isByteShuffleAccelerated()
  is now false on both Windows arches (was true on arm64); only iOS/Mac (full NEON,
  75-83% faster) enable the base64 SIMD path. Fused image kernels are unaffected.
- PNG/JPEG "encode SIMD on/off" ratio was misleading (+20% x64, -12.8% arm64): that
  benchmark is dominated by the platform image encoder (native WIC on Windows), which
  SIMD does not touch, so the ratio is encoder noise, not a SIMD measurement. Removed
  it; the four pure image ops (createMask/applyMask/modifyAlpha/removeColor) measure
  the SIMD-affected work directly and all win.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Split the gate's combined byte[] declaration into one per line to satisfy PMD
(the SpotBugs fix in the prior commit cleared; this is the remaining static-analysis
nit on the same code).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shai-almog shai-almog merged commit e3cb4e5 into master Jun 11, 2026
44 of 46 checks passed
@shai-almog shai-almog deleted the win32-port-gaps branch June 11, 2026 03:48