Skip to content

fix: stream large Keras HDF5 inspection#1663

Merged
mldangelo-oai merged 24 commits into
mainfrom
mdangelo/codex/hf-fp-t33-large-keras-h5-streaming-20260610
Jun 12, 2026
Merged

fix: stream large Keras HDF5 inspection#1663
mldangelo-oai merged 24 commits into
mainfrom
mdangelo/codex/hf-fp-t33-large-keras-h5-streaming-20260610

Conversation

@mldangelo-oai

@mldangelo-oai mldangelo-oai commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Make Keras HDF5 scanning file-backed for large models by using h5py metadata/link traversal instead of the generic 512 MiB whole-file read gate.
  • Defer whole-file aggregate hashing for large HDF5 inputs so core dispatch does not materialize multi-GB model files before the Keras scanner runs.
  • Add bounded JSON/name attribute reads and SoftLink/VDS-aware external-reference traversal so malformed, cyclic, oversized, external-link, external-storage, and Lambda controls still fail closed or report security findings.

Root Cause

KerasH5Scanner inherited the default BaseScanner max_file_read_size of 512 MiB, and core also attempted top-level hashing before scanner dispatch. The pinned 2.24 GB valid HDF5 model was therefore rejected before h5py.File(...) could perform file-backed inspection.

Baseline on origin/main at 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69:

{"success": false, "exit_code": 2, "bytes_scanned": 0, "scan_outcome_reasons": ["max_file_read_size_exceeded"], "file_size": 2240076248}

Security Tradeoffs

Large HDF5 files now skip whole-file integrity hashing and report content_hash=null at the aggregate level, because hashing would reintroduce the same full-file read. Small H5 files keep the existing file integrity hash. Keras H5 inspection remains bounded by explicit HDF5 JSON/name attribute, link traversal, SoftLink resolution, virtual dataset source, and external-reference reporting limits; ambiguous or malformed evidence remains inconclusive unless a security finding is already present.

Pinned Real-Model QA

Pinned model:

repo=FacebookAI/xlm-roberta-large
revision=c23d21b0620b635a76227c604d44e43a9f0ee389
filename=tf_model.h5
size=2240076248

Opt-in regression:

MODELAUDIT_RUN_REAL_HF_H5=1 MODELAUDIT_HF_CACHE_DIR=../hf-cache PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_keras_h5_scanner.py::test_real_hf_xlm_roberta_large_h5_reaches_keras_scan_without_read_cap -q

Outcome:

1 passed in 3.75s

Direct terminal scan outcome on the cached pinned file:

{"content_hash": null, "elapsed_seconds": 0.869, "exit_code": 0, "file_size": 2240076248, "files_scanned": 1, "max_rss_kb": 96920, "scan_outcome": null, "scan_outcome_reasons": null, "scanner_names": ["keras_h5"], "success": true}

The real-model test also asserts the bounded HDF5 metadata shape: root keys ["roberta", "top_level_model_weights"], attrs {"backend", "keras_version", "layer_names"}, and bounded layer_names == ["roberta"].

Malicious And Malformed Controls

  • Large malicious Lambda H5 still reports CVE-2025-9905 and exits with security findings.
  • Large malformed model_config fails closed as inconclusive without max_file_read_size_exceeded.
  • Large ExternalLink and SoftLink-to-ExternalLink H5 controls still report CVE-2026-1669.
  • Large SoftLink-to-external-storage H5 controls still report CVE-2026-1669.
  • Large out-of-file Virtual Dataset source H5 controls still report CVE-2026-1669.
  • Same-file Virtual Dataset sources stay clean.
  • Keras 3 root vars and optimizer/vars external-link/VDS controls still report CVE-2026-1669.
  • SoftLink cycles and oversized config/name attributes fail closed as inconclusive.
  • Oversized layer_names / weight_names attributes fail closed before materialization.
  • Generic HDF5 files with dangling /layers SoftLinks stay clean instead of being treated as Keras weights.
  • Sparse/chunked/compressed HDF5 datasets are not materialized while metadata/Lambda detection still runs.

Validation

uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_keras_h5_scanner.py -q
MODELAUDIT_RUN_REAL_HF_H5=1 MODELAUDIT_HF_CACHE_DIR=../hf-cache PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_keras_h5_scanner.py::test_real_hf_xlm_roberta_large_h5_reaches_keras_scan_without_read_cap -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1
git diff --check

Results:

ruff check: All checks passed
ruff format --check: 419 files already formatted
mypy: Success: no issues found in 474 source files
Keras H5 scanner tests: 313 passed, 1 skipped
Real HF pinned H5 regression: 1 passed
Broad non-integration suite: 18558 passed, 793 skipped, 40 warnings in 1138.26s
git diff --check: clean

Copy link
Copy Markdown
Contributor Author

@codex review

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.489s -> 1.476s (-0.9%).

Workload Benchmark Target Size Files Baseline Current Change Status
warm-cache-rescan tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan release-candidate 547.3 KiB 32 124.54ms 118.83ms -4.6% stable
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] nested_base64 98 B 1 514.4us 525.3us +2.1% stable
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] nested_raw 78 B 1 512.1us 518.8us +1.3% stable
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] nested_hex 130 B 1 547.8us 554.3us +1.2% stable
suspicious-pickle-intake tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake suspicious-intake 183.8 KiB 4 146.02ms 144.34ms -1.1% stable
clean-training-checkpoint tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint safe_large 278.2 KiB 1 113.78ms 112.70ms -0.9% stable
mixed-model-repository tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository release-candidate 547.3 KiB 32 497.43ms 492.98ms -0.9% stable
single-checkpoint-preflight tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load single_checkpoint.pkl 183.0 KiB 1 74.36ms 74.93ms +0.8% stable
padded-multi-stream-upload tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload multi_stream_padded 4.1 KiB 1 587.5us 590.9us +0.6% stable
chunked-upload-stream tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream chunked_stream 278.2 KiB 1 116.18ms 115.61ms -0.5% stable
direct-malicious-upload tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload malicious_reduce 52 B 1 461.6us 463.3us +0.4% stable
duplicate-heavy-registry tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot registry-snapshot 915.2 KiB 13 414.42ms 413.48ms -0.2% stable

Comment thread tests/scanners/test_keras_h5_scanner.py Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e8da917608

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/keras_h5_scanner.py

Copy link
Copy Markdown
Contributor Author

Independent review of exact head 9ece3b123f4e1dad285d5def0fc894dac5910c40: request changes / do not merge.

  1. P1: external HDF5 virtual datasets are not inspected on the newly admitted >512 MiB path. A 536,875,008-byte Keras-shaped sparse H5 with an external VDS source returns clean exit 0. Add bounded VDS source inspection equivalent to Keras ZIP, with external/same-file/mixed/truncated-report tests.
  2. P1: layer_names/weight_names attributes are fully materialized before element/byte limits. A 2M-element attribute drove ~689 MB RSS versus ~75 MB baseline. Bound at dataspace/type level before Python conversion.
  3. P2: any root /layers SoftLink makes generic HDF5 look like Keras and return exit 2. Establish Keras-shaped ownership before fail-closing; test generic dangling links and Keras dangling/external links separately.
  4. P3: document that >512 MiB HDF5 no longer emits aggregate content_hash, even for callers that previously opted into large reads.

Strengthen pinned real-model QA to assert actual Keras layout/roots and an immutable digest, not only a passing generic format check. Full report: modelaudit-pr-swarm/outputs/pr-1663-independent-review-20260611.md.

@codex address these blockers, add regressions and bounded-memory proof, rerun pinned QA + focused/broad validation, push, and request exact-head review.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9ece3b123f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/keras_h5_scanner.py
Comment thread modelaudit/scanners/keras_h5_scanner.py
Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated
Comment thread CHANGELOG.md Outdated
Comment thread tests/scanners/test_keras_h5_scanner.py
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex incorporate this additional pinned large-HDF5 regression before merge:

facebook/esm2_t33_650M_UR50D@08e4846e537177426273712802403f7ba8261b6c contains a valid 2,605,109,760-byte tf_model.h5. Current origin/main rejects it at the generic 512 MiB cap with max_file_read_size_exceeded and exits 2. The final fix should reach truthful bounded Keras/HDF5 inspection without whole-file materialization, alongside the existing XLM-R fixture.

@chatgpt-codex-connector

Copy link
Copy Markdown

Summary

  • Added shared pinned Hugging Face large-HDF5 regression helpers that verify immutable Hub metadata, file size, model_config, and Keras root layout before asserting scans reach keras_h5 without max_file_read_size_exceeded.
  • Kept the existing XLM-R pinned fixture and strengthened it with expected Xet/SHA256 metadata and real Keras layout assertions.
  • Added the requested pinned ESM2 regression for facebook/esm2_t33_650M_UR50D@08e4846e537177426273712802403f7ba8261b6c, including the expected 2,605,109,760-byte size and immutable digest metadata.
  • Resolved the mixed modelaudit.core import style by using direct imports for determine_exit_code and scan_model_directory_or_file.

Testing

  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run mypy tests/scanners/test_keras_h5_scanner.py
  • uv run python -m py_compile tests/scanners/test_keras_h5_scanner.py
  • git diff --check
  • ⚠️ uv sync --extra all-ci (environment proxy blocked downloading nvidia-nvshmem-cu13 from PyPI)
  • ⚠️ PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_keras_h5_scanner.py::test_real_hf_xlm_roberta_large_h5_reaches_keras_scan_without_read_cap tests/scanners/test_keras_h5_scanner.py::test_real_hf_esm2_large_h5_reaches_keras_scan_without_read_cap -q (environment lacks h5py, so the h5py-gated module skipped during collection)

Committed changes on the current branch with commit 7290161f9927df61c2422d6992ac3c6cb66dee66 and created the PR via make_pr.

View task →

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14c0248cb0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/keras_h5_scanner.py
Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

PR #1663 independent re-review

Merge disposition

REQUEST CHANGES — DO NOT MERGE exact head 14c0248.

Two security/correctness blockers remain after the remediation push:

  1. Keras 3 loader-consumed root and optimizer variable groups are omitted from external-reference traversal, so newly admitted large malicious weights can return clean exit 0.
  2. The name-attribute guard prevents Python conversion but does not bound native HDF5/h5py memory incurred while opening a dense attribute; RSS remains linear in attacker-controlled metadata size.

A separate same-file VDS false positive causes benign aliases to emit CVE-2026-1669 and exit 1. The prior generic SoftLink and changelog blockers are fixed.

Review identity and live state

  • Repository: promptfoo/modelaudit
  • PR: fix: stream large Keras HDF5 inspection #1663, fix: stream large Keras HDF5 inspection
  • Requested base: 8d6c486
  • Requested and verified head: 14c0248
  • Live PR head at final review snapshot: 14c0248
  • Live PR base at final review snapshot: 8d6c486
  • Mergeability: MERGEABLE, but merge state BLOCKED and review decision REVIEW_REQUIRED
  • Diff: 4 files, +1,169/-113, 4 commits
  • Changed files: CHANGELOG.md, modelaudit/core.py, modelaudit/scanners/keras_h5_scanner.py, tests/scanners/test_keras_h5_scanner.py
  • Review checkout: isolated clean detached checkout at /tmp/modelaudit-pr1663-review-U25wD4; the dirty user checkout was not modified
  • Security scan bundle: /tmp/codex-security-scans/modelaudit/14c0248cb03c915a3483e62b9be2a06bc17246aa_20260611T045750Z/report.md and /tmp/codex-security-scans/modelaudit/14c0248cb03c915a3483e62b9be2a06bc17246aa_20260611T045750Z/report.html

Findings

P1 — Keras 3 /vars and /optimizer/vars external references still pass clean

Affected control: modelaudit/scanners/keras_h5_scanner.py:911-1039, especially the Keras 3 root enumeration at lines 1008-1039. Traversal only examines the returned roots at lines 1368-1431.

The remediation adds direct VDS inspection, but its Keras 3 root discovery only enumerates layers//vars. It does not include:

  • root /vars, which Keras H5IOStore.get reads when the saveable path is empty; or
  • /optimizer/vars, which Keras reaches through recursive child-saveable loading.

Official Keras v3.13.1 evidence is in keras/src/saving/saving_lib.py: H5IOStore.get reads /vars for an empty path and path/vars for child saveables, while _load_state recursively visits children such as optimizers.

I reproduced two independent 536,871,936-byte exact-head artifacts. Both contained an ordinary layers/dense/vars/0 dataset to establish Keras 3 ownership.

Probe Malicious location Head result
Keras 3 model variables /vars/0 ExternalLink to external.h5:/payload no CVE-2026-1669, scanner success true, aggregate exit 0
Keras 3 optimizer state /optimizer/vars/0 ExternalLink to external.h5:/payload no CVE-2026-1669, scanner success true, aggregate exit 0

The exact base rejects the same large Keras 3 shape with max_file_read_size_exceeded and exit 2. This PR therefore changes a fail-closed result into a clean false negative on loader-consumed paths.

This validates the newest live P1 thread, “Scan all Keras 3 vars roots before lifting size cap.” It is not fixed at the requested head.

Required fix: derive all proven Keras 3 loader-consumed saveable roots under a bounded ownership strategy, at minimum root /vars and /optimizer/vars, and test ExternalLink, external storage, external VDS, same-file VDS, mixed VDS, truncation, and clean controls for each independently reachable root.

P1 — Oversized name metadata remains native-memory-unbounded

Affected control: modelaudit/scanners/keras_h5_scanner.py:1075-1115.

Commit f9444f6 correctly avoids attrs[name] and Python list conversion after the dataspace/storage checks. However, the helper first performs attribute membership and attrs.get_id. With dense fixed-width attributes, the native HDF5/h5py layer allocates memory proportional to the encoded attribute before the code can reject its point count or storage size.

Exact-head isolated subprocess measurements for valid HDF5 files sparsely inflated to 536,871,936 bytes:

layer_names elements Encoded attribute bytes Exact-head max RSS
500,000 16,000,000 107,921,408
1,000,000 32,000,000 140,296,192
2,000,000 64,000,000 203,456,512
4,000,000 128,000,000 330,956,800

For the 64 MiB attribute, exact base stayed at 75,464,704 bytes because it rejected before h5py metadata access; exact head reached 202,866,688 bytes. A component probe placed h5py.File open near 39 MiB but attribute membership/get-id near 167 MiB, proving the residual growth occurs before Python decoding.

The scan now returns inconclusive rather than materializing the full Python array, which is an improvement, but a sufficiently large attribute can still OOM a worker. The production behavior is not bounded by the advertised 10 MiB/4,096-item limits.

The added regression at tests/scanners/test_keras_h5_scanner.py:980-1015 uses a tiny two-element attribute and monkeypatches AttributeManager.getitem. It proves only that Python-level conversion is skipped; it cannot detect native allocation during membership/get_id.

This prior P1 blocker is partially addressed but remains merge-blocking.

Required fix: retain a conservative cap, inspect risky HDF5 parsing in a hard memory-limited subprocess, or add a pre-h5py mechanism that can reject oversized attribute storage without opening it through the allocating native path. Add a real dense-attribute RSS/memory-limit regression.

P2 — Same-file VDS is falsely reported as CVE-2026-1669

Affected control: modelaudit/scanners/keras_h5_scanner.py:1271-1313.

record_virtual_dataset_sources increments external_reference_count for every virtual dataset before checking source filenames. HDF5 filename . means the virtual source is in the same file and does not cross the external-file boundary.

A 536,871,936-byte Keras HDF5 with model_weights/dense/kernel mapped to VirtualSource(".", "/source") produced:

  • one CVE-2026-1669 issue;
  • evidence identifying filename "." and path "/source";
  • aggregate exit 1.

An equivalent mapping to external.h5 also produced exit 1, as expected. The unchanged Keras ZIP implementation already has the correct distinction at modelaudit/scanners/keras_zip_scanner.py:3724-3727: it skips source_filename == "." before incrementing the external count.

This validates the newest live P2 thread, “Ignore same-file virtual dataset sources.”

Required fix: filter same-file VDS mappings before counting or reporting external sources, keep the dataset clean when all mappings are same-file, preserve external findings for mixed mappings, and fail closed if the bounded source walk cannot determine whether unvisited mappings are external.

P3 validation gap — pinned real Hugging Face QA is still incomplete

Affected tests: tests/scanners/test_keras_h5_scanner.py:1018-1055.

The XLM-R test is stronger than the prior head: it now asserts exact root keys, exact root attributes, bounded layer_names decoding, and layer_names == ["roberta"]. That closes the prior concern that a generic HDF5 pass alone could satisfy the test.

Two requested pieces are still absent at exact head:

  • no immutable SHA-256/Xet digest assertion exists for the downloaded XLM-R artifact;
  • there is no facebook/esm2_t33_650M_UR50D test anywhere in the file, despite the live PR request and a connector comment claiming that it was added.

The connector comment references commit 7290161f9927df61c2422d6992ac3c6cb66dee66, but that commit is not in the exact four-commit PR head. The current branch contains only e8da917, 9ece3b1, f9444f6, and 14c0248.

The opt-in XLM-R test was skipped locally because no cached 2.24 GB artifact was present and huggingface.co is unavailable from this environment. This gap does not supersede the reproduced source bugs above, but it means realistic multi-GB Hugging Face compatibility is not independently closed.

Prior-blocker reconciliation

Prior blocker or requested focus Exact-head disposition Evidence
Direct legacy external VDS clean bypass Fixed for the original model_weights path Large external VDS now reports CVE-2026-1669 and exit 1.
All loader-consumed VDS/external-reference paths Unfixed Root /vars and /optimizer/vars return clean exit 0.
Same-file VDS handling New regression Valid "." mapping reports CVE and exit 1.
Unbounded layer_names/weight_names Python materialization Fixed narrowly attrs[name]/tolist path is skipped after metadata checks.
Bounded metadata memory overall Unfixed Native RSS scales from 108 MB to 331 MB with 16 MB to 128 MB attributes.
Generic dangling /layers SoftLink false positive Fixed Exact regression passes clean with generic_h5 format.
Established Keras SoftLink-to-external handling Fixed ExternalLink and external-storage aliases are detected.
SoftLink cycle fail-open Fixed Cycle remains inconclusive/exit 2 when no security finding takes precedence.
External-storage segment/report limits Fixed in inspected roots Existing bounded tests pass; omitted Keras 3 roots remain outside analysis.
VDS source/report limits Partially fixed Per-dataset source reporting is bounded and truncation fails closed, but same-file classification and root coverage are wrong.
Oversized model_config/training_config Fixed for Python parsing path Oversized/malformed config fails closed before JSON parsing.
Missing h5py Fail-closed preserved Direct missing-h5py regression remains inconclusive/exit 2.
Large Lambda detection Preserved Large malicious Lambda regression passes and yields security exit 1.
Aggregate content_hash compatibility disclosure Fixed CHANGELOG now states that large file-backed HDF5 omits aggregate content_hash.
Real XLM-R structural QA Partially fixed Structure and layer_names are asserted; immutable digest is absent and test was not rerun locally.
Requested ESM2 2.605 GB regression Unfixed No esm2 test exists at exact head.
Core directory/single/stream hash deferral No new correctness defect found Changed paths consistently mark aggregate hash incomplete and avoid whole-file hash.
Fail-open behavior overall Unfixed Keras 3 root and optimizer external references return clean exit 0.
Realistic false positives Unfixed Same-file VDS produces a concrete false positive.

Review-thread reconciliation

Live review threads inspected at exact head:

  • Resolved import-style thread: fixed.
  • Older unbounded-attribute threads: still marked unresolved; the Python conversion issue is improved, but native resource growth validates the blocker in a narrower residual form.
  • Older VDS thread: direct legacy VDS is fixed, but the current exact-head root-coverage P1 is valid.
  • Older generic SoftLink thread: outdated and fixed by current code/test.
  • Older changelog thread: outdated and fixed.
  • Pinned QA thread: still unresolved and only partially addressed.
  • Current “all Keras 3 vars roots” P1: validated.
  • Current “same-file VDS” P2: validated.

No GitHub comments or reviews were posted by this re-review.

Exact-head validation

Static checks

  • ruff check on modelaudit/core.py, modelaudit/scanners/keras_h5_scanner.py, and tests/scanners/test_keras_h5_scanner.py: passed.
  • ruff format --check on the same files: passed.
  • mypy on the same files: passed.
  • git diff --check for base..head: passed.

Focused tests

  • Targeted remediation filter covering large HDF5 external link/storage/VDS, SoftLink, oversized weight names, and generic dangling /layers: 8 passed.
  • Full tests/scanners/test_keras_h5_scanner.py: 306 passed, 2 failed, 1 skipped in 17.42s.
  • Skipped: opt-in real XLM-R Hugging Face download test.
  • Failure 1, test_missing_h5py_invalidates_stale_cache_entries: also fails on exact base in this local cache-identity harness, so it is not attributed to PR fix: stream large Keras HDF5 inspection #1663.
  • Failure 2, test_h5py_runtime_failure_bypasses_stale_clean_cache: passed immediately in isolated rerun, so it is treated as order/environment-sensitive rather than a validated PR regression.

Adversarial end-to-end probes

  • Large same-file VDS: false-positive CVE-2026-1669, exit 1.
  • Large external VDS: correctly detected, exit 1.
  • Large Keras 3 root /vars ExternalLink: clean false negative, exit 0.
  • Large Keras 3 /optimizer/vars ExternalLink: clean false negative, exit 0.
  • Base comparison for large Keras 3 artifact: max_file_read_size_exceeded, exit 2.
  • Dense fixed-width name attributes: exact-head RSS scales linearly with encoded metadata.
  • Large generic dangling /layers SoftLink: clean.
  • Large SoftLink cycle in established Keras weights: inconclusive/exit 2.
  • Large malicious Lambda: detected/security exit 1.
  • Malformed/oversized config: inconclusive/exit 2.
  • Sparse/chunked/compressed payload non-materialization tests: passed in the focused file suite.

CI snapshot

At the final snapshot, passing checks included CodeQL, CodeQL Security Analysis, lint/format, type check, dependency audit, build/package, Docker lightweight/TensorFlow images and aggregate Docker success, benchmarks, documentation formatting, vendored-proto tests/reproducibility, PR title, and Python 3.13.

Still pending at the snapshot:

  • Quick Feedback, Python 3.12
  • Test Python 3.10
  • Windows Tests, Python 3.11

Skipped matrix jobs were reported as skipped rather than failed. CI therefore was not fully complete, but the merge disposition is already request-changes based on deterministic exact-head reproductions.

Merge recommendation

Do not merge 14c0248.

Before another merge-gate review:

  1. Cover every loader-consumed Keras 3 state root, at minimum /vars and /optimizer/vars, under explicit global link/source/report budgets.
  2. Filter same-file VDS sources and add same-only, external-only, mixed, and truncated-walk tests.
  3. Replace the current Python-only metadata test with a process-level memory bound that catches native HDF5 allocation.
  4. Add the missing pinned ESM2 regression and immutable artifact identity checks for both Hugging Face fixtures.
  5. Rerun the focused scanner suite, exact adversarial probes, real-model tests, broad non-integration suite, and all CI lanes at the new exact head.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Keep them coming!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Keep them coming!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Independent exact-head re-review of f13c36f447bdbf488f5b615d9b24c4fee320df83: request changes; do not merge yet. All GitHub CI is green and all nine review threads are resolved, but two P1 blockers remain.

  1. Arbitrary Keras 3 saveable roots remain unscanned. modelaudit/scanners/keras_h5_scanner.py:985-997,1034-1055 hard-codes root vars, optimizer/vars, and immediate layer vars, while Keras recursively loads arbitrary <saveable path>/vars. An exact-head 536,871,936-byte /custom_child/vars/0 ExternalLink artifact returned clean exit 0 while loading external payload data; exact base failed closed with exit 2. Traverse arbitrary bounded saveable roots and add a regression for nested custom saveables.

  2. Dense HDF5 attributes remain native-memory-unbounded. modelaudit/scanners/keras_h5_scanner.py:183-185,1098-1124 enters h5py attribute membership/get-id before enforcing limits. A 64 MiB layer_names attribute raised exact-head RSS from 157 MB to 285 MB; exact base stayed near 106 MB because it rejected before h5py traversal. Enforce a pre-read/native-safe attribute policy and add a bounded-memory regression.

Prior /vars, /optimizer/vars, same-file and mixed VDS, external-storage, and truncated-source controls are fixed. Full scanner validation was 313 passed, 1 skipped; focused controls were 17 passed; Ruff, formatting, mypy, and the full CI matrix passed. The immutable XLM-R/ESM2 real-model QA gap should also be closed before merge. Full report: modelaudit-pr-swarm/outputs/pr-1663-independent-rereview-f13c-20260611.md.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Pinned real-model QA from Hugging Face rank 255 reproduces the large-file coverage blocker on current main.

  • Model/revision: openai/whisper-large-v2@ae4642769ce2ad8fc292556ccea8e901f1530655
  • ModelAudit: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
  • Total: 16 selected files, 24,699,233,679 declared bytes
  • Runtime: 3,007 seconds
  • Result: exit 2, valid JSON

pytorch_model.bin (6,173,629,930 bytes) routes to zip but fails S904 at max_file_read_size=536,870,912; tf_model.h5 (6,174,574,896 bytes) routes to keras_h5 and fails the same ceiling. Both return scan_outcome=inconclusive with max_file_read_size_exceeded.

Positive control: the 6,173,264,923-byte flax_model.msgpack streams successfully (2,105 nodes, 743 tensors, no incomplete marker). Please pin this exact model/revision as end-to-end QA and prove the relevant artifact is actually inspected without a full-file read, with bounded time/memory and a clean coverage outcome.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f2efbfdc5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated
Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated
Comment thread modelaudit/scanners/keras_h5_scanner.py
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Keep it up!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/keras_h5_scanner.py Fixed
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 👍

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91bcec711d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/keras_h5_scanner.py
Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated
@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Keep it up!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

Independent review of exact head 0e957dae7e7f851b8b7441bdc554816548ee378b found two blockers:

  1. P1: root-only Keras 3 .weights.h5 external references are clean false negatives. Real Keras 3.14/JAX generated a valid weights file containing only /vars/0. External-reference scanning runs only when model_config exists or the Keras 3 layout gate sees /layers/.../vars, so root-only /vars falls through as generic_h5. A 536,875,008-byte .weights.h5 and extensionless HDF5 with /vars/0 as ExternalLink both exit 0, success true, no issues, null content hash; external storage and VDS variants also exit 0. Emulating the old 512 MiB cap fails closed, so this PR changes the large payload from exit 2 to clean exit 0. A .keras archive with the same member correctly reports CVE-2026-1669, proving direct/archive parity is broken.
  2. P2: dense attribute worker exceeds the advertised 256 MiB memory budget. Fixed-width layer_names attributes of 16/64/128 MiB drove child max RSS to approximately 200/396/659 MiB before failing closed. The outcome is safe, but the resource bound is not enforced; existing tests only prove parent-process access is avoided.

Validation: 322 Keras HDF5 tests passed with three opt-in HF skips. Custom QA covered malformed configs, Lambda code, userblock HDF5, extensionless/HF paths, link cycles, VDS, sparse/chunked/compressed datasets, and direct/archive parity. Exact-head CI is green and threads are resolved, but the P1 false-clean behavior blocks merge.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aeae89f400

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/core.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dda569c5f1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated
Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a49afb026

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/keras_h5_scanner.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08924c1c79

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/keras_h5_scanner.py
@mldangelo-oai mldangelo-oai merged commit 0ff7b62 into main Jun 12, 2026
29 checks passed
@mldangelo-oai mldangelo-oai deleted the mdangelo/codex/hf-fp-t33-large-keras-h5-streaming-20260610 branch June 12, 2026 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant