fix: stream large Keras HDF5 inspection by mldangelo-oai · Pull Request #1663 · promptfoo/modelaudit

mldangelo-oai · 2026-06-11T03:54:08Z

Summary

Make Keras HDF5 scanning file-backed for large models by using h5py metadata/link traversal instead of the generic 512 MiB whole-file read gate.
Defer whole-file aggregate hashing for large HDF5 inputs so core dispatch does not materialize multi-GB model files before the Keras scanner runs.
Add bounded JSON/name attribute reads and SoftLink/VDS-aware external-reference traversal so malformed, cyclic, oversized, external-link, external-storage, and Lambda controls still fail closed or report security findings.

Root Cause

KerasH5Scanner inherited the default BaseScanner max_file_read_size of 512 MiB, and core also attempted top-level hashing before scanner dispatch. The pinned 2.24 GB valid HDF5 model was therefore rejected before h5py.File(...) could perform file-backed inspection.

Baseline on origin/main at 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69:

{"success": false, "exit_code": 2, "bytes_scanned": 0, "scan_outcome_reasons": ["max_file_read_size_exceeded"], "file_size": 2240076248}

Security Tradeoffs

Large HDF5 files now skip whole-file integrity hashing and report content_hash=null at the aggregate level, because hashing would reintroduce the same full-file read. Small H5 files keep the existing file integrity hash. Keras H5 inspection remains bounded by explicit HDF5 JSON/name attribute, link traversal, SoftLink resolution, virtual dataset source, and external-reference reporting limits; ambiguous or malformed evidence remains inconclusive unless a security finding is already present.

Pinned Real-Model QA

Pinned model:

repo=FacebookAI/xlm-roberta-large
revision=c23d21b0620b635a76227c604d44e43a9f0ee389
filename=tf_model.h5
size=2240076248

Opt-in regression:

MODELAUDIT_RUN_REAL_HF_H5=1 MODELAUDIT_HF_CACHE_DIR=../hf-cache PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_keras_h5_scanner.py::test_real_hf_xlm_roberta_large_h5_reaches_keras_scan_without_read_cap -q

Outcome:

1 passed in 3.75s

Direct terminal scan outcome on the cached pinned file:

{"content_hash": null, "elapsed_seconds": 0.869, "exit_code": 0, "file_size": 2240076248, "files_scanned": 1, "max_rss_kb": 96920, "scan_outcome": null, "scan_outcome_reasons": null, "scanner_names": ["keras_h5"], "success": true}

The real-model test also asserts the bounded HDF5 metadata shape: root keys ["roberta", "top_level_model_weights"], attrs {"backend", "keras_version", "layer_names"}, and bounded layer_names == ["roberta"].

Malicious And Malformed Controls

Large malicious Lambda H5 still reports CVE-2025-9905 and exits with security findings.
Large malformed model_config fails closed as inconclusive without max_file_read_size_exceeded.
Large ExternalLink and SoftLink-to-ExternalLink H5 controls still report CVE-2026-1669.
Large SoftLink-to-external-storage H5 controls still report CVE-2026-1669.
Large out-of-file Virtual Dataset source H5 controls still report CVE-2026-1669.
Same-file Virtual Dataset sources stay clean.
Keras 3 root vars and optimizer/vars external-link/VDS controls still report CVE-2026-1669.
SoftLink cycles and oversized config/name attributes fail closed as inconclusive.
Oversized layer_names / weight_names attributes fail closed before materialization.
Generic HDF5 files with dangling /layers SoftLinks stay clean instead of being treated as Keras weights.
Sparse/chunked/compressed HDF5 datasets are not materialized while metadata/Lambda detection still runs.

Validation

uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_keras_h5_scanner.py -q
MODELAUDIT_RUN_REAL_HF_H5=1 MODELAUDIT_HF_CACHE_DIR=../hf-cache PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_keras_h5_scanner.py::test_real_hf_xlm_roberta_large_h5_reaches_keras_scan_without_read_cap -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1
git diff --check

Results:

ruff check: All checks passed
ruff format --check: 419 files already formatted
mypy: Success: no issues found in 474 source files
Keras H5 scanner tests: 313 passed, 1 skipped
Real HF pinned H5 regression: 1 passed
Broad non-integration suite: 18558 passed, 793 skipped, 40 warnings in 1138.26s
git diff --check: clean

mldangelo-oai · 2026-06-11T03:54:28Z

@codex review

github-actions · 2026-06-11T03:56:29Z

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.489s -> 1.476s (-0.9%).

Workload	Benchmark	Target	Size	Files	Baseline	Current	Change	Status
`warm-cache-rescan`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan`	`release-candidate`	547.3 KiB	32	124.54ms	118.83ms	-4.6%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]`	`nested_base64`	98 B	1	514.4us	525.3us	+2.1%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]`	`nested_raw`	78 B	1	512.1us	518.8us	+1.3%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]`	`nested_hex`	130 B	1	547.8us	554.3us	+1.2%	stable
`suspicious-pickle-intake`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake`	`suspicious-intake`	183.8 KiB	4	146.02ms	144.34ms	-1.1%	stable
`clean-training-checkpoint`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint`	`safe_large`	278.2 KiB	1	113.78ms	112.70ms	-0.9%	stable
`mixed-model-repository`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository`	`release-candidate`	547.3 KiB	32	497.43ms	492.98ms	-0.9%	stable
`single-checkpoint-preflight`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load`	`single_checkpoint.pkl`	183.0 KiB	1	74.36ms	74.93ms	+0.8%	stable
`padded-multi-stream-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload`	`multi_stream_padded`	4.1 KiB	1	587.5us	590.9us	+0.6%	stable
`chunked-upload-stream`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream`	`chunked_stream`	278.2 KiB	1	116.18ms	115.61ms	-0.5%	stable
`direct-malicious-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload`	`malicious_reduce`	52 B	1	461.6us	463.3us	+0.4%	stable
`duplicate-heavy-registry`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot`	`registry-snapshot`	915.2 KiB	13	414.42ms	413.48ms	-0.2%	stable

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e8da917608

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T04:10:41Z

Independent review of exact head 9ece3b123f4e1dad285d5def0fc894dac5910c40: request changes / do not merge.

P1: external HDF5 virtual datasets are not inspected on the newly admitted >512 MiB path. A 536,875,008-byte Keras-shaped sparse H5 with an external VDS source returns clean exit 0. Add bounded VDS source inspection equivalent to Keras ZIP, with external/same-file/mixed/truncated-report tests.
P1: layer_names/weight_names attributes are fully materialized before element/byte limits. A 2M-element attribute drove ~689 MB RSS versus ~75 MB baseline. Bound at dataspace/type level before Python conversion.
P2: any root /layers SoftLink makes generic HDF5 look like Keras and return exit 2. Establish Keras-shaped ownership before fail-closing; test generic dangling links and Keras dangling/external links separately.
P3: document that >512 MiB HDF5 no longer emits aggregate content_hash, even for callers that previously opted into large reads.

Strengthen pinned real-model QA to assert actual Keras layout/roots and an immutable digest, not only a passing generic format check. Full report: modelaudit-pr-swarm/outputs/pr-1663-independent-review-20260611.md.

@codex address these blockers, add regressions and bounded-memory proof, rerun pinned QA + focused/broad validation, push, and request exact-head review.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9ece3b123f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T04:32:47Z

@codex incorporate this additional pinned large-HDF5 regression before merge:

facebook/esm2_t33_650M_UR50D@08e4846e537177426273712802403f7ba8261b6c contains a valid 2,605,109,760-byte tf_model.h5. Current origin/main rejects it at the generic 512 MiB cap with max_file_read_size_exceeded and exits 2. The final fix should reach truthful bounded Keras/HDF5 inspection without whole-file materialization, alongside the existing XLM-R fixture.

chatgpt-codex-connector · 2026-06-11T04:37:50Z

Summary

Added shared pinned Hugging Face large-HDF5 regression helpers that verify immutable Hub metadata, file size, model_config, and Keras root layout before asserting scans reach keras_h5 without max_file_read_size_exceeded.
Kept the existing XLM-R pinned fixture and strengthened it with expected Xet/SHA256 metadata and real Keras layout assertions.
Added the requested pinned ESM2 regression for facebook/esm2_t33_650M_UR50D@08e4846e537177426273712802403f7ba8261b6c, including the expected 2,605,109,760-byte size and immutable digest metadata.
Resolved the mixed modelaudit.core import style by using direct imports for determine_exit_code and scan_model_directory_or_file.

Testing

✅ uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
✅ uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
✅ uv run mypy tests/scanners/test_keras_h5_scanner.py
✅ uv run python -m py_compile tests/scanners/test_keras_h5_scanner.py
✅ git diff --check
⚠️ uv sync --extra all-ci (environment proxy blocked downloading nvidia-nvshmem-cu13 from PyPI)
⚠️ PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_keras_h5_scanner.py::test_real_hf_xlm_roberta_large_h5_reaches_keras_scan_without_read_cap tests/scanners/test_keras_h5_scanner.py::test_real_hf_esm2_large_h5_reaches_keras_scan_without_read_cap -q (environment lacks h5py, so the h5py-gated module skipped during collection)

Committed changes on the current branch with commit 7290161f9927df61c2422d6992ac3c6cb66dee66 and created the PR via make_pr.

View task →

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14c0248cb0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T05:08:49Z

PR #1663 independent re-review

Merge disposition

REQUEST CHANGES — DO NOT MERGE exact head 14c0248.

Two security/correctness blockers remain after the remediation push:

Keras 3 loader-consumed root and optimizer variable groups are omitted from external-reference traversal, so newly admitted large malicious weights can return clean exit 0.
The name-attribute guard prevents Python conversion but does not bound native HDF5/h5py memory incurred while opening a dense attribute; RSS remains linear in attacker-controlled metadata size.

A separate same-file VDS false positive causes benign aliases to emit CVE-2026-1669 and exit 1. The prior generic SoftLink and changelog blockers are fixed.

Review identity and live state

Repository: promptfoo/modelaudit
PR: fix: stream large Keras HDF5 inspection #1663, fix: stream large Keras HDF5 inspection
Requested base: 8d6c486
Requested and verified head: 14c0248
Live PR head at final review snapshot: 14c0248
Live PR base at final review snapshot: 8d6c486
Mergeability: MERGEABLE, but merge state BLOCKED and review decision REVIEW_REQUIRED
Diff: 4 files, +1,169/-113, 4 commits
Changed files: CHANGELOG.md, modelaudit/core.py, modelaudit/scanners/keras_h5_scanner.py, tests/scanners/test_keras_h5_scanner.py
Review checkout: isolated clean detached checkout at /tmp/modelaudit-pr1663-review-U25wD4; the dirty user checkout was not modified
Security scan bundle: /tmp/codex-security-scans/modelaudit/14c0248cb03c915a3483e62b9be2a06bc17246aa_20260611T045750Z/report.md and /tmp/codex-security-scans/modelaudit/14c0248cb03c915a3483e62b9be2a06bc17246aa_20260611T045750Z/report.html

Findings

P1 — Keras 3 /vars and /optimizer/vars external references still pass clean

Affected control: modelaudit/scanners/keras_h5_scanner.py:911-1039, especially the Keras 3 root enumeration at lines 1008-1039. Traversal only examines the returned roots at lines 1368-1431.

The remediation adds direct VDS inspection, but its Keras 3 root discovery only enumerates layers//vars. It does not include:

root /vars, which Keras H5IOStore.get reads when the saveable path is empty; or
/optimizer/vars, which Keras reaches through recursive child-saveable loading.

Official Keras v3.13.1 evidence is in keras/src/saving/saving_lib.py: H5IOStore.get reads /vars for an empty path and path/vars for child saveables, while _load_state recursively visits children such as optimizers.

I reproduced two independent 536,871,936-byte exact-head artifacts. Both contained an ordinary layers/dense/vars/0 dataset to establish Keras 3 ownership.

Probe	Malicious location	Head result
Keras 3 model variables	/vars/0 ExternalLink to external.h5:/payload	no CVE-2026-1669, scanner success true, aggregate exit 0
Keras 3 optimizer state	/optimizer/vars/0 ExternalLink to external.h5:/payload	no CVE-2026-1669, scanner success true, aggregate exit 0

The exact base rejects the same large Keras 3 shape with max_file_read_size_exceeded and exit 2. This PR therefore changes a fail-closed result into a clean false negative on loader-consumed paths.

This validates the newest live P1 thread, “Scan all Keras 3 vars roots before lifting size cap.” It is not fixed at the requested head.

Required fix: derive all proven Keras 3 loader-consumed saveable roots under a bounded ownership strategy, at minimum root /vars and /optimizer/vars, and test ExternalLink, external storage, external VDS, same-file VDS, mixed VDS, truncation, and clean controls for each independently reachable root.

P1 — Oversized name metadata remains native-memory-unbounded

Affected control: modelaudit/scanners/keras_h5_scanner.py:1075-1115.

Commit f9444f6 correctly avoids attrs[name] and Python list conversion after the dataspace/storage checks. However, the helper first performs attribute membership and attrs.get_id. With dense fixed-width attributes, the native HDF5/h5py layer allocates memory proportional to the encoded attribute before the code can reject its point count or storage size.

Exact-head isolated subprocess measurements for valid HDF5 files sparsely inflated to 536,871,936 bytes:

layer_names elements	Encoded attribute bytes	Exact-head max RSS
500,000	16,000,000	107,921,408
1,000,000	32,000,000	140,296,192
2,000,000	64,000,000	203,456,512
4,000,000	128,000,000	330,956,800

For the 64 MiB attribute, exact base stayed at 75,464,704 bytes because it rejected before h5py metadata access; exact head reached 202,866,688 bytes. A component probe placed h5py.File open near 39 MiB but attribute membership/get-id near 167 MiB, proving the residual growth occurs before Python decoding.

The scan now returns inconclusive rather than materializing the full Python array, which is an improvement, but a sufficiently large attribute can still OOM a worker. The production behavior is not bounded by the advertised 10 MiB/4,096-item limits.

The added regression at tests/scanners/test_keras_h5_scanner.py:980-1015 uses a tiny two-element attribute and monkeypatches AttributeManager.getitem. It proves only that Python-level conversion is skipped; it cannot detect native allocation during membership/get_id.

This prior P1 blocker is partially addressed but remains merge-blocking.

Required fix: retain a conservative cap, inspect risky HDF5 parsing in a hard memory-limited subprocess, or add a pre-h5py mechanism that can reject oversized attribute storage without opening it through the allocating native path. Add a real dense-attribute RSS/memory-limit regression.

P2 — Same-file VDS is falsely reported as CVE-2026-1669

Affected control: modelaudit/scanners/keras_h5_scanner.py:1271-1313.

record_virtual_dataset_sources increments external_reference_count for every virtual dataset before checking source filenames. HDF5 filename . means the virtual source is in the same file and does not cross the external-file boundary.

A 536,871,936-byte Keras HDF5 with model_weights/dense/kernel mapped to VirtualSource(".", "/source") produced:

one CVE-2026-1669 issue;
evidence identifying filename "." and path "/source";
aggregate exit 1.

An equivalent mapping to external.h5 also produced exit 1, as expected. The unchanged Keras ZIP implementation already has the correct distinction at modelaudit/scanners/keras_zip_scanner.py:3724-3727: it skips source_filename == "." before incrementing the external count.

This validates the newest live P2 thread, “Ignore same-file virtual dataset sources.”

Required fix: filter same-file VDS mappings before counting or reporting external sources, keep the dataset clean when all mappings are same-file, preserve external findings for mixed mappings, and fail closed if the bounded source walk cannot determine whether unvisited mappings are external.

P3 validation gap — pinned real Hugging Face QA is still incomplete

Affected tests: tests/scanners/test_keras_h5_scanner.py:1018-1055.

The XLM-R test is stronger than the prior head: it now asserts exact root keys, exact root attributes, bounded layer_names decoding, and layer_names == ["roberta"]. That closes the prior concern that a generic HDF5 pass alone could satisfy the test.

Two requested pieces are still absent at exact head:

no immutable SHA-256/Xet digest assertion exists for the downloaded XLM-R artifact;
there is no facebook/esm2_t33_650M_UR50D test anywhere in the file, despite the live PR request and a connector comment claiming that it was added.

The connector comment references commit 7290161f9927df61c2422d6992ac3c6cb66dee66, but that commit is not in the exact four-commit PR head. The current branch contains only e8da917, 9ece3b1, f9444f6, and 14c0248.

The opt-in XLM-R test was skipped locally because no cached 2.24 GB artifact was present and huggingface.co is unavailable from this environment. This gap does not supersede the reproduced source bugs above, but it means realistic multi-GB Hugging Face compatibility is not independently closed.

Prior-blocker reconciliation

Prior blocker or requested focus	Exact-head disposition	Evidence
Direct legacy external VDS clean bypass	Fixed for the original model_weights path	Large external VDS now reports CVE-2026-1669 and exit 1.
All loader-consumed VDS/external-reference paths	Unfixed	Root /vars and /optimizer/vars return clean exit 0.
Same-file VDS handling	New regression	Valid "." mapping reports CVE and exit 1.
Unbounded layer_names/weight_names Python materialization	Fixed narrowly	attrs[name]/tolist path is skipped after metadata checks.
Bounded metadata memory overall	Unfixed	Native RSS scales from 108 MB to 331 MB with 16 MB to 128 MB attributes.
Generic dangling /layers SoftLink false positive	Fixed	Exact regression passes clean with generic_h5 format.
Established Keras SoftLink-to-external handling	Fixed	ExternalLink and external-storage aliases are detected.
SoftLink cycle fail-open	Fixed	Cycle remains inconclusive/exit 2 when no security finding takes precedence.
External-storage segment/report limits	Fixed in inspected roots	Existing bounded tests pass; omitted Keras 3 roots remain outside analysis.
VDS source/report limits	Partially fixed	Per-dataset source reporting is bounded and truncation fails closed, but same-file classification and root coverage are wrong.
Oversized model_config/training_config	Fixed for Python parsing path	Oversized/malformed config fails closed before JSON parsing.
Missing h5py	Fail-closed preserved	Direct missing-h5py regression remains inconclusive/exit 2.
Large Lambda detection	Preserved	Large malicious Lambda regression passes and yields security exit 1.
Aggregate content_hash compatibility disclosure	Fixed	CHANGELOG now states that large file-backed HDF5 omits aggregate content_hash.
Real XLM-R structural QA	Partially fixed	Structure and layer_names are asserted; immutable digest is absent and test was not rerun locally.
Requested ESM2 2.605 GB regression	Unfixed	No esm2 test exists at exact head.
Core directory/single/stream hash deferral	No new correctness defect found	Changed paths consistently mark aggregate hash incomplete and avoid whole-file hash.
Fail-open behavior overall	Unfixed	Keras 3 root and optimizer external references return clean exit 0.
Realistic false positives	Unfixed	Same-file VDS produces a concrete false positive.

Review-thread reconciliation

Live review threads inspected at exact head:

Resolved import-style thread: fixed.
Older unbounded-attribute threads: still marked unresolved; the Python conversion issue is improved, but native resource growth validates the blocker in a narrower residual form.
Older VDS thread: direct legacy VDS is fixed, but the current exact-head root-coverage P1 is valid.
Older generic SoftLink thread: outdated and fixed by current code/test.
Older changelog thread: outdated and fixed.
Pinned QA thread: still unresolved and only partially addressed.
Current “all Keras 3 vars roots” P1: validated.
Current “same-file VDS” P2: validated.

No GitHub comments or reviews were posted by this re-review.

Exact-head validation

Static checks

ruff check on modelaudit/core.py, modelaudit/scanners/keras_h5_scanner.py, and tests/scanners/test_keras_h5_scanner.py: passed.
ruff format --check on the same files: passed.
mypy on the same files: passed.
git diff --check for base..head: passed.

Focused tests

Targeted remediation filter covering large HDF5 external link/storage/VDS, SoftLink, oversized weight names, and generic dangling /layers: 8 passed.
Full tests/scanners/test_keras_h5_scanner.py: 306 passed, 2 failed, 1 skipped in 17.42s.
Skipped: opt-in real XLM-R Hugging Face download test.
Failure 1, test_missing_h5py_invalidates_stale_cache_entries: also fails on exact base in this local cache-identity harness, so it is not attributed to PR fix: stream large Keras HDF5 inspection #1663.
Failure 2, test_h5py_runtime_failure_bypasses_stale_clean_cache: passed immediately in isolated rerun, so it is treated as order/environment-sensitive rather than a validated PR regression.

Adversarial end-to-end probes

Large same-file VDS: false-positive CVE-2026-1669, exit 1.
Large external VDS: correctly detected, exit 1.
Large Keras 3 root /vars ExternalLink: clean false negative, exit 0.
Large Keras 3 /optimizer/vars ExternalLink: clean false negative, exit 0.
Base comparison for large Keras 3 artifact: max_file_read_size_exceeded, exit 2.
Dense fixed-width name attributes: exact-head RSS scales linearly with encoded metadata.
Large generic dangling /layers SoftLink: clean.
Large SoftLink cycle in established Keras weights: inconclusive/exit 2.
Large malicious Lambda: detected/security exit 1.
Malformed/oversized config: inconclusive/exit 2.
Sparse/chunked/compressed payload non-materialization tests: passed in the focused file suite.

CI snapshot

At the final snapshot, passing checks included CodeQL, CodeQL Security Analysis, lint/format, type check, dependency audit, build/package, Docker lightweight/TensorFlow images and aggregate Docker success, benchmarks, documentation formatting, vendored-proto tests/reproducibility, PR title, and Python 3.13.

Still pending at the snapshot:

Quick Feedback, Python 3.12
Test Python 3.10
Windows Tests, Python 3.11

Skipped matrix jobs were reported as skipped rather than failed. CI therefore was not fully complete, but the merge disposition is already request-changes based on deterministic exact-head reproductions.

Merge recommendation

Do not merge 14c0248.

Before another merge-gate review:

Cover every loader-consumed Keras 3 state root, at minimum /vars and /optimizer/vars, under explicit global link/source/report budgets.
Filter same-file VDS sources and add same-only, external-only, mixed, and truncated-walk tests.
Replace the current Python-only metadata test with a process-level memory bound that catches native HDF5 allocation.
Add the missing pinned ESM2 regression and immutable artifact identity checks for both Hugging Face fixtures.
Rerun the focused scanner suite, exact adversarial probes, real-model tests, broad non-integration suite, and all CI lanes at the new exact head.

mldangelo-oai · 2026-06-11T05:36:34Z

@codex review

chatgpt-codex-connector · 2026-06-11T05:40:16Z

Codex Review: Didn't find any major issues. Keep them coming!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T05:52:26Z

@codex review

chatgpt-codex-connector · 2026-06-11T05:56:34Z

Codex Review: Didn't find any major issues. Keep them coming!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T06:22:43Z

Independent exact-head re-review of f13c36f447bdbf488f5b615d9b24c4fee320df83: request changes; do not merge yet. All GitHub CI is green and all nine review threads are resolved, but two P1 blockers remain.

Arbitrary Keras 3 saveable roots remain unscanned. modelaudit/scanners/keras_h5_scanner.py:985-997,1034-1055 hard-codes root vars, optimizer/vars, and immediate layer vars, while Keras recursively loads arbitrary <saveable path>/vars. An exact-head 536,871,936-byte /custom_child/vars/0 ExternalLink artifact returned clean exit 0 while loading external payload data; exact base failed closed with exit 2. Traverse arbitrary bounded saveable roots and add a regression for nested custom saveables.
Dense HDF5 attributes remain native-memory-unbounded. modelaudit/scanners/keras_h5_scanner.py:183-185,1098-1124 enters h5py attribute membership/get-id before enforcing limits. A 64 MiB layer_names attribute raised exact-head RSS from 157 MB to 285 MB; exact base stayed near 106 MB because it rejected before h5py traversal. Enforce a pre-read/native-safe attribute policy and add a bounded-memory regression.

Prior /vars, /optimizer/vars, same-file and mixed VDS, external-storage, and truncated-source controls are fixed. Full scanner validation was 313 passed, 1 skipped; focused controls were 17 passed; Ruff, formatting, mypy, and the full CI matrix passed. The immutable XLM-R/ESM2 real-model QA gap should also be closed before merge. Full report: modelaudit-pr-swarm/outputs/pr-1663-independent-rereview-f13c-20260611.md.

mldangelo-oai · 2026-06-11T06:56:01Z

Pinned real-model QA from Hugging Face rank 255 reproduces the large-file coverage blocker on current main.

Model/revision: openai/whisper-large-v2@ae4642769ce2ad8fc292556ccea8e901f1530655
ModelAudit: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
Total: 16 selected files, 24,699,233,679 declared bytes
Runtime: 3,007 seconds
Result: exit 2, valid JSON

pytorch_model.bin (6,173,629,930 bytes) routes to zip but fails S904 at max_file_read_size=536,870,912; tf_model.h5 (6,174,574,896 bytes) routes to keras_h5 and fails the same ceiling. Both return scan_outcome=inconclusive with max_file_read_size_exceeded.

Positive control: the 6,173,264,923-byte flax_model.msgpack streams successfully (2,105 nodes, 743 tensors, no incomplete marker). Please pin this exact model/revision as end-to-end QA and prove the relevant artifact is actually inspected without a full-file read, with bounded time/memory and a clean coverage outcome.

mldangelo-oai · 2026-06-11T07:19:28Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f2efbfdc5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T08:11:21Z

@codex review

chatgpt-codex-connector · 2026-06-11T08:15:23Z

Codex Review: Didn't find any major issues. Keep it up!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T08:19:19Z

@codex review

chatgpt-codex-connector · 2026-06-11T08:22:45Z

Codex Review: Didn't find any major issues. 👍

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T08:55:15Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91bcec711d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T09:58:33Z

@codex review

chatgpt-codex-connector · 2026-06-11T10:02:57Z

Codex Review: Didn't find any major issues. Keep it up!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T11:38:42Z

Independent review of exact head 0e957dae7e7f851b8b7441bdc554816548ee378b found two blockers:

P1: root-only Keras 3 .weights.h5 external references are clean false negatives. Real Keras 3.14/JAX generated a valid weights file containing only /vars/0. External-reference scanning runs only when model_config exists or the Keras 3 layout gate sees /layers/.../vars, so root-only /vars falls through as generic_h5. A 536,875,008-byte .weights.h5 and extensionless HDF5 with /vars/0 as ExternalLink both exit 0, success true, no issues, null content hash; external storage and VDS variants also exit 0. Emulating the old 512 MiB cap fails closed, so this PR changes the large payload from exit 2 to clean exit 0. A .keras archive with the same member correctly reports CVE-2026-1669, proving direct/archive parity is broken.
P2: dense attribute worker exceeds the advertised 256 MiB memory budget. Fixed-width layer_names attributes of 16/64/128 MiB drove child max RSS to approximately 200/396/659 MiB before failing closed. The outcome is safe, but the resource bound is not enforced; existing tests only prove parent-process access is avoided.

Validation: 322 Keras HDF5 tests passed with three opt-in HF skips. Custom QA covered malformed configs, Lambda code, userblock HDF5, extensionless/HF paths, link cycles, VDS, sparse/chunked/compressed datasets, and direct/archive parity. Exact-head CI is green and threads are resolved, but the P1 false-clean behavior blocks merge.

…t33-large-keras-h5-streaming-20260610 # Conflicts: # modelaudit/core.py

…t33-large-keras-h5-streaming-20260610

…t33-large-keras-h5-streaming-20260610 # Conflicts: # modelaudit/core.py

…t33-large-keras-h5-streaming-20260610

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aeae89f400

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…t33-large-keras-h5-streaming-20260610

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dda569c5f1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…t33-large-keras-h5-streaming-20260610

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2a49afb026

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08924c1c79

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…t33-large-keras-h5-streaming-20260610

fix: stream large Keras HDF5 inspection

e8da917

github-code-quality Bot found potential problems Jun 11, 2026

View reviewed changes

Comment thread tests/scanners/test_keras_h5_scanner.py Fixed

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/keras_h5_scanner.py

test: use core module import in Keras H5 tests

9ece3b1

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/keras_h5_scanner.py

Comment thread modelaudit/scanners/keras_h5_scanner.py

Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated

Comment thread CHANGELOG.md Outdated

Comment thread tests/scanners/test_keras_h5_scanner.py

fix: bound Keras HDF5 weight metadata reads

f9444f6

fix: inspect HDF5 virtual weight sources

14c0248

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/keras_h5_scanner.py

Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated

fix: inspect Keras 3 HDF5 variable roots

f13c36f

fix: bound large Keras H5 saveable traversal

7f2efbf

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated

Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated

Comment thread modelaudit/scanners/keras_h5_scanner.py

fix: bound Keras H5 attribute workers

3079556

github-code-quality Bot found potential problems Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/keras_h5_scanner.py Fixed

chore: address HDF5 worker review nit

b11d62a

test: pin Whisper large H5 QA fixture

91bcec7

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/scanners/keras_h5_scanner.py

Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated

fix: close Keras H5 review gaps

0e957da

mldangelo-oai added 6 commits June 12, 2026 01:08

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

5e35467

…t33-large-keras-h5-streaming-20260610 # Conflicts: # modelaudit/core.py

test: cover streaming Keras H5 hash deferral

c18ba16

fix: inspect Keras 3 vars with model config

bb46735

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

845ab10

…t33-large-keras-h5-streaming-20260610

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

127a4e1

…t33-large-keras-h5-streaming-20260610 # Conflicts: # modelaudit/core.py

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

aeae89f

…t33-large-keras-h5-streaming-20260610

chatgpt-codex-connector Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread modelaudit/core.py

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

dda569c

…t33-large-keras-h5-streaming-20260610

chatgpt-codex-connector Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated

Comment thread modelaudit/scanners/keras_h5_scanner.py Outdated

mldangelo-oai added 4 commits June 12, 2026 05:01

fix: allow descriptor-bound owner scans for large hdf5

6b8eb5a

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

abdd2fd

…t33-large-keras-h5-streaming-20260610

fix: bound keras h5 virtual source inspection

522462c

fix: handle empty hdf5 name attributes in worker

2a49afb

chatgpt-codex-connector Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread modelaudit/scanners/keras_h5_scanner.py

fix: traverse soft-linked hdf5 weight groups

08924c1

chatgpt-codex-connector Bot reviewed Jun 12, 2026

View reviewed changes

Comment thread modelaudit/scanners/keras_h5_scanner.py

mldangelo-oai added 2 commits June 12, 2026 07:10

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

c5baf8a

…t33-large-keras-h5-streaming-20260610

fix: detect keras3 root h5 vars layouts

50fa81a

mldangelo-oai merged commit 0ff7b62 into main Jun 12, 2026
29 checks passed

mldangelo-oai deleted the mdangelo/codex/hf-fp-t33-large-keras-h5-streaming-20260610 branch June 12, 2026 08:03

Conversation

mldangelo-oai commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Security Tradeoffs

Pinned Real-Model QA

Malicious And Malformed Controls

Validation

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance Benchmarks

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

mldangelo-oai commented Jun 11, 2026

PR #1663 independent re-review

Merge disposition

Review identity and live state

Findings

P1 — Keras 3 /vars and /optimizer/vars external references still pass clean

P1 — Oversized name metadata remains native-memory-unbounded

P2 — Same-file VDS is falsely reported as CVE-2026-1669

P3 validation gap — pinned real Hugging Face QA is still incomplete

Prior-blocker reconciliation

Review-thread reconciliation

Exact-head validation

Static checks

Focused tests

Adversarial end-to-end probes

CI snapshot

Merge recommendation

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

mldangelo-oai commented Jun 11, 2026 •

edited

Loading

github-actions Bot commented Jun 11, 2026 •

edited

Loading