fix(serialization): remove monty dependency by njzjz · Pull Request #979 · deepmodeling/dpdata

njzjz · 2026-06-19T17:41:23Z

Summary

- replace monty serialization usage with an internal dpdata.serialization module
- keep loading existing monty-style JSON numpy arrays and as_dict/from_dict objects
- remove monty from project/docs dependency declarations and update JSON round-trip tests
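Files written by monty store NumPy arrays as tagged dictionaries, and the second bullet promises to keep loading that layout. The sketch below assumes monty's `{"@module": "numpy", "@class": "array", "dtype": ..., "data": ...}` layout, with complex data split into real and imaginary parts. `encode_ndarray` and `decode_ndarray` are hypothetical names, not functions confirmed to exist in `dpdata.serialization`.

```python
from __future__ import annotations

import json
from typing import Any

import numpy as np


def encode_ndarray(arr: np.ndarray) -> dict[str, Any]:
    """Encode an array as a monty-style tagged dict (assumed layout)."""
    if np.iscomplexobj(arr):
        # Complex arrays are stored as a [real, imag] pair of nested lists.
        data: Any = [arr.real.tolist(), arr.imag.tolist()]
    else:
        data = arr.tolist()
    return {"@module": "numpy", "@class": "array", "dtype": str(arr.dtype), "data": data}


def decode_ndarray(payload: dict[str, Any]) -> np.ndarray:
    """Rebuild an array from a monty-style tagged dict."""
    dtype = np.dtype(payload["dtype"])
    if dtype.kind == "c":
        real, imag = payload["data"]
        return (np.asarray(real) + 1j * np.asarray(imag)).astype(dtype)
    return np.asarray(payload["data"], dtype=dtype)


# Round trip through JSON text, the way an existing monty-written file would be read.
positions = np.arange(6, dtype=np.float32).reshape(2, 3)
restored = decode_ndarray(json.loads(json.dumps(encode_ndarray(positions))))
```

Keeping `dtype` as a string is what lets the loader restore `float32` data as `float32` instead of the `float64` that plain JSON lists would produce.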

Tests

ruff check dpdata/ tests/test_json.py tests/test_to_pymatgen_entry.py
cd tests && python -m unittest test_json.py test_to_pymatgen_entry.py

Summary by CodeRabbit

New Features
- Added native JSON, YAML, and msgpack serialization support with automatic format detection and transparent gzip/bzip2 compression.
Chores
- Removed monty as a project dependency; serialization functionality is now built into the package.
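Automatic format detection presumably keys off the file name. Here is a minimal sketch. It assumes JSON is the default, and it assumes a trailing `.gz`, `.z`, or `.bz2` suffix is ignored before the format is chosen. Those are the suffixes `_open_text` checks in the review below. `detect_format` and its return strings are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Compression suffixes handled transparently (same set _open_text checks).
COMPRESSION_SUFFIXES = {".gz", ".z", ".bz2"}


def detect_format(filename: str | Path) -> str:
    """Guess the serialization format from the file name (hypothetical helper)."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Ignore a trailing compression suffix so "system.json.gz" is treated as JSON.
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    ext = suffixes[-1] if suffixes else ""
    if ext in (".yaml", ".yml"):
        return "yaml"
    if ext == ".mpk":
        return "msgpack"
    # Anything else falls back to JSON.
    return "json"


for name in ("system.json", "system.json.gz", "conf.YML.bz2", "frames.mpk"):
    print(name, "->", detect_format(name))
```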

codspeed-hq · 2026-06-19T17:43:41Z

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 2 untouched benchmarks

Comparing njzjz:fix/remove-monty-dependency (e84be01) with master (1b63c9b)

coderabbitai · 2026-06-19T17:49:47Z

Warning

Review limit reached

@njzjz, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 2 minutes and 5 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 68ff05f0-b6cc-4dbf-985a-45b8c41a7b26

📥 Commits

Reviewing files that changed from the base of the PR and between 7a6f287 and e84be01.

📒 Files selected for processing (8)

AGENTS.md
docs/conf.py
docs/environment.yml
dpdata/serialization.py
dpdata/system.py
pyproject.toml
tests/test_json.py
tests/test_to_pymatgen_entry.py

📝 Walkthrough

Walkthrough

Removes monty as a runtime dependency by introducing a new dpdata/serialization.py module that re-implements monty-style JSON/YAML/msgpack serialization (format detection, encoding of NumPy arrays/datetime/UUID/Path/Enum/as_dict objects, and @module/@class-based reconstruction). System.dump, System.load, and System.from_dict are rewired to use dpdata.serialization. All references to monty are removed from pyproject.toml, AGENTS.md, docs/conf.py, and docs/environment.yml.
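The encoder described above turns non-JSON values into tagged dictionaries. The sketch below covers only the standard-library cases and leaves NumPy arrays out for brevity. The `"string"` payloads for datetime, UUID, and Path are assumed to mirror monty's encoder. The Enum layout with a `"value"` key is an outright assumption. None of this is copied from the PR's `to_serializable`.

```python
from __future__ import annotations

import datetime
import json
import uuid
from enum import Enum
from pathlib import Path
from typing import Any


def to_serializable(obj: Any) -> Any:
    """Recursively convert obj into JSON-compatible, monty-style tagged data."""
    if isinstance(obj, dict):
        return {key: to_serializable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_serializable(value) for value in obj]
    if isinstance(obj, datetime.datetime):
        # str() gives "YYYY-MM-DD HH:MM:SS[.ffffff][+HH:MM]".
        return {"@module": "datetime", "@class": "datetime", "string": str(obj)}
    if isinstance(obj, uuid.UUID):
        return {"@module": "uuid", "@class": "UUID", "string": str(obj)}
    if isinstance(obj, Path):
        return {"@module": "pathlib", "@class": "Path", "string": str(obj)}
    if isinstance(obj, Enum):
        # Assumed layout: record the enum class and its value.
        cls = type(obj)
        return {"@module": cls.__module__, "@class": cls.__name__, "value": to_serializable(obj.value)}
    if hasattr(obj, "as_dict"):
        # as_dict() is expected to already carry "@module"/"@class".
        return to_serializable(obj.as_dict())
    return obj


class Color(Enum):
    RED = 1


class Cell:
    """Toy object following the as_dict convention (hypothetical)."""

    def as_dict(self) -> dict[str, Any]:
        return {"@module": __name__, "@class": "Cell", "lengths": (1.0, 2.0, 3.0)}


record = {
    "when": datetime.datetime(2026, 6, 19, 17, 41, 23),
    "id": uuid.UUID(int=1),
    "color": Color.RED,
    "cell": Cell(),
}
print(json.dumps(to_serializable(record), indent=2))
```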

Changes

Remove monty dependency: introduce dpdata.serialization

Layer	File(s)	Summary
New `dpdata/serialization.py` module	`dpdata/serialization.py`	Adds format detection and gzip/bzip2-aware file-open helpers; YAML dump/load wrappers (PyYAML preferred, ruamel.yaml fallback); `to_serializable` encoder for NumPy, datetime, UUID, Path, Enum, and `as_dict` objects into monty-style payloads; `process_decoded` recursive decoder reconstructing `@module`/`@class` entries via dynamic import and `from_dict`; and `dumpfn`/`loadfn` public entry points.
`System` wired to `dpdata.serialization`	`dpdata/system.py`	`System.dump` and `System.load` switch their `dumpfn`/`loadfn` imports from `monty.serialization` to `dpdata.serialization`; `System.from_dict` replaces `MontyDecoder().process_decoded` with `dpdata.serialization.process_decoded` while keeping the `@`-key filter.
Test coverage	`tests/test_json.py`, `tests/test_to_pymatgen_entry.py`	Adds `TestJsonDumpLoad` exercising round-trip dump/load of a `LabeledSystem` to a temp JSON file; updates `test_to_pymatgen_entry.py` to import `loadfn` from `dpdata.serialization` instead of `monty.serialization`.
Dependency and config cleanup	`pyproject.toml`, `AGENTS.md`, `docs/conf.py`, `docs/environment.yml`	Removes `monty` from runtime dependencies, Ruff banned-imports config, AGENTS.md install/troubleshooting instructions, Sphinx intersphinx mapping, and conda environment file.
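The decoding side walks the structure recursively. Any dict carrying `@module`/`@class` is rebuilt by importing the module and calling the class's `from_dict`. The sketch below follows that description with two hypothetical helpers. `process_decoded` shares the walkthrough's name but is not the PR's implementation. `strip_metadata` mimics the `@`-key filter that `System.from_dict` is said to keep. Falling back to a plain dict for unknown modules is a design choice of this sketch.

```python
from __future__ import annotations

import importlib
import sys
import types
import uuid
from pathlib import Path
from typing import Any


def process_decoded(obj: Any) -> Any:
    """Recursively rebuild objects from monty-style "@module"/"@class" dicts."""
    if isinstance(obj, list):
        return [process_decoded(item) for item in obj]
    if not isinstance(obj, dict):
        return obj
    module_name, class_name = obj.get("@module"), obj.get("@class")
    if module_name and class_name:
        # Simple stdlib types carry their value in a "string" field.
        if (module_name, class_name) == ("uuid", "UUID"):
            return uuid.UUID(obj["string"])
        if (module_name, class_name) == ("pathlib", "Path"):
            return Path(obj["string"])
        try:
            cls = getattr(importlib.import_module(module_name), class_name)
        except (ImportError, AttributeError):
            cls = None  # Unknown class: fall back to a plain dict below.
        if cls is not None and hasattr(cls, "from_dict"):
            # from_dict receives the full payload, "@" keys included.
            return cls.from_dict(obj)
    return {key: process_decoded(value) for key, value in obj.items()}


def strip_metadata(d: dict[str, Any]) -> dict[str, Any]:
    """Drop "@"-prefixed keys and decode the remaining values."""
    return {key: process_decoded(value) for key, value in d.items() if not key.startswith("@")}


# Demo only: register a throwaway module so the dynamic import has something to find.
demo = types.ModuleType("hypothetical_cells")


class Cell:
    def __init__(self, lengths: list[float]) -> None:
        self.lengths = lengths

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "Cell":
        return cls(d["lengths"])


demo.Cell = Cell
sys.modules["hypothetical_cells"] = demo

payload = {
    "@module": "dpdata.system",
    "@class": "LabeledSystem",
    "data": {"cell": {"@module": "hypothetical_cells", "@class": "Cell", "lengths": [1.0, 2.0, 3.0]}},
}
decoded = strip_metadata(payload)
```

Registering a module in `sys.modules` only makes the demo self-contained. Real payloads name modules that are importable on their own.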

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 41.18% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'fix(serialization): remove monty dependency' directly and accurately describes the main objective: removing the monty dependency by implementing internal serialization.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

dpdata/serialization.py (1)

27-34: Add return type annotation to _open_text.

The function lacks a return type hint, which may cause type checkers to infer imprecise union types for gzip.open/bz2.open even though all code paths return text I/O objects compatible with json.dump and json.load.

Proposed fix

-from typing import Any
+from typing import Any, TextIO, cast
@@
-def _open_text(filename: str | Path, mode: str):
+def _open_text(filename: str | Path, mode: str) -> TextIO:
     path = str(filename)
     lower_path = path.lower()
     if lower_path.endswith((".gz", ".z")):
-        return gzip.open(path, mode, encoding="utf-8")
+        return cast(TextIO, gzip.open(path, mode, encoding="utf-8"))
     if lower_path.endswith(".bz2"):
-        return bz2.open(path, mode, encoding="utf-8")
+        return cast(TextIO, bz2.open(path, mode, encoding="utf-8"))
     return open(path, mode, encoding="utf-8")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dpdata/serialization.py` around lines 27 - 34, The _open_text function is
missing a return type annotation which prevents type checkers from accurately
inferring the type. Add a return type hint to the function signature after the
mode parameter by specifying the appropriate return type that represents a text
I/O object (such as TextIO from the typing module), since all three code
paths (gzip.open, bz2.open, and the built-in open function) return compatible
text I/O objects when called with encoding="utf-8".

Sources: Linters/SAST tools, Pipeline failures

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@AGENTS.md`:
- Line 135: The individual package installation example for troubleshooting is
incomplete and missing required core dependencies. Update the `uv pip install`
command that currently lists numpy scipy h5py wcmatch to also include lmdb and
msgpack-numpy at the end of the package list. This ensures users following the
troubleshooting step have all necessary core dependencies installed for a
complete setup.
- Line 96: The Core dependencies section in AGENTS.md is incomplete and missing
two dependencies that are listed in pyproject.toml. Update the Core line that
currently reads "Core: numpy>=1.14.3, scipy, h5py, wcmatch" to include lmdb and
msgpack-numpy in the comma-separated dependency list so it matches the complete
set of core dependencies defined in pyproject.toml.
- Line 12: The documentation comment on the `uv pip install -e .` line in
AGENTS.md is incomplete and does not match the actual core dependencies declared
in pyproject.toml. Update the inline comment that lists the core dependencies to
include all six dependencies: numpy, scipy, h5py, wcmatch, lmdb, and
msgpack-numpy. Ensure the comment accurately reflects what is actually installed
by the development mode installation.

In `@dpdata/serialization.py`:
- Around line 149-154: The datetime deserialization logic in the
datetime.datetime class handler is losing timezone information by using
split("+")[0] which strips positive UTC offsets and causes failures on negative
offsets. Replace the current approach with
datetime.datetime.fromisoformat(obj["string"]) as the primary decoder to
properly preserve timezone data, and keep the existing strptime calls as
fallback for backward compatibility with older formats. This ensures
round-tripping of timezone-aware datetimes without converting them to naive
datetimes.

---

Nitpick comments:
In `@dpdata/serialization.py`:
- Around line 27-34: The _open_text function is missing a return type annotation
which prevents type checkers from accurately inferring the type. Add a return
type hint to the function signature after the mode parameter by specifying the
appropriate return type that represents a text I/O object (such as TextIO from
the typing module), since all three code paths (gzip.open, bz2.open, and the
built-in open function) return compatible text I/O objects when called with
encoding="utf-8".
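The timezone issue in the datetime comment is easy to reproduce. `str()` of an aware datetime ends in an offset such as `-05:00`. `split("+")[0]` leaves that offset in place, and `strptime` then rejects the string, while a `+08:00` offset is silently dropped. Below is a sketch of the suggested fix, with `decode_datetime` as a hypothetical name: try `datetime.fromisoformat` first and keep the old `strptime` formats as a fallback.

```python
from __future__ import annotations

import datetime


def decode_datetime(text: str) -> datetime.datetime:
    """Parse a stored datetime string, preserving any UTC offset."""
    try:
        # Handles str(dt) output, including "+HH:MM" and "-HH:MM" offsets.
        return datetime.datetime.fromisoformat(text)
    except ValueError:
        pass
    # Backward-compatible formats for older, naive strings.
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised datetime string: {text!r}")


offset = datetime.timezone(datetime.timedelta(hours=-5))
stamp = datetime.datetime(2026, 6, 19, 12, 30, 0, 250000, tzinfo=offset)
text = str(stamp)  # "2026-06-19 12:30:00.250000-05:00"
print(decode_datetime(text))
```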

🪄 Autofix (Beta)

❌ Autofix failed (check again to retry)


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1ccd9ec6-d46e-43ed-b7ca-eaa866d12722

📥 Commits

Reviewing files that changed from the base of the PR and between 1b63c9b and 7a6f287.

📒 Files selected for processing (8)

AGENTS.md
docs/conf.py
docs/environment.yml
dpdata/serialization.py
dpdata/system.py
pyproject.toml
tests/test_json.py
tests/test_to_pymatgen_entry.py

💤 Files with no reviewable changes (3)

docs/environment.yml
docs/conf.py
pyproject.toml

codecov · 2026-06-19T17:50:18Z

Codecov Report

❌ Patch coverage is 50.00000% with 85 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.01%. Comparing base (1b63c9b) to head (e84be01).

Files with missing lines	Patch %	Lines
dpdata/serialization.py	49.10%	85 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #979      +/-   ##
==========================================
- Coverage   86.75%   86.01%   -0.74%     
==========================================
  Files          89       90       +1     
  Lines        8093     8260     +167     
==========================================
+ Hits         7021     7105      +84     
- Misses       1072     1155      +83
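The headline numbers in the coverage diff follow from hits divided by lines. Below is a quick integer check that uses only the table's figures. It assumes Codecov's default two-decimal precision with rounding down. That default explains why 7105/8260 ≈ 86.017% is reported as 86.01%.

```python
# Numbers copied from the Coverage Diff table above.
BASE_LINES, BASE_HITS = 8093, 7021
HEAD_LINES, HEAD_HITS = 8260, 7105


def coverage_bp(hits: int, lines: int) -> int:
    """Coverage in basis points (0.01 %), truncated as a round-down setting would."""
    return hits * 10000 // lines


def fmt_bp(bp: int) -> str:
    """Format basis points as a percentage string with two decimals."""
    sign = "-" if bp < 0 else ""
    bp = abs(bp)
    return f"{sign}{bp // 100}.{bp % 100:02d}"


base_bp = coverage_bp(BASE_HITS, BASE_LINES)
head_bp = coverage_bp(HEAD_HITS, HEAD_LINES)
print("coverage:", fmt_bp(base_bp), "->", fmt_bp(head_bp), "delta", fmt_bp(head_bp - base_bp))
print("misses:", BASE_LINES - BASE_HITS, "->", HEAD_LINES - HEAD_HITS)
```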


coderabbitai · 2026-06-19T18:12:45Z

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

An unexpected error occurred while generating fixes: Not Found - https://docs.github.com/rest/git/refs#get-a-reference

dosubot Bot added the labels size:L (This PR changes 100-499 lines, ignoring generated files.), dependencies (Pull requests that update a dependency file), and dpdata on Jun 19, 2026

coderabbitai Bot reviewed Jun 19, 2026


Comment thread AGENTS.md

Comment thread AGENTS.md

Comment thread AGENTS.md

Comment thread dpdata/serialization.py

fix(serialization): remove monty dependency

e84be01

njzjz force-pushed the fix/remove-monty-dependency branch from 7a6f287 to e84be01 on June 19, 2026 17:53

njzjz requested a review from wanghan-iapcm June 19, 2026 18:22

njzjz wants to merge 1 commit into deepmodeling:master from njzjz:fix/remove-monty-dependency
