fix(serialization): remove monty dependency#979

Open
njzjz wants to merge 1 commit into deepmodeling:master from njzjz:fix/remove-monty-dependency

@njzjz njzjz commented Jun 19, 2026

Member

Summary

  • replace monty serialization usage with an internal dpdata.serialization module
  • keep loading existing monty-style JSON numpy arrays and as_dict/from_dict objects
  • remove monty from project/docs dependency declarations and update JSON round-trip tests

Tests

  • ruff check dpdata/ tests/test_json.py tests/test_to_pymatgen_entry.py
  • cd tests && python -m unittest test_json.py test_to_pymatgen_entry.py

Summary by CodeRabbit

  • New Features

    • Added native JSON, YAML, and msgpack serialization support with automatic format detection and transparent gzip/bzip2 compression.
  • Chores

    • Removed monty as a project dependency; serialization functionality is now built-in to the package.

@dosubot dosubot Bot added the size:L, dependencies, and dpdata labels Jun 19, 2026
@codspeed-hq

codspeed-hq Bot commented Jun 19, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 2 untouched benchmarks


Comparing njzjz:fix/remove-monty-dependency (e84be01) with master (1b63c9b)

@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Warning

Review limit reached

@njzjz, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 2 minutes and 5 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 68ff05f0-b6cc-4dbf-985a-45b8c41a7b26

📥 Commits

Reviewing files that changed from the base of the PR and between 7a6f287 and e84be01.

📒 Files selected for processing (8)
  • AGENTS.md
  • docs/conf.py
  • docs/environment.yml
  • dpdata/serialization.py
  • dpdata/system.py
  • pyproject.toml
  • tests/test_json.py
  • tests/test_to_pymatgen_entry.py
📝 Walkthrough

Walkthrough

Removes monty as a runtime dependency by introducing a new dpdata/serialization.py module that re-implements monty-style JSON/YAML/msgpack serialization (format detection, encoding of NumPy arrays/datetime/UUID/Path/Enum/as_dict objects, and @module/@class-based reconstruction). System.dump, System.load, and System.from_dict are rewired to use dpdata.serialization. All references to monty are removed from pyproject.toml, AGENTS.md, docs/conf.py, and docs/environment.yml.
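
As a rough illustration of the @module/@class convention described above, the sketch below round-trips an object through JSON using only the standard library. `Point`, `encode`, `decode`, and the `demo_models` module name are hypothetical and chosen for this example; they are not taken from dpdata/serialization.py, whose actual implementation may differ.

```python
import importlib
import json
import sys
import types


class Point:
    """Hypothetical class following the as_dict/from_dict convention."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def as_dict(self):
        # Tag the payload so a decoder can locate the class again.
        return {
            "@module": type(self).__module__,
            "@class": type(self).__name__,
            "x": self.x,
            "y": self.y,
        }

    @classmethod
    def from_dict(cls, data):
        return cls(data["x"], data["y"])


# Register Point under an importable (hypothetical) module name so the
# example behaves the same no matter how this snippet is executed.
demo_models = types.ModuleType("demo_models")
Point.__module__ = "demo_models"
demo_models.Point = Point
sys.modules["demo_models"] = demo_models


def encode(obj):
    """json.dumps `default` hook: serialize objects that expose as_dict()."""
    if hasattr(obj, "as_dict"):
        return obj.as_dict()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def decode(obj):
    """Recursively rebuild tagged dicts through import + from_dict."""
    if isinstance(obj, dict):
        if "@module" in obj and "@class" in obj:
            module = importlib.import_module(obj["@module"])
            cls = getattr(module, obj["@class"])
            # Drop the @-prefixed keys before handing the payload to from_dict.
            payload = {k: decode(v) for k, v in obj.items() if not k.startswith("@")}
            return cls.from_dict(payload)
        return {k: decode(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [decode(v) for v in obj]
    return obj


text = json.dumps({"origin": Point(1, 2)}, default=encode)
restored = decode(json.loads(text))
```

Because a decoder of this kind imports whatever module the payload names, it is probably best reserved for files from trusted sources.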

Changes

Remove monty dependency: introduce dpdata.serialization

| Layer / File(s) | Summary |
| --- | --- |
| New dpdata/serialization.py module (dpdata/serialization.py) | Adds format detection and gzip/bzip2-aware file-open helpers; YAML dump/load wrappers (PyYAML preferred, ruamel.yaml fallback); to_serializable encoder for NumPy, datetime, UUID, Path, Enum, and as_dict objects into monty-style payloads; process_decoded recursive decoder reconstructing @module/@class entries via dynamic import and from_dict; and dumpfn/loadfn public entry points. |
| System wired to dpdata.serialization (dpdata/system.py) | System.dump and System.load switch their dumpfn/loadfn imports from monty.serialization to dpdata.serialization; System.from_dict replaces MontyDecoder().process_decoded with dpdata.serialization.process_decoded while keeping the @-key filter. |
| Test coverage (tests/test_json.py, tests/test_to_pymatgen_entry.py) | Adds TestJsonDumpLoad exercising round-trip dump/load of a LabeledSystem to a temp JSON file; updates test_to_pymatgen_entry.py to import loadfn from dpdata.serialization instead of monty.serialization. |
| Dependency and config cleanup (pyproject.toml, AGENTS.md, docs/conf.py, docs/environment.yml) | Removes monty from runtime dependencies, Ruff banned-imports config, AGENTS.md install/troubleshooting instructions, Sphinx intersphinx mapping, and conda environment file. |
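
The format detection mentioned for dpdata/serialization.py could look roughly like the following. This is a minimal sketch that assumes the format is chosen from the file suffix after any compression suffix is stripped; `detect_format`, the suffix sets for YAML and msgpack, and the JSON default are assumptions for illustration and are not confirmed by the PR.

```python
from pathlib import Path

# Compression suffixes, compared in lower case.
COMPRESSION_SUFFIXES = {".gz", ".z", ".bz2"}
# Hypothetical suffix-to-format mapping; the real module may differ.
YAML_SUFFIXES = {".yaml", ".yml"}
MSGPACK_SUFFIXES = {".mpk", ".msgpack"}


def detect_format(filename):
    """Guess the serialization format from the file name."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Ignore a trailing compression suffix such as ".gz".
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    last = suffixes[-1] if suffixes else ""
    if last in YAML_SUFFIXES:
        return "yaml"
    if last in MSGPACK_SUFFIXES:
        return "msgpack"
    # Fall back to JSON when nothing else matches (an assumption).
    return "json"
```

With this sketch, `detect_format("system.json.gz")` yields `"json"` and `detect_format("system.yaml.bz2")` yields `"yaml"`.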

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 41.18% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'fix(serialization): remove monty dependency' directly and accurately describes the main objective: removing the monty dependency by implementing internal serialization. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
dpdata/serialization.py (1)

27-34: Add return type annotation to _open_text.

The function lacks a return type hint, which may cause type checkers to infer imprecise union types for gzip.open/bz2.open even though all code paths return text I/O objects compatible with json.dump and json.load.

Proposed fix
-from typing import Any
+from typing import Any, TextIO, cast
@@
-def _open_text(filename: str | Path, mode: str):
+def _open_text(filename: str | Path, mode: str) -> TextIO:
     path = str(filename)
     lower_path = path.lower()
     if lower_path.endswith((".gz", ".z")):
-        return gzip.open(path, mode, encoding="utf-8")
+        return cast(TextIO, gzip.open(path, mode, encoding="utf-8"))
     if lower_path.endswith(".bz2"):
-        return bz2.open(path, mode, encoding="utf-8")
+        return cast(TextIO, bz2.open(path, mode, encoding="utf-8"))
     return open(path, mode, encoding="utf-8")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dpdata/serialization.py` around lines 27 - 34, The _open_text function is
missing a return type annotation which prevents type checkers from accurately
inferring the type. Add a return type hint to the function signature after the
mode parameter by specifying the appropriate return type that represents a text
I/O object (such as TextIO from the typing module) since all three code
paths—gzip.open, bz2.open, and the built-in open function—all return compatible
text I/O objects when called with encoding="utf-8".

Sources: Linters/SAST tools, Pipeline failures
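
To see the annotated helper in action, here is a self-contained sketch modeled on the proposed fix. The names `open_text` and `round_trip` are hypothetical, and `Union[str, Path]` is used in place of `str | Path` only so the snippet also runs on older Python versions. Note that `gzip.open` and `bz2.open` default to binary mode and reject `encoding` unless the mode contains "t", so this sketch assumes callers pass "rt" or "wt"; which modes the PR's callers actually pass is not shown in this conversation.

```python
import bz2
import gzip
import json
import os
import tempfile
from pathlib import Path
from typing import TextIO, Union, cast


def open_text(filename: Union[str, Path], mode: str) -> TextIO:
    """Open a plain, gzip, or bzip2 file as UTF-8 text (mode must contain "t")."""
    path = str(filename)
    lower_path = path.lower()
    if lower_path.endswith((".gz", ".z")):
        # cast() only informs the type checker; it does nothing at runtime.
        return cast(TextIO, gzip.open(path, mode, encoding="utf-8"))
    if lower_path.endswith(".bz2"):
        return cast(TextIO, bz2.open(path, mode, encoding="utf-8"))
    return open(path, mode, encoding="utf-8")


def round_trip(payload, suffix):
    """Write payload as JSON to a temp file with the given suffix, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "data" + suffix)
        with open_text(target, "wt") as handle:
            json.dump(payload, handle)
        with open_text(target, "rt") as handle:
            return json.load(handle)
```

Calling `round_trip({"a": [1, 2]}, ".json.gz")` should return the same dictionary, whether the suffix selects gzip, bzip2, or no compression.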

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@AGENTS.md`:
- Line 135: The individual package installation example for troubleshooting is
incomplete and missing required core dependencies. Update the `uv pip install`
command that currently lists numpy scipy h5py wcmatch to also include lmdb and
msgpack-numpy at the end of the package list. This ensures users following the
troubleshooting step have all necessary core dependencies installed for a
complete setup.
- Line 96: The Core dependencies section in AGENTS.md is incomplete and missing
two dependencies that are listed in pyproject.toml. Update the Core line that
currently reads "Core: numpy>=1.14.3, scipy, h5py, wcmatch" to include lmdb and
msgpack-numpy in the comma-separated dependency list so it matches the complete
set of core dependencies defined in pyproject.toml.
- Line 12: The documentation comment on the `uv pip install -e .` line in
AGENTS.md is incomplete and does not match the actual core dependencies declared
in pyproject.toml. Update the inline comment that lists the core dependencies to
include all six dependencies: numpy, scipy, h5py, wcmatch, lmdb, and
msgpack-numpy. Ensure the comment accurately reflects what is actually installed
by the development mode installation.

In `@dpdata/serialization.py`:
- Around line 149-154: The datetime deserialization logic in the
datetime.datetime class handler is losing timezone information by using
split("+")[0] which strips positive UTC offsets and causes failures on negative
offsets. Replace the current approach with
datetime.datetime.fromisoformat(obj["string"]) as the primary decoder to
properly preserve timezone data, and keep the existing strptime calls as
fallback for backward compatibility with older formats. This ensures
round-tripping of timezone-aware datetimes without converting them to naive
datetimes.
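
A minimal sketch of the suggested decoding order, assuming the stored value is the output of `str()` on a `datetime`. The function name `decode_datetime` and the two legacy format strings are assumptions for illustration, not the PR's actual code.

```python
import datetime

# Hypothetical legacy formats kept only as a fallback for older payloads.
LEGACY_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def decode_datetime(text):
    """Parse a datetime string, preserving any UTC offset it carries."""
    try:
        # Handles both naive and offset-aware strings, e.g. "...-05:00".
        return datetime.datetime.fromisoformat(text)
    except ValueError:
        pass
    for fmt in LEGACY_FORMATS:
        try:
            return datetime.datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized datetime string: {text!r}")


# A timezone-aware value with a negative offset survives the round trip.
aware = datetime.datetime(
    2026, 6, 19, 17, 53,
    tzinfo=datetime.timezone(datetime.timedelta(hours=-5)),
)
restored = decode_datetime(str(aware))
```

Trying `fromisoformat` first means no string splitting on "+" is needed, so negative offsets are handled the same way as positive ones.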

---

Nitpick comments:
In `@dpdata/serialization.py`:
- Around line 27-34: The _open_text function is missing a return type annotation
which prevents type checkers from accurately inferring the type. Add a return
type hint to the function signature after the mode parameter by specifying the
appropriate return type that represents a text I/O object (such as TextIO from
the typing module) since all three code paths—gzip.open, bz2.open, and the
built-in open function—all return compatible text I/O objects when called with
encoding="utf-8".
🪄 Autofix (Beta)

❌ Autofix failed (check again to retry)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1ccd9ec6-d46e-43ed-b7ca-eaa866d12722

📥 Commits

Reviewing files that changed from the base of the PR and between 1b63c9b and 7a6f287.

📒 Files selected for processing (8)
  • AGENTS.md
  • docs/conf.py
  • docs/environment.yml
  • dpdata/serialization.py
  • dpdata/system.py
  • pyproject.toml
  • tests/test_json.py
  • tests/test_to_pymatgen_entry.py
💤 Files with no reviewable changes (3)
  • docs/environment.yml
  • docs/conf.py
  • pyproject.toml

Comment thread AGENTS.md
Comment thread AGENTS.md
Comment thread AGENTS.md
Comment thread dpdata/serialization.py
@codecov

codecov Bot commented Jun 19, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 85 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.01%. Comparing base (1b63c9b) to head (e84be01).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| dpdata/serialization.py | 49.10% | 85 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #979      +/-   ##
==========================================
- Coverage   86.75%   86.01%   -0.74%     
==========================================
  Files          89       90       +1     
  Lines        8093     8260     +167     
==========================================
+ Hits         7021     7105      +84     
- Misses       1072     1155      +83     
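
The project percentages in the diff can be reproduced from the hit and line counts. The sketch below assumes the report truncates to two decimal places instead of rounding; that is an inference from the figures shown here, not something the report states, and `coverage_percent` is a hypothetical helper.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    # Integer arithmetic avoids floating-point surprises before truncation.
    return (hits * 10000 // lines) / 100


base = coverage_percent(7021, 8093)  # master: hits / lines
head = coverage_percent(7105, 8260)  # PR head: hits / lines
delta = round(head - base, 2)
```

Under that assumption, `base`, `head`, and `delta` match the 86.75%, 86.01%, and -0.74 reported in the table.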

@njzjz njzjz force-pushed the fix/remove-monty-dependency branch from 7a6f287 to e84be01 Compare June 19, 2026 17:53
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

An unexpected error occurred while generating fixes: Not Found - https://docs.github.com/rest/git/refs#get-a-reference

@njzjz njzjz requested a review from wanghan-iapcm June 19, 2026 18:22
Labels

dependencies Pull requests that update a dependency file dpdata size:L This PR changes 100-499 lines, ignoring generated files.
