docs: document the backup/restore ZIP archive format by irfanuddinahmad · Pull Request #604 · openedx/openedx-core

irfanuddinahmad · 2026-06-08T15:18:54Z

Summary

Adds docs/openedx_content/backup_restore.rst — a full reference page for the TOML-based ZIP format produced by lp_dump / create_zip_file and consumed by lp_load / load_learning_package.
Covers archive layout, all TOML file schemas with field-level descriptions, annotated examples drawn from the test fixtures, XBlock XML placement, and quick-start snippets for both management commands and the Python API.
Links the new page from docs/openedx_content/index.rst.

Closes #492

Test plan

cd docs && make html (or make dirhtml) — confirms RST renders without Sphinx warnings
Spot-check that the TOML examples match the test fixtures under tests/openedx_content/applets/backup_restore/fixtures/library_backup/
Run lp_dump on a real library and compare the output ZIP layout to the documented structure
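
For the last check, a minimal sketch of one way to summarize what an archive contains before comparing it to the documented layout. It uses only the standard library `zipfile` module. The member paths in the demo archive are placeholders chosen for illustration (an assumption), not the real output of lp_dump.

```python
import io
import zipfile
from collections import Counter


def summarize_archive(archive) -> Counter:
    """Count archive members by kind: 'toml', 'xml', or 'other'."""
    counts = Counter()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            if name.endswith(".toml"):
                counts["toml"] += 1
            elif name.endswith(".xml"):
                counts["xml"] += 1
            else:
                counts["other"] += 1
    return counts


def build_demo_archive() -> io.BytesIO:
    """Build a tiny in-memory ZIP; the paths are hypothetical placeholders."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("package.toml", "[learning_package]\n")
        zf.writestr("entities/intro.toml", "[entity]\n")
        zf.writestr("entities/intro/component_versions/v1/block.xml", "<html/>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(dict(summarize_archive(build_demo_archive())))
```

`summarize_archive` accepts a path or a file-like object, so it can be pointed at a ZIP produced by lp_dump as well as at the demo archive.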

🤖 Generated with Claude Code

Adds a reference page describing the TOML-based ZIP format produced by `create_zip_file` / `lp_dump` and consumed by `load_learning_package` / `lp_load`. Covers the full archive layout, every TOML file schema with field-level descriptions and annotated examples drawn from the test fixtures, the XBlock XML placement convention, and quick-start usage snippets for both the management commands and the Python API.

Closes openedx#492

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

openedx-webhooks · 2026-06-08T15:19:01Z

Thanks for the pull request, @irfanuddinahmad!

This repository is currently maintained by @axim-engineering.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
- This process (including the steps you'll need to take) is documented here.
If it doesn't, simply proceed with the next step.

🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

- Dependencies: "This PR must be merged before / after / at the same time as ..."
- Blockers: "This PR is waiting for OEP-1234 to be accepted."
- Timeline information: "This PR must be merged by XX date because ..."
- Partner information: "This is for a course on edx.org."
- Supporting documentation
- Relevant Open edX discussion forum threads

🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

The size and impact of the changes that it introduces
The need for product review
Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

Copilot

Pull request overview

This PR adds official documentation for the ZIP-based learning-package backup/restore format used by the backup_restore applet, and links it into the openedx_content docs section so operators and developers can understand and inspect archives produced/consumed by lp_dump / lp_load.

Changes:

Add a new reference page documenting the archive layout and TOML/XML schemas used in backup ZIPs.
Include export/restore quick-start examples for both management commands and the Python API.
Link the new page from the docs/openedx_content index.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File	Description
docs/openedx_content/index.rst	Adds the new backup/restore format page to the openedx_content docs toctree.
docs/openedx_content/backup_restore.rst	New documentation page describing the backup ZIP layout and file formats.

- Overview: clarify that only draft and published versions are exported, not the full history
- origin_server: free-form string, not a validated hostname
- [learning_package] heading: note that key may be overridden and updated is not restored
- updated field: mark as reference-only, not applied during restore
- [entity.published]: always present (empty table with a comment when unpublished)
- [[version]]: at most 2 entries, draft first, then published if different
- Example: fix version order to draft (v5) first, then published (v4)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
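
To make the `[[version]]` ordering rule concrete, here is a small sketch of the selection logic as described in this commit: draft first, then published only if it differs. The function name and the use of plain version numbers are assumptions for illustration; the applet's real code may be organized differently.

```python
from typing import Optional


def versions_to_export(draft: Optional[int], published: Optional[int]) -> list[int]:
    """Return version numbers in archive order: draft, then published if different."""
    ordered: list[int] = []
    if draft is not None:
        ordered.append(draft)
    # The published version is listed only when it is not already the draft.
    if published is not None and published != draft:
        ordered.append(published)
    return ordered


if __name__ == "__main__":
    print(versions_to_export(5, 4))     # draft v5 and published v4 differ
    print(versions_to_export(4, 4))     # draft and published are the same version
    print(versions_to_export(3, None))  # never published
```

With the values from the corrected example (draft v5, published v4), this yields two entries with the draft listed first.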

kdmccormick · 2026-06-24T12:49:53Z

@ormsbee did you say that you had a pending review on this, or should I do a close read?

ormsbee

Thank you for your patience on this review.

ormsbee · 2026-06-09T15:06:23Z

+Backup / Restore Format
+=======================
+
+The ``backup_restore`` applet lets you export a learning package (V2 content


Suggested change

The ``backup_restore`` applet lets you export a learning package (V2 content

The ``backup_restore`` applet lets you back up a learning package (V2 content

We're intentionally trying to use "backup/restore" to distinguish it from the incremental import/export functionality that we plan to add in the future.

Updated to use "back up" consistently. That distinction from future incremental import/export will matter.

ormsbee · 2026-06-09T15:09:54Z

+published versions are exported — the full version history is not preserved.
+
+The archive uses `TOML <https://toml.io>`_ for all metadata files and keeps the
+actual XBlock content as XML (the same ``block.xml`` format Studio has always


Suggested change

actual XBlock content as XML (the same ``block.xml`` format Studio has always

component XBlock content as XML (the same OLX format Studio has always

In modulestore, the XML files are not named block.xml. Also, the old XML format is being kept for components (e.g. problems, videos), but not for structural container types like units and subsections.

Also, it's probably worth noting that the naming is different: in courses, each component would be exported with its block_id as the name of the file. That's usually a machine-generated ID (since that's the default in Split), but sometimes it's a meaningful identifier when authored by hand. For our export format, the OLX is always block.xml, and it's the metadata in the parent TOML file that gives the identifier.

I'll add a note in the block.xml section clarifying that unlike the old modulestore OLX export (where each component file was named by its block_id), this format always uses block.xml with the identifier recorded in the parent TOML. That should help readers familiar with the old format understand the difference.

Applied — "OLX format" is more precise and the "component" qualifier correctly limits the claim to XBlocks, not structural containers.

ormsbee · 2026-06-30T16:00:25Z

+--------
+
+A backup ZIP is a self-contained snapshot of one learning package.  It captures
+every component, collection, container (sections / subsections / units), and


Suggested change

every component, collection, container (sections / subsections / units), and

every component, collection, container (section / subsection / unit), and

ormsbee · 2026-06-30T16:09:23Z

+Overview
+--------
+
+A backup ZIP is a self-contained snapshot of one learning package.  It captures


We should clarify the difference between a Learning Package and a Library. Namely, that a Library has one and only one Learning Package where it stores its content, but Learning Packages can also stand alone. The restore process creates a temporary Learning Package that can be reviewed by the user, and then later associates that Learning Package with a newly created Library.

The doc was using the two interchangeably — I'll add a note to the Overview explaining: a Library holds exactly one Learning Package; Learning Packages can also exist independently. The restore flow reflects this — it first creates a standalone Learning Package for inspection, then the user associates it with a new Library.

ormsbee · 2026-06-30T16:12:35Z

+   When provided it overrides the ``key`` stored in ``package.toml``, which
+   is useful when importing a library under a new reference.


We should use stronger language here. It's really dangerous to trust the archive for either the package_ref or the user, and callers should explicitly pass those to load_learning_package unless they really, really know what they're doing.

Updated — I'll add an explicit warning that callers should always pass package_ref rather than relying on the key in the archive, since trusting untrusted archive content is a security risk.

ormsbee · 2026-06-30T16:16:11Z

+    title = "Text"
+    version_num = 4
+
+Container entity TOML (``entities/<slug>.toml``)


We should explain what a <slug> is: This is the last part of the entity_ref if there is no collision, but if the last parts of the entity_ref collide (e.g. a Unit and an HTMLBlock that are both "intro"), then a short hash gets appended.

I'll add an explanation: <slug> is derived from the last segment of the entity_ref; when two entities share the same last segment (e.g. a Unit and an HTMLBlock both named "intro"), a short hash is appended to keep filenames unique.
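
A rough sketch of the slug rule as described above. The details are assumptions, not the applet's actual algorithm: it assumes `:` separates segments of the entity_ref, uses the first 6 hex characters of a SHA-1 digest as the "short hash", joins with `-`, and disambiguates only the later of two colliding entities. The entity_ref strings are made up for the example.

```python
import hashlib


def slug_for(entity_ref: str, taken: set[str]) -> str:
    """Derive a filename slug from the last segment of entity_ref.

    If that segment is already in use, append a short hash of the full
    entity_ref so the resulting filename stays unique.
    """
    base = entity_ref.rsplit(":", 1)[-1]
    if base not in taken:
        slug = base
    else:
        digest = hashlib.sha1(entity_ref.encode("utf-8")).hexdigest()[:6]
        slug = f"{base}-{digest}"
    taken.add(slug)
    return slug


if __name__ == "__main__":
    taken: set[str] = set()
    print(slug_for("unit:intro", taken))  # no collision yet: plain last segment
    print(slug_for("html:intro", taken))  # collides on "intro": hash appended
```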

ormsbee · 2026-06-30T17:26:49Z

+XBlock content (``component_versions/v<N>/block.xml``)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Standard XBlock XML, identical to what Studio stores internally.  Static assets


There is a difference in HTMLBlock storage. Namely, we don't currently support storing a separate HTML file, so we inline the HTML with CDATA. In courses, we'd have a tiny XML file for the HTMLBlock that pointed to the HTML file.

This is a limitation of our XBlock serialization, but one I hope we can fix before Willow.

I'll add a caveat noting that HTMLBlock content is currently serialized inline (CDATA in the XML) rather than as a separate .html file, which differs from old course OLX exports. I'll flag it as a known limitation to be addressed.
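
To illustrate the inline form, a short sketch that reads HTML out of a CDATA section with the standard library XML parser. The sample XML (element name, `display_name` attribute, and body) is a hypothetical stand-in for an HTMLBlock's block.xml, not copied from the fixtures.

```python
import xml.etree.ElementTree as ET

# Hypothetical block.xml content with the HTML body inlined as CDATA.
BLOCK_XML = '<html display_name="Intro"><![CDATA[<p>Hello, <b>world</b>!</p>]]></html>'


def inline_html(block_xml: str) -> str:
    """Return the HTML body stored inline in the block's XML."""
    root = ET.fromstring(block_xml)
    # The parser exposes CDATA content as ordinary element text.
    return root.text or ""


if __name__ == "__main__":
    print(inline_html(BLOCK_XML))
```

In an old course OLX export the equivalent XML file would instead point at a separate .html file, so a reader of that format would need a second file lookup that this format avoids for now.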

- Use "back up" consistently to distinguish from future import/export
- Fix "OLX format" and "component" qualifier (containers don't use OLX)
- Clarify Library vs Learning Package relationship in Overview
- Add security warning: always pass package_ref explicitly, don't trust archive
- Explain <slug> derivation and hash-collision disambiguation
- Note modulestore naming difference (block_id vs block.xml + parent TOML)
- Note HTMLBlock CDATA limitation vs separate .html file in old course OLX
- Fix singular: section / subsection / unit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

openedx-webhooks added the open-source-contribution (PR author is not from Axim or 2U) and core contributor (PR author is a Core Contributor, who may or may not have write access to this repo) labels Jun 8, 2026

openedx-webhooks added this to Contributions Jun 8, 2026

github-project-automation Bot moved this to Needs Triage in Contributions Jun 8, 2026

mphilbrick211 moved this from Needs Triage to Ready for Review in Contributions Jun 8, 2026

mphilbrick211 requested a review from ormsbee June 8, 2026 19:19

farhan requested a review from Copilot June 9, 2026 09:17

Copilot started reviewing on behalf of farhan June 9, 2026 09:18

Copilot AI reviewed Jun 9, 2026

ormsbee requested changes Jun 30, 2026
