docs: document the backup/restore ZIP archive format#604

Open
irfanuddinahmad wants to merge 3 commits into
openedx:mainfrom
irfanuddinahmad:irfanuddinahmad/document-backup-restore-format-492

Conversation

@irfanuddinahmad

Contributor

Summary

  • Adds docs/openedx_content/backup_restore.rst — a full reference page for the TOML-based ZIP format produced by lp_dump / create_zip_file and consumed by lp_load / load_learning_package.
  • Covers archive layout, all TOML file schemas with field-level descriptions, annotated examples drawn from the test fixtures, XBlock XML placement, and quick-start snippets for both management commands and the Python API.
  • Links the new page from docs/openedx_content/index.rst.

Closes #492

Test plan

  • cd docs && make html (or make dirhtml) — confirms RST renders without Sphinx warnings
  • Spot-check that the TOML examples match the test fixtures under tests/openedx_content/applets/backup_restore/fixtures/library_backup/
  • Run lp_dump on a real library and compare the output ZIP layout to the documented structure

🤖 Generated with Claude Code

Adds a reference page describing the TOML-based ZIP format produced by
`create_zip_file` / `lp_dump` and consumed by `load_learning_package` /
`lp_load`.  Covers the full archive layout, every TOML file schema with
field-level descriptions and annotated examples drawn from the test
fixtures, the XBlock XML placement convention, and quick-start usage
snippets for both the management commands and the Python API.

Closes openedx#492

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@openedx-webhooks openedx-webhooks added open-source-contribution PR author is not from Axim or 2U core contributor PR author is a Core Contributor (who may or may not have write access to this repo). labels Jun 8, 2026
@openedx-webhooks

Thanks for the pull request, @irfanuddinahmad!

This repository is currently maintained by @axim-engineering.

Once you've gone through the following steps, feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Details
Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@github-project-automation github-project-automation Bot moved this to Needs Triage in Contributions Jun 8, 2026
@mphilbrick211 mphilbrick211 moved this from Needs Triage to Ready for Review in Contributions Jun 8, 2026
@mphilbrick211 mphilbrick211 requested a review from ormsbee June 8, 2026 19:19
@farhan farhan requested a review from Copilot June 9, 2026 09:17

Copilot AI left a comment

Pull request overview

This PR adds official documentation for the ZIP-based learning-package backup/restore format used by the backup_restore applet, and links it into the openedx_content docs section so operators and developers can understand and inspect archives produced/consumed by lp_dump / lp_load.

Changes:

  • Add a new reference page documenting the archive layout and TOML/XML schemas used in backup ZIPs.
  • Include export/restore quick-start examples for both management commands and the Python API.
  • Link the new page from the docs/openedx_content index.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

  • docs/openedx_content/index.rst: Adds the new backup/restore format page to the openedx_content docs toctree.
  • docs/openedx_content/backup_restore.rst: New documentation page describing the backup ZIP layout and file formats.

Comment thread docs/openedx_content/backup_restore.rst Outdated
Comment thread docs/openedx_content/backup_restore.rst Outdated
Comment thread docs/openedx_content/backup_restore.rst Outdated
Comment thread docs/openedx_content/backup_restore.rst Outdated
Comment thread docs/openedx_content/backup_restore.rst Outdated
Comment thread docs/openedx_content/backup_restore.rst Outdated
Comment thread docs/openedx_content/backup_restore.rst
- Overview: clarify only draft+published versions exported, not full history
- origin_server: free-form string, not validated hostname
- [learning_package] heading: note key may be overridden, updated not restored
- updated field: mark as reference-only, not applied during restore
- [entity.published]: always present (empty table with comment when unpublished)
- [[version]]: at most 2 entries — draft first, then published if different
- Example: fix version order to draft (v5) first, then published (v4)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kdmccormick

Member

@ormsbee did you say that you had a pending review on this, or should I do a close read?

@ormsbee ormsbee left a comment

Contributor

Thank you for your patience on this review.

Comment thread docs/openedx_content/backup_restore.rst Outdated
Backup / Restore Format
=======================

The ``backup_restore`` applet lets you export a learning package (V2 content

Contributor

Suggested change
The ``backup_restore`` applet lets you export a learning package (V2 content
The ``backup_restore`` applet lets you back up a learning package (V2 content

We're intentionally trying to use "backup/restore" to distinguish it from the incremental import/export functionality that we plan to add in the future.

Contributor Author

Updated to use "back up" consistently. That distinction from future incremental import/export will matter.

Comment thread docs/openedx_content/backup_restore.rst Outdated
published versions are exported — the full version history is not preserved.

The archive uses `TOML <https://toml.io>`_ for all metadata files and keeps the
actual XBlock content as XML (the same ``block.xml`` format Studio has always

Contributor

Suggested change
actual XBlock content as XML (the same ``block.xml`` format Studio has always
component XBlock content as XML (the same OLX format Studio has always

In modulestore, the XML files are not named block.xml. Also, the old XML format is being kept for components (e.g. problems, videos), but not for structural container types like units and subsections.

Contributor

Also, it's probably worth noting that the naming is different: in courses, each component would be exported with its block_id as the name of the file. That's usually a machine-generated ID (since that's the default in Split) but sometimes it's a meaningful identifier when authored by hand. For our export format, the OLX is always block.xml, and it's the metadata in the parent TOML file that gives the identifier.

Contributor Author

I'll add a note in the block.xml section clarifying that unlike the old modulestore OLX export (where each component file was named by its block_id), this format always uses block.xml with the identifier recorded in the parent TOML. That should help readers familiar with the old format understand the difference.

Contributor Author

Applied — "OLX format" is more precise and the "component" qualifier correctly limits the claim to XBlocks, not structural containers.
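Because every component's OLX file is named `block.xml` and the identifier lives in the parent TOML, tooling that inspects an archive has to find those files by path pattern rather than by file name. This is a minimal sketch using `zipfile` and `re`. It assumes `component_versions/v<N>/block.xml` may be nested under per-entity directories (the exact parent path is not spelled out in this thread), and `list_block_xml` and the archive paths in the example are hypothetical.

```python
import io
import re
import zipfile

# Assumption: block.xml files sit at .../component_versions/v<N>/block.xml,
# possibly nested under per-entity directories.
BLOCK_XML_PATH = re.compile(r"(?:^|/)component_versions/v(\d+)/block\.xml$")


def list_block_xml(zf: zipfile.ZipFile) -> list[tuple[str, int]]:
    """Return sorted (path, version_num) pairs for every component block.xml."""
    found = []
    for name in zf.namelist():
        match = BLOCK_XML_PATH.search(name)
        if match:
            found.append((name, int(match.group(1))))
    return sorted(found)


# Build a tiny in-memory archive with made-up paths to exercise the helper.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("package.toml", "")
    zf.writestr("entities/intro.toml", "")
    zf.writestr("entities/intro/component_versions/v5/block.xml", "<html/>")
    zf.writestr("entities/intro/component_versions/v4/block.xml", "<html/>")

with zipfile.ZipFile(buf) as zf:
    for path, version in list_block_xml(zf):
        print(version, path)
```

The file name itself carries no identifier here. To learn which entity a given `block.xml` belongs to, a reader would consult the corresponding entity TOML.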

Comment thread docs/openedx_content/backup_restore.rst Outdated
--------

A backup ZIP is a self-contained snapshot of one learning package. It captures
every component, collection, container (sections / subsections / units), and

Contributor

Suggested change
every component, collection, container (sections / subsections / units), and
every component, collection, container (section / subsection / unit), and

Overview
--------

A backup ZIP is a self-contained snapshot of one learning package. It captures

Contributor

We should clarify the difference between a Learning Package and a Library. Namely, that a Library has one and only one Learning Package where it stores its content, but Learning Packages can also stand alone. The restore process creates a temporary Learning Package that can be reviewed by the user, and then later associates that Learning Package with a newly created Library.

Contributor Author

The doc was using the two interchangeably — I'll add a note to the Overview explaining: a Library holds exactly one Learning Package; Learning Packages can also exist independently. The restore flow reflects this — it first creates a standalone Learning Package for inspection, then the user associates it with a new Library.

Comment on lines +69 to +70
When provided it overrides the ``key`` stored in ``package.toml``, which
is useful when importing a library under a new reference.

Contributor

We should use stronger language here. It's really dangerous to trust the archive for either the package_ref or the user, and callers should explicitly pass those to load_learning_package unless they really, really know what they're doing.

Contributor Author

Updated — I'll add an explicit warning that callers should always pass package_ref rather than relying on the key in the archive, since trusting untrusted archive content is a security risk.

title = "Text"
version_num = 4

Container entity TOML (``entities/<slug>.toml``)

Contributor

We should explain what a <slug> is: This is the last part of the entity_ref if there is no collision, but if the last parts of the entity_ref collide (e.g. a Unit and an HTMLBlock that are both "intro"), then a short hash gets appended.

Contributor Author

I'll add an explanation: <slug> is derived from the last segment of the entity_ref; when two entities share the same last segment (e.g. a Unit and an HTMLBlock both named "intro"), a short hash is appended to keep filenames unique.
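A sketch of the `<slug>` rule described above. It assumes `:` separates entity_ref segments and uses the first 8 hex digits of a SHA-1 of the full entity_ref as the "short hash". Both the separator and the hash scheme are assumptions, as is appending the hash to every colliding entity rather than only to the later ones. The example refs are made up.

```python
import hashlib
from collections import Counter


def make_slugs(entity_refs: list[str]) -> dict[str, str]:
    """Map each entity_ref to a filename slug, disambiguating collisions."""
    # Assumption: the last ':'-separated segment is the base slug.
    base = {ref: ref.rsplit(":", 1)[-1] for ref in entity_refs}
    counts = Counter(base.values())
    slugs = {}
    for ref, last in base.items():
        if counts[last] > 1:
            # Assumed "short hash": first 8 hex chars of SHA-1 of the full ref.
            digest = hashlib.sha1(ref.encode("utf-8")).hexdigest()[:8]
            slugs[ref] = f"{last}_{digest}"
        else:
            slugs[ref] = last
    return slugs


# A Unit and an HTMLBlock both ending in "intro" collide; "quiz1" does not.
print(make_slugs(["unit:intro", "html:intro", "problem:quiz1"]))
```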

Comment thread docs/openedx_content/backup_restore.rst Outdated
XBlock content (``component_versions/v<N>/block.xml``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Standard XBlock XML, identical to what Studio stores internally. Static assets

Contributor

There is a difference in HTMLBlock storage. Namely, we don't currently support storing a separate HTML file, so we inline the HTML with CDATA. In courses, we'd have a tiny XML file for the HTMLBlock that pointed to the HTML file.

This is a limitation of our XBlock serialization, but one I hope we can fix before Willow.

Contributor Author

I'll add a caveat noting that HTMLBlock content is currently serialized inline (CDATA in the XML) rather than as a separate .html file, which differs from old course OLX exports. I'll flag it as a known limitation to be addressed.
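To illustrate the inline storage described in this thread, here is a minimal sketch that reads the HTML back out of a hypothetical HTMLBlock `block.xml` with the standard-library `xml.etree.ElementTree`. The sample element and its `display_name` attribute are assumptions for illustration. ElementTree exposes a CDATA section as ordinary element text.

```python
import xml.etree.ElementTree as ET

# Hypothetical block.xml for an HTMLBlock, with the HTML inlined as CDATA.
SAMPLE_BLOCK = '<html display_name="Text"><![CDATA[<p>Hello <b>world</b></p>]]></html>'


def inline_html(block_xml: str) -> str:
    """Return the HTML body stored inline in an HTMLBlock's block.xml."""
    root = ET.fromstring(block_xml)
    if root.tag != "html":
        raise ValueError(f"expected an <html> root element, got <{root.tag}>")
    # The CDATA section is parsed as plain text of the root element.
    return root.text or ""


print(inline_html(SAMPLE_BLOCK))  # <p>Hello <b>world</b></p>
```

A small XML file that only pointed at a separate HTML file, as in old course OLX, would have no inline text, so this helper would return an empty string for it.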

- Use "back up" consistently to distinguish from future import/export
- Fix "OLX format" and "component" qualifier (containers don't use OLX)
- Clarify Library vs Learning Package relationship in Overview
- Add security warning: always pass package_ref explicitly, don't trust archive
- Explain <slug> derivation and hash-collision disambiguation
- Note modulestore naming difference (block_id vs block.xml + parent TOML)
- Note HTMLBlock CDATA limitation vs separate .html file in old course OLX
- Fix singular: section / subsection / unit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Labels

core contributor PR author is a Core Contributor (who may or may not have write access to this repo). open-source-contribution PR author is not from Axim or 2U

Projects

Status: Ready for Review

Development

Successfully merging this pull request may close these issues.

Document backup/restore format

6 participants