
Example robot subset schema does not match LerobotG1DataConfig & convert_to_lerobot.py #4

Description

@dnjsxor999

Summary

Thanks for the great work and for releasing the example data!

While trying to run the training pipeline on the provided sample dataset (OpenDriveLab/EgoHumanoid, example/), I found that the robot subset cannot be loaded by the released training config, while the human subset works out of the box. The two subsets are published with different LeRobot schemas, and the robot one does not match what LerobotG1DataConfig / convert_to_lerobot.py expect.

Since co-training robot + human data is the core contribution, it would be very helpful to be able to use the released robot subset directly.

Details

The training config LerobotG1DataConfig (and convert_to_lerobot.py's G1_CONFIG) expect these feature keys:

  • observation.images.left
  • action_delta_eef (12)
  • delta_height (1)
  • teleop_navigate (3)
  • hand_status (2)

example/human (meta/info.json, fps 12) contains all of these keys, plus action_eef:

  • observation.images.left [720, 1280, 3]
  • action_eef (14), action_delta_eef (12), teleop_navigate (3), delta_height (1), hand_status (2)

example/robot (meta/info.json, fps 30) uses a different, rawer schema:

  • observation.images.ego_view [480, 640, 3] ← not observation.images.left
  • observation.state (43), action (43)
  • action.eef (14), observation.eef_state (14)
  • teleop.navigate_command (3) ← dot-named, not teleop_navigate
  • teleop.base_height_command (1) ← absolute, not delta_height
  • observation.img_state_delta (1)
  • (no action_delta_eef, no hand_status)

So the mismatch is at two levels:

  1. Key naming (ego_view vs left, teleop.navigate_command vs teleop_navigate, etc.)
  2. Missing derived fields: action_delta_eef and hand_status are not present
    as columns, and height is provided as an absolute command rather than a delta.

Because of this, only example/human can be passed to LerobotG1DataConfig; the robot subset raises errors during data loading (keys not found).
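
For reference, the comparison above can be reproduced with a small check like the one below. It is a minimal sketch, not part of the repository: `diff_features` and `load_features` are hypothetical helper names, the expected keys and dimensions are copied from the list above, and the two in-memory dictionaries only mirror the `features` entries quoted in this issue (shapes as listed, other fields omitted).

```python
import json
from pathlib import Path

# Keys and vector sizes listed above as expected by LerobotG1DataConfig.
# None means "image feature, shape not checked here".
EXPECTED = {
    "observation.images.left": None,
    "action_delta_eef": 12,
    "delta_height": 1,
    "teleop_navigate": 3,
    "hand_status": 2,
}


def diff_features(features):
    """Return (missing_keys, wrong_dims) for a LeRobot `features` mapping."""
    missing = sorted(k for k in EXPECTED if k not in features)
    wrong_dims = {}
    for key, dim in EXPECTED.items():
        if dim is None or key not in features:
            continue
        shape = list(features[key].get("shape", []))
        if shape != [dim]:
            wrong_dims[key] = shape
    return missing, wrong_dims


def load_features(dataset_root):
    """Read the `features` mapping from <dataset_root>/meta/info.json."""
    info = json.loads((Path(dataset_root) / "meta" / "info.json").read_text())
    return info["features"]


# In-memory copies of the schemas quoted in this issue (assumed shapes only).
HUMAN_FEATURES = {
    "observation.images.left": {"shape": [720, 1280, 3]},
    "action_eef": {"shape": [14]},
    "action_delta_eef": {"shape": [12]},
    "teleop_navigate": {"shape": [3]},
    "delta_height": {"shape": [1]},
    "hand_status": {"shape": [2]},
}
ROBOT_FEATURES = {
    "observation.images.ego_view": {"shape": [480, 640, 3]},
    "observation.state": {"shape": [43]},
    "action": {"shape": [43]},
    "action.eef": {"shape": [14]},
    "observation.eef_state": {"shape": [14]},
    "teleop.navigate_command": {"shape": [3]},
    "teleop.base_height_command": {"shape": [1]},
    "observation.img_state_delta": {"shape": [1]},
}

if __name__ == "__main__":
    for name, feats in (("human", HUMAN_FEATURES), ("robot", ROBOT_FEATURES)):
        missing, wrong_dims = diff_features(feats)
        print(name, "missing:", missing, "wrong dims:", wrong_dims)
```

To run it against the downloaded files instead of the in-memory copies, pass the result of `load_features("./data/example/robot")` to `diff_features`; the path assumes the download command from the Environment section.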

Why existing scripts don't fix it

Both data_alignment/robot_data_process/merge_data.py and data_alignment/convert_to_lerobot.py take raw episode_*.hdf5 as input, but the released example/robot is already a LeRobot parquet dataset. So the released robot subset can't be re-processed by the existing scripts without writing custom code.

Questions / requests

  1. Is the robot subset expected to be usable with the current training config, or was it exported by a different (older/internal) pipeline?
  2. Could you either:
    • re-export example/robot in the same unified schema as example/human (i.e. the convert_to_lerobot.py output format), or
    • release the raw episode_*.hdf5 for the robot example so it can be run through merge_data.py → convert_to_lerobot.py, or
    • provide a small script to repack the released robot parquet into the unified 18-dim schema?
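
To make the third option concrete, here is a rough sketch of the only part of such a repack script that follows directly from the key comparison above: the pure renames. Everything else is left open on purpose. `RENAMES`, `REQUIRED`, `rename_columns` and `remaining_gaps` are hypothetical names, not existing code in the repository, and how action_delta_eef, hand_status and delta_height should be derived is not known from the released files, so the sketch only reports them as gaps.

```python
# Renames that follow directly from the key comparison in this issue.
RENAMES = {
    "observation.images.ego_view": "observation.images.left",
    "teleop.navigate_command": "teleop_navigate",
}

# Keys the unified schema needs, as listed for LerobotG1DataConfig.
REQUIRED = [
    "observation.images.left",
    "action_delta_eef",
    "delta_height",
    "teleop_navigate",
    "hand_status",
]


def rename_columns(columns):
    """Apply the known renames; unknown columns pass through unchanged."""
    return [RENAMES.get(name, name) for name in columns]


def remaining_gaps(columns):
    """Required keys still absent after renaming (they need a real derivation)."""
    present = set(rename_columns(columns))
    return sorted(key for key in REQUIRED if key not in present)


# Column names of the released robot subset, as quoted above.
ROBOT_COLUMNS = [
    "observation.images.ego_view",
    "observation.state",
    "action",
    "action.eef",
    "observation.eef_state",
    "teleop.navigate_command",
    "teleop.base_height_command",
    "observation.img_state_delta",
]

if __name__ == "__main__":
    print(rename_columns(ROBOT_COLUMNS))
    print("still missing:", remaining_gaps(ROBOT_COLUMNS))
```

After renaming, three keys remain unresolved (action_delta_eef, delta_height, hand_status), which is why a rename alone is not enough and an official script or re-export would help.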

Environment

  • Following the README setup (uv sync), single GPU.
  • Data: hf download OpenDriveLab/EgoHumanoid --repo-type=dataset --local-dir ./data

Thanks in advance!
