Summary
Thanks for the great work and for releasing the example data!
While trying to run the training pipeline on the provided sample dataset (OpenDriveLab/EgoHumanoid, example/), I found that the robot subset cannot be loaded by the released training config, while the human subset works out of the box. The two subsets are published with different LeRobot schemas, and the robot one does not match what LerobotG1DataConfig / convert_to_lerobot.py expect.
Since co-training robot + human data is the core contribution, it would be very helpful to be able to use the released robot subset directly.
Details
The training config LerobotG1DataConfig (and convert_to_lerobot.py's G1_CONFIG) expect these feature keys:
observation.images.left
action_delta_eef (12)
delta_height (1)
teleop_navigate (3)
hand_status (2)
example/human (meta/info.json) matches this exactly — fps 12:
observation.images.left [720, 1280, 3]
action_eef (14), action_delta_eef (12), teleop_navigate (3), delta_height (1), hand_status (2)
example/robot (meta/info.json) uses a different / rawer schema — fps 30:
observation.images.ego_view [480, 640, 3] ← not observation.images.left
observation.state (43), action (43)
action.eef (14), observation.eef_state (14)
teleop.navigate_command (3) ← dot-named, not teleop_navigate
teleop.base_height_command (1) ← absolute, not delta_height
observation.img_state_delta (1)
(no action_delta_eef, no hand_status)
So the mismatch is at two levels:
- Key naming (ego_view vs left, teleop.navigate_command vs teleop_navigate, etc.)
- Missing derived fields: action_delta_eef and hand_status are not present as columns, and height is provided as an absolute command rather than a delta.
Because of this, only example/human can be passed to LerobotG1DataConfig; the robot subset raises errors during data loading (keys not found).
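For anyone who wants to reproduce the mismatch without starting a training run, here is a minimal sketch that compares the feature keys in meta/info.json against the keys listed above. It assumes the usual LeRobot layout where info.json has a top-level "features" dict with a "shape" per key; check_features and check_dataset are hypothetical helper names, not part of the repo.

```python
import json
from pathlib import Path

# Feature keys and dimensions that, per this issue, LerobotG1DataConfig expects.
# None means the shape is not checked (the image key).
EXPECTED = {
    "observation.images.left": None,
    "action_delta_eef": 12,
    "delta_height": 1,
    "teleop_navigate": 3,
    "hand_status": 2,
}


def check_features(info: dict, expected: dict = EXPECTED) -> dict:
    """Compare a parsed meta/info.json against the expected keys and dims."""
    features = info.get("features", {})
    missing = [key for key in expected if key not in features]
    wrong_dim = {}
    for key, dim in expected.items():
        if dim is None or key not in features:
            continue
        shape = list(features[key].get("shape", []))
        if shape != [dim]:
            wrong_dim[key] = shape
    return {"missing": missing, "wrong_dim": wrong_dim}


def check_dataset(root: str) -> dict:
    """Load <root>/meta/info.json (assumed layout) and run check_features."""
    info = json.loads((Path(root) / "meta" / "info.json").read_text())
    return check_features(info)
```

With the download command from the Environment section, the two subsets would presumably be at ./data/example/human and ./data/example/robot (I am assuming that path layout). check_dataset should report nothing missing for the human subset and all five keys missing for the robot subset.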
Why existing scripts don't fix it
Both data_alignment/robot_data_process/merge_data.py and data_alignment/convert_to_lerobot.py take raw episode_*.hdf5 as input, but the released example/robot is already a LeRobot parquet dataset. So the released robot subset can't be re-processed by the existing scripts without writing custom code.
Questions / requests
- Is the robot subset expected to be usable with the current training config, or was it exported by a different (older/internal) pipeline?
- Could you either:
  - re-export example/robot in the same unified schema as example/human (i.e. the convert_to_lerobot.py output format), or
  - release the raw episode_*.hdf5 for the robot example so it can be run through merge_data.py → convert_to_lerobot.py, or
  - provide a small script to repack the released robot parquet into the unified 18-dim schema?
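To make the third option concrete, here is a rough sketch of the part of such a repack script that I can infer from the key lists above. It is only a guess at the intended mapping: the renames assume ego_view corresponds to left, and the height conversion assumes delta_height is the frame-to-frame difference of teleop.base_height_command, neither of which I can confirm from the repo. It works on one episode held as a plain dict of per-frame columns (height as one float per frame). It does not derive action_delta_eef or hand_status, and it does not handle the fps (30 vs 12) or image resolution differences.

```python
# Renames read off the key lists in this issue (ASSUMED correspondence).
RENAME = {
    "observation.images.ego_view": "observation.images.left",
    "teleop.navigate_command": "teleop_navigate",
}

# Fields this sketch cannot derive; their definition is not known to me.
UNRESOLVED = ["action_delta_eef", "hand_status"]


def abs_to_delta(values):
    """ASSUMPTION: delta[t] = value[t+1] - value[t], and 0.0 for the last frame."""
    if not values:
        return []
    deltas = [nxt - cur for cur, nxt in zip(values, values[1:])]
    return deltas + [0.0]


def repack_episode(columns: dict) -> dict:
    """Rename known keys and replace the absolute height with a delta column."""
    out = {RENAME.get(key, key): value for key, value in columns.items()}
    heights = out.pop("teleop.base_height_command", None)
    if heights is not None:
        out["delta_height"] = abs_to_delta(heights)
    return out


def still_missing(columns: dict) -> list:
    """List the unresolved fields that are absent after repacking."""
    return [key for key in UNRESOLVED if key not in columns]
```

Even after this, still_missing would report action_delta_eef and hand_status, so a confirmed definition of those two fields (or an official script) is still needed.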
Environment
- Following the README setup (uv sync), single GPU.
- Data: hf download OpenDriveLab/EgoHumanoid --repo-type=dataset --local-dir ./data
Thanks in advance!