Example robot subset schema does not match LerobotG1DataConfig & convert_to_lerobot.py

### Summary

Thanks for the great work and for releasing the example data!

While trying to run the training pipeline on the provided sample dataset (`OpenDriveLab/EgoHumanoid`, `example/`), I found that the **`robot` subset cannot be loaded by the released training config**, while the **`human` subset works out of the box**. The two subsets are published with different LeRobot schemas, and the `robot` one does not match what `LerobotG1DataConfig` / `convert_to_lerobot.py` expect.

Since co-training robot + human data is the core contribution, it would be very helpful to be able to use the released `robot` subset directly.

### Details

The training config `LerobotG1DataConfig` (and `convert_to_lerobot.py`'s `G1_CONFIG`) expect these feature keys:

- `observation.images.left`
- `action_delta_eef` (12)
- `delta_height` (1)
- `teleop_navigate` (3)
- `hand_status` (2)

**`example/human`** (`meta/info.json`) contains all of these keys (plus `action_eef`), fps 12:
- `observation.images.left` [720, 1280, 3]
- `action_eef` (14), `action_delta_eef` (12), `teleop_navigate` (3), `delta_height` (1), `hand_status` (2)

**`example/robot`** (`meta/info.json`) uses a different, rawer schema, fps 30:
- `observation.images.ego_view` [480, 640, 3]   ← not `observation.images.left`
- `observation.state` (43), `action` (43)
- `action.eef` (14), `observation.eef_state` (14)
- `teleop.navigate_command` (3)                  ← dot-named, not `teleop_navigate`
- `teleop.base_height_command` (1)               ← absolute, not `delta_height`
- `observation.img_state_delta` (1)
- (no `action_delta_eef`, no `hand_status`)

So the mismatch is at two levels:
1. **Key naming** (`ego_view` vs `left`, `teleop.navigate_command` vs `teleop_navigate`, etc.)
2. **Missing derived fields** — `action_delta_eef` and `hand_status` are not present
   as columns, and height is provided as an absolute command rather than a delta.

Because of this, only `example/human` can be passed to `LerobotG1DataConfig`; loading the `robot` subset fails with key-not-found errors for the expected features.
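
For reference, the mismatch can be reproduced without running training. This is a minimal sketch, assuming `meta/info.json` follows the usual LeRobot layout with a top-level `features` mapping whose entries carry a `shape` list; `EXPECTED`, `diff_schema`, and `check_dataset` are hypothetical names of mine, not part of the repo:

```python
import json
from pathlib import Path

# Keys and dimensions listed above as expected by the training config.
# None means "image feature, shape not checked here".
EXPECTED = {
    "observation.images.left": None,
    "action_delta_eef": 12,
    "delta_height": 1,
    "teleop_navigate": 3,
    "hand_status": 2,
}


def diff_schema(features, expected=EXPECTED):
    """Return (missing_keys, wrong_dims) for a LeRobot-style `features` mapping."""
    missing = sorted(k for k in expected if k not in features)
    wrong_dims = {}
    for key, dim in expected.items():
        if key in features and dim is not None:
            shape = list(features[key].get("shape", []))
            if shape != [dim]:
                wrong_dims[key] = shape
    return missing, wrong_dims


def check_dataset(root):
    """Load <root>/meta/info.json and diff its features against EXPECTED."""
    # ASSUMPTION: info.json has a top-level "features" dict.
    info = json.loads((Path(root) / "meta" / "info.json").read_text())
    return diff_schema(info["features"])
```

Calling `check_dataset` on the local `example/robot` directory should list all five expected keys as missing, while `example/human` should come back clean.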

### Why existing scripts don't fix it

Both `data_alignment/robot_data_process/merge_data.py` and `data_alignment/convert_to_lerobot.py` take **raw `episode_*.hdf5`** as input, but the released `example/robot` is already a **LeRobot parquet** dataset. So the released robot subset can't be re-processed by the existing scripts without writing custom code.

### Questions / requests

1. Is the `robot` subset expected to be usable with the current training config, or was it exported by a different (older/internal) pipeline?
2. Could you either:
   - re-export `example/robot` in the same unified schema as `example/human` (i.e. the `convert_to_lerobot.py` output format), **or**
   - release the **raw `episode_*.hdf5`** for the robot example so it can be run through `merge_data.py` → `convert_to_lerobot.py`, **or**
   - provide a small script to repack the released robot parquet into the unified 18-dim schema (`action_delta_eef` 12 + `delta_height` 1 + `teleop_navigate` 3 + `hand_status` 2)?
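
To illustrate the third option, here is a rough sketch of the part of a repack I can guess at. It works on plain per-episode column dicts rather than parquet to stay dependency-free. The `RENAME` mapping and `repack_episode` are hypothetical, and the `delta_height` definition (frame-to-frame difference of the absolute command, 0 for the first frame) is purely my assumption. `action_delta_eef`, `hand_status`, and the 30 → 12 fps resampling are deliberately left out, since I don't know how the official pipeline derives them:

```python
# Hypothetical mapping from released robot keys to the unified names.
RENAME = {
    "observation.images.ego_view": "observation.images.left",
    "teleop.navigate_command": "teleop_navigate",
}


def _scalar(value):
    """Accept either a bare number or a length-1 sequence."""
    if isinstance(value, (list, tuple)):
        return float(value[0])
    return float(value)


def repack_episode(columns):
    """columns: dict of feature name -> list of per-frame values for ONE episode."""
    # Rename the keys we have a guess for; keep everything else unchanged.
    out = {RENAME.get(name, name): values for name, values in columns.items()}

    heights = [_scalar(h) for h in columns["teleop.base_height_command"]]
    # ASSUMPTION: delta_height[t] = height[t] - height[t-1], with 0 at t = 0.
    # The real definition used by convert_to_lerobot.py may differ.
    deltas = [[0.0]] + [[b - a] for a, b in zip(heights, heights[1:])]
    out["delta_height"] = deltas
    return out
```

If the maintainers can confirm the intended definitions of the derived fields, I'm happy to turn this into a proper script.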

### Environment

- Following the README setup (`uv sync`), single GPU.
- Data: `hf download OpenDriveLab/EgoHumanoid --repo-type=dataset --local-dir ./data`

Thanks in advance!
