Sample dataset

This is what a dataset drop looks like. Structure, not content—we're showing you the format, not giving away the data.

Folder layout

Each drop is a versioned directory with consistent structure.

dataset_v0.1/
dataset_v0.1/
├── manifest.jsonl
├── splits/
│   ├── train.txt
│   └── val.txt
├── episodes/
│   ├── ep_000001/
│   │   ├── instruction.txt
│   │   ├── outcome.json
│   │   ├── cam_front.mp4
│   │   ├── cam_left.mp4
│   │   ├── cam_wrist.mp4
│   │   ├── robot_state.parquet
│   │   ├── robot_action.parquet
│   │   └── calibration.json
│   └── ep_000002/
│       └── ...
├── qa/
│   ├── qa_report.csv
│   └── rejected_episodes.csv
└── tools/
    └── loader.py

Manifest

Each line is one episode. Your loader can stream this.

manifest.jsonl
{"episode_id": "ep_000001", "task": "pick_mug", "outcome": "success", "duration_sec": 14.2}
{"episode_id": "ep_000002", "task": "pick_mug", "outcome": "success", "duration_sec": 12.8}
{"episode_id": "ep_000003", "task": "drawer_open", "outcome": "failure", "failure_reason": "gripper_slip"}

QA report

Every drop includes a QA report. Full transparency about what was rejected and why.

Metric	Value	Threshold	Status
Total episodes collected	847	—	—
Accepted	812	—	—
Rejected	35	—	—
Rejection rate	4.1%	<10%	Pass
Avg frame drop rate	0.3%	<1%	Pass
Max timestamp jitter	8.2ms	<20ms	Pass

Loader script

We include a loader so your team can start training immediately.

tools/loader.py
from pathlib import Path
import json, pandas as pd

class MotionLedgerDataset:
    def __init__(self, path: str):
        self.root = Path(path)
        with open(self.root / "manifest.jsonl") as f:
            self.manifest = [json.loads(l) for l in f]

    def __len__(self):
        return len(self.manifest)

    def __getitem__(self, idx):
        ep = self.manifest[idx]
        ep_path = self.root / "episodes" / ep["episode_id"]
        return {
            "instruction": (ep_path / "instruction.txt").read_text(),
            "state": pd.read_parquet(ep_path / "robot_state.parquet"),
            "action": pd.read_parquet(ep_path / "robot_action.parquet"),
            "outcome": ep["outcome"],
        }

# Usage
dataset = MotionLedgerDataset("./dataset_v0.1")
print(f"Episodes: {len(dataset)}")

Get a sample pack

Send us your spec and we'll build a sample pack in your exact format.

Request sample pack →