Sample dataset
This is what a dataset drop looks like. We're showing structure, not content: you see the format, not the data itself.
Folder layout
Each drop is a versioned directory with consistent structure.
dataset_v0.1/
dataset_v0.1/
├── manifest.jsonl
├── splits/
│   ├── train.txt
│   └── val.txt
├── episodes/
│   ├── ep_000001/
│   │   ├── instruction.txt
│   │   ├── outcome.json
│   │   ├── cam_front.mp4
│   │   ├── cam_left.mp4
│   │   ├── cam_wrist.mp4
│   │   ├── robot_state.parquet
│   │   ├── robot_action.parquet
│   │   └── calibration.json
│   └── ep_000002/
│       └── ...
├── qa/
│   ├── qa_report.csv
│   └── rejected_episodes.csv
└── tools/
    └── loader.py
Manifest
Each line is one episode. Your loader can stream this.
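As a rough sketch of what streaming could look like, the Python below reads the manifest one line at a time instead of loading the whole file. `iter_manifest` and `count_by_outcome` are hypothetical helper names, not part of the shipped loader, and the only field they rely on is `outcome`, which appears in every sample record in this section.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_manifest(path: str) -> Iterator[dict]:
    """Yield one episode record per non-empty line of a JSON Lines manifest."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # assumption: blank lines carry no meaning and can be skipped
                yield json.loads(line)


def count_by_outcome(path: str) -> dict:
    """Tally episodes per outcome without holding the manifest in memory."""
    counts: dict = {}
    for record in iter_manifest(path):
        outcome = record["outcome"]
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts
```

Because `iter_manifest` is a generator, memory use stays flat no matter how many episodes a drop contains; calling `count_by_outcome("dataset_v0.1/manifest.jsonl")` would walk the file once.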
manifest.jsonl
{"episode_id": "ep_000001", "task": "pick_mug", "outcome": "success", "duration_sec": 14.2}
{"episode_id": "ep_000002", "task": "pick_mug", "outcome": "success", "duration_sec": 12.8}
{"episode_id": "ep_000003", "task": "drawer_open", "outcome": "failure", "failure_reason": "gripper_slip"}
QA report
Every drop includes a QA report, so you can see exactly what was rejected and why.
| Metric | Value | Threshold | Status |
|---|---|---|---|
| Total episodes collected | 847 | — | — |
| Accepted | 812 | — | — |
| Rejected | 35 | — | — |
| Rejection rate | 4.1% | <10% | Pass |
| Avg frame drop rate | 0.3% | <1% | Pass |
| Max timestamp jitter | 8.2ms | <20ms | Pass |
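The pass/fail logic in the table can be reproduced with a few lines of Python. This is a sketch under assumptions: the metric keys below are hypothetical (the column names in `qa_report.csv` are not shown here), and a metric is treated as passing when it is strictly below its threshold, matching the `<` signs in the table.

```python
def rejection_rate(accepted: int, rejected: int) -> float:
    """Fraction of collected episodes that were rejected."""
    total = accepted + rejected
    if total == 0:
        raise ValueError("no episodes collected")
    return rejected / total


def check_thresholds(metrics: dict, thresholds: dict) -> dict:
    """Return 'Pass' or 'Fail' per metric; passing means strictly below the limit."""
    return {
        name: "Pass" if metrics[name] < limit else "Fail"
        for name, limit in thresholds.items()
    }


# Values copied from the table above; key names are illustrative only.
metrics = {
    "rejection_rate_pct": 100 * rejection_rate(accepted=812, rejected=35),
    "avg_frame_drop_pct": 0.3,
    "max_timestamp_jitter_ms": 8.2,
}
thresholds = {
    "rejection_rate_pct": 10.0,
    "avg_frame_drop_pct": 1.0,
    "max_timestamp_jitter_ms": 20.0,
}
status = check_thresholds(metrics, thresholds)
print(round(metrics["rejection_rate_pct"], 1), status)
```

The rejection rate follows from the counts: 35 rejected out of 847 collected is about 4.1%, which is under the 10% limit.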
Loader script
We include a loader so your team can start training immediately.
tools/loader.py
from pathlib import Path
import json

import pandas as pd


class MotionLedgerDataset:
    """Index episodes from manifest.jsonl and load each one on demand."""

    def __init__(self, path: str):
        self.root = Path(path)
        with open(self.root / "manifest.jsonl") as f:
            # One JSON object per line; skip blank lines.
            self.manifest = [json.loads(line) for line in f if line.strip()]

    def __len__(self):
        return len(self.manifest)

    def __getitem__(self, idx):
        ep = self.manifest[idx]
        ep_path = self.root / "episodes" / ep["episode_id"]
        return {
            "instruction": (ep_path / "instruction.txt").read_text(),
            "state": pd.read_parquet(ep_path / "robot_state.parquet"),
            "action": pd.read_parquet(ep_path / "robot_action.parquet"),
            "outcome": ep["outcome"],
        }


# Usage
dataset = MotionLedgerDataset("./dataset_v0.1")
print(f"Episodes: {len(dataset)}")
Get a sample pack
Send us your spec and we'll build a sample pack in your exact format.
Request sample pack →