Human data for robot world models

Teach your robot
to understand people.

Biomechanically faithful human motion, the objects people interact with, and the space they move through — captured in real time from standard RGB cameras, or extracted from video with natural-language annotations. Training data for world models. Perception for robots that work alongside people.
Written end-to-end in Rust 🦀.

Discover Zendo · Talk to us

Products

Ready to run.
Ready to build on.

Start with Zendo — a turnkey desktop app for real-time human motion capture, teleoperation and human–robot interaction. Then send your recordings to Yume and get back annotated, world-model-ready training data.

Zendo · Real-time capture

Capture human motion. In real time. With one click.

Zendo is a native desktop app that turns a standard RGB camera into a real-time human motion capture system. Run it in monocular mode with a single camera, or add cameras for stereo mode, which unlocks millimeter-range accuracy after a one-click multi-camera calibration. Stream motion live for teleoperation, safety and human–robot interaction, or record it and upload it straight to Yume.

SDK: Rust · Python

Native desktop app — macOS and Linux
Monocular mode: plug in one RGB camera and go
Stereo mode: 2+ cameras, mm-range accuracy, one-click calibration
80 DOF, biomechanically faithful human motion
SDK for streaming live data to your own robot or application
Upload recordings to Yume for annotation

Download Zendo →
Read the docs →

Yume · Video-to-data engine · Coming soon

Turn video into world-model training data.

Upload footage from Zendo, and Yume extracts the ground truth: high-accuracy 3D human movement analysis, 3D reconstruction of the objects people interact with, and natural-language annotations of what happens in every clip. The result is data for world models that understand people and the environments they live and work in.

High-accuracy 3D human movement analysis — 80 DOF, biomechanically faithful
3D objects — reconstruct what people interact with
Natural-language annotations of every scene
Upload directly from Zendo

Request access →

Yume

Bare footage versus the 3D motion, objects and annotations Yume extracts from it.


30–60 fps · Real-time human understanding
≥ 1 RGB camera · No markers, no suits, no depth sensor
80 DOF · Biomechanical body + hand model
Built with Rust 🦀 · Deterministic, real-time, multiplatform

Applications

Built for robots that work with people.

From world-model training data to real-time control — the same stack powers them all.

01 — World models

World-model training data

Capture rich, multi-modal human datasets — 3D motion, hands, objects, depth and language annotations — to train world models that understand how people move and interact with their environment.

02 — Data

Large-scale data collection

Run markerless capture in the field — in clinics, warehouses, factories, homes — and collect biomechanically accurate motion data without slowing anyone down.

03 — Teleoperation & HRI

Teleoperation & interaction policies

Stream real-time human motion to teleoperate robots, or give control policies the perception to cooperate, yield, hand over objects and stay safe around people.

04 — Monitoring

Continuous monitoring & analytics

Ergonomics, rehabilitation, safety, sports — track human motion over time with privacy-preserving, on-device analysis.

Get started

Give your robot a sense of us.

Whether you need real-time motion capture, annotated training data from your own videos, or help designing control policies around people — we’d love to talk.
