Abstract
Generalist humanoid motion trackers have recently achieved strong simulation metrics by scaling data and training, yet often remain brittle on hardware during sustained teleoperation due to interface- and dynamics-induced errors. We present MOSAIC, an open-source, full-stack system for humanoid motion tracking and whole-body teleoperation across multiple interfaces. MOSAIC first learns a teleoperation-oriented general motion tracker via RL on a multi-source motion bank with adaptive resampling and rewards that emphasize world-frame motion consistency, which is critical for mobile teleoperation. To bridge the sim-to-real interface gap without sacrificing generality, MOSAIC then performs rapid residual adaptation: an interface-specific policy is trained using minimal interface-specific data, and then distilled into the general tracker through an additive residual module, outperforming naive fine-tuning or continual learning. We validate MOSAIC with systematic ablations, out-of-distribution benchmarking, and real-robot experiments demonstrating robust offline motion replay and online long-horizon teleoperation under realistic latency and noise.
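The additive residual idea in the abstract can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the policies are scalar linear gains, the distillation loss is a mean squared error against the interface-specific policy's actions, and the names `fit_residual_gain` and `adapted_action` are hypothetical. It is not the MOSAIC implementation; it only shows a frozen general tracker whose output is corrected by a small learned residual.

```python
from typing import List, Tuple


def fit_residual_gain(
    samples: List[Tuple[float, float]],
    base_gain: float,
    lr: float = 0.1,
    steps: int = 200,
) -> float:
    """Fit a scalar residual gain by gradient descent on a mean squared error.

    samples holds (observation, teacher_action) pairs, where the teacher stands
    in for an interface-specific policy. The base gain is never updated.
    """
    res_gain = 0.0  # zero init: the adapted policy starts identical to the base
    n = len(samples)
    for _ in range(steps):
        grad = 0.0
        for obs, teacher_action in samples:
            student_action = base_gain * obs + res_gain * obs  # additive residual
            grad += 2.0 * (student_action - teacher_action) * obs / n
        res_gain -= lr * grad
    return res_gain


# Toy data (assumed): the teacher behaves like a gain of 1.3, the base like 1.0.
base_gain = 1.0
teacher_gain = 1.3
observations = [-1.0, -0.5, 0.5, 1.0]
samples = [(obs, teacher_gain * obs) for obs in observations]

res_gain = fit_residual_gain(samples, base_gain)


def adapted_action(obs: float) -> float:
    """Frozen base action plus the learned residual correction."""
    return base_gain * obs + res_gain * obs
```

Because the base parameters stay fixed and the residual starts at zero, dropping the residual recovers the original general tracker exactly, which is one plausible reason an additive correction can preserve generality better than fine-tuning all weights.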
System Overview
MOSAIC consists of a unified training–deployment pipeline for humanoid motion tracking and teleoperation. Training/Simulation combines heterogeneous multi-source motion aggregation, two-level adaptive resampling, and policy training, yielding a deployable policy that preserves generality while improving real-robot robustness. Deployment/Real Robot supports both offline motion replay and online teleoperation. Finally, RobotBridge provides a modular interface that enables consistent evaluation and portable deployment across platforms.
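The overview does not spell out how the two levels of adaptive resampling work, so the following is only one plausible reading, sketched under stated assumptions: level one picks a motion source with probability proportional to its mean tracking error, and level two picks a clip within that source with probability proportional to that clip's error. The error values, the `floor` term, and the function name `sample_motion` are all hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Tuple


def sample_motion(
    error_by_source: Dict[str, Dict[str, float]],
    rng: random.Random,
    floor: float = 0.05,
) -> Tuple[str, str]:
    """Draw (source, clip), favouring sources and clips with higher error.

    The floor keeps every source and clip reachable even at zero error
    (an assumed design choice, not taken from the paper).
    """
    # Level 1: weight each source by the mean error of its clips.
    sources = list(error_by_source)
    source_weights = [
        sum(error_by_source[s].values()) / len(error_by_source[s]) + floor
        for s in sources
    ]
    source = rng.choices(sources, weights=source_weights, k=1)[0]

    # Level 2: weight each clip in the chosen source by its own error.
    clips = list(error_by_source[source])
    clip_weights = [error_by_source[source][c] + floor for c in clips]
    clip = rng.choices(clips, weights=clip_weights, k=1)[0]
    return source, clip


# Made-up per-clip tracking errors, grouped by motion source.
errors = {
    "mocap": {"walk": 0.1, "spin_jump": 0.9},
    "video": {"dance": 0.2},
}

rng = random.Random(0)
counts = Counter(sample_motion(errors, rng) for _ in range(2000))
```

In a training loop the error table would be refreshed from recent rollouts, so clips the policy already tracks well are drawn less often over time.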
Offline Replay
Motion 1: Dancing.
Motion 2: Fall and getup.
Motion 3: Kicking motion.
Motion 4: Sustained high-speed rotation.
Motion 5: S-curve trajectory running.
Motion 6: Dancing.
Motion 7: Folk dance.
Motion 8: Dancing.
Inertial MoCap Teleoperation
Motion 1: Circular walking.
Motion 2: Running.
Motion 3: Walking around the grid (Lane Agility Test).
Motion 4: Deep squat maneuver.
Motion 5: Counter-clockwise spin jump.
Motion 6: Clockwise spin jump.
Motion 7: Floss dance.
Motion 8: Continuous spinning.
Motion 9: Jump-shot.
Motion 10: Jumping.
Motion 11: Hopscotch.
Motion 12: Box lifting.
VR Teleoperation
Motion 1: Walking.
Motion 2: Running.
Motion 3: Squatting posture movement.
Motion 4: Chest expansion + leg kick.
Motion 5: Torso twist + punching.
Motion 6: Squat-to-stand transition.
Motion 7: Pick up the item on the table.
Motion 8: Pick up the item on the chair.
Robustness Testing
Test 1: Offline replay robustness testing.
Test 2: Inertial MoCap teleoperation robustness testing.
Test 3: VR teleoperation robustness testing.
BibTeX
@article{sun2026mosaic,
  title={MOSAIC: Bridging the Sim-to-Real Gap in Generalist Humanoid Motion Tracking and Teleoperation with Rapid Residual Adaptation},
  author={Zhenguo Sun and Bo-Sheng Huang and Yibo Peng and Xukun Li and Jingyu Ma and Yu Sun and Zhe Li and Haojun Jiang and Biao Gao and Zhenshan Bing and Xinlong Wang and Alois Knoll},
  journal={arXiv preprint arXiv:2602.08594},
  year={2026}
}
MOSAIC in Action. MOSAIC enables a single humanoid policy to operate in two modes: