MOSAIC: Bridging the Sim-to-Real Gap in Generalist Humanoid Motion Tracking and Teleoperation with Rapid Residual Adaptation

Zhenguo Sun*¹,², Bo-Sheng Huang*¹,³, Yibo Peng*¹, Xukun Li*¹,², Jingyu Ma¹, Yu Sun¹, Zhe Li¹, Haojun Jiang³, Biao Gao⁵, Zhenshan Bing†⁴, Xinlong Wang†¹, Alois Knoll²

¹Beijing Academy of Artificial Intelligence, 100084 Beijing, China
²Technical University of Munich, 85748 Munich, Germany
³Tsinghua University, 100084 Beijing, China
⁴Nanjing University, 215163 Suzhou, China
⁵IO-AI.TECH, 518107 Shenzhen, China
*Equal contribution, †Corresponding author


MOSAIC in Action. MOSAIC enables a single humanoid policy to operate in two modes: offline motion replay (top) and online whole-body teleoperation from multiple wearable interfaces (bottom). In offline replay, the robot robustly tracks diverse and highly dynamic reference motions—walking, running, kicking, kungfu-style strikes, jumping, and squatting. In online teleoperation, MOSAIC faithfully mirrors real-time human motion streams and supports challenging contact-rich and high-agility behaviors, including mid-air jump turns, single-leg support, and jump-shot–style movements.

Abstract

Generalist humanoid motion trackers have recently achieved strong simulation metrics by scaling data and training, yet often remain brittle on hardware during sustained teleoperation due to interface- and dynamics-induced errors. We present MOSAIC, an open-source, full-stack system for humanoid motion tracking and whole-body teleoperation across multiple interfaces. MOSAIC first learns a teleoperation-oriented general motion tracker via RL on a multi-source motion bank with adaptive resampling and rewards that emphasize world-frame motion consistency, which is critical for mobile teleoperation. To bridge the sim-to-real interface gap without sacrificing generality, MOSAIC then performs rapid residual adaptation: an interface-specific policy is trained using minimal interface-specific data, and then distilled into the general tracker through an additive residual module, outperforming naive fine-tuning or continual learning. We validate MOSAIC with systematic ablations, out-of-distribution benchmarking, and real-robot experiments demonstrating robust offline motion replay and online long-horizon teleoperation under realistic latency and noise.
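The additive residual idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the MOSAIC code: the policies are tiny linear maps, the distillation loss is a plain squared error against the interface-specific teacher, and the names `combined_action` and `distill_residual` are hypothetical. The only point it tries to show is that the general tracker stays frozen while a zero-initialized residual learns the interface-specific correction.

```python
import random


def linear(weights, obs):
    """Apply a weight matrix (list of rows) to an observation vector."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]


def combined_action(base_w, residual_w, obs):
    """Output of the frozen general tracker plus the additive residual."""
    base = linear(base_w, obs)
    delta = linear(residual_w, obs)
    return [b + d for b, d in zip(base, delta)]


def distill_residual(base_w, teacher_w, observations, lr=0.05, epochs=200):
    """Fit only the residual so that base + residual imitates the teacher.

    Assumption: squared-error imitation loss, optimized with plain SGD.
    """
    n_act, n_obs = len(base_w), len(base_w[0])
    # Zero init: before adaptation the combined policy equals the base policy.
    residual_w = [[0.0] * n_obs for _ in range(n_act)]
    for _ in range(epochs):
        for obs in observations:
            target = linear(teacher_w, obs)
            pred = combined_action(base_w, residual_w, obs)
            for i in range(n_act):
                err = pred[i] - target[i]
                for j in range(n_obs):
                    # Only residual weights are updated; base_w is never touched.
                    residual_w[i][j] -= lr * err * obs[j]
    return residual_w


# Toy data: a "general" policy and a slightly different "interface-specific" teacher.
random.seed(0)
base_w = [[0.5, -0.2], [0.1, 0.3]]
teacher_w = [[0.6, -0.1], [0.1, 0.25]]
observations = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(32)]

residual_w = distill_residual(base_w, teacher_w, observations)
print(combined_action(base_w, residual_w, [0.3, -0.7]))
print(linear(teacher_w, [0.3, -0.7]))
```

In this toy setting the two printed vectors should agree closely. Setting the residual back to zero recovers the original general tracker, which is one way to read the claim that adaptation does not sacrifice generality.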

System Overview

MOSAIC consists of a unified training–deployment pipeline for humanoid motion tracking and teleoperation. Training/Simulation combines heterogeneous multi-source motions, two-level adaptive resampling, and the policy training process to yield a deployable policy that preserves generality while improving real-robot robustness. Deployment/Real Robot supports both offline motion replay and online teleoperation. Finally, RobotBridge provides a modular interface that enables consistent evaluation and portable deployment across platforms.
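One possible reading of "two-level adaptive resampling" is sketched below. This page does not define the two levels, so the sketch makes its own assumptions: level one picks a motion source, level two picks a clip within that source, and both are weighted by a running tracking-failure estimate. The class `TwoLevelSampler`, its parameters, and the source and clip names are all hypothetical.

```python
import random
from collections import Counter


class TwoLevelSampler:
    """Sample (source, clip) pairs, favoring clips the policy still fails on."""

    def __init__(self, bank, floor=0.1, momentum=0.9, seed=0):
        self.bank = bank          # {source_name: [clip_id, ...]}
        self.floor = floor        # keeps every source and clip reachable
        self.momentum = momentum  # smoothing for the failure estimate
        self.rng = random.Random(seed)
        # Start from an uninformative failure estimate of 0.5 per clip.
        self.fail = {s: {c: 0.5 for c in clips} for s, clips in bank.items()}

    def update(self, source, clip, failed):
        """Exponential moving average of failure (True) or success (False)."""
        old = self.fail[source][clip]
        self.fail[source][clip] = (
            self.momentum * old + (1 - self.momentum) * float(failed)
        )

    def source_weights(self):
        """Level 1: weight each source by the mean failure of its clips."""
        return {
            s: self.floor + sum(f.values()) / len(f) for s, f in self.fail.items()
        }

    def sample(self):
        weights = self.source_weights()
        sources = list(weights)
        source = self.rng.choices(sources, weights=[weights[s] for s in sources])[0]
        # Level 2: weight clips inside the chosen source by their own failure.
        clips = self.bank[source]
        clip_w = [self.floor + self.fail[source][c] for c in clips]
        clip = self.rng.choices(clips, weights=clip_w)[0]
        return source, clip


# Hypothetical motion bank with two sources of two clips each.
bank = {"mocap_a": ["a0", "a1"], "mocap_b": ["b0", "b1"]}
sampler = TwoLevelSampler(bank)

# Pretend clip b1 keeps failing while all other clips are tracked successfully.
for _ in range(50):
    for source, clips in bank.items():
        for clip in clips:
            sampler.update(source, clip, failed=(clip == "b1"))

counts = Counter(sampler.sample() for _ in range(2000))
print(counts.most_common())
```

After the simulated updates, the failing clip should dominate the draws, while the `floor` term keeps easy clips from disappearing from training entirely.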

Offline Replay

Motion 1: Dancing.

Motion 2: Fall and getup.

Motion 3: Kicking motion.

Motion 4: Sustained high-speed rotation.

Motion 5: S-curve trajectory running.

Motion 6: Dancing.

Motion 7: Folk dance.

Motion 8: Dancing.

Inertial MoCap Teleoperation

Motion 1: Circular walking.

Motion 2: Running.

Motion 3: Walking around the grid (Lane Agility Test).

Motion 4: Deep squat maneuver.

Motion 5: Counter-clockwise spin jump.

Motion 6: Clockwise spin jump.

Motion 7: Floss dance.

Motion 8: Continuous spinning.

Motion 9: Jump-shot.

Motion 10: Jumping.

Motion 11: Hopscotch.

Motion 12: Box lifting.

VR Teleoperation

Motion 1: Walking.

Motion 2: Running.

Motion 3: Squatting posture movement.

Motion 4: Chest expansion + leg kick.

Motion 5: Torso twist + punching.

Motion 6: Squat-to-stand transition.

Motion 7: Pick up the item on the table.

Motion 8: Pick up the item on the chair.

Robustness Testing

Test 1: Offline replay robustness testing.

Test 2: Inertial MoCap teleoperation robustness testing.

Test 3: VR teleoperation robustness testing.

BibTeX


@article{sun2026mosaic,
    title={MOSAIC: Bridging the Sim-to-Real Gap in Generalist Humanoid Motion Tracking and Teleoperation with Rapid Residual Adaptation},
    author={Zhenguo Sun and Bo-Sheng Huang and Yibo Peng and Xukun Li and Jingyu Ma and Yu Sun and Zhe Li and Haojun Jiang and Biao Gao and Zhenshan Bing and Xinlong Wang and Alois Knoll},
    journal={arXiv preprint arXiv:2602.08594},
    year={2026}
}