ConTrack: Constrained Hand Motion Tracking with Adaptive Trade-off Control

Yutong Liang, Quanquan Peng, Ri-Zhao Qiu, Xiaolong Wang

University of California San Diego

Overview

ConTrack is a reinforcement learning framework that turns contact-rich human hand-object demonstrations into robot motions.

Video

Results

We test ConTrack on bimanual object manipulation (GRAB Dataset), articulated tool use (ARCTIC Dataset), and in-hand rotation (DexterHand Dataset). Each clip below shows a single policy. All policies are trained with the same recipe, and results are shown on both the XHand and Sharpa Wave robot hands.

Real Robot Videos

Cylinder Handover, real XHand
Hammer Use, real XHand
Cube Manipulation, real XHand

Simulation Videos

GRAB Dataset

Cube Handover, XHand
Cube Handover, Sharpa Wave
Hammer Use, XHand
Hammer Use, Sharpa Wave
Waterbottle Handover, XHand
Waterbottle Handover, Sharpa Wave
Wineglass Handover, XHand
Wineglass Handover, Sharpa Wave

ARCTIC Dataset

Box Use, XHand
Box Use, Sharpa Wave
Mixer Use, XHand
Mixer Use, Sharpa Wave
Notebook Use, XHand
Notebook Use, Sharpa Wave
Waffleiron Use, XHand
Waffleiron Use, Sharpa Wave

DexterHand Dataset

Cuboid 0 Manipulation, XHand
Cuboid 0 Manipulation, Sharpa Wave
Cuboid 1 Manipulation, XHand
Cuboid 1 Manipulation, Sharpa Wave
Cylinder Manipulation, XHand
Cylinder Manipulation, Sharpa Wave
Ring Manipulation, XHand
Ring Manipulation, Sharpa Wave

Method

ConTrack uses object motion as the task target, then spends the remaining learning capacity on hand motion and contact style.
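
As a rough illustration of this prioritization, the sketch below treats object tracking as a constraint and adapts a trade-off weight between object and hand rewards. It is not the authors' implementation: the dual-ascent style update, the sigmoid weighting, and every name here (`update_tradeoff`, `combined_reward`, `tolerance`, `step_size`) are assumptions made for illustration only.

```python
import math


def update_tradeoff(log_lambda, object_error, tolerance, step_size=0.1):
    """Hypothetical dual-ascent style update of the trade-off multiplier.

    The multiplier grows when object tracking error exceeds the tolerance
    and shrinks when the object is tracked well enough.
    """
    violation = object_error - tolerance
    return log_lambda + step_size * violation


def combined_reward(object_reward, hand_reward, log_lambda):
    """Blend object and hand rewards with a weight derived from the multiplier.

    Assumed form: w_obj = lambda / (1 + lambda), so the weights sum to one.
    """
    lam = math.exp(log_lambda)
    w_obj = lam / (1.0 + lam)
    return w_obj * object_reward + (1.0 - w_obj) * hand_reward


if __name__ == "__main__":
    log_lambda = 0.0
    # Object error above tolerance: the weight shifts toward the object term.
    log_lambda = update_tradeoff(log_lambda, object_error=0.05, tolerance=0.02)
    print(combined_reward(object_reward=0.4, hand_reward=0.9, log_lambda=log_lambda))
```

Under these assumptions, a policy that already tracks the object within tolerance sees the weight drift toward the hand-motion term, which is one way to picture spending remaining capacity on hand motion and contact style.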

BibTeX

@misc{liang2026contrack,
      title={ConTrack: Constrained Hand Motion Tracking with Adaptive Trade-off Control}, 
      author={Liang, Yutong and Peng, Quanquan and Qiu, Ri-Zhao and Wang, Xiaolong},
      year={2026},
      eprint={2606.03177},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2606.03177}, 
}