Interaction-aware hand motion retargeting spanning geometry, force, and self-supervision.

Background and Definition#

In character animation and robotics, Motion Retargeting maps one embodiment’s joint configuration (qpos) onto another embodiment’s joint configuration.

For hands, it means taking a human-hand motion (or one robot hand’s motion) and turning it into commands for a different hand, while keeping the object interaction intact. If you record a human picking up a block, good retargeting should produce robot joint commands that pick up that same block. This shows up everywhere in teleoperation, imitation learning, and data augmentation, because it lets you reuse demonstrations across embodiments.

If the degrees of freedom are identical, you can often get away with copying joint angles. Once the hands differ in DOF, link lengths, or limits, that naive trick breaks and contacts drift. Contacts are what make this hard: small pose errors can turn into big interaction errors. That is why a lot of recent work goes beyond geometric matching and folds in object shape, force/tactile cues, and action intent, often with self-supervision and unpaired data.
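To see why naive angle copying fails, here is a toy two-link planar finger (all numbers hypothetical): the same joint angles produce noticeably different fingertip positions once the link lengths differ, which is exactly the kind of drift that ruins a contact.

```python
import math

def fingertip_2link(theta1, theta2, l1, l2):
    """Planar 2-link forward kinematics: fingertip (x, y) from joint angles (rad)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return (x, y)

# Hypothetical link lengths (meters): a human-like finger vs. a longer robot finger.
human = fingertip_2link(0.6, 0.8, l1=0.045, l2=0.030)
robot = fingertip_2link(0.6, 0.8, l1=0.060, l2=0.045)  # same angles, different geometry

err = math.dist(human, robot)
print(f"fingertip offset from naive angle copy: {err * 1000:.1f} mm")
```

A couple of centimeters of fingertip offset is negligible for free-space gesturing but easily the difference between a stable grasp and a dropped block.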

Geometric Retargeting#

Early approaches focused on geometric consistency: align keypoints, scale trajectories, and absorb residuals via optimization. AnyTeleop 1 includes the wrist-to-fingertip vector error in its objective and adds smoothness regularization. DexH2R 2 scales human-hand trajectories and then solves a nonlinear optimization to produce a joint sequence for the robot hand. This line of work offers clear geometric intuition, but because it never models object semantics, it tends to become unstable when the task or contact surface changes.
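The per-frame objective in this family can be sketched in a few lines: match a scaled human wrist-to-fingertip vector, plus a smoothness penalty toward the previous frame. This is a minimal toy version in the spirit of AnyTeleop's objective, not their implementation; the 2-DOF planar finger, all constants, and the finite-difference gradient descent are my own simplifications.

```python
import math

def fk_tip(q, links):
    """Wrist-to-fingertip vector of a planar serial finger (cumulative joint angles)."""
    x = y = ang = 0.0
    for theta, l in zip(q, links):
        ang += theta
        x += l * math.cos(ang)
        y += l * math.sin(ang)
    return (x, y)

def retarget_frame(v_human, q_prev, links, alpha=1.2, beta=1e-5,
                   iters=1000, lr=10.0, eps=1e-6):
    """One frame of vector-based retargeting: track the (scaled) human
    wrist-to-fingertip vector while staying close to the previous frame's
    joints. Finite-difference gradient descent keeps the sketch dependency-free;
    a real system would use an analytic Jacobian or an off-the-shelf solver."""
    target = (alpha * v_human[0], alpha * v_human[1])

    def cost(q):
        tip = fk_tip(q, links)
        geo = (tip[0] - target[0]) ** 2 + (tip[1] - target[1]) ** 2
        smooth = sum((a - b) ** 2 for a, b in zip(q, q_prev))
        return geo + beta * smooth

    q = list(q_prev)
    for _ in range(iters):
        grad = []
        for i in range(len(q)):
            qp = list(q)
            qp[i] += eps
            grad.append((cost(qp) - cost(q)) / eps)
        q = [qi - lr * g for qi, g in zip(q, grad)]
    return q

# Hypothetical numbers: human fingertip vector, robot finger with longer links.
q = retarget_frame(v_human=(0.06, 0.05), q_prev=[0.2, 0.3], links=[0.06, 0.045])
```

The scale factor `alpha` absorbs the hand-size mismatch, and the smoothness term is what keeps per-frame solutions from jittering over a trajectory.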

Object-conditioned Retargeting#

When the hand interacts with objects of different shapes, joint angles and contact distributions rearrange systematically. If we continue to hard-map human poses to a robot hand, contact points will misalign, grip forces will become unbalanced, and the resulting pose will look unnatural. So recent work uses object geometry as an input: align the object, then infer a hand pose that matches the intended interaction.

  • FunGrasp (2024) 3: A three-stage pipeline: estimate a functional human-hand pose from a single RGB-D image; retarget in the object frame by aligning link directions and optimizing contacts; then train a vision-and-touch DRL policy to adapt to shape variation and unseen objects, with privileged learning and system ID for sim-to-real.
  • DexFlow (2025) 4: Builds a hierarchical optimization pipeline. It performs a global pose search to match human and robot hands, then locally optimizes contacts with an energy function so the robot hand naturally conforms to the object surface. It also extracts stable contacts via dual-threshold detection with temporal smoothing, and releases a cross-hand-topology dataset containing 292k grasp frames to support this pipeline.
  • Kinematic Motion Retargeting for Contact-Rich Manipulations (2024) 5: Treats retargeting as a non-isometric shape matching problem. Using surface contact regions and marker data, it incrementally estimates and optimizes target-hand trajectories via inverse kinematics. The core contributions are a local shape-matching algorithm and a multi-stage optimization pipeline that maintains consistent contact distributions over full manipulation sequences, and supports object replacement and cross-hand generalization.
  • Learning Cross-hand Policies of High-DOF Reaching and Grasping (2024) 6: Proposes a hand-shape-agnostic state-action representation and a two-stage framework. A unified policy predicts displacements of grasp keypoints, then hand-specific adapters convert them to each hand’s joint controls, enabling cross-hand transfer of high-DOF grasping. Inputs are semantic keypoints and the interaction bisector surface (IBS); a Transformer learns relations among fingers, yielding generalization over different hands and objects.
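The common thread above is an energy that couples fingertips to the object rather than to the human pose alone. A minimal sketch of such an energy, with a sphere SDF standing in for a real object mesh (this is an illustrative toy, not DexFlow's actual energy function):

```python
import math

def contact_energy(fingertips, contacts, obj_center, obj_radius, w_pen=10.0):
    """Toy object-conditioned energy: attract each fingertip to its assigned
    demonstrated contact point on the object, and penalize penetration.
    A sphere signed distance stands in for a mesh SDF."""
    e = 0.0
    for tip, c in zip(fingertips, contacts):
        # Contact-matching term: fingertip should land on the demonstrated contact.
        e += sum((t - ci) ** 2 for t, ci in zip(tip, c))
        # Penetration term: signed distance to the sphere, penalized when negative.
        sd = math.dist(tip, obj_center) - obj_radius
        e += w_pen * min(sd, 0.0) ** 2
    return e

# Hypothetical grasp on a 3 cm sphere: one fingertip hovering, one penetrating.
e = contact_energy(
    fingertips=[(0.035, 0.0, 0.0), (-0.02, 0.0, 0.0)],
    contacts=[(0.03, 0.0, 0.0), (-0.03, 0.0, 0.0)],
    obj_center=(0.0, 0.0, 0.0),
    obj_radius=0.03,
)
```

Minimizing this over joint angles (through FK) is what lets the robot hand "conform to the object surface" instead of blindly mimicking the human pose; swapping the sphere for the new object's SDF is how object replacement is handled.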

Force-conditioned Retargeting#

Force placement often decides whether a grasp holds. Even for the same object, changing the force profile can change the target pose, so it helps to treat force as an explicit condition.

  • Feel the Force: Contact-Driven Learning from Humans (2025) 7: Uses a tactile glove to record human contact forces and keypoint coordinates, predicts robot trajectories and desired grasp forces, and at execution time adjusts the gripper with PD control to track tactile demonstrations. However, the pipeline involves many hand-tuned components and has limited transferability.
  • DexMachina (2025) 8: Introduces a fading virtual-object controller during RL and adds contact and task rewards, but this should be considered RL tracking rather than true retargeting.
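The force-tracking idea in Feel the Force reduces to a feedback loop: squeeze until the tactile reading matches the demonstrated force. A toy sketch of such a PD loop, with a linear spring standing in for the contact (the gains, stiffness, and contact model are all my own assumptions, not theirs):

```python
def track_grip_force(f_target, steps=300, kp=0.05, kd=1e-4,
                     stiffness=800.0, dt=0.01):
    """Toy PD loop that closes a gripper until a simulated tactile reading
    tracks a demonstrated grasp force. Contact model: force = stiffness * depth."""
    x = 0.0            # closure depth (m) past first contact
    prev_err = None
    for _ in range(steps):
        f = stiffness * max(x, 0.0)          # simulated tactile force (N)
        err = f_target - f
        d_err = 0.0 if prev_err is None else (err - prev_err) / dt
        prev_err = err
        x += (kp * err + kd * d_err) * dt    # velocity command to the gripper
    return stiffness * max(x, 0.0)
```

The point of conditioning on force rather than pose is visible here: the same target force yields a different closure depth on a softer object, so copying the human's qpos directly would over- or under-squeeze.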

Cross-embodiment and Self-supervision#

The appeal here is to stop relying on manually paired data and instead learn cross-hand mappings from action principles.

Personally, I like the versions that learn mappings from rules rather than curated paired demonstrations. In XL-VLA, we explore this idea via CrossLatent: a shared latent action space trained with differentiable kinematic constraints and random joint sampling, then plugged into VLA models as a unified action interface.

  • Geometric Retargeting (2025) 9: Uses action principles such as fingertip-velocity consistency as self-supervised signals to learn unpaired, cross-embodiment mappings that preserve contact semantics and motion stability despite scale and joint differences, and has been integrated as a geometric prior into Dexterity Gen 10.
  • XL-VLA / CrossLatent (2026) 11: Pretrains a shared latent action space with a multi-headed VAE across heterogeneous hands using reconstruction, differentiable-FK fingertip retargeting, and a smooth latent prior; the frozen encoders/decoders turn hand-specific joint chunks into a unified token interface for VLA models.
  • Learning to Transfer Human Hand Skills for Robot Manipulations (2025) 12: Fits a shared manifold of human-hand motion, robot actions, and object motion; trains on synthetic paired triplets to avoid the high cost of real human-robot pairs.
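To make "action principles as self-supervision" concrete, here is a minimal loss in the spirit of fingertip-velocity consistency: corresponding fingertip velocities should agree in direction even when hand scales differ, so no paired data is needed. The cosine form and all names are my own illustration, not the formulation from Geometric Retargeting.

```python
import math

def velocity_consistency_loss(human_tips, robot_tips, dt=1 / 30):
    """Finite-difference fingertip velocities of two hands should point the
    same way; the cosine term is scale-invariant, so a bigger hand moving
    proportionally faster incurs no penalty."""
    assert len(human_tips) == len(robot_tips)
    loss, n = 0.0, 0
    for t in range(1, len(human_tips)):
        vh = [(a - b) / dt for a, b in zip(human_tips[t], human_tips[t - 1])]
        vr = [(a - b) / dt for a, b in zip(robot_tips[t], robot_tips[t - 1])]
        nh, nr = math.hypot(*vh), math.hypot(*vr)
        if nh < 1e-8 or nr < 1e-8:
            continue  # skip stationary frames
        cos = sum(a * b for a, b in zip(vh, vr)) / (nh * nr)
        loss += 1.0 - cos    # 0 when directions match, 2 when opposite
        n += 1
    return loss / max(n, 1)
```

In a training loop, `robot_tips` would come from differentiable FK applied to the mapped joints, and this term would be one of several self-supervised losses shaping the mapping.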

Conclusions#

Retargeting alone rarely survives contact-rich manipulation, so recent work leans on visual and tactile cues to get cleaner contacts and better generalization. But beyond simple pick-and-place, transfer is still brittle. Two failure modes show up again and again: object understanding and force consistency. Change the object’s shape or function and the “same” human motion should map to a different robot configuration. Keep the object fixed but change the force distribution and the target qpos should move too. That is why many methods explicitly condition on object geometry/functionality, contact, or force targets.

You could also imagine a future where RL gives us native dexterous policies that are strong enough that retargeting becomes an extra input alignment problem. That controller does not really exist yet. For now, object- and force-conditioned retargeting is where most of the practical wins are.


Footnotes#

  1. AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation System. https://arxiv.org/abs/2307.04577v3

  2. DexH2R. https://arxiv.org/abs/2411.04428

  3. FunGrasp: Functional Grasping for Diverse Dexterous Hands. https://arxiv.org/abs/2411.16755v1

  4. DexFlow: A Unified Approach for Dexterous Hand Pose Retargeting and Interaction. https://arxiv.org/abs/2505.01083v1

  5. Kinematic Motion Retargeting for Contact-Rich Anthropomorphic Manipulations. https://arxiv.org/abs/2402.04820

  6. Learning Cross-hand Policies of High-DOF Reaching and Grasping. https://arxiv.org/abs/2404.09150

  7. Feel the Force: Contact-Driven Learning from Humans. https://arxiv.org/abs/2506.01944

  8. DexMachina. https://arxiv.org/abs/2505.24853

  9. Geometric Retargeting. https://arxiv.org/abs/2503.07541

  10. Dexterity Gen. https://zhaohengyin.github.io/dexteritygen/

  11. XL-VLA / CrossLatent: Cross-Hand Latent Representation for Vision-Language-Action Models. https://xl-vla.github.io

  12. Learning to Transfer Human Hand Skills for Robot Manipulations. https://arxiv.org/abs/2501.04169v1

Hand Motion Retargeting
https://www.lyt0112.com/blog/retargeting-en
Author Yutong Liang
Published at March 10, 2026
Last Updated March 10, 2026
Blog Content Copyright CC BY 4.0