Hi5: 2D Hand Pose Estimation with Zero Human Annotation

We introduce Hi5, a large synthetic dataset and accompanying data synthesis pipeline for 2D hand pose estimation. The pipeline requires no human annotation and only consumer-grade hardware, yet yields diverse, accurately labeled data for model training.

  • Developed a data synthesis pipeline that combines high-fidelity 3D hand models of diverse genders and skin tones with dynamic environments to generate realistic 2D hand images with automatic keypoint annotations.
  • Constructed the Hi5 dataset of ~583,000 images with full pose annotations, generated in 48 hours on a single consumer computer.
  • Demonstrated that models trained on Hi5 perform competitively on real hand pose benchmarks and remain robust under occlusion and varied conditions.
  • Showed that cost-effective synthetic data generation is a viable alternative to expensive manual annotation, making pose estimation research more accessible.
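
To make "automatic keypoint annotation" concrete: when images are produced from a 3D hand model, the 2D labels can be computed rather than hand-drawn, by projecting the known 3D joint positions through the virtual camera. The sketch below is a minimal illustration of that idea, not the Hi5 implementation. It assumes an ideal pinhole camera without lens distortion, and the function name `project_keypoints`, the intrinsics `K`, the pose `R`, `t`, and the sample joint coordinates are all hypothetical values chosen for illustration.

```python
import numpy as np


def project_keypoints(joints_world, K, R, t, image_size):
    """Project (N, 3) world-space joints to (N, 2) pixel coordinates.

    Assumes an ideal pinhole camera (hypothetical setup, no distortion).
    Returns the pixel coordinates and a boolean mask that is True for
    joints in front of the camera and inside the image bounds.
    """
    joints_world = np.asarray(joints_world, dtype=float)

    # World -> camera coordinates: X_cam = R @ X_world + t for each joint.
    cam = joints_world @ R.T + t

    # Joints at or behind the camera plane cannot be projected meaningfully.
    z = cam[:, 2]
    in_front = z > 0
    safe_z = np.where(in_front, z, 1.0)  # avoid dividing by zero or negatives

    # Apply intrinsics, then the perspective divide by depth.
    uvw = cam @ K.T
    uv = uvw[:, :2] / safe_z[:, None]

    width, height = image_size
    in_frame = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    return uv, in_front & in_frame


# Illustrative camera: 500 px focal length, principal point at the
# centre of a 640x480 image, camera placed at the world origin.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

# Four made-up joints: two visible, one behind the camera, one off-screen.
joints = np.array([[0.0, 0.0, 1.0],
                   [0.1, -0.05, 0.5],
                   [0.0, 0.0, -1.0],
                   [2.0, 0.0, 1.0]])

keypoints_2d, visible = project_keypoints(joints, K, R, t, (640, 480))
```

Because the labels come directly from the scene geometry, they stay exact for every rendered frame. The mask here only covers joints that are behind the camera or outside the frame; labeling joints hidden by other geometry would need an additional depth or ray test, which this sketch omits.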