
Researchers from CMU and Max Planck Institute Unveil WHAM: A Groundbreaking AI Approach for Precise and Efficient 3D Human Motion Estimation from Video


3D human motion reconstruction is a complex process that involves accurately capturing and modeling the movements of a human subject in three dimensions. The task becomes even more challenging for videos captured by a moving camera in real-world settings, where reconstructions often suffer from artifacts such as foot sliding. To address these challenges, a team of researchers from Carnegie Mellon University and the Max Planck Institute for Intelligent Systems has devised a method called WHAM (World-grounded Humans with Accurate Motion) that achieves precise 3D human motion reconstruction.

The study reviews two families of methods for recovering 3D human pose and shape from images: model-free and model-based. It highlights the use of deep learning in model-based methods for estimating the parameters of a statistical body model. Existing video-based 3D human pose and shape (HPS) methods incorporate temporal information through various neural network architectures. Some approaches employ additional sensors, such as inertial measurement units, but these can be intrusive. WHAM stands out by effectively combining 3D human motion and video context, leveraging prior knowledge, and accurately reconstructing 3D human activity in global coordinates.

The research addresses the challenges of accurately estimating 3D human pose and shape from monocular video, emphasizing global coordinate consistency, computational efficiency, and realistic foot-ground contact. Leveraging the AMASS motion capture dataset alongside video datasets, WHAM combines motion encoder-decoder networks that lift 2D keypoints to 3D poses, a feature integrator for temporal cues, and a trajectory refinement network that estimates global motion while accounting for foot contact, improving accuracy even on non-planar surfaces.
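The foot-contact idea behind trajectory refinement can be illustrated with a simple zero-velocity correction: when a foot is predicted to be in contact with the ground, its world position should not move, so the root trajectory is shifted to cancel any residual foot displacement. The sketch below is a minimal hand-written illustration of that principle, not the paper's learned refinement network; the function name and threshold are assumptions for illustration.

```python
import numpy as np

def refine_trajectory(root, foot, contact, thresh=0.5):
    """Shift per-frame root translations so that a foot predicted to be
    in contact (probability > thresh) stays fixed in world space.

    root:    (T, 3) estimated root translations in the world frame
    foot:    (T, 3) foot-joint positions in the same world frame
    contact: (T,)   per-frame foot-contact probabilities
    """
    root = np.asarray(root, dtype=float).copy()
    foot = np.asarray(foot, dtype=float)
    offset = np.zeros(3)            # accumulated correction
    out = np.empty_like(root)
    out[0] = root[0]
    for t in range(1, len(root)):
        if contact[t] > thresh and contact[t - 1] > thresh:
            # Both frames in contact: the foot should not have moved,
            # so subtract its apparent displacement (the "slide").
            offset -= foot[t] - foot[t - 1]
        out[t] = root[t] + offset
    return out
```

If the foot slides forward while labeled in contact, the correction pulls the root back by the same amount; frames without contact pass through unchanged.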

WHAM employs a unidirectional RNN for online inference and precise 3D motion reconstruction, featuring a motion encoder for context extraction and a motion decoder that outputs SMPL parameters, camera translation, and foot-ground contact probabilities. A bounding-box normalization technique aids motion context extraction. The image encoder, pretrained on human mesh recovery, captures image features that a feature integrator network fuses with the motion features. A trajectory decoder predicts global orientation, and a refinement process minimizes foot sliding. Trained on synthetic AMASS data, WHAM outperforms existing methods in evaluations.
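Bounding-box normalization makes the 2D keypoint input invariant to where the person appears in the frame and how large they are, so the motion encoder sees only pose and motion. A minimal sketch of one plausible scheme (subtract the box center, divide by the box's larger side) is shown below; the paper's exact normalization may differ, and the function name is an assumption.

```python
import numpy as np

def normalize_keypoints(kp2d, eps=1e-8):
    """Normalize 2D keypoints (J, 2) relative to their bounding box:
    subtract the box center and divide by the larger box side, yielding
    coordinates roughly in [-0.5, 0.5].  Sketch only; the paper's
    normalization may differ in detail."""
    kp2d = np.asarray(kp2d, dtype=float)
    lo, hi = kp2d.min(axis=0), kp2d.max(axis=0)
    center = (lo + hi) / 2.0
    scale = (hi - lo).max() + eps   # larger side of the bounding box
    return (kp2d - center) / scale
```

Translating or rescaling the person in the image leaves the normalized keypoints unchanged, which is exactly the invariance the motion encoder needs.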

Paper: https://arxiv.org/abs/2312.07531

WHAM surpasses current state-of-the-art methods, exhibiting superior accuracy in both per-frame and video-based 3D human pose and shape estimation. It achieves precise global trajectory estimation by leveraging motion context and foot contact information, minimizing foot sliding and improving consistency in global coordinates. The method integrates features from both 2D keypoints and pixels, improving 3D human motion reconstruction accuracy. Evaluation on in-the-wild benchmarks demonstrates WHAM's superior performance on metrics such as MPJPE, PA-MPJPE, and PVE. The trajectory refinement technique further improves global trajectory estimation and reduces foot sliding, as evidenced by improved error metrics.
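The two joint-error metrics cited here are standard in this literature: MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints, and PA-MPJPE first rigidly aligns the prediction to the ground truth (Procrustes alignment over rotation, scale, and translation) so only pose error remains. PVE is computed the same way as MPJPE but over mesh vertices instead of joints. A self-contained sketch:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance between
    predicted and ground-truth 3D joints (same units as the input)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """Procrustes-Aligned MPJPE: rigidly align pred to gt (rotation,
    scale, translation) before measuring per-joint error."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation/scale from the SVD of the cross-covariance.
    U, s, Vt = np.linalg.svd(p.T @ g)
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:        # fix a reflection, keep a proper rotation
        Vt[-1] *= -1
        s = s.copy(); s[-1] *= -1
        R = (U @ Vt).T
    scale = s.sum() / (p ** 2).sum()
    aligned = scale * p @ R.T + mu_g
    return mpjpe(aligned, gt)
```

A prediction that differs from the ground truth only by a rigid transform and scale has nonzero MPJPE but (near-)zero PA-MPJPE, which is why both numbers are reported.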

In conclusion, the study’s key takeaways can be summarized in a few points:

  • WHAM introduces a method that jointly exploits 3D human motion priors and video context.
  • The technique enhances 3D human pose and shape regression.
  • The process uses a global trajectory estimation framework incorporating motion context and foot contact.
  • The method addresses foot sliding challenges and ensures accurate 3D tracking on non-planar surfaces.
  • WHAM’s approach performs well on diverse benchmark datasets, including 3DPW, RICH, and EMDB.
  • The method excels in efficient human pose and shape estimation in global coordinates.
  • The method’s feature integration and trajectory refinement significantly improve motion and global trajectory accuracy.
  • The method’s accuracy has been validated through insightful ablation studies.
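
The foot-sliding problem the bullets refer to can be quantified directly: measure how far a foot joint travels along the ground plane during frames where it is labeled as in contact (ideally zero). The sketch below is one simple way to compute such a score; the function name, the choice of the x-y axes as the ground plane, and the 0.5 threshold are assumptions for illustration.

```python
import numpy as np

def foot_slide(foot, contact, thresh=0.5):
    """Mean per-frame ground-plane displacement of a foot joint over
    frames labeled as in contact at both endpoints of the step.

    foot:    (T, 3) foot positions; axes 0-1 assumed to span the ground
    contact: (T,)   contact probabilities or binary labels
    """
    disp = np.linalg.norm(np.diff(foot[:, :2], axis=0), axis=1)
    mask = (contact[1:] > thresh) & (contact[:-1] > thresh)
    return float(disp[mask].mean()) if mask.any() else 0.0
```

A method that pins contacting feet scores near zero; residual sliding shows up directly as a positive displacement per frame.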

Check out the Paper, Project, and Code. All credit for this research goes to the researchers of this project.



Hello, My name is Adnan Hassan. I am a consulting intern at Marktechpost and soon to be a management trainee at American Express. I am currently pursuing a dual degree at the Indian Institute of Technology, Kharagpur. I am passionate about technology and want to create new products that make a difference.



