Understanding the world from a first-person perspective is essential in Augmented Reality (AR), as it introduces unique challenges and significant visual transformations compared to third-person views. While synthetic data has greatly benefited vision models in third-person views, its use in embodied egocentric perception tasks remains largely unexplored. A major obstacle in this domain is the accurate simulation of natural human movements and behaviors, which is crucial for steering embodied cameras to capture faithful egocentric representations of the 3D environment.
In response to this challenge, researchers at ETH Zurich and Microsoft present EgoGen, a novel synthetic data generator designed to produce precise and comprehensive ground-truth training data for egocentric perception tasks. At the core of EgoGen lies a pioneering human motion synthesis model that directly utilizes egocentric visual inputs from a virtual human to perceive the surrounding 3D environment.
This model is augmented with collision-avoiding motion primitives and employs a two-stage reinforcement learning strategy, thereby providing a closed-loop solution where the embodied perception and movement of the virtual human are seamlessly integrated. Unlike previous approaches, their model eliminates the need for a predefined global path and directly applies to dynamic environments.
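To make the closed-loop idea concrete, here is a heavily simplified, hypothetical sketch. EgoGen's actual model learns motion primitives for a full-body virtual human via two-stage reinforcement learning; the toy code below only illustrates the loop structure, with an invented ray-casting "perception" standing in for egocentric vision and a hand-written scoring rule standing in for the learned policy. All function names and parameters are illustrative assumptions, not EgoGen's API.

```python
import numpy as np

def ray_depth(position, angle, obstacles, max_depth=5.0):
    """Toy egocentric perception: distance along a viewing ray to the
    nearest circular obstacle (stand-in for a rendered depth image)."""
    d = np.array([np.cos(angle), np.sin(angle)])
    depth = max_depth
    for center, radius in obstacles:
        to_obs = np.asarray(center, float) - position
        proj = to_obs @ d                      # distance along the ray
        if proj <= 0:
            continue                           # obstacle is behind the agent
        perp = np.linalg.norm(to_obs - proj * d)
        if perp < radius:                      # ray passes through the circle
            depth = min(depth, proj)
    return depth

def step_policy(position, heading, goal, obstacles, turn_set=(-0.5, 0.0, 0.5)):
    """Score each candidate motion primitive (a turn plus a step) by the
    clearance it perceives, rewarding alignment with the goal direction."""
    best_turn, best_score = 0.0, -np.inf
    to_goal = np.arctan2(*(goal - position)[::-1])
    for turn in turn_set:
        a = heading + turn
        clearance = min(ray_depth(position, a, obstacles), 2.0)  # saturate far depths
        score = clearance + np.cos(a - to_goal)
        if score > best_score:
            best_turn, best_score = turn, score
    return best_turn

def closed_loop_walk(start, goal, obstacles, step=0.3, max_steps=300):
    """Closed loop: perceive from the current pose, pick a primitive, move."""
    position = np.asarray(start, float)
    goal = np.asarray(goal, float)
    heading = np.arctan2(*(goal - position)[::-1])
    for _ in range(max_steps):
        if np.linalg.norm(goal - position) < step:
            return position, True
        heading += step_policy(position, heading, goal, obstacles)
        position = position + step * np.array([np.cos(heading), np.sin(heading)])
    return position, False
```

Because the agent re-perceives the scene at every step rather than following a precomputed global path, the same loop keeps working if obstacles move between steps, which is the property the paragraph above attributes to EgoGen's dynamic-environment support.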
With EgoGen, one can seamlessly augment existing real-world egocentric datasets with synthetic images. Their quantitative evaluations showcase significant improvements in the performance of state-of-the-art algorithms across various tasks, including mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views. These results underscore the efficacy of EgoGen in enhancing the capabilities of existing algorithms and highlight its potential to advance research in egocentric computer vision.
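The augmentation described above amounts to mixing synthetic samples into real training batches. A minimal sketch of that mixing step is below; the function name and the `synth_ratio` parameter are illustrative assumptions for this article, not part of EgoGen's actual pipeline.

```python
import random

def mix_batches(real_samples, synthetic_samples, synth_ratio=0.5,
                batch_size=8, seed=0):
    """Compose one training batch from real and synthetic egocentric samples.

    `synth_ratio` controls what fraction of each batch is synthetic; in
    practice this ratio would be tuned per task on a validation set.
    """
    rng = random.Random(seed)
    n_synth = int(round(batch_size * synth_ratio))
    batch = (rng.sample(synthetic_samples, n_synth)
             + rng.sample(real_samples, batch_size - n_synth))
    rng.shuffle(batch)                 # avoid ordering real/synthetic samples
    return batch
```

The appeal of synthetic data here is that each rendered image comes with exact ground truth (camera pose, depth, body mesh), so the synthetic portion of a batch can supervise quantities that are expensive or impossible to annotate in the real portion.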
EgoGen is complemented by an easy-to-use and scalable data generation pipeline, whose effectiveness the researchers demonstrate on the three tasks above. By making EgoGen fully open-source, they aim to provide a practical solution for creating realistic egocentric training data and a valuable resource for egocentric computer vision research.
Furthermore, EgoGen’s versatility and adaptability make it a promising tool for applications beyond these tasks, including human-computer interaction, virtual reality, and robotics. With its release as an open-source tool, the researchers anticipate that EgoGen will foster innovation in egocentric perception and contribute to the broader landscape of computer vision research.
Check out the Paper and Code. All credit for this research goes to the researchers of this project.
Arshad is an intern at MarktechPost. He is currently pursuing his Int. MSc Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at a fundamental level leads to new discoveries, which in turn drive advances in technology. He is passionate about understanding nature fundamentally with the help of tools such as mathematical models, ML models, and AI.