In the era of edge computing, deploying sophisticated models like Latent Diffusion Models (LDMs) on resource-constrained devices poses a unique set of challenges. These iterative generative models, known for producing high-fidelity outputs, demand efficient strategies to work within the memory, compute, and power limits of edge hardware. This study addresses that challenge by proposing a…
Text-to-image diffusion models are generative models that produce images from a given text prompt. The prompt conditions a diffusion model, which begins with random noise and iteratively refines it, step by step, in response to the prompt. It does this by learning to add and then remove noise from the image, gradually guiding it…
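To make that iterative refinement concrete, here is a minimal, schematic sketch of text-conditioned denoising in PyTorch. The `denoiser` callable, the step arithmetic, and the tensor shapes are illustrative placeholders, not the interface of any particular model.

```python
import torch

def generate(prompt_embedding, denoiser, num_steps=50):
    # Start from pure Gaussian noise in a latent space (shape is hypothetical).
    latent = torch.randn(1, 4, 64, 64)
    for t in reversed(range(num_steps)):
        # Predict the noise present at step t, conditioned on the text prompt.
        predicted_noise = denoiser(latent, t, prompt_embedding)
        # Remove a small fraction of the predicted noise, moving toward a clean image.
        latent = latent - predicted_noise / num_steps
    # A separate image decoder (e.g., a VAE) would turn the latent into pixels.
    return latent

# Toy usage with a stand-in "denoiser" that just predicts a fraction of the latent.
dummy_denoiser = lambda latent, t, emb: 0.1 * latent
sample = generate(prompt_embedding=torch.zeros(1, 77, 768), denoiser=dummy_denoiser)
```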
Reinforcement learning (RL) agents learn through iterative trial and error, using feedback from their environment to refine and optimize their decision-making over time. Developing generalist RL agents that can perform diverse tasks in complex environments is challenging and requires numerous reward functions. However,…
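As a concrete picture of that trial-and-error loop, the sketch below runs a random policy in Gymnasium's CartPole-v1; the environment and the random policy are stand-ins for the complex tasks and learned agents the article discusses.

```python
import gymnasium as gym

# One episode-style rollout: act, observe the reward, repeat.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()  # a learned policy would choose here
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # the reward function drives learning
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```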
Large Language Models (LLMs) build on advances across Artificial Intelligence (AI) sub-fields, including Natural Language Processing (NLP), Natural Language Generation (NLG), and Computer Vision. LLMs have made it possible to create vision-language models that can reason in complex ways about images, answer questions about them, and describe them in natural language.…
The progress in neural rendering has brought significant breakthroughs in reconstructing scenes and generating new viewpoints. However, its effectiveness largely depends on precise pre-computation of camera poses. To mitigate this problem, many efforts have been made to train Neural Radiance Fields (NeRFs) without precomputed camera poses. However, the implicit representation of NeRFs makes it…
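For context on what "implicit representation" means here, the sketch below shows a toy radiance-field MLP in PyTorch that maps a 3D point and viewing direction to color and density. It omits positional encoding and volume rendering, and the ray samples are hypothetical; it is not the method of any specific paper.

```python
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    """Toy NeRF-style MLP: (xyz, view direction) -> (RGB, density)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 density
        )

    def forward(self, xyz, view_dir):
        out = self.net(torch.cat([xyz, view_dir], dim=-1))
        rgb = torch.sigmoid(out[..., :3])
        density = torch.relu(out[..., 3])
        return rgb, density

# Query the field at points sampled along a camera ray; generating those rays
# requires camera poses, which is why pose accuracy matters so much.
field = TinyRadianceField()
points = torch.randn(64, 3)   # hypothetical samples along one ray
dirs = torch.randn(64, 3)
rgb, sigma = field(points, dirs)
```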
Object perception in images and videos lets machines make sense of the visual world. Computer vision systems analyze pixels to recognize, track, and understand the many objects that populate digital scenes. This capability, driven by deep learning, opens the door to transformative applications – from self-driving cars…
NVFi tackles the intricate challenge of comprehending and predicting the dynamics within 3D scenes evolving over time, a task critical for applications in augmented reality, gaming, and cinematography. While humans effortlessly grasp the physics and geometry of such scenes, existing computational models struggle to explicitly learn these properties from multi-view videos. The core issue lies…
Video super-resolution, which aims to restore low-quality videos to high fidelity, faces the daunting challenge of handling the diverse and intricate degradations found in real-world footage. Unlike earlier work that focused on synthetic or camera-specific degradations, the complexity arises from multiple unknown factors such as downsampling, noise, blur, flickering, and video compression. While recent CNN-based…
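To illustrate the kind of mixed degradations involved, the OpenCV sketch below applies blur, downsampling, noise, and heavy JPEG compression to a single frame. The specific kernel sizes, scale factor, and quality setting are arbitrary assumptions for demonstration, not the pipeline used in the work described.

```python
import cv2
import numpy as np

def degrade_frame(frame: np.ndarray) -> np.ndarray:
    """Apply a chain of common real-world degradations to one video frame."""
    h, w = frame.shape[:2]
    # Blur, then downsample to a quarter of the original resolution.
    blurred = cv2.GaussianBlur(frame, (5, 5), 1.5)
    low_res = cv2.resize(blurred, (w // 4, h // 4), interpolation=cv2.INTER_AREA)
    # Add Gaussian noise.
    noisy = np.clip(low_res.astype(np.float32) + np.random.normal(0, 5, low_res.shape), 0, 255)
    # Approximate video compression artifacts with aggressive JPEG encoding.
    ok, buf = cv2.imencode(".jpg", noisy.astype(np.uint8), [cv2.IMWRITE_JPEG_QUALITY, 30])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

# Example on a synthetic gray frame.
frame = np.full((256, 256, 3), 180, dtype=np.uint8)
low_quality = degrade_frame(frame)
```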
3D human motion reconstruction is a complex process that involves accurately capturing and modeling the movements of a human subject in three dimensions. The task becomes even more challenging with videos captured by a moving camera in real-world settings, where reconstructions often suffer from artifacts like foot sliding. However, a team of researchers from…
Pose estimation, which involves determining the position and orientation of an object in space, is a rapidly evolving field, with researchers continuously developing new methods to improve its accuracy and performance. Researchers from three highly regarded institutions – Tsinghua Shenzhen International Graduate School, Shanghai AI Laboratory, and Nanyang Technological University – have…