
GaussianOcc: A Self-Supervised Approach for Efficient 3D Occupancy Estimation Using Advanced Gaussian Splatting Techniques


3D occupancy estimation methods initially relied heavily on supervised training approaches requiring extensive 3D annotations, which limited scalability. Self-supervised and weakly-supervised learning techniques emerged to address this issue, utilizing volume rendering with 2D supervision signals. These methods, however, faced challenges, including the need for ground truth 6D poses and inefficiencies in the rendering process. Existing datasets also presented limitations, with issues such as self-occlusion affecting prediction accuracy.

To overcome these challenges, researchers explored more efficient paradigms for self-supervised 3D occupancy estimation. The field sought solutions to reduce dependency on ground truth poses, improve rendering efficiency, and develop methods applicable to real-world scenarios with limited data availability. This paper introduces GaussianOcc, a fully self-supervised approach using Gaussian splatting, designed to address the limitations of previous methods and advance the field of 3D occupancy estimation.

Researchers from The University of Tokyo and South China University of Technology developed GaussianOcc, a novel approach for fully self-supervised and efficient 3D occupancy estimation using Gaussian splatting. This method addresses limitations in existing techniques, which often require ground truth 6D poses and rely on inefficient volume rendering. GaussianOcc introduces two key components: Gaussian Splatting for Projection (GSP) and Gaussian Splatting from Voxel Space (GSV). These innovations eliminate the need for ground truth poses during training and enhance rendering efficiency. The proposed method demonstrates competitive performance while achieving 2.7 times faster training and 5 times faster rendering compared to existing approaches, making it highly suitable for practical applications in 3D occupancy estimation.

GaussianOcc’s methodology centers on two techniques, GSP and GSV. GSP supplies accurate scale information during training without relying on ground truth 6D poses: it projects adjacent views onto one another and uses the result to build a cross-view loss. This removes the dependency on external pose data. GSV improves rendering efficiency by performing Gaussian splatting directly from the 3D voxel space, treating each voxel vertex as a 3D Gaussian and optimizing its attributes within that space.
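The idea behind GSV, treating each voxel vertex as a 3D Gaussian, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Gaussian3D` class, the `voxels_to_gaussians` helper, the isotropic scale of half a voxel, and the `min_opacity` cutoff are all hypothetical choices made for illustration. The sketch only shows how a grid of per-vertex occupancy values could be mapped to Gaussian centers and opacities before any splatting or optimization takes place.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass
class Gaussian3D:
    """Hypothetical container for one Gaussian placed at a voxel vertex."""
    center: Tuple[float, float, float]  # position of the vertex in metres
    scale: float                        # isotropic extent (an assumption)
    opacity: float                      # taken from the occupancy value


def voxels_to_gaussians(
    occupancy: Sequence[Sequence[Sequence[float]]],
    voxel_size: float,
    origin: Tuple[float, float, float] = (0.0, 0.0, 0.0),
    min_opacity: float = 0.05,
) -> List[Gaussian3D]:
    """Map occupancy[x][y][z] values in [0, 1] to one Gaussian per vertex.

    Vertices whose occupancy falls below min_opacity are skipped, since
    they would contribute almost nothing to a rendered image.
    """
    nx, ny, nz = len(occupancy), len(occupancy[0]), len(occupancy[0][0])
    gaussians = []
    for i, j, k in product(range(nx), range(ny), range(nz)):
        value = occupancy[i][j][k]
        if value < min_opacity:
            continue
        center = (
            origin[0] + i * voxel_size,
            origin[1] + j * voxel_size,
            origin[2] + k * voxel_size,
        )
        gaussians.append(Gaussian3D(center, voxel_size / 2.0, value))
    return gaussians


# A 2x2x2 grid with two occupied vertices.
grid = [
    [[0.0, 0.9], [0.0, 0.0]],
    [[0.0, 0.0], [0.6, 0.0]],
]
for g in voxels_to_gaussians(grid, voxel_size=0.4):
    print(g)
```

Because the Gaussians are laid out on a regular grid, their positions are fixed by the grid geometry and only attributes such as opacity need to be predicted, which is one plausible reason splatting from voxel space avoids the dense ray sampling that volume rendering requires.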

The methodology employs a U-Net architecture with New-CRFs based on the Swin Transformer for depth estimation and a 6D pose network consistent with SurroundDepth. A scale-aware training strategy is implemented, incorporating masking techniques and refinement processes to enhance Gaussian splatting effectiveness and improve depth estimation accuracy. Comprehensive ablation studies evaluate the impact of various components, demonstrating the advantages of the proposed methods in terms of occupancy and depth metrics. This integrated approach achieves efficient and self-supervised 3D occupancy estimation, addressing key limitations in existing methods.
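To make the role of the 6D pose network more concrete, the sketch below converts a six-element pose vector into a 4x4 rigid transform and applies it to a 3D point. The article does not specify how the pose is parameterized, so the layout used here (three axis-angle rotation components followed by three translation components) is an assumption, chosen because it is common in self-supervised depth pipelines. The function names `pose_vec_to_matrix` and `transform_point` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pose_vec_to_matrix(pose: Sequence[float]) -> List[List[float]]:
    """Convert (rx, ry, rz, tx, ty, tz) to a 4x4 rigid transform.

    Assumption: (rx, ry, rz) is an axis-angle vector whose length is the
    rotation angle in radians. Rotation uses Rodrigues' formula.
    """
    rx, ry, rz, tx, ty, tz = pose
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-8:
        # Negligible rotation: fall back to the identity matrix.
        rot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    else:
        kx, ky, kz = rx / theta, ry / theta, rz / theta
        c, s = math.cos(theta), math.sin(theta)
        v = 1.0 - c
        rot = [
            [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
            [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
            [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
        ]
    return [
        rot[0] + [tx],
        rot[1] + [ty],
        rot[2] + [tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(
    matrix: List[List[float]], point: Tuple[float, float, float]
) -> Tuple[float, float, float]:
    """Apply a 4x4 rigid transform to a 3D point."""
    x, y, z = point
    return tuple(
        matrix[r][0] * x + matrix[r][1] * y + matrix[r][2] * z + matrix[r][3]
        for r in range(3)
    )


# Rotate 90 degrees about the z-axis, then shift 1 m along x.
T = pose_vec_to_matrix((0.0, 0.0, math.pi / 2, 1.0, 0.0, 0.0))
print(transform_point(T, (1.0, 0.0, 0.0)))
```

In a self-supervised setup, a transform like this is what lets pixels from one frame be reprojected into a neighboring frame so that a photometric loss can be computed without ground truth poses.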

According to the reported results, GaussianOcc performs well in 3D occupancy estimation while training in a fully self-supervised manner. Compared with traditional volume rendering, training is 2.7 times faster and rendering is 5 times faster. The method outperforms existing approaches on the occupancy metric (mIoU) and on depth estimation. The GSP module provides accurate scale information without ground truth poses, while scale-aware training and erosion operations improve alignment and reduce artifacts. Splatting-based rendering also remains efficient at higher resolutions, where volume rendering becomes costly. Together, these results position GaussianOcc as a strong reference point for self-supervised 3D occupancy estimation.
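For readers unfamiliar with the mIoU metric mentioned above, the following minimal sketch computes mean intersection-over-union from flattened per-voxel class labels. It illustrates the general definition of the metric, not the evaluation code used in the paper; the `mean_iou` function and its `ignore_index` parameter are hypothetical, and classes that appear in neither the prediction nor the target are left out of the average.

```python
from typing import Optional, Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: Optional[int] = None,
) -> float:
    """Mean IoU over classes for flattened per-voxel labels."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip voxels without a valid label
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Four voxels, three classes; one voxel of class 2 is predicted as class 1.
pred = [0, 1, 1, 2]
target = [0, 1, 2, 2]
print(mean_iou(pred, target, num_classes=3))  # per-class IoU: 1.0, 0.5, 0.5
```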

In conclusion, GaussianOcc introduces a fully self-supervised and efficient approach for 3D occupancy estimation. The method demonstrates strong generalization ability across diverse environments, validated on nuScenes and DDAD datasets. Gaussian splatting in voxel grids surpasses traditional volume rendering in accuracy and efficiency, significantly reducing computational costs. The research highlights the importance of accurate depth estimation in occupancy prediction. GaussianOcc’s innovative use of a 6D pose network for self-supervised learning, coupled with its rendering advancements, marks a significant leap forward in 3D scene understanding and reconstruction techniques.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.

Shoaib Nazir is a consulting intern at MarktechPost and has completed his M.Tech dual degree from the Indian Institute of Technology (IIT), Kharagpur. With a strong passion for Data Science, he is particularly interested in the diverse applications of artificial intelligence across various domains. Shoaib is driven by a desire to explore the latest technological advancements and their practical implications in everyday life. His enthusiasm for innovation and real-world problem-solving fuels his continuous learning and contribution to the field of AI.