Researchers from UT Austin and Meta Developed SteinDreamer: A Breakthrough in Text-to-3D Asset Synthesis Using Stein Score Distillation for Superior Visual Quality and Accelerated Convergence

Recent advancements in text-to-image generation driven by diffusion models have sparked interest in text-guided 3D generation, which aims to automate 3D asset creation for virtual reality, movies, and gaming. However, 3D synthesis remains challenging because high-quality 3D data is scarce and generative modeling with 3D representations is complex. Score distillation techniques address the lack of 3D data by using a pretrained 2D diffusion model as a prior. Yet they suffer from noisy gradients and instability, stemming from denoising uncertainty and small batch sizes, which lead to slow convergence and suboptimal solutions.

Researchers from The University of Texas at Austin and Meta Reality Labs have developed SteinDreamer, which integrates their proposed Stein Score Distillation (SSD) into a text-to-3D generation pipeline. SteinDreamer consistently addresses variance issues in the score distillation process. In both object-level and scene-level 3D generation, it surpasses DreamFusion and ProlificDreamer, delivering detailed textures and precise geometry while mitigating Janus and ghostly artifacts. Its reduced variance also accelerates convergence, so 3D generation requires fewer iterations.

Score distillation is a prevalent approach for text-to-3D asset synthesis, and the study highlights the high variance of its gradient estimates. The two seminal methods, score distillation sampling (SDS) from DreamFusion and variational score distillation (VSD) from ProlificDreamer, serve as baselines for SteinDreamer in the experiments. VSD is a variant of score distillation that minimizes the KL divergence between the distribution of images rendered from a 3D representation and the prior distribution.
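For reference, the gradient used by SDS from DreamFusion is commonly written in the form below. The notation is an assumption borrowed from the usual presentation of SDS rather than from this article: g is a differentiable renderer, θ the 3D parameters, y the text prompt, ε_φ the pretrained noise predictor, and w(t) a timestep weighting. In practice the expectation is approximated with a small number of sampled (t, ε) pairs per step, which is where the gradient variance enters.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\,\epsilon}\!\left[
      w(t)\,\bigl(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x = g(\theta), \quad
x_t = \alpha_t x + \sigma_t \epsilon, \quad
\epsilon \sim \mathcal{N}(0, I)
```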

The SSD technique incorporates control variates constructed from Stein’s identity to reduce variance in score distillation for text-to-3D asset synthesis. Because the control variate can be built from an arbitrary baseline function, SSD admits flexible guidance priors and network architectures that can be optimized explicitly for variance reduction. In the overall pipeline, the control variate is instantiated with a monocular depth estimator. Experiments on both object-level and scene-level text-to-3D generation demonstrate that SSD reduces distillation variance and improves visual quality.
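The control-variate idea can be illustrated on a one-dimensional toy problem. The sketch below is not the SteinDreamer implementation: the integrand h, the baseline function phi(x) = x, and the helper names are hypothetical choices made only for illustration. It relies on Stein's identity for a standard Gaussian, E[phi'(x) - x * phi(x)] = 0, so adding any multiple of that term to a Monte Carlo estimator leaves its mean unchanged while possibly lowering its variance.

```python
import random
import statistics


def h(x):
    """Toy integrand; for x ~ N(0, 1), E[h(x)] = E[x^2 + 2x + 1] = 2."""
    return (x + 1.0) ** 2


def stein_term(x):
    """Zero-mean term from Stein's identity with baseline phi(x) = x.

    For x ~ N(0, 1): E[phi'(x) - x * phi(x)] = E[1 - x^2] = 0.
    """
    return 1.0 - x * x


def compare_estimators(n_samples=20000, seed=0):
    """Compare a plain Monte Carlo estimator with a Stein-corrected one."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    plain = [h(x) for x in xs]
    stein = [stein_term(x) for x in xs]

    # Variance-minimizing coefficient c = -Cov(h, stein) / Var(stein),
    # fitted on the same samples (a small bias is accepted in this toy).
    mean_h = statistics.fmean(plain)
    mean_s = statistics.fmean(stein)
    cov = sum((p - mean_h) * (s - mean_s)
              for p, s in zip(plain, stein)) / (n_samples - 1)
    c = -cov / statistics.variance(stein)

    # Adding c * stein_term keeps the expectation but changes the variance.
    corrected = [p + c * s for p, s in zip(plain, stein)]
    return {
        "c": c,
        "plain_mean": mean_h,
        "plain_var": statistics.variance(plain),
        "corrected_mean": statistics.fmean(corrected),
        "corrected_var": statistics.variance(corrected),
    }


if __name__ == "__main__":
    for name, value in compare_estimators().items():
        print(f"{name}: {value:.4f}")
```

In this toy case the population variance falls from 6 to 4 at the optimal coefficient c = 1, while both estimators target the same mean of 2. A richer baseline function could remove more variance, which is the role the article attributes to the learned, depth-estimator-based control variate in SSD.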

SteinDreamer, which incorporates the SSD technique, consistently improves visual quality for both object-level and scene-level generation in text-to-3D asset synthesis. It converges faster than existing methods because its gradient updates are more stable. Qualitative results show that SteinDreamer generates views with fewer over-saturation and over-smoothing artifacts than SDS. In challenging scene-generation scenarios, SteinDreamer produces sharper results with better details than SDS and VSD. The experiments demonstrate that SSD effectively reduces distillation variance, improving visual quality in both object-level and scene-level generation.

In conclusion, the study presents SteinDreamer, a more general solution for reducing variance in score distillation for text-to-3D asset synthesis. Based on Stein’s identity, the proposed SSD technique effectively reduces distillation variance and consistently improves visual quality for both object-level and scene-level generation. SSD incorporates control variates constructed from Stein’s identity, allowing flexible guidance priors and network architectures to be optimized for variance reduction. SteinDreamer achieves faster convergence than existing methods due to more stable gradient updates. Empirical evidence shows that VSD consistently outperforms SDS, indicating that the variance of their numerical estimates differs significantly. SSD, as implemented in SteinDreamer, yields results with richer textures and lower variance than SDS.

Check out the Paper. All credit for this research goes to the researchers of this project.

Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
