SyncDreamer: Generating Multiview-consistent Images from a Single-view Image

ICLR 2024 (Spotlight)

Yuan Liu¹,²*, Cheng Lin²*, Zijiao Zeng², Xiaoxiao Long¹†, Lingjie Liu³, Taku Komura¹, Wenping Wang⁴†

¹The University of Hong Kong ²Tencent Games ³University of Pennsylvania ⁴Texas A&M University
*The first two authors contributed equally. †Corresponding authors.

Abstract

SyncDreamer directly generates multiview-consistent images, which allows 3D reconstruction with NeuS or NeRF without an SDS loss.

In this paper, we present a novel diffusion model called SyncDreamer that generates multiview-consistent images from a single-view image. Building on pretrained large-scale 2D diffusion models, the recent work Zero123 demonstrates the ability to generate plausible novel views from a single-view image of an object. However, maintaining consistency in geometry and colors across the generated images remains a challenge. To address this issue, we propose a synchronized multiview diffusion model that models the joint probability distribution of multiview images, enabling the generation of multiview-consistent images in a single reverse process. SyncDreamer synchronizes the intermediate states of all the generated images at every step of the reverse process through a 3D-aware feature attention mechanism that correlates the corresponding features across different views. Experiments show that SyncDreamer generates images with high consistency across different views, making it well-suited for various 3D generation tasks such as novel view synthesis, text-to-3D, and image-to-3D.

Reverse process of SyncDreamer's multiview diffusion.
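
The idea of a synchronized reverse process can be illustrated with a toy sketch. This is not the SyncDreamer implementation: the function names, the linear noise schedule, and the cross-view mean used in place of the 3D-aware feature attention are all assumptions made for illustration. The only property it aims to show is that all views share one reverse loop, and that the noise prediction for each view at every step depends on the current states of all views.

```python
import numpy as np


def toy_joint_denoiser(x_t, t):
    """Hypothetical stand-in for a learned noise predictor.

    x_t has shape (num_views, dim). The prediction for each view mixes in a
    cross-view mean, a crude placeholder for attention across views.
    """
    cross_view_context = x_t.mean(axis=0, keepdims=True)  # shared by all views
    return 0.5 * x_t + 0.5 * cross_view_context


def synchronized_reverse_process(num_views=4, dim=8, num_steps=50, seed=0):
    """Run one standard DDPM-style reverse loop jointly over all views."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # All views start from independent Gaussian noise.
    x = rng.standard_normal((num_views, dim))
    for t in range(num_steps - 1, -1, -1):
        # One call sees every view's intermediate state at this step.
        eps = toy_joint_denoiser(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x


views = synchronized_reverse_process()
print(views.shape)
```

In a real model, the toy denoiser would be replaced by a network conditioned on the input view, and the cross-view mean by a mechanism that correlates corresponding features across views.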