Accepted Paper at BMVC 2021

3D-RETR: End-to-End Single and Multi-View 3D Reconstruction with Transformers



Zai Shi, Zhao Meng, Yiran Xing, Yunpu Ma, Roger Wattenhofer


The 32nd British Machine Vision Conference (BMVC 2021),
22–25 November 2021, Virtual



3D reconstruction aims to reconstruct 3D objects from 2D views. Previous work on 3D reconstruction mainly focuses on feature matching between views or uses CNNs as backbones. Recently, Transformers have been shown to be effective in multiple applications of computer vision. However, whether Transformers can be used for 3D reconstruction is still unclear. In this paper, we fill this gap by proposing 3D-RETR, which is able to perform end-to-end 3D REconstruction with TRansformers. 3D-RETR first uses a pretrained Transformer to extract visual features from 2D input images. 3D-RETR then uses another Transformer Decoder to obtain the voxel features. A CNN Decoder then takes the voxel features as input to obtain the reconstructed objects. 3D-RETR is capable of 3D reconstruction from a single view or multiple views. Experimental results on two datasets show that 3D-RETR reaches state-of-the-art performance on 3D reconstruction. An additional ablation study also demonstrates that 3D-RETR benefits from using Transformers.
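
The three stages named in the abstract (pretrained Transformer encoder, Transformer Decoder, CNN Decoder) can be followed as a sequence of tensor shapes. The sketch below is only an illustration under stated assumptions: the image size, patch size, embedding width, number of voxel tokens and upsampling steps are hypothetical values, not figures taken from the paper. Fusing views by concatenating their tokens is likewise an assumption. The names `SketchConfig` and `trace_shapes` are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Shape = Tuple[int, ...]


@dataclass(frozen=True)
class SketchConfig:
    # Every value here is an illustrative assumption, not a number from the paper.
    image_size: int = 224           # input images are image_size x image_size
    patch_size: int = 16            # encoder splits each image into square patches
    embed_dim: int = 768            # width of each token
    voxel_tokens_per_axis: int = 4  # Transformer Decoder emits 4 * 4 * 4 voxel tokens
    upsample_steps: int = 3         # CNN Decoder doubles resolution at each step


def encoder_output_shape(num_views: int, cfg: SketchConfig) -> Shape:
    """Visual features: one token per patch, with views concatenated (assumed)."""
    patches_per_view = (cfg.image_size // cfg.patch_size) ** 2
    return (num_views * patches_per_view, cfg.embed_dim)


def transformer_decoder_output_shape(cfg: SketchConfig) -> Shape:
    """Voxel features: a fixed number of tokens, independent of the view count."""
    return (cfg.voxel_tokens_per_axis ** 3, cfg.embed_dim)


def cnn_decoder_output_shape(cfg: SketchConfig) -> Shape:
    """Reconstructed object: a cubic voxel grid after repeated 2x upsampling."""
    side = cfg.voxel_tokens_per_axis * 2 ** cfg.upsample_steps
    return (side, side, side)


def trace_shapes(num_views: int, cfg: SketchConfig = SketchConfig()) -> Dict[str, Shape]:
    """Return the shape produced by each stage for a given number of input views."""
    if num_views < 1:
        raise ValueError("at least one input view is required")
    return {
        "visual_features": encoder_output_shape(num_views, cfg),
        "voxel_features": transformer_decoder_output_shape(cfg),
        "voxel_grid": cnn_decoder_output_shape(cfg),
    }


if __name__ == "__main__":
    for views in (1, 3):
        print(views, trace_shapes(views))
```

Under these assumptions, only the encoder output grows with the number of views. The voxel features and the final grid keep the same shape, which is one plausible way for a single model to serve both the single-view and multi-view settings.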