Deep learning algorithms can extract valid semantic information from raw 3D point clouds, which can then be used to create BIM models of the built environment, an important step in generating the digital twin of a building. Compared with traditional unimodal deep learning algorithms that process 3D point clouds directly, multimodal fusion algorithms that leverage 2D images as supplementary information for 3D scenes offer clear performance advantages. In this study, the performance of an open-source multimodal algorithm, MVPNet, is improved on the 3D semantic segmentation task by using KPConv as a more robust and powerful 3D backbone. Modules from the two networks are combined in a principled way: the 2D-3D lifting method provided by MVPNet aggregates features from selected 2D multi-view images onto the 3D point cloud, and KPConv then fuses these features in 3D space to predict 3D semantic labels. On a custom ScanNet dataset, the proposed network achieves 74.40 mIoU on the 3D semantic segmentation task, outperforming the original MVPNet (+3.19 mIoU). In addition, extensive ablation studies investigate the appropriate fusion structure, fusion timing, and the effect of 3D color.
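
To illustrate the 2D-3D lifting step summarized above, the following is a minimal sketch assuming a PyTorch setting. The function name, tensor shapes, and the k-nearest-neighbour averaging are illustrative stand-ins for MVPNet's actual lifting module, and a simple linear layer stands in for the KPConv backbone used in this study.

```python
import torch


def lift_2d_features_to_points(points, pixel_xyz, pixel_feats, k=3):
    """Aggregate multi-view 2D image features onto 3D points (k-NN lifting).

    points      : (N, 3)  query point cloud coordinates
    pixel_xyz   : (M, 3)  3D positions of unprojected image pixels
    pixel_feats : (M, C)  2D CNN features of those pixels
    Returns     : (N, C)  per-point aggregated 2D features
    """
    # Pairwise distances between query points and unprojected pixels.
    dists = torch.cdist(points, pixel_xyz)              # (N, M)
    knn_dist, knn_idx = dists.topk(k, largest=False)    # (N, k)

    # Gather the k nearest pixel features and average them per point.
    gathered = pixel_feats[knn_idx]                     # (N, k, C)
    return gathered.mean(dim=1)                         # (N, C)


# Toy usage: lifted 2D features are concatenated with point geometry and
# passed to a 3D backbone (KPConv in the paper; a linear layer stands in here).
N, M, C = 2048, 8192, 64
points = torch.rand(N, 3)
pixel_xyz = torch.rand(M, 3)
pixel_feats = torch.rand(M, C)

lifted = lift_2d_features_to_points(points, pixel_xyz, pixel_feats, k=3)
fused_input = torch.cat([points, lifted], dim=1)        # (N, 3 + C)
backbone = torch.nn.Linear(3 + C, 20)                   # placeholder for KPConv
logits = backbone(fused_input)                          # per-point class scores
```

The key design choice this sketch reflects is that image features are fused early, at the input of the 3D network, so that the 3D backbone can exploit 2D appearance cues together with geometry when predicting per-point semantic labels.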