We propose an online RGB-D based scene understanding method for indoor scenes that runs in real time on mobile devices. First, we incrementally reconstruct the scene via SLAM and compute a 3D geometric segmentation by fusing segments obtained from each input depth image into a global 3D model. We combine this geometric segmentation with semantic annotations to obtain a semantic segmentation in the form of a semantic map. To accomplish efficient semantic segmentation, we encode the segments in the global model with a fast incremental 3D descriptor and use a random forest to determine their semantic labels. The predictions from successive frames are then fused to obtain a confident semantic class across time. As a result, the overall method achieves an accuracy that approaches that of state-of-the-art 3D scene understanding methods while being much more efficient, enabling real-time execution on low-power embedded systems.
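The temporal fusion step described above can be illustrated with a minimal sketch: per-frame class probabilities for a segment are accumulated over time and the fused label is the class with the highest accumulated evidence. The class names, the running-sum scheme, and the `SegmentLabelFuser` interface are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of fusing per-frame semantic predictions for one
# 3D segment into a confident label over time (assumed running-sum scheme).
import numpy as np

class SegmentLabelFuser:
    """Accumulates per-frame class probabilities for a single segment."""

    def __init__(self, num_classes: int):
        self.scores = np.zeros(num_classes)  # accumulated evidence per class
        self.num_frames = 0

    def update(self, frame_probs) -> None:
        # Add this frame's class-probability vector to the running total.
        self.scores += np.asarray(frame_probs, dtype=float)
        self.num_frames += 1

    def fused_label(self) -> int:
        # The fused label is the class with the highest accumulated evidence.
        return int(np.argmax(self.scores))

# Usage: three noisy per-frame predictions converge on class 1.
fuser = SegmentLabelFuser(num_classes=3)
for probs in [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.7, 0.2]]:
    fuser.update(probs)
print(fuser.fused_label())  # prints 1
```

A running sum keeps the update O(num_classes) per frame, which fits the real-time, low-power setting the abstract targets.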