Abstract

Instance segmentation of indoor point clouds remains difficult because of data scale, clutter, and imbalance across object classes. Geometry-driven methods such as SphericalMask provide robust coarse localization through spherical polygons and radial point migration, but they lack learned instance reasoning. Transformer-based decoders offer global context but often suffer from noisy attention and weak geometric grounding. This thesis addresses these limitations by extending the custom SphericalMask pipeline with an instance-aware MaskModule and an optional detection head. The MaskModule predicts a per-query spatial support mask that restricts cross-attention to meaningful regions, reducing global noise and improving separation between small and under-represented classes. The detection head adds geometric supervision by predicting objectness and bounding boxes from pooled instance features, acting as a backbone regularizer. Both components were integrated into the AIH-3DIS framework and evaluated across three model variants. Experiments show that the MaskModule variant achieves the strongest overall performance, with an AP of 0.320 (+2.7 over baseline) and improvements in AP50 and AP25. The baseline attains slightly higher strict recall, but its predictions lack spatial precision. In contrast, the MaskModule variant provides a balanced trade-off, offering more stable, fine-grained segmentation, particularly for small building elements, thereby improving downstream BIM and digital-twin applications.
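The idea of restricting cross-attention with a per-query support mask can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function name `masked_cross_attention`, the tensor shapes, and the fallback to full attention when a query's mask is empty are all assumptions made for illustration.

```python
import numpy as np

def masked_cross_attention(queries, keys, values, support_mask):
    """Single-head cross-attention restricted by a per-query support mask.

    queries:      (Q, D) instance query embeddings (hypothetical shapes)
    keys, values: (N, D) per-point features
    support_mask: (Q, N) boolean, True where a query may attend
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # (Q, N) scaled dot products

    # Assumption: a query with an empty mask falls back to attending everywhere,
    # which avoids a softmax over an empty set.
    empty = ~support_mask.any(axis=1, keepdims=True)
    mask = support_mask | empty

    # Points outside the support region get zero attention weight.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights
```

In this sketch, each query aggregates features only from points inside its predicted support region, which is one way the "reduced global noise" described above could arise.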