This thesis addresses the challenge of automatic on-site data acquisition for Building Information Modeling (BIM) by developing new datasets and exploring efficient scene understanding algorithms. The primary objectives are twofold: creating a dataset specific to construction environments and investigating semi-supervised learning algorithms that enhance scene understanding.
The research identifies the data types needed to accurately interpret construction site scenes and to streamline the creation of high-quality segmentation data. A new dataset is generated from the RGB images of ConSLAM Sequence 2, annotated with segments of construction-related objects. A semi-supervised learning workflow, RTMDet-SAM, is proposed to generate pseudo labels, enhancing model training without extensive manual labeling.
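To make the workflow concrete, the sketch below illustrates one way the RTMDet-SAM pseudo-labeling step could be wired together: RTMDet proposes class-labeled bounding boxes, and each box prompts the Segment Anything Model (SAM) to produce an instance mask that serves as a pseudo label. The config names, checkpoint paths, and score threshold are illustrative assumptions, not the exact configuration used in this thesis.

```python
# Illustrative RTMDet-SAM pseudo-labeling sketch (paths/checkpoints are assumed).
import cv2
from mmdet.apis import init_detector, inference_detector
from segment_anything import sam_model_registry, SamPredictor

# Hypothetical config/checkpoint files; substitute the actual ones.
detector = init_detector("rtmdet_config.py", "rtmdet_ckpt.pth", device="cuda:0")
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth").to("cuda:0")
predictor = SamPredictor(sam)

def pseudo_label(image_path, score_thr=0.5):
    """Detect boxes with RTMDet, then prompt SAM with each box to get a mask."""
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    instances = inference_detector(detector, image).pred_instances
    keep = instances.scores > score_thr
    boxes = instances.bboxes[keep].cpu().numpy()
    labels = instances.labels[keep].cpu().numpy()

    predictor.set_image(image)
    pseudo = []
    for box, label in zip(boxes, labels):
        masks, _, _ = predictor.predict(box=box, multimask_output=False)
        pseudo.append({"category_id": int(label), "mask": masks[0]})
    return pseudo
```

Masks produced this way can be exported in the same format as the manual annotations and mixed into the training set, which is how the pseudo labels reduce the need for exhaustive manual labeling.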
Experiments show that pseudo labels generated by RTMDet-SAM improve the recall and generalization performance of Mask R-CNN compared to traditional methods, with an average recall increase of 2.5%. Confidence scores of inferred segments increase by up to 55%. Additionally, a zero-shot approach using Grounding DINO shows promise in generating pseudo labels with minimal manual intervention.
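The zero-shot variant can be sketched in a similar way: Grounding DINO localizes objects directly from a text prompt listing construction classes, so no detector fine-tuning is required before masks are generated. The model id, image path, prompt, and printed fields below are assumptions made for this sketch (using the Hugging Face port of Grounding DINO), not the thesis configuration.

```python
# Illustrative zero-shot pseudo-labeling sketch with Grounding DINO
# (Hugging Face port; model id, image path, and prompt are assumptions).
import torch
from PIL import Image
from transformers import AutoProcessor, GroundingDinoForObjectDetection

model_id = "IDEA-Research/grounding-dino-base"
processor = AutoProcessor.from_pretrained(model_id)
model = GroundingDinoForObjectDetection.from_pretrained(model_id)

image = Image.open("site_frame.jpg")  # placeholder image path
# The text prompt lists target construction classes, separated by periods.
prompt = "scaffolding. rebar. formwork. excavator. worker."

inputs = processor(images=image, text=prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Boxes grounded to prompt phrases; these can seed SAM masks as pseudo labels.
results = processor.post_process_grounded_object_detection(
    outputs, inputs.input_ids, target_sizes=[image.size[::-1]]
)[0]
for box, score, label in zip(results["boxes"], results["scores"], results["labels"]):
    print(label, round(score.item(), 2), box.tolist())
```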
This research contributes a new annotated dataset, a semi-supervised learning workflow, and insights into zero-shot learning for scene understanding in construction environments. These advancements aim to enhance automation in on-site data acquisition processes, reducing labor and costs associated with manual data collection for updating BIM.