This project explores the use of modern computer vision techniques, such as Vision Transformers (ViTs) and convolutional neural networks (CNNs), to identify key architectural characteristics of buildings from Google Street View (GSV) imagery. Motivated by the growing interest in scalable methods for urban energy analysis, the project aims to extract four building attributes: building type, construction year, number of floors, and window-to-wall ratio (WWR). A dataset of ~3,000 filtered street-view building images from Germany was compiled in two stages. First, images were collected with automated scripts and the GSV Static API. Second, scenes were filtered with pretrained models. Several modeling approaches, including CNNs, ViTs, Support Vector Regression (SVR), and pretrained segmentation networks, were applied across the different tasks. ViT-based models, especially DINOv2, consistently outperformed traditional CNNs in the classification tasks, with DINOv2 achieving up to 56.11% accuracy for residential subtype classification. A hybrid DINOv2 + SVR pipeline achieved the lowest mean absolute error (MAE) for construction year estimation, at 16.9 years. For floor number estimation, the DINOv2 model reached 48.33% accuracy. WWR estimation with a pretrained SegFormer model proved feasible, although this step involved no training and relied on weak validation data. Overall, the results highlight the effectiveness of pretrained transformer-based models in low-data scenarios and the value of combining deep features with classical machine learning methods. This work demonstrates a scalable, interpretable pipeline for visual building analysis and offers insights for urban modeling using publicly available imagery.
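Unlike the other three attributes, WWR is estimated from the output of a pretrained segmentation model rather than from a model trained on this dataset. The following is a minimal sketch of how a per-pixel label mask, such as one produced by SegFormer, could be reduced to a single WWR value. The class IDs `WALL` and `WINDOW` are hypothetical placeholders, because the real indices depend on the label map of the checkpoint used. Treating the facade area as wall pixels plus window pixels is also an assumption, not the project's documented definition.

```python
# Hypothetical label IDs: the actual indices depend on the segmentation
# checkpoint's label map and are NOT taken from the project.
WALL = 1
WINDOW = 2


def window_to_wall_ratio(mask, wall_id=WALL, window_id=WINDOW):
    """Estimate WWR from a 2-D per-pixel label mask (list of lists of ints).

    Assumption: each window pixel is labelled as window rather than wall,
    so the gross facade area is approximated as wall + window pixels.
    """
    window_px = sum(row.count(window_id) for row in mask)
    wall_px = sum(row.count(wall_id) for row in mask)
    facade_px = window_px + wall_px
    if facade_px == 0:
        raise ValueError("mask contains no wall or window pixels")
    return window_px / facade_px


if __name__ == "__main__":
    # Toy 4x4 mask: 0 = background (e.g. sky), 1 = wall, 2 = window.
    toy_mask = [
        [0, 0, 0, 0],
        [1, 2, 2, 1],
        [1, 2, 2, 1],
        [1, 1, 1, 1],
    ]
    print(round(window_to_wall_ratio(toy_mask), 3))  # 4 window / 12 facade px -> 0.333
```

Counting pixels in image space ignores perspective distortion. A street-level photo viewed at an angle will bias the ratio unless the facade is rectified first.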