Scene understanding is an important class of computer vision problems that enables a wide variety of applications such as advanced driver assistance systems, autonomous vehicles, and mobile assistive robots. Semantic segmentation is one of the common ways to address this problem. Unlike the more standard approaches based on a probabilistic graphical model, in this paper we present a two-stage classification framework based on the concept of pixel neighborhoods. In the first stage, every pixel is classified based on its appearance. The output of the first classifier in a specific region around every pixel, which we call the pixel neighborhood, is summarized by a novel voting histogram feature and given as input to a second classifier. We show how to define the pixel neighborhood using the geodesic distance so that it captures both local image context and more global object relations. We perform a quantitative and qualitative evaluation on six well-known and challenging datasets and show that our model natively handles both 2D and 3D data. We compare our method to several baselines and multiple closely related methods and show state-of-the-art performance. We also present a real-world application of our method in a system that automatically detects parking spaces from a moving vehicle in real time.
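To make the two-stage idea concrete, the sketch below illustrates one plausible realization of the pipeline described above: a first per-pixel classifier produces class probabilities from appearance features, these probabilities are aggregated into a voting histogram over each pixel's neighborhood, and a second classifier re-labels the pixel from that histogram. The choice of random forests, the normalized vote-count histogram, and the helper names are assumptions for illustration only; the paper's actual feature design, geodesic neighborhood construction, and classifiers may differ.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def stage_one_probabilities(appearance_features, clf1):
    # appearance_features: (num_pixels, D) array, one feature vector per pixel.
    # Returns per-pixel class probabilities of shape (num_pixels, C).
    return clf1.predict_proba(appearance_features)

def voting_histogram(probs, neighborhoods):
    # neighborhoods: list of index arrays, one per pixel, e.g. the pixels whose
    # geodesic distance to the center pixel falls below a threshold (assumed).
    # Each histogram is the normalized sum of first-stage votes in the neighborhood.
    return np.stack([probs[idx].sum(axis=0) / len(idx) for idx in neighborhoods])

def two_stage_predict(appearance_features, neighborhoods, clf1, clf2):
    probs = stage_one_probabilities(appearance_features, clf1)
    hists = voting_histogram(probs, neighborhoods)
    # Second stage classifies each pixel from its neighborhood histogram,
    # here concatenated with the original appearance features (an assumption).
    return clf2.predict(np.hstack([appearance_features, hists]))

In this sketch, clf1 and clf2 would each be a fitted classifier (e.g. RandomForestClassifier) trained on per-pixel appearance features and on the neighborhood histograms, respectively; the neighborhood definition is where the geodesic distance enters.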