Object detection (OD) methods are finding application in various fields. The OD problem can be divided into two sub-problems, namely object classification and localization. While the former answers the question of which class a given object belongs to, the latter focuses on locating the object within a given image. For localization, both implicit representations, which border the object and its features (e.g. bounding boxes, polygons and masks), and explicit representations, which describe the object’s pose in an image (e.g. 6D pose, keypoints), are used. The 2D pose is a simple yet effective representation that has so far been overlooked. In this paper, we therefore motivate and formulate the use of 2D poses for object localization. Furthermore, we present RetinaNet-2DP, an anchor-based convolutional neural network (CNN) that is capable of detecting objects using 2D poses. To do so, we propose the idea of Anchor Poses and the Gaussian Kernel Distance as a similarity metric between poses. Experiments on the DOTA dataset and two robotics use cases from industry demonstrate the performance of the network architecture and, more generally, the potential of the proposed localization representation. Finally, we critically assess our findings and present an outlook on future work.