In this work, we propose a method for object recognition and pose estimation that uses convolutional neural networks to learn robust feature descriptors from depth images. Unlike previous methods for this problem, which use nearest neighbor search on an estimated descriptor space, we create an efficient multi-task learning framework with direct pose regression. By combining the strengths of manifold learning with a triplet loss and of regression, we take a step toward estimating the pose directly rather than relying on nearest neighbor search, whose complexity grows linearly with the number of objects. Furthermore, we conduct a detailed analysis of nearest neighbor search on feature descriptors and of regression, and show that the two components benefit each other. By leveraging the advantages of both the manifold learning and regression tasks, we improve on the current state of the art for object recognition and pose retrieval.
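As a rough illustration of how a triplet-based descriptor objective and a pose regression objective can be combined into one multi-task loss, consider the minimal sketch below. It is not the formulation used in this work: the hinge-style triplet term, the quaternion pose representation, the plain weighted sum, and all names (`triplet_loss`, `pose_regression_loss`, `multitask_loss`, `margin`, `weight`) are assumptions made for illustration only.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, puller, pusher, margin=0.01):
    """Standard hinge-style triplet loss on descriptors (assumed form).

    The loss is zero once the dissimilar sample (pusher) is farther from
    the anchor than the similar sample (puller) by at least `margin`.
    """
    d_pos = squared_distance(anchor, puller)
    d_neg = squared_distance(anchor, pusher)
    return max(0.0, d_pos - d_neg + margin)


def pose_regression_loss(pred_quat, true_quat):
    """Squared L2 error between the normalized prediction and the target.

    Assumes poses are unit quaternions; this representation is hypothetical.
    """
    norm = math.sqrt(sum(q * q for q in pred_quat))
    unit = [q / norm for q in pred_quat]
    return squared_distance(unit, true_quat)


def multitask_loss(anchor, puller, pusher, pred_quat, true_quat,
                   weight=1.0, margin=0.01):
    """Weighted sum of the descriptor and pose terms (assumed weighting)."""
    descriptor_term = triplet_loss(anchor, puller, pusher, margin)
    pose_term = pose_regression_loss(pred_quat, true_quat)
    return descriptor_term + weight * pose_term
```

In a training setup of this kind, both terms would be computed from outputs of the same network, so gradients from the pose term also shape the descriptor space, and vice versa. The `weight` parameter is a hypothetical knob for balancing the two tasks.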