We propose a novel approach to online visual tracking that combines the robustness of sparse-coding representations with the flexibility of voting-based methods. Our algorithm relies on a dictionary learned once, offline, from a large set of patches extracted from images unrelated to the test sequences. We learn both the shape and the appearance of the target object by associating with each dictionary element a set of votes and corresponding appearances; this is the only information updated during online tracking. Our method is robust to occlusions, sudden local and global illumination changes, and shape changes. We evaluate it on 50 standard sequences, obtaining results comparable to or better than the state of the art.
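To make the core idea concrete, the following is a minimal sketch, not the paper's actual solver: a patch is sparse-coded over a fixed dictionary (here by greedy orthogonal matching pursuit on an orthonormal dictionary, chosen only so the demo recovers the atoms exactly), and the active atoms then cast stored displacement votes for the target center. The vote table and all numbers are hypothetical.

```python
import numpy as np

def sparse_code_omp(patch, D, n_atoms=2):
    """Greedy orthogonal matching pursuit: approximate `patch` as a sparse
    combination of the columns (atoms) of D. Illustrative only."""
    residual = patch.astype(float).copy()
    selected = []
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in selected:
            selected.append(idx)
        # re-fit the coefficients of all selected atoms by least squares
        sol, *_ = np.linalg.lstsq(D[:, selected], patch, rcond=None)
        coeffs[:] = 0.0
        coeffs[selected] = sol
        residual = patch - D @ coeffs
    return coeffs

rng = np.random.default_rng(0)
# Orthonormal 16-atom dictionary (a real system would use a learned,
# overcomplete dictionary fixed once before tracking).
D, _ = np.linalg.qr(rng.normal(size=(16, 16)))
patch = 2.0 * D[:, 5] - 1.5 * D[:, 11]   # synthetic patch built from two atoms
alpha = sparse_code_omp(patch, D, n_atoms=2)
active = np.flatnonzero(np.abs(alpha) > 1e-8)

# Hypothetical per-atom vote table: each atom stores a 2-D offset toward
# the target center; only this table would be updated during tracking.
votes = {i: rng.normal(size=2) for i in range(D.shape[1])}
patch_pos = np.array([40.0, 60.0])        # where the patch was sampled
casts = [patch_pos + votes[i] for i in active]
predicted_center = np.mean(casts, axis=0)
```

In this sketch, only `votes` (and, in the paper's formulation, the stored appearances) would change online; the dictionary `D` stays fixed for all sequences.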