NetVLAD – Supervised Place Recognition

Posted on February 15, 2017

Download PPT – Google Docs

Vector of locally aggregated descriptors (VLAD) [1] is a simple and popular technique for computing a fingerprint of an image for place recognition. It first forms, say, K=64 clusters of SIFT-like descriptors (descriptors computed at SIFT feature points). Then, for every cluster, it takes the descriptors assigned to that cluster, subtracts the cluster center from each, and sums these residuals. Concatenating the K residual sums gives the image fingerprint.
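The aggregation step described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the reference implementation from [1]: the function name `vlad` is made up for this post, the cluster centers are assumed to be already computed (for example by k-means), and the final L2 normalization is a common choice that I am assuming here.

```python
import numpy as np

def vlad(descriptors, centers):
    """Aggregate local descriptors into a VLAD fingerprint.

    descriptors: (N, D) array of local descriptors (e.g. SIFT-like).
    centers:     (K, D) array of precomputed cluster centers.
    Returns a flat vector of length K * D.
    """
    # Squared distance from every descriptor to every center: shape (N, K).
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # hard assignment of each descriptor

    K, D = centers.shape
    v = np.zeros((K, D))
    for k in range(K):
        members = descriptors[nearest == k]
        if len(members):
            # Sum of residuals (descriptor minus center) for cluster k.
            v[k] = (members - centers[k]).sum(axis=0)

    v = v.ravel()
    norm = np.linalg.norm(v)
    # Assumed post-processing: L2-normalize the whole vector.
    return v / norm if norm > 0 else v

# Toy usage with random data: 500 descriptors of dimension 8, K=64 clusters.
rng = np.random.default_rng(0)
fingerprint = vlad(rng.normal(size=(500, 8)), rng.normal(size=(64, 8)))
print(fingerprint.shape)  # (512,)
```

Two images can then be compared by the distance between their fingerprints.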

NetVLAD [2] extends this strategy by learning both the filters that produce the descriptors and the cluster centers, so that the resulting fingerprint helps distinguish images for place recognition. I gave a short talk on the details of this method, which can be accessed from [HERE].

The NetVLAD computation is summarized as follows, with W_k, b_k, and C_k as trainable parameters for every k in 1, …, K (say K=64).

[Image: screenshot summarizing the NetVLAD computation]
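For readers who prefer code, here is a minimal NumPy sketch of the forward pass, assuming the formulation in [2]: each descriptor x_i is softly assigned to cluster k with weight a_k(x_i) = softmax_k(W_k · x_i + b_k), and the k-th block of the output is the sum over i of a_k(x_i) (x_i − C_k). The function names and the `normalize` flag are made up for this post, the parameters are random rather than trained, and the normalization order (per cluster, then over the whole vector) is my reading of [2].

```python
import numpy as np

def soft_assign(x, W, b):
    """Softmax over clusters of W_k . x_i + b_k; returns an (N, K) array."""
    logits = x @ W.T + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(logits)
    return a / a.sum(axis=1, keepdims=True)

def netvlad(x, W, b, C, normalize=True):
    """NetVLAD-style aggregation (forward pass only, no training).

    x: (N, D) local descriptors, e.g. CNN feature-map columns.
    W: (K, D), b: (K,), C: (K, D) are the trainable parameters.
    Returns a flat vector of length K * D.
    """
    a = soft_assign(x, W, b)  # (N, K)
    # V[k] = sum_i a[i, k] * (x[i] - C[k]), written without explicit loops.
    V = a.T @ x - a.sum(axis=0)[:, None] * C  # (K, D)
    if normalize:
        # Assumed post-processing: per-cluster L2 norm, then global L2 norm.
        V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
        v = V.ravel()
        return v / (np.linalg.norm(v) + 1e-12)
    return V.ravel()

# Toy usage with random (untrained) parameters: N=100, D=16, K=64.
rng = np.random.default_rng(0)
N, D, K = 100, 16, 64
x = rng.normal(size=(N, D))
W, b, C = rng.normal(size=(K, D)), rng.normal(size=K), rng.normal(size=(K, D))
descriptor = netvlad(x, W, b, C)
print(descriptor.shape)  # (1024,)
```

Because the soft assignment is differentiable, W_k, b_k, and C_k can all be updated by backpropagation, which the hard nearest-center assignment in plain VLAD does not allow.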

Note: The NetVLAD framework relies on this cool trick, which I discussed in an earlier blog post that can be accessed from here.

References

[1] Arandjelovic, Relja, and Andrew Zisserman. “All about VLAD.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013.

[2] Arandjelovic, Relja, et al. “NetVLAD: CNN architecture for weakly supervised place recognition.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
