Vector of locally aggregated descriptors (VLAD) [1] is a simple and popular technique for computing a fingerprint of an image for place recognition. It first forms, say, K=64 clusters of SIFT-like descriptors (descriptors computed at SIFT feature points). Each descriptor is then assigned to its nearest cluster center, and the residual (the descriptor minus that center) is added to a running sum for that cluster. Doing this for every cluster center gives K residual sums, which are concatenated into the image fingerprint.
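To make the residual sum concrete, here is a minimal NumPy sketch of VLAD with hard assignment. The function name `vlad`, the array shapes, and the toy data are my own choices for illustration. It assumes the cluster centers are already computed (for example with k-means) and uses a single final L2 normalization, which is one common choice among several.

```python
import numpy as np

def vlad(descriptors, centers):
    """Hard-assignment VLAD: per-cluster sum of (descriptor - center)."""
    descriptors = np.asarray(descriptors, dtype=float)  # (N, D) local descriptors
    centers = np.asarray(centers, dtype=float)          # (K, D) cluster centers

    # Squared distance from every descriptor to every center, shape (N, K).
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of the nearest center per descriptor

    V = np.zeros_like(centers)   # one residual sum per cluster
    for k in range(centers.shape[0]):
        members = descriptors[nearest == k]
        if len(members):
            V[k] = (members - centers[k]).sum(axis=0)

    v = V.ravel()                # concatenate into a K*D fingerprint
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Toy example (assumed data): 3 descriptors in 2-D, K=2 centers.
toy_centers = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
fingerprint = vlad(toy_descriptors, toy_centers)
print(fingerprint.shape)  # (4,)
```

With K=64 and 128-dimensional SIFT descriptors, the same function would return a fingerprint of length 64 × 128 = 8192.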

NetVLAD [2] extends this strategy by learning the filters and cluster centers that help distinguish images for place recognition. I gave a short talk on the details of this method, which can be accessed from [HERE].

The NetVLAD computation is summarized as follows, with W_k, b_k, and C_k as trainable parameters for every k in 1, …, K (say, K=64).

[Image: Screenshot from 2017-02-15 12:34:31.png]
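As a rough companion to the summary, here is a minimal NumPy sketch of the NetVLAD forward pass as described in [2]: a softmax over W_k·x + b_k gives each descriptor a soft assignment to every cluster, and the residuals to C_k are summed with those weights. The function name `netvlad_forward`, the shapes, and the random inputs are assumptions for illustration. In the real network the descriptors come from a CNN feature map and W, b, C are learned by backpropagation, which is not shown here.

```python
import numpy as np

def netvlad_forward(X, W, b, C):
    """Forward pass only. X: (N, D) descriptors; W: (K, D); b: (K,); C: (K, D)."""
    X = np.asarray(X, dtype=float)

    # Soft assignment: a[i, k] = softmax_k(W_k . x_i + b_k)
    logits = X @ W.T + b                          # (N, K)
    logits -= logits.max(axis=1, keepdims=True)   # for numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)

    # V[k] = sum_i a[i, k] * (x_i - C_k), computed without an explicit loop
    V = a.T @ X - a.sum(axis=0)[:, None] * C      # (K, D)

    # Intra-normalization (L2 per cluster), then L2 over the flattened vector
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
    v = V.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

# Toy example with random values standing in for learned parameters (assumed sizes).
rng = np.random.default_rng(0)
N, D, K = 100, 8, 4
X = rng.normal(size=(N, D))
W = rng.normal(size=(K, D))
b = rng.normal(size=K)
C = rng.normal(size=(K, D))
v = netvlad_forward(X, W, b, C)
print(v.shape)  # (32,)
```

Because the softmax is differentiable, gradients can flow to W, b, and C, unlike the hard nearest-center assignment in plain VLAD.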

Note: The NetVLAD framework relies on this cool trick, which I discussed in an earlier blog post that can be accessed from here.

References

[1] Arandjelovic, Relja, and Andrew Zisserman. “All about VLAD.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013.

[2] Arandjelovic, Relja, et al. “NetVLAD: CNN architecture for weakly supervised place recognition.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
