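Vector of locally aggregated descriptors (VLAD) is a simple and popular technique for computing a fingerprint of an image for place recognition. It first groups SIFT-like descriptors (descriptors computed at SIFT feature points) into K clusters (say, K = 64), typically with k-means. Each descriptor of an image is then assigned to its nearest cluster center, and its residual (the descriptor minus that center) is added to a running sum for that cluster. Concatenating the K summed residuals gives a single K × D vector for the image, where D is the descriptor dimension.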
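A minimal NumPy sketch of this aggregation, assuming the K cluster centers are already available (e.g. from k-means, not shown) and using random arrays as stand-ins for real SIFT descriptors. Each descriptor contributes its residual only to its nearest center. The function name `vlad` is illustrative, and the final L2 normalization is a common practical choice rather than part of the description above.

```python
import numpy as np


def vlad(descriptors: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Hard-assignment VLAD: sum residuals to the nearest cluster center.

    descriptors: (N, D) local descriptors (e.g. SIFT-like, D = 128)
    centers:     (K, D) cluster centers, assumed precomputed with k-means
    returns:     (K * D,) L2-normalized VLAD vector
    """
    K, D = centers.shape
    # Squared Euclidean distance of every descriptor to every center: (N, K)
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # hard assignment: closest center per descriptor

    V = np.zeros((K, D))
    # V[k] += x_i - c_k for every descriptor x_i assigned to cluster k
    np.add.at(V, nearest, descriptors - centers[nearest])

    v = V.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    descriptors = rng.standard_normal((500, 128))  # stand-in for SIFT descriptors
    centers = rng.standard_normal((64, 128))       # stand-in for k-means centers
    v = vlad(descriptors, centers)
    print(v.shape)  # (8192,)
```

`np.add.at` is used instead of `V[nearest] += ...` because it accumulates correctly when several descriptors map to the same cluster.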
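NetVLAD extends this strategy so that both the filters producing the local descriptors and the cluster centers are learned to distinguish images for place recognition. It does so by replacing VLAD's hard nearest-center assignment with a differentiable soft assignment, so the whole pipeline can be trained end-to-end inside a CNN. I gave a short talk on the details of this method, which can be accessed from [HERE].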
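The NetVLAD computation is summarized as follows, with $\{\mathbf{w}_k, b_k, \mathbf{c}_k\}$ as trainable parameters for every $k = 1, \dots, K$ (say, $K = 64$):

$$V(j,k) = \sum_{i=1}^{N} \frac{e^{\mathbf{w}_k^{\top}\mathbf{x}_i + b_k}}{\sum_{k'=1}^{K} e^{\mathbf{w}_{k'}^{\top}\mathbf{x}_i + b_{k'}}} \big( x_i(j) - c_k(j) \big)$$

Here $\mathbf{x}_i$ ($i = 1, \dots, N$) are the $D$-dimensional local descriptors of the image (in NetVLAD, the outputs of the last convolutional layer), and $j$ indexes the descriptor dimension. The softmax weight is a soft assignment of $\mathbf{x}_i$ to cluster $k$; replacing it with a 0/1 nearest-center assignment recovers plain VLAD.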
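A minimal NumPy sketch of the NetVLAD aggregation (forward pass only, no training), assuming the standard formulation: the soft assignment of descriptor x_i to cluster k is a softmax over k of w_k^T x_i + b_k, and V(j, k) = sum_i a_k(x_i) (x_i(j) − c_k(j)). The names `netvlad` and `normalize` and the sizes in the usage are hypothetical. The intra-normalization followed by a global L2 normalization is an assumption here, borrowed from the VLAD literature ("All about VLAD" introduced intra-normalization).

```python
import numpy as np


def netvlad(x: np.ndarray, w: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """NetVLAD aggregation without normalization.

    x: (N, D) local descriptors, e.g. a CNN feature map reshaped to N = H * W rows
    w: (K, D), b: (K,), c: (K, D) trainable parameters {w_k, b_k, c_k}
    returns V as a (K, D) array, i.e. V[k, j] corresponds to V(j, k)
    """
    logits = x @ w.T + b                          # (N, K): w_k^T x_i + b_k
    logits -= logits.max(axis=1, keepdims=True)   # shift for a numerically stable softmax
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)             # soft assignment a_k(x_i); rows sum to 1
    # sum_i a_k(x_i) (x_i - c_k) = sum_i a_k(x_i) x_i - (sum_i a_k(x_i)) c_k
    return a.T @ x - a.sum(axis=0)[:, None] * c


def normalize(V: np.ndarray) -> np.ndarray:
    """Intra-normalize each cluster's residual, then L2-normalize the flattened vector."""
    V = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
    v = V.ravel()
    return v / max(np.linalg.norm(v), 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, D, N = 64, 512, 30 * 40          # hypothetical: 64 clusters, 512-d conv features
    x = rng.standard_normal((N, D))
    w = 0.01 * rng.standard_normal((K, D))
    b = np.zeros(K)
    c = rng.standard_normal((K, D))
    v = normalize(netvlad(x, w, b, c))
    print(v.shape)  # (32768,)
```

Setting w_k = 2αc_k and b_k = −α‖c_k‖² makes the softmax equal to a softmax over −α‖x_i − c_k‖² (the ‖x_i‖² term cancels), so for large α the soft assignment approaches VLAD's hard nearest-center assignment; the NetVLAD paper uses this to relate the two.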
Note: The NetVLAD framework relies on this cool trick, which I discussed in an earlier blog post that can be accessed from here.
 Arandjelovic, Relja, and Andrew Zisserman. “All about VLAD.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2013.
 Arandjelovic, Relja, et al. “NetVLAD: CNN architecture for weakly supervised place recognition.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.