Jégou, Hervé, et al. "Aggregating local descriptors into a compact image representation." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.
Motivation:
It addresses the problem of image search at very large scale, where accuracy, search efficiency, and memory usage must be considered jointly.
Contributions:
The first contribution is VLAD (vector of locally aggregated descriptors), a representation derived from both BoF and the Fisher kernel, which aggregates SIFT descriptors into a compact vector. As a second contribution, the paper shows the advantage of jointly optimizing the trade-off between dimensionality reduction and the indexing algorithm.
Technical summarization:
VLAD (Vector of Locally Aggregated Descriptors):
A vector representation of an image that aggregates descriptors based on a locality criterion in feature space. It can be seen as a simplification of the Fisher kernel. As with BoF, a codebook of k visual words is learned with k-means.
The idea of the VLAD descriptor is to accumulate, for each visual word, the differences between the local descriptors assigned to that visual word and the visual word itself.
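The accumulation step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation; the function name and interface are mine, and it assumes the codebook (centroids) has already been learned with k-means.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Minimal VLAD sketch: descriptors is (n, d) local descriptors
    (e.g. SIFT), centroids is (k, d) visual words from k-means."""
    k, d = centroids.shape
    # hard-assign each descriptor to its nearest visual word
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members) > 0:
            # accumulate residuals (descriptor - centroid) for this word
            v[i] = (members - centroids[i]).sum(axis=0)
    v = v.ravel()
    # L2-normalize the resulting k*d-dimensional vector
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

Note that the output dimension is k×d (e.g. k=64 words × 128-dim SIFT = 8192), which is why the paper follows this with PCA and indexing.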
Joint optimization of reduction/indexing:
The paper formulates a criterion that combines the error introduced by retaining only D′ dimensions after PCA with the quantization error of the indexing stage.

This gives an objective criterion for optimizing the dimensionality directly: D′ is chosen as the value minimizing this criterion on the learning set.
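The selection procedure can be sketched as follows. This is a hedged illustration of the idea rather than the paper's exact criterion: the projection error of keeping the top-D′ PCA components is the sum of the discarded eigenvalues, and the quantization error per candidate dimension is assumed to be measured separately (here passed in as an array; the function name is mine).

```python
import numpy as np

def select_dimension(X, e_quant):
    """Pick D' minimizing PCA projection error plus quantization error.
    X is (n, D) training vectors; e_quant[d-1] is the measured
    quantization MSE of the indexing stage on d-dim vectors (assumed given)."""
    Xc = X - X.mean(axis=0)
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]
    # error of projecting to the top-d components = sum of dropped eigenvalues
    e_proj = np.array([eigvals[d:].sum() for d in range(1, len(eigvals) + 1)])
    total = e_proj + np.asarray(e_quant)
    return int(total.argmin()) + 1  # best D' (1-indexed)
```

The trade-off is visible in the two terms: shrinking D′ increases the projection error but (for a fixed code budget) decreases the quantization error, so the sum has an interior minimum.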
My comment:
The way the VLAD descriptors are visualized is clear and intuitive; good visualization always makes research easier to grasp.
The evaluation would be more complete with additional timing experiments.
