Monday, March 28, 2016

[AMMAI] [Lecture 05] - "Nonlinear dimensionality reduction by locally linear embedding"

Paper Information:
  Roweis, Sam T., and Lawrence K. Saul. "Nonlinear dimensionality reduction by locally linear embedding." Science 290.5500 (2000): 2323-2326.

Motivation:
  The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction.

Contributions:
   They introduce locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Moreover, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima.

Technical summarization:
  LLE can be summarized in the following three steps:

1. Select neighbors (for example by using the K nearest neighbors)
2. Reconstruct with linear weights
Minimize the reconstruction cost ε(W) = Σ_i |X_i − Σ_j W_ij X_j|² subject to two constraints:
  first, each X_i is reconstructed only from its neighbors (W_ij = 0 if X_j is not one of X_i's neighbors);
  second, the rows of the weight matrix sum to one (Σ_j W_ij = 1).
The optimal weights are found by solving a constrained least-squares problem for each point.

3. Map to embedded coordinates
Fix the weights W and minimize the embedding cost Φ(Y) = Σ_i |Y_i − Σ_j W_ij Y_j|². This reduces to a sparse N×N eigenvalue problem, whose bottom d+1 eigenvectors (discarding the constant one) give the d-dimensional embedding coordinates.
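
  As a concrete illustration of these three steps, here is a minimal sketch using scikit-learn's LLE implementation on a toy Swiss-roll dataset (not the authors' code; the neighbor count and target dimension are arbitrary choices of mine):

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import LocallyLinearEmbedding

    # 3-D points sampled from a 2-D manifold (the classic Swiss roll)
    X, _ = make_swiss_roll(n_samples=2000, noise=0.05)

    # step 1: K nearest neighbors; steps 2-3: reconstruction weights and the
    # sparse eigenvalue problem are handled internally by fit_transform
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
    Y = lle.fit_transform(X)           # N x 2 embedding coordinates
    print(Y.shape, lle.reconstruction_error_)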


My comment:



LLE is able to identify the underlying structure of the manifold, which linear methods such as PCA and MDS cannot.

A straightforward visualization gives a clear view of the advantage of the method. Besides, applying LLE to various domains shows that the coordinates of these embedding spaces are related to meaningful attributes, such as the pose and expression of human faces.


Wednesday, March 16, 2016

[AMMAI] [Lecture 04] - "Online dictionary learning for sparse coding"

Paper Information:
  Mairal, Julien, et al. "Online dictionary learning for sparse coding." Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009.

Motivation:
  While learning the dictionary has proven critical for achieving (or improving upon) state-of-the-art results, solving the corresponding optimization problem is a significant computational challenge, particularly for the large-scale datasets involved in image processing tasks, which may include millions of training samples.
  To address this issue, the paper proposes an online approach that processes one element (or a small subset) of the training set at a time.

Contributions:
   A new online optimization algorithm for dictionary learning, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples.

Technical summarization:

  Classical dictionary learning techniques:
    The main idea is to model each data vector as a sparse linear combination of basis elements (the columns, or atoms, of a dictionary D). The loss function should therefore be small if D is "good" at representing the signal x.
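
    In the paper's notation, the sparse coding loss for a signal x and the empirical cost minimized over the dictionary D are

        \ell(x, D) = \min_{\alpha} \tfrac{1}{2} \lVert x - D\alpha \rVert_2^2 + \lambda \lVert \alpha \rVert_1,
        \qquad
        f_n(D) = \frac{1}{n} \sum_{i=1}^{n} \ell(x_i, D),

    where \lambda is the sparsity-inducing regularization weight.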
    
  Online Dictionary Learning:
    The algorithm alternates between the two variables (the sparse codes and the dictionary), minimizing over one while keeping the other fixed.

    In practice, the algorithm can be sped up by replacing lines 5 and 6 of Algorithm 1 with a mini-batch version that draws several training samples per iteration and accumulates their statistics before the dictionary update.

    To update the dictionary, the algorithm uses block-coordinate descent with warm restarts. One of its main advantages is that this update is parameter-free and does not require any learning-rate tuning.
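
    As a concrete, hedged sketch (my own toy setup, not the paper's code): scikit-learn's MiniBatchDictionaryLearning is based on this paper's online algorithm; the patch size, dictionary size, and sparsity weight below are arbitrary choices.

      import numpy as np
      from sklearn.decomposition import MiniBatchDictionaryLearning

      # pretend these are 10,000 flattened 8x8 image patches
      X = np.random.randn(10000, 64)

      dico = MiniBatchDictionaryLearning(n_components=100,  # dictionary size k
                                         alpha=1.0,         # sparsity weight lambda
                                         batch_size=256)    # mini-batch size
      D = dico.fit(X).components_        # learned dictionary, k x 64
      codes = dico.transform(X[:5])      # sparse codes alpha for a few samples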


My comment:
  Compared to the batch setting, the online setting is more realistic and scalable;
moreover, the parameter-free property makes the experiments stable and objective.
Besides, the application of removing text from a damaged image (inpainting) is impressive.

[AMMAI] [Lecture 03] - "Iterative Quantization: A Procrustean Approach to Learning Binary Codes"

Paper Information:
  Gong, Yunchao, and Svetlana Lazebnik. "Iterative quantization: A procrustean approach to learning binary codes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.

Motivation:
  This paper addresses the problem of learning similarity-preserving binary codes for efficient retrieval in large-scale image collections.


Contributions:
  This paper shows that the performance of PCA-based binary coding schemes can be greatly improved by simply rotating the projected data, and proposes a very natural and effective iterative quantization method for refining this rotation. Iterative quantization (ITQ) has connections to multi-class spectral clustering and to the orthogonal Procrustes problem, and it can be used both with unsupervised data embeddings such as PCA and with supervised embeddings such as canonical correlation analysis (CCA).

Technical summarization:
  Unsupervised Code Learning:
    The major novelty of the method is that they try to preserve the locality structure of the projected data by rotating it so as to minimize the discretization error.
 
  Binary Quantization:

    Beginning with a random initialization of R, they adopt a k-means-like iterative quantization (ITQ) procedure to find a local minimum of the quantization loss (Eq. (2) of the paper). In each iteration, each data point is first assigned to the nearest vertex of the binary hypercube, and then R (an orthogonal c x c matrix) is updated to minimize the quantization loss given this assignment.

    V = XW, where W is formed by the top c eigenvectors of the data covariance matrix X^T X (i.e., the PCA projection), so V is the projected data.
    They alternate between updates to B and R for several iterations to find a locally optimal solution.
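
  A minimal NumPy sketch of my reading of this procedure (PCA projection, then alternating the code assignment and the orthogonal Procrustes update of R); the data, code length, and iteration count below are arbitrary:

    import numpy as np

    def itq_codes(X, c=32, n_iter=50, seed=0):
        rng = np.random.RandomState(seed)
        X = X - X.mean(axis=0)                    # zero-center the data, N x d
        # PCA projection: top c principal directions of the centered data
        _, _, Wt = np.linalg.svd(X, full_matrices=False)
        V = X @ Wt[:c].T                          # projected data V = XW, N x c
        R, _ = np.linalg.qr(rng.randn(c, c))      # random orthogonal initialization
        for _ in range(n_iter):
            B = np.sign(V @ R)                    # assign to nearest hypercube vertex
            # orthogonal Procrustes step: minimize ||B - VR||_F over rotations R
            S, _, St_hat = np.linalg.svd(B.T @ V)
            R = St_hat.T @ S.T
        return (V @ R > 0).astype(np.uint8)       # final binary codes, N x c

    codes = itq_codes(np.random.randn(1000, 128), c=32)
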
  Leveraging Label Information:
    Their method can be used with any orthogonal basis projection method. Therefore, a supervised dimensionality reduction method can be used to capture the semantic structure of the dataset.
    They refine their codes in a supervised setting using Canonical Correlation Analysis (CCA), which has proven to be an effective tool for extracting a common latent space from two views and is robust to noise. The goal of CCA is to find projection directions for feature and label vectors to maximize the correlation between the projected data. 
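
  As a hedged illustration (a toy setup of mine, not the paper's exact pipeline), the supervised projection can be obtained with scikit-learn's CCA between features and one-hot label vectors, and then refined with the same ITQ rotation as above:

    import numpy as np
    from sklearn.cross_decomposition import CCA

    X = np.random.randn(1000, 128)              # image feature vectors
    labels = np.random.randint(0, 10, 1000)     # class labels (10 classes)
    Y = np.eye(10)[labels]                      # one-hot label matrix

    cca = CCA(n_components=8)                   # c = 8 bits in this toy example
    V, _ = cca.fit_transform(X, Y)              # correlated projection of X, N x c
    # ...then alternate B = sign(VR) and the Procrustes update of R as in ITQ.
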
My comment:
  Since there are no ground-truth class labels for the dataset, they define the ground truth by Euclidean nearest neighbors. This may be useful when facing the same situation in one's own research.
  As we can see from the results, PCA really helps to preserve semantic consistency at the smallest code sizes. Therefore, it is important to apply dimensionality reduction to the data in order to capture its class structure.
   




Wednesday, March 9, 2016

[AMMAI] [Lecture 02] - "Aggregating local descriptors into a compact image representation"

Paper Information:
  Jégou, Hervé, et al. "Aggregating local descriptors into a compact image representation." Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010.

Motivation:
  This paper deals with the problem of image search on a very large scale, where accuracy, efficiency, and memory usage are considered together.

Contributions:
  The first contribution is VLAD (vector of locally aggregated descriptors), derived from both BOF and the Fisher kernel, which aggregates SIFT descriptors into a compact representation. As a second contribution, the paper shows the advantage of jointly optimizing the trade-off between the dimensionality reduction and the indexation algorithm.

Technical summarization:
  VLAD (Vector of Locally Aggregated Descriptors):

    A vector representation of an image which aggregates descriptors based on a locality criterion in feature space. It can be seen as a simplification of the Fisher kernel. As with BOF, a codebook of k visual words is first learned with k-means.
    The idea of the VLAD descriptor is to accumulate, for each visual word c_i, the differences x - c_i between the local descriptors x assigned to that visual word and the word itself.
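
  A hedged NumPy sketch of this aggregation step (my own illustration; the descriptor dimension and codebook size are arbitrary):

    import numpy as np

    def vlad(descriptors, centroids):
        """descriptors: (n, d) local descriptors of one image (e.g. SIFT, d = 128)
           centroids:   (k, d) visual words learned by k-means on a training set"""
        k, d = centroids.shape
        # assign each descriptor to its nearest visual word
        dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        v = np.zeros((k, d))
        for i in range(k):
            members = descriptors[nearest == i]
            if len(members):
                v[i] = (members - centroids[i]).sum(axis=0)  # accumulate residuals x - c_i
        v = v.ravel()                                        # concatenate into a k*d vector
        return v / (np.linalg.norm(v) + 1e-12)               # L2 normalization

    # e.g. vlad(np.random.randn(500, 128), np.random.randn(64, 128)).shape -> (8192,)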

  Joint optimization of reduction/indexing:

    It formulates a joint criterion involving the number of dimensions D' retained by PCA and the quantization error of the indexing scheme.

    This gives an objective criterion for choosing the dimensionality directly: D' is obtained by finding, on the learning set, the value that minimizes this criterion.
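
    A toy sketch of how I read this selection procedure (the quantization_error function is a stand-in for the mean squared error of whatever indexing scheme is used):

      import numpy as np

      def choose_dprime(X, candidate_dims, quantization_error):
          """Pick the number of PCA components D' minimizing the sum of the
             projection error and the quantization error on a learning set."""
          Xc = X - X.mean(axis=0)
          _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
          best, best_err = None, float("inf")
          for dp in candidate_dims:
              Z = Xc @ Vt[:dp].T                       # keep the top-D' components
              proj_err = (s[dp:] ** 2).sum() / len(X)  # energy discarded by the projection
              total = proj_err + quantization_error(Z)
              if total < best_err:
                  best, best_err = dp, total
          return best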

My comment:
  The way they visualize the VLAD descriptors is clear and intuitive; good visualization always makes research clearer.
  The paper would be more complete if more timing experiments were reported.