Fei-Fei, Li, and Pietro Perona. "A bayesian hierarchical model for learning natural scene categories." Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 2. IEEE, 2005.
Motivation:
Hand-annotating images is tedious and expensive. They therefore propose an approach that recognizes natural scene categories through unsupervised learning.

Contributions:
An algorithm that learns relevant intermediate representations of scenes automatically and without supervision. It is also flexible and can group images into a sensible hierarchy.

Technical summary:

Each image is represented by codewords, the codebook entries assigned to its local patches. The goal of learning is to obtain a model that best represents the distribution of these codewords in each scene category. In recognition, they first identify all the codewords in the unknown image, then find the category model that best fits the codeword distribution of that image.
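The recognition step described above can be sketched as a maximum-likelihood choice among category models. This is a deliberately simplified stand-in, not the paper's model: it assumes each category is summarized by a single distribution over codewords, and the category names, probabilities, and function names are hypothetical toy values.

```python
import math

def log_likelihood(codeword_ids, codeword_probs):
    """Log-probability of an image's codewords under one category model."""
    return sum(math.log(codeword_probs[w]) for w in codeword_ids)

def classify(codeword_ids, category_models):
    """Return the best-fitting category and the score of every category."""
    scores = {
        name: log_likelihood(codeword_ids, probs)
        for name, probs in category_models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy models: one distribution over 4 codewords per category.
category_models = {
    "coast": [0.6, 0.2, 0.1, 0.1],
    "forest": [0.1, 0.1, 0.3, 0.5],
}

# Codeword indices found in an unknown image (toy data).
unknown_image = [3, 2, 3, 0, 3, 2]
predicted, scores = classify(unknown_image, category_models)
```

Working in log space avoids numerical underflow when an image contains many patches.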
Codebook formation

Given the collection of detected patches from the training images of all categories, they learn the codebook with k-means.
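A minimal sketch of codebook learning with plain k-means follows. It assumes patches are already reduced to small numeric descriptors; the toy 2-D descriptors, the function names, and the iteration count are illustrative assumptions, not values from the paper.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_codeword(patch, codebook):
    """Index of the codeword closest to a patch descriptor."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(patch, codebook[k]))

def learn_codebook(patches, num_codewords, num_iters=20, seed=0):
    """Plain k-means: alternate between assignment and center updates."""
    rng = random.Random(seed)
    # Initialize the centers with distinct randomly chosen patches.
    codebook = [list(p) for p in rng.sample(patches, num_codewords)]
    for _ in range(num_iters):
        clusters = [[] for _ in range(num_codewords)]
        for p in patches:
            clusters[nearest_codeword(p, codebook)].append(p)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                codebook[k] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook

# Toy descriptors forming two well-separated groups.
patches = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
           (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
codebook = learn_codebook(patches, num_codewords=2)
codeword_ids = [nearest_codeword(p, codebook) for p in patches]
```

After training, `nearest_codeword` maps each patch of a new image to its codeword index, which is the representation the model works with.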
Model Structure

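1. Choose a category label c.
2. Given c, draw a parameter π that determines the distribution of the intermediate themes; in the paper, π is drawn from a Dirichlet distribution whose parameters θ depend on the category.
3. For each of the N patches x_n in the image:
   1. Choose a theme z_n according to π.
   2. Choose a patch x_n given z_n and β, a K × T matrix of codeword probabilities, where K is the number of themes and T is the total number of codewords in the codebook.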
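The generative steps above can be sketched as a small sampler. This is an illustration under stated assumptions: the category is drawn uniformly, the theme mixture (called `pi` here) comes from a Dirichlet distribution with one parameter vector per category (`theta`), and each theme has a row of codeword probabilities (`beta`). All numeric values and function names are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_image(theta, beta, num_patches, rng):
    # Step 1: choose a category label (uniform prior assumed here).
    c = rng.randrange(len(theta))
    # Step 2: draw the theme mixture for this image given the category.
    pi = sample_dirichlet(theta[c], rng)
    themes, codewords = [], []
    for _ in range(num_patches):
        # Step 3.1: choose a theme according to the mixture.
        z = rng.choices(range(len(pi)), weights=pi)[0]
        # Step 3.2: choose a codeword from that theme's distribution.
        x = rng.choices(range(len(beta[z])), weights=beta[z])[0]
        themes.append(z)
        codewords.append(x)
    return c, pi, themes, codewords

# Toy setup: 2 categories, K = 3 themes, T = 4 codewords.
theta = [[5.0, 1.0, 1.0],
         [1.0, 1.0, 5.0]]
beta = [[0.7, 0.1, 0.1, 0.1],
        [0.1, 0.4, 0.4, 0.1],
        [0.1, 0.1, 0.1, 0.7]]

rng = random.Random(0)
c, pi, themes, codewords = generate_image(theta, beta, num_patches=10, rng=rng)
```

Learning runs this process in reverse: given only the observed codewords and the category label, it estimates `theta` and `beta`.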
Bayesian Decision
Given x, they want to compute the probability of each scene class.
Therefore, the goal is to maximize the log-likelihood term log p(x|θ, β, c) by estimating the optimal θ and β. Using Jensen's inequality, the log likelihood can be bounded from below. They maximize this lower bound L(γ, φ; θ, β) with respect to the variational parameters γ and φ, and then estimate the model parameters θ and β in turn with the EM algorithm.
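For reference, the bound has the standard variational form sketched below. It assumes a variational distribution q(π, z | γ, φ) over the theme mixture π and the theme assignments z; the symbols π, z, and q are introduced here for illustration and follow the usual notation for this family of models.

```latex
\begin{align}
\log p(x \mid \theta, \beta, c)
  &= \log \int \sum_{z} p(x, \pi, z \mid \theta, \beta, c)\, d\pi \\
  &= \log \int \sum_{z} q(\pi, z \mid \gamma, \phi)\,
     \frac{p(x, \pi, z \mid \theta, \beta, c)}{q(\pi, z \mid \gamma, \phi)}\, d\pi \\
  &\ge \mathbb{E}_{q}\big[\log p(x, \pi, z \mid \theta, \beta, c)\big]
     - \mathbb{E}_{q}\big[\log q(\pi, z \mid \gamma, \phi)\big]
   = L(\gamma, \phi; \theta, \beta)
\end{align}
```

The inequality is Jensen's inequality applied to the concave logarithm. The gap between the two sides is the KL divergence between q and the true posterior over π and z, so tightening the bound in γ and φ also improves the posterior approximation.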

With clear visualizations, the paper gives an intuitive view of the distribution of the 40 intermediate themes and the distribution of codewords. In addition, for incorrectly categorized images, the significant codewords of the model tend to occur less often. This is a useful finding: it suggests that not enough reliable codewords were found in those images.

Based on the theme distributions, they demonstrate that the model can group images into a hierarchy that carries semantic meaning.

Although they did not run related algorithms on the same dataset for comparison, the results are still convincing that their unsupervised method performs well.