Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29.6 (2012): 82-97.
Motivation:
Gaussian mixture models (GMMs) are used to determine how well each state of each HMM fits a frame, or a short window of frames, of coefficients that represents the acoustic input. However, GMMs have a serious drawback: they are statistically inefficient for modeling data that lie on or near a nonlinear manifold in the data space. Deep neural network methods have been shown to outperform GMMs on a variety of speech recognition benchmarks.
Contributions:
The paper demonstrates the progress of DNN-based acoustic modeling, presenting the shared views of four research groups on how DNNs can replace GMMs in HMM-based speech recognition systems.
Technical summarization:
Restricted Boltzmann machine (RBM):
An RBM consists of a layer of stochastic binary "visible" units that represent the binary input data, connected to a layer of stochastic binary "hidden" units that learn to model significant dependencies between the visible units. It is a type of MRF whose connectivity forms a bipartite graph: there are no visible-visible or hidden-hidden connections, which makes inference easy and allows an efficient approximate learning algorithm such as contrastive divergence (CD).
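To make the training procedure concrete, here is a minimal sketch of one contrastive divergence (CD-1) update for a binary RBM, assuming NumPy; the function and variable names are illustrative and not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, lr=0.01):
    """One CD-1 step for a binary RBM on a batch of visible vectors v0."""
    # Positive phase: sample hidden units given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # Negative phase: one step of Gibbs sampling (reconstruction).
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)

    # Approximate gradient of the log-likelihood (contrastive divergence).
    batch = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid
```

Because the graph is bipartite, all hidden units can be sampled in parallel given the visible units (and vice versa), which is what makes CD so much cheaper than exact maximum-likelihood learning.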
Stacking RBMs to make a deep belief network:
For real-valued data such as acoustic coefficients, a Gaussian-Bernoulli RBM (GRBM) is adopted as the first layer. By stacking RBMs, the resulting deep belief network (DBN) can represent progressively more complex statistical structure in the data. After learning a DBN by training a stack of RBMs, the learned weights can be used to initialize all the feature-detecting layers of a deterministic feedforward DNN; a final softmax output layer is then added and the whole DNN is trained discriminatively.
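A hypothetical sketch of this greedy layer-wise recipe, reusing the sigmoid and cd1_update helpers above: each RBM is trained on the hidden activations of the layer below, and the learned weights then initialize a feedforward net topped with a softmax. For simplicity every layer here is a binary RBM, whereas the paper would use a GRBM for the real-valued input layer; shapes and names are illustrative.

```python
def pretrain_dbn(data, layer_sizes, epochs=10, lr=0.01):
    """Greedy layer-wise pre-training: each RBM models the activations below it."""
    weights, inputs = [], data
    n_vis = data.shape[1]
    for n_hid in layer_sizes:
        W = 0.01 * rng.standard_normal((n_vis, n_hid))
        b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)
        for _ in range(epochs):
            W, b_vis, b_hid = cd1_update(inputs, W, b_vis, b_hid, lr)
        weights.append((W, b_hid))
        # Hidden probabilities of this RBM become the "data" for the next RBM.
        inputs = sigmoid(inputs @ W + b_hid)
        n_vis = n_hid
    return weights

def dnn_forward(x, weights, W_softmax, b_softmax):
    """Deterministic feedforward pass through the pre-trained layers plus softmax."""
    for W, b in weights:
        x = sigmoid(x @ W + b)
    logits = x @ W_softmax + b_softmax
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # per-class probabilities
```

In the speech setting the softmax outputs correspond to HMM states, and the whole stack (pre-trained weights plus the randomly initialized softmax layer) is then fine-tuned discriminatively with backpropagation.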
My comment:
Phonetic classification and recognition on TIMIT:
TIMIT is a benchmark dataset for speech recognition. It is always helpful to find a benchmark dataset related to our research, because many existing techniques have already been tested on it, which greatly reduces the time spent reproducing others' work. For each type of DBN-DNN, the architecture that performed best on the development set is reported.
This paper devotes a lot of attention to pre-training, including the much faster approximate learning method, contrastive divergence (CD). Many recent CNN methods likewise rely on pre-training on ImageNet; pre-training indeed saves related work from tedious training and helps reduce overfitting. However, parallelizing the fine-tuning of DNNs is still a major issue. Combining this with the ideas in "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding" may help reduce fine-tuning time.

