Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29.6 (2012): 82-97.
Motivation:
Gaussian mixture models (GMMs) are used to determine how well each state of each HMM fits a frame, or a short window of frames, of coefficients that represents the acoustic input. However, GMMs have a serious drawback: they are statistically inefficient for modeling data that lie on or near a nonlinear manifold in the data space. Deep neural network methods have been shown to outperform GMMs on a variety of speech recognition benchmarks.
Contributions:
The paper demonstrates the progress of DNN-based acoustic modeling, presenting the shared views of four research groups on how DNNs can replace GMMs in HMM-based speech recognition systems.
Technical summarization:
Restricted Boltzmann machine (RBM):
An RBM consists of a layer of stochastic binary "visible" units that represent the binary input data, connected to a layer of stochastic binary "hidden" units that learn to model significant dependencies between the visible units. It is a type of MRF whose connectivity forms a bipartite graph: there are no visible-visible or hidden-hidden connections, which makes inference easy and allows an efficient approximate learning algorithm such as contrastive divergence (CD).
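To make the training procedure concrete, here is a minimal sketch of one contrastive divergence (CD-1) update for a binary RBM, assuming NumPy; the function and variable names are illustrative and not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, lr=0.01):
    """One CD-1 step for a binary RBM on a batch of visible vectors v0."""
    # Positive phase: sample hidden units given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

    # Negative phase: one step of Gibbs sampling (reconstruction).
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)

    # Approximate gradient of the log-likelihood (contrastive divergence).
    batch = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid
```

Because the graph is bipartite, all hidden units can be sampled in parallel given the visible units (and vice versa), which is what makes CD so much cheaper than exact maximum-likelihood learning.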
Stacking RBMs to make a deep belief network:
For real-valued data such as acoustic coefficients, a Gaussian-Bernoulli RBM (GRBM) is adopted as the first layer. By stacking RBMs, the resulting deep belief network (DBN) can represent progressively more complex statistical structure in the data. After learning a DBN by training a stack of RBMs, the learned weights can be used to initialize all the feature-detecting layers of a deterministic feedforward DNN; a final softmax output layer is then added and the whole DNN is trained discriminatively.
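A hypothetical sketch of this greedy layer-wise recipe, reusing the sigmoid and cd1_update helpers above: each RBM is trained on the hidden activations of the layer below, and the learned weights then initialize a feedforward net topped with a softmax. For simplicity every layer here is a binary RBM, whereas the paper would use a GRBM for the real-valued input layer; shapes and names are illustrative.

```python
def pretrain_dbn(data, layer_sizes, epochs=10, lr=0.01):
    """Greedy layer-wise pre-training: each RBM models the activations below it."""
    weights, inputs = [], data
    n_vis = data.shape[1]
    for n_hid in layer_sizes:
        W = 0.01 * rng.standard_normal((n_vis, n_hid))
        b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)
        for _ in range(epochs):
            W, b_vis, b_hid = cd1_update(inputs, W, b_vis, b_hid, lr)
        weights.append((W, b_hid))
        # Hidden probabilities of this RBM become the "data" for the next RBM.
        inputs = sigmoid(inputs @ W + b_hid)
        n_vis = n_hid
    return weights

def dnn_forward(x, weights, W_softmax, b_softmax):
    """Deterministic feedforward pass through the pre-trained layers plus softmax."""
    for W, b in weights:
        x = sigmoid(x @ W + b)
    logits = x @ W_softmax + b_softmax
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # per-class probabilities
```

In the speech setting the softmax outputs correspond to HMM states, and the whole stack (pre-trained weights plus the randomly initialized softmax layer) is then fine-tuned discriminatively with backpropagation.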
My comment:
Phonetic classification and recognition on TIMIT:
TIMIT is a benchmark dataset for speech recognition. It is always helpful to find a benchmark dataset related to our research, because many existing techniques have already been tested on it, which greatly reduces the time spent reproducing others' work. For each type of DBN-DNN, the architecture that performed best on the development set is reported.
This paper devotes a lot of attention to pre-training, including the much faster approximate learning method, contrastive divergence (CD). Many recent CNN methods likewise rely on pre-training on ImageNet; pre-training indeed saves related work from tedious training and helps reduce overfitting. However, parallelizing the fine-tuning of DNNs is still a major issue. Combining this with the ideas in "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding" may help reduce fine-tuning time.

