Han, Song, Huizi Mao, and William J. Dally. "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding." arXiv preprint arXiv:1510.00149 (2015).
Motivation:
The demand for running neural networks on embedded systems is growing rapidly. However, limited hardware resources are an obstacle to such applications.

Contributions:
They reduce the storage and energy required by large networks using pruning, trained quantization, and Huffman coding.
Technical summarization:
The three stages are described in the following parts.
Network pruning:
First, the network learns which connections are important via normal training. Second, weights below a threshold are removed. Finally, the remaining sparse connections are retrained.
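A minimal sketch of this magnitude-based pruning step, assuming a NumPy weight matrix and a hypothetical threshold set as a multiple of the layer's standard deviation (the paper tunes the threshold per layer):

import numpy as np

def prune_weights(weights, sensitivity=0.7):
    # Magnitude pruning: zero out weights whose absolute value falls
    # below a threshold; here the threshold is a multiple of the layer's
    # standard deviation (an illustrative assumption, not the paper's exact rule).
    threshold = sensitivity * np.std(weights)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

# During retraining, the mask keeps pruned connections at zero
# (e.g., multiply the gradient by the mask before each weight update).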
Trained quantization and weight sharing
They use k-means clustering to find the shared weights. Since centroid initialization impacts the quality of clustering and the few large weights are quite important, linear initialization is chosen for the centroids.
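A rough sketch of this weight-sharing step, assuming scikit-learn's KMeans and centroids linearly spaced between the minimum and maximum weight, in the spirit of the paper's linear initialization:

import numpy as np
from sklearn.cluster import KMeans

def quantize_weights(weights, n_clusters=16):
    # Linear initialization: centroids spaced evenly between min and max,
    # so the rare but important large weights still get nearby centroids.
    flat = weights.reshape(-1, 1)
    init = np.linspace(flat.min(), flat.max(), n_clusters).reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, init=init, n_init=1).fit(flat)
    # Each weight is replaced by its cluster centroid; only the small
    # codebook and the per-weight cluster indices need to be stored.
    codebook = km.cluster_centers_.flatten()
    indices = km.labels_.reshape(weights.shape)
    return codebook, indices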
Huffman coding
The main concept of Huffman coding is that more common symbols are represented with fewer bits.
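A small sketch of building such a code over the quantized cluster indices with Python's heapq; the symbol stream is assumed to be the index array from the quantization step above:

import heapq
from collections import Counter

def huffman_code(symbols):
    # Build a Huffman tree from symbol frequencies; frequent symbols
    # end up closer to the root and therefore get shorter bit strings.
    freq = Counter(symbols)
    heap = [[f, i, {s: ""}] for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, [f1 + f2, counter, merged])
        counter += 1
    return heap[0][2]  # mapping: symbol -> bit string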
My comment:
This paper provides many visualizations to crystallize the abstract weight distributions of a CNN. Two examples are discussed below.
Viewing the weight distribution as a histogram is quite straightforward. Furthermore, the bias in the distribution is clearly visible, which is concrete evidence for why Huffman coding helps.
The weight distribution of the conv3 layer, shown in the paper, forms a bimodal distribution.
Another figure in the paper shows that the overhead of the codebook is very small and often negligible. When I first saw the use of a codebook, I assumed it would cost some space, but the figure shows the space consumption is minimal. Therefore, decoding time should also not be a problem, since the codebook is quite small.