Key Techniques
- Pruning
- Weight Sharing
- Quantization
- Low-rank Approximation
- Sparse Regularization
- Distillation
Pruning Weights
- Motivated by how the real brain learns: unused synaptic connections are pruned away
- Remove weights where |weight| < threshold
- Retrain after pruning weights
- Learn effective connections by iterative pruning and retraining (a minimal sketch follows below)
- Which regularization works better for pruning, L1 or L2?
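Below is a minimal sketch of magnitude pruning with retraining, assuming PyTorch; `model`, `threshold`, `num_rounds`, `retrain_epochs`, and `train_one_epoch` are hypothetical placeholders rather than any specific paper's code.

```python
# Minimal sketch of iterative magnitude pruning with retraining (PyTorch).
import torch

def prune_by_magnitude(model, threshold):
    """Zero out weights with |weight| < threshold and return binary masks."""
    masks = {}
    for name, param in model.named_parameters():
        if "weight" in name:
            mask = (param.abs() >= threshold).float()
            param.data.mul_(mask)          # remove small-magnitude connections
            masks[name] = mask
    return masks

def apply_masks(model, masks):
    """Keep pruned connections at zero while retraining."""
    for name, param in model.named_parameters():
        if name in masks:
            param.data.mul_(masks[name])

# Iterative prune -> retrain loop (train_one_epoch is a hypothetical helper):
# for _ in range(num_rounds):
#     masks = prune_by_magnitude(model, threshold)
#     for _ in range(retrain_epochs):
#         train_one_epoch(model)
#         apply_masks(model, masks)   # re-zero pruned weights after each epoch
```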
Criteria for Pruning
- Minimum weight - prune kernels by the L2 norm of their weights
- Smallest activation - prune kernels that produce feature maps with the smallest activations (both criteria are sketched below)
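A rough illustration of the two criteria, assuming a PyTorch `Conv2d` layer named `conv` and a feature-map batch `fmap`; both names are illustrative placeholders.

```python
# Sketch of the two pruning criteria for convolutional kernels (PyTorch).
import torch

def filters_by_weight_norm(conv):
    """Rank output filters by the L2 norm of their kernel weights (minimum weight)."""
    # conv.weight: (out_channels, in_channels, kH, kW)
    norms = conv.weight.flatten(1).norm(p=2, dim=1)
    return norms.argsort()          # smallest-norm filters are pruning candidates

def filters_by_activation(fmap):
    """Rank channels by mean absolute activation (smallest activation)."""
    # fmap: (N, C, H, W) batch of feature maps
    scores = fmap.abs().mean(dim=(0, 2, 3))
    return scores.argsort()         # weakest-activation channels come first

conv = torch.nn.Conv2d(16, 32, kernel_size=3)
fmap = torch.randn(8, 32, 14, 14)
print(filters_by_weight_norm(conv)[:5])     # 5 weakest filters by weight norm
print(filters_by_activation(fmap)[:5])      # 5 weakest channels by activation
```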
Weight Sharing
- Compress the neural network with weight sharing
- Use a low-cost hash function to randomly group connection weights into hash buckets; all weights in a bucket share a single parameter value (see the sketch below)
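A minimal sketch of hash-based weight sharing in PyTorch, in the spirit of HashedNets; `HashedLinear` and its arguments are illustrative, and Python's built-in `hash` stands in for the low-cost hash function.

```python
# Sketch of hash-based weight sharing: the virtual weight matrix is larger
# than the number of trainable parameters, which are shared via hash buckets.
import torch
import torch.nn as nn

class HashedLinear(nn.Module):
    def __init__(self, in_features, out_features, num_buckets):
        super().__init__()
        # Only `num_buckets` real parameters are stored and trained.
        self.shared = nn.Parameter(torch.randn(num_buckets) * 0.01)
        # Hash each virtual weight position (i, j) into a bucket index.
        idx = torch.tensor(
            [[hash((i, j)) % num_buckets for j in range(in_features)]
             for i in range(out_features)]
        )
        self.register_buffer("idx", idx)

    def forward(self, x):
        # Reconstruct the full (virtual) weight matrix by bucket lookup.
        weight = self.shared[self.idx]      # (out_features, in_features)
        return x @ weight.t()

layer = HashedLinear(in_features=128, out_features=64, num_buckets=1000)
out = layer(torch.randn(8, 128))            # (8, 64)
```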
Quantization
- Binarize all weights and activations, turning the DNN into a Binarized Neural Network (BNN), to reduce memory consumption and increase power efficiency
- The method binarizes both weights and activations, in contrast to BinaryConnect, which binarizes only the weights (see the sketch below)
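A rough sketch of deterministic binarization with a straight-through estimator (STE) in PyTorch; the class and variable names are illustrative, and the clipped-gradient rule follows the common BNN formulation.

```python
# Sketch of binarization with a straight-through estimator:
# both weights and activations are mapped to +1 / -1 in the forward pass.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)                 # +1 / -1 (exact zeros stay 0 here)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients straight through, zeroed where |x| > 1.
        return grad_output * (x.abs() <= 1).float()

binarize = BinarizeSTE.apply

w = torch.randn(4, 4, requires_grad=True)    # real-valued "shadow" weights
a = torch.randn(4, 4, requires_grad=True)    # pre-binarization activations
y = binarize(a) @ binarize(w)                # binarized activations and weights
y.sum().backward()                           # gradients flow via the STE
```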
Low-rank Approximation
- SVD: approximate weight matrices with a truncated, low-rank factorization (see the sketch below)
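A small sketch of truncated SVD applied to a dense weight matrix in PyTorch; the matrix size and target rank are arbitrary examples.

```python
# Sketch of low-rank approximation via truncated SVD:
# W (m x n) is replaced by U_r @ diag(S_r) @ V_r^T with rank r << min(m, n).
import torch

W = torch.randn(512, 256)                    # original dense weight matrix
r = 32                                       # target rank

U, S, Vh = torch.linalg.svd(W, full_matrices=False)
W_low = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]

# Storage drops from 512*256 parameters to 512*32 + 32 + 32*256.
error = (W - W_low).norm() / W.norm()
print(f"rank-{r} relative error: {error:.3f}")
```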
Sparse Regularization
- Zero out groups of weights using a sparsity-inducing penalty such as the group lasso, as sketched below
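A minimal sketch of a group-lasso-style penalty in PyTorch that pushes whole output filters toward zero; `conv` and the coefficient `lambda_g` are hypothetical placeholders.

```python
# Sketch of a group-sparsity penalty: one group per output filter of a Conv2d,
# so entire filters can be driven to zero and removed after training.
import torch

def group_lasso_penalty(conv_weight):
    # conv_weight: (out_channels, in_channels, kH, kW)
    return conv_weight.flatten(1).norm(p=2, dim=1).sum()

conv = torch.nn.Conv2d(16, 32, kernel_size=3)
penalty = group_lasso_penalty(conv.weight)   # scalar tensor

# During training (lambda_g is a hypothetical coefficient):
# loss = task_loss + lambda_g * group_lasso_penalty(conv.weight)
```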
Distillation
- Based on the teacher-student paradigm
- Given a trained deep neural network (the ‘teacher’), build a compressed ‘student’ model with similar accuracy using quantization and distillation
- Teacher = original deep model, Student = quantized model (a distillation-loss sketch follows below)
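A minimal sketch of the standard distillation loss in PyTorch, where the student matches the teacher's softened softmax outputs in addition to the hard labels; the temperature `T`, weight `alpha`, and all tensors are illustrative placeholders, and the quantization step is omitted.

```python
# Sketch of a teacher-student distillation loss (soft targets + hard labels).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft targets: KL divergence between softened teacher and student outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)          # from the frozen teacher model
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```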
Keep Exploring!!!