"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 18, 2023

Lightweight Deep Learning - Model Tuning - Model Compression

Key Techniques

  • Pruning
  • Weight Sharing
  • Quantization
  • Low-rank Approximation
  • Sparse Regularization
  • Distillation

Pruning Weights

  • Motivated by how the real brain learns
  • Remove weights where |weight| < threshold (see the sketch after this list)
  • Retrain after pruning weights
  • Learn effective connections through iterative pruning
  • Open question: between L1 and L2, which regularization works better for pruning?
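
A minimal sketch of one pruning pass in PyTorch, assuming a toy MLP; the model, threshold, and layer sizes are illustrative, and in practice the prune-retrain cycle is repeated iteratively:

    import torch
    import torch.nn as nn

    def prune_by_magnitude(model: nn.Module, threshold: float) -> None:
        """Zero out every parameter entry whose absolute value is below the threshold."""
        with torch.no_grad():
            for param in model.parameters():
                mask = (param.abs() >= threshold).to(param.dtype)
                param.mul_(mask)  # pruned weights become exactly zero

    # Toy example: prune once, then retrain (training loop omitted) to recover accuracy.
    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    prune_by_magnitude(model, threshold=0.01)
    zeros = sum((p == 0).sum().item() for p in model.parameters())
    total = sum(p.numel() for p in model.parameters())
    print(f"Global sparsity after one pruning pass: {zeros / total:.2%}")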

Criteria for Pruning 

  • Minimum weight - Prune by the magnitude of the kernel weights (L2 norm), as sketched below
  • Smallest activation - Prune kernels that lead to feature maps with the smallest activations
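
A minimal sketch of the minimum-weight criterion, assuming a standalone Conv2d layer; the layer shape and prune ratio are illustrative:

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3)
    prune_ratio = 0.25

    # L2 norm of each output filter's kernel weights: shape (out_channels,)
    filter_norms = conv.weight.detach().flatten(1).norm(p=2, dim=1)
    num_to_prune = int(prune_ratio * conv.out_channels)
    prune_idx = filter_norms.argsort()[:num_to_prune]  # filters with the smallest norms

    with torch.no_grad():
        conv.weight[prune_idx] = 0.0  # zero out the weakest filters
        conv.bias[prune_idx] = 0.0
    print(f"Pruned {num_to_prune} of {conv.out_channels} filters by kernel L2 norm")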

Weight Sharing

  • Compress the neural network with weight sharing
  • Use a low-cost hash function to randomly group connection weights into hash buckets (see the sketch below)
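
A minimal sketch of the hashing-trick idea (HashedNets-style), where a fixed low-cost hash maps every virtual weight position to one of K shared buckets; the HashedLinear class, bucket count, and layer sizes are illustrative:

    import torch
    import torch.nn as nn

    class HashedLinear(nn.Module):
        def __init__(self, in_features: int, out_features: int, num_buckets: int, seed: int = 0):
            super().__init__()
            # Only num_buckets real parameters are stored, regardless of the layer's virtual size.
            self.buckets = nn.Parameter(torch.randn(num_buckets) * 0.01)
            # Fixed pseudo-random hash of each (out, in) position into a bucket index.
            gen = torch.Generator().manual_seed(seed)
            self.register_buffer(
                "hash_idx",
                torch.randint(0, num_buckets, (out_features, in_features), generator=gen),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            weight = self.buckets[self.hash_idx]  # expand shared buckets into a full weight matrix
            return x @ weight.t()

    layer = HashedLinear(in_features=512, out_features=256, num_buckets=4096)
    out = layer(torch.randn(8, 512))
    print(out.shape)  # torch.Size([8, 256])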

Quantization

  • Binarize all the weights and activations, turning the DNN into a Binarized Neural Network (BNN), to reduce memory consumption and increase power efficiency.
  • The method binarizes both weights and activations, in contrast to BinaryConnect, which binarizes only the weights (see the sketch below)
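
A minimal sketch of BNN-style binarization using sign() with a straight-through estimator so gradients can still flow; the Binarize and BinaryLinear names and sizes are illustrative:

    import torch
    import torch.nn as nn

    class Binarize(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return torch.sign(x)  # values in {-1, +1} (0 only for exact zeros)

        @staticmethod
        def backward(ctx, grad_output):
            (x,) = ctx.saved_tensors
            # Straight-through estimator: pass gradients where |x| <= 1, block elsewhere.
            return grad_output * (x.abs() <= 1).to(grad_output.dtype)

    class BinaryLinear(nn.Module):
        def __init__(self, in_features: int, out_features: int):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            w_bin = Binarize.apply(self.weight)  # binarize weights
            x_bin = Binarize.apply(x)            # binarize activations too (unlike BinaryConnect)
            return x_bin @ w_bin.t()

    layer = BinaryLinear(128, 64)
    y = layer(torch.randn(4, 128))
    print(y.shape)  # torch.Size([4, 64])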

Low-rank Approximation

  • SVD - approximate a weight matrix with a truncated singular value decomposition (see the sketch below)
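
A minimal sketch of replacing one dense layer with two thinner layers via truncated SVD; the layer size and rank are illustrative:

    import torch
    import torch.nn as nn

    dense = nn.Linear(1024, 1024)
    rank = 64

    U, S, Vh = torch.linalg.svd(dense.weight.detach(), full_matrices=False)
    # Keep only the top-`rank` singular components: W ~= (U_r * S_r) @ Vh_r
    A = U[:, :rank] * S[:rank]  # (1024, rank)
    B = Vh[:rank, :]            # (rank, 1024)

    low_rank = nn.Sequential(nn.Linear(1024, rank, bias=False), nn.Linear(rank, 1024))
    with torch.no_grad():
        low_rank[0].weight.copy_(B)      # first layer projects down
        low_rank[1].weight.copy_(A)      # second layer projects back up
        low_rank[1].bias.copy_(dense.bias)

    x = torch.randn(2, 1024)
    print((dense(x) - low_rank(x)).abs().max())  # approximation error of the rank-64 factorization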

Sparse Regularization

  • Zero out groups of weights using a sparsity-inducing penalty (see the group-lasso sketch below)
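
A minimal sketch of a group-sparsity (group lasso) penalty that treats each conv filter as a group; the model, data, and regularization strength are illustrative:

    import torch
    import torch.nn as nn

    def group_lasso_penalty(conv: nn.Conv2d) -> torch.Tensor:
        # One group per output filter: sum over groups of the group's L2 norm
        return conv.weight.flatten(1).norm(p=2, dim=1).sum()

    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Flatten(),
                          nn.Linear(16 * 32 * 32, 10))
    criterion = nn.CrossEntropyLoss()
    lam = 1e-4  # regularization strength (illustrative)

    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    logits = model(x)
    loss = criterion(logits, y) + lam * group_lasso_penalty(model[0])
    loss.backward()  # gradients now include the group-sparsity term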

Distillation

  • Based on the teacher-student model
  • Given a trained deep neural network (DNN), the ‘teacher’, build a compressed ‘student’ model with similar accuracy using quantization and distillation (see the sketch below)
  • Teacher = Original Deep Model, Student = Quantized Model
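
A minimal sketch of the distillation loss only: the student matches the teacher's softened outputs via KL divergence. The teacher/student architectures, temperature, and loss weights are illustrative, and the quantization of the student is omitted here:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    teacher = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))  # large, trained
    student = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))      # small, compressed
    T, alpha = 4.0, 0.7  # temperature and distillation weight

    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    with torch.no_grad():
        teacher_logits = teacher(x)  # teacher is frozen during distillation
    student_logits = student(x)

    distill = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                       F.softmax(teacher_logits / T, dim=1),
                       reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, y)
    loss = alpha * distill + (1 - alpha) * hard
    loss.backward()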



Keep Exploring!!!
