"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 17, 2018

Day #166 - Optimization for Machine Learning

Key Summary
  • Key topics are Gradient Descent, Momentum Methods, and Stochastic Optimization
Motivation
  • Numerical optimization methods enable models to learn from data by adapting their parameters - they are the basic engine behind ML techniques
  • Numerical optimization solves the learning problem by minimizing an objective function: small incremental changes to the parameters slowly decrease the objective function towards a (local) minimum
Key Notes
Gradient Descent
  • Gradient Descent - the standard method; update parameters by taking a step along the negative gradient, scaled by Alpha, the step size (learning rate)
  • The negative gradient is the direction of greatest local reduction in the objective function (steepest descent)
  • Gradient descent goes downhill, optimizing a certain local approximation of the objective function
  • Convergence theory - these bounds must hold for all objective functions in a given class
  • Theory is not an absolute prescription on what to do
  • A common illustrative case: minimizing a quadratic objective function
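The quadratic case above can be sketched in a few lines. This is a minimal illustration, not from the original notes: the matrix A, vector b, step size, and iteration count are all illustrative choices.

```python
import numpy as np

# Gradient descent on a quadratic objective
# f(x) = 0.5 * x^T A x - b^T x, whose gradient is A x - b.
# A, b, alpha, and the iteration count are illustrative choices.
A = np.array([[3.0, 0.0], [0.0, 1.0]])  # positive definite
b = np.array([1.0, 1.0])
alpha = 0.1          # step size (learning rate)
x = np.zeros(2)      # initial parameters

for _ in range(200):
    grad = A @ x - b         # gradient of the quadratic
    x = x - alpha * grad     # step along the negative gradient

# The exact minimizer solves A x = b
print(x)
```

For a quadratic, the iterates contract toward the solution of A x = b at a rate governed by the eigenvalues of (I - alpha * A), which is why the step size must be small enough for the largest eigenvalue of A.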
Momentum
  • A way to accelerate Gradient Descent
  • Accelerates along directions that point downhill consistently
  • Two variants: classical momentum and Nesterov's version
  • Momentum remembers the recent direction of travel
  • A consistently downhill direction gets reinforced
  • Tikhonov Regularization
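Classical (heavy-ball) momentum can be sketched on the same style of quadratic. The velocity vector is what "remembers" the direction; alpha and beta here are illustrative values, not prescribed ones.

```python
import numpy as np

# Classical momentum on a quadratic f(x) = 0.5 * x^T A x - b^T x.
# alpha (step size) and beta (momentum coefficient) are illustrative.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
alpha, beta = 0.1, 0.9
x = np.zeros(2)
v = np.zeros(2)          # velocity: accumulates past gradients

for _ in range(300):
    grad = A @ x - b
    v = beta * v - alpha * grad   # consistent directions reinforce
    x = x + v

print(x)
```

Nesterov's version differs in that the gradient is evaluated at the "looked-ahead" point x + beta * v rather than at x, which tightens the convergence guarantees.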
Alternative Curvature matrices
  • Generalized Gauss-Newton matrix - only applies to objectives with a certain special structure
  • Fisher information matrix
  • Empirical Fisher information matrix - works well if the eigenvectors are aligned with the coordinate axes
  • Block-diagonal approximations
  • Stochastic Optimization - subsample data from the training set
Mini-batching
  • Subsample a mini-batch of training cases
Stochastic gradient descent
  • Replace the full gradient with a mini-batch gradient estimate
  • This affects the convergence theory
  • Polyak averaging - average the iterates to reduce the noise
  • Stochastic optimization then resembles deterministic optimization
  • Stochastic Gradient - the regular gradient plus a noise term
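The ideas above (mini-batch gradients, Polyak averaging) fit in a short SGD sketch on least squares. Everything here is illustrative - the data, batch size, step size, and step count are my own choices for the example.

```python
import numpy as np

# Mini-batch SGD for least squares, with Polyak (iterate) averaging.
# The stochastic gradient is the regular gradient plus subsampling noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.arange(1.0, 6.0)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
w_avg = np.zeros(5)       # Polyak average of the iterates
alpha, batch = 0.01, 32

for t in range(1, 3001):
    idx = rng.integers(0, 1000, size=batch)          # subsample a mini-batch
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch  # noisy gradient estimate
    w = w - alpha * grad
    w_avg += (w - w_avg) / t                         # running mean of iterates

print(w, w_avg)
```

The raw iterate w keeps bouncing around the minimizer due to gradient noise; the averaged iterate w_avg smooths that noise out, which is the point of Polyak averaging.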


Happy Mastering DL!!!
