- Key topics are Gradient Descent, Momentum Methods, and Stochastic Optimization.
- Numerical optimization methods enable models to learn from data by adapting their parameters; they are the basic engine behind most ML techniques.
- Numerical optimization works by minimizing an objective function: small incremental changes to the parameters gradually decrease the objective towards a (local) minimum.
Gradient Descent
- Gradient Descent - the standard method: update the parameters by taking a step along the negative gradient, scaled by the step size alpha (the learning rate); see the sketch below.
- The negative gradient is the direction of greatest local reduction in the objective function (steepest descent).
- Each gradient step goes downhill by optimizing a local approximation of the objective function.
- Convergence theory - the bounds must hold for every objective function in a given class.
- The theory is therefore guidance, not an absolute prescription of what to do in practice.
- The standard analysis considers minimizing a quadratic objective function.
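A minimal sketch of the plain gradient-descent update on such a quadratic, assuming an illustrative gradient function grad_f (the function names, step size, and matrix values are assumptions for the example, not from the notes):

```python
import numpy as np

def gradient_descent(grad_f, theta0, alpha=0.1, num_steps=100):
    """Plain gradient descent: theta <- theta - alpha * grad_f(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(num_steps):
        theta = theta - alpha * grad_f(theta)  # step along the negative gradient
    return theta

# Example: minimize the quadratic f(theta) = 0.5 * theta^T A theta - b^T theta,
# whose gradient is A @ theta - b
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
theta_min = gradient_descent(lambda th: A @ th - b, theta0=np.zeros(2), alpha=0.2)
```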
Momentum Methods
- Momentum is a way to accelerate Gradient Descent.
- It accelerates movement along directions that point downhill consistently.
- Two variants: classical momentum and Nesterov's version (sketched below).
- Momentum remembers the recent update direction, so a direction that consistently points downhill keeps gaining speed.
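A minimal sketch of both momentum variants, reusing the illustrative quadratic from above (the hyperparameter values and function names are assumptions):

```python
import numpy as np

def momentum_descent(grad_f, theta0, alpha=0.1, beta=0.9, num_steps=100, nesterov=False):
    """Gradient descent with classical or Nesterov momentum."""
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)  # velocity: accumulates consistently downhill directions
    for _ in range(num_steps):
        if nesterov:
            g = grad_f(theta + beta * v)  # Nesterov: gradient at the look-ahead point
        else:
            g = grad_f(theta)             # classical: gradient at the current point
        v = beta * v - alpha * g
        theta = theta + v
    return theta

# Same quadratic example: gradient of 0.5 * theta^T A theta - b^T theta is A @ theta - b
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
theta_cm = momentum_descent(lambda th: A @ th - b, np.zeros(2))
theta_nag = momentum_descent(lambda th: A @ th - b, np.zeros(2), nesterov=True)
```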
- Tikhonov regularization (damping) - see the damped-step sketch after this group of bullets.
- Generalized Gauss-Newton matrix - applies only to objectives with a certain special structure (a convex loss composed with the model output).
- Fisher information matrix
- Empirical Fisher information matrix - works well if its eigenvectors are aligned with the coordinate axes.
- Block Diagonal approximations
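As a rough illustration of how Tikhonov regularization (damping) enters a curvature-based update, the sketch below solves a damped linear system for the step. The matrix G is a placeholder for whichever curvature approximation is used (generalized Gauss-Newton, Fisher, empirical Fisher, or a block-diagonal version); the values are made up for the example:

```python
import numpy as np

def damped_curvature_step(G, grad, damping=1e-2):
    """Solve (G + damping * I) @ delta = -grad for the update direction delta."""
    n = G.shape[0]
    return np.linalg.solve(G + damping * np.eye(n), -grad)

# Illustrative usage with a made-up curvature matrix and gradient
G = np.array([[2.0, 0.3], [0.3, 0.5]])
grad = np.array([0.4, -1.0])
delta = damped_curvature_step(G, grad, damping=0.1)  # parameters would move by theta + delta
```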
- Stochastic optimization - subsample the data in the training set.
- Subsample a mini-batch of training cases at each step.
- Replace the full gradient with an estimate computed on the mini-batch.
- This changes the convergence theory.
- Polyak averaging: average the iterates to reduce the effect of gradient noise.
- In many respects, stochastic optimization resembles deterministic optimization.
- A stochastic gradient is the regular gradient plus a noise term (see the sketch below).
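A minimal sketch of mini-batch SGD with Polyak (iterate) averaging, on an illustrative least-squares problem; the per-example gradient function, data, batch size, and schedule are all assumptions for the example:

```python
import numpy as np

def sgd_polyak(grad_example, X, Y, theta0, alpha=0.05, batch_size=32, num_steps=500, seed=0):
    """Mini-batch SGD; also returns the running (Polyak) average of the iterates."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    theta_avg = theta.copy()
    n = len(X)
    for t in range(1, num_steps + 1):
        idx = rng.choice(n, size=batch_size, replace=False)  # subsample a mini-batch
        g = np.mean([grad_example(theta, X[i], Y[i]) for i in idx], axis=0)
        theta = theta - alpha * g                 # noisy (stochastic) gradient step
        theta_avg += (theta - theta_avg) / t      # running average of the iterates
    return theta, theta_avg

# Illustrative usage: least squares, gradient of 0.5 * (x @ theta - y)^2 is (x @ theta - y) * x
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
Y = X @ np.array([1.5, -0.5]) + 0.1 * rng.normal(size=1000)
theta_last, theta_avg = sgd_polyak(lambda th, x, y: (x @ th - y) * x, X, Y, np.zeros(2))
```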
Happy Mastering DL!!!