Summary of Class Session
- ML algorithms work on features hand-engineered from the data
- DL learns features directly from data rather than relying on human-engineered features; representations are learned automatically
Timeline of DL Concepts
1952 - Stochastic Gradient Descent
1958 - Perceptron - Learnable Weights
1986 - Backpropagation - Multi-Layer Perceptron
1995 - Deep Convolutional Neural Networks
- Perceptron - a single neuron in a neural network; information flows forward through the network (forward propagation) and passes through a non-linear activation function
- Activation Function - Sigmoid - takes any real-valued input and transforms it into an output between 0 and 1, so it can produce probability-like outputs (e.g., classify as positive when > 0.5, negative when < 0.5); sketched in code below
- "The purpose of Activation functions is to introduce non-linearities in the network"
- Linear functions produce linear decision boundaries no matter how large the network is
- Non-linearities allow us to approximate arbitrarily complex functions
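A minimal NumPy sketch of the sigmoid to make the squashing behaviour concrete; the function name and sample inputs are my own illustration, not from the class:

```python
import numpy as np

def sigmoid(x):
    """Squash any real-valued input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # ~[0.119, 0.5, 0.881]
```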
- Forward pass of a perceptron: dot product of inputs and weights, add a bias, apply a non-linearity (see the sketch below)
- Inputs - Hidden Layers (States) - Outputs
- Densely Connected - every node in one layer is connected to every node in the next layer
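Putting the pieces together, here is a hedged sketch of a single perceptron's forward pass (dot product, bias, non-linearity); the function name and toy weights are made up for illustration:

```python
import numpy as np

def perceptron_forward(x, w, b):
    """Single perceptron: weighted sum of inputs plus bias,
    passed through a sigmoid non-linearity."""
    z = np.dot(w, x) + b              # dot product + bias
    return 1.0 / (1.0 + np.exp(-z))   # non-linear activation

x = np.array([1.0, 2.0])    # inputs
w = np.array([0.5, -0.3])   # learnable weights
b = 0.1                     # learnable bias
print(perceptron_forward(x, w, b))  # 0.5 for this toy input
```

A dense layer is the same computation with a weight matrix, which is how every input ends up connected to every neuron in the layer.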
- Objective Function (Cost Function, Empirical Loss) - measures the total loss over the entire dataset
- Cross entropy loss can be used with models that output a probability between 0 and 1
- Mean squared error loss can be used with regression models that output continuous real numbers (both losses are sketched below)
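A rough NumPy sketch of both losses; the function names and toy labels/predictions are my own, not from the lecture:

```python
import numpy as np

def cross_entropy_loss(y_true, y_pred):
    """Binary cross entropy: y_pred must be probabilities in (0, 1)."""
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def mse_loss(y_true, y_pred):
    """Mean squared error for continuous real-valued outputs."""
    return np.mean((y_true - y_pred) ** 2)

print(cross_entropy_loss(np.array([1.0, 0.0, 1.0]),
                         np.array([0.9, 0.2, 0.7])))          # ~0.228
print(mse_loss(np.array([2.0, 3.0]), np.array([2.5, 2.0])))   # 0.625
```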
- Loss Optimization - minimize the loss over the entire training set (see the training-loop sketch below)
- Loss landscapes of deep networks are highly non-convex, so finding the true global minimum is difficult
- Stable learning rates converge smoothly and avoid poor local minima; rates that are too small converge slowly, while rates that are too large overshoot and can diverge
- Adaptive optimizers (Momentum, Adagrad, Adadelta, Adam, RMSProp) adjust the learning rate during training
- Mini-batches give a more accurate estimate of the true gradient than single examples and lead to faster learning (a mini-batch SGD sketch follows)
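A self-contained sketch of mini-batch gradient descent on a toy linear-regression problem; the data, learning rate, and batch size are invented for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise (one learnable weight, no bias).
X = rng.normal(size=256)
y = 3.0 * X + 0.1 * rng.normal(size=256)

w, lr, batch_size = 0.0, 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))             # shuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        pred = w * X[batch]
        grad = 2 * np.mean((pred - y[batch]) * X[batch])  # dMSE/dw
        w -= lr * grad                        # gradient descent step
print(w)  # converges close to 3.0
```

Each step uses the average gradient over a small batch, which is noisier than using the full dataset but far cheaper per update.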
- The goal is a model that generalizes well on unseen data
- Regularization - a way to discourage models from becoming too complex. Dropout: on every training iteration, randomly drop some proportion of the hidden neurons, which discourages memorization (sketched below)
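A minimal sketch of dropout on a vector of hidden activations; I use the common "inverted dropout" rescaling, which the notes do not mention, so treat that detail as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_drop=0.5, training=True):
    """Randomly zero a proportion of hidden activations during training;
    rescale survivors so expected activations stay the same."""
    if not training:
        return h                              # use the full network at test time
    mask = rng.random(h.shape) >= p_drop      # keep each unit with prob 1 - p_drop
    return h * mask / (1.0 - p_drop)          # inverted-dropout scaling

h = np.ones(8)       # toy hidden activations
print(dropout(h))    # some entries zeroed, survivors scaled to 2.0
```

Because different neurons are dropped on every iteration, no single pathway can simply memorize the training data.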
Happy Mastering DL!!!