Summary of Class Session
- Traditional ML algorithms rely on hand-engineered features
- DL learns features directly from data rather than relying on features engineered by humans (automatic feature learning)
Timeline of DL Concepts
1952 - Stochastic Gradient Descent
1958 - Perceptron - Learnable Weights
1986 - Backpropagation - Multi-Layer Perceptron
1995 - Deep Convolutional Neural Networks (CNNs)
- Perceptron - a single neuron in a neural network; forward propagation of information through the network; non-linear activation function
- Activation Function - Sigmoid - maps any real-valued input to an output between 0 and 1, useful for producing probability outputs (e.g., threshold at 0.5)
- "The purpose of Activation functions is to introduce non-linearities in the network"
- Linear functions produce linear decisions no matter the size of the network
- Non-linearities allow us to approximate arbitrarily complex functions
- A perceptron computes a dot product of inputs and weights, adds a bias, and applies a non-linearity (see the forward-pass sketch after this list)
- Inputs - Hidden Layers (States) - Outputs
- Fully (Densely) Connected - every node in one layer is connected to every node in the next layer
- Objective Function (Cost Function, Empirical Loss) - measures the total loss over the entire dataset
- Cross Entropy Loss can be used with models that output a probability between 0 and 1
- Mean Squared Error loss can be used with regression models that output continuous real numbers
- Loss Optimization - Minimise Loss over entire training set
- The loss landscape of a deep network is non-convex, so finding the true global minimum is difficult
- Stable learning rates converge smoothly and avoid getting stuck in poor local minima
- Adaptive learning rate / optimization algorithms (Momentum, Adagrad, Adadelta, Adam, RMSProp)
- Mini-batches lead to faster learning
- Goal: the model should generalize well to unseen data
- Regularization - a way to discourage models from becoming too complex (Dropout - on every training iteration, randomly drop some proportion of the hidden neurons; discourages memorization)
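A minimal sketch of a single perceptron's forward pass (dot product, plus bias, plus non-linearity), assuming TensorFlow 1.x; the tensor names x, W, and b are illustrative and not taken from the class code:

import tensorflow as tf
x = tf.constant([[1.0, 2.0, 3.0]])          #one sample with 3 input features
W = tf.Variable(tf.random_normal([3, 1]))   #learnable weights
b = tf.Variable(tf.zeros([1]))              #learnable bias
z = tf.matmul(x, W) + b                     #dot product plus bias
y_hat = tf.nn.sigmoid(z)                    #non-linear activation -> output in (0, 1)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y_hat))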
#Activation functions (Non-Linear)
tf.nn.sigmoid(z)  #Sigmoid
tf.nn.tanh(z)  #Hyperbolic tangent
tf.nn.relu(z)  #Rectified Linear Unit
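#Quick check (illustrative, not from the class code; assumes the TF 1.x import above):
#evaluate the three activations on a sample input
z = tf.constant([-2.0, 0.0, 2.0])
with tf.Session() as sess:
    print(sess.run([tf.nn.sigmoid(z), tf.nn.tanh(z), tf.nn.relu(z)]))  #approx [0.12, 0.5, 0.88], [-0.96, 0, 0.96], [0, 0, 2]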
#Cross Entropy Loss (models that output a probability between 0 and 1)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=model.y, logits=model.pred))
#Mean Squared Error Loss (regression models that output continuous real numbers)
loss = tf.reduce_mean(tf.square(tf.subtract(model.y, model.pred)))
#Gradient Descent
#Initialize the weights randomly (as a Variable so they can be updated in place)
weights = tf.Variable(tf.random_normal(shape, stddev=sigma))
#Loop until convergence:
#compute the gradient of the loss with respect to the weights
grads = tf.gradients(ys=loss, xs=weights)
#take a small step in the direction opposite the gradient (tf.gradients returns a list)
weights_new = weights.assign(weights - lr * grads[0])
#Learning rate / optimization algorithms
tf.train.MomentumOptimizer
tf.train.AdagradOptimizer
tf.train.AdadeltaOptimizer
tf.train.AdamOptimizer
tf.train.RMSPropOptimizer
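#Hedged usage sketch (not the class's exact code): in TF 1.x these optimizers replace
#the manual weight update above by building a training op that minimizes the loss directly
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
train_op = optimizer.minimize(loss)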
#Regularization - Dropout
tf.nn.dropout(hiddenLayer, keep_prob=0.5)  #keep each hidden unit with probability 0.5 during training
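Putting the snippets above together, here is a minimal end-to-end training sketch in TensorFlow 1.x (a hedged illustration, not the class's code; the toy dataset, batch size, and variable names are assumptions):

import numpy as np
import tensorflow as tf

#Toy data: label is 1 when the two features sum to more than 1 (purely illustrative)
X_data = np.random.rand(256, 2).astype(np.float32)
y_data = (X_data.sum(axis=1) > 1.0).astype(np.float32).reshape(-1, 1)

x = tf.placeholder(tf.float32, [None, 2])
y = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.random_normal([2, 1], stddev=0.1))
b = tf.Variable(tf.zeros([1]))
pred = tf.nn.sigmoid(tf.matmul(x, W) + b)  #dot product + bias + non-linearity

loss = tf.reduce_mean(tf.square(tf.subtract(y, pred)))  #mean squared error
train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(500):
        idx = np.random.choice(len(X_data), 32)  #mini-batch of 32 samples
        sess.run(train_op, feed_dict={x: X_data[idx], y: y_data[idx]})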
Happy Mastering DL!!!