"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

November 22, 2018

Day #152 - MIT 6.S191: Introduction to Deep Learning - Class 1

Getting back into another DL course, along with code examples.

Summary of Class Session
  • Traditional ML algorithms rely on hand-engineered features
  • DL learns features directly from the data rather than having them engineered by humans
Timeline of DL Concepts
1952 - Stochastic Gradient Descent
1958 - Perceptron - Learnable Weights
1986 - Backpropagation - Multi-Layer Perceptron
1995 - Deep Convolutional Neural Networks
  • Perceptron - a single neuron in a neural network; forward propagation of information through the network; non-linear activation function
  • Activation Function - Sigmoid: transforms a real-valued input into an output between 0 and 1, so it can produce probability-like outputs (thresholded at 0.5)
  • "The purpose of Activation functions is to introduce non-linearities in the network"
  • Linear activation functions produce linear decisions regardless of network size
  • Non-linearities allow us to approximate arbitrarily complex functions
  • Dot product, bias, non-linearity (a small forward-pass sketch follows this list)
  • Inputs - Hidden Layers (States) - Outputs
  • Fully connected - every node in one layer is connected to every node in the next layer
  • Objective Function (Cost Function, Empirical Loss) - measures the total loss over the entire dataset
  • Cross entropy loss can be used with models that output a probability between 0 and 1
  • Mean squared error loss can be used with regression models that output continuous real numbers
  • Loss Optimization - Minimise Loss over entire training set
  • The loss landscape is non-convex, so finding the true global minimum is difficult
  • Stable learning rates converge smoothly and avoid getting stuck in poor local minima
  • Adaptive learning rate algorithms (Momentum, Adagrad, Adadelta, Adam, RMSProp)
  • Mini-batches lead to faster learning (see the mini-batch training sketch after the code snippets below)
  • The goal is to generalize well to unseen data
  • Regularization - a way to discourage models from becoming too complex (Dropout - on every training iteration, randomly drop some proportion of hidden neurons; discourages memorization)
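
To make the perceptron idea concrete, here is a minimal forward-pass sketch of a single neuron (dot product + bias + non-linearity), assuming TensorFlow 1.x; the input size and the names x, W, b are just illustrative, not from the lecture.

import tensorflow as tf

#Single perceptron: weighted sum of inputs plus bias, passed through a non-linearity
x = tf.placeholder(tf.float32, shape=[None, 3])         #3 input features
W = tf.Variable(tf.random_normal([3, 1], stddev=0.1))   #learnable weights
b = tf.Variable(tf.zeros([1]))                           #learnable bias
z = tf.matmul(x, W) + b                                  #dot product + bias
y = tf.nn.sigmoid(z)                                     #non-linear activation

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))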






#Activation functions (Non-Linear)
tf.nn.sigmoid(z) #Sigmoid
tf.nn.tanh(z) #Hyperbolic tangent
tf.nn.relu(z) #Rectified Linear Unit
#Binary Cross Entropy Loss
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=model.y, logits=model.pred))
#Mean Squared Error Loss
loss = tf.reduce_mean(tf.square(tf.subtract(model.y, model.pred)))
#Gradient Descent
#Initialize weights randomly (as a Variable so they can be updated)
weights = tf.Variable(tf.random_normal(shape, stddev=sigma))
#Loop until convergence: compute gradient of loss w.r.t. weights, take a step of size lr
grads = tf.gradients(ys=loss, xs=weights)
weights_new = weights.assign(weights - lr * grads[0])
#learning rate algos
tf.train.MomentumOptimizer
tf.train.AdagradOptimizer
tf.train.AdadeltaOptimizer
tf.train.AdamOptimizer
tf.train.RMSPropOptimizer
#Regularization - Dropout
tf.nn.dropout(hiddenLayer, keep_prob=0.5) #keep each hidden unit with probability 0.5
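
As a usage sketch tying loss, gradients, and mini-batches together, here is a tiny mini-batch gradient descent training loop, again assuming TensorFlow 1.x; the toy data, batch_size = 32, and the simple linear model are illustrative assumptions, not part of the lecture.

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 2])
y = tf.placeholder(tf.float32, shape=[None, 1])
W = tf.Variable(tf.random_normal([2, 1], stddev=0.1))
b = tf.Variable(tf.zeros([1]))
pred = tf.matmul(x, W) + b
loss = tf.reduce_mean(tf.square(y - pred))                          #MSE loss
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)

X_train = np.random.randn(1000, 2).astype(np.float32)               #toy data (assumption)
y_train = (X_train[:, :1] + X_train[:, 1:]) * 0.5

batch_size = 32
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(500):
        idx = np.random.choice(len(X_train), batch_size)             #sample a mini-batch
        _, l = sess.run([train_op, loss],
                        feed_dict={x: X_train[idx], y: y_train[idx]})
    print("final mini-batch loss:", l)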

Happy Mastering DL!!!
