Summary of Class Session
- Traditional ML algorithms rely on hand-engineered features
- DL learns features directly from data rather than relying on features engineered by humans (automatic feature learning)
Timeline of DL Concepts
1952 - Stochastic Gradient Descent
1958 - Perceptron - Learnable Weights
1986 - Backpropagation - Multi-Layer Perceptron
1995 - Deep Convolutional Neural Networks (CNNs)
- Perceptron - a single neuron in a neural network; forward propagation of information through the network; non-linear activation function
- Activation Function - Sigmoid - maps any real-valued input to an output between 0 and 1, useful for producing probability outputs (e.g., threshold at 0.5)
- "The purpose of Activation functions is to introduce non-linearities in the network"
- Linear functions produce linear decisions no matter the size of the network
- Non-linearities allow us to approximate arbitrarily complex functions
- A perceptron computes a dot product of inputs and weights, adds a bias, and applies a non-linearity (see the forward-pass sketch after this list)
- Inputs - Hidden Layers (States) - Outputs
- Fully (Densely) Connected - every node in one layer is connected to every node in the next layer
- Objective Function (Cost Function, Empirical Loss) - measures the total loss over the entire dataset
- Cross Entropy Loss can be used with models that output a probability between 0 and 1
- Mean Squared Error loss can be used with regression models that output continuous real numbers
- Loss Optimization - Minimise Loss over entire training set
- The loss landscape of a deep network is non-convex, so finding the true global minimum is difficult
- Stable learning rates converge smoothly and avoid getting stuck in poor local minima
- Adaptive learning rate / optimization algorithms (Momentum, Adagrad, Adadelta, Adam, RMSProp)
- Mini-batches lead to faster learning
- Goal: the model should generalize well to unseen data
- Regularization - a way to discourage models from becoming too complex (Dropout - on every training iteration, randomly drop some proportion of the hidden neurons; discourages memorization)
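A minimal sketch of a single perceptron's forward pass (dot product, plus bias, plus non-linearity), assuming TensorFlow 1.x; the tensor names x, W, and b are illustrative and not taken from the class code:

import tensorflow as tf
x = tf.constant([[1.0, 2.0, 3.0]])          #one sample with 3 input features
W = tf.Variable(tf.random_normal([3, 1]))   #learnable weights
b = tf.Variable(tf.zeros([1]))              #learnable bias
z = tf.matmul(x, W) + b                     #dot product plus bias
y_hat = tf.nn.sigmoid(z)                    #non-linear activation -> output in (0, 1)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y_hat))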
#Activation functions (Non-Linear)
tf.nn.sigmoid(z)  #Sigmoid
tf.nn.tanh(z)  #Hyperbolic tangent
tf.nn.relu(z)  #Rectified Linear Unit
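#Quick check (illustrative, not from the class code; assumes the TF 1.x import above):
#evaluate the three activations on a sample input
z = tf.constant([-2.0, 0.0, 2.0])
with tf.Session() as sess:
    print(sess.run([tf.nn.sigmoid(z), tf.nn.tanh(z), tf.nn.relu(z)]))  #approx [0.12, 0.5, 0.88], [-0.96, 0, 0.96], [0, 0, 2]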
#Cross Entropy Loss (models that output a probability between 0 and 1)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=model.y, logits=model.pred))
#Mean Squared Error Loss (regression models that output continuous real numbers)
loss = tf.reduce_mean(tf.square(tf.subtract(model.y, model.pred)))
#Gradient Descent
#Initialize the weights randomly (as a Variable so they can be updated in place)
weights = tf.Variable(tf.random_normal(shape, stddev=sigma))
#Loop until convergence:
#compute the gradient of the loss with respect to the weights
grads = tf.gradients(ys=loss, xs=weights)
#take a small step in the direction opposite the gradient (tf.gradients returns a list)
weights_new = weights.assign(weights - lr * grads[0])
#Learning rate / optimization algorithms
tf.train.MomentumOptimizer
tf.train.AdagradOptimizer
tf.train.AdadeltaOptimizer
tf.train.AdamOptimizer
tf.train.RMSPropOptimizer
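#Hedged usage sketch (not the class's exact code): in TF 1.x these optimizers replace
#the manual weight update above by building a training op that minimizes the loss directly
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
train_op = optimizer.minimize(loss)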
#Regularization - Dropout
tf.nn.dropout(hiddenLayer, keep_prob=0.5)  #keep each hidden unit with probability 0.5 during training
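Putting the snippets above together, here is a minimal end-to-end training sketch in TensorFlow 1.x (a hedged illustration, not the class's code; the toy dataset, batch size, and variable names are assumptions):

import numpy as np
import tensorflow as tf

#Toy data: label is 1 when the two features sum to more than 1 (purely illustrative)
X_data = np.random.rand(256, 2).astype(np.float32)
y_data = (X_data.sum(axis=1) > 1.0).astype(np.float32).reshape(-1, 1)

x = tf.placeholder(tf.float32, [None, 2])
y = tf.placeholder(tf.float32, [None, 1])
W = tf.Variable(tf.random_normal([2, 1], stddev=0.1))
b = tf.Variable(tf.zeros([1]))
pred = tf.nn.sigmoid(tf.matmul(x, W) + b)  #dot product + bias + non-linearity

loss = tf.reduce_mean(tf.square(tf.subtract(y, pred)))  #mean squared error
train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(500):
        idx = np.random.choice(len(X_data), 32)  #mini-batch of 32 samples
        sess.run(train_op, feed_dict={x: X_data[idx], y: y_data[idx]})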
Happy Mastering DL!!!