"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 13, 2019

Day #188 - An Introduction to LSTMs in TensorFlow

Neural Networks
  • Input layer - hidden layer - output layer
  • Each layer can be rewritten as a matrix multiplication
  • Activation functions - non-linear transformations; real-world data is non-linear, so they give the model non-linear capacity
  • Iterate until convergence along the direction of descent - SGD
  • Backprop to compute the gradients
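
As a concrete illustration of these bullets, a minimal sketch in the TF 1.x graph API (the API current at the time of this talk; the toy XOR data and all names below are my own): a hidden layer is a matrix multiplication plus a non-linearity, and SGD with backprop fits it.

    import numpy as np
    import tensorflow as tf

    # Graph inputs
    x = tf.placeholder(tf.float32, [None, 2])
    y = tf.placeholder(tf.float32, [None, 1])

    # Hidden layer = matrix multiplication + non-linear activation
    W1 = tf.Variable(tf.random_normal([2, 4], stddev=0.5))
    b1 = tf.Variable(tf.zeros([4]))
    h = tf.nn.relu(tf.matmul(x, W1) + b1)

    # Output layer
    W2 = tf.Variable(tf.random_normal([4, 1], stddev=0.5))
    b2 = tf.Variable(tf.zeros([1]))
    logits = tf.matmul(h, W2) + b2

    loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
    # SGD steps along the direction of descent; backprop supplies the gradient
    train = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

    # XOR is not linearly separable - the non-linear hidden layer is essential
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
    Y = np.array([[0], [1], [1], [0]], dtype=np.float32)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(3000):
            sess.run(train, {x: X, y: Y})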

Modeling Sequences
  • Represent a sentence as a bag of words
  • The sentence becomes a fixed-length vector of word counts
  • BOW does not preserve word order (see the toy example below)
  • A longer feature vector with one slot per position can maintain order, but it does not scale
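
A toy illustration (my own, not from the talk) of why bag-of-words loses order - two sentences with opposite meanings get identical counts:

    from collections import Counter

    s1 = "the food was good not bad at all".split()
    s2 = "the food was bad not good at all".split()
    print(Counter(s1) == Counter(s2))  # True - same counts, opposite meaning
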
Markov Models
  • Rules
  • States
  • Transitions
  • Next-word prediction via transition probabilities in a Markov model
  • Each state depends only on the previous state (see the sketch below)
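
A hypothetical minimal sketch of next-word prediction with a first-order Markov model (the tiny corpus and all names are made up): states are words, transitions come from observed bigrams, and the next word depends only on the current one.

    import random
    from collections import defaultdict

    corpus = "the cat sat on the mat the dog sat on the".split()

    # States are words; transitions come from observed bigrams
    transitions = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        transitions[prev].append(nxt)

    # Sample a chain: each step depends only on the current state
    word = "the"
    for _ in range(5):
        word = random.choice(transitions[word])
        print(word, end=" ")
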
RNN and LSTM
  • A sentence is a sequence; model it with a function applied step by step
  • Deep sequence models are behind successes like Alexa

RNN Key Needs
  • Maintain Sequence
  • Learn the order
  • Preserve History
  • Produce output as a function of the previous state
RNN
  • Weights W and U stay the same across all time steps
  • The cell state at time n contains information from all past time steps
  • So the network computes a function of all previous states
  • Machine translation - two RNNs in an encoder-decoder model
  • The last cell state is a fixed-length representation of the whole sentence
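
A minimal NumPy sketch of the vanilla RNN recurrence these bullets describe (dimensions and data are made up): the same W and U are applied at every step, so the final state is a function of the entire sequence.

    import numpy as np

    hidden, feat, steps = 8, 4, 6
    W = np.random.randn(hidden, feat) * 0.1    # input-to-hidden, shared across time
    U = np.random.randn(hidden, hidden) * 0.1  # hidden-to-hidden, shared across time
    s = np.zeros(hidden)                       # initial cell state

    xs = np.random.randn(steps, feat)          # a toy input sequence
    for x in xs:
        s = np.tanh(W @ x + U @ s)             # each state folds in all past steps
    # s is now a fixed-length summary of the sequence (the "last cell state")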

Train RNN
  • Backpropagation
  • Adds a time dimension to backpropagation
  • The chain rule for RNNs must account for the dependency on previous states
  • The cell state depends on all previous cell states
  • This is backpropagation through time (BPTT); see the expansion below
  • Hard to train due to the vanishing gradient problem
  • In practice only short-term dependencies get captured
  • Partial fix: initialize the weights differently
  • Gated cells (really effective): recurrent units with several steps of logic gates
  • Gates decide what information to keep via elementwise multiplication
  • LSTM functions - forgetting, selective updates, and outputting certain parts of the cell state
  • Fixed-length encoding is a bottleneck for encoder-decoder models; the solution is to attend over all encoder states
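
To see where the vanishing gradient comes from, expand the BPTT chain rule (a standard derivation; s_t is the cell state and E_t the loss at step t):

    \frac{\partial E_t}{\partial W}
      = \sum_{k=0}^{t} \frac{\partial E_t}{\partial s_t}
        \left( \prod_{j=k+1}^{t} \frac{\partial s_j}{\partial s_{j-1}} \right)
        \frac{\partial s_k}{\partial W}

The long product of Jacobians shrinks toward zero when their norms sit below one, so gradients from distant time steps vanish and only short-term dependencies survive.

And a hedged NumPy sketch of one LSTM step (the standard gate equations, biases omitted for brevity; all names are illustrative), showing the forget / selective-update / output functions from the bullets above:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h, c, Wf, Wi, Wc, Wo):
        z = np.concatenate([h, x])
        f = sigmoid(Wf @ z)                  # forget gate: what to drop from the cell
        i = sigmoid(Wi @ z)                  # input gate: what to selectively update
        c_new = f * c + i * np.tanh(Wc @ z)  # gated update of the cell state
        o = sigmoid(Wo @ z)                  # output gate: which parts of the cell to expose
        h_new = o * np.tanh(c_new)
        return h_new, c_new

    # Toy usage
    hidden, feat = 4, 3
    rng = np.random.RandomState(0)
    Wf, Wi, Wc, Wo = (rng.randn(hidden, hidden + feat) * 0.1 for _ in range(4))
    h, c = np.zeros(hidden), np.zeros(hidden)
    h, c = lstm_step(rng.randn(feat), h, c, Wf, Wi, Wc, Wo)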

TensorFlow
  • DL Framework
  • GPU Acceleration
  • Code Reusability
  • TPU
TF Basics
  • Session
  • Computation Graph
  • Feed data in, get results out (see the sketch below)
  • Variables, Sessions, Tensors
  • Perceptron classifier
  • Share weights
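
A minimal sketch of that workflow in the TF 1.x API (the version this talk targets): build a computation graph, then execute it in a session by feeding data in.

    import tensorflow as tf

    # Build the computation graph (nothing runs yet)
    a = tf.placeholder(tf.float32)
    b = tf.placeholder(tf.float32)
    c = a * b

    # The session executes the graph: feed data in, get results out
    with tf.Session() as sess:
        print(sess.run(c, feed_dict={a: 3.0, b: 4.0}))  # 12.0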
LSTM Example
  • Input Gate / Forget Gate / Update Gate / Output
  • Code - https://github.com/nicholaslocascio/bcs-lstm
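
For wiring a full LSTM in TF 1.x, the built-in cell saves writing the gates by hand; a hedged sketch (the shapes here are illustrative; the linked notebook builds out the complete example):

    import tensorflow as tf

    # Input sequences shaped [batch, time, features]
    inputs = tf.placeholder(tf.float32, [None, 20, 8])
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=32)
    outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
    # outputs: the hidden state at every time step
    # state: the final (c, h) pair - a fixed-length encoding of the sequence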

Next Talks List
Yann LeCun - How does the brain learn so much so quickly? (CCN 2017)
Frank Hutter and Joaquin Vanschoren: Automatic Machine Learning (NeurIPS 2018 Tutorial)
Fernanda Viégas and Martin Wattenberg: Visualization for Machine Learning (NeurIPS 2018 Tutorial)
Memory: why it matters and how it works
The Neuroscience of Emotions
NIPS 2018 Videos


Happy Mastering DL!!!
