"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

November 23, 2018

Day #153 - Sequence Modelling with Neural Networks

  • Sequence modelling in Google Translate
  • Self-parking cars - a sequence modelling problem
Challenges
  • Sequence modelling - predict the next word
  • Most ML models are not designed for sequences
  • A feed-forward network (FFN) specifies the size of the input at the outset (fixed)
  • Sequences are variable-length inputs
  • Need to use all the information available in the sequence while still producing a fixed-length vector
  • Bag of words (BoW): each slot represents a word and the value is its number of occurrences, so the vector size stays the same (sketched after this list)
  • Sequential (order) information is lost in BoW
  • Goal: preserve the sequence order but also maintain a fixed length
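
A minimal sketch of the bag-of-words idea above (a toy example of my own, assuming the vocabulary is built from the corpus itself): every sentence maps to a fixed-length count vector regardless of its length, but the two sentences below, which mean opposite things, produce the same vector because word order is discarded.

    from collections import Counter

    # Toy corpus: same words, opposite meanings
    corpus = ["the food was good not bad at all",
              "the food was bad not good at all"]

    vocab = sorted({w for sent in corpus for w in sent.split()})

    def bow_vector(sentence, vocab):
        counts = Counter(sentence.split())
        return [counts[w] for w in vocab]   # one slot per word, value = occurrences

    for sent in corpus:
        print(bow_vector(sent, vocab))      # both sentences print the same vector
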
To Model Sequences
  • Deal with variable-length sequences
  • Maintain sequence order
  • Keep track of long-term dependencies
  • Share parameters across the sequence
RNN (Recurrent Neural Network)
  • Architecture similar to a standard NN
  • Each hidden unit applies a slightly different function
  • Hidden unit - a function of the current input and its own previous output (cell state)
  • At each time step, the input plus the previous cell state forms the new input
  • Parameter sharing is taken care of - the same weights are reused at every time step
  • Sn contains information from all past time steps (sketched below)
  • Helps capture long-term dependencies
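
A minimal NumPy sketch of the RNN idea above (illustrative names and sizes of my own choosing, not code from the post): the same three weight matrices are shared across every time step, the state s is a function of the current input and the previous state, and the final s summarises the whole variable-length sequence.

    import numpy as np

    rng = np.random.default_rng(0)
    input_dim, hidden_dim, output_dim = 8, 16, 4

    W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input -> hidden
    W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden -> hidden (recurrence)
    W_hy = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # hidden -> output

    def rnn_forward(inputs):
        """inputs: a list of vectors, one per time step (any length)."""
        s = np.zeros(hidden_dim)                # initial cell state
        outputs = []
        for x in inputs:                        # same weights reused at every step
            s = np.tanh(W_xh @ x + W_hh @ s)    # new state = f(current input, previous state)
            outputs.append(W_hy @ s)
        return outputs, s                       # final s carries information from all past steps

    sequence = [rng.normal(size=input_dim) for _ in range(10)]
    outputs, final_state = rnn_forward(sequence)
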
Train RNN
  • Similar to training a NN
  • Backpropagation through time (gradient descent - take the derivative of the loss with respect to each parameter and shift the parameters in the opposite direction to minimise the loss)
  • Loss at each time step; total loss = sum of the losses at every time step
  • Vanishing gradient problem - as the number of time steps increases, the chain of gradients gets longer and the product of many small factors shrinks towards zero (toy illustration below)
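
A toy illustration of the vanishing gradient point (my own numbers, not from the post): backpropagation through time multiplies one factor per time step into the gradient, so when those factors are smaller than 1 the contribution from distant time steps shrinks exponentially.

    # Each earlier time step contributes a product of per-step factors
    # (activation derivatives times recurrent weights) to the gradient.
    per_step_factor = 0.9
    for T in (1, 10, 50, 100):
        print(T, per_step_factor ** T)
    # 1   0.9
    # 10  ~0.35
    # 50  ~0.005
    # 100 ~0.00003  -> signal from early time steps effectively disappears
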
Methods to Address the Vanishing Gradient Problem in RNNs
  • Activation functions (ReLU, tanh, sigmoid)
  • Initializing the weights to something like the identity matrix (prevents the repeated product from shrinking - see the sketch after this list)
  • Add more complex cells (gated cells)
  • Plain RNN vs LSTM, GRU
  • Long Short-Term Memory (keeps the memory unchanged for many time steps)
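
A small sketch of the first two tricks in the list above (my own illustration): a ReLU activation keeps the derivative at 1 for positive inputs, and initialising the recurrent weight matrix near the identity stops the repeated matrix product from shrinking at the start of training.

    import numpy as np

    hidden_dim = 16
    relu = lambda x: np.maximum(0.0, x)   # derivative is 1 for positive inputs

    # Recurrent weights initialised to (approximately) the identity matrix,
    # so multiplying by W_hh many times does not immediately shrink the state.
    W_hh = np.eye(hidden_dim) + np.random.default_rng(0).normal(
        scale=0.01, size=(hidden_dim, hidden_dim))
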
LSTMs Overview
  • 3-step process
  • Step 1 - Forget the irrelevant parts of the previous state (forget gate)
  • Step 2 - Selectively update the cell state (kept separate from what is outputted)
  • Step 3 - Output certain parts of the cell state
  • The 3 steps are implemented using gates
  • Gates are implemented using sigmoid functions
  • The cell-state update happens through an additive function
  • The final cell state summarises all the information from the sequence (sketched below)
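
A minimal NumPy sketch of one LSTM step following the standard LSTM equations (biases omitted for brevity; the names and sizes are my own, not the post's): sigmoid gates implement the three steps, and the additive cell-state update is what lets memory persist across many time steps.

    import numpy as np

    rng = np.random.default_rng(0)
    input_dim, hidden_dim = 8, 16

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # One weight matrix per gate, acting on [previous hidden state, current input]
    Wf, Wi, Wo, Wc = (rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
                      for _ in range(4))

    def lstm_step(x, h_prev, c_prev):
        z = np.concatenate([h_prev, x])
        f = sigmoid(Wf @ z)             # Step 1: forget gate - drop irrelevant past state
        i = sigmoid(Wi @ z)             # Step 2: input gate - choose what to update
        c_tilde = np.tanh(Wc @ z)       #         candidate values for the cell state
        c = f * c_prev + i * c_tilde    #         additive update keeps gradients flowing
        o = sigmoid(Wo @ z)             # Step 3: output gate - expose part of the cell state
        h = o * np.tanh(c)
        return h, c

    h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
    for x in [rng.normal(size=input_dim) for _ in range(10)]:
        h, c = lstm_step(x, h, c)       # final c summarises the whole sequence
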
Applications
  • Music generation using RNNs
  • Machine translation (two RNNs side by side - encoder / decoder)
  • The encoder's final cell state is passed to the decoder, which figures out the meaning and produces the sentence in a different language
  • With attention in machine translation we take a weighted sum of all previous encoder cell states (sketched below)
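
A toy sketch of the attention idea in the last bullet (my own illustration with random vectors): instead of handing the decoder only the encoder's final cell state, score every encoder state against the current decoder state, softmax the scores, and take the weighted sum as the context for the next output word.

    import numpy as np

    rng = np.random.default_rng(0)
    hidden_dim, src_len = 16, 7

    encoder_states = rng.normal(size=(src_len, hidden_dim))  # one state per source word
    decoder_state = rng.normal(size=hidden_dim)               # current decoder state

    scores = encoder_states @ decoder_state                   # similarity score per source step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                   # softmax -> attention weights

    context = weights @ encoder_states                         # weighted sum of all encoder states
    # The decoder combines `context` with its own state to produce the next word.
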



Happy Mastering DL!!!
