"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

March 05, 2023

LSTM One Pager

At its core, an LSTM preserves information from inputs that have already passed through it, carried forward in its hidden state and cell state.

Loss Computation Steps

  • Many-to-many RNN: the loss is computed at each time step and summed (or averaged) over the sequence.
  • Many-to-one RNN: the decision (and loss) is based on the final hidden state of the network (see the sketch below).
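
A minimal PyTorch sketch to contrast the two loss computations (toy dimensions and targets are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy setup (hypothetical sizes): batch of 4 sequences, 10 time steps,
# 8-dim inputs, 16-dim hidden state, 5 output classes.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
classifier = nn.Linear(16, 5)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 10, 8)
outputs, h_n = rnn(x)            # outputs: (4, 10, 16), h_n: (1, 4, 16)

# Many-to-many: a prediction (and loss term) at every time step.
step_targets = torch.randint(0, 5, (4, 10))
logits_all = classifier(outputs)                       # (4, 10, 5)
loss_many_to_many = criterion(logits_all.reshape(-1, 5),
                              step_targets.reshape(-1))

# Many-to-one: decide from the final hidden state only.
seq_targets = torch.randint(0, 5, (4,))
logits_last = classifier(h_n[-1])                      # (4, 5)
loss_many_to_one = criterion(logits_last, seq_targets)
```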

LSTM

  • Forget irrelevant parts of the previous cell state
  • Selectively update cell state values
  • Output certain parts of the cell state (see the sketch below)
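
A rough NumPy sketch of a single LSTM step, showing which gate does what (weight layout and gate ordering here are assumptions for illustration, not any specific library's convention):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4*hidden, hidden+input), b: (4*hidden,).
    Assumed gate order: forget, input, candidate, output."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[:hidden])              # forget gate: drop irrelevant parts of previous cell state
    i = sigmoid(z[hidden:2*hidden])      # input gate: decide which cell values to update
    g = np.tanh(z[2*hidden:3*hidden])    # candidate values for the update
    o = sigmoid(z[3*hidden:])            # output gate: which parts of the cell state to expose
    c = f * c_prev + i * g               # selectively update the cell state
    h = o * np.tanh(c)                   # output certain parts of the cell state
    return h, c
```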

LSTM / GRU

  • LSTM (Long Short-Term Memory): has three gates (input, output and forget).
  • GRU (Gated Recurrent Unit): has two gates (reset and update).
  • Unlike the LSTM, the GRU exposes its complete memory (there is no separate cell state), which can be an advantage in some applications.
  • GRUs have fewer parameters, so they often train faster and can perform better than LSTMs when training data is limited (see the sketch below).
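
A quick way to see why GRUs are lighter, using PyTorch's built-in modules with made-up sizes (the exact counts depend on the chosen dimensions):

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# Same (hypothetical) sizes for both; the LSTM has 4 weight blocks per layer
# (3 gates + candidate) while the GRU has 3, so the GRU is roughly 3/4 the size.
lstm = nn.LSTM(input_size=128, hidden_size=256)
gru = nn.GRU(input_size=128, hidden_size=256)
print(n_params(lstm), n_params(gru))
```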

Gradient Updates

  • If the gradients are large (exploding gradients), learning diverges. Solution: clip the gradients to a maximum value (see the sketch below).
  • If the gradients are small (vanishing gradients), learning becomes very slow or stops. Solution: introduce memory via gated architectures such as LSTM or GRU.
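
A minimal sketch of where gradient clipping sits in a training step (the model, optimizer and dummy loss are placeholders; the clipping call is PyTorch's standard `clip_grad_norm_`):

```python
import torch
import torch.nn as nn

# Hypothetical model/optimizer; the point is where clipping sits in the loop.
model = nn.LSTM(input_size=8, hidden_size=16)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(10, 4, 8)        # (seq_len, batch, input)
output, _ = model(x)
loss = output.pow(2).mean()      # dummy loss, just to produce gradients

optimizer.zero_grad()
loss.backward()
# Clip exploding gradients to a maximum norm before the parameter update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```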

Unidirectional vs BiLSTM

  • A unidirectional LSTM only preserves information about the past, because the only inputs it has seen are from the past.
  • A BiLSTM runs two networks: one processes the sequence forward (access to past context) and the other in reverse (access to future context), and their outputs are combined (see the sketch below).
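
A small sketch (toy shapes, made up for illustration) showing that the bidirectional version concatenates the forward and reverse outputs, doubling the feature dimension:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 10, 8)        # (batch, seq_len, input), batch_first layout

uni = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
bi = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)

out_uni, _ = uni(x)
out_bi, _ = bi(x)
print(out_uni.shape)   # torch.Size([4, 10, 16])  - forward direction only
print(out_bi.shape)    # torch.Size([4, 10, 32])  - forward and reverse outputs concatenated
```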

Ref - Link

Keep Exploring!!!
