"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

March 05, 2023

LSTM One Pager

At its core, an LSTM preserves information from inputs that have already passed through it, using its hidden state.

Loss Computation Steps

  • Many-to-many RNN: the loss is computed at each time step, and the per-step losses are combined into the total loss.
  • Many-to-one RNN: the decision (and therefore the loss) is based on the final hidden state of the network.
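
The two loss setups above can be sketched as follows. This is a minimal illustration, not a full training loop: the per-step probabilities in `step_probs` are made-up values, and the helper names (`cross_entropy`, `many_to_many_loss`, `many_to_one_loss`) are hypothetical. Summing the per-step losses is one common choice; averaging is another.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])

def many_to_many_loss(step_probs, step_targets):
    """One loss term per time step, summed over the sequence."""
    return sum(cross_entropy(p, t) for p, t in zip(step_probs, step_targets))

def many_to_one_loss(step_probs, target):
    """Only the prediction made from the final hidden state is scored."""
    return cross_entropy(step_probs[-1], target)

# Hypothetical class probabilities for a 3-step sequence with 2 classes.
step_probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]

total = many_to_many_loss(step_probs, [0, 1, 1])  # every step has a target
final_only = many_to_one_loss(step_probs, 1)      # one target for the sequence
```

In the many-to-many case every time step contributes a gradient signal directly; in the many-to-one case the signal enters only at the last step and must flow back through the recurrence.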

LSTM

  • Forget irrelevant parts of the previous state (forget gate)
  • Selectively update cell state values (input gate)
  • Output certain parts of the cell state (output gate)
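
The three operations above map onto a single LSTM step. Below is a minimal sketch using scalars instead of vectors to keep it readable; the weight layout (a dict of `(w_x, w_h, bias)` tuples) and the weight values are assumptions made for illustration, not taken from any library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step. `w` maps each gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the previous cell state to keep
    i = sigmoid(pre("input"))         # how much of the candidate to write
    g = math.tanh(pre("candidate"))   # candidate values for the cell state
    o = sigmoid(pre("output"))        # how much of the cell state to expose

    c = f * c_prev + i * g            # forget, then selectively update
    h = o * math.tanh(c)              # output certain parts of the cell state
    return h, c

# Hypothetical weights, identical for every gate, just to run the step.
weights = {name: (0.5, 0.5, 0.0)
           for name in ("forget", "input", "candidate", "output")}

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, weights)
```

When the forget gate is close to 1 and the input gate close to 0, the cell state is carried forward almost unchanged, which is how information from earlier inputs survives many steps.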

LSTM / GRU

  • LSTM (Long Short-Term Memory): has three gates (input, output, and forget gate).
  • GRU (Gated Recurrent Unit): has two gates (reset and update gate).
  • Unlike an LSTM, a GRU has no output gate, so it exposes its complete memory (the hidden state) at every step; this can help in applications where that is an advantage.
  • GRUs have fewer parameters, so they often train faster, and they can perform better than LSTMs when training data is limited.
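
To make the comparison above concrete, here is a rough sketch of one scalar GRU step plus a parameter count for both cell types. The interpolation convention `(1 - z) * h_prev + z * h_cand` is one of two conventions in use, and the count assumes a single bias vector per block; specific frameworks may count biases differently. The layer sizes (32 inputs, 64 hidden units) are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, w):
    """One scalar GRU step. `w` maps each block name to (w_x, w_h, bias)."""
    def pre(name, h):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h + b

    z = sigmoid(pre("update", h_prev))                # how much to overwrite
    r = sigmoid(pre("reset", h_prev))                 # how much history the candidate sees
    h_cand = math.tanh(pre("candidate", r * h_prev))  # proposed new state
    # No separate cell state and no output gate: h itself is the memory.
    return (1.0 - z) * h_prev + z * h_cand

def recurrent_params(input_size, hidden_size, blocks):
    """Weights plus one bias vector for each gate or candidate block."""
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return blocks * per_block

lstm_params = recurrent_params(32, 64, blocks=4)  # 3 gates + candidate
gru_params = recurrent_params(32, 64, blocks=3)   # 2 gates + candidate
```

Under these assumptions the GRU layer has three quarters of the LSTM layer's parameters, which is the usual explanation for the faster training.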

Gradient Updates

  • If the gradients are large (exploding gradients), learning diverges. Solution: clip the gradients to a certain maximum value.
  • If the gradients are small (vanishing gradients), learning becomes very slow or stops. Solution: introduce gated memory via LSTM, GRU, etc.
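
Gradient clipping as described above can be done per element or on the overall norm. The sketch below shows both on a plain list of numbers; the function names and the example gradient values are made up for illustration. The last line shows why gradients vanish: backpropagating through many time steps multiplies many factors together, and with a factor below 1 (0.5 is assumed here) the product shrinks quickly.

```python
import math

def clip_by_value(grads, max_value):
    """Clamp every gradient entry into [-max_value, max_value]."""
    return [max(-max_value, min(max_value, g)) for g in grads]

def clip_by_norm(grads, max_norm):
    """Rescale the whole gradient if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

grads = [30.0, -40.0, 0.5]                  # hypothetical exploding gradient
clipped_value = clip_by_value(grads, 5.0)   # [5.0, -5.0, 0.5]
clipped_norm = clip_by_norm(grads, 5.0)     # same direction, norm 5

# Vanishing case: a per-step factor of 0.5 repeated over 50 time steps.
vanishing = 0.5 ** 50
```

Clipping by norm keeps the direction of the update and only shortens it, while clipping by value can change the direction because each entry is clamped independently.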

Unidirectional LSTM vs BiLSTM

  • A unidirectional LSTM preserves only information from the past, because the only inputs it has seen are from the past.
  • A BiLSTM has two networks: one reads the sequence in the forward direction and accesses past information, and the other reads it in reverse and accesses future information.
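
The difference above can be seen in a small sketch. To keep it short, a plain tanh recurrence stands in for the LSTM cell (an assumption made for brevity; a real BiLSTM would use two LSTM cells with separate weights), and the weights and inputs are made-up values.

```python
import math

def run_recurrent(sequence, w_x=0.5, w_h=0.5):
    """Return the hidden state after each step of a simple tanh recurrence."""
    h, states = 0.0, []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def bidirectional(sequence):
    """Pair a forward state and a backward state for every position."""
    forward = run_recurrent(sequence)
    # Run over the reversed input, then flip back so index t lines up with input t.
    backward = run_recurrent(sequence[::-1])[::-1]
    return list(zip(forward, backward))

a = bidirectional([1.0, 0.0, 0.0, -2.0])
b = bidirectional([1.0, 0.0, 0.0, 3.0])  # only the last input differs
```

At position 0 the forward states of `a` and `b` are identical, because the forward pass has not yet seen the last input, while the backward states differ, because the reverse pass has already read it.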

Ref - Link

Keep Exploring!!!