At its core, an LSTM preserves information from inputs that have already passed through it using the hidden state.
Loss Computation Steps
- Many-to-many RNN: the loss is computed at each time step and accumulated over the sequence.
- Many-to-one RNN: the decision is made based on the final hidden state of the network (see the sketch below for both cases).
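A minimal PyTorch sketch of the two cases; the shapes, layer sizes, and loss choice here are illustrative assumptions, not from the original notes:

```python
import torch
import torch.nn as nn

batch, seq_len, in_dim, hidden, n_classes = 8, 10, 16, 32, 5
rnn = nn.RNN(in_dim, hidden, batch_first=True)
head = nn.Linear(hidden, n_classes)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(batch, seq_len, in_dim)
outputs, h_n = rnn(x)                      # outputs: (batch, seq_len, hidden)

# Many-to-many: a prediction (and a loss term) at every time step.
targets_seq = torch.randint(0, n_classes, (batch, seq_len))
logits_seq = head(outputs)                 # (batch, seq_len, n_classes)
loss_many_to_many = loss_fn(logits_seq.reshape(-1, n_classes),
                            targets_seq.reshape(-1))

# Many-to-one: decide from the final hidden state only.
targets_one = torch.randint(0, n_classes, (batch,))
logits_one = head(h_n[-1])                 # (batch, n_classes)
loss_many_to_one = loss_fn(logits_one, targets_one)
```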
LSTM
- forget irrelevant parts of the previous cell state
- selectively update the cell state values
- output certain parts of the cell state (a single-step sketch follows this list)
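A single LSTM cell step in plain NumPy, sketching how the forget, input, and output gates play those three roles; the weight layout and dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b each hold four blocks: forget, input, candidate, output."""
    Wf, Wi, Wc, Wo = W
    Uf, Ui, Uc, Uo = U
    bf, bi, bc, bo = b

    f_t = sigmoid(Wf @ x_t + Uf @ h_prev + bf)    # forget gate: drop irrelevant parts of c_prev
    i_t = sigmoid(Wi @ x_t + Ui @ h_prev + bi)    # input gate: decide which cell values to update
    c_hat = np.tanh(Wc @ x_t + Uc @ h_prev + bc)  # candidate new cell values
    c_t = f_t * c_prev + i_t * c_hat              # selectively update the cell state
    o_t = sigmoid(Wo @ x_t + Uo @ h_prev + bo)    # output gate: expose certain parts of the cell state
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Tiny usage example with random weights.
d_in, d_h = 4, 3
rng = np.random.default_rng(0)
W = [rng.normal(size=(d_h, d_in)) for _ in range(4)]
U = [rng.normal(size=(d_h, d_h)) for _ in range(4)]
b = [np.zeros(d_h) for _ in range(4)]
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_cell_step(rng.normal(size=d_in), h, c, W, U, b)
```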
LSTM / GRU
- LSTM (Long Short-Term Memory): has three gates (input, output, and forget).
- GRU (Gated Recurrent Unit): has two gates (reset and update).
- Unlike the LSTM, the GRU exposes its complete memory (there is no output gate controlling what is exposed), which can be an advantage in some applications.
- GRUs train faster and often perform better than LSTMs when there is less training data (see the parameter-count sketch below).
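As a rough illustration of why GRUs can train faster, the smaller gate count shows up directly in the parameter count; the layer sizes below are arbitrary assumptions:

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128)  # 4 weight blocks per layer (input, forget, candidate, output)
gru = nn.GRU(input_size=64, hidden_size=128)    # 3 weight blocks per layer (reset, update, new)

count = lambda m: sum(p.numel() for p in m.parameters())
print("LSTM params:", count(lstm))  # larger
print("GRU  params:", count(gru))   # roughly 3/4 of the LSTM's
```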
Gradient Updates
- If the gradients are large (exploding gradients), learning diverges. Solution: clip the gradients to a certain maximum value, as in the sketch below.
- If the gradients are small (vanishing gradients), learning is very slow or stops. Solution: introduce memory via gated units such as LSTM or GRU.
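A hedged sketch of the exploding-gradient fix in PyTorch; the model, dummy loss, and max-norm value are placeholder assumptions:

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 20, 16)
out, _ = model(x)
loss = out.pow(2).mean()          # placeholder loss just to produce gradients

loss.backward()
# Clip gradients to a maximum global norm before the update step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```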
Unidirectional vs BiLSTM
- A unidirectional LSTM only preserves information from the past, because the only inputs it has seen are from the past.
- A BiLSTM has two networks: one reads the sequence forward to access past information, and the other reads it in reverse to access future information (see the sketch below).
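A minimal sketch of the two directions using PyTorch's bidirectional LSTM; the sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True,
                 bidirectional=True)

x = torch.randn(4, 10, 16)              # (batch, seq_len, features)
out, (h_n, c_n) = bilstm(x)

# out: (batch, seq_len, 2 * hidden) -> forward and backward outputs are
# concatenated at every time step, so each position sees past and future context.
forward_out = out[..., :32]             # reads the sequence left to right (past)
backward_out = out[..., 32:]            # reads the sequence right to left (future)
```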
Ref - Link
Keep Exploring!!!