"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 02, 2023

Transformer Notes

  • CNN / feed-forward nets - each output depends only on its own input; feed-forward nets don't remember historic input data
  • RNN - hidden-state memory; a correlation is carried from the previous input to the next input (LSTM variants add a cell state and a forget gate)
  • RNN/LSTM - learns to keep only the relevant information for making predictions and to forget non-relevant data; the cell state acts like a conveyor belt carrying information along the sequence
  • RNNs perform well when the input data is interdependent in a sequential pattern: the correlation between the previous input and the next one lets earlier outputs bias the current prediction (see the sketch after this list)
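
A minimal sketch of that hidden-state recurrence, assuming plain NumPy with toy dimensions (the sizes, weights, and inputs here are made up for illustration, not from any real model):

import numpy as np

# Toy dimensions (assumptions for illustration only)
input_dim, hidden_dim, seq_len = 4, 8, 5

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1   # input -> hidden
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1  # hidden -> hidden (the "memory" path)
b_h  = np.zeros(hidden_dim)

x = rng.normal(size=(seq_len, input_dim))  # a dummy input sequence
h = np.zeros(hidden_dim)                   # hidden state starts empty

for t in range(seq_len):
    # Each step mixes the current input with the previous hidden state,
    # which is how earlier inputs influence later predictions.
    h = np.tanh(W_xh @ x[t] + W_hh @ h + b_h)

print(h.shape)  # (8,)

The forget gate mentioned above belongs to the LSTM variant, which learns when to drop parts of this carried state instead of keeping everything.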

Transformer

  • Positional embeddings - encode the order and position of words in a sequence
  • Self attention - allows each token to dynamically weigh and integrate information from all other positions 
  • The self-attention mechanism is a type of attention mechanism that allows every element of a sequence to interact with every other element and find out which ones it should pay more attention to.
  • Multi-head attention runs multiple self-attention processes in parallel, capturing diverse aspects of the data (see the sketch after this list)
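
A minimal sketch of positional encodings plus (multi-head) self-attention, again assuming plain NumPy; the sinusoidal scheme follows the original "Attention Is All You Need" formulation, and all dimensions below are made-up toy values:

import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encodings: each position gets a unique sin/cos
    # pattern so the model can tell word order apart.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])   # even dims -> sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])   # odd dims  -> cosine
    return pe

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V):
    # Scaled dot-product attention: every token scores every other token,
    # and the scores decide how much of each token's value to mix in.
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # (seq, seq) attention scores
    weights = softmax(scores, axis=-1)
    return weights @ V                                # weighted mix of values

# Toy example
seq_len, d_model, n_heads = 6, 16, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)

# Multi-head attention = several smaller attentions in parallel, then concatenate.
# In a real transformer Q, K, V come from learned linear projections; here the
# input is simply sliced per head to keep the sketch short.
d_head = d_model // n_heads
heads = []
for h in range(n_heads):
    Q = K = V = x[:, h * d_head:(h + 1) * d_head]
    heads.append(self_attention(Q, K, V))
out = np.concatenate(heads, axis=-1)   # (seq_len, d_model)
print(out.shape)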

Keep Exploring!!!
