"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

August 22, 2021

Transformer - Let's relearn

These topics come up on and off. I was able to catch up on sliding windows, CNNs, RNNs, and LSTMs, then a bit of Transformers and how they work in vision too :)

However fast AI / ML moves, you still have to learn the basics.

Paper - Attention Is All You Need

Key Lessons

  • Representation of the sequence: input tokens are mapped to embeddings before anything else
  • Self-attention (intra-attention) relates different positions of a single sequence to compute a representation of it
  • Encoder-decoder structure
  • Encoder - maps the input to a sequence of continuous representations
  • The decoder then generates the output sequence one element at a time; positional encodings inject order information, since the model has no recurrence or convolution
  • Multi-Head Attention consists of several attention layers running in parallel (sketched below)
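
At the core of all this is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V, with multi-head attention splitting the model dimension across parallel heads. Here is a minimal NumPy sketch; the toy sizes and the random matrices standing in for the learned projections W_Q, W_K, W_V, W_O are my own assumptions, not from the paper:

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # (..., seq_q, seq_k)
        return softmax(scores) @ V                        # (..., seq_q, d_v)

    def multi_head_self_attention(X, num_heads, rng):
        # Project, split d_model across heads, attend in parallel, concatenate.
        seq_len, d_model = X.shape
        d_head = d_model // num_heads
        # Random matrices stand in for the learned W_Q, W_K, W_V, W_O.
        W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                              for _ in range(4))
        def heads(M):   # (seq, d_model) -> (num_heads, seq, d_head)
            return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
        out = scaled_dot_product_attention(heads(X @ W_q), heads(X @ W_k), heads(X @ W_v))
        return out.transpose(1, 0, 2).reshape(seq_len, d_model) @ W_o

    rng = np.random.default_rng(0)
    X = rng.standard_normal((10, 64))                              # 10 tokens, d_model = 64
    print(multi_head_self_attention(X, num_heads=8, rng=rng).shape)  # (10, 64)

The sqrt(d_k) scaling matters: without it, dot products grow with the dimension and push the softmax into regions with tiny gradients.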

Unofficial Walkthrough of Vision Transformer

  • An image is also just pixels; once pixel representations are learned, the same encoding / decoding machinery can be applied.

Transformers for Image Recognition at Scale

Key Notes

An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale

  • Split an image into fixed-size patches
  • Linearly embed each of them
  • Add position embeddings
  • Feed the resulting sequence of vectors to a standard Transformer encoder (see the sketch after this list)
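
Putting those four steps together, here is a minimal sketch of the ViT input pipeline: patchify, linearly embed, prepend a [class] token, add position embeddings. The random matrices stand in for learned weights, and the sizes are my own illustrative choices:

    import numpy as np

    def vit_input_sequence(image, patch_size, d_model, rng):
        # Turn an image into the token sequence a standard Transformer encoder expects.
        H, W, C = image.shape
        assert H % patch_size == 0 and W % patch_size == 0
        # 1. Split the image into fixed-size patches and flatten each one.
        patches = (image.reshape(H // patch_size, patch_size, W // patch_size, patch_size, C)
                        .transpose(0, 2, 1, 3, 4)
                        .reshape(-1, patch_size * patch_size * C))  # (num_patches, P*P*C)
        # 2. Linearly embed each patch (random matrix stands in for the learned projection).
        E = rng.standard_normal((patches.shape[1], d_model)) / np.sqrt(patches.shape[1])
        tokens = patches @ E                                        # (num_patches, d_model)
        # Prepend a [class] token (learned in the real model).
        cls = rng.standard_normal((1, d_model))
        tokens = np.concatenate([cls, tokens], axis=0)
        # 3. Add position embeddings (learned in the real model, random here).
        pos = rng.standard_normal(tokens.shape)
        # 4. This sequence is what gets fed to the standard Transformer encoder.
        return tokens + pos

    rng = np.random.default_rng(0)
    img = rng.standard_normal((224, 224, 3))
    seq = vit_input_sequence(img, patch_size=16, d_model=768, rng=rng)
    print(seq.shape)   # (197, 768)

The (197, 768) shape matches a 224x224 input with 16x16 patches: one [class] token plus 14 x 14 = 196 patch tokens.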

Do Vision Transformers See Like Convolutional Neural Networks?

  • The lower half of ResNet layers is similar to roughly the lowest quarter of ViT layers
  • The highest ViT layers are dissimilar to both the lower and higher ResNet layers (similarity is measured with CKA; see the sketch below)
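
That paper compares layers using Centered Kernel Alignment (CKA). A minimal linear-CKA sketch, where the random matrices are stand-ins for real layer activations:

    import numpy as np

    def linear_cka(X, Y):
        # Linear CKA between two activation matrices of shape (examples, features).
        X = X - X.mean(axis=0)                        # center each feature column
        Y = Y - Y.mean(axis=0)
        num = np.linalg.norm(Y.T @ X, 'fro') ** 2     # ||Y^T X||_F^2
        den = np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro')
        return num / den

    rng = np.random.default_rng(0)
    acts_a = rng.standard_normal((500, 256))          # stand-in layer activations
    Q, _ = np.linalg.qr(rng.standard_normal((256, 256)))
    print(linear_cka(acts_a, acts_a @ Q))             # ~1.0: same features, rotated
    print(linear_cka(acts_a, rng.standard_normal((500, 256))))  # near 0: unrelated

CKA is invariant to orthogonal transformations and isotropic scaling of the features, which is why it is a reasonable tool for comparing layers across two very different architectures.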

Keep Thinking!!!
