Transformer - Let's relearn
These topics come and go. I was able to catch up on sliding windows, CNNs, RNNs, and LSTMs, then a bit of Transformers and how they work in vision too :)
AI / ML won't let you skip the fundamentals; you still have to learn the basics.
Paper - Attention Is All You Need
Key Lessons
- Representation of the sequence
- Self-attention (intra-attention) relating different positions of a single sequence
- Encoder-decoder structure
- Encoder - maps the input sequence to a sequence of continuous representations
- The decoder then generates an output sequence one element at a time; positional encodings inject information about token order
- Multi-Head Attention consists of several attention layers running in parallel (see the sketch after this list)
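To make these concrete, here is a minimal NumPy sketch of sinusoidal positional encoding, scaled dot-product attention, and the multi-head split described in the paper. The weight matrices (W_q, W_k, W_v, W_o) and toy sizes are made up for illustration; a real implementation would use learned parameters, masking, and the layer norm / feed-forward blocks around this.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(...)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)          # (batch, len_q, len_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                                        # (batch, len_q, d_v)

def multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """Project to queries/keys/values, split d_model into parallel heads,
    attend in each head, then concatenate and project back with W_o."""
    batch, seq_len, d_model = X.shape
    d_head = d_model // num_heads

    def split_heads(T):
        # (batch, seq, d_model) -> (batch * heads, seq, d_head)
        T = T.reshape(batch, seq_len, num_heads, d_head)
        return T.transpose(0, 2, 1, 3).reshape(batch * num_heads, seq_len, d_head)

    heads = scaled_dot_product_attention(split_heads(X @ W_q),
                                         split_heads(X @ W_k),
                                         split_heads(X @ W_v))
    heads = heads.reshape(batch, num_heads, seq_len, d_head)
    heads = heads.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)
    return heads @ W_o

# Toy usage: batch of 2 sequences, length 5, d_model 8, 2 heads (sizes are hypothetical).
X = np.random.rand(2, 5, 8) + sinusoidal_positional_encoding(5, 8)
W_q = W_k = W_v = W_o = np.random.rand(8, 8) * 0.1
out = multi_head_attention(X, W_q, W_k, W_v, W_o, num_heads=2)   # (2, 5, 8)
```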
Unofficial Walkthrough of Vision Transformer
- An image is also just pixels; once pixel (patch) representations are learned, the same encoding / decoding machinery can be applied.
Transformers for Image Recognition at Scale
Key Notes
- Input image as a sequence of image patches, similar to the sequence of word embeddings
- The Vision Transformer treats an input image as a sequence of patches
- ViT can learn features hard-coded into CNNs (such as awareness of grid structure)
- Image classification with Vision Transformer
An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Split an image into fixed-size patches
- Linearly embed each of them
- Add position embeddings
- Feed the resulting sequence of vectors to a standard Transformer encoder (see the sketch below)
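As a rough illustration of those four steps, here is a NumPy sketch of splitting an image into 16x16 patches and projecting them into a token sequence with position embeddings. The projection matrix, position embeddings, and image size are hypothetical, and ViT's learnable [class] token is omitted for brevity.

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an (H, W, C) image into non-overlapping patches and flatten each,
    returning a (num_patches, patch_size * patch_size * C) sequence."""
    H, W, C = image.shape
    p = patch_size
    patches = image.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * C)

def embed_patches(image, W_proj, pos_embed, patch_size=16):
    """Linearly embed each patch and add position embeddings, giving the
    token sequence that is fed to a standard Transformer encoder."""
    patches = patchify(image, patch_size)   # (N, p*p*C)
    tokens = patches @ W_proj               # (N, d_model)
    return tokens + pos_embed               # pos_embed: (N, d_model)

# Toy usage: a 224x224 RGB image with 16x16 patches gives 14*14 = 196 tokens.
img = np.random.rand(224, 224, 3)
W_proj = np.random.rand(16 * 16 * 3, 768) * 0.02
pos = np.random.rand(196, 768) * 0.02
tokens = embed_patches(img, W_proj, pos)    # (196, 768)
```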
Do Vision Transformers See Like Convolutional Neural Networks? (tweet by @ak92501, August 20, 2021)
- pdf: https://t.co/5Yz5F2PZwO
- abs: https://t.co/bpHO2rOYDv
- Finds striking differences between the two architectures, such as ViT having more uniform representations across all layers
Do Vision Transformers See Like Convolutional Neural Networks?
- Lower half of ResNet layers are similar to roughly the lowest quarter of ViT layers
- Highest ViT layers are dissimilar to both lower and higher ResNet layers (see the CKA sketch below)
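The paper measures these layer-by-layer similarities with centered kernel alignment (CKA); below is a minimal NumPy sketch of linear CKA between two activation matrices, assuming both were computed on the same examples (one row per example). It shows the formula only, not the paper's actual code.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n, d1) and Y (n, d2),
    where rows correspond to the same n examples in both."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2
    return hsic / (np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro'))

# Toy usage: compare two random "layer activations" for the same 100 inputs.
a = np.random.rand(100, 64)
b = np.random.rand(100, 128)
print(linear_cka(a, b))   # value in [0, 1]; 1 means identical up to rotation/scaling
```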
Keep Thinking!!!