"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

March 01, 2021

Back to Basics - Fundamentals - RNN - Transformers

It takes a bit more careful *attention* to understand the crux of transformers. This lecture was useful.

Slides - Link

Session - 

Transfer Learning

  • Use a neural network pretrained on ImageNet and fine-tune it on custom data (sketch below)
  • Better performance than training from scratch
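
A minimal sketch of the idea, assuming PyTorch/torchvision are available (the 10-class head is a made-up example): load an ImageNet-pretrained ResNet-18, freeze the backbone, and replace the final layer before fine-tuning on the custom data.

    import torch.nn as nn
    import torchvision.models as models

    model = models.resnet18(pretrained=True)        # ImageNet weights
    for p in model.parameters():
        p.requires_grad = False                     # freeze the pretrained backbone
    model.fc = nn.Linear(model.fc.in_features, 10)  # new head for a hypothetical 10-class dataset
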
Convert words to vectors

  • One-hot encoding
  • Scales poorly with vocabulary size
  • Sparse and high dimensional
  • Map one-hot vectors to dense vectors (embedding matrix) - see the sketch below
  • Finding the embedding matrix - learn it as part of the task
  • Learn a language model
  • Train on a large corpus of text - e.g. Wikipedia
  • N-grams, sliding windows forming the rows
  • Binary classification - 0/1 - neighbouring word or not
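
A toy sketch of the one-hot-to-dense mapping (vocabulary and dimensions are made up for illustration): multiplying a one-hot vector by the embedding matrix E is just a row lookup.

    import numpy as np

    vocab = ["the", "cat", "sat", "on", "mat"]
    V, d = len(vocab), 3
    E = np.random.randn(V, d)            # embedding matrix, learned as part of the task

    word_id = vocab.index("cat")
    one_hot = np.zeros(V)
    one_hot[word_id] = 1.0
    assert np.allclose(one_hot @ E, E[word_id])   # one-hot times E == row lookup
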
NLP ImageNet moment - ELMo / ULMFiT

  • ELMo - bidirectional stacked LSTM
  • ULMFiT

Good Paper Read - SQuAD: 100,000+ Questions for Machine Comprehension of Text






Attention

  • Only attention, no LSTM
  • Self-attention, positional encoding, layer normalization
  • Attention and fully connected layers

Self Attention

  • Input: a sequence of vectors
  • Output: a weighted sum of the input sequence for each position

Learn weights

  • Each input vector is used in three ways (see the sketch below):
  • Compared to every other vector to compute the attention weights for its own output y_i (query)
  • Compared against by every other vector to compute the attention weight w_ij for output y_j (key)
  • Summed with the other vectors to form the result of the attention-weighted sum (value)
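
A minimal numpy sketch of basic self-attention, before any weights are learned: each input vector serves as query, key, and value, and the attention weights come from softmaxed dot products.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    x = np.random.randn(5, 4)      # sequence of 5 input vectors, dimension 4
    w = softmax(x @ x.T)           # w[i, j]: how much position i attends to position j
    y = w @ x                      # each y_i is a weighted sum of all input vectors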

Multihead attention

  • Weight matrices - query, key, value weights
  • Multiple heads of attention just mean learning different sets of query, key and value matrices simultaneously (sketch below)
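
A sketch of scaled dot-product attention with learned query/key/value weight matrices; multiple heads just repeat this with separate matrices and concatenate the outputs (all sizes here are illustrative).

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    d, heads, seq = 8, 2, 5
    x = np.random.randn(seq, d)
    outputs = []
    for _ in range(heads):                           # one set of weight matrices per head
        Wq, Wk, Wv = (np.random.randn(d, d // heads) for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        w = softmax(q @ k.T / np.sqrt(d // heads))   # scaled dot-product attention
        outputs.append(w @ v)
    y = np.concatenate(outputs, axis=-1)             # concatenated head outputs, shape (seq, d)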

Transformer

  • Self-attention layer - layer normalization - dense (feed-forward) layer, as sketched below
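
A compact PyTorch sketch of one such block, assuming the standard design with residual connections around both sub-layers (sizes are illustrative):

    import torch
    import torch.nn as nn

    class TransformerBlock(nn.Module):
        def __init__(self, d=64, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(d)
            self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))
            self.norm2 = nn.LayerNorm(d)

        def forward(self, x):                  # x: (batch, seq, d)
            a, _ = self.attn(x, x, x)          # self-attention over the sequence
            x = self.norm1(x + a)              # residual connection + layer norm
            return self.norm2(x + self.ff(x))  # residual connection + layer norm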



Layer Normalization

  • Data scaling, weight initialization
  • Rescales activations to zero mean and unit standard deviation - sketch below
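
A minimal numpy sketch of layer normalization, without the learned gain and bias parameters:

    import numpy as np

    def layer_norm(x, eps=1e-5):
        # normalize each row to zero mean and unit standard deviation across its features
        mu = x.mean(axis=-1, keepdims=True)
        sigma = x.std(axis=-1, keepdims=True)
        return (x - mu) / (sigma + eps)

    x = np.random.randn(5, 8) * 3 + 2
    print(layer_norm(x).mean(axis=-1))   # ~0 per row
    print(layer_norm(x).std(axis=-1))    # ~1 per row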

Position Embedding

  • Word embedding depends on the word
  • Position embedding depends on the position
  • Combine both and run the result through the transformer (sketch below)
  • The model can then reason about both position and content
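
A toy sketch of combining the two (learned position embeddings are assumed; all sizes are illustrative):

    import numpy as np

    V, max_len, d = 1000, 128, 64
    word_emb = np.random.randn(V, d)        # depends on the word
    pos_emb = np.random.randn(max_len, d)   # depends on the position

    token_ids = np.array([5, 42, 7])
    x = word_emb[token_ids] + pos_emb[np.arange(len(token_ids))]   # input to the transformer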

Attention is all you need

  • Translation
  • Encoder - Decoder architecture

GPT - Generative pretrained transformer

  • Generating text
  • ELMo, ULMFiT
  • Predicts the next word from the preceding words
  • GPT-2: 1.5 billion parameters

BERT

  • Bidirectional Encoder Representations from Transformers

T5 - Text to Text Transfer Transformer

  • Input and output as text streams
  • 11 billion parameters

Keep Thinking!!!
