"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

September 17, 2018

Day #129 - Neural Network for Words

Bag of Words
  • Vectorize each word as a one-hot encoded vector
  • Bag of Words (BOW) representation - the sum of the individual sparse one-hot encoded vectors (see the sketch below)
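A minimal NumPy sketch of that idea; the toy vocabulary and sentence are made up for illustration:

import numpy as np

# Toy vocabulary; the index of each word defines its one-hot position (illustrative only)
vocab = {"good": 0, "movie": 1, "not": 2, "a": 3, "did": 4, "like": 5}

def one_hot(word, vocab_size):
    v = np.zeros(vocab_size)
    v[vocab[word]] = 1.0
    return v

# Bag of Words = sum of the sparse one-hot vectors of the words in the sentence
sentence = ["good", "movie", "not", "a", "good", "movie"]
bow = sum(one_hot(w, len(vocab)) for w in sentence)
print(bow)  # word counts per vocabulary index: [2. 2. 1. 1. 0. 0.]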
Neural Network for words
  • Dense representation - each word is represented by a dense vector
  • Word2vec embeddings are learned in an unsupervised manner
  • The sum of the word2vec vectors of a sentence can serve as its feature representation
  • Convolutional filters over word embeddings act as 2-gram detectors
  • Similar words have high cosine similarity (small cosine distance) in word2vec space
  • Good embeddings + convolution let us capture higher-level meaning
  • Maximum pooling over time (just like pooling in images) - slide the convolutional filter in one direction over the input sequence, take the maximum activation and select it as the output (see the sketch below)
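A small PyTorch sketch of a 2-gram convolution over word vectors followed by max-over-time pooling; the tensor sizes and filter count are illustrative choices, not values from these notes:

import torch
import torch.nn as nn

batch, seq_len, embed_dim = 2, 7, 50          # illustrative sizes
x = torch.randn(batch, seq_len, embed_dim)    # pretrained word2vec vectors would go here

# A kernel size of 2 makes each filter a 2-gram detector over word embeddings
conv = nn.Conv1d(in_channels=embed_dim, out_channels=16, kernel_size=2)

h = torch.relu(conv(x.transpose(1, 2)))       # (batch, 16, seq_len - 1): slide in one direction
pooled, _ = h.max(dim=2)                      # max over time: keep the strongest activation per filter
print(pooled.shape)                           # torch.Size([2, 16])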
Architecture
  • 3-, 4- and 5-gram windows - for each n-gram size we learn 100 filters
  • Obtain the embedding of the input sequence
  • Apply a multi-layer perceptron on the resulting 3 x 100 = 300 max-pooled features (sketched after the paper link below)
Paper - https://arxiv.org/pdf/1408.5882.pdf
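A compact PyTorch sketch of that architecture (3/4/5-gram windows, 100 filters each, an MLP on the concatenated 300 features); the vocabulary size, embedding size and class count are placeholder values:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # 100 filters for each n-gram window size (3, 4, 5)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, 100, kernel_size=k) for k in (3, 4, 5)]
        )
        # MLP on the 3 * 100 = 300 max-pooled features
        self.classifier = nn.Sequential(
            nn.Linear(300, 100), nn.ReLU(), nn.Linear(100, num_classes)
        )

    def forward(self, tokens):                      # tokens: (batch, seq_len) word ids
        x = self.embed(tokens).transpose(1, 2)      # (batch, embed_dim, seq_len)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)         # (batch, 300)
        return self.classifier(features)

model = TextCNN()
logits = model(torch.randint(0, 10000, (4, 20)))    # 4 sentences of 20 tokens
print(logits.shape)                                  # torch.Size([4, 2])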

Apply Convolutions for Text
  • One-hot encoded characters as input
  • 1000 convolutional kernels (filters)
  • Apply the same pattern: convolution - pooling - convolution - pooling
  • Pool with a moving window and a stride of two to obtain the pooling output (see the sketch below)
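A rough PyTorch sketch of the character-level pattern (convolution then pooling, repeated, with stride-2 pooling); the alphabet size, filter counts and kernel sizes below are illustrative, not the exact numbers from the paper:

import torch
import torch.nn as nn

alphabet_size, seq_len = 70, 128                    # illustrative alphabet and sequence length
x = torch.randn(8, alphabet_size, seq_len)          # stand-in for one-hot encoded character sequences

block = nn.Sequential(
    nn.Conv1d(alphabet_size, 256, kernel_size=7), nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),          # moving window with a stride of two
    nn.Conv1d(256, 256, kernel_size=7), nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),          # convolution - pooling repeated
)
print(block(x).shape)                               # torch.Size([8, 256, 27])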
Encoder-Decoder Architecture
  • Usually combined with an attention mechanism (covered below)
  • Encoder - builds a hidden representation of the input sentence (encodes the "thought" of the sentence)
  • Types of encoders: RNN, CNN, hierarchical structures
  • Decoder - generates the target sequence (e.g. the sentence in the other language)
  • An LSTM / RNN encodes the input sentence up to the end-of-sentence token
  • Decoding is conditional language modelling
  • The output of the previous step is fed as input to the next step
  • Several LSTM layers can be stacked
  • Every decoder state receives three inputs: the previous state, the context vector and the current input token
Encoder - maps the source sequence to a hidden vector (RNN)
Decoder - performs language modelling conditioned on that vector (RNN), but with the extra inputs listed above (see the sketch below)
Prediction - conditional probability over the target vocabulary (softmax)
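A minimal PyTorch encoder-decoder skeleton along these lines (an LSTM encoder compresses the source into a hidden state, an LSTM decoder does conditional language modelling on top of it); the sizes and the teacher-forcing style inputs are illustrative assumptions:

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=8000, tgt_vocab=8000, embed_dim=256, hidden=512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)     # scores over the target vocabulary

    def forward(self, src, tgt_in):
        # Encoder: map the source sequence to a hidden vector
        _, state = self.encoder(self.src_embed(src))
        # Decoder: conditional language modelling, initialised with the encoder state;
        # at each step the previous output token is fed in as the next input (teacher forcing here)
        dec_out, _ = self.decoder(self.tgt_embed(tgt_in), state)
        return self.out(dec_out)                    # logits; softmax gives the conditional probabilities

model = Seq2Seq()
logits = model(torch.randint(0, 8000, (2, 10)), torch.randint(0, 8000, (2, 12)))
print(logits.shape)                                 # torch.Size([2, 12, 8000])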

Attention Mechanism
  • A powerful technique in neural networks
  • The encoder has its sequence of hidden states and the decoder has its own states; attention relates the two
  • Helps the decoder focus on different parts of the source sentence at each step
Compute Similarities (attention score functions; see the sketch below)
  • Additive Attention
  • Multiplicative Attention
  • Dot Product
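A tiny PyTorch sketch of the three score functions on single vectors; the dimensions and weight matrices are made up, s stands for a decoder state and h for an encoder hidden state:

import torch

d = 8
s = torch.randn(d)            # decoder state
h = torch.randn(d)            # encoder hidden state
W = torch.randn(d, d)         # learned weight matrix (multiplicative attention)
W1, W2, v = torch.randn(d, d), torch.randn(d, d), torch.randn(d)   # additive attention parameters

dot_score = s @ h                                   # dot product
mult_score = s @ W @ h                              # multiplicative (bilinear) attention
add_score = v @ torch.tanh(W1 @ s + W2 @ h)         # additive attention (a small feed-forward net)

# The scores over all encoder states are normalised with a softmax to give the attention weights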
Local Attention
  • Predicts the best position in the source sentence to attend to
Happy Learning!!!
