"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

September 17, 2018

Day #129 - Neural Network for Words

Bag of Words
  • Vectorize each word as a one-hot encoded vector
  • Bag of Words (BOW) representation - the sum of the individual sparse one-hot encoded vectors (see the sketch below)
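A minimal NumPy sketch of that idea; the toy vocabulary and sentence are made up for illustration:

import numpy as np

# Toy vocabulary; the index of each word defines its one-hot position (illustrative only)
vocab = {"good": 0, "movie": 1, "not": 2, "a": 3, "did": 4, "like": 5}

def one_hot(word, vocab_size):
    v = np.zeros(vocab_size)
    v[vocab[word]] = 1.0
    return v

# Bag of Words = sum of the sparse one-hot vectors of the words in the sentence
sentence = ["good", "movie", "not", "a", "good", "movie"]
bow = sum(one_hot(w, len(vocab)) for w in sentence)
print(bow)  # word counts per vocabulary index: [2. 2. 1. 1. 0. 0.]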
Neural Network for words
  • Dense representation - each word is represented by a dense vector
  • Word2vec embeddings are learned in an unsupervised manner
  • The sum of the word2vec vectors of a sentence can serve as its feature representation
  • Convolutional filters over word embeddings act as 2-gram detectors
  • Similar words have high cosine similarity (small cosine distance) in word2vec space
  • Good embeddings + convolution let us capture higher-level meaning
  • Maximum pooling over time (just like pooling in images) - slide the convolutional filter in one direction over the input sequence, take the maximum activation and select it as the output (see the sketch below)
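A small PyTorch sketch of a 2-gram convolution over word vectors followed by max-over-time pooling; the tensor sizes and filter count are illustrative choices, not values from these notes:

import torch
import torch.nn as nn

batch, seq_len, embed_dim = 2, 7, 50          # illustrative sizes
x = torch.randn(batch, seq_len, embed_dim)    # pretrained word2vec vectors would go here

# A kernel size of 2 makes each filter a 2-gram detector over word embeddings
conv = nn.Conv1d(in_channels=embed_dim, out_channels=16, kernel_size=2)

h = torch.relu(conv(x.transpose(1, 2)))       # (batch, 16, seq_len - 1): slide in one direction
pooled, _ = h.max(dim=2)                      # max over time: keep the strongest activation per filter
print(pooled.shape)                           # torch.Size([2, 16])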
Architecture
  • 3-, 4- and 5-gram windows - for each n-gram size we learn 100 filters
  • Obtain the embedding of the input sequence
  • Apply a multi-layer perceptron on the resulting 3 x 100 = 300 max-pooled features (sketched after the paper link below)
Paper - https://arxiv.org/pdf/1408.5882.pdf
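A compact PyTorch sketch of that architecture (3/4/5-gram windows, 100 filters each, an MLP on the concatenated 300 features); the vocabulary size, embedding size and class count are placeholder values:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=300, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # 100 filters for each n-gram window size (3, 4, 5)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, 100, kernel_size=k) for k in (3, 4, 5)]
        )
        # MLP on the 3 * 100 = 300 max-pooled features
        self.classifier = nn.Sequential(
            nn.Linear(300, 100), nn.ReLU(), nn.Linear(100, num_classes)
        )

    def forward(self, tokens):                      # tokens: (batch, seq_len) word ids
        x = self.embed(tokens).transpose(1, 2)      # (batch, embed_dim, seq_len)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)         # (batch, 300)
        return self.classifier(features)

model = TextCNN()
logits = model(torch.randint(0, 10000, (4, 20)))    # 4 sentences of 20 tokens
print(logits.shape)                                  # torch.Size([4, 2])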

Apply Convolutions for Text
  • One-hot encoded characters as input
  • 1000 convolutional kernels (filters)
  • Apply the same pattern: convolution - pooling - convolution - pooling
  • Pool with a moving window and a stride of two to obtain the pooling output (see the sketch below)
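A rough PyTorch sketch of the character-level pattern (convolution then pooling, repeated, with stride-2 pooling); the alphabet size, filter counts and kernel sizes below are illustrative, not the exact numbers from the paper:

import torch
import torch.nn as nn

alphabet_size, seq_len = 70, 128                    # illustrative alphabet and sequence length
x = torch.randn(8, alphabet_size, seq_len)          # stand-in for one-hot encoded character sequences

block = nn.Sequential(
    nn.Conv1d(alphabet_size, 256, kernel_size=7), nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),          # moving window with a stride of two
    nn.Conv1d(256, 256, kernel_size=7), nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),          # convolution - pooling repeated
)
print(block(x).shape)                               # torch.Size([8, 256, 27])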
Encoder-Decoder Architecture
  • Usually combined with an attention mechanism (covered below)
  • Encoder - builds a hidden representation of the input sentence (encodes the "thought" of the sentence)
  • Types of encoders: RNN, CNN, hierarchical structures
  • Decoder - generates the target sequence (e.g. the sentence in the other language)
  • An LSTM / RNN encodes the input sentence up to the end-of-sentence token
  • Decoding is conditional language modelling
  • The output of the previous step is fed as input to the next step
  • Several LSTM layers can be stacked
  • Every decoder state receives three inputs: the previous state, the context vector and the current input token
Encoder - maps the source sequence to a hidden vector (RNN)
Decoder - performs language modelling conditioned on that vector (RNN), but with the extra inputs listed above (see the sketch below)
Prediction - conditional probability over the target vocabulary (softmax)
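A minimal PyTorch encoder-decoder skeleton along these lines (an LSTM encoder compresses the source into a hidden state, an LSTM decoder does conditional language modelling on top of it); the sizes and the teacher-forcing style inputs are illustrative assumptions:

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=8000, tgt_vocab=8000, embed_dim=256, hidden=512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, embed_dim)
        self.tgt_embed = nn.Embedding(tgt_vocab, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)     # scores over the target vocabulary

    def forward(self, src, tgt_in):
        # Encoder: map the source sequence to a hidden vector
        _, state = self.encoder(self.src_embed(src))
        # Decoder: conditional language modelling, initialised with the encoder state;
        # at each step the previous output token is fed in as the next input (teacher forcing here)
        dec_out, _ = self.decoder(self.tgt_embed(tgt_in), state)
        return self.out(dec_out)                    # logits; softmax gives the conditional probabilities

model = Seq2Seq()
logits = model(torch.randint(0, 8000, (2, 10)), torch.randint(0, 8000, (2, 12)))
print(logits.shape)                                 # torch.Size([2, 12, 8000])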

Attention Mechanism
  • A powerful technique in neural networks
  • The encoder has its sequence of hidden states and the decoder has its own states; attention relates the two
  • Helps the decoder focus on different parts of the source sentence at each step
Compute Similarities (attention score functions; see the sketch below)
  • Additive Attention
  • Multiplicative Attention
  • Dot Product
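A tiny PyTorch sketch of the three score functions on single vectors; the dimensions and weight matrices are made up, s stands for a decoder state and h for an encoder hidden state:

import torch

d = 8
s = torch.randn(d)            # decoder state
h = torch.randn(d)            # encoder hidden state
W = torch.randn(d, d)         # learned weight matrix (multiplicative attention)
W1, W2, v = torch.randn(d, d), torch.randn(d, d), torch.randn(d)   # additive attention parameters

dot_score = s @ h                                   # dot product
mult_score = s @ W @ h                              # multiplicative (bilinear) attention
add_score = v @ torch.tanh(W1 @ s + W2 @ h)         # additive attention (a small feed-forward net)

# The scores over all encoder states are normalised with a softmax to give the attention weights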
Local Attention
  • Predicts the best position in the source sentence to attend to
Happy Learning!!!
