- Vectorize each word as a one-hot encoded vector
- Bag of Words (BOW) representation - the sum of the individual sparse one-hot encoded vectors (a minimal sketch follows below)
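A minimal numpy sketch of these two steps, assuming a toy vocabulary: each word becomes a sparse one-hot vector, and summing them gives the Bag-of-Words representation.

```python
# One-hot vectors and their sum as a Bag-of-Words representation.
# The toy vocabulary below is a made-up example.
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

def one_hot(word, vocab):
    """Return a sparse one-hot vector for a single word."""
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0
    return v

def bag_of_words(sentence, vocab):
    """BOW representation: the sum of the one-hot vectors of the words."""
    return sum(one_hot(w, vocab) for w in sentence.split() if w in vocab)

print(bag_of_words("the cat sat on the mat", vocab))  # [2. 1. 1. 1. 1.]
```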
- Dense Representation
- Each word is represented by a dense vector
- Word2Vec embeddings - learned in an unsupervised manner
- The sum of the word2vec vectors is the feature representation of a sentence
- Convolutional filters compute features over 2-grams (pairs of adjacent words)
- Similar words have a small cosine distance (high cosine similarity) in word2vec space (see the sketch below)
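A small sketch of the dense-representation idea, assuming a placeholder embedding table (random here; in practice it would come from a trained word2vec model): the sentence feature is the sum of its word vectors, and cosine similarity measures how close two words are.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"king": 0, "queen": 1, "apple": 2}
embedding = rng.normal(size=(len(vocab), 100))  # 100-dim dense word vectors

def sentence_features(words):
    """Sum of the word2vec vectors is the sentence's feature representation."""
    return sum(embedding[vocab[w]] for w in words if w in vocab)

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

feats = sentence_features(["king", "queen"])
print(feats.shape)                                    # (100,)
print(cosine_similarity(embedding[0], embedding[1]))  # similarity of two words
```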
- With good embeddings plus convolutions we can capture more high-level meaning
- Maximum pooling over time (just like in images): slide the convolutional filter in one direction over the input sequence, take the maximum activation, and select that as the output
- 3-, 4-, and 5-gram windows - for each n-gram size we learn 100 filters
- Concatenating the pooled filter outputs gives an embedding of the input sequence (3 × 100 = 300 features)
- Apply a multi-layer perceptron on those 300 features (a sketch of the full model follows below)
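A minimal PyTorch sketch of the model described above, with vocabulary size, embedding dimension and number of classes as placeholder assumptions: 100 filters for each of the 3-, 4- and 5-gram windows, max pooling over time, and an MLP on the concatenated 300 features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NgramCNN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, n_filters=100,
                 kernel_sizes=(3, 4, 5), n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One 1-D convolution per n-gram size, each learning 100 filters.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, kernel_size=k) for k in kernel_sizes]
        )
        # MLP on the 3 * 100 = 300 pooled features.
        self.mlp = nn.Sequential(
            nn.Linear(n_filters * len(kernel_sizes), 100),
            nn.ReLU(),
            nn.Linear(100, n_classes),
        )

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)  # (batch, emb_dim, seq_len)
        # Convolve, then max-pool over time: keep the maximum activation
        # of each filter across all positions in the sequence.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = torch.cat(pooled, dim=1)     # (batch, 300)
        return self.mlp(features)

model = NgramCNN()
logits = model(torch.randint(0, 10000, (8, 50)))  # batch of 8 sequences
print(logits.shape)                               # torch.Size([8, 2])
```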
Applying Convolutions to Text
- One-hot encoded characters as the input
- 1000 convolutional kernels (filters)
- Apply the same pattern: convolution - pooling - convolution - pooling
- Pooling slides a moving window with a stride of two to obtain the pooled output (see the sketch below)
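A PyTorch sketch of this character-level pattern, with the alphabet size, sequence length and kernel width chosen as placeholder assumptions: one-hot encoded characters, 1000 filters, and a convolution-pooling-convolution-pooling stack where pooling slides a window with a stride of two.

```python
import torch
import torch.nn as nn

alphabet_size, seq_len, n_filters = 70, 256, 1000

char_cnn = nn.Sequential(
    nn.Conv1d(alphabet_size, n_filters, kernel_size=7),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),   # moving window, stride of two
    nn.Conv1d(n_filters, n_filters, kernel_size=7),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
)

# One-hot encode a batch of character index sequences.
chars = torch.randint(0, alphabet_size, (4, seq_len))
one_hot = nn.functional.one_hot(chars, alphabet_size).float().transpose(1, 2)
print(char_cnn(one_hot).shape)  # (4, 1000, reduced_length)
```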
- Attention mechanism - builds on the encoder-decoder architecture
- Encoder - produces a hidden representation of the input sentence (encodes the "thought" of the sentence)
- Types of encoders (RNN, CNN, Hierarchical structures)
- Decoder - decodes the output sequence, e.g. the sentence in the other language
- An LSTM/RNN encodes the input sentence up to the end-of-sentence token
- Decoding - Conditional Language Modelling
- Feed the output of the previous step as the input to the next step
- Stack several LSTM layers
- Every decoder state receives three inputs (the previous hidden state, the context vector, and the current input token), so errors flow back along three paths
Decoder - performs language modelling conditioned on the given vector (RNN), but with more inputs and three error paths
Prediction - conditional probability over the vocabulary via a softmax (a minimal sketch follows below)
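A minimal PyTorch sketch of the encoder-decoder idea, with placeholder vocabulary sizes and dimensions and a single LSTM layer for brevity: the encoder's final state is the context, each decoder step gets the previous target token (teacher forcing here) plus that context, and a linear layer produces the logits for the softmax.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=8000, tgt_vocab=8000, emb=128, hidden=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, emb)
        self.tgt_embed = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        # Decoder input = previous output token embedding + encoder context.
        self.decoder = nn.LSTM(emb + hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)    # softmax over the vocabulary

    def forward(self, src, tgt_in):
        _, (h, c) = self.encoder(self.src_embed(src))
        context = h[-1]                            # encoder "thought" vector
        # Conditional language modelling: every decoder step sees the
        # previous target token and the fixed context vector.
        ctx = context.unsqueeze(1).expand(-1, tgt_in.size(1), -1)
        dec_in = torch.cat([self.tgt_embed(tgt_in), ctx], dim=2)
        dec_out, _ = self.decoder(dec_in, (h, c))
        return self.out(dec_out)                   # logits; softmax in the loss

model = Seq2Seq()
src = torch.randint(0, 8000, (4, 12))              # source sentences
tgt_in = torch.randint(0, 8000, (4, 10))           # shifted target tokens
print(model(src, tgt_in).shape)                    # torch.Size([4, 10, 8000])
```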
Attention Mechanism
- A powerful technique in neural networks
- The encoder has a set of hidden states; the decoder has its own states
- Attention helps the decoder focus on different parts of the input sentence at each step
- Additive Attention
- Multiplicative Attention
- Dot Product
- The attention weights predict the best place in the input to focus on (a sketch of the score variants follows below)
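A small PyTorch sketch of the three score variants listed above, computed between one decoder state and the encoder's hidden states; the dimensions and the randomly initialised weight matrices are placeholders.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_enc, d = 6, 128                    # number of encoder states, state size
enc_states = torch.randn(n_enc, d)   # encoder hidden states h_1 .. h_n
dec_state = torch.randn(d)           # one decoder state s

W = torch.randn(d, d)                # weights for multiplicative attention
Wa, Ua, v = torch.randn(d, d), torch.randn(d, d), torch.randn(d)  # additive

# Dot-product attention: score = s . h
dot_scores = enc_states @ dec_state

# Multiplicative (bilinear) attention: score = s^T W h
mult_scores = enc_states @ (W @ dec_state)

# Additive attention: score = v^T tanh(W_a s + U_a h)
add_scores = torch.tanh(enc_states @ Ua.T + Wa @ dec_state) @ v

# Softmax over the scores gives weights that say where to focus; the
# context vector is the weighted sum of the encoder states.
weights = F.softmax(dot_scores, dim=0)
context = weights @ enc_states
print(weights.shape, context.shape)  # torch.Size([6]) torch.Size([128])
```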