Paper #1 - Sequence Learning
This post captures a summary of key points from the Sequence Learning paper.
Introduction
- Multilayered Long Short-Term Memory (LSTM)
- DNNs can only be applied to problems whose inputs and targets can be sensibly encoded with vectors of fixed dimensionality
- We are mapping a sequence of words representing the question to a sequence of words representing the answer
- LSTM learns to map an input sentence of variable length into a fixed-dimensional vector representation
- Map the input sequence to a fixed-sized vector using one RNN
- The goal of the LSTM is to estimate the conditional probability p(y1, ..., yT' | x1, ..., xT) of the output sequence given the input sequence
- First, two different LSTMs are used: one for the input sequence and another for the output sequence (see the sketch after this list)
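A minimal sketch of the two-LSTM encoder-decoder described above, written in PyTorch. The vocabulary sizes, embedding/hidden dimensions, and tensor shapes are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Two separate LSTMs: one encodes the source, one decodes the target.
    The encoder's final hidden state is the fixed-dimensional summary of
    the variable-length input sequence."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode: map the variable-length input to a fixed-size (h, c) state.
        _, (h, c) = self.encoder(self.src_emb(src_ids))
        # Decode: condition the output LSTM on that fixed-size state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), (h, c))
        # Per-step logits over the target vocabulary, i.e. the terms of
        # the conditional probability p(y1..yT' | x1..xT).
        return self.out(dec_out)

# Toy usage (hypothetical shapes): batch of 2, source length 7, target length 5.
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 7))
tgt = torch.randint(0, 1200, (2, 5))
logits = model(src, tgt)  # shape: (2, 5, 1200)
```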
https://arxiv.org/pdf/1703.03906.pdf
NMT (Neural Machine Translation) - an end-to-end approach to automated translation
- Based on an encoder-decoder architecture consisting of two recurrent neural networks (RNNs) and an attention mechanism that aligns target with source tokens
- Shortcoming - the amount of compute required to train these models
- Encoder-decoder architecture with attention mechanism
- An encoder function fenc takes as input a sequence of source tokens x and produces a sequence of states h
- Decoder is an RNN that predicts the probability of a target sequence y
- The decoder RNN also uses a context vector, called the attention vector, which is calculated as a weighted average of the source states
- Commonly used attention mechanisms are the additive (Bahdanau) and multiplicative (Luong) variants
- Given an attention key h (an encoder state) and attention query s (a decoder state), the attention score for each pair is calculated and normalized with a softmax to produce the attention weights (see the sketch after this list)
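A minimal sketch of the attention step described above, using the additive (Bahdanau-style) scoring form. The layer names, hidden size, and toy shapes below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Additive attention: scores each encoder state h_j (key) against the
    current decoder state s (query), then returns the attention vector as
    the softmax-weighted average of the encoder states."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.w_key = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_query = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, decoder_state, encoder_states):
        # decoder_state:  (batch, hidden)          -- the attention query s
        # encoder_states: (batch, src_len, hidden) -- the attention keys h
        query = self.w_query(decoder_state).unsqueeze(1)                  # (batch, 1, hidden)
        scores = self.v(torch.tanh(self.w_key(encoder_states) + query))  # (batch, src_len, 1)
        weights = torch.softmax(scores, dim=1)            # aligns the target step with source tokens
        context = (weights * encoder_states).sum(dim=1)   # weighted average of source states
        return context, weights.squeeze(-1)

# Toy usage: batch of 2, source length 7, hidden size 512.
attn = AdditiveAttention(hidden_dim=512)
s = torch.randn(2, 512)
h = torch.randn(2, 7, 512)
context, weights = attn(s, h)  # context: (2, 512), weights: (2, 7)
```

The context vector returned here is what the decoder RNN consumes at each step alongside the previously generated token.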
Happy Learning!!!