"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

September 18, 2018

Day #132 - Sequence Learning Paper

Understanding NLP and deep learning requires understanding the research papers behind them. Listed below are readings and important points for my reference (copied from the papers)

Paper #1 - Sequence Learning 
Captured below is a summary of key points from the Sequence Learning paper

Introduction
  • Multilayered Long Short-Term Memory (LSTM)
  • DNNs can only be applied to problems whose inputs and targets can be sensibly encoded with vectors of fixed dimensionality
  • We are mapping a sequence of words representing the question to a sequence of words representing the answer
  • LSTM learns to map an input sentence of variable length into a fixed-dimensional vector representation
Model
  • Map the input sequence to a fixed-sized vector using one RNN
  • The goal of the LSTM is to estimate the conditional probability p(y1, ..., yT' | x1, ..., xT) of the output sequence given the input sequence
  • First, we used two different LSTMs: one for the input sequence and another for the output sequence (see the sketch after this list)
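To make the two-LSTM setup concrete, here is a minimal sketch in PyTorch (my own illustration, not code from the paper; the class name, dimensions, and vocabulary sizes are assumptions). One LSTM encodes the variable-length input into a fixed-dimensional state, and a second LSTM decodes the output sequence conditioned on that state.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    # Illustrative two-LSTM encoder-decoder, roughly in the spirit of the paper.
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512, num_layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # One LSTM reads the input sequence ...
        self.encoder = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        # ... and a separate LSTM generates the output sequence.
        self.decoder = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_tokens, tgt_tokens):
        # Encode the variable-length source into a fixed-dimensional state (h, c).
        _, (h, c) = self.encoder(self.src_emb(src_tokens))
        # Decode conditioned on that state; the logits are used to estimate
        # p(y1, ..., yT' | x1, ..., xT) with teacher forcing during training.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_tokens), (h, c))
        return self.out(dec_out)

# Usage: a batch of 2 source sentences (length 7) and target prefixes (length 5).
model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])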
Paper #2 - Massive Exploration of Neural Machine Translation Architectures
https://arxiv.org/pdf/1703.03906.pdf

NMT - an end-to-end approach to automated translation
  • Based on an encoder-decoder architecture consisting of two recurrent neural networks (RNNs) and an attention mechanism that aligns target with source tokens
  • Shortcoming - the amount of compute required to train these models
NMT
  • Encoder-decoder architecture with attention mechanism
  • An encoder function f_enc takes as input a sequence of source tokens x and produces a sequence of states h
  • Decoder is an RNN that predicts the probability of a target sequence y
  • The decoder RNN also uses a context vector, called the attention vector, which is calculated as a weighted average of the source states (see the sketch after this list)
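As a rough illustration (my own sketch, not code from the paper), the attention vector for a single decoder step is just a softmax-weighted average of the encoder states, given per-state relevance scores:

import numpy as np

def attention_vector(source_states, scores):
    # Numerically stable softmax over the scores produced by the attention mechanism.
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    # Each source state contributes in proportion to its relevance to the current decoder state.
    return weights @ source_states  # shape: (hidden_dim,)

# Example: 4 source states of dimension 3, with arbitrary scores.
h = np.random.randn(4, 3)
print(attention_vector(h, np.array([0.1, 2.0, -1.0, 0.5])))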
Attention Mechanism
  • Commonly used attention mechanisms are the additive variant and the computationally less expensive multiplicative variant
  • Given an attention key h_j (an encoder state) and an attention query s_i (a decoder state), the attention score for each pair is calculated; in the additive variant, score(h_j, s_i) = v . tanh(W1 h_j + W2 s_i) (see the sketch after this list)
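A minimal sketch of the additive (Bahdanau-style) scoring function, with weight shapes I have assumed for illustration; the resulting scores feed into the weighted-average step shown earlier:

import numpy as np

def additive_score(h, s, W1, W2, v):
    # Additive attention: score(h, s) = v . tanh(W1 @ h + W2 @ s)
    return v @ np.tanh(W1 @ h + W2 @ s)

# Example: encoder state h and decoder state s of dimension 3, attention dimension 5.
rng = np.random.default_rng(0)
h, s = rng.standard_normal(3), rng.standard_normal(3)
W1, W2, v = rng.standard_normal((5, 3)), rng.standard_normal((5, 3)), rng.standard_normal(5)
print(additive_score(h, s, W1, W2, v))  # a single scalar score for this (key, query) pair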
Paper #3 - Neural Machine Translation by Jointly Learning to Align and Translate - https://arxiv.org/pdf/1409.0473.pdf

Happy Learning!!!
