"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 22, 2018

Day #170 - RNN

Updated (May 30 / 2022) - Based on student discussions :)
  • RNN ≈ a feed-forward net (e.g. a CNN) plus a previous state / sequencing
  • LSTM: cell memory stores the state from step t-1
    • Has 3 gates - Forget, Update (Input), Output
    • Can be bidirectional for offline data (the whole sequence is available)
  • CNNs are mostly used for images; RNNs are mainly used for sequential data like video or text
Key Summary
  • Recurrent Neural Networks
  • Flexibility in architecture
  • Operate over sequences of input and output
  • Image to a sequence of words (captioning)
  • Sequence of words in, sentiment of the sentence out
  • Video classification: output is a function of all the frames
  • RNNs can also process fixed-size inputs sequentially
  • Paper - DRAW: A Recurrent Neural Network for Image Generation
  • Paper - Multiple Object Recognition with Visual Attention - sequential processing of fixed inputs
  • Arrows indicate functional dependencies
  • RNN has a state; it receives input vectors through time
  • State is held internally and modified as a function of each input; the weights live inside the RNN
  • Output is predicted based on the current state
  • New state - a collection of vectors, a function of the previous state + the current input vector
  • Single hidden state and recurrence formula (see the sketch after this list)
  • Character level language models
  • Feed a sequence of characters and ask the network to predict the next character at each step
  • One-hot representation - turn on the bit that corresponds to the character's index
  • Hidden layer summarizes all characters seen so far
  • Softmax classifier over next character
  • Same function always applied at each step
  • Initialization - the hidden state is set to zero
  • Order of the data matters; each output is a function of everything that came before it
  • Character level RNN - https://gist.github.com/karpathy/d4dee566867f8291f086
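
A minimal sketch of the recurrence and the character-level setup above (one-hot input, a single hidden state, softmax over the next character), in the spirit of the linked gist; the sizes and random initialization are illustrative, not taken from these notes.

  import numpy as np

  vocab_size, hidden_size = 65, 100                        # illustrative sizes
  Wxh = np.random.randn(hidden_size, vocab_size) * 0.01    # input -> hidden
  Whh = np.random.randn(hidden_size, hidden_size) * 0.01   # hidden -> hidden (recurrence)
  Why = np.random.randn(vocab_size, hidden_size) * 0.01    # hidden -> output
  bh, by = np.zeros((hidden_size, 1)), np.zeros((vocab_size, 1))

  def rnn_step(char_index, h_prev):
      # h_t = tanh(Wxh @ x_t + Whh @ h_{t-1} + bh): the same function applied at every step
      x = np.zeros((vocab_size, 1))
      x[char_index] = 1                                    # one-hot: turn on the bit for this character
      h = np.tanh(Wxh @ x + Whh @ h_prev + bh)
      y = Why @ h + by                                     # scores for the next character
      p = np.exp(y) / np.sum(np.exp(y))                    # softmax distribution over the next character
      return h, p

  h = np.zeros((hidden_size, 1))                           # initialization: state set to zero
  h, p = rnn_step(10, h)                                   # feed the character with index 10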


RNN
  • Input text, ordered set of characters (the vocabulary)
  • Associate an index with every character - sequence length is 25
  • The full dataset is too large to backpropagate through at once
  • Chunks of input data (25 characters)
  • Backpropagate through 25 characters at a time
  • Wxh, Whh - parameters to train
  • Sampling code generates the characters the model thinks come next (see the sketch after this list)
  • The RNN outputs a distribution over the next character
  • Adagrad Update
  • Loss function - Forward and backward method
  • Backward pass runs from step 25 all the way back to step 1
  • Backpropagate through the softmax and the activation function
  • The sample function generates new text data
  • 25 softmax losses at every batch; they all backpropagate
  • Regularization is done 
  • Loss function - Forward pass - Compute Loss, Backward pass - Compute Gradient
  • The RNN only sees indexes and sequences of indexes; it has no knowledge of characters as such
  • Quite interesting examples of poetry, formula generation, code generation
  • The examples use a three-layer LSTM
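
A hedged sketch of the sampling loop and the Adagrad update mentioned above, reusing rnn_step and the parameters from the earlier sketch; the forward/backward pass that produces the gradients over a 25-character chunk is elided, and the names are illustrative.

  def sample(h, seed_index, n):
      # generate n character indexes from the RNN's distribution over the next character
      indexes, ix = [], seed_index
      for _ in range(n):
          h, p = rnn_step(ix, h)
          ix = np.random.choice(vocab_size, p=p.ravel())   # sample the next character index
          indexes.append(ix)
      return indexes

  # Adagrad update: per-parameter step sizes scaled by the accumulated squared gradients
  learning_rate = 1e-1
  mWxh = np.zeros_like(Wxh)                                # gradient memory for Wxh
  def adagrad_update(param, dparam, mem):
      mem += dparam * dparam
      param += -learning_rate * dparam / np.sqrt(mem + 1e-8)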



Working Details
  • Character level RNN on text
  • Individual cells of the hidden state get excited (or not) depending on the input
  • Quote-detection cell (stays active between the opening and closing quote)
  • Line-length tracking cell
  • Cells that track how deep into an expression the model is (nesting)
  • RNNs are used for training sequence models
Image Captioning
  • Generate a sequence of words for an image
  • Image -> CNN
  • ConvNet Process Image
  • RNN - Remember Sequences
  • Condition the generative model on the output of the convolutional network
  • Predict Next word / Remember information
  • Word level embedding
  • Sample until end of sentence
  • Number of output dimensions = vocabulary size + 1 (END token)
  • Backpropagate through the whole thing at training time
  • Embedding starts from a one-hot representation
  • Image is plugged in at the first step (see the sketch after this list)
  • Backpropagate everything completely jointly
  • End-to-end training lets the model figure out which features best describe the image
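
A rough sketch of this conditioning, assuming a toy vocabulary and illustrative names (cnn_features, W_embed, W_ih, ...) that are not from any specific captioning codebase: the CNN feature enters at the first step, words go through an embedding, and generation stops at the END token.

  import numpy as np

  vocab = ["<END>", "a", "cat", "on", "the", "mat"]    # toy vocabulary, END token included
  V, H, D, F = len(vocab), 128, 64, 512                # vocab, hidden, embedding, CNN feature sizes
  W_embed = np.random.randn(V, D) * 0.01               # word-level embedding table
  W_wh, W_hh = np.random.randn(H, D) * 0.01, np.random.randn(H, H) * 0.01
  W_ih = np.random.randn(H, F) * 0.01                  # projects CNN features into the hidden state
  W_hy = np.random.randn(V, H) * 0.01

  def caption(cnn_features, max_len=10):
      h = np.tanh(W_ih @ cnn_features)                 # image plugged in at the first step
      word, words = 0, []                              # start from the END index as a START stand-in
      for _ in range(max_len):
          x = W_embed[word]                            # embed the previous word
          h = np.tanh(W_wh @ x + W_hh @ h)
          scores = W_hy @ h
          p = np.exp(scores) / np.sum(np.exp(scores))  # softmax over vocabulary + END token
          word = int(np.argmax(p))                     # greedy choice, just for the sketch
          if vocab[word] == "<END>":                   # sample until the end-of-sentence token
              break
          words.append(vocab[word])
      return words

  print(caption(np.random.randn(F)))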

More Architectures
  • Look back at the image and use its feature maps
  • Attention over the image
  • Soft attention (see the sketch below)
  • Selective attention over the inputs
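
A minimal sketch of soft attention over spatial feature maps, with illustrative names and sizes: the hidden state scores each location, a softmax turns the scores into weights, and the weighted sum becomes the context vector fed back to the RNN.

  import numpy as np

  H, C, L = 128, 512, 49                       # hidden size, channels, 7x7 spatial locations
  feature_map = np.random.randn(L, C)          # CNN feature map, flattened over space
  W_att = np.random.randn(C, H) * 0.01         # scores each location against the hidden state

  def soft_attention(h):
      scores = feature_map @ W_att @ h         # one score per spatial location, shape (L,)
      weights = np.exp(scores - scores.max())
      weights /= weights.sum()                 # softmax: selective attention over the inputs
      context = weights @ feature_map          # weighted sum of location features, shape (C,)
      return context, weights

  context, weights = soft_attention(np.random.randn(H))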
Multiple RNNs
  • RNNs feed into each other (stacked layers)
  • All in a single computational graph (see the sketch below)
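
A small sketch of that stacking, with illustrative sizes: the hidden state of one RNN becomes the input of the next, and since it is all one graph, gradients flow through every layer.

  import numpy as np

  def make_layer(in_size, h_size):
      return {"Wxh": np.random.randn(h_size, in_size) * 0.01,
              "Whh": np.random.randn(h_size, h_size) * 0.01,
              "h": np.zeros(h_size)}

  layers = [make_layer(65, 128), make_layer(128, 128)]      # two RNNs feeding into each other

  def step(x):
      for layer in layers:                                  # lower layer's state is the next layer's input
          layer["h"] = np.tanh(layer["Wxh"] @ x + layer["Whh"] @ layer["h"])
          x = layer["h"]
      return x                                              # the top layer's hidden state

  out = step(np.random.randn(65))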
LSTM
  • The recurrence formula is slightly more complicated
  • Concatenate the vectors, then a new formula combines them
LSTM instead of RNN
  • x (input), h (previous hidden state)
  • f - sigmoid gate - forget gate - resets some cells to zero
  • g - tanh gate
  • LSTM has hidden and cell state vector (two vectors)
  • LSTM operate over cell state


LSTM
  • f, i, g, o - n-dimensional gate vectors (see the sketch after the comparison below)
RNN vs LSTM
  • Gates computed from the hidden state operate on the cell state
  • Forget gate - reset some cells to zero
  • LSTM deals very well with the vanishing gradient problem
  • ReLU used here
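
A hedged sketch of the LSTM step described above, with illustrative sizes: the input and previous hidden state are concatenated, split into the f, i, g, o gates (each n-dimensional), and the cell state is updated additively before producing the new hidden state.

  import numpy as np

  n, d = 128, 65                                 # cell/hidden size and input size
  W = np.random.randn(4 * n, n + d) * 0.01       # one matrix produces all four gates
  b = np.zeros(4 * n)

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def lstm_step(x, h_prev, c_prev):
      z = W @ np.concatenate([h_prev, x]) + b    # concatenate, then one linear map
      f = sigmoid(z[0*n:1*n])                    # forget gate - resets some cells toward zero
      i = sigmoid(z[1*n:2*n])                    # input/update gate
      g = np.tanh(z[2*n:3*n])                    # candidate cell values (the tanh gate)
      o = sigmoid(z[3*n:4*n])                    # output gate
      c = f * c_prev + i * g                     # LSTM operates over the cell state
      h = o * np.tanh(c)                         # hidden state exposed to the rest of the network
      return h, c

  h, c = lstm_step(np.random.randn(d), np.zeros(n), np.zeros(n))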



This is a special 100th post of learning for this year, and the 170th post for Data Science. I hope this incremental learning always adds the delta for the next big idea.

Keep Mastering DL!!!
