Updated (May 30 / 2022) - Based on student discussions :)
- RNN = a feedforward net (like a CNN) plus a previous state / sequencing: the hidden state from the last step is fed back in at each step
- LSTM: - the cell memory carries information forward from step t-1 (the previous cell state)
- has 3 gates: Forget, Update (Input), Output
- Can be made bidirectional for offline data, where the whole sequence is available up front (see the sketch after this list)
- CNNs are mostly used for images; RNNs are mainly used for sequential data such as video or text
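A minimal sketch of the bidirectional idea in numpy: run a plain tanh RNN once left-to-right and once right-to-left over an offline sequence, then concatenate the states. All sizes and weight names here are illustrative, not from the lecture.

```python
import numpy as np

def rnn_pass(xs, Wx, Wh, b):
    """Run a simple tanh RNN over a list of input vectors, return all hidden states."""
    h = np.zeros((Wh.shape[0], 1))
    states = []
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h + b)
        states.append(h)
    return states

d, n, T = 8, 16, 5                          # input size, hidden size, sequence length (illustrative)
xs = [np.random.randn(d, 1) for _ in range(T)]

Wx_f, Wh_f, b_f = np.random.randn(n, d) * 0.01, np.random.randn(n, n) * 0.01, np.zeros((n, 1))
Wx_b, Wh_b, b_b = np.random.randn(n, d) * 0.01, np.random.randn(n, n) * 0.01, np.zeros((n, 1))

fwd = rnn_pass(xs, Wx_f, Wh_f, b_f)              # left-to-right pass
bwd = rnn_pass(xs[::-1], Wx_b, Wh_b, b_b)[::-1]  # right-to-left pass, re-aligned to positions
# each position now sees context from both directions - only possible when the data is offline
bidir = [np.concatenate([f, b], axis=0) for f, b in zip(fwd, bwd)]
```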
Key Summary
- Recurrent Neural Networks
- Flexibility in architecture
- Operate over sequences of inputs and outputs
- Image -> sequence of words (captioning)
- Sequence of words -> sentiment classification of the sentence
- Video classification: the output is a function of all the frames seen so far
- RNNs can also process fixed-size inputs sequentially
- Paper - DRAW - Recurrent Neural Network for Image Generation
- Paper - Multiple Object Recognition with Visual Attention - sequential processing of fixed inputs
- Arrows indicate functional dependencies (in the diagrams)
- RNN has a state and receives input vectors through time
- It keeps state internally, modifies that state as a function of each input, and the weights live inside the RNN
- It predicts an output based on the current state
- The RNN state is a collection of vectors; the new state is a function of the previous state plus the current input vector
- Single hidden state and a recurrence formula (see the sketch after this list)
- Character level language models
- Feed in a sequence of characters and ask the network to predict the next character at every step
- One-hot representation: turn on the bit that corresponds to the character's index in the vocabulary
- The hidden layer summarizes all the characters seen so far
- Softmax classifier over next character
- Same function always applied at each step
- Initialization: the hidden state is set to zero
- The order of the data matters; each prediction is a function of everything that came before it
- Character level RNN - https://gist.github.com/karpathy/d4dee566867f8291f086
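A minimal sketch of the recurrence above, following the Wxh/Whh/Why naming used in the linked min-char-rnn gist (the sizes and the sample character index are just illustrative):

```python
import numpy as np

hidden_size, vocab_size = 100, 65                        # e.g. 65 distinct characters
Wxh = np.random.randn(hidden_size, vocab_size) * 0.01    # input -> hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01   # hidden -> hidden
Why = np.random.randn(vocab_size, hidden_size) * 0.01    # hidden -> output
bh, by = np.zeros((hidden_size, 1)), np.zeros((vocab_size, 1))

def rnn_step(x_index, h_prev):
    """One recurrence step: h_t = tanh(Wxh x_t + Whh h_{t-1} + bh)."""
    x = np.zeros((vocab_size, 1))
    x[x_index] = 1                                 # one-hot encode the character index
    h = np.tanh(Wxh @ x + Whh @ h_prev + bh)       # new hidden state
    y = Why @ h + by                               # unnormalized scores for the next character
    p = np.exp(y) / np.sum(np.exp(y))              # softmax over the next character
    return h, p

h = np.zeros((hidden_size, 1))                     # initialization: state starts at zero
h, p = rnn_step(x_index=3, h_prev=h)               # process one character index
```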
RNN
- Input: raw text; collect the set of distinct characters
- Associate an index with every character; the chunk (sequence) length is 25
- The full text is too large to backpropagate through in one go
- So the input data is processed in chunks of 25 characters
- Backpropagate through those 25 characters (truncated backpropagation through time)
- Wxh, Whh, Why (and the biases) - parameters to train
- Sampling code generates characters from whatever distribution the model currently predicts
- The RNN outputs a distribution over the next character in the sequence
- Adagrad Update
- Loss function: one routine runs both the forward and the backward pass
- The backward pass runs from step 25 all the way back to step 1
- Backpropagate through the softmax and the tanh activation
- The sample function generates new text from the model
- There are 25 softmax losses per chunk, and they all contribute gradients (see the training-loop sketch after this list)
- Regularization is done
- Loss function: the forward pass computes the loss, the backward pass computes the gradients
- Everything is indexes and sequences of indexes; the RNN itself has no notion of characters
- Quite interesting examples: poetry, formula generation, code generation
- Three-layer LSTM used for those examples
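A condensed training-loop sketch in the spirit of the linked gist: a forward pass over one 25-character chunk computes the loss, the backward pass runs from step 25 down to step 1, and the parameters get an Adagrad update. The toy text string and hyperparameters below are placeholders, not the lecture's.

```python
import numpy as np

data = "hello world, hello rnn, hello lstm. " * 4        # toy corpus (placeholder)
chars = sorted(set(data))
vocab_size = len(chars)
char_to_ix = {c: i for i, c in enumerate(chars)}

hidden_size, seq_length, learning_rate = 100, 25, 1e-1

Wxh = np.random.randn(hidden_size, vocab_size) * 0.01    # input -> hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01   # hidden -> hidden
Why = np.random.randn(vocab_size, hidden_size) * 0.01    # hidden -> output
bh, by = np.zeros((hidden_size, 1)), np.zeros((vocab_size, 1))

def loss_fun(inputs, targets, hprev):
    """Forward pass: compute the loss. Backward pass: compute gradients for one chunk."""
    xs, hs, ps = {}, {-1: np.copy(hprev)}, {}
    loss = 0.0
    for t in range(len(inputs)):                          # forward: one softmax loss per step
        xs[t] = np.zeros((vocab_size, 1)); xs[t][inputs[t]] = 1
        hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1] + bh)
        y = Why @ hs[t] + by
        ps[t] = np.exp(y) / np.sum(np.exp(y))
        loss += -np.log(ps[t][targets[t], 0])             # cross-entropy at this step
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dhnext = np.zeros_like(hs[0])
    for t in reversed(range(len(inputs))):                # backward: step 25 down to step 1
        dy = np.copy(ps[t]); dy[targets[t]] -= 1          # backprop through the softmax
        dWhy += dy @ hs[t].T; dby += dy
        dh = Why.T @ dy + dhnext
        dhraw = (1 - hs[t] ** 2) * dh                     # backprop through the tanh
        dbh += dhraw; dWxh += dhraw @ xs[t].T; dWhh += dhraw @ hs[t - 1].T
        dhnext = Whh.T @ dhraw
    for dparam in (dWxh, dWhh, dWhy, dbh, dby):
        np.clip(dparam, -5, 5, out=dparam)                # clip exploding gradients
    return loss, (dWxh, dWhh, dWhy, dbh, dby), hs[len(inputs) - 1]

params = [Wxh, Whh, Why, bh, by]
mem = [np.zeros_like(p) for p in params]                  # Adagrad cache of squared gradients
hprev = np.zeros((hidden_size, 1))
inputs = [char_to_ix[c] for c in data[:seq_length]]
targets = [char_to_ix[c] for c in data[1:seq_length + 1]]
loss, grads, hprev = loss_fun(inputs, targets, hprev)
for p, g, m in zip(params, grads, mem):
    m += g * g
    p += -learning_rate * g / np.sqrt(m + 1e-8)           # Adagrad update
```

In the linked gist this runs repeatedly over consecutive chunks, carrying hprev from one chunk to the next and resetting it to zero when the pointer wraps around the data.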
Working Details
- Character level RNN on text
- Individual cells of the hidden state get excited (or not) depending on what the input looks like
- Quote-detection cell (stays on between an opening and a closing quote)
- Line-length tracking cell
- Cells that track how deeply the current expression is nested
- RNNs are used for training sequence models
- Image captioning: generate a sequence of words for an image
- Image -> CNN
- The ConvNet processes the image
- The RNN remembers / models the sequence
- The generative model is conditioned on the output of the convolutional part (see the captioning sketch after this list)
- Predict the next word / remember information
- Word-level embedding
- Sample until the end-of-sentence token
- Output dimension = number of words + 1 (the END token)
- Backpropagation runs through the whole thing at once
- The embedding starts from a one-hot representation
- The image is plugged in at the first step
- Everything is backpropagated completely jointly (CNN and RNN together)
- The features can then adjust to describe the image better for captioning
- Look back at the image and use its feature maps
- Attention over image
- Soft Attention
- Selective Attention over inputs
- Stacked RNNs feed into each other (deeper recurrent models)
- All in a single computational graph
- The LSTM recurrence formula is slightly more complicated
- Concatenate x and h and apply a new formula for combining the vectors
- x (input), h (previous hidden state)
- f - sigmoid gate - the forget gate - can reset some cells to zero
- g - tanh gate (candidate values)
- The LSTM has both a hidden state and a cell state (two vectors)
- The LSTM gates operate on the cell state
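Going back to the captioning bullets above, a rough sketch of conditioning a word-level RNN on ConvNet features and sampling until the END token. It uses a plain tanh RNN and made-up names/sizes (Wih, Wemb, feat_size) purely for illustration, not the exact model from the lecture.

```python
import numpy as np

vocab_size, hidden_size, feat_size = 1000 + 1, 256, 4096   # +1 output dimension for the END token
END = vocab_size - 1

Wih = np.random.randn(hidden_size, feat_size) * 0.01       # image features -> initial hidden state
Wemb = np.random.randn(hidden_size, vocab_size) * 0.01     # one-hot word -> hidden-sized input
Whh = np.random.randn(hidden_size, hidden_size) * 0.01
Why = np.random.randn(vocab_size, hidden_size) * 0.01

def caption(cnn_features, start_token=0, max_len=20):
    """Condition the RNN on ConvNet features, then sample words until END."""
    h = np.tanh(Wih @ cnn_features)          # the image is plugged in at the first step
    word, out = start_token, []
    for _ in range(max_len):
        x = np.zeros((vocab_size, 1)); x[word] = 1
        h = np.tanh(Wemb @ x + Whh @ h)      # recurrence over word inputs
        y = Why @ h
        p = np.exp(y) / np.sum(np.exp(y))    # softmax over the vocabulary (+ END)
        word = int(np.random.choice(vocab_size, p=p.ravel()))
        if word == END:
            break                            # stop at the END token
        out.append(word)
    return out

print(caption(np.random.randn(feat_size, 1)))  # random stand-in for real CNN features
```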
LSTM
- f, i, g, o - n-dimensional vectors
- Based on the hidden state, the gates operate on the cell state
- Forget gate - can reset some cells of the cell state to zero
- LSTMs deal very well with the vanishing gradient problem (see the sketch below)
- ReLU used here
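A minimal single-step LSTM sketch matching the f, i, g, o notation above (the layout of the stacked weight matrices and all sizes are illustrative):

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, Wx, Wh, b):
    """One LSTM step: four gates (i, f, o, g) computed from x and h_{t-1},
    then the cell state is updated and the new hidden state is read out."""
    a = Wx @ x + Wh @ h_prev + b               # (4n, 1) pre-activations, stacked as [i; f; o; g]
    n = h_prev.shape[0]
    i = 1 / (1 + np.exp(-a[0:n]))              # input gate  (sigmoid)
    f = 1 / (1 + np.exp(-a[n:2*n]))            # forget gate (sigmoid) - can reset cells to zero
    o = 1 / (1 + np.exp(-a[2*n:3*n]))          # output gate (sigmoid)
    g = np.tanh(a[3*n:4*n])                    # candidate values (tanh)
    c = f * c_prev + i * g                     # cell state: additive update
    h = o * np.tanh(c)                         # hidden state exposed to the rest of the network
    return h, c

n, d = 128, 65                                 # hidden size, input size (illustrative)
Wx = np.random.randn(4 * n, d) * 0.01
Wh = np.random.randn(4 * n, n) * 0.01
b = np.zeros((4 * n, 1))
h, c = np.zeros((n, 1)), np.zeros((n, 1))
x = np.random.randn(d, 1)
h, c = lstm_step(x, h, c, Wx, Wh, b)
```

The additive cell-state update (c = f * c_prev + i * g) is what lets gradients flow back through many steps without vanishing, which is the point of the vanishing-gradient bullet above.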
This is a special post: the 100th learning post of this year, and the 170th post on Data Science. I hope this incremental learning always adds the delta for the next big idea.
Keep Mastering DL!!!