- Input layer - hidden layer(s) - output layer
- Each layer can be rewritten as a matrix multiplication (see the sketch after this list)
- Activation functions apply a non-linear transformation; real data is non-linear, so they give the model non-linear capacity
- Iteratively step in the direction of descent until the loss converges: stochastic gradient descent (SGD)
- Backpropagation computes the gradient
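A minimal sketch of the ideas above (not from the lecture; all names are illustrative): a one-hidden-layer network written as matrix multiplications plus a non-linearity, with the gradient found by backpropagation and one SGD update applied.

```python
import numpy as np

# Minimal sketch (illustrative, not lecture code): a 1-hidden-layer network
# as matrix multiplications + a non-linear activation, with one SGD step
# computed by backpropagation.

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Toy data: 4 examples, 3 features, binary labels.
X = rng.normal(size=(4, 3))
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# Layer weights: input (3) -> hidden (5) -> output (1).
W1 = rng.normal(scale=0.1, size=(3, 5))
W2 = rng.normal(scale=0.1, size=(5, 1))
lr = 0.1  # SGD step size

# Forward pass: each layer is a matrix multiply followed by a non-linearity.
h = relu(X @ W1)                      # hidden activations
logits = h @ W2
p = 1.0 / (1.0 + np.exp(-logits))     # sigmoid output
loss = np.mean((p - y) ** 2)          # squared-error loss, kept simple
print("loss before update:", loss)

# Backpropagation: chain rule from the loss back to each weight matrix.
dp = 2.0 * (p - y) / len(X)
dlogits = dp * p * (1.0 - p)          # through the sigmoid
dW2 = h.T @ dlogits
dh = dlogits @ W2.T
dW1 = X.T @ (dh * (h > 0))            # through the ReLU

# One SGD update: step in the direction of descent.
W1 -= lr * dW1
W2 -= lr * dW2
```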
Modeling Sequences
- Represent a sentence as a bag of words (BOW)
- Each sentence becomes a fixed-length count vector
- BOW does not preserve word order
- A much longer feature vector would be needed to maintain order (see the sketch after this list)
- Rules
- States
- Transitions
- Next-word prediction: transition probabilities between states, i.e. a Markov model
- Markov assumption: each state depends only on the previous state
- The sequence here is the sentence, modeled as a function of its state transitions
- Deep models have been very successful on sequence tasks (e.g. Alexa)
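A minimal sketch of the two representations above (illustrative, not lecture code): a bag-of-words count vector, which throws away word order, and a bigram Markov model whose next-word prediction depends only on the previous word.

```python
from collections import Counter, defaultdict

# Minimal sketch (illustrative): bag-of-words vectors and a bigram Markov
# model for next-word prediction.

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]

# Bag of words: one count per vocabulary word; word order is lost.
vocab = sorted({w for s in corpus for w in s.split()})

def bow_vector(sentence):
    counts = Counter(sentence.split())
    return [counts[w] for w in vocab]

print(bow_vector("the cat sat on the mat"))
print(bow_vector("the mat sat on the cat"))  # same vector: order is gone

# Markov model: states are words, transitions are next-word counts.
# The next state depends only on the previous state.
transitions = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        transitions[prev][nxt] += 1

def next_word_probs(prev):
    counts = transitions[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Each of 'cat', 'mat', 'dog', 'rug' follows "the" with probability 0.25.
print(next_word_probs("the"))
```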
RNN Key Needs
- Maintain the sequence
- Learn the order of elements
- Preserve history across time steps
- Produce output as a function of the previous state
RNN
- The weight matrices W and U stay the same (are shared) across all time steps
- The cell state at time n contains information from all past time steps
- So the output is a function of all previous states (see the sketch after this list)
- Machine translation: two RNNs, an encoder-decoder model
- The encoder's last cell state is the representation of the whole sentence
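A minimal sketch of the recurrence (illustrative names, plain tanh cell): the same W and U are reused at every time step, so the final cell state is a function of the whole input sequence, the kind of sentence representation an encoder hands to a decoder.

```python
import numpy as np

# Minimal sketch of a vanilla RNN cell (illustrative, not lecture code).
# The same weight matrices W (state-to-state) and U (input-to-state) are
# applied at every time step, so the final state depends on all inputs.

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 8, 16, 5

W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
U = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
xs = rng.normal(size=(seq_len, input_dim))   # one input vector per word

h = np.zeros(hidden_dim)
for x_t in xs:
    # Cell state update: a function of the previous state and the input.
    h = np.tanh(h @ W + x_t @ U)

sentence_representation = h  # e.g. what an encoder passes to a decoder
```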
Train RNN
- Backpropagation
- With an added time dimension
- The chain rule applied to an RNN creates a dependency on all previous states
- The cell state depends on all previous cell states
- Backpropagation through time (BPTT)
- Hard to train due to the vanishing gradient problem (illustrated in the first sketch after this list)
- As a result, plain RNNs only capture short-term dependencies
- One partial fix: initialize the weights differently
- Gated cells are really effective: a recurrent unit with several gating steps that control information flow
- Gates decide what information to pass through (elementwise multiplication by values between 0 and 1)
- LSTM functions: forget irrelevant history, selectively update the cell state, and output certain parts of the cell (see the LSTM sketch after this list)
- A fixed-length encoding is a bottleneck for the encoder-decoder model; the solution is to attend over all encoder states (see the attention sketch after this list)
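A quick sketch of why backpropagation through time is hard (illustrative, not lecture code): the gradient flowing from the last state back to the early states is a long product of per-step Jacobians, and with typical weights that product shrinks toward zero.

```python
import numpy as np

# Minimal sketch of the vanishing gradient problem in BPTT (illustrative).
# The gradient of the last state w.r.t. early states is a product of
# per-step Jacobians, which shrinks (or explodes) with sequence length.

rng = np.random.default_rng(0)
hidden_dim, seq_len = 16, 50
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))

# Forward pass of a tanh RNN with no inputs (for simplicity), keeping states.
hs = [rng.normal(size=hidden_dim)]
for _ in range(seq_len):
    hs.append(np.tanh(hs[-1] @ W))

# Backpropagation through time: chain the per-step Jacobians
# dh_t/dh_{t-1} = diag(1 - h_t^2) @ W.T backwards from the last step.
grad = np.eye(hidden_dim)
for t in range(seq_len, 0, -1):
    jacobian = (1.0 - hs[t] ** 2)[:, None] * W.T
    grad = grad @ jacobian
    if t % 10 == 0:
        print(f"norm of dh_{seq_len}/dh_{t-1} ~= {np.linalg.norm(grad):.2e}")
# The norm collapses toward zero: early time steps get almost no signal.
```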
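A minimal sketch of a single LSTM step, assuming the standard gate equations (forget, input/update, output); the names are illustrative and biases are dropped for brevity.

```python
import numpy as np

# Minimal sketch of one LSTM step (illustrative, not lecture code).
# Gates are sigmoids in [0, 1] that decide what to forget, what to write
# into the cell state, and what to expose as output.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    Wf, Wi, Wc, Wo = params          # one weight matrix per gate
    z = np.concatenate([h_prev, x])  # combined previous state and input

    f = sigmoid(z @ Wf)              # forget gate: what history to drop
    i = sigmoid(z @ Wi)              # input gate: what to update
    c_tilde = np.tanh(z @ Wc)        # candidate cell values
    o = sigmoid(z @ Wo)              # output gate: what to reveal

    c = f * c_prev + i * c_tilde     # selectively forget and update
    h = o * np.tanh(c)               # output certain parts of the cell
    return h, c

# Example usage with random parameters.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 8, 16
params = [rng.normal(scale=0.1, size=(hidden_dim + input_dim, hidden_dim))
          for _ in range(4)]
h = c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, params)
```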
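A minimal sketch of dot-product attention over encoder states (the lecture only names the idea; the specific scoring function here is an assumption): the decoder takes a weighted sum of all encoder states instead of relying on a single fixed-length encoding.

```python
import numpy as np

# Minimal sketch of dot-product attention over encoder states (illustrative).
# Rather than compressing the sentence into one fixed-length vector, the
# decoder attends over all encoder states at each step.

def attend(decoder_state, encoder_states):
    scores = encoder_states @ decoder_state          # one score per state
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over time steps
    context = weights @ encoder_states               # weighted sum of states
    return context, weights

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, 16))   # one state per source word
decoder_state = rng.normal(size=16)
context, weights = attend(decoder_state, encoder_states)
```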
Deep Learning Frameworks
- GPU Acceleration
- Code Reusability
- TPU
- Session
- Computation Graph
- Feed data in, get results out
- Core objects: variables, sessions, tensors
- Perceptron classifier (see the TensorFlow sketch below)
- Share weights
- Input Gate / Forget Gate / Update Gate / Output
- Code - https://github.com/nicholaslocascio/bcs-lstm
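A minimal sketch of the graph-and-session workflow above, assuming the TensorFlow 1.x API (tf.placeholder, tf.Session); the perceptron classifier and toy data are illustrative, not the linked lab code.

```python
import numpy as np
import tensorflow as tf  # assumes the TensorFlow 1.x graph/session API

# Minimal sketch (illustrative, not the linked lab code): build a
# computation graph for a perceptron classifier, then feed data into a
# session and get results back.

# Graph definition: placeholders, variables, and tensors.
x = tf.placeholder(tf.float32, shape=[None, 2], name="x")
y = tf.placeholder(tf.float32, shape=[None, 1], name="y")
W = tf.Variable(tf.zeros([2, 1]), name="weights")
b = tf.Variable(tf.zeros([1]), name="bias")

logits = tf.matmul(x, W) + b
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Toy linearly separable data (logical AND).
data_x = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]], dtype=np.float32)
data_y = np.array([[0.], [0.], [0.], [1.]], dtype=np.float32)

# Session: feed data in, get results out.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        _, loss_val = sess.run([train_op, loss],
                               feed_dict={x: data_x, y: data_y})
    print("final loss:", loss_val)
    print("predictions:", sess.run(tf.sigmoid(logits), feed_dict={x: data_x}))
```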
"Hidden Technical Debt in Machine Learning Systems https://t.co/szYRTtSpDd on some of the new joys and struggles of deploying machine learning models in the wild. Still a long way to go to establish new language and design patterns for programming the 2.0 stack pic.twitter.com/6qR9BAA6qS"
— Andrej Karpathy (@karpathy), November 5, 2018
Next Talks List
Yann LeCun - How does the brain learn so much so quickly? (CCN 2017)
Frank Hutter and Joaquin Vanschoren: Automatic Machine Learning (NeurIPS 2018 Tutorial)
Fernanda Viégas and Martin Wattenberg: Visualization for Machine Learning (NeurIPS 2018 Tutorial)
Memory: why it matters and how it works
The Neuroscience of Emotions
NIPS 2018 Videos
Happy Mastering DL!!!