- AGI - Artificial General Intelligence
- Able to do most economically valuable work
- Deep Reinforcement Learning trains deep networks by trial and error
- Deep networks act as the function approximators
- Good for sequential decision problems
- Good when we do not know the optimal behavior in advance
- RL is useful when evaluating a behavior is easier than generating it
Deep Learning
- Good for high-dimensional data (e.g. images, video game frames)
- Approximates a function from inputs to outputs (see the sketch below)
- Can learn decision rules directly from data
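
To make the "approximate a function" idea concrete, here is a minimal sketch (assuming PyTorch, which the notes do not specify) of a small network fitting y = sin(x) from sampled points:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sample (x, y) pairs from the target function y = sin(x)
x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = torch.sin(x)

# A small deep network used as a function approximator
model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(500):
    loss = F.mse_loss(model(x), y)  # how far the network's output is from the target
    opt.zero_grad()
    loss.backward()                 # gradient of the loss w.r.t. the parameters
    opt.step()
```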
Recap of DL Patterns
- Find a model (parameter values) that gives the right output for given inputs
- The output of each layer is a transformation of its input with a non-linearity applied
- The loss function is differentiable with respect to the parameters of the model
- Compute how the loss changes with respect to changes in the parameters (the gradient)
- Function composition is the core of the model
- The same compositional form supports many network topologies and architectures
- The non-linearity does a lot of the work
- Successive layers represent increasingly complex features
- LSTM (an RNN) - accepts a time series of inputs and produces a time series of outputs
- Transformer - lets the network attend over several inputs at once
- Attention - selects the most meaningful details from the data, so decisions can draw on a lot of input (see the sketch after this list)
- Regularizers - trade the task loss off against a term that does not depend on the task; they improve generalization
- Adaptive optimizers (e.g. Adam) adjust the update per parameter
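
To illustrate the attention bullet above, here is a minimal sketch (again assuming PyTorch; the function name `attention` is just illustrative) of scaled dot-product attention, the core operation inside a Transformer:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, seq_len, d) tensors
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # similarity of each query to each key
    weights = F.softmax(scores, dim=-1)                     # attention weights sum to 1 over the inputs
    return weights @ v                                      # weighted combination of the values

q = k = v = torch.randn(2, 5, 16)
out = attention(q, k, v)   # shape (2, 5, 16): each position now summarizes the whole sequence
```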
Reinforcement Learning
- An agent interacts with an environment
- The agent picks and executes an action
- The environment returns a new state and the agent proceeds from there
- The question: which decisions maximize reward?
- The agent attains the goal by trial and error (see the interaction-loop sketch below)
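
The interaction loop can be sketched in a few lines (assuming the Gymnasium library and its CartPole-v1 environment; the random action stands in for a learned policy):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
total_reward = 0.0

for _ in range(200):
    action = env.action_space.sample()   # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:          # episode over: start a new one
        obs, info = env.reset()
```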
Observations And Actions
- Observations are continuous
- Actions may be discrete or continuous
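
For example (assuming Gymnasium), CartPole has continuous observations with a discrete action space, while Pendulum has a continuous action space:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
print(env.observation_space)   # Box(...): continuous observations (positions, velocities)
print(env.action_space)        # Discrete(2): push the cart left or right

env2 = gym.make("Pendulum-v1")
print(env2.action_space)       # Box(...): continuous torque action
```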
Policy
- Stochastic: the action is sampled randomly from a distribution
- Deterministic: maps the observation directly to an action with no randomness
- Randomness is helpful for exploration
- Logits: unnormalized scores for each action
- Action probabilities come from a softmax over the logits; the deterministic version takes the argmax (see the sketch below)
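
A minimal sketch (assuming PyTorch) of how logits become a stochastic policy: the softmax turns logits into action probabilities, an action is sampled from the resulting categorical distribution, and the argmax would give the deterministic version:

```python
import torch

logits = torch.tensor([2.0, 0.5, -1.0])            # unnormalized scores for 3 actions
probs = torch.softmax(logits, dim=-1)              # probabilities that sum to 1
dist = torch.distributions.Categorical(probs=probs)

action = dist.sample()                             # stochastic policy: sample an action
greedy_action = torch.argmax(probs)                # deterministic policy: take the argmax
log_prob = dist.log_prob(action)                   # needed later for the policy gradient
```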
Reward function - a measure of good / bad; more positive is better
Value functions - how much total reward the agent expects to get from here on
Value functions satisfy the Bellman equation (shown below)
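
For reference, the Bellman equation for the state-value function of a policy π (standard form; the notes do not spell it out) is:

```latex
V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s),\; s' \sim P(\cdot \mid s, a)}
\left[ r(s, a) + \gamma \, V^{\pi}(s') \right]
```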
Types of RL Algos
Model-based algorithms learn a model of the environment (vs model-free)
The common loop: try the policy, evaluate it, improve it
Policy Optimization
- Run the policy for complete trajectories
- Represent the policy with a neural network
- The network parameters define the action distribution
- Bring the gradient inside the expectation (the log-derivative trick)
- The expectation is over trajectories
- The starting state is drawn from some initial-state distribution
- Markov property: the next state depends only on the current state (and action), not on previous states
- Every action in the trajectory gets some update
- Reward-to-go policy gradient - weight each action only by the rewards that come after it (see the sketch after this list)
- Advantage-form policy gradient
- Advantage: how much better an action is than the average from that state
- N-step advantage estimates
- The initial weight assignments do matter when setting up the system
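
To tie the policy-gradient bullets together, here is a compact sketch (assuming PyTorch and Gymnasium; the hyperparameters are illustrative) of a reward-to-go policy gradient on CartPole, with mean subtraction as a crude stand-in for an advantage baseline:

```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))  # outputs logits for 2 actions
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def reward_to_go(rewards, gamma=0.99):
    # returns[t] = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

for iteration in range(50):
    # Try: run the policy for one complete trajectory
    log_probs, rewards = [], []
    obs, _ = env.reset()
    done = False
    while not done:
        dist = torch.distributions.Categorical(
            logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Evaluate: weight each action by the rewards that came after it
    returns = torch.tensor(reward_to_go(rewards))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # crude, advantage-like baseline

    # Improve: the gradient of this surrogate is the policy gradient estimate
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```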
Next is Part II
Happy Mastering DL!!!