- AGI - Artificial General Intelligence
- Able to do most economically valuable work
- Deep Reinforcement Learning trains deep networks by trial and error
- Deep networks act as the function approximators
- Good for sequential decision problems
- Good when we do not know the optimal behavior in advance
- RL is useful when evaluating a behavior is easier than generating it
Deep Learning
- Good for high-dimensional data (e.g. images, video game frames)
- Approximates a function from inputs to outputs (see the sketch below)
- Can learn decision rules directly from data
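
To make the "approximate a function" idea concrete, here is a minimal sketch (assuming PyTorch, which the notes do not specify) of a small network fitting y = sin(x) from sampled points:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sample (x, y) pairs from the target function y = sin(x)
x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = torch.sin(x)

# A small deep network used as a function approximator
model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(500):
    loss = F.mse_loss(model(x), y)  # how far the network's output is from the target
    opt.zero_grad()
    loss.backward()                 # gradient of the loss w.r.t. the parameters
    opt.step()
```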
Recap of DL Patterns
- Find a model (parameter values) that gives the right output for given inputs
- The output of each layer is a transformation of its input with a non-linearity applied
- The loss function is differentiable with respect to the parameters of the model
- Compute how the loss changes with respect to changes in the parameters (the gradient)
- Function composition is the core of the model
- The same compositional form supports many network topologies and architectures
- The non-linearity does a lot of the work
- Successive layers represent increasingly complex features
- LSTM (an RNN) - accepts a time series of inputs and produces a time series of outputs
- Transformer - lets the network attend over several inputs at once
- Attention - selects the most meaningful details from the data, so decisions can draw on a lot of input (see the sketch after this list)
- Regularizers - trade the task loss off against a term that does not depend on the task; they improve generalization
- Adaptive optimizers (e.g. Adam) adjust the update per parameter
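
To illustrate the attention bullet above, here is a minimal sketch (again assuming PyTorch; the function name `attention` is just illustrative) of scaled dot-product attention, the core operation inside a Transformer:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, seq_len, d) tensors
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # similarity of each query to each key
    weights = F.softmax(scores, dim=-1)                     # attention weights sum to 1 over the inputs
    return weights @ v                                      # weighted combination of the values

q = k = v = torch.randn(2, 5, 16)
out = attention(q, k, v)   # shape (2, 5, 16): each position now summarizes the whole sequence
```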
Reinforcement Learning
- An agent interacts with an environment
- The agent picks and executes an action
- The environment returns a new state and the agent proceeds from there
- The question: which decisions maximize reward?
- The agent attains the goal by trial and error (see the interaction-loop sketch below)
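
The interaction loop can be sketched in a few lines (assuming the Gymnasium library and its CartPole-v1 environment; the random action stands in for a learned policy):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
total_reward = 0.0

for _ in range(200):
    action = env.action_space.sample()   # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:          # episode over: start a new one
        obs, info = env.reset()
```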
Observations And Actions
- Observations are continuous
- Actions may be discrete or continuous
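
For example (assuming Gymnasium), CartPole has continuous observations with a discrete action space, while Pendulum has a continuous action space:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
print(env.observation_space)   # Box(...): continuous observations (positions, velocities)
print(env.action_space)        # Discrete(2): push the cart left or right

env2 = gym.make("Pendulum-v1")
print(env2.action_space)       # Box(...): continuous torque action
```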
Policy
- Stochastic: the action is sampled randomly from a distribution
- Deterministic: maps the observation directly to an action with no randomness
- Randomness is helpful for exploration
- Logits: unnormalized scores for each action
- Action probabilities come from a softmax over the logits; the deterministic version takes the argmax (see the sketch below)
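
A minimal sketch (assuming PyTorch) of how logits become a stochastic policy: the softmax turns logits into action probabilities, an action is sampled from the resulting categorical distribution, and the argmax would give the deterministic version:

```python
import torch

logits = torch.tensor([2.0, 0.5, -1.0])            # unnormalized scores for 3 actions
probs = torch.softmax(logits, dim=-1)              # probabilities that sum to 1
dist = torch.distributions.Categorical(probs=probs)

action = dist.sample()                             # stochastic policy: sample an action
greedy_action = torch.argmax(probs)                # deterministic policy: take the argmax
log_prob = dist.log_prob(action)                   # needed later for the policy gradient
```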
Reward function - a measure of good / bad; more positive is better
Value functions - how much total reward the agent expects to get from here on
Value functions satisfy the Bellman equation (shown below)
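
For reference, the Bellman equation for the state-value function of a policy π (standard form; the notes do not spell it out) is:

```latex
V^{\pi}(s) = \mathbb{E}_{a \sim \pi(\cdot \mid s),\; s' \sim P(\cdot \mid s, a)}
\left[ r(s, a) + \gamma \, V^{\pi}(s') \right]
```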
Types of RL Algos
Model-based algorithms learn a model of the environment (vs model-free)
The common loop: try the policy, evaluate it, improve it
Policy Optimization
- Run the policy for complete trajectories
- Represent the policy with a neural network
- The network parameters define the action distribution
- Bring the gradient inside the expectation (the log-derivative trick)
- The expectation is over trajectories
- The starting state is drawn from some initial-state distribution
- Markov property: the next state depends only on the current state (and action), not on previous states
- Every action in the trajectory gets some update
- Reward-to-go policy gradient - weight each action only by the rewards that come after it (see the sketch after this list)
- Advantage-form policy gradient
- Advantage: how much better an action is than the average from that state
- N-step advantage estimates
- The initial weight assignments do matter when setting up the system
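
To tie the policy-gradient bullets together, here is a compact sketch (assuming PyTorch and Gymnasium; the hyperparameters are illustrative) of a reward-to-go policy gradient on CartPole, with mean subtraction as a crude stand-in for an advantage baseline:

```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))  # outputs logits for 2 actions
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def reward_to_go(rewards, gamma=0.99):
    # returns[t] = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return list(reversed(returns))

for iteration in range(50):
    # Try: run the policy for one complete trajectory
    log_probs, rewards = [], []
    obs, _ = env.reset()
    done = False
    while not done:
        dist = torch.distributions.Categorical(
            logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Evaluate: weight each action by the rewards that came after it
    returns = torch.tensor(reward_to_go(rewards))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # crude, advantage-like baseline

    # Improve: the gradient of this surrogate is the policy gradient estimate
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```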
Next is Part II
Happy Mastering DL!!!