"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 29, 2019

Day #200 - MIT AI: OpenAI Meta-Learning and Self-Play - Ilya Sutskever

Key Lessons
  • Theorem - given a little data -> find the shortest program that generates the data -> extract regularity -> make predictions
  • If the data is random, no regularity can be extracted
  • Our objective - find the best short program that solves the problem given your data
  • Small circuits - find the best small circuit using backpropagation
  • Circuit - impose constraints - using the data - iteratively make changes until the predictions satisfy the data
  • Backpropagation is circuit search
  • Training a neural network = solving the neural equation
  • Parallel network - multiple layers where reasoning happens - given data, find the NN
  • Run computation inside the layers to find the best NN
  • The model class must be worth optimizing and be optimizable
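The "find the best small circuit with backpropagation" idea can be sketched with a tiny numpy network fitted to XOR. Everything here (layer sizes, learning rate, iteration count) is an illustrative choice, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: a tiny dataset that no linear circuit can fit
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# A small "circuit": 2 inputs -> 4 hidden sigmoid units -> 1 output
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward():
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

_, p0 = forward()
initial_loss = float(np.mean((p0 - y) ** 2))

for _ in range(5000):
    h, p = forward()
    # Backpropagation = circuit search: nudge every wire to reduce the error
    dp = (p - y) * p * (1 - p)
    dh = (dp @ W2.T) * h * (1 - h)
    W2 -= 0.5 * (h.T @ dp); b2 -= 0.5 * dp.sum(axis=0)
    W1 -= 0.5 * (X.T @ dh); b1 -= 0.5 * dh.sum(axis=0)

_, p = forward()
final_loss = float(np.mean((p - y) ** 2))
```

Iteratively changing the weights until the predictions satisfy the data is exactly the "impose constraints, make changes" loop from the notes.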




RL
  • A framework to evaluate agents
  • Interesting RL algorithms
  • Good enough for doing interesting things
  • Mathematical problem - maximize expected reward
  • Agent -> Action
  • Agent <- Reward
  • In the real world, we figure out the reward from our interpretation of our senses
  • Existence / non-existence is the reward for us
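The agent -> action, agent <- reward loop and "maximize expected reward" can be made concrete with a toy two-armed bandit. The environment, policies, and payoffs below are all made up for illustration:

```python
import random

random.seed(0)

# Toy environment (illustrative): action 1 pays off ~1.0 on average, action 0 ~0.0
def env_reward(action):
    return random.gauss(1.0 if action == 1 else 0.0, 0.1)

# Agent -> action, agent <- reward; expected reward estimated by Monte Carlo averaging
def expected_reward(policy, episodes=1000):
    total = 0.0
    for _ in range(episodes):
        action = policy()       # agent emits an action
        total += env_reward(action)  # environment returns a reward
    return total / episodes

uniform = lambda: random.choice([0, 1])  # picks arms at random
greedy = lambda: 1                       # always picks the better arm
```

The mathematical problem is simply to find the policy with the highest `expected_reward`.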



RL in a nutshell
  • Add randomness to your actions
  • Check if the results surprise you
  • Change the parameters so those actions are taken again in the future
  • Try adding randomness -> if you like the results -> shift the randomness toward the desired target -> it will work in the future
  • Try actions; if you like the result, increase its log probability
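The "add randomness, and if you like the result increase the log probability" recipe is the policy-gradient (REINFORCE) update. A minimal sketch on the same kind of toy bandit, with an illustrative reward and learning rate:

```python
import math
import random

random.seed(0)

# Toy bandit (illustrative): only action 1 is rewarded
def reward(action):
    return 1.0 if action == 1 else 0.0

theta = [0.0, 0.0]  # one logit per action

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

lr = 0.1
for _ in range(500):
    probs = softmax(theta)
    # Add randomness to the action
    a = 0 if random.random() < probs[0] else 1
    r = reward(a)
    # If you liked the result, increase the log probability of that action:
    # d/d theta[k] of log pi(a) is (1[k == a] - probs[k])
    for k in range(2):
        theta[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])

final_probs = softmax(theta)
```

After training, nearly all the probability mass sits on the rewarded action.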
Q-Learning
  • Learn from data generated by the actor itself
  • Can also learn from data generated by other policies
On & Off Policy Learning
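Off-policy learning - learning greedy values from data generated by some other behaviour policy - is what tabular Q-learning does. A minimal sketch on a made-up chain MDP (states, rewards, and hyperparameters are all illustrative):

```python
import random

random.seed(0)

# Tiny chain MDP (illustrative): states 0..3, action 1 moves right, action 0 moves left;
# reaching state 3 gives reward 1 and ends the episode
N, GOAL = 4, 3
Q = [[0.0, 0.0] for _ in range(N)]
alpha, gamma = 0.5, 0.9

def step(s, a):
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

for _ in range(500):
    s, done = 0, False
    while not done:
        a = random.randint(0, 1)  # behaviour policy: uniformly random (off-policy)
        s2, r, done = step(s, a)
        # Q-learning target bootstraps from the greedy action, not the one taken
        target = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
```

Even though every action was chosen at random, the learned Q-values prefer moving right toward the goal.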




Potential for RL
  • From a simulation of the world we can train interesting things
  • Data - extract the entropy - learn in the fastest way possible
Meta Learning
  • Learn to learn
  • The input is all the info of the task + the test case
  • Turn a NN into a learning algorithm
  • Training tasks become training cases
  • Neural architecture search - find an architecture that solves a small problem well, then apply it to new tasks
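The "input is all the info of the task + the test case" framing can be sketched with a trivially small stand-in for the learned network: here a fixed nearest-neighbour rule plays the role of the NN, and a whole task (a few labeled examples plus a query) is a single input. All names and data are illustrative:

```python
# Meta-learning framing (illustrative): one "training case" = an entire task.
# The learner receives the task's labeled examples AND the test query together.

def solve_task(support, query):
    """support: list of (x, label) pairs; query: an x to classify."""
    # Stand-in for the learned network: 1-nearest-neighbour on the support set
    return min(support, key=lambda pair: abs(pair[0] - query))[1]

# A task: two labeled examples + one test case, packaged as one input
task = ([(0.0, "a"), (10.0, "b")], 9.0)
```

A meta-learner would replace the fixed rule with a trained network, so that training tasks literally become its training cases.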


Hindsight Experience Replay
  • A learning algorithm for RL that learns to make use of all experience
  • Explore an unknown environment
  • Rewards arrive only from time to time
  • Learn a policy over a very large family of goals
  • Whatever state was actually reached is treated, in hindsight, as the goal
  • A combination of on- and off-policy learning
  • The system learns from both success and failure
  • A sparse reward alone won't help - no reward, no learning
  • Learn from both failures and successes and you will never stop growing
  • Sim2Real and meta-learning
  • Robotics - train in simulation and carry the policy over to the physical robot
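The core HER trick - relabeling a failed trajectory with the goal it actually achieved, so sparse rewards still produce a learning signal - can be sketched in a few lines. The states, goal, and transitions are made up for illustration:

```python
# Hindsight Experience Replay, the relabeling trick (illustrative sketch):
# a trajectory that failed to reach goal g is replayed as one that succeeded
# at reaching the state it actually ended in.

def reward(state, goal):
    return 1.0 if state == goal else 0.0

# A trajectory that tried to reach goal 5 but only got to state 3
goal = 5
trajectory = [(0, 1), (1, 2), (2, 3)]  # (state, next_state) pairs

# Standard replay: every transition earns zero reward -> nothing to learn from
original = [(s, s2, goal, reward(s2, goal)) for s, s2 in trajectory]

# HER replay: relabel the goal as the state actually achieved at the end
achieved = trajectory[-1][1]
relabeled = [(s, s2, achieved, reward(s2, achieved)) for s, s2 in trajectory]
```

The relabeled transitions end in a reward of 1, which is how the system learns from failures as well as successes.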
Learn a Hierarchy of Actions with Meta Learning

  • Start with some idea of action primitives
  • Low-level primitives, trained over a distribution of tasks

Self Play
  • TD-Gammon
  • Q-Learning + NN + self-play
  • DQN for Atari
  • What is the task? What are we teaching the system to do?
  • Agents create the environment for each other
  • Self-play allows turning compute into data
Learn from Human Feedback
  • A technical problem



Happy Mastering DL!!!
