"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 29, 2019

Day #200 - MIT AI: OpenAI Meta-Learning and Self-Play - Ilya Sutskever

Key Lessons
  • Theorem - given a little data -> find the shortest program that generates the data -> extract regularity -> make predictions
  • If the data is random, no regularity can be extracted
  • Our objective - find the best short program that solves the problem given your data
  • Small circuits - find the best small circuit using backpropagation
  • Circuit - impose constraints - using the data - iteratively make changes until the predictions satisfy the data
  • Backpropagation is circuit search
  • Training a neural network = solving the neural equation
  • Parallel network - multiple layers where reasoning happens - given data, find the NN
  • Run computation inside the layers to find the best NN
  • The model class must be worth optimizing and be optimizable
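The "find the best small circuit with backpropagation" idea can be sketched with a tiny numpy network fitted to XOR. Everything here (layer sizes, learning rate, iteration count) is an illustrative choice, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: a tiny dataset that no linear circuit can fit
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# A small "circuit": 2 inputs -> 4 hidden sigmoid units -> 1 output
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward():
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

_, p0 = forward()
initial_loss = float(np.mean((p0 - y) ** 2))

for _ in range(5000):
    h, p = forward()
    # Backpropagation = circuit search: nudge every wire to reduce the error
    dp = (p - y) * p * (1 - p)
    dh = (dp @ W2.T) * h * (1 - h)
    W2 -= 0.5 * (h.T @ dp); b2 -= 0.5 * dp.sum(axis=0)
    W1 -= 0.5 * (X.T @ dh); b1 -= 0.5 * dh.sum(axis=0)

_, p = forward()
final_loss = float(np.mean((p - y) ** 2))
```

Iteratively changing the weights until the predictions satisfy the data is exactly the "impose constraints, make changes" loop from the notes.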




RL
  • A framework to evaluate agents
  • Interesting RL algorithms
  • Good enough for doing interesting things
  • Mathematical problem - maximize expected reward
  • Agent -> Action
  • Agent <- Reward
  • In the real world, we figure out the reward from our interpretation of our senses
  • Existence / non-existence is the reward for us
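The agent -> action, agent <- reward loop and "maximize expected reward" can be made concrete with a toy two-armed bandit. The environment, policies, and payoffs below are all made up for illustration:

```python
import random

random.seed(0)

# Toy environment (illustrative): action 1 pays off ~1.0 on average, action 0 ~0.0
def env_reward(action):
    return random.gauss(1.0 if action == 1 else 0.0, 0.1)

# Agent -> action, agent <- reward; expected reward estimated by Monte Carlo averaging
def expected_reward(policy, episodes=1000):
    total = 0.0
    for _ in range(episodes):
        action = policy()       # agent emits an action
        total += env_reward(action)  # environment returns a reward
    return total / episodes

uniform = lambda: random.choice([0, 1])  # picks arms at random
greedy = lambda: 1                       # always picks the better arm
```

The mathematical problem is simply to find the policy with the highest `expected_reward`.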



RL in a nutshell
  • Add randomness to your actions
  • Check if the results surprise you
  • Change the parameters so those actions are taken again in the future
  • Try adding randomness -> if you like the results -> shift the randomness toward the desired target -> it will work in the future
  • Try actions; if you like the result, increase its log probability
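The "add randomness, and if you like the result increase the log probability" recipe is the policy-gradient (REINFORCE) update. A minimal sketch on the same kind of toy bandit, with an illustrative reward and learning rate:

```python
import math
import random

random.seed(0)

# Toy bandit (illustrative): only action 1 is rewarded
def reward(action):
    return 1.0 if action == 1 else 0.0

theta = [0.0, 0.0]  # one logit per action

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

lr = 0.1
for _ in range(500):
    probs = softmax(theta)
    # Add randomness to the action
    a = 0 if random.random() < probs[0] else 1
    r = reward(a)
    # If you liked the result, increase the log probability of that action:
    # d/d theta[k] of log pi(a) is (1[k == a] - probs[k])
    for k in range(2):
        theta[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])

final_probs = softmax(theta)
```

After training, nearly all the probability mass sits on the rewarded action.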
Q-Learning
  • Learn from data generated by the actor itself
  • Can also learn from data generated by other policies
On & Off Policy Learning
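Off-policy learning - learning greedy values from data generated by some other behaviour policy - is what tabular Q-learning does. A minimal sketch on a made-up chain MDP (states, rewards, and hyperparameters are all illustrative):

```python
import random

random.seed(0)

# Tiny chain MDP (illustrative): states 0..3, action 1 moves right, action 0 moves left;
# reaching state 3 gives reward 1 and ends the episode
N, GOAL = 4, 3
Q = [[0.0, 0.0] for _ in range(N)]
alpha, gamma = 0.5, 0.9

def step(s, a):
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

for _ in range(500):
    s, done = 0, False
    while not done:
        a = random.randint(0, 1)  # behaviour policy: uniformly random (off-policy)
        s2, r, done = step(s, a)
        # Q-learning target bootstraps from the greedy action, not the one taken
        target = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
```

Even though every action was chosen at random, the learned Q-values prefer moving right toward the goal.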




Potential for RL
  • From a simulation of the world we can train interesting things
  • Data - extract the entropy - learn in the fastest way possible
Meta Learning
  • Learn to learn
  • The input is all the info of the task + the test case
  • Turn a NN into a learning algorithm
  • Training tasks become training cases
  • Neural architecture search - find an architecture that solves a small problem well, then apply it to new tasks
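The "input is all the info of the task + the test case" framing can be sketched with a trivially small stand-in for the learned network: here a fixed nearest-neighbour rule plays the role of the NN, and a whole task (a few labeled examples plus a query) is a single input. All names and data are illustrative:

```python
# Meta-learning framing (illustrative): one "training case" = an entire task.
# The learner receives the task's labeled examples AND the test query together.

def solve_task(support, query):
    """support: list of (x, label) pairs; query: an x to classify."""
    # Stand-in for the learned network: 1-nearest-neighbour on the support set
    return min(support, key=lambda pair: abs(pair[0] - query))[1]

# A task: two labeled examples + one test case, packaged as one input
task = ([(0.0, "a"), (10.0, "b")], 9.0)
```

A meta-learner would replace the fixed rule with a trained network, so that training tasks literally become its training cases.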


Hindsight Experience Replay
  • A learning algorithm for RL that learns to make use of all experience
  • Explore an unknown environment
  • Rewards arrive only from time to time
  • Learn a policy over a very large family of goals
  • Whatever state was actually reached is treated, in hindsight, as the goal
  • A combination of on- and off-policy learning
  • The system learns from both success and failure
  • A sparse reward alone won't help - no reward, no learning
  • Learn from both failures and successes and you will never stop growing
  • Sim2Real and meta-learning
  • Robotics - train in simulation and carry the policy over to the physical robot
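The core HER trick - relabeling a failed trajectory with the goal it actually achieved, so sparse rewards still produce a learning signal - can be sketched in a few lines. The states, goal, and transitions are made up for illustration:

```python
# Hindsight Experience Replay, the relabeling trick (illustrative sketch):
# a trajectory that failed to reach goal g is replayed as one that succeeded
# at reaching the state it actually ended in.

def reward(state, goal):
    return 1.0 if state == goal else 0.0

# A trajectory that tried to reach goal 5 but only got to state 3
goal = 5
trajectory = [(0, 1), (1, 2), (2, 3)]  # (state, next_state) pairs

# Standard replay: every transition earns zero reward -> nothing to learn from
original = [(s, s2, goal, reward(s2, goal)) for s, s2 in trajectory]

# HER replay: relabel the goal as the state actually achieved at the end
achieved = trajectory[-1][1]
relabeled = [(s, s2, achieved, reward(s2, achieved)) for s, s2 in trajectory]
```

The relabeled transitions end in a reward of 1, which is how the system learns from failures as well as successes.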
Learn a Hierarchy of Actions with Meta Learning

  • Start with some idea of action primitives
  • Low-level primitives, trained over a distribution of tasks

Self Play
  • TD-Gammon
  • Q-Learning + NN + self-play
  • DQN for Atari
  • What is the task? What are we teaching the system to do?
  • Agents create the environment for each other
  • Self-play allows turning compute into data
Learn from Human Feedback
  • A technical problem



Happy Mastering DL!!!
