- Theorem: from a small amount of given data, find the shortest program that generates the data by learning from it -> extract the regularity -> make predictions
- If the data is random, no regularity can be extracted
- Our objective: find the best short program that solves the problem, given your data
- Small circuits: find the best small circuit using backpropagation
- Circuit: impose constraints using the data; iteratively make changes until the predictions satisfy the data
- Backpropagation is circuit search (see the sketch after this list)
- Training a neural network = solving the neural equation
- Parallel network with multiple layers where the reasoning happens: given data, find the NN
- Run computation inside the layers to find the best NN
- A model class that is worth optimizing and that is optimizable
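A minimal sketch of the "circuit search with backpropagation" idea above: a tiny two-layer network (the "circuit") is fit to toy XOR data by repeatedly nudging its weights until its predictions satisfy the data. The data, architecture, and learning rate are hypothetical choices for illustration only.

```python
# Circuit search via backpropagation: iteratively adjust weights until the
# circuit's predictions satisfy the (toy) data.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
lr = 0.5

for step in range(5000):
    # forward pass through the "circuit"
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # sigmoid output

    # backward pass: gradients of the squared error w.r.t. every weight
    dp = (p - y) * p * (1 - p)                  # dLoss/dlogits
    dW2 = h.T @ dp
    db2 = dp.sum(axis=0)
    dh = dp @ W2.T * (1 - h ** 2)               # backprop through tanh
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)

    # gradient step: impose the data constraint a little more each iteration
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(p, 2))  # predictions should approach [0, 1, 1, 0]
```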
RL
- A framework to evaluate agents
- Interesting RL algorithms exist
- Good enough for doing interesting things
- The mathematical problem: maximize expected reward (see the loop sketch after this list)
- Agent -> action
- Agent <- reward
- In the real world we figure out the reward from our interpretation of our senses
- Existence / non-existence is the reward for us
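A minimal sketch of that agent/environment loop, assuming a made-up one-dimensional toy environment and a placeholder policy: the agent emits actions, the environment returns observations and rewards, and the expected reward is estimated by averaging returns over episodes.

```python
# Agent -> action, agent <- reward; objective: maximize expected reward.
import random

class ToyEnv:
    """Hypothetical 1-D environment: reach position +3 for a reward."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):            # action in {-1, +1}
        self.pos += action
        done = abs(self.pos) >= 3
        reward = 1.0 if self.pos >= 3 else 0.0
        return self.pos, reward, done

def policy(obs):
    # placeholder policy: mostly move right, sometimes explore
    return +1 if random.random() < 0.8 else -1

env = ToyEnv()
returns = []
for episode in range(100):
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    returns.append(total)

print("estimated expected reward:", sum(returns) / len(returns))
```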
RL in a nutshell
- Add randomness to your actions
- Check whether the results surprise you
- If they do, change the parameters so that you take those actions again in the future
- Try adding randomness -> if you like the results, shift the randomness toward the desired target so it works in the future
- Try actions; if you like the results, increase their log-probability (see the policy-gradient sketch after this list)
- Learn from data generated by the actor itself (on-policy)
- Or learn from other data (off-policy)
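A minimal policy-gradient (REINFORCE-style) sketch of the bullets above, assuming a made-up three-armed bandit: actions are sampled with randomness, and the log-probability of an action is pushed up in proportion to how much its reward beat a running baseline. The reward means and learning rate are hypothetical.

```python
# "Try actions; if you like the results, increase their log-probability."
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.9])   # hidden reward of each of 3 actions
logits = np.zeros(3)                     # policy parameters
lr = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

baseline = 0.0
for step in range(2000):
    probs = softmax(logits)
    a = rng.choice(3, p=probs)                 # randomness in the action
    reward = rng.normal(true_means[a], 0.1)    # do the results surprise you?

    # policy-gradient update: grad of log p(a) is onehot(a) - probs
    grad_logp = -probs
    grad_logp[a] += 1.0
    logits += lr * (reward - baseline) * grad_logp
    baseline += 0.05 * (reward - baseline)     # running average of reward

print("action probabilities:", np.round(softmax(logits), 2))
```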
Potential for RL
- From a simulation of the world we can train interesting things
- Data -> extract the entropy -> learn in the fastest way possible
- Learning to learn
- The input is all the information about the training task plus the test case
- Turn the NN itself into the learning algorithm
- Training tasks become training cases for the meta-learner (see the sketch after this list)
- Neural architecture search: find an architecture that solves a small problem well, then apply it to new tasks
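A minimal sketch of the "turn the NN into the learning algorithm" setup, assuming a hypothetical toy family of linear-regression tasks: each meta-training example packs a whole task (a small labeled support set) plus one query input into a single input vector, so whatever network consumes it has to do the learning inside its forward pass.

```python
# Each training task becomes one training case for the meta-learner.
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Hypothetical task family: regress y = a*x + b with random a, b."""
    a, b = rng.uniform(-2, 2, size=2)
    return lambda x: a * x + b

def make_meta_example(k_support=5):
    task = sample_task()
    xs = rng.uniform(-1, 1, size=k_support)
    support = np.stack([xs, task(xs)], axis=1)    # k labeled (x, y) pairs
    x_query = rng.uniform(-1, 1)
    y_query = task(x_query)
    # the meta-learner's input = flattened support set + the query input
    meta_input = np.concatenate([support.ravel(), [x_query]])
    return meta_input, y_query                    # target = query label

meta_x, meta_y = make_meta_example()
print(meta_x.shape, meta_y)   # (11,) input -> scalar target for the meta-learner
```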
Hindsight Experience Replay
- A learning algorithm for reinforcement learning that learns to make use of all of its experience
- Explore an unknown environment
- Rewards arrive only from time to time
- Learn a policy over a very large family of goals
- Reach states incrementally
- A combination of on-policy and off-policy learning
- The system learns from both successes and failures
- A sparse reward alone won't help: no reward, no learning
- Learn from both failures and successes and you will never stop growing (see the relabeling sketch after this list)
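A minimal sketch of the hindsight relabeling trick described above, with a hypothetical transition format: every transition is stored twice, once with the intended goal (usually reward 0 under a sparse reward) and once with a goal that was actually achieved later in the episode (reward 1), so even a failed episode produces learning signal.

```python
# Hindsight Experience Replay idea: a failed episode is a success for the
# goal it actually reached, so relabel transitions with achieved goals.
import random

def her_relabel(episode):
    """episode: list of (state, action, next_state, goal) transitions."""
    relabeled = []
    for i, (s, a, s_next, goal) in enumerate(episode):
        # original transition with the sparse reward for the intended goal
        relabeled.append((s, a, s_next, goal, 1.0 if s_next == goal else 0.0))
        # hindsight copy: pretend a state reached later in the episode was the goal
        _, _, achieved, _ = random.choice(episode[i:])
        relabeled.append((s, a, s_next, achieved, 1.0 if s_next == achieved else 0.0))
    return relabeled

# failed episode: the agent walks right but never reaches the intended goal state 9
episode = [(s, +1, s + 1, 9) for s in range(0, 4)]
for t in her_relabel(episode):
    print(t)   # the hindsight copies contain reward 1.0 -> something to learn from
```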
Sim2Real and Meta-Learning
- Robotics: train in simulation and carry the policy over to the physical robot
- Start with some set of action primitives
- Low-level primitives plus a distribution of tasks (see the sketch after this list)
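A minimal sketch of the "low-level primitives plus a distribution of tasks" idea, with hypothetical primitives, task distribution, and controller: a high-level rule picks among a fixed set of primitives while the simulated task (here, a target position) is resampled every episode.

```python
# Low-level primitives reused across a distribution of simulated tasks.
import random

PRIMITIVES = {
    "reach":   lambda state, target: state + 0.5 * (target - state),
    "grasp":   lambda state, target: state,        # placeholder dynamics
    "retract": lambda state, target: state - 0.1,
}

def sample_task():
    """Distribution of simulated tasks: move to a random target position."""
    return random.uniform(-1.0, 1.0)

def high_level_policy(state, target):
    # placeholder controller: pick a primitive by a simple rule
    return "reach" if abs(target - state) > 0.05 else "grasp"

for episode in range(3):
    target, state = sample_task(), 0.0
    for t in range(10):
        name = high_level_policy(state, target)
        state = PRIMITIVES[name](state, target)
    print(f"task target={target:+.2f}  final state={state:+.2f}")
```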
Self-Play
- TD-Gammon
- Q-learning + NN + self-play
- DQN for Atari
- What is the task? What are we teaching the system to do?
- The agents create the environment for each other
- Self-play lets you turn compute into data (see the sketch after this list)
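A minimal sketch of "self-play turns compute into data", using a hypothetical rock-paper-scissors toy: the policy plays against a copy of itself, every game lands in a dataset, and a toy update nudges the policy toward moves that score well against its current self.

```python
# Self-play: the opponent is a copy of the current policy, so every game
# played is fresh training data at the agent's own level.
import numpy as np

rng = np.random.default_rng(0)
BEATS = {0: 2, 1: 0, 2: 1}          # rock(0) beats scissors(2), etc.

def payoff(a, b):
    return 0.0 if a == b else (1.0 if BEATS[a] == b else -1.0)

probs = np.array([0.8, 0.1, 0.1])   # start heavily biased toward "rock"
dataset = []
for round_ in range(5000):
    a = rng.choice(3, p=probs)       # our move
    b = rng.choice(3, p=probs)       # copy of ourselves as the opponent
    r = payoff(a, b)
    dataset.append((a, b, r))
    # toy update: raise the probability of moves that beat the current self
    probs[a] += 0.001 * r
    probs = np.clip(probs, 1e-3, None)
    probs /= probs.sum()

print(len(dataset), "self-play games; policy ->", np.round(probs, 2))
```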
Learn from Human Feedback
- Technical Problem
Happy Mastering DL!!!