Key Summary
- Use Deep Learning to make sequential decisions
- The process of learning
- Supervised (Manual Annotation)
- Looking at own existence (Good, Bad, Morals)
- Supervised - Learn from Example
- RL - Learn by Experience
- Human RL builds on millions of years of experience
- Every Type of Learning is Supervised by a Loss Function
- The agent's sensors include Lidar, Camera, Radar, GPS, Stereo Camera, Microphone, Networking, IMU
- Sensory Data -> DL -> Make Sense of Data
- Knowledge / Reasoning - Action Recognition
- Imitation Learning / Observation-based
- Learn by Experience, Interaction
- Agent senses the environment through observations
- Actions change the environment, which produces a new observation
- We have to design the world in which the agent is trying to solve the task
- How much of the environment is observable / Agents in a real-world scenario
- RL Challenges in the current scenario
- Train in Simulation and transfer to real world
- Components of Agent - Strategy - Policy
- Deterministic world - Shortest path
- Good / Bad and the reward structures
- Unintended consequences based on Reward Structure
- Our own model to reason to value human life, efficiency, money
- Model-Based - As you interact with the world, construct a model of the dynamics of the world
- Value-Based - Estimate the quality of taking an action in a given state; typically Off Policy (e.g. Q-Learning); the estimates are constantly updated
- Policy-Based - Directly learn a policy function; typically On Policy
- DQN - Deep Q-Networks (Deep Q-Learning)
- Q-Learning - Looks at State, Action, Value
- Choose the action that maximizes the expected reward
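The Q-Learning bullets above can be made concrete with a small tabular sketch. This is a minimal illustration, not the lecture's code: the names `q_update` and `greedy_action`, the states `"s0"`/`"s1"`, and the values of `alpha` and `gamma` are all assumptions chosen for the example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward reward + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def greedy_action(q, state, actions):
    """Pick the action with the highest estimated value in this state."""
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical two-action problem; unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
actions = ["left", "right"]
q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
best = greedy_action(q, "s0", actions)
```

After this single update the table prefers "right" in "s0", because that is the only action that has been rewarded so far.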
Exploration Vs Exploitation
- Q-Learning - Value iteration with a table, without neural networks
- Deep RL = RL + Neural Networks
- Neural Networks are good function approximators
- DQN learns from historical data (experience replay)
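A rough sketch of the two ideas in this section: epsilon-greedy action selection for the exploration/exploitation trade-off, and a replay buffer for learning from historical data. The names `epsilon_greedy` and `ReplayBuffer`, the capacity, and the transition layout are assumptions for illustration, not a specific library's API.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


class ReplayBuffer:
    """Fixed-size store of past transitions; the oldest ones are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Hypothetical usage: five transitions pushed into a buffer that holds three.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(t, 0, 0.0, t + 1, False)
batch = buffer.sample(2)
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

With `epsilon=0.0` the agent always exploits; in practice epsilon usually starts high and is decayed over training.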
Policy Gradient
- Vanilla Policy Gradient (collect rewards until the end of the episode, then update)
- Pros - Faster convergence
- Cons - Sample-inefficient
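As a sketch of "collect reward at the end": once an episode finishes, vanilla policy gradient (REINFORCE) weights each action's log-probability by the discounted return that followed it. The helper names `discounted_returns` and `reinforce_loss`, and the sample numbers, are assumptions for illustration; a real implementation would compute the log-probabilities with a neural network and differentiate the loss.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, returns):
    """Surrogate loss whose gradient is the REINFORCE policy-gradient estimate."""
    return -sum(lp * g for lp, g in zip(log_probs, returns))


# Hypothetical episode: the only reward arrives on the last step.
returns = discounted_returns([0, 0, 1], gamma=0.5)
loss = reinforce_loss([-1.0, -1.0, -1.0], returns)
```

Because the update waits for whole episodes and uses each one only once, it needs many episodes, which is the inefficiency noted above.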
Value-Based (Critic) + Policy-Based (Actor) = Advantage Actor-Critic Method
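One common way to combine the two is a one-step advantage estimate: the critic's value function says how good a state was expected to be, and the advantage measures how much better or worse the actual outcome was. The function name `td_advantage` and the numbers below are assumptions for illustration.

```python
def td_advantage(reward, value, next_value, gamma=0.99, done=False):
    """One-step advantage estimate: A = r + gamma * V(s') - V(s)."""
    # No bootstrapping from the next state once the episode has ended.
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


# Hypothetical critic outputs: V(s) = 0.5 and V(s') = 2.0.
advantage = td_advantage(reward=1.0, value=0.5, next_value=2.0, gamma=0.5)
```

The actor is then pushed toward actions with positive advantage and away from those with negative advantage.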
Deep Deterministic Policy Gradient (DDPG) - DQN-style Q-network + a deterministic policy. Noise is added to the actions for exploration because the policy itself is deterministic
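A minimal sketch of the exploration step, assuming a scalar action and simple Gaussian noise (other noise processes are also used with DDPG). The name `noisy_action`, the action bounds, and the toy policy are hypothetical.

```python
import random


def noisy_action(policy, state, noise_std=0.1, low=-1.0, high=1.0, rng=random):
    """Add Gaussian noise to a deterministic policy's action, then clip to bounds."""
    action = policy(state)
    noisy = action + rng.gauss(0.0, noise_std)
    return max(low, min(high, noisy))


def toy_policy(state):
    """Stand-in for the actor network: a fixed deterministic mapping."""
    return 0.5 * state


# The same state can now yield slightly different actions during training.
samples = [noisy_action(toy_policy, 1.0, noise_std=0.3) for _ in range(100)]
```

At evaluation time the noise is typically switched off (`noise_std=0.0`) so the policy acts deterministically.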
Happy Mastering DL!!!