"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 27, 2019

Day #199 - Introduction to Deep Reinforcement Learning (Deep RL)

Every lecture covers the historical context, evolution, mathematics and inspiration, a technical-area overview, and a network-architecture overview. Well summarized!!

Key Summary
  • Use Deep Learning to make sequential decisions
  • The process of learning

  • Supervised learning (manual annotation)
  • Looking at one's own existence (good, bad, morals)
  • Supervised - learn from examples
  • RL - learn from experience
  • Human RL builds on millions of years of experience
  • Every type of learning is supervised by a loss function
  • The agent carries LiDAR, camera, radar, GPS, stereo camera, microphone, networking, IMU
  • Sensory data -> DL -> make sense of the data
  • Knowledge / reasoning - action recognition

  • Imitation learning / observation-based learning
  • Learn by experience and interaction
  • The agent senses the environment through observations
  • Through actions, the environment changes and new observations arrive (see the interaction-loop sketch below)
  • We have to design the world in which the agent tries to solve its task
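As a concrete picture of this sense-act loop, here is a minimal sketch using the Gymnasium-style reset/step interface; the CartPole-v1 environment and the random placeholder policy are illustrative choices, not part of the lecture.

```python
import gymnasium as gym

# Sense-act loop: observe, act, the environment changes, a new observation arrives.
env = gym.make("CartPole-v1")
observation, info = env.reset(seed=0)

for step in range(200):
    action = env.action_space.sample()      # placeholder policy: random action
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()
```

A learning agent would replace the random action with one chosen from its policy and use the reward signal to improve that policy.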



  • How much of the environment is observable; agents in real-world scenarios
  • Challenges for RL in real-world settings
  • Train in simulation and transfer to the real world
  • Components of an agent - strategy - policy
  • Deterministic world - shortest path (planning reduces to search; see the sketch below)
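To illustrate the "deterministic world = shortest path" point: when the dynamics are deterministic and fully observable, acting well reduces to graph search rather than trial-and-error learning. A minimal sketch with breadth-first search on a toy grid (the grid layout and coordinates are made up for illustration):

```python
from collections import deque

# In a deterministic, fully observable grid world, planning is shortest-path search.
grid = ["....#",
        ".##.#",
        "....."]          # '#' = wall, '.' = free cell (toy example)
start, goal = (0, 0), (2, 4)

def neighbors(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == ".":
            yield (nr, nc)

frontier, parent = deque([start]), {start: None}
while frontier:
    cell = frontier.popleft()
    if cell == goal:
        break
    for nxt in neighbors(cell):
        if nxt not in parent:
            parent[nxt] = cell
            frontier.append(nxt)

path, cell = [], goal
while cell is not None:
    path.append(cell)
    cell = parent[cell]
print(list(reversed(path)))   # shortest path from start to goal
```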






  • Good / bad behavior and reward structures
  • Unintended consequences arising from the reward structure
  • Our own model for reasoning about the value of human life, efficiency, and money
  • Model-based - as the agent interacts with the world, it constructs a model of the world's dynamics
  • Value-based - estimate the quality (value) of taking each action in a state; off-policy; constantly updated
  • Policy-based - directly learn a policy function; on-policy





RL methods
  • DQN - Deep Q-Networks (deep Q-learning)
  • Q-learning looks at state, action, and value
  • Which action maximizes the reward (see the Q-learning sketch below)
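A minimal sketch of the Q-learning update behind DQN, in tabular form on a toy chain environment; the chain MDP, learning rate, discount, and epsilon are illustrative assumptions, not from the lecture:

```python
import numpy as np

# Tabular Q-learning on a 5-state chain: move left/right, reward 1 at the far end.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy: explore occasionally, otherwise pick the best-known action
        action = rng.integers(n_actions) if rng.random() < epsilon else int(Q[state].argmax())
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q.round(2))   # learned action values: action 1 (move right) should dominate
```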


Exploration vs. Exploitation
  • Q-learning - value iteration without neural networks (tabular)
  • Deep RL = RL + neural networks
  • Neural networks are good function approximators
  • DQN learns from historical data (experience replay; see the sketch below)
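A minimal sketch of the two DQN ingredients named above: a neural network as Q-function approximator and a replay buffer of historical transitions. PyTorch is assumed and the transitions are dummy data standing in for real environment interaction; the network sizes and hyperparameters are illustrative:

```python
import random
from collections import deque

import torch
import torch.nn as nn

# DQN ingredients: a neural Q-function plus a replay buffer of (s, a, r, s', done).
obs_dim, n_actions, gamma = 4, 2, 0.99
q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)

# Fill the buffer with dummy transitions (real code would collect them from the env).
for _ in range(1000):
    s, s2 = torch.randn(obs_dim), torch.randn(obs_dim)
    replay.append((s, random.randrange(n_actions), random.random(), s2, False))

# One DQN training step on a minibatch sampled from historical data.
batch = random.sample(replay, 64)
states = torch.stack([t[0] for t in batch])
actions = torch.tensor([t[1] for t in batch])
rewards = torch.tensor([t[2] for t in batch], dtype=torch.float32)
next_states = torch.stack([t[3] for t in batch])
dones = torch.tensor([t[4] for t in batch], dtype=torch.float32)

q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
with torch.no_grad():
    targets = rewards + gamma * (1 - dones) * target_net(next_states).max(dim=1).values
loss = nn.functional.mse_loss(q_values, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```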


Policy Gradient

  • Vanilla policy gradient (collect the reward only at the end of the episode; see the sketch below)
  • Faster convergence
  • Cons - sample-inefficient
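A minimal REINFORCE-style sketch of vanilla policy gradient, assuming PyTorch and a Gymnasium CartPole-v1 environment (both are illustrative choices): rewards are collected only at the end of the episode, turned into discounted returns, and used to weight the log-probabilities of the actions taken.

```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(200):
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        dist = torch.distributions.Categorical(
            logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns, computed only after the episode ends (reward at the end).
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    # Push up log-probabilities of actions in proportion to the return they led to.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The inefficiency mentioned above shows up here: each trajectory is used for a single gradient step and then thrown away.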



(Critic) Value-based + policy-based = Advantage Actor-Critic method (see the sketch below)
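A minimal sketch of that combination: the critic estimates V(s), and the advantage (the observed return minus the critic's baseline) weights the actor's policy-gradient step. PyTorch is assumed and the batch of experience is dummy data for illustration:

```python
import torch
import torch.nn as nn

# Actor-critic in one update: the critic estimates V(s); the advantage
# (return - V(s)) tells the actor how much better an action was than expected.
obs_dim, n_actions = 4, 2
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

# Dummy batch standing in for collected experience (state, action, observed return).
states = torch.randn(32, obs_dim)
actions = torch.randint(n_actions, (32,))
returns = torch.randn(32)

values = critic(states).squeeze(1)
advantage = returns - values.detach()          # how much better than the baseline?
dist = torch.distributions.Categorical(logits=actor(states))
actor_loss = -(dist.log_prob(actions) * advantage).mean()
critic_loss = nn.functional.mse_loss(values, returns)

loss = actor_loss + critic_loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```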



Deep Deterministic Policy Gradient (DDPG) - a DQN-style critic plus a deterministic policy; add noise for exploration, since the policy itself is deterministic (see the sketch below).
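A minimal sketch of DDPG-style action selection, assuming PyTorch: the actor outputs a single deterministic action, so Gaussian noise is added on top purely for exploration (the network shape, action bound, and noise scale are illustrative assumptions):

```python
import torch
import torch.nn as nn

# DDPG-style action selection: the actor is deterministic, so exploration
# has to be added explicitly as noise on top of the chosen action.
obs_dim, act_dim, act_limit = 3, 1, 2.0
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())

def select_action(obs, noise_scale=0.1):
    with torch.no_grad():
        action = act_limit * actor(torch.as_tensor(obs, dtype=torch.float32))
    action += noise_scale * torch.randn(act_dim)        # Gaussian exploration noise
    return action.clamp(-act_limit, act_limit)

print(select_action(torch.zeros(obs_dim)))
```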





Happy Mastering DL!!!
