"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your gut and don't follow the herd" ; "Validate direction, not destination" ;

January 30, 2019

Day #202 - Ilya Sutskever at AI Frontiers 2018 - Great Talk

Key Lesson 
  • OpenAI Five - a neural network that plays Dota 2 (a strategy game) at the level of the strongest players

  • Solution approach - very large-scale RL
  • The bot has accumulated 500 years of gameplay experience
  • A breakthrough achieved with old algorithms
  • It takes humans many years to become good, but RL now learns faster than before
Dexterity
  • Vision + sensation to manipulate the block
  • Diverse object handling using an RL approach
  • Train in simulation and deploy in the real world (randomize perception and physics)
  • Sim2Real is hard because the simulation differs from the real robot
  • If you don't know a parameter, randomize it
  • Generality - a system able to learn
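The domain-randomization idea above can be sketched in a few lines: for each simulated episode, sample the physics and perception parameters you are unsure of from broad ranges. The parameter names and ranges here are illustrative assumptions, not from the talk.

```python
import random

# Illustrative parameter ranges -- assumed for the sketch, not from the talk.
RANGES = {
    "friction": (0.5, 1.5),
    "object_mass": (0.05, 0.5),      # kg
    "camera_hue_shift": (-0.1, 0.1),
}

def randomize_sim():
    """Sample a fresh simulator configuration for one episode.

    If you don't know a parameter's real-world value, randomize it, so the
    policy learns to be robust across the whole range (Sim2Real)."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}

# One randomized configuration per training episode:
config = randomize_sim()
```

Training against many such configurations is what lets a policy trained purely in simulation carry over to a physical robot.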

Curiosity based Exploration
  • Seek Novelty, Avoid boredom
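One simple way to "seek novelty, avoid boredom" is a count-based exploration bonus: states visited often become boring and earn a smaller bonus. This is a minimal sketch of the general idea, not the specific method from the talk.

```python
from collections import Counter
from math import sqrt

class NoveltyBonus:
    """Count-based curiosity: bonus shrinks as a state is revisited."""

    def __init__(self):
        self.visits = Counter()

    def bonus(self, state):
        # Novel states (low visit count) get a large bonus; familiar
        # ("boring") states get a small one.
        self.visits[state] += 1
        return 1.0 / sqrt(self.visits[state])

explorer = NoveltyBonus()
```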



OpenAI Mission
  • AGI (Artificial General Intelligence)
  • Work to benefit humanity
Deep Learning is at the root of it all

Exponential reduction in error year after year
The accuracy of Machine Translation improved drastically
GAN in 2014


Compute used has increased by a factor of 300K

Challenges
  • Unsupervised Learning
  • Robust Classification
  • Reasoning
  • Abstraction
Rapid progress over the past 6 years.


Happy Mastering DL!!!

Data Science Interesting Reads and Ideas

Good Read - 25 Machine Learning Startups To Watch In 2018

The problems and areas they are working on:
  • Continuously Updating (Learning) Models
  • Automate Data Extraction from unstructured documents
  • AI Solutions specialized for Verticals (Manufacturing/ Healthcare/ Talent Management/ Sales / Supply Chain)
Use Cases
  • Real time Anomaly Detection
  • Recommendations
  • Aerial, drone and satellite imagery
  • Fintech, Cybercrime use cases (Breach, Fraud, criminal activity)
  • Early Stage Cancer Detection
Key Ideas from the Blog Reads of Startups
  • Working on Autonomous Data Science Platform
  • Unsupervised ML Engine
  • From the data, find the appropriate distribution that applies (extract anomalies from it - points not matching the distribution)
  • AI-assisted command centers
More Reads
Artificial Intelligence in Medicine: How “Deep Learning” Is Helping Doctors Save Lives

Happy Mastering DL!!!

January 29, 2019

Day #201 - Person Detection for Surveillance Camera Video

Today I was pulled into a real theft investigation. A theft was reported in my community, and we have a ton of videos. Finding the person across all the videos is a tedious task, so I worked on an approach for it.

Step #1 - OpenCV to extract frames from the videos
Step #2 - YOLO to detect persons in each frame
Step #3 - Extract the region of each person and add an appropriate reference for further analysis
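The three steps can be sketched with plain-Python helpers. The detection format and the frame-as-grid representation are assumptions for illustration; in practice you would read frames with `cv2.VideoCapture` and get the boxes from a YOLO model.

```python
def frame_indices(total_frames, step):
    """Step 1 helper: pick every `step`-th frame instead of all of them,
    to keep the per-video workload manageable."""
    return list(range(0, total_frames, step))

def crop_persons(frame, detections, min_confidence=0.5):
    """Steps 2-3 helper: keep detections labeled 'person' above a
    confidence threshold and crop their regions from the frame.

    `frame` is a grid of pixel rows; each detection is a dict with
    'label', 'confidence' and an (x, y, w, h) 'box'. The exact format
    depends on the YOLO wrapper you use, so treat this as a sketch."""
    crops = []
    for det in detections:
        if det["label"] == "person" and det["confidence"] >= min_confidence:
            x, y, w, h = det["box"]
            crops.append([row[x:x + w] for row in frame[y:y + h]])
    return crops
```

The cropped person regions can then be saved with a frame/video reference for the cross-video comparison step.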

This was a great lesson in experimenting and saving time in a situation like this.

Happy Coding!!!
Happy Mastering DL!!!


Day #200 - MIT AI: OpenAI Meta-Learning and Self-Play - Ilya Sutskever

Key Lessons
  • Theorem - from a given small amount of data -> find the shortest program that generates that data -> extract regularity -> make predictions
  • If the data is random, we cannot extract regularity
  • Our objective - finding the best short program that solves the problem given your data
  • Small circuits - find the best small circuit using backpropagation
  • Circuit - impose constraints - using the data - iteratively make changes until the predictions satisfy the data
  • Backpropagation is circuit search
  • Training a neural network = solving a neural equation
  • Parallel network - multiple layers - reasoning happens - given data, find the NN
  • Run computation inside the layers, finding the best NN
  • A model class worth optimizing, and optimizable
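The "backpropagation is circuit search" idea can be illustrated with the smallest possible circuit: one weight, changed iteratively until its predictions satisfy the data. A toy sketch, not from the talk:

```python
def fit_one_weight(data, lr=0.1, steps=100):
    """Search for the one-weight 'circuit' y = w * x by gradient descent:
    impose the constraint (the data), then iterate small changes to w
    until the predictions fit."""
    w = 0.0
    for _ in range(steps):
        # Gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w
```

With data generated by y = 2x, the search converges to w ≈ 2 - the shortest "program" consistent with the data.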




RL
  • Framework to evaluate agents
  • Interesting RL Algos
  • Good Enough for doing interesting things
  • Mathematical problem - Maximize expected reward
  • Agent -> Action 
  • Agent <- Reward
  • In the real world, we figure out the reward from our interpretation of our senses
  • Existence / non-existence is the reward for us



RL in nutshell
  • Add randomness to your actions
  • Check if the results surprise you
  • Change the parameters to take those actions in the future
  • Try adding randomness -> if you like the results -> shift the randomness toward the desired target -> it will work in the future
  • Try actions; if you like the result, increase its log probability
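This "nutshell" is essentially a score-function (REINFORCE-style) update: sample an action with some randomness, and if the result is good, increase that action's log probability. A minimal sketch on a two-armed bandit; the setup and hyperparameters are illustrative.

```python
import math
import random

def softmax(logits):
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_bandit(reward_fn, steps=200, lr=0.1, seed=0):
    """Policy gradient in a nutshell: sample an action with randomness,
    observe the reward, and nudge the log-probabilities toward actions
    that paid off."""
    random.seed(seed)
    logits = [0.0, 0.0]
    for _ in range(steps):
        probs = softmax(logits)
        action = random.choices([0, 1], weights=probs)[0]  # random action
        reward = reward_fn(action)
        for a in range(2):
            # grad of log pi(action) w.r.t. each logit, scaled by reward
            grad = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += lr * reward * grad
    return softmax(logits)
```

With a reward of 1 for arm 0 and 0 for arm 1, the policy's probability mass shifts toward arm 0.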
Q-Learning
  • Learn from data generated from actor
  • Learn from other data
On & Off Policy Learning
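The Q-learning bullets above - learning from data generated by any actor, i.e. off-policy - can be sketched as the classic tabular update on a tiny corridor environment. The environment itself is an illustrative assumption.

```python
import random

def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.2, seed=0):
    """Tabular Q-learning on a corridor: action 0 = left, 1 = right,
    reward 1 for reaching the last state. It is off-policy because the
    update uses max over next actions, regardless of which action the
    (epsilon-greedy) behaviour policy actually takes next."""
    random.seed(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            if random.random() < epsilon:
                a = random.randrange(2)          # explore
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1  # exploit
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Bellman update: target uses the best next action
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the learned values prefer "right" (toward the reward) in every state.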




Potential for RL
  • From a simulation of the world, we can train interesting things
  • From data, extract entropy and learn in the fastest way possible
Meta Learning
  • Learn to Learn
  • Input is all the information of the training tasks + the test case
  • Turn the NN into a learning algorithm
  • Training task to training case
  • Neural Architecture Search - find an architecture that solves a small problem well, then apply it to new tasks


Hindsight Experience Replay
  • Learning Algo for reinforcement learning that learns to make use of experience
  • Explore the unknown environment
  • Rewards from time to time
  • Learn a policy with a very large family of goals

  • Reached State incrementally 
  • Combination of on and off policy
  • Systems learns from both success and failure
  • Sparse rewards alone won't help - no reward, no learning
  • Learn from both failures and successes and you will never stop growing
  • SIM2Real and meta Learning
  • Robotics - Train in Simulation and Carry over to physical robot
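The core trick in Hindsight Experience Replay is goal relabeling: even a failed trajectory succeeded at reaching wherever it ended up, so replay it with the achieved state substituted as the goal. A minimal sketch; the transition format is an assumption.

```python
def her_relabel(trajectory):
    """Relabel a (possibly failed) goal-conditioned trajectory.

    Each transition is (state, action, next_state, goal, reward).
    In hindsight, pretend the goal was the state actually reached at the
    end, turning the failure into a success we can learn from - this is
    how the system learns from failure as well as success."""
    achieved = trajectory[-1][2]  # final next_state
    relabeled = []
    for state, action, next_state, _goal, _reward in trajectory:
        reward = 1.0 if next_state == achieved else 0.0
        relabeled.append((state, action, next_state, achieved, reward))
    return relabeled
```

Both the original and the relabeled transitions go into the replay buffer, so sparse rewards no longer mean no learning signal.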
Learn Hierarchy of Actions with Meta Learning

  • Some idea of action primitives to start with
  • Low level primitives, distribution of tasks

Self Play
  • TD-Gammon
  • Q-Learning + NN + Self-Play
  • DQN for Atari
  • What is the task? What are we teaching the system to do?
  • Agents create the environment
  • Self-play allows us to turn compute into data
Learn from Human Feedback
  • Technical Problem



Happy Mastering DL!!!

January 27, 2019

Day #199 - Introduction to Deep Reinforcement Learning (Deep RL)

Every lecture has historical context, evolution, mathematics and inspiration, a technical area overview, and a network architecture overview. Well summarized!!

Key Summary
  • Use Deep Learning to make sequential decisions
  • The process of learning

  • Supervised (Manual Annotation)
  • Looking at own existence (Good, Bad, Morals)
  • Supervised - Learn from Example
  • RL - Learn by Experience
  • Human RL builds on a million years of experience
  • Every Type of Learning is Supervised by Loss Function
  • The agent contains Lidar, Camera, Radar, GPS, Stereo Camera, Microphone, Networking, IMU
  • Sensory Data -> DL -> Make Sense of Data
  • Knowledge / Reasoning - Action Recognition

  • Imitation Learning / Observations based
  • Learn by Experience, Interaction
  • Agent senses environment based on observations
  • Through Action environment changes and new observation occurs
  • We have to design the world in which the agent tries to solve the task



  • How much of the environment is observable / Agents in a real-world scenario
  • RL Challenges in the current scenario
  • Train in Simulation and transfer to real world
  • Components of Agent - Strategy - Policy
  • Deterministic world - Shortest path






  • Good / Bad and the reward structures
  • Unintended consequences based on Reward Structure
  • Our own model to reason to value human life, efficiency, money
  • Model-Based - as you interact with the world, construct a model of the world's dynamics
  • Value-Based - estimate the quality of taking each action in a state; off-policy; constantly updated
  • Policy-Based - on-policy; directly learn a policy function





RL methods
  • DQN - Deep Q-Networks
  • Q-Learning looks at state, action, value
  • Which action maximizes the reward?


Exploration Vs Exploitation
  • Q-Learning - value iteration without neural networks
  • Deep RL = RL + neural networks
  • Neural networks are good function approximators
  • DQN learns from historical data
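"Learns from historical data" refers to the experience replay buffer: store transitions as the agent acts, then train on random mini-batches of old experience instead of only the latest step. A minimal sketch:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of past transitions; a DQN trains on random
    mini-batches drawn from here, which decorrelates consecutive
    samples and reuses old experience."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest entries evicted

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

A transition here would typically be a (state, action, reward, next_state, done) tuple.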


Policy Gradient

  • Vanilla policy gradient (collect the reward at the end)
  • Pros - faster convergence
  • Cons - sample-inefficient



(Critic) value-based + (Actor) policy-based = Advantage Actor-Critic method
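The combination works by using the critic's value estimate as a baseline: the actor is updated with the *advantage* - how much better the observed return was than the critic expected - rather than the raw return. A minimal sketch of that computation (illustrative):

```python
def advantages(returns, values):
    """Advantage = observed return - critic's value estimate.

    Positive advantage -> the action did better than expected, so the
    actor should make it more likely; negative -> less likely. Using
    this instead of the raw return lowers gradient variance."""
    return [r - v for r, v in zip(returns, values)]
```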



Deep Deterministic Policy Gradient (DDPG) - a DQN-style critic + a deterministic policy. Add noise for exploration, since the policy itself is deterministic.
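Because the DDPG policy is deterministic, exploration has to be injected externally, e.g. by adding noise to the chosen action and clipping to the valid action range. Gaussian noise is used here as a simple illustration (DDPG implementations also commonly use Ornstein-Uhlenbeck noise).

```python
import random

def noisy_action(policy_action, sigma=0.1, low=-1.0, high=1.0):
    """Deterministic policy output + Gaussian exploration noise,
    clipped to the environment's action bounds."""
    a = policy_action + random.gauss(0.0, sigma)
    return max(low, min(high, a))
```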





Happy Mastering DL!!!