Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Weekend Reads

November 08, 2020

Weekend Reads - Advanced Models for Computer Vision

Key Notes

What a Classifier Will Miss - Human-level scene understanding

Parsing the scene
The angle of Bicycle (Pose, Relative pose)
Person on Bicycle
Closer Inspection

Tasks

Object Detection
Pose Estimation
Accuracy vs Efficiency of Models

CNN as Deep Learning Puzzle

Input-Output Node, Loss Computation and Backprop
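To make the input-output, loss and backprop pieces concrete, here is a minimal sketch on a one-weight linear model rather than a CNN. The function name train_step, the learning rate and the single training sample are illustrative assumptions, not taken from the talk.

```python
def train_step(w, b, x, y, lr=0.05):
    """One forward pass, loss computation and gradient update."""
    pred = w * x + b                 # forward: input node -> output node
    loss = (pred - y) ** 2           # squared-error loss on one sample
    grad_pred = 2 * (pred - y)       # dLoss/dPred
    grad_w = grad_pred * x           # chain rule: dLoss/dw
    grad_b = grad_pred               # chain rule: dLoss/db
    return w - lr * grad_w, b - lr * grad_b, loss

w, b = 0.0, 0.0
history = []
for _ in range(50):
    w, b, loss = train_step(w, b, x=2.0, y=4.0)
    history.append(loss)

print(history[0], history[-1])       # the loss shrinks as training proceeds
```

A real CNN repeats the same loop with many layers and lets a framework compute the gradients automatically.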

Classification - sparse description of the image

Object Detection

Multi-task problem
Classification & Localisation
Object, Location, Bounding box
Dataset, Samples, List of Objects, Labels, Bbox for each object

Predict BBOX Coordinates

Continuous Output
Minimize MSE over the samples
Regression for bbox prediction
The first step is classification
The second step is regression
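The two steps above can be sketched as a single multi-task loss: a classification term plus an MSE regression term on the box coordinates. This is a minimal sketch; the function names, the box format (x1, y1, x2, y2) in normalised coordinates and the box_weight parameter are assumptions for illustration.

```python
import math

def cross_entropy(class_probs, true_class):
    """Classification term: negative log-probability of the true class."""
    return -math.log(class_probs[true_class])

def bbox_mse(pred_box, true_box):
    """Regression term: mean squared error over the four box coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box)) / len(true_box)

def detection_loss(class_probs, true_class, pred_box, true_box, box_weight=1.0):
    """Multi-task loss: classification plus weighted localisation."""
    return (cross_entropy(class_probs, true_class)
            + box_weight * bbox_mse(pred_box, true_box))

probs = [0.1, 0.7, 0.2]              # predicted class probabilities
pred = [0.20, 0.30, 0.60, 0.80]      # predicted box (x1, y1, x2, y2)
true = [0.25, 0.30, 0.55, 0.80]      # ground-truth box
loss = detection_loss(probs, 1, pred, true)
print(loss)
```

The box_weight knob only balances the two terms; published detectors often use other regression losses in place of plain MSE.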

Faster R-CNN

Two-Stage Detector
Good Candidate BBOX
Refine through Regression
Discretize bbox space
Anchor points distributed
Candidate boxes of different scale and ratio
n candidates per anchor
Is there an object or not in the box
Refine through regression
We cannot backprop on parameters of bbox (Spatial Transformer Networks)
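The anchor idea above can be sketched as follows: anchor points are laid out on a regular grid, and each one gets n candidate boxes of different scale and aspect ratio. An IoU helper is included because overlap with a ground-truth box is a common way to decide whether a candidate contains an object. The function names, image size, stride, scales and ratios are illustrative assumptions, not values from the talk.

```python
import math

def generate_anchors(image_size, stride, scales, ratios):
    """Return candidate boxes (x1, y1, x2, y2) for every anchor point."""
    width, height = image_size
    boxes = []
    for cy in range(stride // 2, height, stride):
        for cx in range(stride // 2, width, stride):
            for scale in scales:
                for ratio in ratios:
                    # ratio is width / height; the area stays scale ** 2
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    boxes.append((cx - w / 2, cy - h / 2,
                                  cx + w / 2, cy + h / 2))
    return boxes

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

anchors = generate_anchors((64, 64), stride=16,
                           scales=(16, 32), ratios=(0.5, 1.0, 2.0))
print(len(anchors))   # 16 anchor points x 6 candidates each
```

Each candidate would then be scored for objectness and refined by regression, as in the notes above.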

One Stage Detector - Train end to end

Employ Hard negative mining
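A minimal sketch of hard negative mining, assuming per-box losses are already computed: keep every positive box and only the highest-loss background boxes, so the many easy negatives do not swamp training. The function name and the 3:1 negative-to-positive ratio are assumptions for illustration.

```python
def hard_negative_mining(losses, labels, neg_pos_ratio=3):
    """Return indices of all positives plus the hardest negatives."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Cap the number of negatives relative to the number of positives.
    num_neg = min(len(negatives), neg_pos_ratio * max(1, len(positives)))
    # "Hard" negatives are the background boxes with the largest loss.
    hardest = sorted(negatives, key=lambda i: losses[i], reverse=True)[:num_neg]
    return sorted(positives + hardest)

losses = [0.9, 0.05, 2.1, 0.01, 1.3, 0.02, 0.6, 0.03]
labels = [1, 0, 0, 0, 0, 0, 0, 0]    # 1 = object, 0 = background
kept = hard_negative_mining(losses, labels)
print(kept)
```

Only the boxes in kept would contribute to the classification loss for this batch.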

RetinaNet uses Focal Loss. (A loss function is simply a mathematical measure of how far a prediction is from the true value of a data point.) Focal Loss puts more weight on objects that were hard to classify and reduces the contribution of easy, correctly classified examples.
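A minimal sketch of the binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), shows how the (1 - p_t)^gamma factor shrinks the loss of easy examples. The function name is illustrative; gamma = 2 and alpha = 0.25 are used here as commonly cited defaults and should be treated as assumptions.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted object probability p and label y."""
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma is close to 0 for confident correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)   # confident and correct: heavily down-weighted
hard = focal_loss(0.10, 1)   # badly misclassified: keeps most of its loss
print(easy, hard)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy.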