Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Day #169

December 21, 2018

Day #169 - Spatial Localization and Detection

Key Summary

Higher layers capture more complex object parts
Pooling for downsampling
ResNet 152-layer architecture

Localization and Detection

Classification - Given Image classify object category
Classification + Localization - Draw bounding box where the class occurs
Object Detection - Find all instances of the categories in the image
Instance Segmentation - Outline every instance of each category with a contour
Detection can have multiple objects, variable number of objects

Classification

C classes
Input is Image
Output is class label
Evaluation Metric Accuracy

Localization

Input Image
Output Box in image (x,y,w,h)
Evaluation Metric Intersection over union
Class label plus bounding box
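
Intersection over union for two boxes in the (x, y, w, h) format above can be sketched as below. This is a minimal sketch, assuming (x, y) is the top-left corner of the box, which the notes do not specify; the function name is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes.

    Assumption: (x, y) is the top-left corner; w and h are positive.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extent along each axis (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes score 1.0 and disjoint boxes score 0.0.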

Idea #1 - Localization as Regression

Classification like SVM
Regression - Linear Regression
Image -> Processing -> Four Real valued numbers
x,y,w,h - always four numbers
Loss - L2 (Euclidean) loss
Train the same way as a classification network
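
The L2 loss on the four box numbers can be sketched as below. This is a minimal illustration, not the lecture's code; the function names are hypothetical, and the loss is taken as the squared Euclidean distance.

```python
def l2_loss(pred_box, gt_box):
    """Squared Euclidean distance between predicted and ground-truth (x, y, w, h)."""
    return sum((p - g) ** 2 for p, g in zip(pred_box, gt_box))


def l2_grad(pred_box, gt_box):
    """Gradient of the loss with respect to each predicted number."""
    return [2.0 * (p - g) for p, g in zip(pred_box, gt_box)]
```

The gradient is what would be backpropagated into the network that produces the four numbers.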

Start from a pretrained classification model
Take fully connected layers
Attach new FC layers
Output Real valued numbers
Class agnostic regressor
Class specific regressor

Human Pose Estimation

Input close up view
Fixed number of joints
Find all joints
Encode
x,y for joint location
Predict pose
DeepPose
Regress joint positions using a CNN

Idea #2 - Sliding Window
Run classification + Regression network at multiple locations on a high resolution image
Convert fully connected layers into convolutional layers for efficient computation
Combine classifier and regressor predictions across all scales for final prediction
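
Enumerating the window locations for the sliding-window idea can be sketched as below. The window size and stride are hypothetical parameters chosen for illustration, not values from the notes.

```python
def sliding_windows(img_w, img_h, win, stride):
    """Yield top-left corners (x, y) of square windows that fit inside the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield x, y


# An 8x6 image, a 4x4 window and a stride of 2 give 3 x 2 = 6 locations.
locations = list(sliding_windows(8, 6, win=4, stride=2))
```

The classification + regression network would be run once per location, and the per-location predictions are then merged.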

Overfeat Architecture

Alexnet
Classification head
Regression head

Integrated Recognition, Localization and Detection using CNN (ICLR 2014)

FC layer - 4096 vector
Convert FC into Convolutional layer
Efficient Sliding Window - Overfeat
RPN (Region Proposal Networks)
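
Why converting an FC layer into a convolutional layer gives an efficient sliding window can be seen on a toy example. This is a sketch with made-up weights and a 2x2 input, assuming a single output unit: the FC weights are reshaped into a kernel, and on a larger input the same kernel yields one score per window position.

```python
def fc(flat_input, weights, bias):
    """Fully connected unit: dot product over a flattened input plus bias."""
    return sum(w * x for w, x in zip(weights, flat_input)) + bias


def conv_valid(image, kernel, bias):
    """'Valid' 2-D convolution (cross-correlation, as used in CNNs), stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(kh) for b in range(kw))
            row.append(s + bias)
        out.append(row)
    return out


# Hypothetical FC weights for a flattened 2x2 input, reshaped into a 2x2 kernel.
fc_weights = [1.0, -1.0, 0.5, 2.0]
kernel = [fc_weights[0:2], fc_weights[2:4]]

small = [[1.0, 2.0], [3.0, 4.0]]
large = [[1.0, 2.0, 0.0], [3.0, 4.0, 1.0], [0.0, 1.0, 2.0]]

fc_out = fc([v for row in small for v in row], fc_weights, 0.1)
conv_small = conv_valid(small, kernel, 0.1)  # 1x1 output, same value as fc_out
conv_large = conv_valid(large, kernel, 0.1)  # 2x2 grid, one score per window
```

The shared computation between overlapping windows is what makes the convolutional form cheaper than cropping and running the network on each window separately.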

Object Detection

Find all instances of those classes
Variable sized outputs
Not a straightforward regression problem
Detection as classification
Classifier on Image Regions
Windows Size - Try them all
Windows of different sizes and scales

HOG

Compute HOG
Score every sub-window
Apply non-maximum suppression
Linear Classifier on top of HOG
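
The non-maximum suppression step can be sketched as a greedy loop. This is a minimal sketch, assuming (x, y, w, h) boxes with a top-left origin and an IoU threshold of 0.5; both are illustrative choices, not taken from the notes.

```python
def _iou(a, b):
    """IoU of two (x, y, w, h) boxes, (x, y) assumed to be the top-left corner."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with IoU of about 0.68, so it is suppressed.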

HOG changed into Deformable Parts Model

Still HOG
Rather than Linear Classifier
Templates for parts
Parts are allowed to deform a little bit

Detection as Classification

Run the classifier only on region proposals
Output Regions where object may be located
Class agnostic object detector
Fast to run
Blob like structures in image
Most famous is selective search
Merge blob like regions
Many region proposal methods exist, compared in "What makes for effective detection proposals?"
EdgeBoxes is really fast (about 1/3 of a second per image)

R-CNN

Region based CNN method
Input Image
Selective Search
2000 Boxes
Each box crop image region
Run forward through CNN with regression and classification head
Regression head correct region proposal

Training Pipeline

Start by downloading a pretrained model from the internet
Fine tune model for detection
Add new layers with different classes
Run on positive and negative images from detection dataset
Selective search - run CNN and cache features to disk
Large Hard-drive - Pascal Dataset
Extraction takes 100s of GB

SVM to classify different classes

Binary SVM to classify different regions
Positive and Negative Samples to train binary SVMs

Bounding Box Regression

For each class, train a linear regression model that maps cached features to offsets toward the ground-truth (GT) boxes, to make up for slightly wrong proposals
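
The offsets that such a regressor learns can be sketched as below. This assumes the parameterization used in the R-CNN paper (center shifts scaled by the proposal size, log-scale width and height) and boxes given as (cx, cy, w, h) with a center origin; the function names are illustrative.

```python
import math


def encode(proposal, gt):
    """Regression targets that move a proposal onto its ground-truth box."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def decode(proposal, t):
    """Apply predicted offsets to a proposal to get the corrected box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = t
    return (px + pw * tx, py + ph * ty,
            pw * math.exp(tw), ph * math.exp(th))
```

At training time the regressor is fit to the output of encode; at test time decode applies its predictions to each proposal.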

Object Detection Evaluation

Metric - Mean Average Precision
True positive (TP) - a high-scoring detection whose IoU with a correct box is above a threshold (commonly 0.5)
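
Average precision for one class, and its mean over classes, can be sketched as below. This is one simple un-interpolated variant; benchmark toolkits such as PASCAL VOC use interpolated versions, so the exact numbers can differ. The inputs (a TP/FP flag per detection and the number of ground-truth boxes) are assumed to be already computed.

```python
def average_precision(scores, is_tp, num_gt):
    """Un-interpolated AP: mean of the precision at each true-positive rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if is_tp[i]:
            tp += 1
            ap += tp / rank  # precision at the point where recall increases
    return ap / num_gt if num_gt else 0.0


def mean_average_precision(per_class_ap):
    """mAP is the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, three detections ranked TP, FP, TP against two ground-truth boxes give (1/1 + 2/3) / 2.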

R-CNN is slow at test time
SVM and regression are trained offline
Complicated training pipeline

Fast R-CNN

Swap order of extracting regions and running CNN
Sliding window idea
Pipeline looks similar
High resolution input image - Convolutional feature map
Extract Region proposals from feature map
Fed into FC layers
Train all at once
ROI Pooling
Region proposals from EdgeBoxes
Swapping order of convolution + cropping
Offline processing
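
ROI pooling, which turns a variable-sized region of the feature map into a fixed-size grid for the FC layers, can be sketched as below. This is a simplified single-channel version with integer bin edges, assuming the ROI is given in feature-map cells as (x0, y0, x1, y1) with exclusive end coordinates.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool a region of a 2-D feature map to a fixed out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Integer bin edges; every bin covers at least one cell.
        ys = y0 + (i * h) // out_h
        ye = max(ys + 1, y0 + ((i + 1) * h) // out_h)
        row = []
        for j in range(out_w):
            xs = x0 + (j * w) // out_w
            xe = max(xs + 1, x0 + ((j + 1) * w) // out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


fmap = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
whole = roi_max_pool(fmap, (0, 0, 4, 4), 2, 2)
corner = roi_max_pool(fmap, (0, 0, 2, 2), 2, 2)
```

Both regions produce a 2x2 output even though they differ in size, which is what lets regions of any shape feed the same FC layers.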

Faster R-CNN

Feature map from convolution
Region Proposal Network produces proposals from the feature map
ROI pooling
Classifier
Feature map - Sliding window over convolutional feature map
Anchor boxes - Different sizes
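
Generating anchor boxes of different sizes at one sliding-window position can be sketched as below. The three scales and three aspect ratios follow the Faster R-CNN paper, not these notes, and are used only for illustration; the function name is hypothetical.

```python
import math


def make_anchors(cx, cy, scales, ratios):
    """Return (cx, cy, w, h) anchors centred on one feature-map location.

    scale is the square root of the anchor area; ratio is height / width.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx, cy, w, h))
    return anchors


anchors = make_anchors(0.0, 0.0, scales=[128, 256, 512], ratios=[0.5, 1.0, 2.0])
```

Each anchor keeps the area fixed by its scale while the ratio changes its shape; the network then scores and refines every anchor at every position.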

ResNet

Deep Residual Network
101 layer Residual Network
box refinement (multiple steps for bounding box)
Add context
Multi-Scale Testing

Yolo

Localization as regression
Pose Detection directly as regression
Divide input image into a grid
Each grid cell predicts a fixed number of bounding boxes
Score for bounding box
Classification Score
Detection ends up as regression
Upper bound on the number of outputs is a limitation
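
The size of the regression output and the upper bound on detections can be worked out as below. The example values (7x7 grid, 2 boxes per cell, 20 classes) are the ones from the original YOLO paper, not from these notes; the function names are illustrative.

```python
def yolo_output_shape(grid, boxes_per_cell, num_classes):
    """Shape of the regression output for a grid x grid layout.

    Each box has 4 coordinates plus 1 score; class scores are shared per cell.
    """
    return (grid, grid, 5 * boxes_per_cell + num_classes)


def max_detections(grid, boxes_per_cell):
    """Upper bound on the number of boxes the network can output."""
    return grid * grid * boxes_per_cell


shape = yolo_output_shape(7, 2, 20)
limit = max_detections(7, 2)
```

With these values the output is a 7 x 7 x 30 tensor and at most 98 boxes can be predicted per image, regardless of how many objects are present.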

Happy Learning!!!
