"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 21, 2018

Day #169 - Spatial Localization and Detection

Key Summary
  • Higher layers respond to more complex object parts
  • Pooling downsamples the feature maps
  • ResNet - roughly 150-layer architecture (ResNet-152)
Localization and Detection
  • Classification - Given an image, predict the object category
  • Classification + Localization - Also draw a bounding box where the class occurs
  • Object Detection - Find all instances of the categories in the image
  • Instance Segmentation - Find all instances and outline each one like a contour
  • Detection can have multiple objects, so the number of outputs is variable
Classification
  • C classes
  • Input is an image
  • Output is a class label
  • Evaluation metric - Accuracy
Localization
  • Input - Image
  • Output - Box in the image (x, y, w, h)
  • Evaluation metric - Intersection over Union (IoU)
  • Classification + Localization outputs a class label plus a bounding box
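
To make the Intersection over Union metric concrete, here is a minimal sketch for two boxes in the (x, y, w, h) format used above. It assumes (x, y) is the top-left corner, which the notes do not specify, and the function name `iou` is my own.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, w, h).

    Assumption: (x, y) is the top-left corner of the box.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1)
overlap = iou((0, 0, 2, 2), (1, 1, 2, 2))
```

An IoU of 1.0 means the predicted box matches the ground truth exactly; 0.0 means no overlap at all.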



Idea #1 - Localization as Regression
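  • Classification predicts a discrete label (e.g., with an SVM)
  • Regression predicts continuous values (e.g., linear regression)
  • Image -> Processing -> Four real-valued numbers
  • x, y, w, h - always four numbers
  • Loss - L2 (Euclidean) loss
  • Trained much like a classification network
  • Train (or download) a pretrained classification model
  • Take the network up to the fully connected layers
  • Attach new FC layers as a regression head
  • Output real-valued numbers
  • Class-agnostic regressor - one box for all classes
  • Class-specific regressor - one box per class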
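
A small sketch of the regression setup above: the L2 loss between a predicted and a ground-truth box, and the number of outputs for the two regressor variants. The function names are my own, and the output sizes (4 vs. C x 4) follow the usual convention for class-agnostic and class-specific heads.

```python
def l2_box_loss(pred, target):
    """Squared Euclidean (L2) loss between two (x, y, w, h) boxes."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def regressor_output_size(num_classes, class_specific):
    """Number of real values the regression head has to produce."""
    # Class-agnostic: one box. Class-specific: one box per class.
    return 4 * num_classes if class_specific else 4


loss = l2_box_loss((10, 10, 50, 50), (12, 9, 48, 53))  # 4 + 1 + 4 + 9
```

With a class-specific head, only the box belonging to the ground-truth class would contribute to the loss during training.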

Human Pose Estimation
  • Input - Close-up view of a person
  • Fixed number of joints
  • Find all joints
  • Encode each joint location as (x, y)
  • Predict the pose from the joint locations
  • DeepPose - Regression with a CNN for joint positions
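
Because the number of joints is fixed, the pose can be flattened into one fixed-length regression target. A minimal sketch of that encoding (helper names are hypothetical):

```python
def encode_pose(joints):
    """Flatten [(x, y), ...] joint locations into one vector of 2 * K numbers."""
    return [coord for joint in joints for coord in joint]


def decode_pose(vector):
    """Inverse of encode_pose: regroup the vector into (x, y) pairs."""
    return [(vector[i], vector[i + 1]) for i in range(0, len(vector), 2)]


target = encode_pose([(12, 30), (15, 42), (20, 55)])  # 3 joints -> 6 numbers
```

The network then regresses all 2 * K numbers at once, the same way the localization network regresses four.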
Idea #2 - Sliding Window
  • Run the classification + regression network at multiple locations on a high-resolution image
  • Convert fully connected layers into convolutional layers for efficient computation
  • Combine classifier and regressor predictions across all scales for the final prediction

OverFeat Architecture
  • AlexNet
  • Classification head
  • Regression head
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks (ICLR 2014)
  • FC layer - 4096-dimensional vector
  • Convert FC layers into convolutional layers
  • Efficient sliding window - OverFeat
  • RPN (Region Proposal Networks)
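
The FC-to-convolution trick can be checked on a toy example. The sketch below (pure Python, single channel, hypothetical names) treats the FC weights as a convolution kernel: on an input the size of the kernel it reproduces the FC output, and on a larger input it gives one score per window position in a single pass.

```python
def fc_forward(patch, weights):
    """FC unit: flatten the 2D patch and take a dot product with the weights."""
    flat_x = [v for row in patch for v in row]
    flat_w = [v for row in weights for v in row]
    return sum(x * w for x, w in zip(flat_x, flat_w))


def conv2d_valid(features, kernel):
    """Stride-1 cross-correlation with no padding ('valid' mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(features) - kh + 1):
        row = []
        for j in range(len(features[0]) - kw + 1):
            row.append(sum(features[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out


weights = [[1, 0], [0, -1]]
patch = [[3, 1], [4, 2]]
larger = [[3, 1, 5], [4, 2, 0], [7, 6, 8]]

single = conv2d_valid(patch, weights)      # 1x1 map, same value as the FC unit
score_map = conv2d_valid(larger, weights)  # 2x2 map, one score per window
```

The top-left entry of `score_map` equals `fc_forward(patch, weights)` because that window is exactly `patch`; the other entries come from shared computation rather than from separate forward passes.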
Object Detection
  • Find all instances of the target classes
  • Variable-sized outputs
  • Not a straightforward regression problem
  • Detection as classification
  • Run a classifier on image regions
  • Window size - Try them all
  • Windows of different sizes and scales
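
A rough sketch of what "try them all" means in practice: enumerate windows of several sizes at a fixed stride. The function name and parameters are illustrative only.

```python
def sliding_windows(img_w, img_h, window_sizes, stride):
    """Yield (x, y, w, h) windows that fit entirely inside the image."""
    for w, h in window_sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)


windows = list(sliding_windows(4, 4, [(2, 2), (4, 4)], stride=2))
```

The count grows quickly with image size, number of scales, and a smaller stride, which is why running an expensive classifier on every window is impractical.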
HOG
  • Compute HOG
  • Score every sub-window
  • Apply non-maximum suppression
  • Linear Classifier on top of HOG
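
Non-maximum suppression is usually done greedily: keep the highest-scoring box, drop every box that overlaps it too much, and repeat. A minimal sketch, assuming (x, y, w, h) boxes with a top-left corner and an IoU threshold of 0.5 (both are my assumptions):

```python
def iou(a, b):
    """Intersection over Union of two (x, y, w, h) boxes, top-left corner format."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
kept = non_max_suppression(boxes, [0.9, 0.8, 0.7])
```

Here the second box overlaps the first heavily (IoU about 0.68), so only the first and third survive.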
HOG extended into the Deformable Parts Model
  • Still based on HOG
  • Rather than a single linear classifier, uses templates for parts
  • Parts are allowed to deform a little bit
Detection as Classification
  • Run the classifier only on region proposals
  • Proposal methods output regions where an object may be located
  • Class-agnostic object detector
  • Fast to run
  • Look for blob-like structures in the image
  • Most famous is Selective Search
  • Merges blob-like regions
  • Many region proposal methods are compared in "What makes for effective detection proposals?"
  • EdgeBoxes is really fast (about 1/3 of a second per image)
R-CNN
  • Region-based CNN method
  • Input image
  • Selective Search
  • About 2000 boxes
  • For each box, crop the image region
  • Run it forward through a CNN with regression and classification heads
  • Regression head corrects the region proposal
Training Pipeline
  • Start by downloading a pretrained classification model from the internet
  • Fine-tune the model for detection
  • Add new layers for the detection classes
  • Train on positive and negative regions from the detection dataset
  • Run Selective Search, pass regions through the CNN, and cache the features to disk
  • Needs a large hard drive, even for the PASCAL dataset
  • Extracted features take 100s of GB
SVM to classify different classes
  • Binary SVM to classify different regions
  • Positive and Negative Samples to train binary SVMs
Bounding Box Regression
  • For each class, train a linear regression model that maps cached features to offsets toward the ground-truth (GT) boxes, to make up for slightly wrong proposals
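
The offsets are commonly parameterized relative to the proposal: a scaled shift of the center and a log-space change of width and height. The sketch below follows that R-CNN-style parameterization and assumes boxes are given as (center x, center y, w, h); the function names are my own.

```python
import math


def encode_offsets(proposal, gt):
    """Regression targets that move a proposal onto its ground-truth box."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def apply_offsets(proposal, offsets):
    """Correct a proposal with predicted offsets (inverse of encode_offsets)."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = offsets
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))


proposal = (50.0, 50.0, 20.0, 40.0)
gt = (55.0, 48.0, 30.0, 36.0)
targets = encode_offsets(proposal, gt)
corrected = apply_offsets(proposal, targets)  # recovers gt up to rounding
```

A perfect proposal has all-zero targets, so the regressor only has to learn small corrections.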
Object Detection Evaluation
  • Metric - Mean Average Precision (mAP)
  • True positive (TP) - A high-scoring detection whose box overlaps the correct box above an IoU threshold
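
A simplified sketch of the metric: average precision for one class, then the mean over classes. It assumes each detection has already been marked as a true or false positive by IoU matching, and it uses the plain non-interpolated form of AP; benchmark toolkits typically use interpolated variants, so numbers will differ slightly.

```python
def average_precision(scores, is_true_positive, num_gt):
    """Non-interpolated AP for one class.

    scores: confidence of each detection
    is_true_positive: whether each detection matched a ground-truth box
    num_gt: number of ground-truth boxes of this class
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            tp += 1
            precision_sum += tp / rank  # precision at this recall step
    return precision_sum / num_gt if num_gt else 0.0


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


ap = average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2)
m_ap = mean_average_precision({"cat": 1.0, "dog": 0.5})
```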
R-CNN Problems
  • R-CNN is slow
  • SVMs and regressors are trained offline
  • Complicated training pipeline

Fast R-CNN
  • Swap the order of extracting regions and running the CNN
  • Similar to the sliding window idea
  • Pipeline looks similar
  • High-resolution input image -> Convolutional feature map
  • Extract region proposals from the feature map
  • Fed into FC layers
  • Train all at once
  • ROI pooling
  • Region proposals from EdgeBoxes
  • Swapping the order of convolution and cropping
  • Offline processing
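
ROI pooling turns a region of any size on the feature map into a fixed-size grid so it can be fed into FC layers. A minimal single-channel sketch with integer bin edges (real implementations work per channel on tensors; the function name and ROI format are my assumptions):

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (x, y, w, h) of a 2D map to out_h x out_w.

    Assumes the ROI lies inside the feature map and has w, h >= 1.
    """
    x0, y0, w, h = roi
    pooled = []
    for i in range(out_h):
        # Integer bin edges; every bin covers at least one cell.
        r_start = y0 + (i * h) // out_h
        r_end = max(y0 + ((i + 1) * h) // out_h, r_start + 1)
        row = []
        for j in range(out_w):
            c_start = x0 + (j * w) // out_w
            c_end = max(x0 + ((j + 1) * w) // out_w, c_start + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled


fm = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
pooled = roi_max_pool(fm, (0, 0, 4, 4), 2, 2)
```

Whatever the ROI size, the output is always out_h x out_w, which is what lets one set of FC layers handle every proposal.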
Faster R-CNN
  • Feature map from convolution
  • Region Proposal Network (RPN) produces proposals from the feature map
  • ROI pooling
  • Classifier
  • RPN - Sliding window over the convolutional feature map
  • Anchor boxes - Different sizes
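
A sketch of how anchor boxes of different sizes can be generated at one sliding-window position. It assumes each anchor is defined by a scale (side length of the equivalent square) and a width/height aspect ratio; the specific scales and ratios below are illustrative.

```python
import math


def make_anchors(cx, cy, scales, aspect_ratios):
    """Return (cx, cy, w, h) anchors centered on one feature-map location.

    Each anchor has area scale**2 and width / height equal to the ratio.
    """
    anchors = []
    for s in scales:
        for r in aspect_ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            anchors.append((cx, cy, w, h))
    return anchors


anchors = make_anchors(0, 0, scales=[128, 256, 512], aspect_ratios=[0.5, 1.0, 2.0])
```

Three scales and three ratios give nine anchors per location; the RPN scores each one and regresses a correction to it.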
ResNet
  • Deep Residual Network
  • 101 layer Residual Network
  • box refinement (multiple steps for bounding box)
  • Add context
  • Multi-Scale Testing
YOLO
  • Localization as regression
  • Pose the detection problem directly as regression
  • Divide the input image into a grid
  • Each grid cell predicts a fixed number of bounding boxes
  • Score for each bounding box
  • Classification score
  • Detection ends up as regression
  • Upper bound on the number of outputs is a problem
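
The fixed output size is easy to see with a little arithmetic. The sketch below assumes each box carries five numbers (x, y, w, h, score) and class scores are shared per grid cell; the 7x7 grid, 2 boxes per cell, and 20 classes are the settings I recall from the original YOLO paper, so treat them as an assumption.

```python
def yolo_output_shape(grid_size, boxes_per_cell, num_classes):
    """Shape of the regression output: one vector per grid cell."""
    # Per box: x, y, w, h, score. Class scores are shared by the cell.
    return (grid_size, grid_size, boxes_per_cell * 5 + num_classes)


def max_detections(grid_size, boxes_per_cell):
    """Hard upper bound on the number of boxes the model can output."""
    return grid_size * grid_size * boxes_per_cell


def responsible_cell(cx, cy, img_w, img_h, grid_size):
    """(row, col) of the grid cell containing an object's center."""
    col = min(int(cx / img_w * grid_size), grid_size - 1)
    row = min(int(cy / img_h * grid_size), grid_size - 1)
    return row, col


shape = yolo_output_shape(7, 2, 20)
limit = max_detections(7, 2)
```

With these settings the model can never output more than 98 boxes, which is the upper-bound problem noted above.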

Happy Learning!!!
