"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

June 28, 2020

Weekend Reads - Detectors - Single / Two Stage Analysis - Papers

Paper #1 - Optimizing the Trade-off between Single-Stage and Two-Stage Deep Object Detectors using Image Difficulty Prediction

Key Notes
Two-stage detectors - Mask R-CNN
  • Stage 1 - Region Proposal Network to generate regions of interests
  • Stage 2 - Pipeline for object classification and bounding-box regression
Comments - Highest accuracy rates, but typically slower
Single-stage detectors - YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector)
Treat object detection as a simple regression problem by taking an input image and learning the class probabilities and bounding box coordinates.
Comments - Lower accuracy rates, but much faster

Implementation
  • Step 1 - YOLO divides each image into a fixed grid, and for each grid cell it predicts a fixed number of bounding boxes along with a confidence score for each box
  • Step 2 - The confidence reflects how accurate the box is and whether the box actually contains an object (regardless of class)
  • Step 3 - YOLO also predicts a classification score for each box over every class seen in training (see the decoding sketch below)
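
To make the grid idea concrete, here is a minimal sketch of decoding a YOLO-style output tensor. The grid size S, boxes-per-cell B, class count C, and the (x, y, w, h, confidence) layout are illustrative assumptions, not the exact layout of any particular YOLO release.

import numpy as np

S, B, C = 7, 2, 20  # assumed grid size, boxes per cell, number of classes

def decode_yolo_output(pred, conf_thresh=0.25):
    """Decode a YOLO-style output tensor of shape (S, S, B*5 + C).

    Each cell predicts B boxes as (x, y, w, h, confidence) plus C class
    scores shared by the cell. Returns (class_id, score, box) tuples
    above the confidence threshold.
    """
    detections = []
    for row in range(S):
        for col in range(S):
            cell = pred[row, col]
            class_probs = cell[B * 5:]            # C class scores for this cell
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                scores = conf * class_probs       # class-specific confidence
                cls = int(np.argmax(scores))
                if scores[cls] >= conf_thresh:
                    # x, y are offsets within the cell; w, h are relative to the image
                    cx, cy = (col + x) / S, (row + y) / S
                    detections.append((cls, float(scores[cls]), (cx, cy, w, h)))
    return detections

# Example call, just to show the expected tensor shape
dets = decode_yolo_output(np.random.rand(S, S, B * 5 + C))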
Novelty 
  • Image difficulty predictor
  • Easy images are sent to the faster single-stage detector
  • Hard images are sent to the more accurate two-stage detector
Image difficulty predictor - from the paper: "We build our image difficulty prediction model based on CNN features and linear regression with ν-Support Vector Regression"

This is a very practical approach to choosing a model based on the trade-off between execution time and accuracy
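
A minimal sketch of the routing idea: a difficulty score decides which detector handles each image. The `predict_difficulty`, `fast_single_stage`, and `accurate_two_stage` callables and the threshold value are placeholders for illustration, not the paper's actual implementation.

DIFFICULTY_THRESHOLD = 0.5  # assumed split point between "easy" and "hard" images

def detect(image, predict_difficulty, fast_single_stage, accurate_two_stage):
    """Route an image to the cheaper or the more accurate detector.

    predict_difficulty: callable returning a difficulty score in [0, 1]
    fast_single_stage:  e.g. an SSD/YOLO-style detector (fast, less accurate)
    accurate_two_stage: e.g. a Faster/Mask R-CNN-style detector (slow, accurate)
    """
    if predict_difficulty(image) < DIFFICULTY_THRESHOLD:
        return fast_single_stage(image)   # easy image: speed wins
    return accurate_two_stage(image)      # hard image: accuracy wins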


Paper #2 - Light-Head R-CNN: In Defense of Two-Stage Object Detector

Two-Stage Detectors - like Faster R-CNN, the two-stage detector divides the task into two steps:
  • The first step (body) generates many proposals, 
  • The second step (head) focuses on the recognition of the proposals.
  • In order to achieve the best accuracy, the design of the head is heavy
  • They first propose potential object locations in an image—region proposals—and then apply a classifier to these regions to score potential detections
  • Earlier sliding window approaches ran into scaling problems
Faster R-CNN [28] introduces a Region Proposal Network (RPN) to generate proposals from network features
Feature Pyramid Networks (FPN) [19] exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids
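
A compressed sketch of the generic two-stage flow described above. The component names (`backbone`, `rpn`, `roi_pool`, `head`) and signatures are illustrative placeholders, not Faster R-CNN's actual API.

def two_stage_detect(image, backbone, rpn, roi_pool, head, top_n=300):
    """Generic two-stage flow: the body proposes, the head recognizes."""
    features = backbone(image)                  # shared convolutional features
    proposals = rpn(features)                   # many class-agnostic candidate boxes
    proposals = proposals[:top_n]               # keep the top-scoring proposals
    rois = [roi_pool(features, box) for box in proposals]  # fixed-size region features
    # the (usually heavy) head classifies each region and refines its box
    return [head(roi) for roi in rois]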

Novelty
  • Light-head design to build an efficient yet accurate two-stage detector
  • Large-kernel separable convolution to produce "thin" feature maps with a small channel count (α × p × p channels are used in the paper's experiments, with α ≤ 10); see the sketch after this list
  • Light-Head R-CNN builds these "thin" feature maps before RoI warping
  • RPN (Region Proposal Network) is a sliding-window, class-agnostic object detector that uses features from C4
  • Non-maximum suppression (NMS) is used to reduce the number of proposals
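
A hedged PyTorch sketch of the large-kernel separable convolution idea: a big k × k convolution is approximated by two branches of (k × 1) + (1 × k) convolutions whose outputs are summed, ending in a "thin" α × p × p-channel map. The concrete values (k=15, c_mid=64, α=10, p=7) mirror my reading of the paper's small setting and should be treated as assumptions.

import torch
import torch.nn as nn

class LargeSeparableConv(nn.Module):
    """Sketch of a large-kernel separable convolution producing a thin feature map."""
    def __init__(self, in_channels=1024, c_mid=64, alpha=10, p=7, k=15):
        super().__init__()
        c_out = alpha * p * p  # e.g. 10 * 7 * 7 = 490 "thin" channels
        pad = k // 2
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_channels, c_mid, kernel_size=(k, 1), padding=(pad, 0)),
            nn.Conv2d(c_mid, c_out, kernel_size=(1, k), padding=(0, pad)),
        )
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, c_mid, kernel_size=(1, k), padding=(0, pad)),
            nn.Conv2d(c_mid, c_out, kernel_size=(k, 1), padding=(pad, 0)),
        )

    def forward(self, x):
        return self.branch1(x) + self.branch2(x)

# Example: a 1024-channel backbone feature map becomes a 490-channel thin map
thin = LargeSeparableConv()(torch.randn(1, 1024, 38, 50))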

Paper #3 - RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free
  • Single-shot detector applications - embedded vision, self-driving cars, and mobile phone vision
  • Two-stage detectors significantly beat single-stage detectors on the speed-vs-accuracy trade-off on standard desktop/workstation plus high-end GPU configurations
Novelty
  • A novel instance mask prediction head added to the single-shot RetinaNet detector
  • A self-adjusting loss function that improves robustness during training
  • Smooth L1 is replaced with a Self-Adjusting Smooth L1 (see the sketch below)
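
For reference, here is the standard Smooth L1 with control point beta, plus a heavily hedged sketch of a self-adjusting variant. The assumption that beta is updated from running statistics of the absolute error and clipped to [0, beta_max] is my reading of the idea; RetinaMask's exact update rule may differ.

import torch

def smooth_l1(x, beta=1.0):
    """Standard Smooth L1: quadratic below the control point beta, linear above."""
    abs_x = x.abs()
    return torch.where(abs_x < beta,
                       0.5 * abs_x ** 2 / beta,
                       abs_x - 0.5 * beta)

class SelfAdjustingSmoothL1:
    """Hedged sketch: beta adapts to running statistics of the absolute error."""
    def __init__(self, beta_max=1.0, momentum=0.9):
        self.beta_max = beta_max
        self.momentum = momentum
        self.running_mean = beta_max
        self.running_var = 0.0

    def __call__(self, x):
        with torch.no_grad():
            abs_x = x.abs()
            self.running_mean = (self.momentum * self.running_mean
                                 + (1 - self.momentum) * abs_x.mean().item())
            self.running_var = (self.momentum * self.running_var
                                + (1 - self.momentum) * abs_x.var().item())
            # assumed rule: beta = clip(mean - var, small positive value, beta_max)
            beta = min(max(self.running_mean - self.running_var, 1e-3), self.beta_max)
        return smooth_l1(x, beta)

loss = SelfAdjustingSmoothL1()(torch.randn(8, 4))  # per-element loss values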

Keep Thinking and Happy Learning!!!
