Paper #1 - Optimizing the Trade-off between Single-Stage and Two-Stage Deep Object Detectors using Image Difficulty Prediction
Keep Thinking and Happy Learning!!!
Key Notes
Two-stage detectors - Mask R-CNN
- Stage 1 - Region Proposal Network (RPN) to generate regions of interest
- Stage 2 - Pipeline for object classification and bounding-box regression
Comments - highest accuracy rates, but typically slower
Single-stage detectors - YOLO
Treat object detection as a simple regression problem: take an input image and learn the class probabilities and bounding box coordinates.
Comments - lower accuracy rates, but much faster
Implementation
- Step 1 - YOLO works by dividing each image into a fixed grid, and for each grid location, it predicts a number of bounding boxes and confidence for each bounding box
- Step 2 - The confidence reflects the accuracy of the bounding box and whether the bounding box actually contains an object (regardless of class)
- Step 3 - YOLO also predicts a classification score for each box, for every class seen in training
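The three steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not YOLO's actual code: the function names `grid_cell` and `class_scores` are made up for this post, and it assumes the class-specific score of a box is the box confidence multiplied by the per-class probability.

```python
def grid_cell(cx, cy, img_w, img_h, grid_size):
    """Return (row, col) of the grid cell that contains a box centre.

    grid_size is the number of cells per side (an assumed parameter).
    """
    col = min(int(cx / img_w * grid_size), grid_size - 1)
    row = min(int(cy / img_h * grid_size), grid_size - 1)
    return row, col


def class_scores(box_confidence, class_probs):
    """Combine the box confidence with per-class probabilities.

    Assumption: score for a class = box confidence * class probability.
    """
    return [box_confidence * p for p in class_probs]


if __name__ == "__main__":
    # A box centred in the middle of a 448x448 image on a 7x7 grid.
    print(grid_cell(224, 224, 448, 448, 7))
    # A fairly confident box with two candidate classes.
    print(class_scores(0.5, [0.5, 0.25]))
```

A box with low confidence ends up with low scores for every class, so it can be dropped with a single score threshold.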
- Image difficulty predictor
- Easy images are sent to the faster single-stage detector
- Hard images are sent to the more accurate two-stage detector
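The routing idea can be sketched as below. Everything here is hypothetical: `predict_difficulty`, `fast_detector`, `accurate_detector` and the `threshold` value are placeholders for illustration, and the paper may split images differently (for example by ranking a batch rather than using a fixed cut-off).

```python
def route_image(image, predict_difficulty, fast_detector,
                accurate_detector, threshold=0.5):
    """Send an image to one of two detectors based on predicted difficulty.

    predict_difficulty: callable returning a score, higher = harder (assumed).
    fast_detector:      stand-in for a single-stage detector.
    accurate_detector:  stand-in for a two-stage detector.
    threshold:          illustrative cut-off, not a value from the paper.
    """
    score = predict_difficulty(image)
    if score < threshold:
        return "single-stage", fast_detector(image)
    return "two-stage", accurate_detector(image)


if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs without any model.
    difficulty = lambda img: img["difficulty"]
    fast = lambda img: ["fast result"]
    accurate = lambda img: ["accurate result"]

    print(route_image({"difficulty": 0.2}, difficulty, fast, accurate))
    print(route_image({"difficulty": 0.9}, difficulty, fast, accurate))
```

Moving the threshold up sends more images to the fast detector (less time, lower accuracy); moving it down does the opposite.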
This is a very practical approach to choosing between models based on the trade-off between execution time and accuracy
Paper #2 - Light-Head R-CNN: In Defense of Two-Stage Object Detector
Two-Stage Detectors - like Faster R-CNN
The two-stage detector divides the task into two steps:
- The first step (body) generates many proposals,
- The second step (head) focuses on the recognition of the proposals.
- To achieve the best accuracy, the head is designed to be heavy (computationally expensive)
- Two-stage detectors first propose potential object locations in an image (region proposals) and then apply a classifier to these regions to score potential detections
- Earlier sliding window approaches ran into scaling problems
Feature Pyramid Networks (FPN) [19] exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids
Novelty
- Light-head design to build an efficient yet accurate two-stage detector
- Large-kernel separable convolution to produce “thin” feature maps with a small number of channels (the paper uses α × p × p channels in its experiments, with α ≤ 10)
- Light-Head R-CNN builds “thin” feature maps before RoI warping
- RPN (Region Proposal Network) is a sliding-window, class-agnostic object detector that uses features from C4
- Non-maximum suppression (NMS) is used to reduce the number of proposals
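For reference, here is a minimal sketch of greedy NMS in plain Python. It shows the general technique, not the paper's implementation. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the `iou_threshold` of 0.5 is only an illustrative default.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop heavy overlaps, repeat.

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # indices of the proposals that survive
```

In the example, the second box overlaps the first heavily and is suppressed, while the third box is far away and survives.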
Paper #3 - RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free
- Single Shot Detector Applications - embedded vision applications, self-driving cars, and mobile phone vision
- Two-stage detectors significantly beat single-stage detectors on the speed-vs-accuracy trade-off on standard desktop/workstation configurations with a high-end GPU
- Novel instance mask prediction head added to the single-shot RetinaNet detector
- Self-adjusting loss function that improves robustness during training
- Smooth L1 and Self-Adjusting Smooth L1
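As a reminder of what Smooth L1 looks like, here is a small sketch of the standard loss for a single residual `x`, with the switch point `beta` as a parameter. It is quadratic for small errors and linear for large ones. The self-adjusting variant changes `beta` during training; the notes above do not give its exact update rule, so it is not reproduced here.

```python
def smooth_l1(x, beta=1.0):
    """Standard Smooth L1 loss for one residual x (beta must be > 0).

    |x| <  beta : 0.5 * x^2 / beta   (quadratic, gentle near zero)
    |x| >= beta : |x| - 0.5 * beta   (linear, robust to outliers)
    """
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


if __name__ == "__main__":
    for residual in (0.1, 0.5, 2.0):
        print(residual, smooth_l1(residual, beta=1.0))
```

Because `beta` is just an argument, a training loop could pass a different value at every step, which is where a self-adjusting scheme would plug in.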