"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

September 06, 2018

Day #127 - SSD, YOLO Paper Reading Notes

Only the key summary points. These are lines selected (copied) from the papers for my quick reference and understanding.

YOLO Notes
  • Resize the input image
  • Run CNN (A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes)
  • Non-max Suppression
Alternative Techniques
  • Sliding window and region proposal-based techniques
Implementation Details
  • YOLO sees the entire image during training and test time so it encodes contextual information about classes as well as their appearance
  • Our system divides the input image into an S × S grid
  • If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.
  • Each bounding box consists of 5 predictions: x, y, w, h, and confidence
  • Network architecture is inspired by the GoogLeNet model for image classification
  • YOLO predicts multiple bounding boxes per grid cell
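
To make the grid idea above concrete for myself, here is a minimal sketch of how an object could be assigned to its responsible grid cell. The function names (responsible_cell, encode_box, output_shape) are my own, not from the paper's code. The defaults S=7, B=2, C=20 are the PASCAL VOC settings as I recall them from the paper, and the choice to express x, y relative to the cell and w, h relative to the whole image is my reading of it, so treat all of these as assumptions.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains the object's center."""
    # Clamp so a center lying exactly on the right/bottom edge stays in the grid.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


def encode_box(cx, cy, w, h, img_w, img_h, S=7):
    """Encode a pixel-space box as (row, col, (x, y, w, h)).

    Assumption: x, y are offsets inside the responsible cell (0..1),
    and w, h are fractions of the whole image.
    """
    row, col = responsible_cell(cx, cy, img_w, img_h, S)
    x = cx / img_w * S - col
    y = cy / img_h * S - row
    return row, col, (x, y, w / img_w, h / img_h)


def output_shape(S=7, B=2, C=20):
    """Each cell predicts B boxes (5 numbers each) plus C class probabilities."""
    return (S, S, B * 5 + C)


if __name__ == "__main__":
    print(responsible_cell(224, 224, 448, 448))
    print(encode_box(224, 224, 100, 200, 448, 448))
    print(output_shape())
```

With these assumed defaults, each of the 7 × 7 cells carries 2 × 5 + 20 = 30 numbers.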
Limitations of YOLO
  • Struggles to generalize to objects in new or unusual aspect ratios or configurations
Other Detection Systems
  • Haar, SIFT, HOG, convolutional features
What is Non-max Suppression

All modern object detectors follow a three-step recipe:
(1) proposing a search space of windows (exhaustive by sliding window or sparser using proposals),
(2) scoring/refining the window with a classifier/regressor, and
(3) merging windows that might belong to the same object.

Non-Max Suppression - The algorithm greedily selects high scoring detections and deletes close-by less confident neighbours since they are likely to cover the same object
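
The greedy procedure described above can be sketched in a few lines of plain Python. This is only an illustrative version: the corner format (x1, y1, x2, y2) for boxes and the 0.5 overlap threshold are my assumptions, and "close-by" is measured here with intersection over union (IoU).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive, highest score first."""
    # Visit detections from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(boxes, scores))
```

In the small example the second box overlaps the first heavily (IoU of about 0.68), so it is suppressed, while the far-away third box is kept.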

R-CNN [10] - Replaced feature extraction and classifiers with a neural network

Related work - Viola&Jones, deformable parts model (DPM), clustering algorithms, mean-shift clustering, agglomerative clustering, affinity propagation clustering

Deformable parts models. Deformable parts models (DPM) use a sliding window approach to object detection

R-CNN. Region proposals instead of sliding windows to find objects in images. Selective Search [34] generates potential bounding boxes, a convolutional network extracts features, an SVM scores the boxes, a linear model adjusts the bounding boxes, and non-max suppression eliminates duplicate detections.

SSD Notes

SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location
  • The core of SSD is predicting category scores and box offsets for a fixed set of default bounding boxes using small convolutional filters applied to feature maps
  • Based on a feed-forward convolutional network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes
  • Ground truth information needs to be assigned to specific outputs in the fixed set of detector outputs
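
A rough sketch of the two ideas above: laying out default boxes at every feature map location, and assigning ground truth boxes to them by overlap. Everything named here (default_boxes, match_default_boxes, the single scale per feature map, the three aspect ratios, the 0.5 threshold) is my own simplification for illustration and not the paper's exact procedure.

```python
import math


def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h) in [0, 1] image coordinates
    for one square feature map of fmap_size x fmap_size locations."""
    boxes = []
    for i in range(fmap_size):            # row of the feature map
        for j in range(fmap_size):        # column of the feature map
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes


def to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def jaccard(a, b):
    """Jaccard overlap (IoU) of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_default_boxes(gt_boxes, defaults, threshold=0.5):
    """Return {default_index: ground_truth_index} (simplified matching).

    gt_boxes are in corner format; defaults are (cx, cy, w, h).
    """
    if not gt_boxes:
        return {}
    corners = [to_corners(d) for d in defaults]
    matches = {}
    # A default box takes the ground truth it overlaps most, if above threshold.
    for d, c in enumerate(corners):
        overlaps = [jaccard(gt, c) for gt in gt_boxes]
        g = max(range(len(gt_boxes)), key=lambda k: overlaps[k])
        if overlaps[g] > threshold:
            matches[d] = g
    # Every ground truth also keeps its single best default box.
    for g, gt in enumerate(gt_boxes):
        best = max(range(len(corners)), key=lambda d: jaccard(gt, corners[d]))
        matches[best] = g
    return matches


if __name__ == "__main__":
    defaults = default_boxes(fmap_size=2, scale=0.5)
    print(len(defaults))
    print(match_default_boxes([(0.0, 0.0, 0.5, 0.5)], defaults))
```

Unmatched default boxes would be treated as background during training. The real model also uses several feature maps with different scales, which this sketch leaves out.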
I need to revisit these papers over the next couple of months to understand them better.

Happy Learning!!!
