Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Day #127

September 06, 2018

Day #127 - SSD, YOLO Paper Reading Notes

Only the key summary points. These are selected lines, copied from the papers, for my quick reference and understanding.

YOLO Notes

Resize Image
Run CNN (A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes)
Non-max Suppression

Alternative Techniques

Sliding window and region proposal-based techniques

Implementation Details

YOLO sees the entire image during training and test time so it encodes contextual information about classes as well as their appearance
YOLO divides the input image into an S × S grid
If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object.
Each bounding box consists of 5 predictions: x, y, w, h, and confidence
Network architecture is inspired by the GoogLeNet model for image classification
YOLO predicts multiple bounding boxes per grid cell
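
The grid-cell points above can be sketched in a few lines. This is only an illustration, not code from the paper: the function names are made up, and the defaults S=7, B=2, C=20 are the values the YOLO paper uses for PASCAL VOC, which these notes do not state.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Find the grid cell that contains an object's center.

    cx, cy are the box center in pixels. Returns (row, col, x_off, y_off),
    where the offsets give the center's position relative to the cell.
    """
    gx = cx / img_w * S  # center in grid units along x
    gy = cy / img_h * S  # center in grid units along y
    col = min(int(gx), S - 1)  # clamp so a center on the right edge stays in range
    row = min(int(gy), S - 1)
    return row, col, gx - col, gy - row


def output_shape(S=7, B=2, C=20):
    """Each cell predicts B boxes of 5 values (x, y, w, h, confidence) plus C class scores."""
    return (S, S, B * 5 + C)


row, col, x_off, y_off = responsible_cell(224, 224, 448, 448)
print(row, col, x_off, y_off)
print(output_shape())
```

With a 448 × 448 image and S=7, a center at (224, 224) lands in the middle cell, and that cell is the one responsible for the object.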

Limitations of YOLO

Struggles to generalize to objects in new or unusual aspect ratios or configurations

Other Detection Systems

Haar, SIFT, HOG, convolutional features

What is Non-max Suppression

All modern object detectors follow a three-step recipe:
(1) proposing a search space of windows (exhaustive by sliding window or sparser using proposals),
(2) scoring/refining the window with a classifier/regressor, and
(3) merging windows that might belong to the same object.

Non-Max Suppression - The algorithm greedily selects high scoring detections and deletes close-by less confident neighbours since they are likely to cover the same object
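
A minimal sketch of the greedy procedure described above, in plain Python. The box format (x1, y1, x2, y2), the function names, and the 0.5 overlap threshold are my assumptions for illustration; the notes do not specify them.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep  # indices of the surviving boxes, best score first


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))
```

Here the second box overlaps the first heavily and has a lower score, so it is dropped, while the third box is far away and survives.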

R-CNN [10] - Replaced feature extraction and classifiers with a neural network

Related work - Viola-Jones, deformable parts model (DPM), clustering algorithms, mean-shift clustering, agglomerative clustering, affinity propagation clustering

Deformable parts models. Deformable parts models (DPM) use a sliding window approach to object detection

R-CNN. Region proposals instead of sliding windows to find objects in images. Selective Search [34] generates potential bounding boxes, a convolutional network extracts features, an SVM scores the boxes, a linear model adjusts the bounding boxes, and non-max suppression eliminates duplicate detections.

SSD Notes

Discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location
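
To make the idea of default boxes concrete, here is a rough sketch for a single square feature map. It assumes the SSD paper's convention of width = scale × sqrt(ratio) and height = scale / sqrt(ratio), with box centers at the middle of each feature map cell and all values normalized to [0, 1]. The function name, the scale, and the aspect ratios below are illustrative choices, not values from these notes.

```python
import math


def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate (cx, cy, w, h) default boxes for one fmap_size x fmap_size feature map."""
    boxes = []
    for i in range(fmap_size):          # row of the feature map
        for j in range(fmap_size):      # column of the feature map
            cx = (j + 0.5) / fmap_size  # cell center, normalized
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes


boxes = default_boxes(fmap_size=3, scale=0.2)
print(len(boxes))
print(boxes[:3])
```

A 3 × 3 map with three aspect ratios gives 27 boxes. As I understand the paper, SSD repeats this over several feature maps of different resolution, each with its own scale, so that coarse maps cover large objects and fine maps cover small ones.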

The core of SSD is predicting category scores and box offsets for a fixed set of default bounding boxes using small convolutional filters applied to feature maps
Based on a feed-forward convolutional network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes
Ground truth information needs to be assigned to specific outputs in the fixed set of detector outputs
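
The last point, assigning ground truth to specific detector outputs, can be sketched as below. This follows my reading of the SSD matching strategy: a default box is matched to a ground truth box when their overlap exceeds 0.5, and each ground truth box is additionally matched to the default box it overlaps best. The function names and the corner box format (x1, y1, x2, y2) are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_default_boxes(gt_boxes, default_boxes, threshold=0.5):
    """Return, for each default box, the index of its ground truth box or -1 (background)."""
    if not default_boxes:
        return []
    matches = [-1] * len(default_boxes)
    # Pass 1: match a default box to its best ground truth if the overlap beats the threshold.
    for d, db in enumerate(default_boxes):
        best_g, best_iou = -1, threshold
        for g, gt in enumerate(gt_boxes):
            overlap = iou(gt, db)
            if overlap > best_iou:
                best_g, best_iou = g, overlap
        matches[d] = best_g
    # Pass 2: make sure every ground truth box owns its single best default box.
    for g, gt in enumerate(gt_boxes):
        best_d = max(range(len(default_boxes)), key=lambda d: iou(gt, default_boxes[d]))
        matches[best_d] = g
    return matches


gt = [(0, 0, 10, 10)]
defaults = [(0, 0, 10, 10), (0, 0, 10, 8), (20, 20, 30, 30), (0, 0, 4, 4)]
print(match_default_boxes(gt, defaults))
```

Default boxes left at -1 would be treated as background during training, while the matched ones receive the class label and box offsets of their ground truth box.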

I need to revisit these papers over the next couple of months to understand them better.

Happy Learning!!!
