- Segmentation
- Semantic Segmentation
- Instance Segmentation
- Discrete Locations
- Continuous Locations (Spatial Transformers)
- Deep Network
- Repeated Modules
- No Padding
- Strided Convolution and MaxPooling
- Efficient convolution tricks: factor a 7×7 filter into 1×7 and 7×1 convolutions (see the sketch below)
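A quick PyTorch sketch of that trick (the layer widths are illustrative): a 1×7 convolution followed by a 7×1 convolution covers the same 7×7 receptive field as a single 7×7 filter, with 2·7 instead of 49 weights per input/output channel pair.

```python
import torch
import torch.nn as nn

# A single 7x7 conv: 7*7*C_in*C_out weights.
conv7x7 = nn.Conv2d(64, 64, kernel_size=7, padding=3)

# Factorized version: 1x7 then 7x1 covers the same 7x7 receptive
# field with 2*7*C_in*C_out weights instead of 49*C_in*C_out.
conv_factored = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=(1, 7), padding=(0, 3)),
    nn.Conv2d(64, 64, kernel_size=(7, 1), padding=(3, 0)),
)

x = torch.randn(1, 64, 32, 32)
assert conv7x7(x).shape == conv_factored(x).shape  # same spatial output
```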
Two Subtasks involved
Semantic
- Input Image / Fixed number of classes
- Background class
- Label every pixel with one of the semantic classes
- Higher level understanding of images
- Not aware of instances
- Count of objects not known
- Instance: detect instances of a given category and label them
- Simultaneous detection and segmentation
- Semantic: labels without distinguishing instances
- Given an input image, extract a patch from the image
- Run it through a CNN
- Classify the centre pixel (e.g. as COW)
- Run over the entire image (a naive version is sketched after this list)
- Expensive operation
- Receptive field on the order of ~100 pixels
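A minimal sketch of this naive patch-based approach, assuming PyTorch; the tiny `classifier` network, patch size, and class count are made up for illustration. One CNN call per pixel is exactly why this is so expensive.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

patch = 64  # hypothetical receptive field / patch size

# Hypothetical tiny classifier over patches; outputs per-class scores.
classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 21),  # e.g. 20 classes + background
)

def segment_naive(img):  # img: (3, H, W)
    _, H, W = img.shape
    pad = patch // 2
    padded = F.pad(img, (pad, pad, pad, pad))
    labels = torch.zeros(H, W, dtype=torch.long)
    for y in range(H):          # one CNN call per pixel: very expensive
        for x in range(W):
            p = padded[:, y:y + patch, x:x + patch].unsqueeze(0)
            labels[y, x] = classifier(p).argmax().item()
    return labels
```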
- Semantic Segmentation - Multi-scale
- Superpixels / segmentation trees
- Apply CNN once to get labels
- Increases Effective Receptive Field for output
- Recurrent Convolutional network
- Iteratively refine outputs
- Input image runs through convolutions to extract feature maps
- Learn upsampling as part of network
- Skip connections - Help in low level details
- Convolutional features from different layers in network
- Accuracy - classification metrics, intersection over union (ground truth vs predicted region)
- Learnable upsampling - deconvolution (convolution transpose), fractionally strided convolution (both sketched below)
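A minimal sketch of both points, assuming PyTorch and illustrative shapes: intersection over union between boolean masks, and a transpose convolution ("deconvolution" / fractionally strided convolution) whose upsampling weights are learned end-to-end instead of being fixed interpolation.

```python
import torch
import torch.nn as nn

def iou(pred, gt):
    """Intersection over union of two boolean masks."""
    inter = (pred & gt).sum().float()
    union = (pred | gt).sum().float()
    return (inter / union).item()

# Learnable upsampling: a stride-2 transpose convolution doubles the
# spatial resolution, with weights learned rather than fixed bilinear.
up = nn.ConvTranspose2d(64, 21, kernel_size=4, stride=2, padding=1)

feat = torch.randn(1, 64, 16, 16)   # coarse conv feature map
scores = up(feat)                   # (1, 21, 32, 32) per-class scores
```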
- Generalization
- Distinguish Instances
- Detect and label instances
- End up looking like detection models
- SDS (Simultaneous Detection and Segmentation)
- RCNN - External Region proposals
- Box CNN, Region CNN
- Mask background with the mean color of the dataset
- Region Classification
- Region Refinement
- Extract Convolutional features
- Upsample and combine them
- Bilinear / nearest-neighbor upsampling (see the sketch below)
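A sketch of the fixed (non-learned) upsampling options just mentioned, assuming PyTorch's `F.interpolate`; the feature-map shape is illustrative.

```python
import torch
import torch.nn.functional as F

feat = torch.randn(1, 256, 14, 14)   # coarse region feature map

# Fixed bilinear upsampling: smooth but has no learnable weights.
up_bilinear = F.interpolate(feat, scale_factor=4, mode='bilinear',
                            align_corners=False)   # (1, 256, 56, 56)

# Nearest-neighbor: blocky, even cheaper.
up_nearest = F.interpolate(feat, scale_factor=4, mode='nearest')
```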
Cascades
- High Resolution Images
- From Conv feature maps
- Each Feature Map predicts region of interest
- Reshape boxes to a fixed size (ROI pooling / align; sketched below)
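The "reshape boxes to a fixed size" step is ROI pooling / ROI align; a sketch using `torchvision.ops.roi_align`, with made-up box coordinates and an assumed stride-16 feature map.

```python
import torch
from torchvision.ops import roi_align

fmap = torch.randn(1, 256, 50, 50)   # conv feature map of a high-res image

# Boxes in (batch_index, x1, y1, x2, y2) format, in input-image coordinates.
boxes = torch.tensor([[0., 16., 16., 144., 208.],
                      [0., 64., 32., 192., 160.]])

# spatial_scale maps image coords to feature-map coords (e.g. stride 16).
rois = roi_align(fmap, boxes, output_size=(7, 7), spatial_scale=1 / 16)
print(rois.shape)  # torch.Size([2, 256, 7, 7]): fixed size per box
```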
Attention Models
RNN for Captioning - input image H × W × 3
- Input Image -> CNN -> Features -> Hidden State -> First Word -> Second Word
- One chance to look at the input image
- Weighted vector from input features
- Hidden states sent to the model
- Grid of features
- Distribution over grid locations
- Attention - Nice Interpretable Outputs
- Many to Many RNN
- Sequence to Sequence model
- Generate an output sequence, similar to captioning
- Content based addressing
- Produces a probability distribution directly
- Soft Attention Easy to implement and train
- Constrained to the fixed grid provided by the feature maps (see the sketch below)
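A minimal sketch of soft attention, assuming PyTorch, a flattened 7×7 grid of conv features, and simple dot-product (content-based) scoring: softmax the scores into a distribution over grid locations, then take the probability-weighted sum of features as the context vector.

```python
import torch
import torch.nn.functional as F

L, D = 49, 512                 # 7x7 grid of D-dim conv features
feats = torch.randn(L, D)      # grid of features from the CNN
h = torch.randn(D)             # current RNN hidden state

scores = feats @ h                 # (L,) content-based score per location
alpha = F.softmax(scores, dim=0)   # distribution over grid locations
context = alpha @ feats            # (D,) weighted vector fed to the RNN
```

Because the context is a differentiable weighted average, this trains with plain backprop; but it can only mix whole grid cells, which is the limitation spatial transformers remove.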
- Spatial Transformer Network - similar to texture mapping
- Attend to arbitrary parts of the input in a differentiable way (sketched below)
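The sampling half of a spatial transformer in PyTorch: an affine transform `theta` defines a sampling grid over the input (like texture mapping), and `grid_sample` bilinearly samples it, so the network can attend to arbitrary continuous regions. In a full STN, `theta` would be predicted by a small localization network; here it is fixed for illustration.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 64)            # input image / feature map

# Affine transform: a fixed zoom into the centre; in a real STN a
# small localization net would predict theta from the input.
theta = torch.tensor([[[0.5, 0.0, 0.0],
                       [0.0, 0.5, 0.0]]])           # (1, 2, 3)

grid = F.affine_grid(theta, size=(1, 3, 32, 32), align_corners=False)
out = F.grid_sample(x, grid, align_corners=False)   # (1, 3, 32, 32) crop
```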
Happy Mastering DL!!!