"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your gut and don't follow the herd" ; "Validate direction not destination" ;

December 27, 2018

Day #174 - CS231n - Lecture 13: Segmentation, soft attention, spatial transformers

Key Lessons
  • Segmentation
  • Semantic Segmentation
  • Instance Segmentation
Soft Attention
  • Discrete Locations
  • Continuous Locations (Spatial Transformers)
Inception-v4
  • Deep Network
  • Repeated Modules
  • No Padding
  • Strided Convolution and MaxPooling
  • Efficient convolution tricks: 1x7, 7x1
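The 1x7 / 7x1 trick above can be made concrete with a parameter count. A small sketch (the channel count is an illustrative assumption, and biases are ignored):

```python
# Parameter-count comparison for the factorized convolutions used in
# Inception-v4: a full 7x7 conv costs 7*7*C*C weights, while a 1x7
# followed by a 7x1 costs (1*7 + 7*1)*C*C for the same 7x7 receptive
# field -- 3.5x fewer parameters.

def conv_params(kh, kw, c_in, c_out):
    """Number of weights in a kh x kw convolution (biases ignored)."""
    return kh * kw * c_in * c_out

C = 192  # illustrative channel count, not from the lecture
full = conv_params(7, 7, C, C)
factored = conv_params(1, 7, C, C) + conv_params(7, 1, C, C)
ratio = full / factored  # 49 / 14 = 3.5
```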
Segmentation
Two subtasks are involved:
Semantic
  • Input Image / Fixed number of classes
  • Background class
  • Label every pixel with one of the semantic classes
  • Higher level understanding of images
  • Not aware of instances
  • Count of objects not known
Instance
  • Detect instances of a given category and label them
  • Simultaneous detection and segmentation
Semantic Segmentation
  • Label without instances
  • Given an input image, extract a patch from it
  • Run it through a CNN
  • Classify the centre pixel (e.g. as COW)
  • Run over entire image
  • Expensive Operation
  • Receptive field on the order of 100 pixels
  • Semantic Segmentation - Multi scale
  • Superpixels / segmentation trees
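The patch-then-classify loop above can be sketched in a few lines. This is a toy stand-in, not the lecture's model: `classify_patch` just thresholds the patch mean, playing the role of the CNN. It makes the cost visible -- one "forward pass" per pixel:

```python
import numpy as np

# Naive patch-based semantic segmentation: for every pixel, extract the
# surrounding k x k patch and classify its centre. classify_patch is a
# hypothetical stand-in for a CNN.

def classify_patch(patch):
    return int(patch.mean() > 0.5)  # toy classifier: 0 = background, 1 = "cow"

def segment(image, k=5):
    r = k // 2
    padded = np.pad(image, r, mode="edge")
    labels = np.zeros(image.shape, dtype=int)
    for i in range(image.shape[0]):        # one classification per pixel:
        for j in range(image.shape[1]):    # this is why it is expensive
            labels[i, j] = classify_patch(padded[i:i + k, j:j + k])
    return labels

img = np.zeros((8, 8)); img[2:6, 2:6] = 1.0  # a bright square
mask = segment(img)
```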


Semantic Segmentation - Refinement
  • Apply CNN once to get labels
  • Increases Effective Receptive Field for output
  • Recurrent Convolutional network
  • Iteratively refine outputs
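How repeated application of the same convolution grows the effective receptive field can be demonstrated numerically. A small sketch (a naive 3x3 convolution on a unit impulse, my own illustration rather than the lecture's network): after k passes the nonzero extent is 2k + 1.

```python
import numpy as np

# Each 3x3 pass widens the region one pixel can influence by one pixel
# on every side, so a recurrent/iterated conv refines with an ever
# larger effective receptive field.

def conv3x3(x, kernel):
    h, w = x.shape
    p = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = (p[i:i + 3, j:j + 3] * kernel).sum()
    return out

kernel = np.ones((3, 3))
impulse = np.zeros((11, 11)); impulse[5, 5] = 1.0

extents = []
for k in (1, 2, 3):
    x = impulse.copy()
    for _ in range(k):            # apply the same conv k times
        x = conv3x3(x, kernel)
    extents.append(int(np.count_nonzero(x[5])))  # width of the nonzero row
# extents grows as 2k + 1: [3, 5, 7]
```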

Semantic Segmentation - Upsampling
  • Input images are run through convolutions to extract feature maps
  • Learn upsampling as part of network
  • Skip connections - Help in low level details
  • Convolutional features from different layers in network
  • Accuracy - Classification metrics, intersection over union (ground truth vs region predicted)
  • Learnable Upsampling - Deconvolution (Convolution Transpose), Fractionally strided convolution
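The intersection-over-union metric mentioned above is simple enough to write out. A minimal sketch on binary masks (the tie-breaking choice for two empty masks is my assumption):

```python
import numpy as np

# Intersection over union (IoU) between a predicted segmentation mask
# and the ground-truth mask.

def iou(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0  # both empty: treat as perfect

gt = np.zeros((6, 6), dtype=int);   gt[1:5, 1:5] = 1    # 16-pixel square
pred = np.zeros((6, 6), dtype=int); pred[2:6, 2:6] = 1  # same size, shifted
score = iou(pred, gt)  # intersection 9, union 23
```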
Instance Segmentation
  • Generalization
  • Distinguish Instances
  • Detect and label instances
  • End up looking like detection models
  • SDS (Simultaneous Detection and Segmentation)
  • RCNN - External region proposals
  • Box CNN, Region CNN
  • Mask with mean color of dataset

Hypercolumns
  • Region Classification
  • Region Refinement
  • Extract Convolutional features
  • Upsample and combine them together
  • Bi-linear / nearest neighbor for upsampling
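The simplest of the two upsampling options above, nearest neighbour, just repeats each feature-map cell so that coarse deep features can be stacked with fine shallow ones. A minimal numpy sketch:

```python
import numpy as np

# Nearest-neighbour upsampling of a coarse feature map: each cell is
# repeated `factor` times along both spatial axes.

def upsample_nn(fmap, factor):
    return fmap.repeat(factor, axis=0).repeat(factor, axis=1)

coarse = np.array([[1., 2.],
                   [3., 4.]])
up = upsample_nn(coarse, 2)  # 2x2 map -> 4x4 map
```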

Cascades
  • High Resolution Images
  • From Conv feature maps
  • Each Feature Map predicts region of interest
  • Reshape boxes to fixed size
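"Reshape boxes to fixed size" can be sketched as RoI max pooling: the box on the feature map is split into a fixed grid and each cell is max-pooled, so every proposal yields the same output shape. The grid size and box format here are illustrative assumptions:

```python
import numpy as np

# Pool a variable-size box on a feature map down to a fixed
# out_h x out_w output by max-pooling over a regular grid of cells.

def roi_pool(fmap, box, out_h=2, out_w=2):
    y0, x0, y1, x1 = box                 # box in feature-map coordinates
    region = fmap[y0:y1, x0:x1]
    h, w = region.shape
    ys = np.linspace(0, h, out_h + 1).astype(int)  # cell boundaries
    xs = np.linspace(0, w, out_w + 1).astype(int)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out

fmap = np.arange(36, dtype=float).reshape(6, 6)
pooled = roi_pool(fmap, (1, 1, 5, 5))  # 4x4 region -> fixed 2x2 output
```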



Attention Models
RNN for Captioning
  • H X W X 3
  • Input Image -> CNN -> Features -> Hidden State -> First Word -> Second Word
  • One chance to look at input image
  • Weighted vector from input features
  • Hidden states sent to the model
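The "weighted vector from input features" step is the core of soft attention: the hidden state scores each grid location, a softmax turns the scores into a distribution, and the context vector is the weighted average of the features. A sketch with illustrative shapes and a simple dot-product scorer (both my assumptions):

```python
import numpy as np

# Soft attention over a grid of CNN features: score each location with
# the decoder hidden state, softmax into a distribution, and take the
# expectation of the features under that distribution.

def soft_attention(features, hidden):
    # features: (L, D) flattened grid locations; hidden: (D,) decoder state
    scores = features @ hidden               # one score per location
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()                 # distribution over grid locations
    context = weights @ features             # weighted average of features
    return context, weights

rng = np.random.default_rng(0)
features = rng.normal(size=(49, 8))  # e.g. a 7x7 grid of 8-d features
hidden = rng.normal(size=8)
context, weights = soft_attention(features, hidden)
```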
Soft Vs Hard Attention
  • Grid of features
  • Distribution over grid locations
  • Attention - Nice Interpretable Outputs


Soft Attention for Translation
  • Many to Many RNN
  • Sequence to Sequence model
  • Generate the output sequence, similar to image captioning
  • Content based addressing
  • Outputs a probability distribution directly
  • Soft Attention Easy to implement and train
  • Constrained to fixed grid provided by feature maps
  • Spatial Transformer Network - Similar to texture mapping
  • Attend to arbitrary parts of input in a nice way
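The spatial transformer's sampling step can be sketched directly: an affine transform maps each output coordinate back to an input coordinate, and bilinear interpolation reads off the (sub-pixel) value, which is what lets the network attend to arbitrary regions instead of a fixed grid. A minimal single-channel version (pixel-coordinate convention is my simplification; the paper uses normalized coordinates):

```python
import numpy as np

# Differentiable image sampling as in spatial transformer networks:
# apply a 2x3 affine transform theta to output coordinates, then
# bilinearly interpolate the input image at the resulting locations.

def affine_sample(img, theta):
    h, w = img.shape
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            # map output pixel (i, j) back to input coordinates (y, x)
            y = theta[0, 0] * i + theta[0, 1] * j + theta[0, 2]
            x = theta[1, 0] * i + theta[1, 1] * j + theta[1, 2]
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            dy, dx = y - y0, x - x0
            val = 0.0  # blend the four neighbouring pixels
            for yy, xx, wgt in [(y0, x0, (1 - dy) * (1 - dx)),
                                (y0, x0 + 1, (1 - dy) * dx),
                                (y0 + 1, x0, dy * (1 - dx)),
                                (y0 + 1, x0 + 1, dy * dx)]:
                if 0 <= yy < h and 0 <= xx < w:
                    val += wgt * img[yy, xx]
            out[i, j] = val
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
identity = np.array([[1., 0., 0.],
                     [0., 1., 0.]])
same = affine_sample(img, identity)  # identity transform reproduces the image
```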



Happy Mastering DL!!!
