- Segmentation
- Semantic Segmentation
- Instance Segmentation
- Discrete Locations
- Continuous Locations (Spatial Transformers)
- Deep Network
- Repeated Modules
- No Padding
- Strided Convolution and MaxPooling
- Efficient convolution tricks: factor a 7×7 filter into 1×7 and 7×1 convolutions (see the sketch below)
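A quick PyTorch sketch of that trick (the layer widths are illustrative): a 1×7 convolution followed by a 7×1 convolution covers the same 7×7 receptive field as a single 7×7 filter, with 2·7 instead of 49 weights per input/output channel pair.

```python
import torch
import torch.nn as nn

# A single 7x7 conv: 7*7*C_in*C_out weights.
conv7x7 = nn.Conv2d(64, 64, kernel_size=7, padding=3)

# Factorized version: 1x7 then 7x1 covers the same 7x7 receptive
# field with 2*7*C_in*C_out weights instead of 49*C_in*C_out.
conv_factored = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=(1, 7), padding=(0, 3)),
    nn.Conv2d(64, 64, kernel_size=(7, 1), padding=(3, 0)),
)

x = torch.randn(1, 64, 32, 32)
assert conv7x7(x).shape == conv_factored(x).shape  # same spatial output
```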
Two Subtasks involved
Semantic
- Input Image / Fixed number of classes
- Background class
- Label every pixel with one of the semantic classes
- Higher level understanding of images
- Not aware of instances
- Count of objects not known
- Instance: detect instances of a given category and label them
- Simultaneous detection and segmentation
- Semantic: labels without distinguishing instances
- Given an input image, extract a patch from the image
- Run it through a CNN
- Classify the centre pixel (e.g. as COW)
- Run over the entire image (a naive version is sketched after this list)
- Expensive operation
- Receptive field on the order of ~100 pixels
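A minimal sketch of this naive patch-based approach, assuming PyTorch; the tiny `classifier` network, patch size, and class count are made up for illustration. One CNN call per pixel is exactly why this is so expensive.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

patch = 64  # hypothetical receptive field / patch size

# Hypothetical tiny classifier over patches; outputs per-class scores.
classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 21),  # e.g. 20 classes + background
)

def segment_naive(img):  # img: (3, H, W)
    _, H, W = img.shape
    pad = patch // 2
    padded = F.pad(img, (pad, pad, pad, pad))
    labels = torch.zeros(H, W, dtype=torch.long)
    for y in range(H):          # one CNN call per pixel: very expensive
        for x in range(W):
            p = padded[:, y:y + patch, x:x + patch].unsqueeze(0)
            labels[y, x] = classifier(p).argmax().item()
    return labels
```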
- Semantic Segmentation - Multi-scale
- Superpixels / segmentation trees
- Apply CNN once to get labels
- Increases Effective Receptive Field for output
- Recurrent Convolutional network
- Iteratively refine outputs
- Input image runs through convolutions to extract feature maps
- Learn upsampling as part of network
- Skip connections - Help in low level details
- Convolutional features from different layers in network
- Accuracy - classification metrics, intersection over union (ground truth vs predicted region)
- Learnable upsampling - deconvolution (convolution transpose), fractionally strided convolution (both sketched below)
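A minimal sketch of both points, assuming PyTorch and illustrative shapes: intersection over union between boolean masks, and a transpose convolution ("deconvolution" / fractionally strided convolution) whose upsampling weights are learned end-to-end instead of being fixed interpolation.

```python
import torch
import torch.nn as nn

def iou(pred, gt):
    """Intersection over union of two boolean masks."""
    inter = (pred & gt).sum().float()
    union = (pred | gt).sum().float()
    return (inter / union).item()

# Learnable upsampling: a stride-2 transpose convolution doubles the
# spatial resolution, with weights learned rather than fixed bilinear.
up = nn.ConvTranspose2d(64, 21, kernel_size=4, stride=2, padding=1)

feat = torch.randn(1, 64, 16, 16)   # coarse conv feature map
scores = up(feat)                   # (1, 21, 32, 32) per-class scores
```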
- Generalization
- Distinguish Instances
- Detect and label instances
- End up looking like detection models
- SDS (Simultaneous Detection and Segmentation)
- RCNN - External Region proposals
- Box CNN, Region CNN
- Mask background with the mean color of the dataset
- Region Classification
- Region Refinement
- Extract Convolutional features
- Upsample and combine them
- Bilinear / nearest-neighbor upsampling (see the sketch below)
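A sketch of the fixed (non-learned) upsampling options just mentioned, assuming PyTorch's `F.interpolate`; the feature-map shape is illustrative.

```python
import torch
import torch.nn.functional as F

feat = torch.randn(1, 256, 14, 14)   # coarse region feature map

# Fixed bilinear upsampling: smooth but has no learnable weights.
up_bilinear = F.interpolate(feat, scale_factor=4, mode='bilinear',
                            align_corners=False)   # (1, 256, 56, 56)

# Nearest-neighbor: blocky, even cheaper.
up_nearest = F.interpolate(feat, scale_factor=4, mode='nearest')
```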
Cascades
- High Resolution Images
- From Conv feature maps
- Each Feature Map predicts region of interest
- Reshape boxes to a fixed size (ROI pooling / align; sketched below)
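The "reshape boxes to a fixed size" step is ROI pooling / ROI align; a sketch using `torchvision.ops.roi_align`, with made-up box coordinates and an assumed stride-16 feature map.

```python
import torch
from torchvision.ops import roi_align

fmap = torch.randn(1, 256, 50, 50)   # conv feature map of a high-res image

# Boxes in (batch_index, x1, y1, x2, y2) format, in input-image coordinates.
boxes = torch.tensor([[0., 16., 16., 144., 208.],
                      [0., 64., 32., 192., 160.]])

# spatial_scale maps image coords to feature-map coords (e.g. stride 16).
rois = roi_align(fmap, boxes, output_size=(7, 7), spatial_scale=1 / 16)
print(rois.shape)  # torch.Size([2, 256, 7, 7]): fixed size per box
```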
Attention Models
RNN for Captioning - input image H × W × 3
- Input Image -> CNN -> Features -> Hidden State -> First Word -> Second Word
- One chance to look at the input image
- Weighted vector from input features
- Hidden states sent to the model
- Grid of features
- Distribution over grid locations
- Attention - Nice Interpretable Outputs
- Many to Many RNN
- Sequence to Sequence model
- Generate an output sequence, similar to captioning
- Content based addressing
- Produces a probability distribution directly
- Soft Attention Easy to implement and train
- Constrained to the fixed grid provided by the feature maps (see the sketch below)
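A minimal sketch of soft attention, assuming PyTorch, a flattened 7×7 grid of conv features, and simple dot-product (content-based) scoring: softmax the scores into a distribution over grid locations, then take the probability-weighted sum of features as the context vector.

```python
import torch
import torch.nn.functional as F

L, D = 49, 512                 # 7x7 grid of D-dim conv features
feats = torch.randn(L, D)      # grid of features from the CNN
h = torch.randn(D)             # current RNN hidden state

scores = feats @ h                 # (L,) content-based score per location
alpha = F.softmax(scores, dim=0)   # distribution over grid locations
context = alpha @ feats            # (D,) weighted vector fed to the RNN
```

Because the context is a differentiable weighted average, this trains with plain backprop; but it can only mix whole grid cells, which is the limitation spatial transformers remove.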
- Spatial Transformer Network - similar to texture mapping
- Attend to arbitrary parts of the input in a differentiable way (sketched below)
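The sampling half of a spatial transformer in PyTorch: an affine transform `theta` defines a sampling grid over the input (like texture mapping), and `grid_sample` bilinearly samples it, so the network can attend to arbitrary continuous regions. In a full STN, `theta` would be predicted by a small localization network; here it is fixed for illustration.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 64)            # input image / feature map

# Affine transform: a fixed zoom into the centre; in a real STN a
# small localization net would predict theta from the input.
theta = torch.tensor([[[0.5, 0.0, 0.0],
                       [0.0, 0.5, 0.0]]])           # (1, 2, 3)

grid = F.affine_grid(theta, size=(1, 3, 32, 32), align_corners=False)
out = F.grid_sample(x, grid, align_corners=False)   # (1, 3, 32, 32) crop
```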
Happy Mastering DL!!!