Key Notes
What a Classifier Will Miss - Human-level scene understanding
- Parsing the scene
- The angle of Bicycle (Pose, Relative pose)
- Person on Bicycle
- Closer Inspection
- Object Detection
- Pose Estimation
- Accuracy vs Efficiency of Models
Input-Output Node, Loss Computation and Backprop
Classification - sparse description of the image
Object Detection
- Multi-task problem
- Classification & Localisation
- Object, Location, Bounding box
- Dataset, Samples, List of Objects, Labels, Bbox for each object
- Continuous Output
- Minimize MSE over the samples
- Regression for bbox prediction
- The first part is the classification
- The Second Step is regression
- Two-Stage Detector
- Good Candidate BBOX
- Refine through Regression
- Discretize bbox space
- Anchor points distributed
- Candidate boxes of different scale and ratio
- n candidates per anchor
- Is there an object or not in the box
- Refine through regression
- We cannot backprop through the parameters of the bbox (cf. Spatial Transformer Networks)
- Employ Hard negative mining
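The "discretize bbox space" idea in the list above can be sketched as follows. This is a minimal illustration only, not any particular detector's implementation: the function name `make_anchors`, the stride, and the scale and ratio values are all assumptions chosen for the example.

```python
import math

def make_anchors(image_size, stride, scales, ratios):
    """Return candidate boxes (x1, y1, x2, y2) centred on a regular grid of anchor points.

    Hypothetical helper for illustration; real detectors differ in the details.
    """
    height, width = image_size
    boxes = []
    # Anchor points are placed at the centre of each stride x stride cell.
    for cy in range(stride // 2, height, stride):
        for cx in range(stride // 2, width, stride):
            # n candidates per anchor point: every scale paired with every ratio.
            for scale in scales:
                for ratio in ratios:  # ratio = width / height
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

# Assumed toy settings: 64x64 image, one anchor point every 16 pixels.
anchors = make_anchors(image_size=(64, 64), stride=16,
                       scales=(16, 32), ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 4 x 4 anchor points x 6 candidates each = 96
```

Each candidate box would then get an "object or not" score and a regression offset that refines it.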
RetinaNet uses Focal Loss. (A loss function is just a mathematical way of saying how far off a guess is from the real value of a data point.) Focal Loss puts more weight on the examples that were hard to classify and reduces the contribution of easy, correctly classified ones.
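As a rough sketch of how this down-weighting works, the binary focal loss for a single prediction can be written as -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the probability the model assigned to the true class. The function name below is an assumption for illustration, and gamma=2.0, alpha=0.25 are used as commonly quoted defaults rather than values taken from these notes.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction (illustrative sketch).

    p: predicted probability of the positive class, strictly between 0 and 1.
    y: ground-truth label, 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss when p_t is close to 1 (an easy example).
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.9, 1)   # confident and correct: tiny loss
hard = focal_loss(0.1, 1)   # confident and wrong: much larger loss
print(easy < hard)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to an alpha-weighted cross entropy.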
Semantic Segmentation
- Pooling - reduce the resolution of feature maps
- Upsample based on the nearest neighbor approach
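A toy sketch of the two operations above, using plain nested lists as a single-channel feature map. The helper names `max_pool_2x2` and `upsample_nearest` are made up for this example, and max pooling is assumed as the pooling type since the notes do not say which one is meant.

```python
def max_pool_2x2(fmap):
    """Halve the resolution by keeping the max of each 2x2 block (assumes even height and width)."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]

def upsample_nearest(fmap, factor=2):
    """Enlarge the map by copying each value into a factor x factor block."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[i // factor][j // factor] for j in range(w * factor)]
            for i in range(h * factor)]

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
pooled = max_pool_2x2(fmap)          # [[6, 8], [14, 16]]
restored = upsample_nearest(pooled)  # back to 4x4, but made of constant blocks
print(restored)
```

The restored map has the original resolution but has lost the detail inside each block, which is why nearest-neighbor upsampling alone gives coarse output.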
U-Net
- Segmenting medical images
- Input Image -> Convolution -> ReLU -> Pooling
- Encoder - Similar to Image Classifier
- Upsampling through Decoder for same resolution output
- Upsampling - blobby feature map
- For every location distribution over classes
- Cross Entropy (Avg Over all Locations)
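The last two points can be sketched in a few lines: a softmax turns the scores at each location into a distribution over classes, and the loss is the cross entropy averaged over all locations. This is a minimal pure-Python illustration under assumed data layouts (nested lists, integer class labels), not the U-Net authors' code.

```python
import math

def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def segmentation_cross_entropy(logits, labels):
    """Cross entropy averaged over all pixel locations.

    logits[i][j] is a list of per-class scores for pixel (i, j);
    labels[i][j] is the integer index of the true class at that pixel.
    """
    total, count = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for pixel_logits, label in zip(row_logits, row_labels):
            probs = softmax(pixel_logits)
            total -= math.log(probs[label])
            count += 1
    return total / count

# Assumed toy input: a 2x2 "image" with 3 classes and uninformative (all-zero) scores.
logits = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
          [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
labels = [[0, 1], [2, 0]]
loss = segmentation_cross_entropy(logits, labels)
print(loss)
```

With uniform predictions over 3 classes, every pixel contributes log(3), so the average is also log(3).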