- Higher layers represent more complex object parts
- Pooling for downsampling
- ResNet 152-layer architecture
- Classification - Given Image classify object category
- Classification + Localization - Draw bounding box where the class occurs
- Object Detection - Find all instances of the categories in the image
- Instance Segmentation - Find all instances and outline each one with a contour
- Detection can have multiple objects, variable number of objects
- C classes
- Input is Image
- Output is class label
- Evaluation Metric Accuracy
- Input Image
- Output Box in image (x,y,w,h)
- Evaluation Metric Intersection over union
- Class label plus bounding box
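The intersection-over-union metric above can be sketched as follows. This is a minimal illustration, assuming boxes are given as (x, y, w, h) with (x, y) the top-left corner; the notes do not say which corner or centre convention is used, and the function name `iou` is just for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h).

    Assumption: (x, y) is the top-left corner of the box.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extent along each axis (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes score 1.0 and disjoint boxes score 0.0.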
Idea #1 - Localization as Regression
- Classification like SVM
- Regression - Linear Regression
- Image -> Processing -> Four Real valued numbers
- x,y,w,h - always four numbers
- Loss L2 Euclidean Loss
- Train as classification networks
Take fully connected layers
New FC layers
Output Real valued numbers
Class agnostic regressor
Class specific regressor
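A minimal sketch of the L2 loss on the four box numbers, and of how the output size differs between the two regressor types. The function name and the value of `num_classes` are illustrative assumptions, not from the lecture.

```python
def l2_box_loss(predicted, target):
    """Squared Euclidean distance between two (x, y, w, h) boxes."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


num_classes = 20  # hypothetical C, chosen only for illustration

# Class agnostic regressor: one box, whatever the class.
agnostic_outputs = 4
# Class specific regressor: one box per class.
specific_outputs = 4 * num_classes
```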
Human Pose Estimation
- Input close up view
- Fixed number of joints
- Find all joints
- Encode
- x,y for each joint location
- predict pose
- DeepPose
- Regress joint positions using a CNN
Run classification + Regression network at multiple locations on a high resolution image
Convert fully connected layers into convolutional layers for efficient computation
Combine classifier and regressor predictions across all scales for final prediction
OverFeat Architecture
- AlexNet
- Classification head
- Regression head
- FC layer - 4096-dimensional vector
- Convert FC into Convolutional layer
- Efficient Sliding Window - OverFeat
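Why converting FC layers to convolutions gives a sliding window: an FC layer trained on a flattened window is the same as a convolution kernel of that window size, so applying it to a larger feature map scores every window position. The sketch below shows that equivalence in plain Python. It assumes a single-channel feature map for brevity, the name `fc_as_conv` is hypothetical, and the explicit loops do not share computation the way a real convolution would.

```python
def fc_as_conv(feature_map, fc_weights, window):
    """Slide an FC layer over a 2-D feature map.

    feature_map: H x W list of lists (single channel, for brevity).
    fc_weights:  K rows, each of length window * window, i.e. an FC layer
                 that takes a flattened window x window input.
    Returns a K x (H - window + 1) x (W - window + 1) nested list.
    """
    height, width = len(feature_map), len(feature_map[0])
    out_h, out_w = height - window + 1, width - window + 1
    output = []
    for weights in fc_weights:
        # Reinterpret this FC row as a window x window convolution kernel.
        kernel = [weights[r * window:(r + 1) * window] for r in range(window)]
        channel = [[sum(kernel[r][c] * feature_map[i + r][j + c]
                        for r in range(window) for c in range(window))
                    for j in range(out_w)]
                   for i in range(out_h)]
        output.append(channel)
    return output
```

Each output position holds exactly what the FC layer would produce on the crop at that position.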
- RPN (Region Proposal Networks)
- Find all instances of those classes
- Variable sized outputs
- Not a straightforward regression problem
- Detection as classification
- Classifier on Image Regions
- Windows Size - Try them all
- Windows of different sizes and scales
- Compute HOG
- Score every sub-window
- Apply non-maximum suppression
- Linear Classifier on top of HOG
- Still HOG
- Rather than Linear Classifier
- Templates for parts
- Deform a little bit
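The non-maximum suppression step used after scoring sub-windows can be sketched as a greedy loop: keep the highest-scoring box, drop any box that overlaps it too much, repeat. This sketch assumes boxes in corner form (x1, y1, x2, y2), and the 0.5 threshold is an illustrative default, not a value from the notes.

```python
def iou_xyxy(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) corner form."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for idx in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou_xyxy(boxes[idx], boxes[k]) <= iou_threshold for k in keep):
            keep.append(idx)
    return keep
```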
- Run only Region proposals
- Output Regions where object may be located
- Class agnostic object detector
- Fast to run
- Blob like structures in image
- Most famous is selective search
- Merge blob like regions
- Many region proposal methods exist - see "What makes for effective detection proposals?"
- EdgeBoxes is really fast (about 1/3 of a second per image)
- Region based CNN method
- Input Image
- Selective Search
- 2000 Boxes
- Each box crop image region
- Run forward through CNN with regression and classification head
- Regression head corrects the region proposal
Training Pipeline
- Start by downloading a pretrained classification model from the internet
- Fine tune model for detection
- Add new layers with different classes
- Run on positive and negative images from detection dataset
- Selective search - run the CNN and cache features to disk
- Needs a large hard drive - Pascal Dataset
- Extracted features take 100s of GB
- Binary SVM to classify different regions
- Positive and Negative Samples to train binary SVMs
- For each class train a linear regression model to map from cached features to offsets to GT boxes to make up for slightly wrong proposals
- Metric - Mean Average Precision
- True positive - a high-scoring detection whose box overlaps a correct box above a threshold
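A rough sketch of how mean average precision can be computed once each detection has been marked as a true or false positive. This uses the plain, non-interpolated form of average precision; benchmarks such as Pascal VOC define their own interpolated variants, which are omitted here. The function names are illustrative.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """AP for one class: mean of the precision at each true-positive rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if is_true_positive[idx]:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth if num_ground_truth else 0.0


def mean_average_precision(per_class_ap):
    """mAP: average of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```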
SVM and Regression are trained offline
Complicated training pipeline
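The per-class box regressor in the pipeline above predicts offsets from a proposal to the ground-truth box. The notes do not spell out the offset encoding; the sketch below assumes the parameterization commonly used for R-CNN, with boxes as (centre x, centre y, width, height), position offsets scaled by the proposal size, and log-space size offsets.

```python
import math


def box_to_offsets(proposal, ground_truth):
    """Regression targets that map a proposal onto its ground-truth box."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def apply_offsets(proposal, offsets):
    """Correct a proposal using predicted offsets (inverse of the above)."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = offsets
    return (px + tx * pw, py + ty * ph,
            pw * math.exp(tw), ph * math.exp(th))
```

A perfect proposal gets offsets of all zeros, so the regressor only has to learn small corrections.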
Fast R-CNN
- Swap order of extracting regions and running CNN
- Sliding window idea
- Pipeline looks similar
- High Resolution input image - Convolutional vector
- Extract Region proposals from feature map
- Fed into FC layers
- Train all at once
- ROI Pooling
- Region proposal from edge boxes
- Swapping order of convolution + cropping
- Offline processing
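ROI pooling turns a region of any size on the feature map into a fixed-size grid so it can be fed to the FC layers. A minimal sketch, assuming a single-channel feature map stored as a list of lists, a non-empty ROI given in feature-map cells as (x1, y1, x2, y2) with exclusive end, and max pooling inside each bin. The bin-splitting rule here is one simple choice, not necessarily the one used in Fast R-CNN.

```python
def roi_max_pool(feature_map, roi, output_size=2):
    """Max-pool an ROI into an output_size x output_size grid."""
    x1, y1, x2, y2 = roi
    region = [row[x1:x2] for row in feature_map[y1:y2]]
    height, width = len(region), len(region[0])
    # Integer bin edges that split the region as evenly as possible.
    row_edges = [(i * height) // output_size for i in range(output_size + 1)]
    col_edges = [(j * width) // output_size for j in range(output_size + 1)]
    pooled = []
    for i in range(output_size):
        # Each bin covers at least one cell, even for very small regions.
        r0, r1 = row_edges[i], max(row_edges[i + 1], row_edges[i] + 1)
        pooled_row = []
        for j in range(output_size):
            c0, c1 = col_edges[j], max(col_edges[j + 1], col_edges[j] + 1)
            pooled_row.append(max(region[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
        pooled.append(pooled_row)
    return pooled
```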
- Feature map from convolution
- Region Proposal Network produces proposals from feature maps
- ROI pooling
- Classifier
- Feature map - Sliding window over convolutional feature map
- Anchor boxes - Different sizes
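The anchor-box idea can be sketched as follows: at every position of the sliding window over the feature map, place a fixed set of reference boxes of different sizes and shapes. The stride, scales, and aspect ratios below are illustrative assumptions, and `make_anchors` is a hypothetical helper name.

```python
import math


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchors as (cx, cy, w, h) in image pixels, one set per feature cell."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell in image coordinates.
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = height / width
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx, cy, w, h))
    return anchors
```

Each anchor keeps an area of roughly scale squared while its shape changes with the ratio.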
- Deep Residual Network
- 101 layer Residual Network
- box refinement (multiple steps for bounding box)
- Add context
- Multi-Scale Testing
- Localization as regression
- Pose Detection directly as regression
- Input image into grid
- Within each grid cell, predict a fixed number of bounding boxes
- Score for bounding box
- Classification Score
- Detection ends up as regression
- Upper bound in number of outputs is a problem
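The grid scheme above fixes the output size, which is why detection becomes regression and also why the number of outputs has an upper bound. A small arithmetic sketch, assuming each box is encoded as (x, y, w, h, score) and each cell also predicts one set of class scores, as in YOLO-style detectors. The notes do not name the method, so the encoding and the example numbers are assumptions.

```python
def grid_output_shape(grid_size, boxes_per_cell, num_classes):
    """Shape of the regression output for a grid_size x grid_size grid."""
    # Each box contributes x, y, w, h and a score: 5 numbers.
    return (grid_size, grid_size, boxes_per_cell * 5 + num_classes)


def max_detections(grid_size, boxes_per_cell):
    """Upper bound on the number of boxes the model can output."""
    return grid_size * grid_size * boxes_per_cell
```

With a 7 x 7 grid, 2 boxes per cell, and 20 classes, the output is 7 x 7 x 30 and at most 98 boxes can be predicted per image.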
Happy Learning!!!