YoFlow - Link1
Poster - Link2
Key Summary
- You Only Look Once (YOLO) implemented in TensorFlow
- K-means clustering across short rolling windows to group similar objects across frames
- Predict if object is similar between frames
- Single Feedforward network
- YOLO reframes object detection as a single regression problem
- Images -> Pixels -> Bounding Box -> Probabilities
- Divides image into S X S Grid
- If an object's centre falls in a grid cell, that cell is responsible for detecting the object
- Predict bounding boxes and class probabilities
- 32 layer Deep CNN
- Input Image resized into 448 x 448
- YOLO network output = S x S x (B*5 + C) tensor of predictions (7 x 7 x 30)
- S - Number of rows and columns into which the image is divided
- B - Number of bounding boxes predicted per grid cell
- C - Number of classes
- 5 - Terms per bounding box: x-axis grid offset, y-axis grid offset, width, height and confidence
- Dataset - Pascal VOC Dataset
- K-means clustering across images within the short rolling window to group similar objects across frames
- Define Distance between two images I1, I2 given dimensions x,y and color channels c
- Works well when images are similar
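A minimal Python sketch of the points above, under stated assumptions. The function names are hypothetical and are not taken from YoFlow. The values S = 7, B = 2, C = 20 are the standard YOLO settings for Pascal VOC, which give the 7 x 7 x 30 tensor. The notes do not state the distance formula, so the Euclidean distance over x, y and c used here is only an assumed stand-in.

```python
from typing import Sequence, Tuple


def yolo_output_shape(s: int, b: int, c: int) -> Tuple[int, int, int]:
    """Shape of the prediction tensor: S x S x (B*5 + C)."""
    return (s, s, b * 5 + c)


def responsible_cell(cx: float, cy: float, img_w: int, img_h: int, s: int) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains the object centre."""
    # Scale the centre into grid units, then clamp so a centre on the
    # right or bottom edge still maps to the last cell.
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return (row, col)


def image_distance(i1: Sequence, i2: Sequence) -> float:
    """Assumed metric: Euclidean distance over every (x, y, c) entry.

    Both images are nested sequences indexed as image[y][x][c] and are
    expected to have the same dimensions.
    """
    total = 0.0
    for row1, row2 in zip(i1, i2):
        for px1, px2 in zip(row1, row2):
            for c1, c2 in zip(px1, px2):
                total += (c1 - c2) ** 2
    return total ** 0.5


if __name__ == "__main__":
    print(yolo_output_shape(7, 2, 20))
    print(responsible_cell(224, 224, 448, 448, 7))
```

A small distance between consecutive frames is what lets a clustering step such as K-means group the same object across a rolling window, which fits the note that the approach works well when images are similar.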
Reading #2 - DOCK: Detecting Objects by transferring Common-sense Knowledge
Key Summary
- Transfer Learning to transfer knowledge from source categories
- Encode similarity, spatial, attribute and scene cues
- Model Objects at region level
- Leverage pretrained object detectors
- Similarity at Region Level
- Context used for Object Detection, Semantic Segmentation and Object Discovery
- Use External Knowledge to interpret common sense
- Pascal VOC Dataset
- Base Network CNN + Common Sense
- Classification Matrix, Common Sense Matrix
- Image with Region proposals
- Common-sense cues based on similarity, attributes, spatial relations