CVPR Paper Reads - Large-scale Product Recognition
Paper #1 - 1st Place Solution to CVPR 2021 AliProducts Challenge: Large-scale Product Recognition
Key Lessons
- The final solution used 11 models built on three backbones: EfficientNet, EfficientNetV2, and NFNet.
- Small models were trained for fewer epochs and large models for more epochs.
Data Augmentation
- RandomCrop: 448×448
- RandomRotation: ±30°
- RandomHorizontalFlip: p=0.5
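These three augmentations share their names with torchvision transforms. To keep the example dependency-light, the sketch below re-implements them with NumPy only. It assumes the transforms run in the listed order, that images are H×W×C arrays at least 448 pixels on each side, and that pixels rotated in from outside the frame are filled with zeros. The function names (`random_crop`, `random_rotation`, `random_hflip`, `augment`) are hypothetical, not taken from the paper.

```python
import numpy as np

def random_crop(img, size, rng):
    # Pick a random top-left corner so the size x size window fits inside img.
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def random_rotation(img, max_deg, rng):
    # Nearest-neighbour rotation about the image centre by an angle drawn
    # uniformly from [-max_deg, max_deg]; out-of-frame pixels become 0.
    angle = np.deg2rad(rng.uniform(-max_deg, max_deg))
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    cos, sin = np.cos(angle), np.sin(angle)
    # Inverse mapping: rotate each output coordinate back to its source pixel.
    src_x = np.rint(cos * (xs - cx) + sin * (ys - cy) + cx).astype(int)
    src_y = np.rint(-sin * (xs - cx) + cos * (ys - cy) + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out

def random_hflip(img, p, rng):
    # Mirror left-right with probability p.
    return img[:, ::-1] if rng.random() < p else img

def augment(img, rng):
    # Hypothetical pipeline: crop 448x448, rotate within +/-30 degrees, flip with p=0.5.
    img = random_crop(img, 448, rng)
    img = random_rotation(img, 30, rng)
    return random_hflip(img, 0.5, rng)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(512, 600, 3), dtype=np.uint8)
out = augment(image, rng)
print(out.shape)  # (448, 448, 3)
```

In a PyTorch training loop, the equivalent torchvision transforms would normally be used instead of hand-written versions.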
Paper #2 - Solution for Large-scale Long-tailed Recognition with Noisy Labels
Key Lessons
- The models mix CNNs and Transformers, including ResNeSt, EfficientNetV2, and DeiT.
- The final ensemble combines three different architectures initialized with ImageNet-pretrained weights: ResNeSt-101, DeiT-Small, and EfficientNetV2-M.
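The notes do not say how the three models' predictions are combined. A common choice, assumed here, is to average each model's softmax probabilities (optionally weighted) and take the argmax. In the sketch below, the logits are made-up stand-ins for ResNeSt-101, DeiT-Small, and EfficientNetV2-M outputs, and `ensemble_predict` with its `weights` parameter is a hypothetical helper.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ensemble_predict(logits_per_model, weights=None):
    """Hypothetical helper: weighted average of per-model softmax probabilities.

    logits_per_model: list of (N, C) arrays, one per model.
    Returns (class ids of shape (N,), averaged probabilities of shape (N, C)).
    """
    probs = np.stack([softmax(l) for l in logits_per_model])  # (M, N, C)
    if weights is None:
        weights = np.ones(len(logits_per_model))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so each row still sums to 1
    avg = np.tensordot(weights, probs, axes=1)  # weighted sum over models -> (N, C)
    return avg.argmax(axis=1), avg

# Made-up logits for one image and three classes.
resnest_logits = np.array([[2.0, 0.0, 0.0]])
deit_logits = np.array([[0.0, 3.0, 0.0]])
effnetv2_logits = np.array([[2.5, 0.0, 0.0]])
preds, avg = ensemble_predict([resnest_logits, deit_logits, effnetv2_logits])
print(preds)  # [0]
```

Averaging probabilities rather than hard votes lets a confident minority model still shift the result, and the optional weights allow favouring stronger models.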
Paper #3 - An Effective Ensemble Method for AliProducts Challenge: Large-scale Product Recognition
Key Lessons
- The AliProducts dataset consists of more than 3M images of nearly 50K different products.
- All networks are initialized with pre-trained weights on ImageNet and trained with cross entropy loss.
- For image augmentation, RandomCrop and RandomHorizontalFlip are used, along with normalization.
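Cross-entropy in this setting is the standard softmax cross-entropy over roughly 50K product classes. The minimal NumPy sketch below computes it in log-space so that large logits cannot overflow; it illustrates the formula only, and a real training loop would use the framework's built-in loss (for example PyTorch's `nn.CrossEntropyLoss`).

```python
import numpy as np

def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean softmax cross-entropy.

    logits: (N, C) raw class scores; labels: (N,) integer class ids in [0, C).
    """
    # log-softmax via the log-sum-exp trick: shift by the row max first.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Take the log-probability of the true class for each sample and average.
    return float(-log_probs[np.arange(len(labels)), labels].mean())

num_classes = 50_000  # roughly the AliProducts class count
uniform = np.zeros((2, num_classes))
print(cross_entropy(uniform, np.array([0, 1])))  # about 10.82, i.e. log(50000)
```

With all-zero logits every class is equally likely, so the loss equals log(C), a handy sanity check for the initial loss at the start of training.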
Paper #4 - RETAIL VISION WORKSHOP 2021 - PRODUCT PRICING CHALLENGE (4TH PLACE SOLUTION)
Key Lessons
- The first step detects where prices appear on the shelves, as bounding boxes of a single class called "pricing".
- The second step detects and recognizes the text inside each detected pricing box; the Google Vision API was used for both text detection and recognition.
- Price Text Box Extraction: among the recognized text boxes, the one with the largest area that contains only digits was chosen as the price box (or the integer part of the price).
- Price Text Cleaning and Price Rounding Off: the extracted price text is cleaned and then rounded.
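The extraction, cleaning, and rounding steps can be sketched in plain Python. The sketch assumes the OCR output (from the Google Vision API or any other engine) has already been converted into `(text, box)` pairs with `(x_min, y_min, x_max, y_max)` boxes. The notes give no exact cleaning or rounding rules, so `clean_price_text` (keep digits and a decimal point) and `round_price` (round half-up to two decimals) are hypothetical choices, as are all function names here.

```python
import re
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def box_area(box: Box) -> float:
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def extract_price_box(ocr_results: List[Tuple[str, Box]]) -> Optional[Tuple[str, Box]]:
    # Keep only boxes whose text is made of ASCII digits, then take the largest.
    numeric = [(t, b) for t, b in ocr_results if re.fullmatch(r"[0-9]+", t.strip())]
    if not numeric:
        return None
    return max(numeric, key=lambda tb: box_area(tb[1]))

def clean_price_text(text: str) -> str:
    # Hypothetical rule: treat ',' as a decimal point, drop everything
    # except digits and '.', e.g. currency symbols and spaces.
    return re.sub(r"[^0-9.]", "", text.replace(",", "."))

def round_price(text: str, ndigits: int = 2) -> Decimal:
    # Hypothetical rule: round half-up; Decimal avoids float surprises like 2.675.
    quant = Decimal(1).scaleb(-ndigits)  # 1E-2 when ndigits=2
    return Decimal(text).quantize(quant, rounding=ROUND_HALF_UP)

ocr = [
    ("SALE", (0, 0, 100, 20)),
    ("3", (10, 30, 60, 110)),    # large integer part of the price
    ("49", (62, 30, 90, 60)),    # small cents digits
    ("1.2kg", (0, 120, 40, 135)),
]
print(extract_price_box(ocr))      # ('3', (10, 30, 60, 110))
print(clean_price_text("$ 3,49"))  # 3.49
print(round_price("3.499"))        # 3.50
```

Picking the largest digits-only box works because shelf labels usually print the integer part of the price in the biggest font, which is why the rule may return only that part.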
Summary - These solutions combine several techniques. For shelf pricing, custom detection and OCR locate the price area on each item; the text is then parsed and cleaned, and products are matched on both the text and the price value. A similar match could also be done on images or keypoints.
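The notes do not specify how the product match works. One simple way to combine the two signals, assumed here, is to keep only catalog items whose price agrees with the extracted price (within a tolerance) and rank those by text similarity using the standard library's `difflib.SequenceMatcher`. The catalog format and `match_product` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

def match_product(ocr_text: str, price: float, catalog: List[Dict],
                  price_tol: float = 0.01) -> Optional[Dict]:
    """Hypothetical matcher: filter by price, then pick the closest name."""
    # Price acts as a hard filter: only items within price_tol survive.
    candidates = [p for p in catalog if abs(p["price"] - price) <= price_tol]
    if not candidates:
        return None

    def text_score(item: Dict) -> float:
        # Case-insensitive similarity ratio in [0, 1].
        return SequenceMatcher(None, ocr_text.lower(), item["name"].lower()).ratio()

    return max(candidates, key=text_score)

catalog = [
    {"name": "Whole Milk 1L", "price": 1.29},
    {"name": "Skim Milk 1L", "price": 1.29},
    {"name": "Orange Juice 1L", "price": 3.49},
]
print(match_product("WHOLE MILK 1 L", 1.29, catalog)["name"])  # Whole Milk 1L
```

Using price as a filter and text as the ranking signal keeps the two cues separate; a weighted score over both would be an equally plausible design.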
More reads - Link
Keep Thinking!!!