"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

June 28, 2021

CVPR Paper Reads - Large-scale Product Recognition

Paper #1 - 1st Place Solution to CVPR 2021 AliProducts Challenge: Large-scale Product Recognition

Key Lessons

  • The final solution ensembled 11 models built on three backbones: EfficientNet, EfficientNetV2, and NFNet.
  • Small models were trained for fewer epochs and large models for more epochs.

Data Augmentation

  • RandomCrop: 448×448
  • RandomRotation: ±30°
  • RandomHorizontalFlip: p=0.5
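
To make the three augmentations above concrete, here is a minimal pure-Python sketch that applies them to an image stored as a list of pixel rows. The function names, the order of the steps, the nearest-neighbour rotation, and the zero fill value are all assumptions made for illustration; the paper notes above only give the crop size, rotation range, and flip probability.

```python
import math
import random

def random_crop(img, size, rng):
    """Cut a size x size window at a random position (img is a list of rows)."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def random_rotation(img, max_deg, rng, fill=0):
    """Rotate about the centre by an angle drawn from [-max_deg, +max_deg]."""
    h, w = len(img), len(img[0])
    theta = math.radians(rng.uniform(-max_deg, max_deg))
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse-map each output pixel to its source (nearest neighbour).
            sx = round(cos_t * (x - cx) + sin_t * (y - cy) + cx)
            sy = round(-sin_t * (x - cx) + cos_t * (y - cy) + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out

def random_horizontal_flip(img, p, rng):
    """Mirror the image left-to-right with probability p."""
    return [row[::-1] for row in img] if rng.random() < p else img

def augment(img, crop_size=448, max_deg=30, flip_p=0.5, rng=random):
    """Apply crop, rotation, and flip in sequence (the order is an assumption)."""
    img = random_crop(img, crop_size, rng)
    img = random_rotation(img, max_deg, rng)
    return random_horizontal_flip(img, flip_p, rng)
```

In a real training pipeline, a library such as torchvision offers transforms with the same names (RandomCrop, RandomRotation, RandomHorizontalFlip), which would be the more practical choice than hand-written loops.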

Paper #2 - Solution for Large-scale Long-tailed Recognition with Noisy Labels

Key Lessons

  • The solution combines CNNs and a Transformer: ResNeSt, EfficientNetV2, and DeiT.
  • It ensembles three different network architectures initialized with ImageNet pre-trained weights: ResNeSt-101, DeiT-small, and EfficientNetV2-m.
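
One common way to ensemble several classifiers is to average their softmax probabilities and pick the class with the highest mean. The sketch below shows that idea; it is an assumption about how such an ensemble could work, not the combination rule from the paper, and the function names and the optional `weights` argument are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(per_model_logits, weights=None):
    """Average per-model probabilities and return (predicted class, averaged probs)."""
    n_models = len(per_model_logits)
    weights = weights or [1.0 / n_models] * n_models  # default: equal weights
    probs = [softmax(logits) for logits in per_model_logits]
    n_classes = len(probs[0])
    avg = [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg

# Three hypothetical models scoring the same image over three classes.
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [0.2, 2.0, 0.1]]
pred, avg_probs = ensemble_predict(logits)
```

Here the first model prefers class 0, but the other two outvote it, so the averaged prediction is class 1.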

Paper #3 - An Effective Ensemble Method for AliProducts Challenge: Large-scale Product Recognition

Key Lessons

  • The AliProducts dataset consists of more than 3M images of nearly 50K different products.
  • All networks are initialized with pre-trained weights on ImageNet and trained with cross entropy loss.
  • As for image augmentation, we use RandomCrop, RandomHorizontalFlip, as well as Normalization.
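
As a reminder of what the cross entropy loss mentioned above computes for a single image, here is a small sketch: the negative log of the softmax probability assigned to the true class. The function name and the example scores are made up for illustration.

```python
import math

def cross_entropy(logits, target):
    """Return -log(softmax(logits)[target]) using the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]

# Hypothetical scores for three product classes; class 1 scores highest.
scores = [0.5, 3.0, 0.2]
loss_if_correct = cross_entropy(scores, target=1)
loss_if_wrong = cross_entropy(scores, target=0)
```

The loss is small when the true class has the highest score and grows as the model puts more probability on other classes.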

Paper #4 - RETAIL VISION WORKSHOP 2021 - PRODUCT PRICING CHALLENGE (4TH PLACE SOLUTION)

Key Lessons

  • The first step detects the prices present on shelves, using bounding boxes for a single class called "pricing".
  • The second step detects and recognizes the text inside each pricing box. The Google Vision API was used for text detection and recognition.
  • Price Text Box Extraction: The text box with the largest area containing only numbers was chosen as the price box (or the integer part of the price).
  • The final steps are price text cleaning and price rounding off.
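
The post-OCR steps above can be sketched roughly as follows. The box format (a dict with `text`, `width`, and `height`), the cleaning rule, and rounding to two decimal places are all hypothetical choices for illustration; the paper notes do not spell out these details, and no real OCR API is called here.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

def pick_price_box(text_boxes):
    """Return the largest-area box whose text is digits only, or None."""
    numeric = [b for b in text_boxes if re.fullmatch(r"[0-9]+", b["text"].strip())]
    if not numeric:
        return None
    return max(numeric, key=lambda b: b["width"] * b["height"])

def clean_price_text(text):
    """Extract the first number, treating ',' as a decimal separator (assumption)."""
    match = re.search(r"[0-9]+(?:[.,][0-9]+)?", text)
    return match.group(0).replace(",", ".") if match else None

def round_price(text, places="0.01"):
    """Round a cleaned price string half-up to the given precision."""
    return Decimal(text).quantize(Decimal(places), rounding=ROUND_HALF_UP)

# Hypothetical OCR output for one detected "pricing" region.
boxes = [
    {"text": "SALE", "width": 50, "height": 20},
    {"text": "12", "width": 30, "height": 20},
    {"text": "99", "width": 10, "height": 8},
]
price_box = pick_price_box(boxes)
```

In this example the "SALE" box is the largest but is skipped because it is not numeric, so the box containing "12" is taken as the integer part of the price.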

Summary - As we can see, a mix of techniques comes into play: custom detection and OCR handle item price area detection, followed by parsing, cleaning, and product matching based on both text and price value. We could also do a similar match on images or key points.

More reads - Link

Keep Thinking!!!
