Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Retail Product Detection

December 30, 2022

Retail Product Detection

Ref - Post

Product region, brand logo region
Product textual data (title, brands)
The regions of interest in images are detected by a pre-trained teacher model
Following the trend of using free-form text, the CPG model is trained in a self-supervised fashion on 2.3M product entities synthesized from an e-commerce site
The bounding boxes for the product-noun-to-object task are generated by a pre-trained, general-domain modulated detection model
Visual-language understanding of logos, brand strings, product details for the query product entity and for all brand representative product entities

Text to image lookup and comparison
Similar embedding lookup and comparison
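
A minimal sketch of the embedding lookup and comparison step, assuming product entities are already encoded as fixed-length vectors and that cosine similarity is the comparison metric (the notes above do not name one). The catalog names and vectors are made-up toy values, and `nearest_entities` is a hypothetical helper, not part of any referenced model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def nearest_entities(query, catalog, top_k=2):
    """Rank catalog entries by similarity to the query embedding (hypothetical helper)."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy embeddings, assumed for illustration only.
catalog = {
    "brand_a_cola": [0.9, 0.1, 0.0],
    "brand_b_chips": [0.0, 0.8, 0.6],
    "brand_a_soda": [0.8, 0.2, 0.1],
}
query_embedding = [1.0, 0.0, 0.0]
results = nearest_entities(query_embedding, catalog)
for name, score in results:
    print(name, round(score, 3))
```

The same lookup would serve text-to-image comparison if the text and image encoders map into a shared embedding space, which is an assumption here.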

The crafted image caption is tokenized and encoded using a pre-trained text encoder, RoBERTa
Image and textual features are concatenated into a multimodal vector and fed to a joint transformer encoder with cross attention between image and textual features
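
A rough sketch of the fusion idea, assuming image and text features arrive as lists of same-width token vectors. It is a simplification: the two token lists are concatenated and passed through one attention step with no learned query, key, or value projections, so it only illustrates how image tokens can attend to text tokens (and vice versa) once they share a sequence. It is not the actual joint encoder, and `joint_attention` and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def joint_attention(image_tokens, text_tokens):
    """One scaled dot-product attention pass over the concatenated sequence.

    Assumption: queries, keys and values are the raw tokens themselves;
    a real transformer layer would apply learned projections first.
    """
    tokens = image_tokens + text_tokens  # concatenate along the sequence axis
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    output = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        # Weighted mix of every token, image and text alike.
        mixed = [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        output.append(mixed)
    return output

# Toy features, assumed for illustration: 2 image tokens and 3 text tokens of width 4.
image_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
text_tokens = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]]
fused = joint_attention(image_tokens, text_tokens)
print(len(fused), len(fused[0]))
```

After this step each image token carries some weight from the text tokens, which is the cross-modal mixing the joint encoder is meant to provide.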

Keep Exploring!!!
