"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 30, 2022

Retail Product Detection

Ref - Post 

  • Product region, brand logo region
  • Product textual data (title, brands)
  • The regions of interest in images were detected by a pretrained teacher model
  • Following the trend of using free-form text, we train the CPG model with 2.3M product entities synthesized from an e-commerce site in a self-supervised fashion
  • The bounding boxes for the product-noun-to-object task are generated by a pre-trained general-domain modulated detection model
  • Visual-language understanding of logos, brand strings, product details for the query product entity and for all brand representative product entities
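
To make the notes above concrete, here is a minimal sketch of how one product entity might be represented, pairing the detected regions with the textual data. The class name `ProductEntity`, its fields, the box layout, and the `craft_caption` helper are all hypothetical and not taken from the referenced post; the way the caption is assembled is purely an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class ProductEntity:
    """Hypothetical record for one product: text fields plus two regions."""
    title: str
    brand: str
    product_box: Box  # product region, e.g. proposed by a teacher model
    logo_box: Box     # brand logo region


def box_area(box: Box) -> float:
    """Area of a box; degenerate boxes are clamped to zero."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def craft_caption(entity: ProductEntity) -> str:
    """Assumed caption format: brand followed by title."""
    return f"{entity.brand} {entity.title}".strip()


entity = ProductEntity(
    title="Crunchy Oat Cereal 500g",
    brand="ExampleBrand",
    product_box=(10.0, 20.0, 210.0, 420.0),
    logo_box=(40.0, 60.0, 120.0, 100.0),
)
caption = craft_caption(entity)
logo_share = box_area(entity.logo_box) / box_area(entity.product_box)
```

Keeping the regions and the text in one record makes it easy to hand both to a visual-language model later on.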


  • Text to image lookup and comparison
  • Similar embedding lookup and comparison
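
The embedding lookup and comparison step can be sketched as a nearest-neighbour search by cosine similarity. This is only an illustration in plain Python: the catalog names, the three-dimensional toy vectors, and the `nearest_entities` helper are assumptions, not the embeddings or the retrieval method used in the referenced post.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def nearest_entities(query, catalog, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings standing in for brand-representative product entities.
catalog = {
    "brand_a_cereal": [0.9, 0.1, 0.0],
    "brand_b_soda": [0.0, 0.8, 0.6],
    "brand_a_granola": [0.7, 0.3, 0.1],
}
query_embedding = [1.0, 0.2, 0.0]  # toy embedding of the query product

matches = nearest_entities(query_embedding, catalog, top_k=2)
```

The same function covers text-to-image lookup when the query vector comes from a text encoder and the catalog vectors come from an image encoder trained into a shared space.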


  • The crafted image caption is tokenized and encoded with a pre-trained text encoder (RoBERTa)
  • Image and textual features are concatenated into a multimodal vector and fed to a joint transformer encoder with cross-attention between the image and textual features
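
The fusion step above can be sketched with a single attention head in plain Python. This is a toy under stated assumptions: there are no learned projections, the four-dimensional vectors are made-up stand-ins for RoBERTa token features and image region features, and `attend` is a hypothetical helper, not the encoder from the referenced post.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context):
    """Scaled dot-product attention without learned weights.

    Each query is replaced by a weighted average of the context vectors,
    with weights given by softmax(q . c / sqrt(dim)).
    """
    dim = len(context[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        outputs.append(
            [sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)]
        )
    return outputs


# Toy features: two text tokens and three image regions.
text_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
image_regions = [
    [4.0, 0.0, 0.0, 0.0],
    [0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0],
]

# Cross-attention: text tokens query the image regions.
text_to_image = attend(text_tokens, image_regions)

# Joint view: concatenate both modalities and let every position attend to all.
joint = image_regions + text_tokens
fused = attend(joint, joint)
```

In this toy, the first text token puts most of its attention on the first image region because their vectors point the same way; a real encoder would learn query, key, and value projections and stack several such layers.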

Keep Exploring!!!
