"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

May 04, 2023

DINOv2

  • Automatic pipeline to build a dedicated, diverse, and curated image dataset instead of relying on uncurated data
  • In contrast to text-guided pretraining, where textual supervision guides the training of the features, DINOv2's pretraining uses no text at all
  • PCA computed between the patches of images from the same column highlights matching parts across images (see the sketch below)
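
A minimal sketch of that PCA visualization, assuming the torch.hub entry point and the 'x_norm_patchtokens' output key from the facebookresearch/dinov2 repo; the random image batch is a placeholder for real images from one column:

    import torch
    from sklearn.decomposition import PCA

    # Load a DINOv2 backbone from torch.hub (ViT-S/14 for speed).
    model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
    model.eval()

    # Stand-in batch of images from the same "column"; H and W must be
    # multiples of the 14-pixel patch size.
    imgs = torch.randn(4, 3, 224, 224)

    with torch.no_grad():
        feats = model.forward_features(imgs)['x_norm_patchtokens']  # (4, 256, 384)

    B, N, D = feats.shape
    # Fit one PCA jointly over the patches of all images in the column,
    # then keep the first 3 components as RGB channels for visualization.
    pca = PCA(n_components=3)
    rgb = pca.fit_transform(feats.reshape(B * N, D).numpy())
    rgb = rgb.reshape(B, 16, 16, 3)  # 224 / 14 = 16 patches per side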

  • Features are learned from images alone
  • Self-supervised learning has the potential to learn all-purpose visual features if pretrained on a large quantity of curated data
  • Automatic pipeline to filter and rebalance datasets from an extensive collection of uncurated images
  • Data similarities are used instead of external metadata, so no manual annotation is required

Other Approaches

  • Intra-image pretext tasks: a signal extracted from the image is predicted from the rest of the image (e.g. masked patches)
  • Discriminative signals between images or groups of images are used to learn features (see the sketch below)
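
As a toy illustration of the discriminative family, an InfoNCE-style contrastive loss; this sketches the general idea only, not DINOv2's actual objective (the DINO family uses self-distillation between student and teacher networks):

    import torch
    import torch.nn.functional as F

    def info_nce(z1, z2, temperature=0.1):
        # z1, z2: (B, D) features of two augmented views of the same B
        # images; matching rows are positives, all other rows negatives.
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature       # (B, B) cosine similarities
        targets = torch.arange(z1.size(0))       # positives on the diagonal
        return F.cross_entropy(logits, targets)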

Data Curation Pipeline

  • Apply the copy detection pipeline of Pizzi et al. (2022) to the uncurated data to remove near-duplicate images
  • Compute an image embedding with a self-supervised ViT-H/16 network pretrained on ImageNet-22k, and use cosine similarity as the distance measure between images
  • Run k-means clustering on the uncurated data
  • Given a query dataset for retrieval: if it is large enough, retrieve the N (typically 4) nearest neighbors of each query image; otherwise, sample from the clusters the queries fall into (both paths are sketched below)
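
A minimal sketch of the similarity-based selection, with random arrays standing in for real embeddings; faiss is my choice here (any nearest-neighbor library works), and the sizes, dimensionality, and cluster count are placeholders:

    import numpy as np
    import faiss

    # Placeholder embeddings; in the paper they come from a self-supervised
    # ViT-H/16 pretrained on ImageNet-22k. Rows are L2-normalized so inner
    # product equals cosine similarity.
    curated = np.random.randn(10_000, 768).astype('float32')
    uncurated = np.random.randn(100_000, 768).astype('float32')
    curated /= np.linalg.norm(curated, axis=1, keepdims=True)
    uncurated /= np.linalg.norm(uncurated, axis=1, keepdims=True)

    # Index the uncurated pool; inner-product search == cosine similarity.
    index = faiss.IndexFlatIP(uncurated.shape[1])
    index.add(uncurated)

    # Retrieval path (large query set): keep the N=4 nearest uncurated
    # neighbors of each curated query image.
    sims, ids = index.search(curated, 4)

    # Clustering path (small query set): k-means over the uncurated pool,
    # then sample images from the clusters the queries fall into.
    kmeans = faiss.Kmeans(d=uncurated.shape[1], k=1000, niter=20)
    kmeans.train(uncurated)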

Summary

  • DINOv2, a new series of image encoders pretrained on large curated data with no supervision
  • Visual features are compatible with classifiers as simple as linear layers, meaning the underlying information is readily available (see the linear-probe sketch below)
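
A minimal linear-probe sketch, assuming the torch.hub ViT-S/14 entry point (whose forward pass returns a 384-dim CLS embedding); the class count, optimizer, and training loop are placeholders:

    import torch
    import torch.nn as nn

    # Freeze the backbone; train only a linear head on its CLS embedding.
    backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad = False

    head = nn.Linear(384, 1000)   # 384 = ViT-S/14 embedding dim; 1000 classes
    opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    def train_step(images, labels):
        with torch.no_grad():
            feats = backbone(images)          # (B, 384) CLS features
        loss = loss_fn(head(feats), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()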

Keep Exploring!!!
