- Automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data
- Most recent approaches focus on text-guided pretraining, i.e., textual supervision is used to guide the training of the features
- Figure 1 visualization: a PCA is computed between the patches of the images from the same column, and its first components are shown (see the sketch after this list)
- Features are learned from images alone
- Self-supervised learning has the potential to learn all-purpose visual features if pretrained on a large quantity of curated data
- Automatic pipeline to filter and rebalance datasets from an extensive collection of uncurated images
- Data similarities are used instead of external metadata, so no manual annotation is required
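
As a rough illustration of the patch-PCA visualization mentioned above, here is a minimal Python sketch. The `torch.hub` entry point and the `forward_features()` output key follow the public facebookresearch/dinov2 repository; the min-max scaling into RGB is my assumption, not the paper's exact plotting code.

```python
import numpy as np
import torch
from sklearn.decomposition import PCA

# Frozen backbone; hub name and forward_features() keys follow the public dinov2 repo.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

def pca_patch_maps(images, n_components=3):
    """images: (B, 3, H, W), square, H and W divisible by the patch size (14)."""
    with torch.no_grad():
        out = model.forward_features(images)
    tokens = out["x_norm_patchtokens"]            # (B, N_patches, D)
    B, N, D = tokens.shape
    flat = tokens.reshape(B * N, D).cpu().numpy()

    # One PCA fitted jointly over the patches of all images in the batch,
    # so the components are aligned across images (the "same column" idea).
    comps = PCA(n_components=n_components).fit_transform(flat)

    # Min-max scale each component to [0, 1] so the first three can be shown as RGB.
    comps = (comps - comps.min(axis=0)) / (np.ptp(comps, axis=0) + 1e-8)
    side = int(N ** 0.5)                          # square patch grid
    return comps.reshape(B, side, side, n_components)
```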
Other Approaches
- Intra-image methods: extract a signal from one part of the image and predict it from the rest of the image
- Discriminative methods: use discriminative signals between images or groups of images to learn features
- Apply the copy-detection pipeline of Pizzi et al. (2022) to the uncurated data to remove near-duplicate images
- Compute an image embedding using a self-supervised ViT-H/16 network pretrained on ImageNet-22k, and use cosine similarity as a distance measure between images (see the first sketch after this list)
- Run k-means clustering on the uncurated data
- Given a query dataset for retrieval: if it is large enough, retrieve N (typically 4) nearest neighbors for each query image; if it is small, sample images from the cluster corresponding to each query image (see the second sketch after this list)
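
A minimal sketch of the embedding and cosine-similarity step. Using faiss is my assumption (any nearest-neighbor library works), the 0.95 threshold is illustrative, and this simple nearest-neighbor check only stands in for, rather than reproduces, the Pizzi et al. (2022) copy-detection pipeline.

```python
import numpy as np
import faiss

def build_index(embeddings: np.ndarray) -> faiss.Index:
    """embeddings: (N, D) float32, e.g. from a ViT-H/16 pretrained on ImageNet-22k."""
    faiss.normalize_L2(embeddings)                # cosine similarity == inner product on unit vectors
    index = faiss.IndexFlatIP(embeddings.shape[1])
    index.add(embeddings)
    return index

def near_duplicates(index, embeddings, threshold=0.95):
    """Flag each image whose nearest other image exceeds the (assumed) similarity threshold."""
    sims, ids = index.search(embeddings, 2)       # column 0 is the image itself
    return [(i, int(ids[i, 1]))
            for i in range(len(embeddings))
            if ids[i, 1] != i and sims[i, 1] > threshold]
```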
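And a sketch of the clustering/retrieval step under the same assumptions: spherical k-means over the uncurated embeddings, k-NN retrieval when the query set is large, cluster sampling when it is small. The cluster count, neighbor count, and per-cluster sample size here are toy values, not the paper's settings.

```python
import numpy as np
import faiss

def curate(uncurated: np.ndarray, queries: np.ndarray,
           n_clusters=1000, n_neighbors=4, per_cluster=10, large_query_set=True):
    """Both inputs are (N, D) float32 embeddings; returns indices into `uncurated`."""
    d = uncurated.shape[1]
    faiss.normalize_L2(uncurated)
    faiss.normalize_L2(queries)

    if large_query_set:
        # Large query set: retrieve N (typically 4) nearest uncurated images per query.
        index = faiss.IndexFlatIP(d)
        index.add(uncurated)
        _, ids = index.search(queries, n_neighbors)
        return np.unique(ids.ravel())

    # Small query set: cluster the uncurated pool, then sample from the
    # cluster corresponding to each query image.
    km = faiss.Kmeans(d, n_clusters, niter=20, spherical=True)
    km.train(uncurated)
    _, assign = km.index.search(uncurated, 1)     # cluster id per uncurated image
    _, q_assign = km.index.search(queries, 1)     # cluster id per query image
    rng = np.random.default_rng(0)
    selected = []
    for c in np.unique(q_assign):
        members = np.nonzero(assign.ravel() == c)[0]
        take = min(len(members), per_cluster)
        selected.append(rng.choice(members, size=take, replace=False))
    return np.concatenate(selected)
```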
Summary
- DINOv2, a new series of image encoders pretrained on large curated data with no supervision
- Visual features are compatible with classifiers as simple as linear layers, meaning the underlying information is readily available (see the linear-probe sketch below)
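
To make the linear-layer point concrete, here is a minimal linear-probe sketch on frozen features. The hub model and the 384-dim CLS output match the public dinov2 ViT-S/14; the training loop itself is a generic assumption, not the paper's exact evaluation protocol.

```python
import torch
import torch.nn as nn

# Frozen backbone; hub name and the 384-dim CLS output match the public dinov2 ViT-S/14.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False                       # the features stay frozen

head = nn.Linear(384, 1000)                       # ViT-S/14 embedding dim -> 1000 classes
opt = torch.optim.SGD(head.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    """images: (B, 3, H, W) normalized batch; labels: (B,) class indices."""
    with torch.no_grad():
        feats = backbone(images)                  # (B, 384) CLS embedding
    logits = head(feats)
    loss = loss_fn(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```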
Keep Exploring!!!