Dataset - LAION-5B (about 5.85 billion image-text pairs)
Dataset sources - Images scraped from sites such as Pinterest and DeviantArt, e-commerce services like Shopify, cloud services like Amazon Web Services, YouTube thumbnails, and news sites.
CNN vs Diffusion
- CNN - Feature extraction, error calculation, weight update
- Diffusion - Noise is added in the forward step, and the model learns to remove it (denoising) in the reverse step
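To make the forward/reverse idea above concrete, here is a minimal sketch in plain Python. It assumes a linear noise schedule (1000 steps, beta from 1e-4 to 0.02, the values commonly used for DDPM-style models); the function names `make_alpha_bars`, `add_noise`, and `remove_noise` are hypothetical and not from any library. In a real model a trained network predicts the noise; here the true noise is passed in only to show that a good noise prediction is enough to recover the clean input.

```python
import math
import random


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward step: jump from the clean input x0 directly to the noisy x_t."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def remove_noise(xt, eps_pred, t, alpha_bars):
    """Reverse direction: estimate x0 from x_t given a prediction of the noise."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_pred)]


rng = random.Random(0)
alpha_bars = make_alpha_bars()
x0 = [0.5, -0.25, 1.0, 0.0]            # stand-in for image pixels
xt, eps = add_noise(x0, 500, alpha_bars, rng)
# A trained network would supply eps_pred; the true noise is used here instead.
x0_hat = remove_noise(xt, eps, 500, alpha_bars)
```

The later the step `t`, the smaller `alpha_bars[t]` becomes, so `xt` is closer to pure noise and the denoising job gets harder.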
Key Steps in Implementation
- Method of learning to generate new images - Forward/reverse diffusion
- Way to link text and images - Text-image representation model (CLIP), which represents words and images as vectors in a shared space
- Way to compress images while retaining features - Autoencoder, which imposes a bottleneck in the network that forces a compressed knowledge representation of the original input
- Priors built into the algorithm (diffusion for images) - U-Net architecture with 'attention'
- ControlNet - Controls diffusion models by adding extra conditions; it keeps a "locked" copy of the original weights and trains a separate "trainable" copy on the new condition
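The "locked" and "trainable" copies in the ControlNet item can be sketched with a toy example. This is an illustration under stated assumptions, not ControlNet's actual code: `Scale` is a hypothetical stand-in for a network block, and `ControlledBlock` is a hypothetical name. The zero-initialised connections follow the idea described in the ControlNet paper: at the start of training the extra branch contributes nothing, so the output matches the original model exactly.

```python
import copy


class Scale:
    """Toy stand-in for a network block: multiplies each element by one weight."""

    def __init__(self, weight):
        self.weight = weight

    def __call__(self, xs):
        return [self.weight * x for x in xs]


class ControlledBlock:
    def __init__(self, pretrained):
        self.locked = pretrained                    # frozen original weights
        self.trainable = copy.deepcopy(pretrained)  # copy that learns the condition
        self.zero_in = Scale(0.0)    # zero-initialised path for the condition
        self.zero_out = Scale(0.0)   # zero-initialised path back to the output

    def __call__(self, x, condition):
        base = self.locked(x)
        injected = [a + b for a, b in zip(x, self.zero_in(condition))]
        control = self.zero_out(self.trainable(injected))
        return [b + c for b, c in zip(base, control)]


block = ControlledBlock(Scale(2.0))
x = [1.0, 2.0, 3.0]
edges = [0.5, 0.5, 0.5]               # stand-in for an extra condition, e.g. an edge map
out_before = block(x, edges)          # identical to the locked block's output

# Pretend training has moved the zero-initialised weights away from zero.
block.zero_in.weight = 1.0
block.zero_out.weight = 0.1
out_after = block(x, edges)           # now the condition influences the output
```

Because only the copy and the connecting weights change, the locked weights keep what the original model learned from its large training set.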