"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your gut and don't follow the herd" ; "Validate direction, not destination" ;

December 25, 2023

Stable Diffusion Internals

Stable Diffusion Key Steps

  • Method of learning to generate new content - forward/reverse diffusion
  • Way to link text and images - a text-image representation model (CLIP)
  • Way to compress images - an autoencoder
  • Way to add good inductive biases - U-Net architecture + attention
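The forward half of the first ingredient has a closed form: any noising step x_t can be sampled directly from x_0. A minimal numpy sketch (toy 8x8 array standing in for an image; the linear beta schedule follows the DDPM paper):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # linear schedule over 1000 steps
x0 = rng.standard_normal((8, 8))        # toy stand-in for an image
xt, eps = forward_diffuse(x0, t=999, betas=betas, rng=rng)
# At the last step alpha_bar is ~0, so x_t is nearly pure Gaussian noise.
```

The model is then trained to predict `eps` from `xt` and `t`; the reverse process undoes the noising one step at a time.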

Build Stable Diffusion “from Scratch”

  • Principle of diffusion models (sampling, learning)
  • Diffusion for images – U-Net architecture
  • Understanding prompts – words as vectors, CLIP
  • Letting words modulate diffusion – conditional diffusion, cross-attention
  • Diffusion in latent space – AutoencoderKL
  • Training on a massive dataset – LAION-5B

GAN

  • One-shot generation. Fast.
  • Harder to control in a single pass.
  • Adversarial min-max objective; training can suffer mode collapse.

Diffusion

  • Multi-iteration generation. Slow.
  • Easier to control during generation. 
  • Simple objective, no adversary in training. 
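"Multi-iteration generation" means sampling runs the reverse process once per timestep. A toy numpy sketch of DDPM ancestral sampling, with a zero predictor standing in for the trained U-Net (so the output is meaningless, but the loop structure, and why it is slow, is visible):

```python
import numpy as np

def ddpm_step(xt, eps_pred, t, betas, rng):
    """One reverse (denoising) step of DDPM ancestral sampling,
    given eps_pred = the model's noise prediction at step t."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_pred) \
           / np.sqrt(alphas[t])
    if t == 0:
        return mean                 # last step: no noise added
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
x = rng.standard_normal((8, 8))     # start from pure noise
for t in reversed(range(1000)):     # one network call per step -> slow
    eps_pred = np.zeros_like(x)     # stand-in for a trained U-Net
    x = ddpm_step(x, eps_pred, t, betas, rng)
```

Contrast with a GAN, where the generator maps noise to an image in a single forward pass.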

Key Ingredients of UNet

  • Convolution operation
  • Saves parameters (weight sharing); spatially invariant

Down/Up sampling

  • Multiscale / hierarchy
  • Learns modulation at multiple scales and abstraction levels.

Skip connection 

  • No bottleneck
  • Routes features of the same scale directly.
  • Cf. an autoencoder, which has a bottleneck.
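The skip-connection idea can be shown with toy numpy pooling ops (no learned weights, just the routing): features saved before downsampling are concatenated back at the same scale on the way up, so fine detail bypasses the coarse path.

```python
import numpy as np

def downsample(x):
    """2x average-pool: (C, H, W) -> (C, H/2, W/2)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):
    """2x nearest-neighbour upsample: (C, H, W) -> (C, 2H, 2W)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

x = np.random.default_rng(0).standard_normal((4, 16, 16))
skip = x                                  # saved before downsampling
h = downsample(x)                         # coarse path loses detail
h = upsample(h)                           # back to the original scale
out = np.concatenate([h, skip], axis=0)   # skip connection: concat channels
# out has 8 channels at 16x16: half coarse context, half routed detail
```

In a real U-Net the concatenated tensor then goes through learned convolutions; a plain autoencoder has no such bypass, so everything must squeeze through the bottleneck.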

Autoencoder 

  • An autoencoder imposes a bottleneck in the network, forcing a compressed knowledge representation of the original input.
  • With non-linear activation functions and multiple layers, an autoencoder can learn non-linear transformations.
  • An ideal autoencoder learns descriptive attributes of faces (skin color, whether or not the person is wearing glasses, etc.) in order to describe an observation in some compressed representation.
  • For variational autoencoders, the encoder is sometimes referred to as the recognition model, and the decoder as the generative model.
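The bottleneck can be illustrated with a toy linear, tied-weight sketch (hypothetical weights chosen orthonormal for the demo; a real autoencoder learns its weights by backprop and adds non-linear activations, which is what lets it beat PCA):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 8                       # input dim, bottleneck dim (k << d)

# Orthonormal encoder matrix: d -> k. Stand-in for learned weights.
W = np.linalg.qr(rng.standard_normal((d, k)))[0]

x = W @ rng.standard_normal(k)     # a sample lying on a k-dim subspace
z = W.T @ x                        # encode: only k numbers survive
x_hat = W @ z                      # decode (tied weights: decoder = W)
# x_hat reconstructs x exactly because x lies in the modeled subspace;
# inputs outside it would lose exactly the part the bottleneck discards.
```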

Applications of Autoencoders

  • Image coloring, feature variation, dimensionality reduction, image denoising, watermark removal

PCA vs Autoencoder

  • PCA attempts to discover a lower-dimensional hyperplane that describes the original data.
  • Autoencoders can learn non-linear manifolds (a manifold, in simple terms, is a continuous, non-intersecting surface).
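The "hyperplane" PCA finds is easy to see in numpy via the SVD on a toy 2-D cloud that is stretched along one axis (here the hyperplane is just a line):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: large variance along axis 0, tiny variance along axis 1.
X = rng.standard_normal((100, 2)) @ np.array([[3.0, 0.0], [0.0, 0.1]])

Xc = X - X.mean(axis=0)                 # center, as PCA requires
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]                             # direction of the best-fit line

Z = Xc @ pc1[:, None]                   # project to a 1-D code
X_hat = Z @ pc1[None, :]                # linear reconstruction
err = np.mean((Xc - X_hat) ** 2)        # small: this data is nearly linear
```

If the data instead lay on a curved manifold (say a spiral), no single hyperplane would fit it well, which is where a non-linear autoencoder has the advantage.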

ControlNet 

  • ControlNet is a neural network structure that controls diffusion models by adding extra conditions.
  • It copies the weights of the network's blocks into a "locked" copy and a "trainable" copy.
  • The "trainable" copy learns your condition; the "locked" copy preserves the original model.
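ControlNet joins the trainable copy back to the locked branch through zero-initialized layers ("zero convolutions"), so before any training the extra branch contributes nothing and the original model's behavior is preserved exactly. A toy numpy sketch with matrices standing in for network blocks:

```python
import numpy as np

rng = np.random.default_rng(0)

W_locked = rng.standard_normal((4, 4))   # frozen pretrained block
W_train = W_locked.copy()                # trainable copy, same init
W_zero = np.zeros((4, 4))                # "zero convolution" link

def controlnet_block(x, cond):
    """Locked branch plus a conditioned trainable branch,
    joined through a zero-initialized layer."""
    y = W_locked @ x                     # original model, untouched
    return y + W_zero @ (W_train @ (x + cond))

x = rng.standard_normal(4)
cond = rng.standard_normal(4)            # extra condition (e.g. edge map)
# At initialization the zero layer makes the branch a no-op, so the
# controlled model reproduces the locked model exactly; training then
# grows W_zero away from zero to let the condition modulate the output.
```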

Keep Exploring!!!


