Stable Diffusion Key Steps
- A method for learning to generate new data - Forward/reverse diffusion
- A way to link text and images - Text-image representation model
- A way to compress images - Autoencoder
- A way to add in good inductive biases - U-Net architecture + attention
Build Stable Diffusion “from Scratch”
- Principle of Diffusion models (sampling, learning)
- Diffusion for Images – UNet architecture
- Understanding prompts – Word as vectors, CLIP
- Let words modulate diffusion – Conditional Diffusion, Cross Attention
- Diffusion in latent space – AutoEncoderKL
- Training on a massive dataset – LAION-5B (5 billion image-text pairs)
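The forward-diffusion principle above has a convenient closed form: a sample at any timestep t can be drawn directly from the clean image, without iterating. A minimal numpy sketch, assuming the linear beta schedule used in DDPM (the schedule values and function name here are illustrative, not from the Stable Diffusion codebase):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]      # cumulative signal retention
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return xt, noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)      # assumed DDPM-style schedule
x0 = rng.standard_normal((8, 8))           # stand-in "image"
xt, noise = forward_diffuse(x0, t=999, betas=betas, rng=rng)
# at the last step, alpha_bar is near zero, so x_t is almost pure noise
```

The reverse process learns to undo these steps one at a time, which is why generation takes many iterations.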
GAN
- One-shot generation. Fast.
- Harder to control in a single pass.
- Adversarial min-max objective. Training can collapse.
Diffusion
- Multi-iteration generation. Slow.
- Easier to control during generation.
- Simple objective, no adversary during training.
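The "simple objective" in the comparison above is just a mean-squared error: the network predicts the noise that was added at a random timestep. A sketch with a placeholder model standing in for the UNet (the function names are illustrative):

```python
import numpy as np

def diffusion_loss(eps_model, x0, betas, rng):
    """L_simple from DDPM: MSE between true and predicted noise."""
    t = rng.integers(len(betas))                 # random timestep
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    pred = eps_model(xt, t)                      # network predicts the noise
    return np.mean((pred - noise) ** 2)

rng = np.random.default_rng(1)
betas = np.linspace(1e-4, 0.02, 1000)
x0 = rng.standard_normal((4, 4))
# a dummy model that always predicts zero noise; a real one is a UNet
loss = diffusion_loss(lambda xt, t: np.zeros_like(xt), x0, betas, rng)
```

Contrast this with a GAN, where the generator's loss depends on a second, adversarially trained discriminator network.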
Key Ingredients of UNet
- Convolution operation
- Shares parameters; translation invariant
Down/Up sampling
- Multiscale / Hierarchy
- Learns modulation at multiple scales and abstraction levels.
Skip connection
- No bottleneck
- Routes features of the same scale directly.
- Cf. an autoencoder, which has a bottleneck.
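The three ingredients above can be illustrated without a deep-learning framework. A toy numpy sketch (illustrative only) using 2x average-pool downsampling, nearest-neighbour upsampling, and a concatenation skip that routes same-scale features around the coarse path:

```python
import numpy as np

def downsample(x):
    """2x average pooling: (H, W, C) -> (H/2, W/2, C)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.random.default_rng(2).standard_normal((16, 16, 8))
skip = x                                  # encoder feature saved for later
h = downsample(x)                         # coarser scale: (8, 8, 8)
h = upsample(h)                           # back to (16, 16, 8)
out = np.concatenate([h, skip], axis=-1)  # skip connection: no bottleneck
# out is (16, 16, 16): the decoder sees both coarse context and fine detail
```

In a real UNet, convolutions sit between these steps, but the routing of information is exactly this shape.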
Autoencoder
- An autoencoder imposes a bottleneck in the network, forcing a compressed knowledge representation of the original input
- An autoencoder can learn non-linear transformations with a non-linear activation function and multiple layers
- An ideal autoencoder will learn descriptive attributes of faces such as skin color, whether or not the person is wearing glasses, etc. in an attempt to describe an observation in some compressed representation.
- For variational autoencoders, the encoder model is sometimes referred to as the recognition model whereas the decoder model is sometimes referred to as the generative model
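The bottleneck idea is easy to see in code. A minimal sketch of an (untrained, randomly initialised) nonlinear autoencoder forward pass, with illustrative sizes: a 64-dimensional input squeezed through an 8-dimensional code:

```python
import numpy as np

rng = np.random.default_rng(3)
W_enc = rng.standard_normal((64, 8)) * 0.1   # encoder weights
W_dec = rng.standard_normal((8, 64)) * 0.1   # decoder weights

def encode(x):
    # tanh is the non-linear activation that lets the autoencoder
    # learn non-linear transformations
    return np.tanh(x @ W_enc)

def decode(z):
    return z @ W_dec

x = rng.standard_normal(64)
z = encode(x)          # compressed representation: 8 numbers
x_hat = decode(z)      # reconstruction back in input space
```

Training would minimise the reconstruction error between `x` and `x_hat`, forcing the 8-dimensional code to capture the most descriptive attributes of the data.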
Applications of Autoencoders
- Image colorization, feature variation, dimensionality reduction, image denoising, watermark removal
PCA vs Autoencoder
- PCA attempts to discover a lower-dimensional hyperplane that describes the original data
- Autoencoders are capable of learning nonlinear manifolds (a manifold is defined in simple terms as a continuous, non-intersecting surface)
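The hyperplane claim for PCA can be checked in a few lines: if the data truly lies on a linear subspace, PCA recovers it exactly, whereas data on a curved manifold would need an autoencoder. A numpy sketch (synthetic data, illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
# 100 points that lie exactly on a 2-D hyperplane inside 10-D space
X = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 10))

Xc = X - X.mean(axis=0)                     # centre the data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
# project onto the top-k principal directions and reconstruct
X_rec = Xc @ Vt[:k].T @ Vt[:k] + X.mean(axis=0)
err = np.max(np.abs(X_rec - X))             # ~0: PCA found the hyperplane
```

For data on a nonlinear manifold (e.g. a spiral), the same rank-2 linear reconstruction would have large error; that gap is what the autoencoder's nonlinearities buy.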
ControlNet
- ControlNet is a neural network structure to control diffusion models by adding extra conditions.
- It copies the weights of neural network blocks into a "locked" copy and a "trainable" copy.
- The "trainable" copy learns your condition; the "locked" copy preserves the original model.
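The locked/trainable wiring can be sketched conceptually in numpy. In ControlNet the trainable branch feeds back through zero-initialised ("zero convolution") layers, so at initialisation the combined model behaves exactly like the original locked model; the names below are illustrative, not the actual implementation:

```python
import numpy as np

rng = np.random.default_rng(5)
W_locked = rng.standard_normal((4, 4))   # frozen pretrained weights
W_trainable = W_locked.copy()            # trainable clone of the block
W_zero = np.zeros((4, 4))                # zero-initialised output layer

def controlled_block(x, cond):
    locked_out = x @ W_locked                  # preserved pretrained path
    trainable_out = (x + cond) @ W_trainable   # branch that sees the condition
    return locked_out + trainable_out @ W_zero # contributes zero at init

x = rng.standard_normal(4)
cond = rng.standard_normal(4)
# at initialisation, adding the control branch changes nothing:
assert np.allclose(controlled_block(x, cond), x @ W_locked)
```

As `W_zero` is trained away from zero, the condition gradually steers the output, without ever risking the pretrained weights.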
Keep Exploring!!!