"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;
Showing posts with label Stable Diffusion. Show all posts

April 08, 2024

Anyscale Endpoints discussion

Step #1 - Anyscale signup



Step #2 - Notebook for Deploying Diffusion models



Step #3 - Deploying Service Command



Step #4 - Service Deployment


Code Example - 


Keep Exploring!!!

January 11, 2024

Comfy Tool Notes

Summary from Link

Key Notes

  • Model files - Civitai, Hugging Face
  • Three components - CLIP, main model, VAE
  • CheckpointLoader - outputs MODEL, CLIP, VAE
  • CLIP model - encodes the text for the main model, as a positive and a negative prompt
  • The encoded positive and negative prompts are sent to the MODEL at each step and used to guide denoising
  • VAE - translates the image from latent space to pixel space
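
How the two encoded prompts guide each denoising step can be sketched with classifier-free guidance, the usual way samplers combine them. This is a minimal sketch, not ComfyUI's code: the function name `guide` and the toy numbers are assumptions made for illustration, and the two lists stand in for the model's noise predictions under the negative and the positive prompt.

```python
def guide(noise_negative, noise_positive, cfg_scale):
    """Blend two noise predictions element-wise (classifier-free guidance).

    The result starts at the negative-prompt prediction and moves toward
    (and, for cfg_scale > 1, past) the positive-prompt prediction.
    """
    return [
        neg + cfg_scale * (pos - neg)
        for neg, pos in zip(noise_negative, noise_positive)
    ]


# Toy noise predictions for a 4-value latent (made-up numbers).
eps_negative = [0.10, -0.20, 0.05, 0.00]
eps_positive = [0.30, -0.10, 0.05, -0.40]

guided = guide(eps_negative, eps_positive, cfg_scale=7.5)
print(guided)
```

With `cfg_scale=1` the result is just the positive-prompt prediction; larger values push harder away from whatever the negative prompt describes.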

Inpaint Examples

Sampler name - uni_pc_bh2

  • AutocodePro
  • Finetuned Stable Diffusion for Anime
  • AlphaCTR
  • Low-Rank Adaptation (LoRA) models are small add-on weight files that introduce minor, yet impactful modifications to a standard Stable Diffusion checkpoint.
  • ControlNet/T2I adapters need the image passed to them to be in a specific format, such as a depth map.
  • Stable Zero123 is a diffusion model that, given an image of an object on a simple background, can generate images of that object from different angles.
  • SDXL Turbo is an SDXL model that can generate consistent images in a single step.

Nodes Explanation

  • CLIP model: to convert text into a format the Unet can understand
  • Unet: to perform the "diffusion" process, the step-by-step processing of images that we call generation
  • VAE: to decode the image from latent space into pixel space (also used to encode a regular image from pixel space to latent space when we are doing img2img)
  • KSampler: the actual "generation" part, which is why the KSampler node takes the most time to run when you queue a prompt
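
The way these nodes connect can be written down as a graph in the style of ComfyUI's API-format workflow JSON. This is a sketch from memory, so treat the node class names and input field names as assumptions to check against your installed version; the checkpoint file name and the prompts are placeholders. The sampler name is the one noted above.

```python
import json

# Hypothetical file name; use one that exists in ComfyUI/models/checkpoints.
CHECKPOINT = "sd_xl_base_1.0.safetensors"

# Each key is a node id. A link is written as [source_node_id, output_index].
# The checkpoint loader's outputs are assumed to be 0 = MODEL, 1 = CLIP, 2 = VAE.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": CHECKPOINT}},
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"text": "a lighthouse at dusk", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt
          "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",  # the Unet runs here, step by step
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 0, "steps": 20, "cfg": 7.0,
                     "sampler_name": "uni_pc_bh2", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",  # latent space -> pixel space
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
}


def links(graph):
    """Yield (source_node_id, output_index) for every input that is a link."""
    for node in graph.values():
        for value in node["inputs"].values():
            if isinstance(value, list):
                yield tuple(value)


# Sanity check: every link points at a node that exists in the graph.
dangling = [src for src, _ in links(workflow) if src not in workflow]
print("dangling links:", dangling)
print(json.dumps(workflow["5"], indent=2))
```

A real workflow would also end in an image-saving node; it is left out here to keep the graph to the four roles listed above.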

Checkpoints

  • Place checkpoints in the folder ComfyUI/models/checkpoints:
  • SDXL 1.0 base checkpoint, SDXL 1.0 refiner checkpoint
  • VAE - Place VAEs in the folder ComfyUI/models/vae
  • Fixed SDXL 0.9 VAE 
  • LoRAs - Place LoRAs in the folder ComfyUI/models/loras
  • Stable Diffusion Hub
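
The folder layout above is easy to encode in a small helper. This is only a convenience sketch: the function name `destination` and the example file names are made up, and the ComfyUI root is assumed to be a folder called `ComfyUI` in the current directory.

```python
from pathlib import Path

# Sub-folders named in the notes above, relative to the ComfyUI root.
MODEL_FOLDERS = {
    "checkpoint": "models/checkpoints",
    "vae": "models/vae",
    "lora": "models/loras",
}


def destination(comfy_root, kind, filename):
    """Return the path where a downloaded model file of this kind belongs."""
    return Path(comfy_root) / MODEL_FOLDERS[kind] / filename


# Hypothetical file names, for illustration only.
for kind, name in [("checkpoint", "base.safetensors"),
                   ("vae", "fixed_vae.safetensors"),
                   ("lora", "style.safetensors")]:
    print(kind, "->", destination("ComfyUI", kind, name).as_posix())
```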

Keep Exploring!!!

December 28, 2023

Stable Diffusion - Basics

Dataset - LAION-5B (5 billion text-image pairs) 

Image sources - Pinterest, DeviantArt, e-commerce services like Shopify, cloud services like Amazon Web Services, thumbnails from YouTube, and images from news sites.

CNN vs Diffusion

  • CNN – Feature Extraction, Error calculation, Weights update
  • Diffusion – Noise addition in the forward step, denoising in the reverse step
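
The forward (noise addition) step has a simple closed form, sketched below in plain Python. The linear beta schedule and its default values follow common DDPM settings and are an assumption here, not something these notes specify. In a real model a network predicts the noise; this sketch reuses the true noise as a stand-in to show that knowing the noise is enough to undo the step.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def remove_noise(xt, eps, alpha_bar):
    """Invert the forward step, given the (true or predicted) noise."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, eps)]


rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
x0 = [0.5, -0.25, 1.0, 0.0]          # a toy "image" of four pixels

xt, eps = add_noise(x0, alpha_bars[500], rng)
recovered = remove_noise(xt, eps, alpha_bars[500])
print("signal kept at the last step:", alpha_bars[-1])
```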

Key Steps in Implementation 

  • Method of learning to generate new stuff - Forward/reverse diffusion
  • Way to link text and images - Text-image representation model, Word as vectors, CLIP
  • Way to compress images while retaining features - Autoencoder - imposes a bottleneck in the network which forces a compressed knowledge representation of the original input
  • Priors built into the algorithm (diffusion for images) - U-Net architecture + ‘attention’
  • ControlNet - Controls diffusion models by adding extra conditions, using a "locked" copy and a "trainable" copy of the weights

December 25, 2023

Stable Diffusion Internals

Stable Diffusion Key Steps

  • Method of learning to generate new stuff - Forward/reverse diffusion
  • Way to link text and images - Text-image representation model
  • Way to compress images - Autoencoder
  • Way to add in good inductive biases - U-net  architecture + ‘attention’

Build Stable Diffusion “from Scratch”

  • Principle of Diffusion models (sampling, learning)
  • Diffusion for Images – UNet architecture
  • Understanding prompts – Word as vectors, CLIP
  • Let words modulate diffusion – Conditional Diffusion, Cross Attention
  • Diffusion in latent space – AutoEncoderKL
  • Training on a massive dataset – LAION-5B
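
The "let words modulate diffusion" step can be sketched as cross-attention: image positions supply the queries, text tokens supply the keys and values. This is a bare-bones sketch in plain Python. The learned projection matrices that a real layer applies to queries, keys and values are omitted, and all vectors below are made-up toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_queries, text_keys, text_values):
    """Each image position takes a weighted average of the text value vectors."""
    dim = len(text_keys[0])
    outputs, all_weights = [], []
    for q in image_queries:
        # Scaled dot product between this image position and every text token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in text_keys]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, text_values))
                 for i in range(len(text_values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Two image positions, two text tokens (toy numbers).
queries = [[4.0, 0.0], [0.0, 4.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

outputs, weights = cross_attention(queries, keys, values)
print(weights)
```

Each image position ends up dominated by the text token whose key it matches best, which is how the prompt gets to steer different regions of the image.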

GAN

  • One shot generation. Fast. 
  • Harder to control in one pass. 
  • Adversarial min-max objective. Can collapse.

Diffusion

  • Multi-iteration generation. Slow.
  • Easier to control during generation. 
  • Simple objective, no adversary in training. 
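
For reference, the two training objectives being contrasted can be written out. These are the standard textbook forms from the GAN and DDPM literature, not something taken from these notes; the symbols G, D, the noise predictor ε_θ and the schedule term ᾱ_t are the usual ones.

```latex
% GAN: generator G and discriminator D play a min-max game.
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% Diffusion: a single regression loss on the added noise, no adversary.
L_{\text{simple}} =
  \mathbb{E}_{x_0,\, \epsilon \sim \mathcal{N}(0, I),\, t}
  \Big[ \big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\, x_0
        + \sqrt{1 - \bar\alpha_t}\, \epsilon,\; t\big) \big\|^2 \Big]
```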

Key Ingredients of UNet

  • Convolution operation 
  • Saves parameters (shared weights), spatially invariant

Down/Up sampling

  • Multiscale / Hierarchy 
  • Learns modulation at multiple scales and abstraction levels.

Skip connection 

  • No bottleneck
  • Routes features of the same scale directly.
  • Cf. an autoencoder, which has a bottleneck
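
The difference a skip connection makes can be seen on a 1-D toy signal. This is a sketch with no learned weights: average pooling and value repetition stand in for the down- and up-sampling layers, the function names are made up, and the input length is assumed to be even.

```python
def downsample(signal):
    """Halve the length by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def upsample(signal):
    """Double the length by repeating every value."""
    return [value for value in signal for _ in range(2)]


def tiny_unet(signal):
    skip = signal                  # full-resolution features saved for later
    coarse = downsample(signal)    # coarse path: fine detail is lost here
    restored = upsample(coarse)    # back to full resolution, but blurry
    # Skip connection: pair the saved sharp features with the blurry ones,
    # like concatenating two channels at the same scale.
    return [[s, r] for s, r in zip(skip, restored)]


signal = [1, 3, 5, 7]
print(upsample(downsample(signal)))   # what a pure bottleneck path returns
print(tiny_unet(signal))              # sharp values ride along via the skip
```

Without the skip, the decoder only sees the blurred values; with it, the original detail at the same scale is available next to them.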

Autoencoder 

  • Autoencoder - imposes a bottleneck in the network which forces a compressed knowledge representation of the original input
  • An autoencoder can learn non-linear transformations with a non-linear activation function and multiple layers
  • An ideal autoencoder will learn descriptive attributes of faces such as skin color, whether or not the person is wearing glasses, etc. in an attempt to describe an observation in some compressed representation.
  • For variational autoencoders, the encoder model is sometimes referred to as the recognition model whereas the decoder model is sometimes referred to as the generative model

Applications of Autoencoders

  • Image coloring, feature variation, dimensionality reduction, image denoising, watermark removal

PCA vs Autoencoder

  • PCA attempts to discover a lower dimensional hyperplane which describes the original data
  • Autoencoders are capable of learning nonlinear manifolds (a manifold is defined in simple terms as a continuous, non-intersecting surface)
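
The limit of a linear hyperplane shows up on 2-D toy data. The sketch below compresses each point to a single number along the first principal axis and measures the reconstruction error. Points on a straight line come back exactly; points on a parabola do not. An autoencoder with nonlinear activations could in principle learn a one-number code for the curve, but that is not shown here. The function name and data are made up for illustration.

```python
import math


def pca_line_error(points):
    """Project 2-D points onto their first principal axis; return the mean squared error."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Angle of the principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    ux, uy = math.cos(theta), math.sin(theta)
    error = 0.0
    for x, y in points:
        t = (x - mx) * ux + (y - my) * uy        # the 1-D code for this point
        rx, ry = mx + t * ux, my + t * uy        # reconstruction on the line
        error += (x - rx) ** 2 + (y - ry) ** 2
    return error / n


xs = [i / 10 for i in range(-10, 11)]
on_line = [(x, 2 * x + 1) for x in xs]     # lies on a 1-D hyperplane
on_curve = [(x, x * x) for x in xs]        # lies on a 1-D nonlinear manifold

print("line error: ", pca_line_error(on_line))
print("curve error:", pca_line_error(on_curve))
```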

ControlNet 

  • ControlNet is a neural network structure to control diffusion models by adding extra conditions.
  • It copies the weights of the neural network blocks into a "locked" copy and a "trainable" copy.
  • The "trainable" one learns your condition. The "locked" one preserves your model.
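
The locked/trainable split can be sketched with a toy block. This is an illustration, not ControlNet's code: `Block` is a stand-in scale-and-shift layer, `ControlledBlock` is a made-up name, and the scalar `zero_gate` stands in for the zero-initialised convolution that the ControlNet paper uses to connect the trainable copy back to the locked model.

```python
import copy


class Block:
    """Stand-in for a network block: a single scale-and-shift layer."""

    def __init__(self, weight, bias):
        self.weight, self.bias = weight, bias

    def __call__(self, x):
        return [self.weight * v + self.bias for v in x]


class ControlledBlock:
    def __init__(self, pretrained):
        self.locked = pretrained                    # frozen: never updated
        self.trainable = copy.deepcopy(pretrained)  # starts from the same weights
        self.zero_gate = 0.0                        # zero-initialised connection

    def __call__(self, x, condition):
        base = self.locked(x)
        # The trainable copy sees the input plus the extra condition.
        control = self.trainable([v + c for v, c in zip(x, condition)])
        return [b + self.zero_gate * c for b, c in zip(base, control)]


block = ControlledBlock(Block(weight=2.0, bias=1.0))
x, condition = [1.0, 2.0], [5.0, 5.0]

before = block(x, condition)      # gate is zero: identical to the locked model

# Pretend a training step updated only the trainable side.
block.trainable.weight = 3.0
block.zero_gate = 0.5
after = block(x, condition)
print(before, after, block.locked.weight)
```

Because the gate starts at zero, adding the control branch does not change the model's output until training moves it, and the locked weights stay as they were throughout.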

Keep Exploring!!!