Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Stable Diffusion


April 08, 2024

Anyscale Endpoints discussion

Step #1 - Anyscale signup

Step #2 - Notebook for Deploying Diffusion models

Step #3 - Deploying Service Command

Step #4 - Service Deployment

Code Example -

Keep Exploring!!!

January 11, 2024

Comfy Tool Notes

Summary from Link

Key Notes

Model files - civitai, hugging face
CLIP, Main Model, VAE
CheckpointLoader - Outputs Model, Clip, VAE
CLIP model - Encodes the positive and negative prompt text for the main model
Encoded positive and negative prompts are sent to the MODEL at each step and used to guide denoising
VAE translates the image from latent space to pixel space
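
A minimal sketch of how the positive and negative prompts can steer each denoising step. It assumes the commonly used classifier-free guidance blend, which these notes do not spell out; the function name, the `cfg_scale` parameter, and the sample numbers are illustrative only.

```python
def guided_noise(noise_positive, noise_negative, cfg_scale):
    """Blend two noise predictions for a single denoising step.

    noise_positive: model output when conditioned on the positive prompt
    noise_negative: model output when conditioned on the negative prompt
    cfg_scale:      how strongly to push away from negative, toward positive
    """
    return [
        neg + cfg_scale * (pos - neg)
        for pos, neg in zip(noise_positive, noise_negative)
    ]


# Hypothetical noise predictions for three latent values.
eps_pos = [0.5, -0.2, 0.1]
eps_neg = [0.25, 0.0, 0.1]

# A scale of 1.0 reproduces the positive prediction; larger values exaggerate
# the difference between the two prompts.
print(guided_noise(eps_pos, eps_neg, cfg_scale=1.0))
print(guided_noise(eps_pos, eps_neg, cfg_scale=7.0))
```

With a scale of 0 the result is just the negative-prompt prediction, which is why very low scales tend to ignore the positive prompt.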

Inpaint Examples

Samplername - uni_pc_bh2

AutocodePro
Finetuned Stable Diffusion for Anime
AlphaCTR
LoRA (Low-Rank Adaptation) models are small add-on weight files that apply minor, yet impactful modifications to the standard Stable Diffusion models.
ControlNet/T2I adapters need the image passed to them to be in a specific format, such as a depth map
Stable Zero123 is a diffusion model that, given an image of an object on a simple background, can generate images of that object from different angles.
SDXL Turbo is an SDXL model that can generate consistent images in a single step.

Nodes Explanation

CLIP model: to convert text into a format the Unet can understand
Unet: to perform the "diffusion" process, the step-by-step processing of images that we call generation
VAE: to decode the image from latent space into pixel space (also used to encode a regular image from pixel space to latent space when we are doing img2img)
KSampler node. This is the actual "generation" part, so you'll notice the KSampler takes the most time to run when you queue a prompt.
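
The order in which these nodes hand data to each other can be sketched with toy stand-ins. None of the functions below are ComfyUI APIs; they are hypothetical placeholders that only mimic the data flow (text to conditioning, repeated denoising, latent to pixels).

```python
def clip_encode(prompt):
    # Stand-in for the CLIP text encoder: one number per word.
    # A real encoder returns a sequence of embedding vectors.
    return [float(len(word)) for word in prompt.split()]


def unet_predict_noise(latent, conditioning, step):
    # Stand-in for the UNet. It pretends the clean latent is all zeros,
    # so everything currently in the latent counts as noise.
    # A real UNet would use `conditioning` and `step`; this toy ignores them.
    return list(latent)


def ksampler(latent, conditioning, steps):
    # The sampling loop: predict noise, remove part of it, repeat.
    for step in range(steps):
        noise = unet_predict_noise(latent, conditioning, step)
        remaining = steps - step
        latent = [x - n / remaining for x, n in zip(latent, noise)]
    return latent


def vae_decode(latent):
    # Stand-in for the VAE decoder: map latent values in [-1, 1] to 0..255.
    return [max(0, min(255, int((x + 1.0) * 127.5))) for x in latent]


conditioning = clip_encode("a photo of a cat")
start_latent = [0.8, -0.5, 0.3, -0.9]  # hypothetical noisy latent
final_latent = ksampler(start_latent, conditioning, steps=4)
image = vae_decode(final_latent)
print(image)
```

The loop inside `ksampler` is the part that runs once per step, which matches the note that the KSampler dominates the run time.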

Checkpoints

Place checkpoints in the folder ComfyUI/models/checkpoints:
SDXL 1.0 base checkpoint, SDXL 1.0 refiner checkpoint
VAE - Place VAEs in the folder ComfyUI/models/vae
Fixed SDXL 0.9 VAE
LoRAs - Place LoRAs in the folder ComfyUI/models/loras
Stable Diffusion Hub
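
A small helper sketch for working out where a downloaded file belongs, using only the three folders listed above. The `destination` function, the `kind` labels, and the example file name are made up for illustration.

```python
from pathlib import Path

# Folder layout taken from the notes above.
MODEL_DIRS = {
    "checkpoint": Path("ComfyUI/models/checkpoints"),
    "vae": Path("ComfyUI/models/vae"),
    "lora": Path("ComfyUI/models/loras"),
}


def destination(kind, filename, root=Path(".")):
    """Return the path a model file of the given kind should be placed at."""
    if kind not in MODEL_DIRS:
        raise ValueError(f"unknown model kind: {kind!r}")
    return root / MODEL_DIRS[kind] / filename


# Hypothetical file name, used only to show the resulting path.
print(destination("lora", "my_style.safetensors").as_posix())
```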

Keep Exploring!!!

December 28, 2023

Stable Diffusion - Basics

Dataset - LAION-5B (5 billion text-image pairs)

Dataset sources - Pinterest and DeviantArt, e-commerce services like Shopify, cloud services like Amazon Web Services, thumbnails from YouTube, and images from news sites.

CNN vs Diffusion

CNN – Feature Extraction, Error calculation, Weights update
Diffusion – Noise addition in the forward step, denoising in the reverse step
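
The forward (noise addition) step can be sketched in a few lines. This assumes the standard DDPM formulation with a linear beta schedule, which is not stated in these notes; the function names and default values are illustrative.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for a linear beta schedule.

    Requires num_steps >= 2.
    """
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Jump straight from a clean sample x0 to its noised version at step t."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise


alpha_bars = make_alpha_bars(1000)
rng = random.Random(0)
clean = [1.0, -1.0, 0.5]  # hypothetical "image" with three pixels
noisy, noise = add_noise(clean, t=500, alpha_bars=alpha_bars, rng=rng)
print(noisy)
```

Denoising is the learned part: a network is trained to predict `noise` from `noisy` and `t`, so the process can be run in reverse at generation time.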

Key Steps in Implementation

Method of learning to generate new stuff - Forward/reverse diffusion
Way to link text and images - Text-image representation model, Word as vectors, CLIP
Way to compress images while retaining features - Autoencoder - imposes a bottleneck in the network which forces a compressed knowledge representation of the original input
Priors built into the algorithm (diffusion for images) - U-Net architecture + ‘attention’
ControlNet - Control diffusion models by adding extra conditions, a "locked" copy, and a "trainable" weights copy

The guide to fine-tuning Stable Diffusion with your own images

From DALL·E to Stable Diffusion: how do text-to-image generation models work?

Keep Exploring!!!

December 25, 2023

Stable Diffusion Internals

Stable Diffusion Key Steps

Method of learning to generate new stuff - Forward/reverse diffusion
Way to link text and images - Text-image representation model
Way to compress images - Autoencoder
Way to add in good inductive biases - U-net architecture + ‘attention’

Build Stable Diffusion “from Scratch”

Principle of Diffusion models (sampling, learning)
Diffusion for Images – UNet architecture
Understanding prompts – Word as vectors, CLIP
Let words modulate diffusion – Conditional Diffusion, Cross Attention
Diffusion in latent space – AutoEncoderKL
Training on a massive dataset – LAION-5B
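
For the "let words modulate diffusion" step, here is a bare-bones cross-attention sketch. It assumes standard scaled dot-product attention and leaves out the learned query/key/value projections a real UNet layer would have; queries stand for image positions, keys and values for prompt tokens.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query (image position) takes a weighted mix of the token values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


# Two hypothetical image positions attending to two prompt tokens.
image_queries = [[1.0, 0.0], [0.0, 1.0]]
token_keys = [[1.0, 0.0], [0.0, 1.0]]
token_values = [[10.0], [20.0]]
print(cross_attention(image_queries, token_keys, token_values))
```

Each image position ends up closer to the value of the token whose key it matches best, which is how the text influences different regions differently.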

GAN

One shot generation. Fast.
Harder to control in one pass.
Adversarial min-max objective. Can collapse.

Diffusion

Multi-iteration generation. Slow.
Easier to control during generation.
Simple objective, no adversary in training.
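
To make the two training objectives concrete, here they are in their usual textbook forms. These are not taken from the notes, so treat the notation as an assumption: G is the generator, D the discriminator, ε the noise added in the forward step, x_t the noised image at step t, and ε_θ the denoising network.

```latex
% GAN: adversarial min-max objective
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% Diffusion: simple noise-prediction objective, no adversary
\mathcal{L}_{\text{simple}} =
  \mathbb{E}_{x_0,\, t,\, \epsilon \sim \mathcal{N}(0, I)}
  \Big[\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\Big]
```

The diffusion loss is a plain regression target, whereas the GAN objective pits two networks against each other, which is where the risk of collapse comes from.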

Key Ingredients of UNet

Convolution operation
Saves parameters, spatially invariant

Down/Up sampling

Multiscale / Hierarchy
Learn modulation at multiple scales and abstraction levels.

Skip connection

No bottleneck
Route features of the same scale directly.
Cf. AutoEncoder has bottleneck
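
A toy 1-D sketch of why the skip connection matters. Averaging and repeating stand in for the learned down/up sampling layers, and pairing values stands in for channel concatenation; all names here are illustrative.

```python
def downsample(x):
    # Halve the resolution by averaging neighbouring pairs (len(x) must be even).
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def upsample(x):
    # Double the resolution by repeating each value.
    return [v for v in x for _ in range(2)]


def tiny_unet(x):
    skip = x                      # full-resolution features, kept aside
    coarse = downsample(x)        # the low-resolution path loses fine detail
    restored = upsample(coarse)   # back to full size, but blurred
    # Skip connection: pair each blurred value with the original one,
    # like concatenating along the channel axis.
    return list(zip(restored, skip))


signal = [1.0, 3.0, 5.0, 7.0]
print(tiny_unet(signal))
```

Without `skip`, the output would only contain the blurred values; an autoencoder is in that position because everything has to pass through its bottleneck.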

Autoencoder

Autoencoder - impose a bottleneck in the network which forces a compressed knowledge representation of the original input
An autoencoder can learn non-linear transformations with a non-linear activation function and multiple layers
An ideal autoencoder will learn descriptive attributes of faces such as skin color, whether or not the person is wearing glasses, etc. in an attempt to describe an observation in some compressed representation.
For variational autoencoders, the encoder model is sometimes referred to as the recognition model whereas the decoder model is sometimes referred to as the generative model

Applications of Autoencoders

Image Coloring, Feature Variation, Dimensionality Reduction, Image Denoising, Watermark Removal

PCA vs Autoencoder

PCA attempts to discover a lower dimensional hyperplane which describes the original data
Autoencoders are capable of learning nonlinear manifolds (a manifold is defined in simple terms as a continuous, non-intersecting surface)

ControlNet

ControlNet is a neural network structure to control diffusion models by adding extra conditions.
It copies the weights of neural network blocks into a "locked" copy and a "trainable" copy.
The "trainable" one learns your condition. The "locked" one preserves your model.
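
The locked/trainable idea can be sketched with a one-weight "block". The `Block` class and the `gate` parameter are hypothetical; the gate starting at zero mirrors the zero-initialized connection used by ControlNet, which is an assumption beyond what these notes state.

```python
import copy


class Block:
    """A toy network block with a single weight."""

    def __init__(self, weight):
        self.weight = weight
        self.trainable = True

    def __call__(self, x):
        return self.weight * x


def make_controlnet_pair(block):
    # The original block is frozen; a deep copy is left free to learn.
    locked = block
    locked.trainable = False
    trainable = copy.deepcopy(block)
    trainable.trainable = True
    return locked, trainable


def controlled_forward(locked, trainable, x, condition, gate):
    # The trainable copy sees the input plus the extra condition.
    # With gate == 0.0 the output is exactly the original model's output.
    return locked(x) + gate * trainable(x + condition)


base = Block(2.0)
locked, trainable = make_controlnet_pair(base)
print(controlled_forward(locked, trainable, x=3.0, condition=1.0, gate=0.0))

# "Training" only touches the trainable copy; the locked weight is preserved.
trainable.weight = 5.0
print(locked.weight, trainable.weight)
```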

Keep Exploring!!!
