"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

October 30, 2023

Code - Domain - Data Knowledge

#Coding floods the scene, but #DomainKnowledge remains elusive. Truly grasping #CustomerNeeds - now that's a rare skill. As #HelloWorld apps multiply, remember that #RealWorld applications need a #panoramic vision for solutions. #Data + #Domain + understanding customer needs require constant experimentation, ideation and iteration. #perspectives. An analogy from Ray Dalio: Shapers get both the big picture and the details right. To me, it seems that Shaper = Visionary + Practical Thinker + Determined.

Keep Exploring!!!

October 29, 2023

Introducing ChatGPT Enterprise

ChatGPT Enterprise

  • You own and control your business data in ChatGPT Enterprise
  • We do not train on your business data or conversations, and our models don’t learn from your usage

ChatGPT Enterprise is available today

Advanced Data Analysis (ChatGPT Enterprise version)

Advanced Data Analysis (ADA) has been upgraded to include three new capabilities aimed at enhancing the analysis of text-rich documents: 

  • Synthesis -  Analyze information from documents to generate new content or insights
  • Transformation - Alter the presentation of information without changing its underlying essence
  • Extraction - Identify and pull out specific pieces of information from a document
  • Supporting Formats - PDF (.pdf), Text (.txt), PowerPoint (.ppt), Word (.doc), Excel (.xlsx), Comma-separated values (.csv)

Contact OpenAI sales team


Keep Exploring!!!

Google Vision - Experiments - Vertex Vision

  • Cloud Video Intelligence API - Detects objects, explicit content, and scene changes in videos. It also specifies the region and time offset of each detection
  • Cloud Vision API - Image Content Analysis

Track objects in a streaming video

Track objects

Shot change detection tutorial

  • SHOT_CHANGE_DETECTION request 
  • List of all shots that occur within the video
  • For each shot, the API returns its start and end time (see the sketch below)
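
A minimal sketch of such a SHOT_CHANGE_DETECTION request with the Python client library; the bucket path is a placeholder and the video is assumed to already be in a GCS bucket:

```python
# Minimal sketch: shot change detection with the Video Intelligence API.
# Assumes google-cloud-videointelligence is installed and credentials are set;
# the gs:// path is a placeholder.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.SHOT_CHANGE_DETECTION],
        "input_uri": "gs://my-bucket/sample-video.mp4",
    }
)
result = operation.result(timeout=300)

# One annotation_result per input video; each shot carries start/end offsets.
for shot in result.annotation_results[0].shot_annotations:
    start = shot.start_time_offset.seconds + shot.start_time_offset.microseconds / 1e6
    end = shot.end_time_offset.seconds + shot.end_time_offset.microseconds / 1e6
    print(f"Shot: {start:.2f}s -> {end:.2f}s")
```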

Track objects in a local video file
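
A similar hedged sketch for object tracking on a local video file (the file path is a placeholder; for videos in a bucket, pass input_uri instead of input_content):

```python
# Minimal sketch: object tracking on a local video file with the
# Video Intelligence API. The local path is a placeholder.
import io
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

with io.open("local_video.mp4", "rb") as f:
    input_content = f.read()

operation = client.annotate_video(
    request={
        "features": [videointelligence.Feature.OBJECT_TRACKING],
        "input_content": input_content,
    }
)
result = operation.result(timeout=500)

for obj in result.annotation_results[0].object_annotations:
    print(f"Entity: {obj.entity.description} (confidence {obj.confidence:.2f})")
    # Each tracked frame carries a normalized bounding box for the object.
    box = obj.frames[0].normalized_bounding_box
    print(f"  First box: left={box.left:.2f}, top={box.top:.2f}, "
          f"right={box.right:.2f}, bottom={box.bottom:.2f}")
```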

All Video Intelligence code samples

AI-powered video archive for searching family videos

Video intelligence takes to the streets



Hello video data: Train an AutoML video classification model

Live Streaming on Google Cloud with Media CDN and Live Streaming API

Experiments

  • Evaluate on lower-resolution video
  • Out-of-the-box detections
  • Select frames by shot detection and evaluate
  • Offline mode evaluation (on videos from the bucket)
  • Online mode evaluation (Live Streaming API)
  • Find unknowns/limitations from sample videos
  • Exploratory analysis of videos/images

Architecture options

  • Offline Video evaluation with video in GCP bucket
  • Offline Video evaluation + custom models
  • Offline Video + Shot Detection + Out of box object detection
  • Live Streaming API Evaluation




  • Register streams - Streams connect your physical devices (like IP cameras)
  • Create Apps


Keep Exploring!!!

information asymmetry, Common-Knowledge Effect

Information asymmetry is a very relatable experience from project work and domain understanding.

Some instances where this shows up:

  • Customers share limited information 
  • Unexplained features about transactions
  • Correlations were removed and data anonymized
When this happens, it usually results in poor forecasting




When two people do not have the same level of information, their perception and understanding vary





Common-Knowledge Effect: A Harmful Bias in Team Decision Making

The common-knowledge effect is a decision-making bias where teams overemphasize the information most team members understand instead of pursuing and incorporating the unique knowledge of team members.

Preference Bias

  • We are more likely to discuss information that aligns with our initial preferences or preconceived notions.
  • Even when all information is shared with the group, we still process that information according to our initial preferences.

Social Comparison

  • We seek social acceptance and avoid conflict with teammates. We tend to adopt the group's prevailing view when evaluating information in unclear situations.
  • Information familiar to multiple team members becomes socially validated and more likely to be repeated and affirmed.

Keep Exploring!!!

70-hour work week vs Thinking perspective vs Consulting vs Domain Knowledge

Numbers do not reflect the quality of the outcome; rather, consistency, ideas, and experimentation matter

Steve Jobs on Continuous Process Improvement

  • Question the theory behind why we do things the way we do
  • Question the ways of doing things and the basics; shift to an optimistic point of view and relook at options
  • The shift of perspective / optimistic point of view
  • "That's the way it's done" vs finding new ways/opportunities
Steve Jobs on Consulting

  • Owning and working on something for an extended period of a few years
  • Seeing through actions / accumulating scar tissue
  • Learning a fraction vs not owning the results
  • A picture of a banana (2D) vs the experience of doing (3D)
  • Knowing is a 2D view vs doing is a 3D view (hands-on matters)
  • Take responsibility for the work
This was the key lesson behind why I was able to rewrite the warranty engine at Microsoft for 224 million serial numbers

Team - Teamwork 



"My business model is The Beatles. They were four guys who kept each other's kind of negative tendencies in check. They balanced each other, and the total was greater than the sum of the parts. That's how I see business: Great things in business are never done by one person, they're done by a team of people."

Ideas to Reality

The answer lies in process and constant innovation


It is compounding value over the years :)


Hire people who are trustworthy and have complementary skills: aligned on vision but diverse in skills


Arrogance - Keep a Tab




Keep Exploring!!!

October 23, 2023

Text to Vision - Image - Survey - Techniques - Lessons

Text to Vision - Image - Survey - Techniques - Lessons

  • multimodal-to-text generation models (e.g. Flamingo)
  • image-text matching models (e.g. CLIP)
  • text-to-image generation models (e.g. Stable Diffusion).

A Survey of Diffusion Based Image Generation Models: Issues and Their Solutions

  • Difficulty generating images with multiple objects
  • Quality improvement of generated images

Concepts

  • To generate images with multiple objects, layout information such as bounding boxes or segmentation maps is added to the model. 
  • Cross-attention maps have been found to play a crucial role in image generation quality
  • Techniques like “SynGen” [55] and “Attend-and-Excite” [9] have been introduced to improve attention maps

  • Mathematically, this process can be modeled as a Markov process.
  • The process of adding noise step by step from X0 to XT is called the "forward process" or "diffusion process"
  • Reversely, starting from XT, iteratively removing the noise until a clear image is obtained is called the "reverse process" (see the sketch after this list)
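
A small numpy sketch of the forward (noising) process, using the standard DDPM closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise; the linear beta schedule and number of steps are illustrative assumptions:

```python
# Minimal sketch of the DDPM forward (diffusion) process.
# The linear beta schedule and T = 1000 steps are illustrative assumptions.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # noise variance added per step
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)         # cumulative product over timesteps

def forward_diffuse(x0, t):
    """Sample x_t directly from x_0 at timestep t (0-indexed)."""
    noise = np.random.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise                    # noise is what the denoiser (U-Net) learns to predict

# Example: a dummy "image"; by t close to T it is almost pure Gaussian noise.
x0 = np.ones((3, 64, 64))
xt, eps = forward_diffuse(x0, t=999)
```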

  • Denoising Diffusion Probabilistic Models (DDPM)
  • Basic components of a diffusion model
  • Noise prediction module - U-net / pure transformer structure
  • Condition encoder - conditioned on something, such as text. T5 series encoder or CLIP text encoder is used in most of the current works.
  • Super resolution module - DALL·E 2 employs two super-resolution models in its pipeline
  • Dimension reduction module - Text encoder and image encoder of CLIP are components integrated into the DALL·E 2 model
  • Diffusion models can also encounter difficulties in accurately representing positional information
  • SceneComposer [75] and SpaText [1] concentrate on leveraging segmentation maps for image synthesis
  • Subject Driven Generation
  • Concept customization or personalized generation
  • Present an image or a set of images that represent a particular concept, and then generate new images based on that specific concept




  • Advantage of Blip-diffusion lies in its ability to perform “zero-shot” generation, as well as “few-shot” generation with minimal fine-tuning

QUALITY IMPROVEMENT OF GENERATED IMAGES

  • Mixture of experts (MOE) [60] is a technique that leverages the strengths of different models, and it has been adapted for use in diffusion models to optimize their performance
  • Employ Gaussian blur on certain areas of the prediction, according to the self-attention map, to extract this condition

Reverse Stable Diffusion: What prompt was used to generate this image?

  • new task of predicting the text prompt given an image generated by a generative diffusion model
  • DiffusionDB is the first large-scale text-to-image prompt dataset. It contains 14 million images generated by Stable Diffusion using prompts and hyperparameters specified by real users.
  • Diffusion Explorer

Learning framework for prompt embedding estimation

Reversing the text-to-image diffusion process

  • Predict a sentence embedding of the original prompt used to generate the input image (see the sketch after this list)
  • As underlying models, three state-of-the-art architectures that are agnostic to the generative mechanism of Stable Diffusion are considered, namely ViT, CLIP and Swin Transformer
  • The U-Net model from Stable Diffusion, which operates in the latent space, is also considered
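
A hedged sketch of the first route (an image encoder regressing a prompt embedding); the CLIP checkpoint name and the 768-dimensional target size are illustrative assumptions, not the paper's exact setup:

```python
# Minimal sketch of "image -> prompt embedding": freeze a CLIP vision encoder
# and regress a fixed-size sentence embedding of the original prompt.
# The checkpoint name and target_dim=768 are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import CLIPVisionModel

class PromptEmbeddingEstimator(nn.Module):
    def __init__(self, target_dim: int = 768):
        super().__init__()
        self.backbone = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
        hidden = self.backbone.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, target_dim)
        )

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(pixel_values=pixel_values).pooler_output  # pooled image features
        return self.head(feats)

# Training target: a sentence embedding of the real prompt (e.g. from a sentence
# encoder); minimize cosine or MSE distance between prediction and target.
model = PromptEmbeddingEstimator()
loss_fn = nn.CosineEmbeddingLoss()
```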

Explain in layman's terms - U-Net model from Stable Diffusion, which operates in the latent space.

The U-Net model from Stable Diffusion is a type of artificial intelligence model used for various computer vision tasks like image segmentation, where it identifies and separates different objects or features within an image.

Imagine you have a picture that appears blurry, full of noise, or unclear. The U-Net model from Stable Diffusion operates like a sophisticated visual detective, which can work back through the noise, step by step, to try and reconstruct the original picture.

To do this, it operates in what we call the 'latent space', which is loosely analogous to the mind’s eye of the AI - it's where the AI forms a sort of abstract, compressed understanding of the different elements present in the image, their shapes, and how they relate to each other. You can think of the latent space as a box where the details of the image are stored in a compact form, almost like the raw components before they've been assembled into the complete picture.

So, the U-Net model from Stable Diffusion first takes a noisy image and maps or translates it into this intermediate latent space - compressing and organizing the information in a way it can handle - before then reconstructing the original, clearer image from that. It's essentially a way of moving from a jumble of details into a structured "blueprint" in the latent space, and then using that blueprint to rebuild a clear and accurate image.

A key aspect of the U-Net model is its structure, which is like a U-shape (thus the name 'U-net'). The first half of the U shape takes the noisy image and condenses it down into the blueprint in the latent space (this is called encoding or downsampling). The second half then expands this blueprint back out into the clear image (known as decoding or upsampling). This U-shape structure, combined with the operation in the latent space, allows the model to effectively manage and recover the important details from the noisy input and improve the generated output's quality significantly.

So in simple terms, the U-Net model from Stable Diffusion operates like a skilled restorer, turning a distorted or noisy picture back into a clear and identifiable image by operating in its “mind’s eye” or latent space, using a special U-shaped structure to carefully manage detail extraction and restoration.


Diffusion Explainer: Visual Explanation for Text-to-image Stable Diffusion


VLP: A Survey on Vision-Language Pre-training

Image Feature Extraction

  • By using Faster R-CNN, VLP models obtain OD-based region feature embeddings
  • Alternatively, CNNs are trained end-to-end and grid features are used (see the sketch after this list)
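
A small sketch of the grid-feature route (region features would instead come from a Faster R-CNN detector); the ResNet-50 backbone is an illustrative choice:

```python
# Minimal sketch: grid feature extraction with a CNN backbone, the alternative
# to Faster R-CNN region features. ResNet-50 is an illustrative choice.
import torch
import torchvision
from torchvision.models.feature_extraction import create_feature_extractor

backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
# Take the last convolutional feature map, before global pooling.
extractor = create_feature_extractor(backbone, return_nodes={"layer4": "grid"})

image = torch.randn(1, 3, 224, 224)        # placeholder for a preprocessed image
grid = extractor(image)["grid"]            # shape: (1, 2048, 7, 7)

# Flatten the 7x7 grid into 49 "visual tokens" for a vision-language model.
tokens = grid.flatten(2).transpose(1, 2)   # shape: (1, 49, 2048)
```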

Video Feature Extraction

  • VLP models [17, 18] extract the frame features by using the method mentioned above

Text Feature Models

  • For the textual features, following pretrained language models such as BERT [2], RoBERTa [24], ALBERT [25], and XLNet [26], VLP models [9, 27, 28] first segment the input sentence into a sequence of subwords



Ideas summary

  • Detection models for Region feature embedding
  • Grid based feature extraction with CNN
  • Super resolution module to the pipeline
  • Subject Driven Generation, Concept customization or personalized generation - Present an image or a set of images that represent a particular concept, and then generate new images based on that specific concept
  • Gaussian blur on certain areas based on attention / relevance
  • Captioning, Category recognition
  • Category Recognition (CR) - CR refers to identifying the category and sub-category of a product, such as {HOODIES, SWEATERS}, {TROUSERS, PANTS}
  • Multi-modal Sentiment Analysis (MSA) - MSA aims to detect sentiments in videos by leveraging multi-modal signals

Text-to-image Diffusion Models in Generative AI: A Survey

The learning goal of a DM is to reverse a process of perturbing the data with noise, i.e. diffusion, for sample generation

Diffusion Probabilistic Models (DPM), Score-based Generative Models (SGM)

Denoising diffusion probabilistic models (DDPMs) are defined as a parameterized Markov chain

  • Forward pass. In the forward pass, DDPM is a Markov chain where Gaussian noise is added to data in each step until the images are destroyed
  • Reverse pass. With the forward pass defined above, we can train the transition kernels with a reverse process

Conditional diffusion model: A conditional diffusion model learns from additional information (e.g., class and text) by taking it as model input.

Guided diffusion model: During the training of a guided diffusion model, the class-induced gradients (e.g., through an auxiliary classifier) are involved in the sampling process (see the sketch below).
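
A hedged sketch of what "class-induced gradients in the sampling process" looks like for classifier guidance, in the epsilon-prediction form; the U-Net, the noise-aware classifier and the guidance scale are placeholders, not a full sampler:

```python
# Minimal sketch of classifier guidance: shift the predicted noise by the
# gradient of an auxiliary classifier's log p(y | x_t).
# unet, classifier, alpha_bar_t and scale are placeholders / assumptions.
import torch

def guided_eps(unet, classifier, x_t, t, y, alpha_bar_t, scale=3.0):
    eps = unet(x_t, t)                                   # plain noise prediction
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        log_probs = torch.log_softmax(classifier(x_in, t), dim=-1)
        selected = log_probs[range(len(y)), y].sum()     # log p(y | x_t) per sample
        grad = torch.autograd.grad(selected, x_in)[0]
    # Guided epsilon: eps - sqrt(1 - alpha_bar_t) * scale * grad(log p(y | x_t))
    return eps - (1.0 - alpha_bar_t) ** 0.5 * scale * grad
```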


Awesome Video Diffusion

Keep Exploring!!!

Interesting Product - sivi.ai

sivi.ai

The concept of blending image/text and providing ad variations is very impressive :)




The next question that comes up: how does it compete against other models and text-to-image generator options?




Current state-of-the-art models struggle with creating the right mix of design with image and text content.

My Understanding

  • Have variations for text
  • Have Variations for image
  • Leverage past data
  • Position according to domain/data
  • Generate variations


  • 30% image variations
  • 30% text variations
  • 40% templates and positioning based on domain / data / templates

Keep Exploring!!!

October 21, 2023

Baidu LLM Use cases

 

From minutes 19 to 25 of the talk

The use cases implemented are

Use case #1 - Custom background

  • Input - Photo of car
  • Prompt - New Energy vehicle, create new backgrounds of it
  • Output - New custom background added

Use Case #2 - Generate a marketing poster. Input is an image and a prompt with product details

  • Prompt - Take info from the site and create a poster from it
  • Output - Creates a poster with image + text

Use case #3 - Write variations of the poster with higher-quality info/data. Create five more pieces of advertising copy

Output - 

  • Positive feature notes in different tones
  • Professional write up for marketing
  • Five copies created

Use Case #4 - Video use case: generate an ad

  • Input - Website info and existing content; create digital content

Output

  • The video had a person explaining the product
  • Different views of the car
  • Images / references in the video

Keep Exploring!!!

 

October 20, 2023

Google Vertex Vision - Analytics - GenAI - Vertex Matching Engine

Vertex Vision 

Feed real-time streaming video

Pick existing models

Plug in custom vision models


Architecture references and GenAI - Vision + Text + Catalog Management

Summary items

  • Finetuning with sample images - Few shot learning
  • Step 1 - Image Embedding Extractor - use the Vertex AI embedding extractor to get an embedding for each image
  • Step 2 - Use Vertex Matching Engine to fetch the top similar images
  • Step 3 - Create new copy and text for the images, and upload to the product database (see the sketch after this list)
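
A hedged sketch of steps 1 and 2 with the Vertex AI Python SDK; the project, model name, index endpoint and file path are placeholders, and the exact SDK surface may vary by version:

```python
# Minimal sketch of Step 1 (image embedding) and Step 2 (similar-item lookup
# via Vertex AI Matching Engine). Project, endpoint name and paths are placeholders.
from google.cloud import aiplatform
from vertexai.vision_models import Image, MultiModalEmbeddingModel

aiplatform.init(project="my-project", location="us-central1")

# Step 1 - extract an embedding for the product image.
model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
emb = model.get_embeddings(image=Image.load_from_file("product.jpg"))
query_vector = emb.image_embedding

# Step 2 - query a deployed Matching Engine index for the nearest catalog items.
index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
    "projects/PROJECT_NUMBER/locations/us-central1/indexEndpoints/ENDPOINT_ID"
)
neighbors = index_endpoint.find_neighbors(
    deployed_index_id="catalog_index",
    queries=[query_vector],
    num_neighbors=5,
)
for n in neighbors[0]:
    print(n.id, n.distance)
```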

Pre-requisites - Catalog of images


Ref - Accelerate product catalog management with generative AI 

Step 1 - Embedding Extract 

Step 2 - Similar Products

Step 3 - Add Descriptions

Step 4 - Prompt based enrichment

Advantage - Language translations supported

Step 5 - Catalog image creation


Ref -  Accelerating product innovation with generative AI

Step 1 - Text data import - reviews, product info

Step 2 - Extract insights from uploaded info

Step 3 - QnA

Step 4 - Product Generation from concept

Summary

  • Concept one-liner (1 word)
  • Features from concept details (Few lines)
  • Prepare description with features (Product V1 Description)
  • Description to create Images (Image template creation)
  • Inspiration with details and images (Draft Product Ready)
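
A hedged sketch of that concept-to-description chain with a Vertex AI text model; the model name and prompts are illustrative only:

```python
# Minimal sketch of the concept -> features -> description chain using a
# Vertex AI text model. The model name and prompts are illustrative assumptions.
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-project", location="us-central1")
llm = TextGenerationModel.from_pretrained("text-bison@001")

concept = "backpack"  # concept one-liner

features = llm.predict(
    f"List five product features for a new '{concept}' aimed at commuters.",
    max_output_tokens=256,
).text

description = llm.predict(
    "Write a short product description (Product V1) using these features:\n" + features,
    max_output_tokens=256,
).text

print(description)  # next step: feed the description to an image model for templates
```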

Keep Exploring!!!

October 17, 2023

Machine Learning Interpretability / Explainability

Key Notes / Ideas 

Key items from blog / Reposted 

  • Create White-Box / Interpretable Models (Intrinsic): e.g., Linear Regression, Decision Trees.
  • Explain Black-Box / Complex Models (Post-Hoc): e.g., LIME, SHAP.
  • Enhance the Fairness of a Model: e.g., Fairness Indicators, Adversarial Debiasing.
  • Test Sensitivity of Predictions: e.g., Perturbation Analysis.

Local vs Global Interpretations:

  • Local: Dive into a single prediction to understand it. e.g., Individual SHAP values.
  • Global: Grasp the overall model behavior. e.g., Feature Importance Rankings.

Data Types & Applicable Interpretability Methods:

  • Tabular: e.g., Partial Dependence Plots.
  • Text: e.g., Word Embedding Visualizations.
  • Image: e.g., Grad-CAM for CNNs.
  • Graph: e.g., Node Influence Metrics.

Model Specificity:

  • Model Specific: Techniques that apply to a single model or a group of models. e.g., Feature Importances for Trees.
  • Model Agnostic: General methods applicable to any model. e.g., LIME.
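
A small sketch of the post-hoc, local-vs-global view using SHAP on a tree model; the dataset and model are only for illustration:

```python
# Minimal sketch: post-hoc explanations with SHAP on a tree model.
# Local = one row's SHAP values; global = mean |SHAP| per feature.
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)      # shape: (n_samples, n_features)

# Local: contribution of each feature to the prediction for row 0.
print(dict(zip(X.columns, np.round(shap_values[0], 2))))

# Global: rank features by mean absolute contribution across all rows.
importance = np.abs(shap_values).mean(axis=0)
print(sorted(zip(X.columns, importance), key=lambda kv: -kv[1])[:5])
```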
Ref - Link

From the AI Ethics Institute, key points - Link

  • Transparency and explainability gains may be significant
  • Explainable by justification - examples can give a better understanding
  • Explainability through feature importance - understanding the effect of features - SHAP (SHapley Additive exPlanations)
  • Abstracting key patterns identified in the deep learning models as actual features
  • The implications that different types of errors have, as well as what the right way of evaluating these errors should be

Keep Exploring!!!

Python + Data Pipelines

Data is fetched from multiple sources

Data integrity / purity tests

Lessons

ETL vs ELT

DAG / pre-requisites - remember the order of the sequence (see the sketch after this list)

  • Many possible orderings exist
  • CDC (change data capture) will have issues if the order is wrong
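
A minimal sketch of encoding that ordering as an Airflow DAG (task names and the schedule are placeholders); the dependency operator makes the required sequence explicit:

```python
# Minimal sketch: make the required order of pipeline steps explicit as a DAG.
# Airflow is used here for illustration; task names and schedule are placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    """Fetch data from multiple sources."""

def validate():
    """Run data integrity / purity checks before anything downstream."""

def load():
    """Load into the warehouse only after validation passes."""

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2023, 10, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_validate = PythonOperator(task_id="validate", python_callable=validate)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Pre-requisites: extract before validate before load.
    t_extract >> t_validate >> t_load
```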



Lesson

The longer bad data persists, the more cost you pay


Support backward-compatible schemas


Keep Exploring!!!