Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): May 2023

May 31, 2023

Empathy + Kindness

Depression = You cannot stop crying inside even when you smile outside
Empathy and a bit of kindness we need to carry around us
Life is beyond our jobs, our daily routine
We need to handle the challenges that pull us down emotionally and physically, every time they come
It's one life, be kind and keep going!!!

Keep going!!!

May 27, 2023

State of GPT

Brilliant talk on LLMs

Emerging Recipe to train

Pre-training - about 99% of the compute time, internet-scale dataset
Data mixture crawl, high-quality data, mixed up, sampled in proportion

Tokenization
Text to integer representation
Related to embeddings such as word2vec: each token id is later mapped to a vector
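
A minimal sketch of the "text to integer" idea. This is a toy word-level tokenizer written for illustration only; real LLM tokenizers use subword schemes such as byte-pair encoding, and the names `build_vocab`, `encode` and `decode` are made up here.

```python
def build_vocab(corpus):
    """Assign an integer id to every distinct lowercase word, in first-seen order."""
    vocab = {"<unk>": 0}  # id 0 is reserved for words not seen in the corpus
    for text in corpus:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    """Convert text to a list of integer token ids."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def decode(ids, vocab):
    """Convert token ids back to words."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)

vocab = build_vocab(["the cat sat", "the dog sat down"])
ids = encode("the dog sat", vocab)
print(ids)                 # [1, 4, 3]
print(decode(ids, vocab))  # the dog sat
```

The model never sees the words, only the integer ids; unknown words fall back to the reserved id 0 in this sketch.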

Params
Token size
Predict the next integer in the sequence
1.4 Trillion Tokens

Hyperparameters

Pretrain - Tokens to Data batches

Probability distribution of what comes next

Low loss higher correct probability
Learn powerful general representations
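
To make "low loss, higher correct probability" concrete, here is a small sketch of softmax plus cross-entropy for a single next-token prediction. The scores and the 4-token vocabulary are made up for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_id):
    """Cross-entropy loss: negative log probability of the correct next token."""
    return -math.log(softmax(logits)[target_id])

# Hypothetical scores over a 4-token vocabulary; token 2 is the true next token.
confident = [0.1, 0.2, 5.0, 0.3]
unsure = [1.0, 1.0, 1.0, 1.0]

print(next_token_loss(confident, 2) < next_token_loss(unsure, 2))  # True
```

The more probability the model puts on the correct token, the closer the loss gets to zero; a uniform guess over 4 tokens gives a loss of log(4).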

LLM + few-shot learning in practice
Transformer forced to multitask in next token
Forced to understand text, causes
Even better than finetuning: prompt them

Base models are not systems
A base model just completes the document it is given

Not very reliable
Supervised finetuning
Small high-quality datasets
Human contractors
Prompt-response collection
Swapping out the training set
QnA - low quantity - high quality
People follow the structure and create responses

Reward readout tokens
Quality of each completion
Reformulate loss function with ground truth
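
One common way to turn human rankings of completions into a loss is a pairwise ranking loss, as used in the InstructGPT paper. The notes above do not spell out the exact formulation from the talk, so treat this as a sketch under that assumption; the reward values are made up.

```python
import math

def pairwise_ranking_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), written as log(1 + exp(-diff))."""
    diff = reward_chosen - reward_rejected
    return math.log(1.0 + math.exp(-diff))

# The reward model scored the human-preferred completion higher: small loss.
agrees = pairwise_ranking_loss(2.0, 0.0)
# The reward model scored the rejected completion higher: large loss.
disagrees = pairwise_ranking_loss(0.0, 2.0)

print(agrees < disagrees)  # True
```

Minimizing this loss pushes the reward model to agree with the human ordering of completions.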

Reinforcement learning with respect to reward model
Reinforce for higher probabilities
base model
SFT model - supervised fine-tuning (SFT)
RM model - reward model (RM) training
RL model - reinforcement learning

Model Ranking

Applications

Template of article

Time spent on each token
Token simulators
Imitate next token
Fact based knowledge/parameters
Large working memory
Transformer direct access to memory
Chain of thought
Prompt it will revisit

Slow and fast reasoning
Step by step vs one-step process
Tree search algorithm

Chain / Agents
General techniques
The sequence of thought/observation

Ask for good performance
You are an expert on this Topic
In data distribution of sci-fi

Tell a prompt not good at
Use a calculator, Teach LLM to use tools
Retrieval only vs memory only
Retrieval-augmented models
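
A rough sketch of the retrieval-augmented idea: look up the most relevant document first, then put it into the prompt. Word-overlap cosine similarity stands in for a real embedding model here, and `retrieve`, `build_prompt` and the two documents are made up for illustration.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase word counts, ignoring punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents):
    """Return the document most similar to the question."""
    q = bag_of_words(question)
    return max(documents, key=lambda d: cosine(q, bag_of_words(d)))

def build_prompt(question, documents):
    """Place the retrieved document into the prompt as context."""
    context = retrieve(question, documents)
    return f"Context: {context}\nQuestion: {question}\nAnswer using only the context."

docs = [
    "The warehouse ships orders every weekday at 4 pm.",
    "Returns are accepted within 30 days of delivery.",
]
print(build_prompt("When are returns accepted?", docs))
```

The LLM then answers from the supplied context instead of relying only on what it memorized during training.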

Constrained prompting
Forcing templates
Output as json
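
A simple way to approximate "output as JSON" from the application side is to ask for JSON in the prompt and reject any response that does not parse. This is only a sketch: the schema keys, `build_prompt` and `parse_response` are hypothetical, and no LLM is called here.

```python
import json

REQUIRED_KEYS = {"sentiment", "features"}  # hypothetical schema for this example

def build_prompt(review):
    """Ask the model to answer with a fixed JSON template."""
    return (
        "Return only a JSON object with keys "
        '"sentiment" (string) and "features" (list of strings).\n'
        f"Review: {review}"
    )

def parse_response(raw):
    """Return the parsed dict, or None if the output is not valid JSON with the required keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data

good = '{"sentiment": "positive", "features": ["fast delivery", "good price"]}'
bad = "Sure! Here is the JSON you asked for..."
print(parse_response(good))
print(parse_response(bad))  # None
```

When `parse_response` returns None, the caller can retry the prompt. Constrained decoding libraries go further and force the template during generation, which this sketch does not do.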

base model clamped

Finetuning
Human contractors

Recommendations

Use cases

Keep Exploring!!!

May 24, 2023

My Learnings in Current Role

Learned how to create a product vision based on domain/product/competitors. Code vs Solution vs Product what things add up in each layer.
Several projects/iterations helped to get clear on how to Convert technical products into product/features
Code up / learn as needed to MVP / Docker / New models/papers
Learnings for product market fit by cost/competition/features
Applying domain knowledge when it comes to correlation vs causation
Re-use real-world knowledge / past experiences in supply chain/retail to trade off decisions in 80% scenarios vs 20% exceptions
Balance Domain + Data + Product all three lenses plus keep up with research as much as possible
More intentional learning to meet the above purpose than generic learning
Teaching to stay clear on thoughts/perspectives

Have to be a blend of product manager + coder + domain expert. Learning to support what sells better, focus where it matters.

Keep Exploring!!!

May 23, 2023

Career Advice

Not everyone will like you, Just move on. It's important to remember that not everyone will like you, and that's okay. Don't waste your time trying to please everyone. Focus on being the best version of yourself and doing your best work.

Do not underestimate your skills, Play according to your strengths. It's important to know your strengths and weaknesses and play to your strengths. Don't try to be something you're not.

Every company will have its own share of politics/impact/influence. Every company has its own culture and politics. It's important to be aware of these things and to be able to navigate them effectively.

Have a secondary passion - travel/learning. Having a secondary passion can help you to stay motivated and engaged in your work. It can also help you to develop new skills and knowledge that can be beneficial in your career.

Look beyond money/title, Be knowledgeable and employable. It's important to remember that money and title are not the only things that matter in a career. It's also important to be knowledgeable and employable. This means being able to learn new things, adapt to change, and be a valuable asset to your team.

Not everyone has the ability to connect domain - data and applications. Not everyone has the ability to see the big picture and connect domain, data, and applications. If you have this ability, it's a valuable asset that can help you to succeed in your career.

The more you learn, the more you can earn and remain employable. This is a fact. The more you learn, the more you can earn and the more employable you will be. It's important to never stop learning and growing in your career.

Enjoy the learning, forget the titles. It's important to enjoy the learning process and not get too caught up in titles. Titles are just a way of measuring your progress, but they shouldn't be the end goal.

Knowledge + Genuine passion + Experience = Satisfaction. This is a great formula for success in your career. If you have knowledge, passion, and experience, you will be successful in whatever you do.

Keep Exploring!!!

Questions

It's useless to teach people who have no clue how they are going to utilize it. As long as the world runs on certification and theoretical learning, no real innovation will come. Before you learn anything, question its purpose and outcome.

Have you coded in the past?
What is the motivation for this course?
What are the ideas you think you can do with this program?
Many people drop out when it comes to coding/hackathons. On a scale of 1-10, with strict project / other timelines, do you have the motivation to complete the training?

Keep Exploring!!!

My people / data / domain / AI ML learnings at Sensormatic

My journey/lessons

Setting up Team / Hiring all function QA / Development / Automation
Transition and Team Setup
Tools Development / Automation
Database Development
Upskill in AI / ML
Vision patents/hackathons / Sales Analytics / Sweet Hearting
Consulting in AI / ML
Retail tracking, fulfillment, out of stock, complete store operations
Holiday / Non-Holiday forecasting
CC tier - Cycle Counting / Scaling out system learning
It was a blend of DB, QA, AI+ML all of them
Consulting helped me step up into AI / ML, and PI edge devices, solving a wide array of problems

Keep Exploring!!!

My people / data / domain / database learnings at Microsoft

My Learnings

Dealt with hard data issues in Warranty
Improved system performance through performance tuning (XRR App, RL feeds, Reports)
Able to trade off reads vs writes for optimal performance (Order tracking / Centralized Status)
Handle domain issues / out of sequence and customize DB for it
Balance domain + data + customer service
I handle conflicts in a more empathetic way; I am not a "my way or no way" person
Even after 15 years, I still remember the key tables/workflows/services
What you learn/are passionate about you will never forget :)

Keep Going!!!

Good Read - GenAI - Productivity Boost

Keep Exploring!!!

GenAI - Entertainment industry

#GenAI will have a significant impact on the #Entertainment industry
Recreating movies with new colors/locations/film restorations
Automated content translations/language translations/lip sync
Deepfakes to customize/edit/introduce new scenes
Create easier / faster new environments / creative scenes
Different custom backgrounds
Video shorts/synthesis creative one-liners generation
A large amount of music inspiration generated with AI prompts
You can enrich/recreate different variations of content plus create more creative works
Multiple variations of content from a single source, reduced rework, and significantly different variations are possible

We are entering a world where you can write your experience to get ideas of what your imagination looks like. #GenAI

With #Bard, Rewrote it.

#GenAI for the entertainment industry.

Recreate movies with new colors, locations, and film restorations.
Automate content #translations, language translations, and lip sync.
Create #deepfakes to customize, edit, or introduce new scenes.
Create new environments and #creative scenes more easily and quickly.
Create different custom #backgrounds.
Generate #video shorts and creative one-liners.
Generate a large amount of musical inspiration with AI prompts.
Enrich, recreate, and create more creative variations of content.
Create multiple variations of content from a single source, reducing rework and creating significantly different variations.

We are entering a world where you can write your experience to get ideas of what your imagination looks like.

#GenAI has the potential to revolutionize the entertainment industry. By automating tasks, creating new content, and generating variations, GenAI can help to create more immersive, engaging, and personalized experiences for audiences.

Keep Exploring!!!

May 21, 2023

Datascience news sharing

There are 4 types of news that get shared in the data science community:

Link resharing: This is when people share links to news articles, blog posts, or other pieces of content about data science.
Analysis sharing: This is when people share their own analysis of data science news. This could include things like providing additional context, explaining the implications of the news, or offering their own opinions.
Research news sharing: This is when people share news about new research in data science. This could include things like new algorithms, new datasets, or new findings.
Tools sharing: This is when people share new tools, libraries, or other resources that can be used for data science.

It is important to be aware of the different types of news that are shared in the data science community so that you can find the information that is most relevant to you. You can also use this information to stay up-to-date on the latest trends and developments in data science.

Keep Exploring!!!

May 20, 2023

Data vs Ideas vs Perspectives vs New Ideas Papers

Share your analysis, not news. Every paper has some perspectives, so analyze and connect to past lessons
Data engineering has similarities to feature engineering
Feature engineering needs domain and data science lens
Data engineering needs ETL / ELT
Vector databases / Multi models merge all text, data, and audio into one form
Data science has multiple areas: forecasting, regression, recommendations, anomaly detection
NLP has all the NER, Summarization, Topic modeling, Sentiment Analysis
Vision has Classification, Segmentation, Object detection, Action recognition
2015-2018 - Age of ML (Regression, SVM, Decision Trees, Random Forest)
2015-2019 - CNN, RNN, LSTM
2020-2022 - Transformers, BERT
2023 - LLM Models, ChatGPT

More and more new tech will come, Filter signal from noise.

Bard has rewritten with more content

Data engineering and feature engineering are both important steps in the machine learning process.

Data engineering is the process of collecting, cleaning, and organizing data so that it can be used for machine learning.
Feature engineering is the process of transforming data into features that are useful for machine learning models.

Both data engineering and feature engineering are essential for creating accurate and reliable machine learning models.

Feature engineering requires a deep understanding of the domain and the data science lens.

The domain knowledge helps the feature engineer to understand the meaning of the data and how to transform it into features that are relevant to the problem at hand.
The data science lens helps the feature engineer to understand the statistical properties of the data and how to transform it into features that are useful for machine learning models.

Data engineering needs ETL (extract, transform, load) or ELT (extract, load, transform) processes.

ETL or ELT processes are used to collect, clean, and organize data so that it can be used for machine learning.
ETL processes typically involve extracting data from various sources, transforming it into a consistent format, and loading it into a data warehouse or data lake.
ELT processes typically involve extracting data from various sources, loading it into a data warehouse or data lake, and then transforming it into a consistent format.
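
The difference between ETL and ELT is only the order of the steps. A toy sketch, with a plain dict standing in for the data warehouse and made-up sales rows:

```python
def extract():
    """Pretend source rows with inconsistent formatting (made-up data)."""
    return [{"sku": " a1 ", "qty": "3"}, {"sku": "B2", "qty": "5"}]

def transform(rows):
    """Normalize SKUs and cast quantities to int."""
    return [{"sku": r["sku"].strip().upper(), "qty": int(r["qty"])} for r in rows]

def etl(warehouse):
    # Transform before loading: the warehouse only ever holds clean rows.
    warehouse["sales"] = transform(extract())

def elt(warehouse):
    # Load raw rows first, then transform inside the warehouse.
    warehouse["sales_raw"] = extract()
    warehouse["sales"] = transform(warehouse["sales_raw"])

etl_wh, elt_wh = {}, {}
etl(etl_wh)
elt(elt_wh)
print(etl_wh["sales"] == elt_wh["sales"])  # True
print(sorted(elt_wh))                      # ['sales', 'sales_raw']
```

Both end with the same clean table; ELT also keeps the raw copy, so the transformation can be redone later without going back to the source.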

Vector databases and multi-model databases are emerging technologies that can be used to store and process large amounts of data.

Vector databases are designed to store embedding vectors and retrieve them by similarity search; the embeddings can come from text, images, audio, or video. Multi-model databases are designed to store and process large amounts of data from a variety of sources, including text, audio, and video.
These technologies can be used to improve the performance of machine learning models that are trained on large amounts of data.
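
At its core, a vector database lookup is a nearest-neighbor search over embedding vectors. A brute-force sketch with made-up 3-dimensional embeddings (real systems use approximate indexes and much larger vectors; `top_k` is a name invented here):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query, index, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = sorted(index.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in scored[:k]]

# Hypothetical embeddings; in practice these come from an embedding model.
index = {
    "red shoe photo": [0.9, 0.1, 0.0],
    "blue shoe photo": [0.7, 0.6, 0.1],
    "invoice pdf": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], index))  # ['red shoe photo', 'blue shoe photo']
```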

Machine Learning has multiple areas of focus, including forecasting, regression, recommendations, and anomaly detection.

Forecasting is the process of predicting future values of a variable.
Regression is the process of finding a relationship between two or more variables.
Recommendations are the process of suggesting items to users based on their past behavior.
Anomaly detection is the process of identifying unusual or unexpected events.

These areas of focus are all important for data scientists who are working to solve real-world problems.
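
As a small illustration of anomaly detection, here is a z-score check: flag any value that sits more than a chosen number of standard deviations from the mean. The sales numbers and the threshold of 2.0 are made up, and this is only one simple technique among many.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return values whose z-score (distance from the mean in std devs) exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

daily_sales = [100, 102, 98, 101, 99, 103, 97, 250]
print(find_anomalies(daily_sales))  # [250]
```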

Natural language processing (NLP) is a field of computer science that deals with the interaction between computers and human (natural) languages.

NLP has a variety of applications, including text classification, sentiment analysis, and summarization.
Text classification is the process of assigning a category to a piece of text.
Sentiment analysis is the process of determining the sentiment of a piece of text, such as whether it is positive, negative, or neutral.
Summarization is the process of creating a shorter version of a piece of text that retains the most important information.

Computer vision is a field of computer science that deals with the extraction of meaningful information from digital images or videos.

Computer vision has a variety of applications, including image classification, object detection, and action recognition.
Image classification is the process of assigning a category to an image.
Object detection is the process of identifying objects in an image.
Action recognition is the process of identifying actions in a video.

The field of machine learning has seen rapid progress in recent years.

In the early 2010s, machine learning was primarily used for regression and classification tasks.
In the mid-2010s, deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), were developed and began to be used for a wider range of tasks, such as image classification and natural language processing.
In the late 2010s and early 2020s, even more powerful deep learning techniques, such as transformers, were developed and began to be used for a wider range of tasks, such as machine translation and text summarization.

The field of machine learning is constantly evolving and new technologies are emerging all the time.

It is important for data scientists to stay up-to-date on the latest trends in machine learning so that they can use the most effective techniques for solving real-world problems.

Keep Exploring!!!

LLM models Evaluation

Text to image models - Zoo

Text Prompt Models - Link

Product - Link

Prompt - Rewrite three positive features for below review as 3 points. Each point not more than 2 to 3 words. List only 3 points

Ref - Link

Keep Exploring!!!

May 19, 2023

Imagebind - Embed all vector spaces

Cross-modal retrieval, Composing modalities with arithmetic, Cross-modal detection and generation. Blend all digital senses image, audio, video, text
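
"Composing modalities with arithmetic" can be pictured as adding two embeddings from a shared space and retrieving the nearest item. The sketch below uses made-up 3-dimensional vectors and does not call the ImageBind model; `normalize`, `compose` and `nearest` are names invented for this example.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def compose(a, b):
    """Combine two embeddings by adding them and re-normalizing."""
    return normalize([x + y for x, y in zip(a, b)])

def nearest(query, candidates):
    """Name of the candidate with the highest dot product (cosine for unit vectors)."""
    return max(candidates, key=lambda name: sum(q * c for q, c in zip(query, candidates[name])))

# Made-up embeddings; the axes loosely mean (dog-ness, beach-ness, city-ness).
image_dog = normalize([1.0, 0.0, 0.1])
audio_waves = normalize([0.0, 1.0, 0.0])
gallery = {
    "dog on a beach": normalize([0.7, 0.7, 0.0]),
    "dog in a city": normalize([0.7, 0.0, 0.7]),
    "empty beach": normalize([0.0, 1.0, 0.1]),
}
query = compose(image_dog, audio_waves)
print(nearest(query, gallery))  # dog on a beach
```

An image embedding plus an audio embedding lands near items that contain both concepts, which is what makes cross-modal retrieval possible once all modalities share one vector space.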

Demo - Link

Keep Exploring!!!

May 18, 2023

Age of Nice UI

designer microsoft

instagram post for water purifier

THE AI DESIGN TOOL FOR BRANDED CONTENT

befunky

Photo Editing and Graphic Design Made for Everyone

booth.ai

Keep Exploring!!!

May 16, 2023

Encoder / Decoder Discussions

GPT

GPT-2 is built using transformer decoder blocks
GPT generates one token at a time, just like the decoder of the original transformer, and is trained with causal language modeling, so it is strictly a decoder-only model.
GPT-2 does not need the encoder part of the original transformer architecture. With no encoder, there are no encoder-decoder (cross) attention blocks, so each decoder block is equivalent to an encoder block except for the causal mask in self-attention.
It outputs the next word, and it is auto-regressive: each token in the sentence has the context of the previous words only

BERT

BERT, on the other hand, uses transformer encoder blocks
BERT gained the ability to incorporate the context on both sides of a word to gain better results
BERT outputs one representation per input token, which can be fed to a linear layer, and it is trained with masked language modeling, so it is strictly an encoder-only model.
BERT, by contrast, is not auto-regressive. It uses the entire surrounding context all-at-once.
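
The GPT vs BERT difference shows up in the attention mask. A minimal sketch, where 1 means "position i may attend to position j" (the function names are made up for illustration):

```python
def causal_mask(n):
    """Decoder-style (GPT): position i may attend to positions 0..i only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """Encoder-style (BERT): every position may attend to every position."""
    return [[1] * n for _ in range(n)]

for row in causal_mask(4):
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

The lower-triangular mask is what makes GPT auto-regressive; the all-ones mask is what lets BERT use context on both sides of a word.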

Decoder (in the original encoder-decoder transformer) - cross-attention lets it pay attention to specific segments of the encoder output

Ref - Link1, Link2

Keep Exploring!!!

May 11, 2023

Hormones

Stress - When we experience stress, our bodies release hormones like epinephrine (adrenaline), cortisol, and norepinephrine.
Cortisol helps your body respond to stress, regulate blood sugar, and fight infections.
Increased stress leads to increased blood pressure and heart rate, muscle tension, and the digestive system slamming to a halt, resulting in nausea, vomiting, and diarrhea.
Moods - serotonin is a hormone that affects your mood, appetite and sleep; a lack of sunlight may lead to lower serotonin levels, which is linked to feelings of depression
Sleep - Melatonin plays an important role in regulating human sleep
Relationships - oxytocin plays a role in various behaviors, including orgasm, social recognition, bonding, and maternal behavior.
Happy hormone - dopamine, often called the "happy hormone", results in feelings of well-being and is a primary driver of the brain's reward system

Keep Exploring!!!

Vision Project Ideas

Background Remover lets you Remove Background from images and video

PyMatting: A Python Library for Alpha Matting

DeepFloyd IF, a powerful text-to-image model

Reverse image search helps you search for similar or related images given an input image.

Replace Anything

Lama Cleaner

editanything

Keep Exploring!!!

May 09, 2023

ChatGPT - Product Ideas

Keep Exploring!!!

May 07, 2023

Age of Industrialized AI // Dan Jeffries // LLMs in Production Conference

Key notes

1500 people efforts done by 100 people
AI can add value, cut down some challenges
Large Language Models to Large Thinking Models

AI is a collaboration partner

Intelligence in everything
Industrialized AI
Problems solved in isolation
Problems and Opportunities to integrate
LLM rudimentary reasoning engine

Massive and open-ended systems

Error rate will reduce in coming years

newer versions, more consistent reasoning engines will come
add keywords under the hood

heuristic-based hacks
alter prompts

human-based domain knowledge baked in models
generalized models
LLM to label data

Model patching

Adaptors - LoRA
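
The idea behind LoRA adapters is to keep the base weight matrix W frozen and learn a small low-rank update, so the effective weight is W + (alpha / r) * B A. A plain-Python sketch with a made-up 3x3 weight and a rank-1 adapter (this is not the API of any LoRA library):

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_weight(w, a, b, alpha, rank):
    """Effective weight W + (alpha / rank) * (B @ A); W itself stays frozen."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

# Frozen base weight (identity, for readability) and a rank-1 adapter.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [[0.5], [0.0], [0.0]]  # shape 3 x 1
a = [[0.0, 2.0, 0.0]]      # shape 1 x 3
adapted = lora_weight(w, a, b, alpha=1.0, rank=1)
print(adapted[0])  # [1.0, 1.0, 0.0]

# Trainable parameter count for a 4096 x 4096 layer with rank 8.
full_params = 4096 * 4096
lora_params = 8 * (4096 + 4096)
print(full_params, lora_params)  # 16777216 65536
```

Only A and B are trained, which is why an adapter is far smaller than the layer it modifies.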

My Summary

There will be a transition from LLM to LTM
Models will learn to reason/validate
LLM can be used to generate labels
LLM can be trained for custom domains
LLM + Reasoning + Continual Learning + Iterations
The newer architecture will emerge to address
Industrialization of domain-specific models and the ability to reason is the way to go

Keep Exploring!!!
