"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

May 31, 2022

Motivation for Non-convex Optimization

  • Extremely high dimensional spaces
  • Web-scale document classification problems
  • Imposition of structural constraints on the learning models being estimated from data
  • Structural constraints often turn out to be non-convex.
  • Non-convex optimization techniques, such as sparse recovery, help discard irrelevant parameters and promote compact and accurate models.

Ref - Link



Why do neural nets need to be non-convex?
  • Neural networks are universal function approximators
  • With enough neurons, they can learn to approximate any function arbitrarily well
  • To do this, they need to be able to approximate non-convex functions

Basically, since hidden units within a layer can be permuted without changing the network's output, any minimum has multiple equivalent weight configurations that achieve the same loss, and thus the loss function cannot be convex (or concave either).
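
A quick way to see this permutation symmetry (a minimal sketch, assuming NumPy; the tiny 2-hidden-unit network and random weights are made up for illustration):

    import numpy as np

    # Tiny 1-hidden-layer net: y = W2 @ tanh(W1 @ x)
    rng = np.random.default_rng(0)
    x = rng.normal(size=3)
    W1 = rng.normal(size=(2, 3))   # hidden-layer weights (2 hidden units)
    W2 = rng.normal(size=(1, 2))   # output-layer weights

    def forward(W1, W2, x):
        return W2 @ np.tanh(W1 @ x)

    # Swap the two hidden units: permute rows of W1 and columns of W2
    P = np.array([[0, 1], [1, 0]])
    out_original = forward(W1, W2, x)
    out_permuted = forward(P @ W1, W2 @ P.T, x)

    print(np.allclose(out_original, out_permuted))  # True: different weights, same function

Two distinct points in weight space produce exactly the same function, so any minimum comes with equally good copies of itself, which rules out convexity.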


Keep Exploring!!!




May 30, 2022

Backpropagation - Different interesting perspectives

After class, students summary of backpropagation concept :)

Perspective #1 - Back Propagation is tuning the weights of a neural network based on the error rate obtained in the previous iteration

Perspective #2 - It is a process of updating the weights & bias at each layer to minimize the error rate

Perspective #3 - Forward propagation is moving forward step by step; backward propagation is adjusting the sails to move in one's defined direction...

Perspective #4 - 1. Calculate the output by forward prop, 2. Calculate the error, 3. Minimize the error by backprop, 4. Update the parameters, 5. Repeat till convergence
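
Perspective #4 maps almost line-for-line onto a standard training loop. A minimal sketch, assuming PyTorch, with a made-up one-parameter regression problem:

    import torch
    import torch.nn as nn

    # Made-up toy data: learn y = 2x + 1
    x = torch.linspace(-1, 1, 64).unsqueeze(1)
    y = 2 * x + 1

    model = nn.Linear(1, 1)
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for step in range(200):
        y_hat = model(x)            # 1. calculate the output by forward prop
        loss = loss_fn(y_hat, y)    # 2. calculate the error
        optimizer.zero_grad()
        loss.backward()             # 3. backprop: gradients of the loss w.r.t. weights and bias
        optimizer.step()            # 4. update the parameters
        # 5. repeat till the loss converges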

Perspective #5 - Backpropagation: a method or algorithm to find the optimal values of weights and biases to minimise the loss function

Perspective #6 - We feed the cumulative input to the neuron and apply the activation function, compare the output to the actual output, and update the weights and biases. Repeat the cycle until we get the correct output

Perspective #7 - Basically, to reduce the loss, we change the weights using forward and backward passes

Keep Thinking!!!

May 28, 2022

Topic Modelling - LDA, LSA

  • LDA stands for Latent Dirichlet Allocation, and it is a type of topic modeling algorithm
  • LDA was developed in 2003 by researchers David Blei, Andrew Ng and Michael Jordan
  • LDA is based on a Bayesian framework. This allows the model to infer topics based on observed data (words) through the use of conditional probabilities
  • The main difference between LSA and LDA is that LDA assumes that the distribution of topics in a document and the distribution of words in topics are Dirichlet distributions. LSA does not assume any distribution and therefore, leads to more opaque vector representations of topics and documents
  • Latent Semantic Analysis or Latent Semantic Indexing – Uses Singular Value Decomposition (SVD) on the Document-Term Matrix
  • In practice, LSA is much faster to train than LDA, but has lower accuracy.

Example
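
A minimal sketch of both on a toy corpus, assuming scikit-learn (the four documents and the choice of 2 topics are made up for illustration):

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD

    docs = [
        "cats and dogs are pets",
        "dogs chase cats",
        "stocks and bonds are investments",
        "investors buy stocks",
    ]

    # LDA: Bayesian model over word counts, with Dirichlet priors on the topic mixtures
    counts = CountVectorizer().fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
    print(lda.transform(counts))   # per-document topic distributions (rows sum to 1)

    # LSA: plain SVD on the (tf-idf weighted) document-term matrix
    tfidf = TfidfVectorizer().fit_transform(docs)
    lsa = TruncatedSVD(n_components=2, random_state=0).fit(tfidf)
    print(lsa.transform(tfidf))    # per-document coordinates; not probabilities, hence "more opaque"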


Keep Thinking!!!

SVD vs PCA vs NMF

Singular value decomposition and principal component analysis are two eigenvalue methods used to reduce a high-dimensional dataset into fewer dimensions while retaining important information

SVD

  • SVD performs low-rank matrix approximation
  • The SVD procedure finds the optimal rank-k approximation (the top-k singular vectors)

PCA, on the other hand, is:

  • 1) Subtract the mean sample from each row of the data matrix.
  • 2) Perform SVD on the resulting matrix.
  • The core idea behind PCA uses the result obtained through SVD as its backbone (see the sketch after the NMF note below).

NMF: Non-negative matrix factorization. PCA and NMF optimize for a different result. PCA finds a subspace that conserves the data's variance, while NMF finds nonnegative features.
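
A minimal sketch of the PCA-via-SVD recipe above and of NMF, assuming NumPy and scikit-learn (the random data matrix and k=2 are made up):

    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    X = rng.random((100, 5))            # toy data matrix: 100 samples, 5 features

    # PCA via SVD: 1) subtract the mean row, 2) SVD of the centered matrix
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = 2
    scores = Xc @ Vt[:k].T              # projection onto the top-k principal components
    print(scores.shape)                 # (100, 2)

    # NMF: factor the non-negative data into non-negative W and H
    nmf = NMF(n_components=k, init="nndsvda", random_state=0, max_iter=500)
    W = nmf.fit_transform(X)            # compressed, non-negative representation
    H = nmf.components_
    print(W.shape, H.shape)             # (100, 2) (2, 5)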

PCA is highly recommended when you have to transform high dimensions into low dimensions and you are okay to lose the original features in the process as new ones are introduced.

The output of NMF can be visualized as a compressed version of the original dataset

Recommendation systems, topic modeling, and image compression all use the same concepts: PCA, SVD, NMF...

Keep Thinking!!!

May 24, 2022

Agility / Influence / Collaboration / Work Pressure

In Software development, Influence without authority is a critical team member/mentor/manager trait.

Agile is not about

  • One sprint for functionality
  • One sprint for performance
  • One sprint for refactor

With the world going by weekly demos, Agile means

  • Collaborating, bringing the best ideas, and building with prioritized ideas
  • Evaluating; if it fails, going back to basics and building again
  • Being quick to recognize mistakes, humble enough to rework, and listening and evaluating in the interest of time

Work pressure and deadlines are byproducts of poor design choices and poor execution versus great planning. Knowledge does not come with deadlines; it comes with learning and experimentation.

When you learn

  • It takes 3X effort to try different code and understand it
  • When you repeat it, it takes 2X effort as most unknowns are cleared
  • When you master it, it takes X effort as it is proven / has worked / is familiar to you

Everything takes time. In a world driven by demos and communication, the real effort is doing meaningful work by proactively including design and performance and picking the best ideas without considering who / what / why.

Keep Thinking!!!

May 23, 2022

Engineering Productivity Myth and Reality

This link was helpful in triggering the thoughts. The framework is interesting

From Work Perspective

  • Design documents and specs
  • Work items, pull requests, commits
  • Code reviews/quality of reviews
  • CI / CD - Count of build, test, deployment/release
  • What I would also add:
  • Performance and scale while designing
  • Field Testing to know it meets the customer

From General - work-life aspects

  • Extensive and effective communication
  • Block your peers to unblock you
  • Lack of communication implies a lack of coordination

Mindset and culture play a key role

From ref

  • Create a solution without understanding how it all works
  • Continue creating a solution without understanding how it all works
  • After something has caused you multiple issues, don’t stop and reconsider your approach

My perspectives - The ability to have quality ideas and quality alternatives in the interest of time, and to come back before it's too late, is essential.

The culture has to evolve (Ref)

  • Humble — self-aware, intellectually honest
  • Know their business inside and out
  • Recruit dream people to the cause

Be clear about what you don't know and how you plan to achieve it

My perspectives - Quality ideas, quick experiments, balancing time, and not ending up with a half-baked product.

More reads

Link1, Link2, Link3, 1-1 discussions, Link4

Keep Thinking!!!

May 20, 2022

Upskilling = Learning Tech + Marketing + Teamwork + Customer Lens

Upskilling in your late 30s is mixed learning. It's not like mere education in grad school days. It's a mix of tech and business, finding the best of both to make it your competency.

Technical Competency - 

  • Whatever you pick, understand the basics
  • Upskill with courses, projects, and dedicated focus
  • Apply the lessons/leverage in your job / apply the learning from your experience perspective

Markets / Products

  • Awareness of offerings/products
  • Map the demands vs use cases vs areas to apply your tech and business learning
  • Collaborate to build possible MVP with stakeholders, business
  • Sell the value, show the MVP, get the participation

MVP to Production

  • Work ahead to plan / look for the product vision
  • Map the factors of scale and performance
  • Be clear on data aspects to map/mimic the architecture
One interesting note on different parts of work

Keep Thinking!!!

May 18, 2022

Getting things done in Data Science

  • 2015-17 - As you learn, what you don't know feels like more to learn; what you know about the domain felt not that important
  • 2020 - Reality is that both domain and data science are equally important
  • 2022 - Solving and selling are equally important; being on the same page and aligned is key

Keep Thinking!!!

May 17, 2022

A very interesting read - Personalization can also lead to unfair pricing

When you visit several times, your interest is captured in clicks. The more you visit, the more interest you have to engage.

Tinder - participants aged 30-49 on average paid 65.3% more than those aged 18-29

Age group 30-49 - They earn more than teens, they can afford it, and they convert at higher rates. The fear of missing out on settling down / having a family makes them engage, search, and end up paying more :)

Reference - Link1, Link2

Keep Exploring!!!



May 16, 2022

Big company, More Data, Smaller Dataset, Medium Sized Models - Challenges at Different Levels - Different Data Science Backlogs

FAANG

  • FAANG companies have no shortage of data: loads of streaming data, insights, and all types of clicks
  • Model complexity and large-scale training and deployment are the challenges

Domain (Automotive / Retail) - Next List after Core Companies

  • Companies adopting Data Models
  • Companies aiming for DBT
  • The Data Science challenges when you are still ahead of data collection and data maturity are different
  • Data collection, engaging, and selling the Data Science use case become the elementary steps
  • Mid to small scale models deployed based on business needs
  • Learning at both places is different. Challenges are different.

You can specialize in multiple areas

  • NLP  
  • Vision  
  • Recommendations
  • Forecasting
  • Anomaly Detection

Business knowledge + Feature knowledge + Impact + Selling + Building + Deploying is a never-ending learning curve :)

Keep Thinking!!!

May 15, 2022

There is no single formula for product success.

You start with an idea

  • You adjust your product based on the market
  • You add features relevant to adoption
  • You designed X and sold Y, but learnt X to Y during the initial days
  • Sometimes process wins, sometimes the idea wins, sometimes consistency wins, sometimes agility wins
  • You are selling to another human; empathy and support go a long way toward adoption
  • Sales and Support are two parallel systems, like the lungs and heart, which keep your company and product alive :)

Keep Thinking!!!

May 14, 2022

Experience

Experience = Able to predict the future based on past
Experience = Able to unblock the team when they are blocked
Experience = Read up beforehand based on intuition
Experience = Aware of alternatives/tradeoffs
Experience = Negotiate based on what's best for the customer not your own bias
Experience = Connect and align with purpose not giving orders
Experience = Unlearn, Relearn
Experience = Go back to basics and rework, pick your priorities
Experience = Embrace uncertainty and embed optimism :)

Experience = It's okay to be yourself

Keep Going!!!

May 13, 2022

Its okay, Keep Walking

You may work on multiple problems
You may forget a few things you have worked on

You may relearn it every 10 years; still, it feels like everything is new
You may fail in a few experiments, succeed in a few

You may learn more than one language
You may be happy with a few milestones

When you look back, suddenly you are in your 40s
Days and nights have gone fast

The 4 walls of the office and my laptop kept me engaged
Some production deployment memories, some bug fixes, some not-so-good memories
Some travel was with the team and some was solo

Keep Going!!!

Developer Lens vs Customer Lens

We have a new release of SQL, can we learn/try it (Developer point of view)

Database loads are going to increase for the holiday season, we need to be prepared, Can we migrate to the latest DB Version to handle performance (Customer needs it)

Balancing both lenses is leadership 😊


May 12, 2022

Hair Styles - segmentation - paper reads

Paper #1 - Barbershop - Segmentation Masks

Git code  - Link

Notes

  • GAN-based semantic alignment step which generates high quality images similar to the input images
  • The shape of the hair is the binary segmentation region, and the identity of a head-image

Architecture Ref


  • Segmentation network such as BiSeNet
  • Alignment in W+ space
  • A close-up view of the face (top) and hair (bottom) in W+ space
  • Close-up views after details are transferred

(1) Reconstruction: a latent code C_rec is found to reconstruct the input image I_k

(2) Alignment: a nearby latent code C_align is found that minimizes the cross-entropy between the generated image and the target mask M.

  • manipulating segmentation masks and copying content from different reference images.
  • Copy and replace eyes / nose / lips area pixels and values
  • Copy the landmark areas / eyes / face lips
  • Copy all boundaries of key facial landmarks

Face parsing - link


Paper #2 - Face shape classification using Inception v3



Paper #3 - CelebHair: A New Large-Scale Dataset for Hairstyle Recommendation based on CelebA




Paper #4 - Fashion Meets Computer Vision: A Survey





Beauty Opportunities





Project - Link

To deploy this app, follow the procedure below in a GCP console terminal:

  • Clone repository: git clone https://github.com/shawnhan108/BiSeNet-app.git.
  • Set project ID: export PROJECT_ID=bisenet.
  • Build docker image: docker build -t gcr.io/bisenet/bisenet-app:v1 ..
  • Authorize docker: gcloud auth configure-docker.
  • Push docker image to Container Registry: docker push gcr.io/bisenet/bisenet-app:v1.
  • Set computing region: gcloud config set compute/zone us-central1-a.
  • Create a Kubernetes Engine cluster: gcloud container clusters create bisenet-cluster --num-nodes=2.
  • Create a Kubernetes deployment of the app: kubectl create deployment bisenet-app --image=gcr.io/bisenet/bisenet-app:v1.
  • Expose the app with a Load Balancer service: kubectl expose deployment bisenet-app --type=LoadBalancer --port 80 --target-port 8080.
  • Go to the browser and test the app: http://[EXTERNAL-IP], where EXTERNAL-IP can be obtained using kubectl get service.

Keep Thinking!!!

Image Super resolution

Paper - Link

  • Super-Resolution Generative Adversarial Network
  • adversarial loss and perceptual loss
  • Improve the network structure by introducing the Residual-in-Residual Dense Block (RRDB)
  • Perceptual loss by using the VGG features
  • Basic architecture

  • Replace the original basic block with the proposed Residual-in-Residual Dense Block (RRDB)

GAN Loss functions

  • minimax loss: The loss function used in the paper that introduced GANs.
  • Wasserstein loss: The default loss function for TF-GAN Estimators. First described in a 2017 paper. (A minimal sketch of both losses follows below.)
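
A minimal sketch of the two formulations, assuming PyTorch; d_real and d_fake stand in for discriminator/critic outputs on real and generated batches (the random tensors are placeholders, and the generator loss below is the commonly used non-saturating variant):

    import torch
    import torch.nn.functional as F

    d_real = torch.randn(8, 1)   # discriminator/critic logits on real images (placeholder)
    d_fake = torch.randn(8, 1)   # discriminator/critic logits on generated images (placeholder)

    # Minimax (original GAN) losses on logits
    d_loss_minimax = (
        F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
        + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    )
    g_loss_minimax = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))  # non-saturating form

    # Wasserstein losses: the critic outputs raw scores (no sigmoid)
    d_loss_wasserstein = d_fake.mean() - d_real.mean()
    g_loss_wasserstein = -d_fake.mean()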

Ref - link

Demo and Sample results

Tensorflow colab code - link

Keep Exploring!!!

May 09, 2022

Deep Learning Revisions

It's always good to take a pause/revise / add a few more learning pointers :)


Key Notes
  • ML operates on handcrafted features
  • DL features are learned directly from data
  • Data prevalent, Parallelizable models / hardware, GPU/ CUDA, TF / Pytorch
  • Activation functions and their differentiation


  • Non-linear activation functions help to build non-linear decision boundaries






  • Text - sequence of characters / words
  • Stock prices / DNA sequences
  • Temporal dimension to models
  • Same network applied once for each timestep
  • Horizontal to vertical view
  • Each output is connected / is input to the next timestep
  • Internal memory / state maintained


  • Individual loss for each timestep
  • Backprop across all timesteps
  • Forward pass across time




  • Backpropagate through time (a minimal sketch follows this list)
  • Loss with respect to the internal state
  • Attention
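
A minimal sketch of per-timestep losses summed over a sequence and then backpropagated through time, assuming PyTorch (the toy sequence, targets, and dimensions are made up):

    import torch
    import torch.nn as nn

    seq_len, batch, n_in, n_hidden = 10, 4, 3, 8
    x = torch.randn(seq_len, batch, n_in)        # toy input sequence
    targets = torch.randn(seq_len, batch, 1)     # toy per-timestep targets

    rnn = nn.RNN(n_in, n_hidden)                 # the same weights are reused at every timestep
    readout = nn.Linear(n_hidden, 1)

    h = torch.zeros(1, batch, n_hidden)          # internal memory / state
    loss = 0.0
    for t in range(seq_len):                     # forward pass across time
        out, h = rnn(x[t:t + 1], h)              # one timestep; state carried forward
        loss = loss + nn.functional.mse_loss(readout(out), targets[t:t + 1])  # individual loss per timestep

    loss.backward()                              # backprop through time: gradients flow through every timestep's state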

Ref - Course Link

Keep Thinking!!!

Feedback / Retrospect your work / Quality of Your Work

  • How many problems did we solve in the past 2 weeks?
  • How many problems have similar issues / same solutions applicable?
  • Did we think of the consequences when we fixed it?
  • Are we falling short on thinking / biased by deadlines?
  • Did the fixes meaningfully add value or overhead?
  • How realistically do we rate the quality of the solutions?
  • Did we do enough homework on options, or is our thinking limited?
  • How much testing did we do?
  • Did we call out the flaws and stay transparent about the output / next steps?

Productivity is not about being engaged; it's about value delivered :)


Keep Thinking!!!

ML Datasets

Keep Exploring!!!


May 07, 2022

Generative VS Discriminative Models


Generative Models - They learn everything in depth. A Generative Model explicitly models the actual distribution of each class. It learns the joint probability distribution p(x, y) and predicts the conditional probability with the help of Bayes' Theorem. A joint probability is the likelihood of more than one event occurring at the same time.

Generative classifiers

  • Naïve Bayes
  • Bayesian networks
  • Markov random fields
  • Hidden Markov Models (HMM)

The Naive Bayes (NB) classifier is a generative model, which builds a model of each possible class based on the training examples for each class. Then, in prediction, given an observation, it computes the predictions for all classes and returns the class most likely to have generated the observation.

HMM

  • The Hidden Markov Model (HMM) is a relatively simple way to model sequential data
  • An HMM consists of two components: a series of discrete-state, time-homogeneous, first-order Markov chains (MC) with suitable transition probabilities between states, and an initial distribution
  • Model the probabilities of different states and the rates of transitions among them
  • HMMs take a generative approach to labeling, defining a joint distribution over label and observation sequences (constrained to binary transition and emission feature functions, which force each word to depend only on the current label and each label to depend only on the previous label)
  • Markov Assumption - the probability of a particular state depends only on the previous state (see the sketch below)
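
A tiny sketch of the Markov assumption, assuming NumPy, with a made-up 2-state weather chain: the next state is sampled using only the current state.

    import numpy as np

    states = ["sunny", "rainy"]
    # Made-up transition probabilities: P[i, j] = P(next = j | current = i)
    P = np.array([[0.8, 0.2],
                  [0.4, 0.6]])

    rng = np.random.default_rng(0)
    current = 0                                  # start in "sunny"
    sequence = [states[current]]
    for _ in range(10):
        current = rng.choice(2, p=P[current])    # depends only on the current state
        sequence.append(states[current])
    print(sequence)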

Discriminative model

Discriminative models - They learn the differences between what they saw. A Discriminative Model models the decision boundary between the classes. It learns the conditional probability distribution p(y|x).

Discriminative Classifiers

  • Logistic regression
  • Support Vector Machine
  • Traditional neural networks
  • Nearest neighbour
  • Conditional Random Fields (CRFs)

In contrast, discriminative models, like logistic regression, try to learn which features of the training examples are most useful to discriminate between the different possible classes.
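
A minimal sketch contrasting the two on the same toy data, assuming scikit-learn (the synthetic dataset is made up):

    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # Generative: Naive Bayes models p(x | y) per class (plus class priors), then applies Bayes' theorem
    nb = GaussianNB().fit(X, y)
    print(nb.theta_.shape)        # per-class feature means it has modelled: (2, 5)

    # Discriminative: logistic regression models p(y | x) / the decision boundary directly
    lr = LogisticRegression().fit(X, y)
    print(lr.coef_.shape)         # one weight per feature for the boundary: (1, 5)

    print(nb.predict_proba(X[:3]), lr.predict_proba(X[:3]))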


Ref  - Link

Keep Thinking!!!

Loss Functions - Deep Learning

The choice of loss depends on the desired output (e.g., classification vs. regression)

Regression Loss Functions

  • Mean Squared Error Loss
  • L1 Loss
  • L2 Loss
  • Mean Squared Logarithmic Error Loss
  • Mean Absolute Error Loss

L2 norm / mean squared error - The mean squared error is probably the most straightforward: you take the difference between the result and the ground truth for a sample, square it, and average over the samples.

The L1 loss is basically the absolute value of the difference between the current sample's actual output and the desired output.
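
A minimal sketch of the two, assuming NumPy (the prediction and target vectors are made up):

    import numpy as np

    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.5, 1.5, 2.0])

    mse_loss = np.mean((y_pred - y_true) ** 2)    # L2 / mean squared error
    l1_loss = np.mean(np.abs(y_pred - y_true))    # L1 / mean absolute error
    print(mse_loss, l1_loss)                      # 0.5 and ~0.667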

Binary Classification Loss Functions

  • Binary Cross-Entropy
  • Hinge Loss
  • Squared Hinge Loss

Multi-class Classification Loss Functions

  • Multi-class Cross Entropy Loss
  • Sparse Multiclass Cross-Entropy Loss
  • Kullback Leibler Divergence Loss

The negative log-likelihood loss is based on the idea that every output represents a likelihood for a particular class. It aims to make the output for the correct class as high as possible and for the others as small as possible.

Cross entropy loss - The cross entropy loss is very popular for classification problems. The losses are averaged across observations for each minibatch

Kullback-Leibler Divergence Loss - Measures how far one probability distribution is from another (see the sketch below)
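
A minimal sketch of these classification losses, assuming PyTorch (the logits, class indices, and target distribution are made up):

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[2.0, 0.5, -1.0],
                           [0.1, 1.5, 0.3]])      # made-up scores for 3 classes
    targets = torch.tensor([0, 1])                # correct class indices

    # Cross entropy (averaged over the minibatch); equivalent to log_softmax + NLL
    ce = F.cross_entropy(logits, targets)
    nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)
    print(ce, nll)                                # same value

    # KL divergence between the predicted distribution and a target distribution
    target_dist = torch.tensor([[0.9, 0.05, 0.05],
                                [0.1, 0.8, 0.1]])
    kl = F.kl_div(F.log_softmax(logits, dim=1), target_dist, reduction="batchmean")
    print(kl)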

Ref - Link1, Link2

Keep Thinking!!!

May 04, 2022

Discussions - Perspectives - Code - Observe - Learn

Above my levels

  • Aligned and connected
  • Convey the answers in terms of impact and costs, better aligned to the vision

Within my Team

  • Pick over priorities / Prioritizing things
  • Working on timelines without overdoing or chewing more than you can
  • 50% learning, 30% Experimenting, 20% Alignment

My to-do list

  • Experiment / Code / Unblock 
  • Plan ahead / get the required perspective for decisions
  • Reverse engineer / Analyze competitive products / Build / identify features

WFH is - Alert / Aware / Available. All this applies only in IT :)

Keep Thinking!!!

May 03, 2022

Segmentation Notes


Loss functions for image segmentation, Link1

Loss Functions in Segmentation

Image segmentation can be thought of as a classification task at the pixel level, and the choice of loss function for the segmentation task is key in determining both the speed at which a machine-learning model converges and, to some extent, the accuracy of the model.

Ref - Link

Dice loss - This loss is obtained by calculating the smooth Dice coefficient. It is the most commonly used loss in segmentation problems.
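
A minimal sketch of a smooth Dice loss for binary masks, assuming PyTorch (the smoothing constant and toy tensors are made up):

    import torch

    def dice_loss(pred_logits, target, smooth=1.0):
        # Soft Dice loss for binary segmentation: 1 - Dice coefficient
        pred = torch.sigmoid(pred_logits)                 # probabilities from logits
        pred, target = pred.flatten(1), target.flatten(1)
        intersection = (pred * target).sum(dim=1)
        dice = (2 * intersection + smooth) / (pred.sum(dim=1) + target.sum(dim=1) + smooth)
        return 1 - dice.mean()

    # Toy usage: a batch of 2 single-channel 4x4 predicted masks and ground-truth masks
    logits = torch.randn(2, 1, 4, 4)
    masks = torch.randint(0, 2, (2, 1, 4, 4)).float()
    print(dice_loss(logits, masks))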

Ref - Link

Keep Thinking!!!