"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

May 31, 2022

Motivation for Non-convex Optimization

  • Extremely high dimensional spaces
  • Web-scale document classification problems
  • Imposition of structural constraints on the learning models being estimated from data
  • Structural constraints often turn out to be non-convex.
  • Non-convex optimization techniques, such as sparse recovery, help discard irrelevant parameters and promote compact and accurate models.

Ref - Link



Why do neural nets need to be non-convex?
  • Neural networks are universal function approximators
  • With enough neurons, they can learn to approximate any function arbitrarily well
  • To do this, they need to be able to approximate non-convex functions

Basically, since hidden units within a layer can be permuted without changing the network's output, any minimum has multiple equivalent weight configurations that achieve the same loss, and thus the loss function cannot be convex (or concave either).
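
A quick way to see this permutation symmetry (a minimal sketch, assuming NumPy; the tiny 2-hidden-unit network and random weights are made up for illustration):

    import numpy as np

    # Tiny 1-hidden-layer net: y = W2 @ tanh(W1 @ x)
    rng = np.random.default_rng(0)
    x = rng.normal(size=3)
    W1 = rng.normal(size=(2, 3))   # hidden-layer weights (2 hidden units)
    W2 = rng.normal(size=(1, 2))   # output-layer weights

    def forward(W1, W2, x):
        return W2 @ np.tanh(W1 @ x)

    # Swap the two hidden units: permute rows of W1 and columns of W2
    P = np.array([[0, 1], [1, 0]])
    out_original = forward(W1, W2, x)
    out_permuted = forward(P @ W1, W2 @ P.T, x)

    print(np.allclose(out_original, out_permuted))  # True: different weights, same function

Two distinct points in weight space produce exactly the same function, so any minimum comes with equally good copies of itself, which rules out convexity.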


Keep Exploring!!!




May 30, 2022

Backpropagation - Different interesting perspectives

After class, students summary of backpropagation concept :)

Perspective #1 - Back Propagation is tuning the weights of a neural network based on the error rate obtained in the previous iteration

Perspective #2 - It is a process of updating the weights & bias at each layer to minimize the error rate

Perspective #3 - Forward propagation is moving forward step by step; backward propagation is adjusting the sails to move in one's defined direction...

Perspective #4 - 1. Calculate the output by forward prop, 2. Calculate the error, 3. Minimize the error by backprop, 4. Update the parameters, 5. Repeat till convergence
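
Perspective #4 maps almost line-for-line onto a standard training loop. A minimal sketch, assuming PyTorch, with a made-up one-parameter regression problem:

    import torch
    import torch.nn as nn

    # Made-up toy data: learn y = 2x + 1
    x = torch.linspace(-1, 1, 64).unsqueeze(1)
    y = 2 * x + 1

    model = nn.Linear(1, 1)
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for step in range(200):
        y_hat = model(x)            # 1. calculate the output by forward prop
        loss = loss_fn(y_hat, y)    # 2. calculate the error
        optimizer.zero_grad()
        loss.backward()             # 3. backprop: gradients of the loss w.r.t. weights and bias
        optimizer.step()            # 4. update the parameters
        # 5. repeat till the loss converges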

Perspective #5 - Backpropagation: a method or algorithm to find the optimal values of weights and biases to minimise the loss function

Perspective #6 - We feed the cumulative input to the neuron and apply the activation function, compare the output to the actual output, and update the weights and biases. Repeat the cycle until we get the correct output

Perspective #7 - Basically, to reduce the loss, we change the weights using forward and backward passes

Keep Thinking!!!

May 28, 2022

Topic Modelling - LDA, LSA

  • LDA stands for Latent Dirichlet Allocation, and it is a type of topic modeling algorithm
  • LDA was developed in 2003 by researchers David Blei, Andrew Ng and Michael Jordan
  • LDA is based on a Bayesian framework. This allows the model to infer topics based on observed data (words) through the use of conditional probabilities
  • The main difference between LSA and LDA is that LDA assumes that the distribution of topics in a document and the distribution of words in topics are Dirichlet distributions. LSA does not assume any distribution and therefore, leads to more opaque vector representations of topics and documents
  • Latent Semantic Analysis or Latent Semantic Indexing – Uses Singular Value Decomposition (SVD) on the Document-Term Matrix
  • In practice, LSA is much faster to train than LDA, but has lower accuracy.

Example
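
A minimal sketch of both on a toy corpus, assuming scikit-learn (the four documents and the choice of 2 topics are made up for illustration):

    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD

    docs = [
        "cats and dogs are pets",
        "dogs chase cats",
        "stocks and bonds are investments",
        "investors buy stocks",
    ]

    # LDA: Bayesian model over word counts, with Dirichlet priors on the topic mixtures
    counts = CountVectorizer().fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
    print(lda.transform(counts))   # per-document topic distributions (rows sum to 1)

    # LSA: plain SVD on the (tf-idf weighted) document-term matrix
    tfidf = TfidfVectorizer().fit_transform(docs)
    lsa = TruncatedSVD(n_components=2, random_state=0).fit(tfidf)
    print(lsa.transform(tfidf))    # per-document coordinates; not probabilities, hence "more opaque"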


Keep Thinking!!!

SVD vs PCA vs NMF

Singular value decomposition and principal component analysis are two eigenvalue methods used to reduce a high-dimensional dataset into fewer dimensions while retaining important information

SVD

  • SVD performs low-rank matrix approximation
  • The SVD procedure finds the optimal rank-k approximation (the top-k singular vectors)

PCA, on the other hand, is:

  • 1) Subtract the mean sample from each row of the data matrix.
  • 2) Perform SVD on the resulting matrix.
  • The core idea behind PCA uses the result obtained through SVD as its backbone (see the sketch after the NMF note below).

NMF: Non-negative matrix factorization. PCA and NMF optimize for a different result. PCA finds a subspace that conserves the data's variance, while NMF finds nonnegative features.
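
A minimal sketch of the PCA-via-SVD recipe above and of NMF, assuming NumPy and scikit-learn (the random data matrix and k=2 are made up):

    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    X = rng.random((100, 5))            # toy data matrix: 100 samples, 5 features

    # PCA via SVD: 1) subtract the mean row, 2) SVD of the centered matrix
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = 2
    scores = Xc @ Vt[:k].T              # projection onto the top-k principal components
    print(scores.shape)                 # (100, 2)

    # NMF: factor the non-negative data into non-negative W and H
    nmf = NMF(n_components=k, init="nndsvda", random_state=0, max_iter=500)
    W = nmf.fit_transform(X)            # compressed, non-negative representation
    H = nmf.components_
    print(W.shape, H.shape)             # (100, 2) (2, 5)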

PCA is highly recommended when you have to transform high dimensions into low dimensions and you are okay to lose the original features in the process as new ones are introduced.

The output of NMF can be visualized as a compressed version of the original dataset

Recommendation systems, topic modeling, and image compression all use the same concepts: PCA, SVD, NMF...

Keep Thinking!!!

May 24, 2022

Agility / Influence / Collaboration / Work Pressure

In Software development, Influence without authority is a critical team member/mentor/manager trait.

Agile is not about

  • One sprint for functionality
  • One sprint for performance
  • One sprint for refactor

With the world going by weekly demos, Agile means

  • Collaborating, bringing the best ideas, and building with prioritized ideas
  • Evaluating; if it fails, going back to basics and building again
  • Being quick to recognize mistakes, humble enough to rework, and listening and evaluating in the interest of time

Work pressure and deadlines are byproducts of poor design choices and poor execution versus great planning. Knowledge does not come with deadlines; it comes with learning and experimentation.

When you learn

  • It takes 3X effort to try different code and understand it
  • When you repeat it, it takes 2X effort as most unknowns are cleared
  • When you master it, it takes X effort as it is proven / has worked / is familiar to you

Everything takes time. In a world driven by demos and communication, the real effort is doing meaningful work by proactively including design and performance and picking the best ideas without considering who / what / why.

Keep Thinking!!!

May 23, 2022

Engineering Productivity Myth and Reality

This link was helpful in triggering the thoughts. The framework is interesting

From Work Perspective

  • Design documents and specs
  • Work items, pull requests, commits
  • Code reviews/quality of reviews
  • CI / CD - Count of build, test, deployment/release
  • What I would also add:
  • Performance and scale while designing
  • Field Testing to know it meets the customer

From General - work-life aspects

  • Extensive and effective communication
  • Block your peers to unblock you
  • Lack of communication implies a lack of coordination

Mindset and culture play a key role

From ref

  • Create a solution without understanding how it all works
  • Continue creating a solution without understanding how it all works
  • After something has caused you multiple issues, don’t stop and reconsider your approach

My perspectives - The ability to have quality ideas and quality alternatives in the interest of time, and to come back before it's too late, is essential.

The culture has to evolve (Ref)

  • Humble — self-aware, intellectually honest
  • Know their business inside and out
  • Recruit dream people to the cause

Be clear about what you don't know and how you plan to achieve it

My perspectives - Quality ideas, quick experiments, balancing time, and not ending up with a half-baked product.

More reads

Link1, Link2, Link3, 1-1 discussions, Link4

Keep Thinking!!!

May 20, 2022

Upskilling = Learning Tech + Marketing + Teamwork + Customer Lens

Upskilling in your late 30s is mixed learning. It's not like mere education in grad school days. It's a mix of tech and business, finding the best of both to make it your competency.

Technical Competency - 

  • Whatever you pick, understand the basics
  • Upskill with courses, projects, and dedicated focus
  • Apply the lessons/leverage in your job / apply the learning from your experience perspective

Markets / Products

  • Awareness of offerings/products
  • Map the demands vs use cases vs areas to apply your tech and business learning
  • Collaborate to build possible MVP with stakeholders, business
  • Sell the value, show the MVP, get the participation

MVP to Production

  • Work ahead to plan / look for the product vision
  • Map the factors of scale and performance
  • Be clear on data aspects to map/mimic the architecture
One interesting note on different parts of work

Keep Thinking!!!

May 18, 2022

Getting things done in Data Science

  • 2015-17 - As you learn, what you don't know feels like more to learn; what you know about the domain felt not that important
  • 2020 - Reality is that both domain and data science are equally important
  • 2022 - Solving and selling are equally important; being on the same page and aligned is key

Keep Thinking!!!

May 17, 2022

A very interesting read - Personalization can also lead to unfair pricing

When you visit several times, your interest is captured in clicks. The more you visit, the more interest you have to engage.

Tinder - participants aged 30-49 on average paid 65.3% more than those aged 18-29

Age group 30-49 - They earn more than teens, they can afford it, and they convert at higher rates. The fear of missing out on settling down / having a family makes them engage, search, and end up paying more :)

Reference - Link1, Link2

Keep Exploring!!!



May 16, 2022

Big company, More Data, Smaller Dataset, Medium Sized Models - Challenges at Different Levels - Different Data Science Backlogs

FAANG

  • FAANG companies have no shortage of data: loads of streaming data, insights, and all types of clicks
  • Model complexity and large-scale training and deployment are the challenges

Domain (Automotive / Retail) - Next List after Core Companies

  • Companies adopting Data Models
  • Companies aiming for DBT
  • The Data Science challenges when you are still ahead of data collection and data maturity are different
  • Data collection, engaging, and selling the Data Science use case become the elementary steps
  • Mid to small scale models deployed based on business needs
  • Learning at both places is different. Challenges are different.

You can specialize in multiple areas

  • NLP  
  • Vision  
  • Recommendations
  • Forecasting
  • Anomaly Detection

Business knowledge + Feature knowledge + Impact + Selling + Building + Deploying is a never-ending learning curve :)

Keep Thinking!!!

May 15, 2022

There is no single formula for product success.

You start with an idea

  • You adjust your product based on the market
  • You add features relevant to adoption
  • You designed X and sold Y, but learnt X to Y during the initial days
  • Sometimes process wins, sometimes the idea wins, sometimes consistency wins, sometimes agility wins
  • You are selling to another human; empathy and support go a long way toward adoption
  • Sales and Support are two parallel systems, like the lungs and heart, which keep your company and product alive :)

Keep Thinking!!!

May 14, 2022

Experience

Experience = Able to predict the future based on past
Experience = Able to unblock the team when they are blocked
Experience = Read up beforehand based on intuition
Experience = Aware of alternatives/tradeoffs
Experience = Negotiate based on what's best for the customer not your own bias
Experience = Connect and align with purpose not giving orders
Experience = Unlearn, Relearn
Experience = Go back to basics and rework, pick your priorities
Experience = Embrace uncertainty and embed optimism :)

Experience = It's okay to be yourself

Keep Going!!!

May 13, 2022

Its okay, Keep Walking

You may work on multiple problems
You may forget a few things you have worked on

You may relearn it every 10 years; still, it feels like everything is new
You may fail in a few experiments, succeed in a few

You may learn more than one language
You may be happy with a few milestones

When you look back, suddenly you are in your 40s
Days and nights have gone fast

The 4 walls of the office and my laptop kept me engaged
Some production deployment memories, some bug fixes, some not-so-good memories
Some travel was with the team and some was solo

Keep Going!!!

Developer Lens vs Customer Lens

We have a new release of SQL, can we learn/try it (Developer point of view)

Database loads are going to increase for the holiday season, we need to be prepared, Can we migrate to the latest DB Version to handle performance (Customer needs it)

Balancing both lenses is leadership 😊


May 12, 2022

Hair Styles - segmentation - paper reads

Paper #1 - Barbershop - Segmentation Masks

Git code  - Link

Notes

  • GAN-based semantic alignment step which generates high quality images similar to the input images
  • The shape of the hair is the binary segmentation region, and the identity of a head-image

Architecture Ref


  • Segmentation network such as BiSeNet
  • Alignment in W+ space
  • A close-up view of the face (top) and hair (bottom) in W+ space
  • Close-up views after details are transferred

(1) Reconstruction: a latent code C_rec is found to reconstruct the input image I_k

(2) Alignment: a nearby latent code C_align is found that minimizes the cross-entropy between the generated image and the target mask M.

  • manipulating segmentation masks and copying content from different reference images.
  • Copy and replace eyes / nose / lips area pixels and values
  • Copy the landmark areas / eyes / face lips
  • Copy all boundaries of key facial landmarks

Face parsing - link


Paper #2 - Face shape classification using Inception v3



Paper #3 - CelebHair: A New Large-Scale Dataset for Hairstyle Recommendation based on CelebA




Paper #4 - Fashion Meets Computer Vision: A Survey





Beauty Opportunities





Project - Link

To deploy this app, follow the procedure below in a GCP console terminal:

  • Clone repository: git clone https://github.com/shawnhan108/BiSeNet-app.git.
  • Set project ID: export PROJECT_ID=bisenet.
  • Build docker image: docker build -t gcr.io/bisenet/bisenet-app:v1 ..
  • Authorize docker: gcloud auth configure-docker.
  • Push docker image to Container Registry: docker push gcr.io/bisenet/bisenet-app:v1.
  • Set computing region: gcloud config set compute/zone us-central1-a.
  • Create a Kubernetes Engine cluster: gcloud container clusters create bisenet-cluster --num-nodes=2.
  • Create a Kubernetes deployment of the app: kubectl create deployment bisenet-app --image=gcr.io/bisenet/bisenet-app:v1.
  • Expose the app with a Load Balancer service: kubectl expose deployment bisenet-app --type=LoadBalancer --port 80 --target-port 8080.
  • Go to the browser and test the app: http://[EXTERNAL-IP], where EXTERNAL-IP can be obtained using kubectl get service.

Keep Thinking!!!

Image Super resolution

Paper - Link

  • Super-Resolution Generative Adversarial Network
  • adversarial loss and perceptual loss
  • Improve the network structure by introducing the Residual-in-Residual Dense Block (RRDB)
  • Perceptual loss by using the VGG features
  • Basic architecture

  • Replace the original basic block with the proposed Residual-in-Residual Dense Block (RRDB)

GAN Loss functions

  • minimax loss: The loss function used in the paper that introduced GANs.
  • Wasserstein loss: The default loss function for TF-GAN Estimators. First described in a 2017 paper. (A minimal sketch of both losses follows below.)
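
A minimal sketch of the two formulations, assuming PyTorch; d_real and d_fake stand in for discriminator/critic outputs on real and generated batches (the random tensors are placeholders, and the generator loss below is the commonly used non-saturating variant):

    import torch
    import torch.nn.functional as F

    d_real = torch.randn(8, 1)   # discriminator/critic logits on real images (placeholder)
    d_fake = torch.randn(8, 1)   # discriminator/critic logits on generated images (placeholder)

    # Minimax (original GAN) losses on logits
    d_loss_minimax = (
        F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
        + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    )
    g_loss_minimax = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))  # non-saturating form

    # Wasserstein losses: the critic outputs raw scores (no sigmoid)
    d_loss_wasserstein = d_fake.mean() - d_real.mean()
    g_loss_wasserstein = -d_fake.mean()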

Ref - link

Demo and Sample results

Tensorflow colab code - link

Keep Exploring!!!

May 09, 2022

Deep Learning Revisions

It's always good to take a pause/revise / add a few more learning pointers :)


Key Notes
  • ML operates on handcrafted features
  • DL features are learned directly from data
  • Data prevalent, Parallelizable models / hardware, GPU/ CUDA, TF / Pytorch
  • Activation functions and their differentiation


  • Non-linear activation functions help to build non-linear decision boundaries






  • Text - sequence of characters / words
  • Stock prices / DNA sequences
  • Temporal dimension to models
  • Same network applied once for each timestep
  • Horizontal to vertical view
  • Each output is connected / is input to the next timestep
  • Internal memory / state maintained


  • Individual loss for each timestep
  • Backprop across all timesteps
  • Forward pass across time




  • Backpropagate through time (a minimal sketch follows this list)
  • Loss with respect to the internal state
  • Attention
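
A minimal sketch of per-timestep losses summed over a sequence and then backpropagated through time, assuming PyTorch (the toy sequence, targets, and dimensions are made up):

    import torch
    import torch.nn as nn

    seq_len, batch, n_in, n_hidden = 10, 4, 3, 8
    x = torch.randn(seq_len, batch, n_in)        # toy input sequence
    targets = torch.randn(seq_len, batch, 1)     # toy per-timestep targets

    rnn = nn.RNN(n_in, n_hidden)                 # the same weights are reused at every timestep
    readout = nn.Linear(n_hidden, 1)

    h = torch.zeros(1, batch, n_hidden)          # internal memory / state
    loss = 0.0
    for t in range(seq_len):                     # forward pass across time
        out, h = rnn(x[t:t + 1], h)              # one timestep; state carried forward
        loss = loss + nn.functional.mse_loss(readout(out), targets[t:t + 1])  # individual loss per timestep

    loss.backward()                              # backprop through time: gradients flow through every timestep's state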

Ref - Course Link

Keep Thinking!!!

Feedback / Retrospect your work / Quality of Your Work

  • How many problems did we solve in the past 2 weeks?
  • How many problems have similar issues / same solutions applicable?
  • Did we think of the consequences when we fixed it?
  • Are we falling short on thinking / biased by deadlines?
  • Did the fixes meaningfully add value or overhead?
  • How realistically do we rate the quality of the solutions?
  • Did we do enough homework on options, or is our thinking limited?
  • How much testing did we do?
  • Did we call out the flaws and stay transparent about the output / next steps?

Productivity is not about being engaged; it's about value delivered :)


Keep Thinking!!!

ML Datasets

Keep Exploring!!!


May 07, 2022

Generative VS Discriminative Models


Generative Models - They learn everything in depth. A Generative Model explicitly models the actual distribution of each class. It learns the joint probability distribution p(x, y) and predicts the conditional probability with the help of Bayes' Theorem. A joint probability is the likelihood of more than one event occurring at the same time.

Generative classifiers

  • Naïve Bayes
  • Bayesian networks
  • Markov random fields
  • Hidden Markov Models (HMM)

The Naive Bayes (NB) classifier is a generative model, which builds a model of each possible class based on the training examples for each class. Then, in prediction, given an observation, it computes the predictions for all classes and returns the class most likely to have generated the observation.

HMM

  • The Hidden Markov Model (HMM) is a relatively simple way to model sequential data
  • An HMM consists of two components: a series of discrete-state, time-homogeneous, first-order Markov chains (MC) with suitable transition probabilities between states, and an initial distribution
  • Model the probabilities of different states and the rates of transitions among them
  • HMMs take a generative approach to labeling, defining a joint distribution over label and observation sequences (constrained to binary transition and emission feature functions, which force each word to depend only on the current label and each label to depend only on the previous label)
  • Markov Assumption - the probability of a particular state depends only on the previous state (see the sketch below)
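
A tiny sketch of the Markov assumption, assuming NumPy, with a made-up 2-state weather chain: the next state is sampled using only the current state.

    import numpy as np

    states = ["sunny", "rainy"]
    # Made-up transition probabilities: P[i, j] = P(next = j | current = i)
    P = np.array([[0.8, 0.2],
                  [0.4, 0.6]])

    rng = np.random.default_rng(0)
    current = 0                                  # start in "sunny"
    sequence = [states[current]]
    for _ in range(10):
        current = rng.choice(2, p=P[current])    # depends only on the current state
        sequence.append(states[current])
    print(sequence)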

Discriminative model

Discriminative models - They learn the differences between what they saw. A Discriminative Model models the decision boundary between the classes. It learns the conditional probability distribution p(y|x).

Discriminative Classifiers

  • Logistic regression
  • Support Vector Machine
  • Traditional neural networks
  • Nearest neighbour
  • Conditional Random Fields (CRFs)

In contrast, discriminative models, like logistic regression, try to learn which features of the training examples are most useful to discriminate between the different possible classes.
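
A minimal sketch contrasting the two on the same toy data, assuming scikit-learn (the synthetic dataset is made up):

    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # Generative: Naive Bayes models p(x | y) per class (plus class priors), then applies Bayes' theorem
    nb = GaussianNB().fit(X, y)
    print(nb.theta_.shape)        # per-class feature means it has modelled: (2, 5)

    # Discriminative: logistic regression models p(y | x) / the decision boundary directly
    lr = LogisticRegression().fit(X, y)
    print(lr.coef_.shape)         # one weight per feature for the boundary: (1, 5)

    print(nb.predict_proba(X[:3]), lr.predict_proba(X[:3]))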


Ref  - Link

Keep Thinking!!!

Loss Functions - Deep Learning

The choice of loss depends on the desired output (e.g., classification vs. regression)

Regression Loss Functions

  • Mean Squared Error Loss
  • L1 Loss
  • L2 Loss
  • Mean Squared Logarithmic Error Loss
  • Mean Absolute Error Loss

L2 norm / mean squared error - The mean squared error is probably the most straightforward: you take the difference between the result and the ground truth for a sample, square it, and average over the samples.

The L1 loss is basically the absolute value of the difference between the current sample's actual output and the desired output.
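
A minimal sketch of the two, assuming NumPy (the prediction and target vectors are made up):

    import numpy as np

    y_true = np.array([1.0, 2.0, 3.0])
    y_pred = np.array([1.5, 1.5, 2.0])

    mse_loss = np.mean((y_pred - y_true) ** 2)    # L2 / mean squared error
    l1_loss = np.mean(np.abs(y_pred - y_true))    # L1 / mean absolute error
    print(mse_loss, l1_loss)                      # 0.5 and ~0.667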

Binary Classification Loss Functions

  • Binary Cross-Entropy
  • Hinge Loss
  • Squared Hinge Loss

Multi-class Classification Loss Functions

  • Multi-class Cross Entropy Loss
  • Sparse Multiclass Cross-Entropy Loss
  • Kullback Leibler Divergence Loss

The negative log-likelihood loss is based on the idea that every output represents a likelihood for a particular class. It aims to make the output for the correct class as high as possible and for the others as small as possible.

Cross entropy loss - The cross entropy loss is very popular for classification problems. The losses are averaged across observations for each minibatch

Kullback-Leibler Divergence Loss - Measures how far one probability distribution is from another (see the sketch below)
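
A minimal sketch of these classification losses, assuming PyTorch (the logits, class indices, and target distribution are made up):

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[2.0, 0.5, -1.0],
                           [0.1, 1.5, 0.3]])      # made-up scores for 3 classes
    targets = torch.tensor([0, 1])                # correct class indices

    # Cross entropy (averaged over the minibatch); equivalent to log_softmax + NLL
    ce = F.cross_entropy(logits, targets)
    nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)
    print(ce, nll)                                # same value

    # KL divergence between the predicted distribution and a target distribution
    target_dist = torch.tensor([[0.9, 0.05, 0.05],
                                [0.1, 0.8, 0.1]])
    kl = F.kl_div(F.log_softmax(logits, dim=1), target_dist, reduction="batchmean")
    print(kl)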

Ref - Link1, Link2

Keep Thinking!!!

May 04, 2022

Discussions - Perspectives - Code - Observe - Learn

Above my levels

  • Aligned and connected
  • Convey the answers in terms of impact and costs, better aligned to the vision

Within my Team

  • Pick over priorities / Prioritizing things
  • Working on timelines without overdoing or chewing more than you can
  • 50% learning, 30% Experimenting, 20% Alignment

My to-do list

  • Experiment / Code / Unblock 
  • Plan ahead / get the required perspective for decisions
  • Reverse engineer / Analyze competitive products / Build / identify features

WFH is - Alert / Aware / Available. All this applies only in IT :)

Keep Thinking!!!

May 03, 2022

Segmentation Notes


Loss functions for image segmentation, Link1

Loss Functions in Segmentation

Image segmentation can be thought of as a classification task at the pixel level, and the choice of loss function for the segmentation task is key in determining both the speed at which a machine-learning model converges and, to some extent, the accuracy of the model.

Ref - Link

Dice loss - This loss is obtained by calculating the smooth Dice coefficient. It is the most commonly used loss in segmentation problems.
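
A minimal sketch of a smooth Dice loss for binary masks, assuming PyTorch (the smoothing constant and toy tensors are made up):

    import torch

    def dice_loss(pred_logits, target, smooth=1.0):
        # Soft Dice loss for binary segmentation: 1 - Dice coefficient
        pred = torch.sigmoid(pred_logits)                 # probabilities from logits
        pred, target = pred.flatten(1), target.flatten(1)
        intersection = (pred * target).sum(dim=1)
        dice = (2 * intersection + smooth) / (pred.sum(dim=1) + target.sum(dim=1) + smooth)
        return 1 - dice.mean()

    # Toy usage: a batch of 2 single-channel 4x4 predicted masks and ground-truth masks
    logits = torch.randn(2, 1, 4, 4)
    masks = torch.randint(0, 2, (2, 1, 4, 4)).float()
    print(dice_loss(logits, masks))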

Ref - Link

Keep Thinking!!!