"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 30, 2022

What are interviews about? Logic, design, or syntax?

  • Logic depends on ideas and experience 
  • Design depends on performance/scalability and upfront thinking
  • Syntax needs a Google search and fixes

A good interview, I loved the aspects they covered. 


This is far better than pointed questions like
  • Sort algorithm time for best / worst case - I hardly remember 
  • Puzzles on Physics - I don't practice every day
  • Write a sort function - Come on, Google it
We live in a system where we don't know how to hire or spot people who are team players.
  • Faking or cracking interviews and building a team of solo performers vs building a team with a lot of team players will have different impacts in the long term.
  • The reward is for teamwork and openness, not for stars.
  • Motivated team vs Team which delivers something working due to work pressure
Keep Thinking!!!

Catalog management - Papers Read

Deep Learning for Automated Tagging of Fashion Images

  • We present 9 deep learning classifiers to predict Fashion attributes in 4 different categories: apparel (dresses and tops), shoes, watches and luggages.
  • By extracting these tags or attributes from fashion images, queries to the products catalogue can be generated looking for similar or complementary products, produce recommendations for the user, fill missing metadata, and overall provide an improved search experience



Tiered Deep Similarity Search for Fashion

  • We propose a new attribute-guided metric learning (AGML) with multitask CNNs that jointly learns fashion attributes and image embeddings


FashionSearchNet: Fashion Search with Attribute Manipulation

  • The focus of this paper is on retrieval of fashion images after manipulating attributes of the query images.

Keep Exploring!!!

July 29, 2022

Virtual Try on - Fashion - Paper - Reading Notes

Key Techniques

  • Landmarks
  • Segmentation
  • Masking
  • Target addition
Approach #1

Approach #2


Ref 

Approach #3


Ref

Papers

Keep Exploring!!!


Interesting poster sessions


Good Summary - AI ML Use cases



 


Ref - Link

Keep Exploring!!!

MLops Tools

MLOps tools link 

  • CI/CD for Machine Learning: ClearML, CML, GitLab
  • CronJob Monitoring: Cronitor, HealthchecksIO
  • Data Exploration: Apache Zeppelin, BambooLib, Google Colab, Jupyter Notebook, JupyterLab
  • Data Management: DVC, Arrikto, BlazingSQL, Delta Lake, Dolt, Git LFS
  • Data Processing: Airflow, Hadoop
  • Data Validation: Cerberus, Great Expectations
  • Data Visualization: Superset, Tableau, Facets, Dash
  • Feature Engineering: Featuretools, TSFresh
  • Feature Store: Butterfree, ByteHub, Feast, Tecton
  • Hyperparameter Tuning: Hyperas, Hyperopt, Katib, KerasTuner, Optuna, Scikit Optimize
  • Machine Learning Platform: SageMaker, Kubeflow, H2O, MLReef, Algorithmia, DataRobot, DAGsHub
  • Model Fairness: AI Fairness 360, Fairlearn, Opacus
  • Model Interpretability: Alibi, Captum, ELI5, InterpretML, LIME, Lucid, SAGE, SHAP, Skater
  • Model Lifecycle: MLflow, NeptuneAI, Comet, Keepsake, ModelDB, Weights and Biases
  • Model Serving: BentoML, TensorFlow Serving, KFServing, SeldonCore, Streamlit, TorchServe, Gradio, Graphpipe, Hydrosphere
  • Model Testing and Validation: DeepChecks
  • Optimization Tools: Dask, DeepSpeed, Horovod, TPOT, Ray, Rapids
  • Simplification Tools for ML: PyCaret, Hermione, Hydra, Koalas, Turi Create (Apple), TrainGenerator
  • Visual Analysis and Debugging: Aporia, Evidently, Yellowbrick, Netron, Fiddler, Manifold
  • Workflow Tools: MLRun, Flyte, Metaflow, Ploomber, ZenML, Kedro
Ref - Link


Big Picture - Different phases of Model Development

Ref - Link

Overall Landscape - Monitor, Manage, Retrain, Tools Stack


MLOps vs Data Engineering

I have always had mixed opinions about the overlapping tasks in ML and Data Engineering. I align with the views in this article.


Data in different forms and the reporting aspects
  • Transaction Data
  • BI Reports
  • ML Features
  • ML Dashboards
  • Everything operates on same data. 
Key Questions from article
  • How different is the observability of model quality metrics such as drift from any product-related monitoring?
  • In product, we keep monitoring the performance of our features: do people engage with them in the way we expect?



Keep Exploring!!!

July 28, 2022

Growth Dimensions of Career

A fantastic blog on career growth and its areas. Taking reference from the link


Success is Teamwork. It combines Tech, Domain, Collaboration, Process, and aligned vision.

Developer roles and growth path link

As an Engineering Manager, I find myself balancing this perspective


Building new offerings. Knowing the tech trend. Implementing/connecting to relevant opportunities. Learning aligned with new opportunities in business. It's never-ending, but it's interesting!!!

Keep Exploring!!!

Face Beauty Papers

Paper #1 - FabSoften: Face Beautification via Dynamic Skin Smoothing, Guided Feathering, and Texture Restoration

Key Notes

  • Softening is carried out by an attribute-aware dynamic smoothing filter
  • YouCam, B612 and ModiFace
  • Smoothing blemishes in the facial region, including wrinkles, spots, patchy reflections, and skin nonuniformities.

Related Work

  • Edge-Aware Smoothing Filters; 
  • Layer Decomposition Based Approaches; 
  • Deep Learning-Based Approaches; 
  • Generative Models

  • To detect blemishes in the skin region, we first employ the Canny Edge detector [6] to localize strong edge patterns

Skin-mask generation algorithms generally fall into three broad categories:

  • Color pixel classification 
  • Gaussian Mixture Model
  • Mutually Guided Image Filtering (muGIF) 
  • Fast Global Smoothing filter (FGS)

Guided Filter Python Code

  • We crop zoomed-in portions of the image to highlight each method’s performance on skin texture retainment and preserving hair regions

FabSoften Key Features

  • Preprocessing
  • Landmark Detection
  • Binary Skin Mask
  • Blemish Detection and Concealment
  • Skin Mask Generation and Refinement
  • GMM Clustering (#6)

Segmentation?

  •  Guided Feathering (pending)
  •  Skin Imperfection Smoothing
  •  Dynamic Mean Filter
  •  Attribute-aware Dynamic Guided Filter
  •  Skin Texture Restoration (Wavelet-based STR)

BeautyGAN

Facebeauty

Beautyfinder

Face Smoothing: Detection and Beautification

  • Change image from BGR to HSV colorspace
  • Create mask of HSV image
  • Apply a bilateral filter to the Region of Interest
  • Apply filtered ROI back to original image

Guided Filter - Simple Python implementation of paper:

Face beautification algorithm

Paper - Face Beautification and Color Enhancement with Scene Mode Detection

  • A bilateral filter is an edge-preserving smoothing filter.
  • Contrast enhancement methods can produce a strong effect on local contrast enhancement
  • Gaussian weighting over both the spatial distance and the intensity difference of each pixel
  • Automatically detect the best-match scene mode for input image. 

BEHOLDER-GAN: GENERATION AND BEAUTIFICATION OF FACIAL IMAGES WITH CONDITIONING ON THEIR BEAUTY LEVEL

  • Given an image x and the pre-trained generator G, we want to recover the corresponding latent vector z and beauty score β

Face Beautification: Beyond Makeup Transfer

More Reads

Keep Exploring!!!

July 27, 2022

Good Read - Hybrid WFH = Productivity

 



Keep Thinking!!!

Skills / Knowledge / Perspective

Domain and Tech Expertise is a mix of

  • Awareness of tech/trends and tech convergence
  • MVP skills to code / demonstrate product / idea
  • Storytelling skills connecting business, ideas, tech

Moving from MVP to product needs

  • Ability to spot what will scale / what will fail
  • Think from a competitor's view
  • Think from a customer usage point of view

Observe / Analyze / Incorporate / Grow yourself and your Team

Keep Thinking!!!

July 24, 2022

NoSQL Summary - Options

 A bit of a relook on NoSQL for a class helped me consolidate my learning.

NoSQL - Not only SQL. During my engineering studies, database design was all about

  • Codd's Rules
  • Normalization Techniques 

What I thought in 2005

  • How we handle columns, data types, and relationships is key, along with handling nulls, default values, constraints, etc.

Systems data was structured in 2000

  • 20 years back there was no social media, no WhatsApp. Most of the data was structured data: transactions, automated orders, etc.
What all performance improvements/challenges came as data volumes increased?
  • Partitioning by products/duration
  • Replication to manage read / writes
  • Use of Snapshot isolation/options
  • Denormalizing a few tables
  • Migrating to the latest version / Rewriting some of the slow-performing reports
  • Pagination of reports instead of a fetch-all approach
  • Archiving completed orders
  • Vertical scaling - add more RAM, CPU
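
The pagination point above can be sketched with SQLite from the Python standard library. This is only an illustration: the orders table, its columns, and the fetch_page helper are hypothetical names, and a real reporting system would likely look different.

```python
import sqlite3


def fetch_page(conn, page, page_size=3):
    """Return one page of orders instead of fetching every row."""
    offset = (page - 1) * page_size
    cur = conn.execute(
        "SELECT id, product FROM orders ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()


# Hypothetical in-memory table holding ten orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT)")
conn.executemany(
    "INSERT INTO orders (id, product) VALUES (?, ?)",
    [(i, f"product-{i}") for i in range(1, 11)],
)

first_page = fetch_page(conn, page=1)
second_page = fetch_page(conn, page=2)
last_page = fetch_page(conn, page=4)
```

For deep pages, OFFSET still walks over the skipped rows, so keyset pagination (filtering on the last seen id) is a common alternative.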
Since the social media age

  • Now we have more unstructured, semi-structured data from mobile phones, social media, reviews, ratings, rankings, messages, images, and videos.

I still remember the 2010 period when Hadoop was much spoken about: moving computation to where the data is. I looked up my 2011 post on MongoDB. 

The evolution of databases is from

  • Stage 1 - Papers, Ledgers
  • Stage 2 - Excel, Access
  • Stage 3 - Databases
  • Stage 4 - Hadoop for large-scale data
  • Stage 5 - NoSQL
  • Stage 6 - Lakehouse = (Hadoop + RDBMS + NoSQL + AI for data extraction from unstructured sources)

Building an RDBMS perspective means thinking in tables, keys, and relationships


Ref - Link

Everything revolves around Reading Correct Data vs Dirty Data (Transactions in progress may or may not commit). 

Everything in DBMS is

  • Create
  • Read
  • Update
  • Delete

How do reads and writes balance? Essentially, a record or row needs to be locked before an update. This ensures we work in a consistent state.
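
The lock-before-update idea can be illustrated in-process with a thread lock. This is only an analogy, not how any particular DBMS implements row locking; the Account class and deposit function are hypothetical.

```python
import threading


class Account:
    """A single 'row' guarded by its own lock (hypothetical example)."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def deposit(account, amount, times):
    for _ in range(times):
        # Lock the row, then read, modify, and write before releasing it.
        with account.lock:
            current = account.balance
            account.balance = current + amount


account = Account(balance=0)
threads = [
    threading.Thread(target=deposit, args=(account, 1, 10_000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_balance = account.balance
```

Because every read-modify-write happens while the lock is held, no update is lost and the final balance reflects all four writers.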

CAP theorem is the Crux of Everything


Ref - Link

Now you need to choose a DB based on which of C, A, and P you prioritize


Questions to ask to decide on the choice of Database?
  • Is the query pattern aggregates or selects of individual records?
  • What is the projected database growth?
  • Is the data structured or semi-structured?
  • What are my top 2 choices? Can I do a quick prototype and performance test to validate them?
  • Which schema design practices are relevant to each database type? What maps closely to the current context?
  • Is consistency key? What about availability / partition tolerance? Is the system queried across geographies, needing availability in different regions?
  • If replicas exist, how will the different copies sync up? Will there be a master-slave approach / replication / log copy?
  • What cost is allocated, considering volume and high-availability needs?
Different NoSQL Systems

  • In one of the SaaS products we worked on, Redis key-value pairs were used for session management
  • In an IoT device-management platform built by a friend's team, Cassandra was used to ingest device data and generate reports
  • One of the big retailers I was familiar with relied heavily on the columnar database Vertica to manage all their aggregate data for BI / ML work
We need to consider the use case, data volume, velocity, type of reporting, cost, growth, and security when choosing a database.

Ref - Link

I say table in RDBMS and collection in MongoDB. What is the conceptual mapping?


SQL vs NoSQL Design Thinking 

  • How do I design a collection in a document DB, with nested 1-to-1 and 1-to-many relationships?
  • What information do I store in a key-value pair? Which key will be unique and will not result in duplicates?
  • Which column families will I create? What will the aggregate queries look like?
Schema / Relationships / Keys will vary based on the Database type.
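
As a rough mapping, a table corresponds to a collection, a row to a document, and a column to a field. The sketch below reshapes two relational tables into one nested document using plain Python dicts. The customer and order data, and the to_document helper, are made up for illustration; no database driver is involved.

```python
import json

# Relational view: two tables linked by a foreign key (customer_id).
customers = [{"customer_id": 1, "name": "Asha"}]
orders = [
    {"order_id": 101, "customer_id": 1, "item": "shoes"},
    {"order_id": 102, "customer_id": 1, "item": "watch"},
]


def to_document(customer, all_orders):
    """Embed the customer's orders (1-to-many) inside a single document."""
    return {
        "_id": customer["customer_id"],
        "name": customer["name"],
        "orders": [
            {"order_id": o["order_id"], "item": o["item"]}
            for o in all_orders
            if o["customer_id"] == customer["customer_id"]
        ],
    }


customer_doc = to_document(customers[0], orders)
print(json.dumps(customer_doc, indent=2))
```

Embedding suits data that is read together; data that grows without bound or is shared across many parents is usually better kept as separate documents with references.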

Which Database for What Application Purpose?
  • High reads consistent data - RDBMS
  • High writes low reads - HBase, Cassandra
  • Document-based storage (multiple key-value pairs, or key-array pairs, or even nested documents) - MongoDB, CouchDB
  • Key-Value stores are similar to maps or dictionaries where data is addressed by a unique key - Redis

Above all, cost also plays a key role. Knowing what to choose based on size, data growth, and access patterns is key to deciding the type of database for implementation.

RDBMS, KeyValue, Columnar, Graph, Document Collection all these forms of databases will co-exist :)

Data Stack

Ref - Link

Modern Analytics Stack



Ref - Link


Source - Link

July 21, 2022

Personal Mastery

  • Speak less. Being authentic makes you more sensitive
  • Authentic/sensitive: a balance of above the neck / below the neck. 
  • Having authentic conversations and balancing sensitivity against authenticity is good learning. Everyone's pattern is different, but the balance comes with time
  • Being diplomatic vs being open: here sensitivity may be missing
  • Being authentic vs being open: here sensitivity is balanced 

Keep Exploring!!!

Being a Mentor

  • You have to communicate carefully, without hurting. It has to result in improvement and fill the gap, not create more gaps
  • Caution / Respect / Tolerance
  • Speak up without worrying. Don't overthink
  • Don't struggle to give tough feedback / Be Authentic and Sensitive

Keep Mentoring!!!

July 20, 2022

Recommendation Systems

A Review of Modern Fashion Recommender Systems

Key Notes

  • Recommender systems have grown to be an essential part of all large Internet retailers, driving up to 35% of Amazon sales [103] or over 80% of the content watched on Netflix [31].
  • Localizing fashion items
  • Determining their category and attributes
  • Degree of similarity to other products
  • Product-to-product relationships
  • Product-to-user uncertainties
  • Fashion item compatibility - associated image and text data is then used to learn to generalize to stylistically similar products
  • The fashion item recommendation task, similar to the classical recommendation problem, focuses on suggesting individual fashion items (clothing), that match users’ preferences.
  • Fashion pair and outfit recommendation: Fashion outfits are sets of 𝑁 items that are worn together, e.g., for an outdoor wedding, graduation party, baby shower, and so forth
  • Modeling outfits as a sequence, to take advantage of order-aware models such as LSTMs
  • Fashion Item Relevancy network (FIR) learns the compatibility of fashion items and learns garment item relevance embeddings
  • Physical body-related features. The easiest way to make effective sizing recommendations is to use data from certain parts of the body [58, 60] such as bust, waist, and hip
  • User-item fit feedback. To provide personalized size recommendations, the interaction between the user and the item is essential

  • Color. The most common means to identify how one looks is achieved via colors, materials, and silhouettes on the body
  • Brand. Product brands are a critical feature users consider when deciding among items.
  • Texture. The texture describes the body and surface of a garment. 
  • Context = image + text. In addition to images, users may also include words (textual descriptions) to aid in the recommendation process
  • Context = image. Images are an important visual tool for users to communicate with a fashion recommender system

Toward Explainable Fashion Recommendation

  • Influence of the item-feature pair, which we call its Item-Feature Influence Value (IFIV)
  • CNNs trained for generic image recognition are used to extract features for their respective purposes. 


Fashion Recommendation and Compatibility Prediction Using Relational Network

  • Learning compatibility between "tops" and "bottoms". Treating outfits as a sequence and using an LSTM-based model

Single-Item Fashion Recommender: Towards Cross-Domain Recommendations

  • Category: Defines the main category of an image, such as top, bottom, footwear, and jewelry.
  • Subtype: Defines subtypes of the same category, such as boots, high heels, college, and slippers.
  • Fabric/Texture: Shows the main fabric or garment’s texture, such as denim, leather, smooth, and shiny.
  • Color: Defines the dominant color of the item, such as red, green, blue, yellow.
  • Variety: The number of novel items (different category, subtype, or color). Almost on the opposite side of the other criteria, because the higher the variety score is, the lower other scores will be.
  • Details: The number of results that follow fine details, such as necklines, zipper, pockets, and design.
  • Shape Difference: The number of items that do not follow the outline of the query item, such as images with different angles, different perspectives, rotations, flips
Key Concepts
  • Data generation
  • Embedding generation
Similarity
  • Cosine
  • Euclidean
Data Size Reduction
  • SVD
  • NMF
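
The two similarity measures listed above can be written in a few lines of plain Python. The vectors are toy values standing in for item embeddings, not real data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]  # same direction as u, twice the length

cos_uv = cosine_similarity(u, v)
dist_uv = euclidean_distance(u, v)
```

Cosine similarity ignores vector length, so u and v count as a perfect match, while Euclidean distance still reports the gap between them.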


The tradeoff between batch vs realtime



  • What is computed prior? 
  • What is used in real time to adjust prior recommendations?



  • Offline - creating embeddings for catalog items, and building an approximate nearest neighbors (ANN) 
  • Online - converting the input item or search query into an embedding, followed by candidate retrieval and ranking
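
A minimal sketch of the offline/online split follows. An exact brute-force search stands in for the approximate nearest neighbors index, and the catalog items and three-dimensional embeddings are hypothetical toy values.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def build_index(catalog):
    """Offline step: precompute unit-length embeddings for every catalog item."""
    return {item_id: normalize(vec) for item_id, vec in catalog.items()}


def retrieve(index, query_vec, k=2):
    """Online step: normalize the query embedding, score candidates, rank top-k."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), item_id)
        for item_id, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Hypothetical catalog embeddings.
catalog = {
    "red_dress": [0.9, 0.1, 0.0],
    "blue_dress": [0.8, 0.0, 0.3],
    "sneakers": [0.0, 1.0, 0.1],
}
index = build_index(catalog)
top_items = retrieve(index, [1.0, 0.05, 0.0], k=2)
```

A production system would swap the linear scan for an ANN library and add a separate ranking model, but the division of work between batch and real time stays the same.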



Transitioning to a real-time serving system has been made possible by two products: Feature Store and Online Inference Platform


July 19, 2022

Model drift examples

evidently - evaluate, test and monitor ML models in production

Detailed Examples - Link

Example notebooks - Link

  • Compare two distributions and measure / report
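
One common way to compare two distributions is the two-sample Kolmogorov-Smirnov statistic. The sketch below is plain Python and does not use the evidently API; the sample values are made up.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


# Hypothetical feature values from a reference window and two later windows.
reference = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
same = list(reference)
shifted = [x + 20 for x in reference]

no_drift = ks_statistic(reference, same)
full_drift = ks_statistic(reference, shifted)
```

A monitoring job would compute this per feature and flag those above a chosen threshold; the threshold itself is a judgment call.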

Key code snippets - Link

Duration of distributions


Generate both predictions


Comparisons


Keep Exploring!!!

Feast - Featurestores

  • Feast joins these tables
  • Feast manages deployment to a variety of online stores

Colab Example

  • A feature repository consists of:
  • A collection of Python files containing feature declarations.
  • A feature_store.yaml file containing infrastructural configuration.

You define the features

  • It generates the features and stores them in SQLite 

I have a question: can I do the same in SQL itself? :)

Connect and query created features


Key Steps

  • Load Data - Supports multiple DB connectors
  • Define features (Python declarations plus the feature_store.yaml configuration)
  • Save them in SQLite format
  • Consume in the model development
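
On the question of doing the same in SQL: the core of the steps above (aggregate raw rows into features, store them, look them up by entity) can be sketched with sqlite3 alone. This does not use the Feast API; the trips table, the driver_features table, and the lookup_features helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load data: a hypothetical raw table with one row per driver trip.
conn.execute("CREATE TABLE trips (driver_id INTEGER, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 5.0)],
)

# Define and save features: one aggregated row per entity (driver).
conn.execute(
    """
    CREATE TABLE driver_features AS
    SELECT driver_id, COUNT(*) AS trip_count, AVG(fare) AS avg_fare
    FROM trips
    GROUP BY driver_id
    """
)


def lookup_features(conn, driver_id):
    """Consume: fetch the stored feature row for one entity."""
    row = conn.execute(
        "SELECT trip_count, avg_fare FROM driver_features WHERE driver_id = ?",
        (driver_id,),
    ).fetchone()
    return {"trip_count": row[0], "avg_fare": row[1]}


features = lookup_features(conn, 1)
```

So the feature computation itself is plain SQL. What the notes credit to Feast is the surrounding work: joining tables, managing deployment to online stores, and keeping feature definitions in one repository.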



Tracking and managing features

  • S3 - Redshift Spectrum - feature stores created
  • Redshift - Query Engine
  • Feature service - Map features to models

Keep Checking!!!

July 12, 2022

Bias and AI

Comparing Human and Machine Bias in Face Recognition

Key Notes

  • Disparities between groups of people based on perceived gender, skin type, lighting condition
  • poor light exposure, blurriness, facial obstruction

A survey on bias in visual datasets

  • Selection bias is the type of bias that “occurs when individuals or groups differ systematically from the population of interest”
  • We refer to framing bias as any associations or disparities that can be used to convey different messages and/or that can be traced back to the way in which the visual content has been composed.
  • We define label bias as any errors in the labelling of visual data, with respect to some ground truth, or the use of poorly defined or inappropriate semantic categories.

Keep Checking!!!

Good Read - Modern Data Stack vs Realistic, Cost-Friendly Working Architecture

 


Keep Thinking!!!

July 11, 2022

What does from_logits=True do in the SparseCategoricalCrossentropy loss function?

  • The from_logits=True argument informs the loss function that the output values generated by the model are not normalized
  • In other words, the softmax function has not been applied to them to produce a probability distribution
  • To get a prediction, apply softmax and pick the class with the highest probability; because softmax preserves ordering, the index of the largest raw logit gives the same class
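
The effect of the flag can be mimicked in plain Python. This is a simplified sketch for a single example, not the Keras implementation; the function names mirror the concept only, and the logits are made-up values.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(y_true, y_pred, from_logits=False):
    """Negative log-probability of the true class index for one example."""
    probs = softmax(y_pred) if from_logits else y_pred
    return -math.log(probs[y_true])


logits = [2.0, 1.0, 0.1]  # raw model outputs, no softmax layer
probs = softmax(logits)

loss_from_logits = sparse_categorical_crossentropy(0, logits, from_logits=True)
loss_from_probs = sparse_categorical_crossentropy(0, probs, from_logits=False)

# Softmax preserves ordering, so either list gives the same predicted class.
predicted_class = max(range(len(logits)), key=lambda i: logits[i])
```

Both calls give the same loss. The flag only tells the loss whether it still has to apply softmax itself.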
Keep Learning!!!

July 10, 2022

Learning to know what's missing

Where I spent time

  • Computer Vision
  • Vision / Transformer Models
  • Forecasting Algos
  • Data / Stored procedures
  • Dockerization / Streamlit / Deployment

What I know but spend less time on, as other folks take care of it :)

  • Elyra Pipelines
  • Ray optimization
  • Terraform
  • Kubeflow custom deployment 
  • Ingress
  • Keycloak
  • Monitoring tools 

It's a never-ending loop of tool learning!!!

July 08, 2022

Vision across domains - Tools vs Solutions

Solutions vs Familiarity vs Awareness vs Skills

You will observe a bunch of startups in each domain/category. Learning the tools and solving a problem require different levels of skill. Learn the tool, and master the problem, techniques, and alternatives/solutions.

  • Vision in healthcare - Xray, MRI, Scan images
  • Vision for BPO Automation - Automated invoice processing
  • Vision for Monitoring / Inspection - Quality Assessment
  • Vision for Safety - Industrial safety
  • Vision for Fruit Freshness - Automated pricing on quality

Keep Thinking!!!

July 07, 2022

Career in Data Science - Fresher / Lateral moves

A job is like onboarding: it gets you started. When you board a train, it doesn't matter whether it's unreserved, sleeper, or AC; first, you need to get on. If you are starting your career in data, BI, reporting, or SQL development, everything connects to getting started in data science.

A job is important, to begin with. Going towards a destination is as important as waiting for the best train. 

In every job you are paid to solve problems. There is rarely good data or a clean upstream where everything is set and you just come and code. Building solutions with what you have matters more than complaining that there is no good data / work.

After a few years, when you switch into Data Science, focus more on value addition: how your experience can help in data, domain, and selling. A transition made with experience should offer more than just a data science developer role.

You cannot learn a new skill every day. You can have primary and secondary skills. Collaborate with people and work on joint success. You can never succeed solo anywhere. Be a team player.

Mindset / Focused learning / Get over failures / Remain focused on value creation are key traits of career growth.

Keep Thinking!!!

How to write a short summary on a topic?

Forget what you have written. Evaluate your answer/perspective:

  • What is the takeaway you want for the reader
  • What points will create interest
  • What 2-3 Buzzwords in it will create curiosity
Keep connecting to your readers!!!

Focus / Being Aligned

  • Follow through; don't lose sight of the big picture
  • Fence your focus, weed out your distractions
  • Cycle around your focus, keep learning, incremental additions
  • Discuss concerns / issues
  • Keep passion / interests always focused


Keep Exploring!!!

July 06, 2022

Being ahead of the learning curve

  • Keep a connecting line for every new concept / idea to the past lessons
  • Cycle the subject, again and again, Learning needs iterations
  • Awareness - Code - Practice - Do Often
  • Be aware of trends / competitors
  • Provide insights / inference / observations

Tech is interesting, Keep Exploring!!!

OpenCV Notes

Color channels

  • Hue: Measures the color of the pixel.
  • Saturation: Measures the intensity of color of the pixel.
  • Value: Measures the brightness of the pixel.

Hue range (degrees, 0-360)

  • Red (0-60)
  • Yellow (60-120)
  • Green (120-180)
  • Cyan (180-240)
  • Blue (240-300)
  • Magenta (300-360)

RGB Channels

  • (0,0,0) is a black color.
  • (255,0,0) is a pure red color.
  • (0,255,0) is a pure green color.
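
The link between RGB triples and hue can be checked with the standard-library colorsys module. The helper names below are hypothetical. The second helper rescales to the 8-bit convention OpenCV uses for HSV images (H in 0-179, S and V in 0-255), which is why hue values read from OpenCV are half the degree values.

```python
import colorsys


def rgb_to_hsv_degrees(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation %, value %)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360), round(s * 100), round(v * 100)


def to_opencv_hsv(r, g, b):
    """Rescale to the 8-bit OpenCV convention: H in 0-179, S and V in 0-255."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 180) % 180, round(s * 255), round(v * 255)


red = rgb_to_hsv_degrees(255, 0, 0)
green = rgb_to_hsv_degrees(0, 255, 0)
black = rgb_to_hsv_degrees(0, 0, 0)
green_cv = to_opencv_hsv(0, 255, 0)
```

Pure red sits at 0 degrees and pure green at 120 degrees, while black has zero value regardless of hue.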

 

Smoothing Functions

  • Image Smoothing - convolving the image with a low-pass filter kernel. It is useful for removing noise
  • Bilateral filter - replaces the intensity of each pixel with a weighted average of intensity values. cv.bilateralFilter() is highly effective in noise removal while keeping edges sharp
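
To show why the bilateral filter keeps edges sharp, here is a one-dimensional sketch in plain Python. It is a simplified illustration, not the cv.bilateralFilter() implementation; the parameter names and the sample row of intensities are assumptions.

```python
import math


def bilateral_filter_1d(signal, radius=2, sigma_space=1.0, sigma_intensity=10.0):
    """Edge-preserving smoothing of a 1-D row of pixel intensities."""
    out = []
    for i, center in enumerate(signal):
        weighted_sum = 0.0
        weight_total = 0.0
        lo = max(0, i - radius)
        hi = min(len(signal), i + radius + 1)
        for j in range(lo, hi):
            # Nearby pixels count more (spatial weight) ...
            spatial = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2))
            # ... but only when their intensity is similar (range weight).
            similar = math.exp(
                -((signal[j] - center) ** 2) / (2 * sigma_intensity ** 2)
            )
            weight = spatial * similar
            weighted_sum += weight * signal[j]
            weight_total += weight
        out.append(weighted_sum / weight_total)
    return out


# Noisy dark region (about 10) next to a noisy bright region (about 200).
row = [10, 12, 9, 11, 10, 200, 202, 199, 201, 200]
smoothed = bilateral_filter_1d(row)
```

Pixels across the edge differ so much in intensity that their weight is effectively zero, so each side is averaged only with itself and the step between 10 and 200 survives.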


Color Codes

Mask for custom color range


Keep Thinking!!!!

July 04, 2022

Myth of Data

  •  I don't like my job
  • Data is insufficient
  • There are no good data science use cases

Learn all the coding questions, practice all the ML maths, solve all the Kaggle problems, and land your dream job: you will still face the same data problem

  • Data is insufficient

What you see in courses / Kaggle does not reflect real-world data issues

Wherever you go, there will be bad and incomplete data. 

Keep Thinking!!!

Recommendation Systems

Bookmark of Repos for my ongoing learning

Deep Learning based Recommender System: A Survey and New Perspectives





High level thoughts

  • Data connectors class
  • SVDclass - Dataload - Algorun - Validate - Results
  • NMFclass - Dataload - Algorun - Validate - Results
  • ItemItemrecomclass - Dataload - Algorun - Validate - Results
  • UserUserrecomclass - Dataload - Algorun - Validate - Results
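
The class layout above could be sketched as a shared base class with one subclass per algorithm. This is a hypothetical structure with a toy item-item recommender; the class names, method names, and ratings are assumptions. SVD, NMF, and user-user variants would be sibling subclasses that override run.

```python
import math
from abc import ABC, abstractmethod


class Recommender(ABC):
    """Shared pipeline: load data -> run algorithm -> validate -> results."""

    def __init__(self):
        self.ratings = {}
        self.results = {}

    def load(self, ratings):
        self.ratings = ratings  # {user: {item: rating}}
        return self

    @abstractmethod
    def run(self):
        """Fill self.results with a ranked item list per user."""

    def validate(self):
        # Sanity check: never recommend an item the user has already rated.
        return all(
            item not in self.ratings[user]
            for user, items in self.results.items()
            for item in items
        )


class ItemItemRecommender(Recommender):
    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def run(self):
        users = sorted(self.ratings)
        items = sorted({i for rated in self.ratings.values() for i in rated})
        # One vector per item: its rating from each user (0 if unrated).
        vectors = {i: [self.ratings[u].get(i, 0.0) for u in users] for i in items}
        for user, rated in self.ratings.items():
            scores = {
                cand: sum(
                    self._cosine(vectors[cand], vectors[item]) * rating
                    for item, rating in rated.items()
                )
                for cand in items
                if cand not in rated
            }
            self.results[user] = sorted(scores, key=scores.get, reverse=True)
        return self


# Hypothetical toy ratings.
ratings = {
    "u1": {"dress": 5, "heels": 4},
    "u2": {"dress": 4, "heels": 5, "clutch": 5},
    "u3": {"sneakers": 5, "cap": 4},
}
model = ItemItemRecommender().load(ratings).run()
is_valid = model.validate()
top_for_u1 = model.results["u1"][0]
```

Keeping load, validate, and results in the base class means each new algorithm only has to supply its own run step.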

Structuring Your Project

Keras Reads

July 5th Updates

Text to Image generation

min(DALL·E)

Sample input


Output

DALL·E Mini

paper - Zero-Shot Text-to-Image Generation

Colab


Keep Exploring!!! 

July 03, 2022

Personalized - Redefined - Recommendations - Paid Service

What if we got customized recommendations based on our needs rather than on what we are forced to see?

Workday recommendations (Articles / Videos based on interests)

  • Data Science (20%)
  • Startups (20%)
  • Stock markets (20%)
  • Travel Vlog (10%)
  • Paranormal Vlog (5%)
  • Food Vlog (5%)
  • Emotions / Wellness (10%)

Weekend Recommendations

  • Travel Vlog (20%)
  • Paranormal Vlog (20%)
  • Food Vlog (10%)
  • Emotions / Wellness (20%)
  • Music (30%)
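
The percentage mixes above can be turned into a concrete feed with a small allocation routine. This is a sketch: the allocate_slots helper and the feed sizes are assumptions. The workday percentages as listed add up to 90, so the function normalizes the weights instead of assuming they sum to 100.

```python
def allocate_slots(weights, total_slots):
    """Split a feed of total_slots items across categories by weight."""
    weight_sum = sum(weights.values())
    exact = {c: total_slots * w / weight_sum for c, w in weights.items()}
    slots = {c: int(x) for c, x in exact.items()}
    leftover = total_slots - sum(slots.values())
    # Give any remaining slots to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - slots[c], reverse=True)
    for category in by_remainder[:leftover]:
        slots[category] += 1
    return slots


WORKDAY = {
    "Data Science": 20,
    "Startups": 20,
    "Stock markets": 20,
    "Travel Vlog": 10,
    "Paranormal Vlog": 5,
    "Food Vlog": 5,
    "Emotions / Wellness": 10,
}
WEEKEND = {
    "Travel Vlog": 20,
    "Paranormal Vlog": 20,
    "Food Vlog": 10,
    "Emotions / Wellness": 20,
    "Music": 30,
}

workday_feed = allocate_slots(WORKDAY, total_slots=18)
weekend_feed = allocate_slots(WEEKEND, total_slots=10)
```

Switching the weight table by day of week is then a single dictionary lookup before the feed is built.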

Tired of seeing the same Swiggy ads, Zomato ads, and ads irrelevant to the context.

Ref - Link


Keep Thinking!!!