Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): July 2022

July 30, 2022

What are interviews about? Logic, design, or syntax?

Logic depends on ideas and experience
Design depends on performance/scalability and upfront thinking
Syntax needs a Google search and fixes

A good interview, I loved the aspects they covered.

This is far better than pointed questions like

Sort algorithm time for best / worst case - I hardly remember
Puzzles on Physics - I don't practice every day
Write a sort function - Come on, Google it

We live in a system where we don't know how to hire or spot people who are team players.

Faking or cracking interviews and building a team of solo performers vs. building a team with a lot of team players will have different impacts in the long term.
The reward should go to teamwork and openness, not to stars.
A motivated team vs. a team that delivers something working only because of work pressure

Keep Thinking!!!

Catalog management - Papers Read

Deep Learning for Automated Tagging of Fashion Images

We present 9 deep learning classifiers to predict Fashion attributes in 4 different categories: apparel (dresses and tops), shoes, watches and luggages.
By extracting these tags or attributes from fashion images, queries to the products catalogue can be generated looking for similar or complementary products, produce recommendations for the user, fill missing metadata, and overall provide an improved search experience

We propose a new attribute-guided metric learning (AGML) with multitask CNNs that jointly learns fashion attributes and image embeddings

FashionSearchNet: Fashion Search with Attribute Manipulation

The focus of this paper is on retrieval of fashion images after manipulating attributes of the query images.

Keep Exploring!!!

July 29, 2022

Virtual Try on - Fashion - Paper - Reading Notes

Key Techniques

Landmarks
Segmentation
Masking
Target addition

Approach #1

Ref1

Approach #2

Ref

Approach #3

Ref

Papers

Keep Exploring!!!

Interesting poster sessions

Good Summary - AI ML Use cases

#ML, #AI, #DL #NLP use cases in #Retail #BigData #IoT #Datascience #Fintech #Blockchain #CX #UX #SmartCity #BI #DX #MachineLearning #IIoT pic.twitter.com/mbQBGnelpD
— HumanWare (@humanwareonline) September 25, 2017

Ref - Link

Keep Exploring!!!

MLops Tools

MLOps tools link

CI/CD For Machine learning: ClearML, CML, Gitlab
CronJob Monitoring: Cronitor, HealthchecksIO
Data Exploration: Apache Zeppelin, BambooLib, Google Colab, Jupyter Notebook, JupyterLab
Data Management: Arrikto, BlazingSQL, Delta Lake, Dolt, DVC, Git LFS
Data Processing: AirFlow, Hadoop
Data Validation: Cerberus, Great Expectations
Data Visualization: Superset, Tableau, Facets, Dash
Feature Engineering: Featuretools, TSFresh
Feature Store: Butterfree, ByteHub, Feast, Tecton
Hyperparameter Tuning: Hyperas, Hyperopt, Katib, KerasTuner, Optuna, Scikit Optimize
Machine Learning Platform: SageMaker, Kubeflow, H2O, MLReef, algorithmia, DataRobot, DAGsHub
Model Fairness: AI 360, FairLearn, Opacus
Model Interpretability: Alibi, Captum, ELI5, InterpretML, LIME, Lucid, SAGE, SHAP, Skater
Model LifeCycle: MLflow, NeptuneAI, Comet, Keepsake, ModelDB, Weights and Biases
Model Serving: BentoML, Tensorflow Serving, KFServing, SeldonCore, Streamlit, TorchServe, Gradio, Graphpipe, Hydrosphere
Model Testing and Validation: DeepChecks
Optimization Tools: Dask, DeepSpeed, Horovod, Tpot, Ray, Rapids
Simplification Tools for ML: Pycaret, Hermione, Hydra, Koalas, TuriCreate(apple), TrainGenerator
Visual Analysis and Debugging: Aporia, Evidently, Yellowbrick, Netron, Fiddler, Manifold
Workflow Tools: MLRun, Flyte, Metaflow, Ploomber, ZenML, Kedro

Ref - Link

Big Picture - Different phases of Model Development

Ref - Link

Overall Landscape - Monitor, Manage, Retrain, Tools Stack

MLOps vs Data Engineering

I always had a mixed opinion about how the different tasks in ML and Data Engineering overlap. I align with the views in this article.

MLOps is Mostly Data Engineering.

Data in different forms and the reporting aspects

Transaction Data
BI Reports
ML Features
ML Dashboards
Everything operates on same data.

Key Questions from article

How different is the observability of model quality metrics, such as drift, from any product-related monitoring?
In a product, we keep monitoring the performance of our features: do people engage with them in the way we expect?

Cycle of Activities

Keep Exploring!!!

July 28, 2022

Growth Dimensions of Career

A fantastic blog on career growth and its areas. Taking reference from the link.

Success is Teamwork. It combines Tech, Domain, Collaboration, Process, and aligned vision.

Developer roles and growth path link

As Engineering Manager, I find myself balancing this perspective

Building new offerings. Knowing the tech trend. Implementing/connecting to relevant opportunities. Learning aligned with new opportunities in business. It's never-ending, but it's interesting!!!

Keep Exploring!!!

Face Beauty Papers

Paper #1 - FabSoften: Face Beautification via Dynamic Skin Smoothing, Guided Feathering, and Texture Restoration

Key Notes

Softening is carried out by an attribute-aware dynamic smoothing filter
YouCam, B612 and ModiFace
Smoothing blemishes in the facial region, including wrinkles, spots, patchy reflections, and skin nonuniformities.

Related Work

Edge-Aware Smoothing Filters;
Layer Decomposition Based Approaches;
Deep Learning-Based Approaches;
Generative Models

To detect blemishes in the skin region, we first employ the Canny Edge detector [6] to localize strong edge patterns

Skin-mask generation algorithms generally fall into three broad categories:

Color pixel classification
Gaussian Mixture Model
Mutually Guided Image Filtering (muGIF)
Fast Global Smoothing filter (FGS)

Guided Filter Python Code

We crop zoomed-in portions of the image to highlight each method’s performance on skin texture retainment and preserving hair regions

Fabsoften Key Features

Preprocessing
Landmark Detection
Binary Skin Mask
Blemish Detection and Concealment
Skin Mask Generation and Refinement
GMM Clustering (#6)

Segmentation?

Guided Feathering (pending)
Skin Imperfection Smoothing
Dynamic Mean Filter
Attribute-aware Dynamic Guided Filter
Skin Texture Restoration (Wavelet-based STR)

BeautyGAN

Facebeauty

Beautyfinder

Face Smoothing: Detection and Beautification

Change image from BGR to HSV colorspace
Create mask of HSV image
Apply a bilateral filter to the Region of Interest
Apply filtered ROI back to original image

Guided Filter - Simple Python implementation of paper:

Face beautification algorithm

Paper - Face Beautification and Color Enhancement with Scene Mode Detection

A bilateral filter is an edge-preserving smoothing filter.
Contrast enhancement methods can produce strong effect on local contrast enhancement
Gaussian Blur of space and intensity of each pixel
Automatically detect the best-match scene mode for input image.
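
To make the "Gaussian of space and intensity" idea concrete, here is a naive 1-D bilateral filter in NumPy. It is a teaching sketch under my own assumptions (the function name `bilateral_1d` and its parameters are hypothetical), not the paper's implementation and not how `cv.bilateralFilter` is written internally.

```python
import numpy as np


def bilateral_1d(signal, radius=3, sigma_space=2.0, sigma_intensity=0.1):
    """Naive 1-D bilateral filter: weights combine spatial and intensity closeness."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty_like(signal)
    offsets = np.arange(-radius, radius + 1)
    # Spatial Gaussian: nearby samples count more than distant ones.
    spatial_w = np.exp(-(offsets ** 2) / (2 * sigma_space ** 2))
    padded = np.pad(signal, radius, mode="edge")
    for i in range(signal.size):
        window = padded[i : i + 2 * radius + 1]
        # Intensity Gaussian: samples with a very different value get almost no weight.
        intensity_w = np.exp(-((window - signal[i]) ** 2) / (2 * sigma_intensity ** 2))
        weights = spatial_w * intensity_w
        out[i] = np.sum(weights * window) / np.sum(weights)
    return out


# A noisy step edge: 0.0 on the left, 1.0 on the right.
rng = np.random.default_rng(42)
clean = np.concatenate([np.zeros(50), np.ones(50)])
noisy = clean + rng.normal(0, 0.03, size=clean.size)
filtered = bilateral_1d(noisy)
```

The flat regions get averaged, while samples across the step contribute almost nothing, which is why the edge stays sharp.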

BEHOLDER-GAN: GENERATION AND BEAUTIFICATION OF FACIAL IMAGES WITH CONDITIONING ON THEIR BEAUTY LEVEL

Given an image x and the pre-trained generator G, we want to recover the corresponding latent vector z and beauty score β

Face Beautification: Beyond Makeup Transfer

More Reads

Keep Exploring!!!

July 27, 2022

Good Read - Hybrid WFH = Productivity

Keep Thinking!!!

Skills / Knowledge / Perspective

Domain and Tech Expertise is a mix of

Awareness of tech/trends and tech convergence
MVP skills to code / demonstrate product / idea
Storytelling skills connecting business, ideas, tech

Moving from MVP to product needs

Ability to spot what will scale / what will fail
Think from a competitor's view
Think from a customer usage point of view

Observe / Analyze / Incorporate / Grow yourself and your Team

Keep Thinking!!!

July 24, 2022

NoSQL Summary - Options

A bit of a relook on NoSQL for a class helped me consolidate my learning.

NoSQL - Not only SQL. In engineering, database design is all about

Codd's Rule
Normalization Techniques

What I thought in 2005

How we handle columns, data types, and relationships is key, along with handling nulls, default values, constraints, etc.

Systems data was structured in 2000

20 years back there was no social media, no WhatsApp. Most of the data was structured: transactions, automated orders, etc.

What all performance improvements/challenges came as data volumes increased?

Partitioning by products/duration
Replication to manage read / writes
Use of Snapshot isolation/options
Denormalizing a few tables
Migrating to the latest version / Rewriting some of the slow-performing reports
Pagination of reports instead of fetch all approach
Archiving completed orders
Vertical Scaling- Add more RAM, CPU
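
As one example from the list above, paginating a report instead of fetching everything can be sketched with the standard-library `sqlite3` module. The `orders` table, its columns, and the `fetch_page` helper are hypothetical names for illustration. The sketch uses keyset pagination (filter on the last id seen) rather than OFFSET, which is one common choice and not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(i, i * 10.0) for i in range(1, 26)],  # 25 illustrative rows
)


def fetch_page(conn, last_seen_id=0, page_size=10):
    """Keyset pagination: fetch the next page after the last id already shown."""
    return conn.execute(
        "SELECT id, amount FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()


pages = []
last_id = 0
while True:
    page = fetch_page(conn, last_id)
    if not page:
        break
    pages.append(page)
    last_id = page[-1][0]  # remember where this page ended
```

Each call reads at most `page_size` rows, so the report never loads the whole table at once.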

Since the social media age

Now we have more unstructured, semi-structured data from mobile phones, social media, reviews, ratings, rankings, messages, images, and videos.

I still remember the 2010 period when Hadoop was much spoken about. Moving computation where data is available. I looked up my post in 2011 on MongoDB.

The evolution of databases is from

Stage 1 - Papers, Ledgers
Stage 2 - Excel, Access
Stage 3 - Databases
Stage 4 - Hadoop for large-scale data
Stage 5 - NoSQL
Stage 6 - lakehouse = (Hadoop + RDBMS + NOSQL + AI for data extraction from unstructured sources)

Building an RDBMS perspective is about tables, keys, and relationships

Ref - Link

Everything revolves around Reading Correct Data vs Dirty Data (Transactions in progress may or may not commit).

Everything in DBMS is

Create
Read
Update
Delete

How do reads and writes balance? Essentially, a record or row needs to be locked before an update. This ensures we work in a consistent state.
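
A small way to see committed reads vs dirty reads is two connections to the same SQLite file. This is only a sketch using Python's standard `sqlite3` module. The `account` table and the temporary `demo.db` file are made up for the demo, and other engines reach the same guarantee through different locking or versioning schemes.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(db_path)
reader = sqlite3.connect(db_path)

writer.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO account VALUES (1, 100)")
writer.commit()

# The writer starts a transaction (implicitly, on the UPDATE) but does not commit yet.
writer.execute("UPDATE account SET balance = 50 WHERE id = 1")

# The reader sees only committed data, so there is no dirty read.
before_commit = reader.execute("SELECT balance FROM account WHERE id = 1").fetchall()

writer.commit()

# After the commit, the new value becomes visible to the reader.
after_commit = reader.execute("SELECT balance FROM account WHERE id = 1").fetchall()

writer.close()
reader.close()
```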

CAP theorem is the Crux of Everything

Ref - Link

Now you need to choose DB based on preference (C - A - P)

Questions to ask to decide on the choice of Database?

Is Query pattern aggregates or select for individual records?
What is projected database growth?
Is it structured / semi-structured data?
What are my top 2 choices, can I do a quick prototype and performance test to validate
Schema design what practices are relevant to each database type? What maps closely to the current context?
Is consistency a key requirement? What about availability / partition tolerance? Is this system queried across geographies, so that it needs availability in different regions?
If so, how will the different copies sync up? Will there be a master-slave approach / replication / log copy?
What is cost allocated considering volume, and high availability needs?

Different NoSQL Systems

In one of the SaaS products we worked on, Redis key-value pairs were used for session management
In an IoT platform for device management built by a friend's team, Cassandra was used to push device data and generate reports
One of the big retailers I was familiar with relied heavily on the columnar database Vertica to manage all their aggregate data for BI / ML work

We need to consider the use case, data volume, velocity, type of reporting, cost, growth, security everything to decide on choosing a database.

Ref - Link

I say table in RDBMS and collection in MongoDB. What is the conceptual mapping?

SQL vs NoSQL Design Thinking

How do I design a collection in a document DB with nested 1-to-1 and 1-to-many relationships?
What information do I store in a key-value pair? Which key will be unique and will not result in duplicates?
What column family will I create? What will the aggregate queries look like?

Schema / Relationships / Keys will vary based on the Database type.
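
To illustrate the table-to-collection mapping, the sketch below takes two relational-style tables and embeds the 1-to-many side inside each parent document. It uses plain Python dicts only. The `customers` and `orders` data and the `to_documents` helper are hypothetical, and embedding is just one of the possible document designs (referencing by id is the other common one).

```python
from collections import defaultdict

# Relational view: two tables linked by a foreign key (illustrative data).
customers = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ravi"},
]
orders = [
    {"order_id": 101, "customer_id": 1, "item": "shoes", "qty": 1},
    {"order_id": 102, "customer_id": 1, "item": "watch", "qty": 2},
    {"order_id": 103, "customer_id": 2, "item": "bag", "qty": 1},
]


def to_documents(customers, orders):
    """Embed each customer's orders (1-to-many) inside the customer document."""
    orders_by_customer = defaultdict(list)
    for order in orders:
        # The foreign key is dropped because nesting already expresses the relationship.
        embedded = {k: v for k, v in order.items() if k != "customer_id"}
        orders_by_customer[order["customer_id"]].append(embedded)
    return [
        {
            "_id": c["customer_id"],
            "name": c["name"],
            "orders": orders_by_customer[c["customer_id"]],
        }
        for c in customers
    ]


documents = to_documents(customers, orders)
```

A row maps to a document, a table to a collection, and the join key disappears when the child rows are nested.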

Which Database for What Application Purpose?

High reads consistent data - RDBMS
High writes low reads - HBase, Cassandra
Document-based storage (multiple key-value pairs, or key-array pairs, or even nested documents) - Mongodb, Couchdb
Key-Value stores are similar to maps or dictionaries where data is addressed by a unique key - Redis

Above all cost also plays a key role. Knowing what to choose based on size, data growth, and access patterns is key to deciding the type of Database for implementation.

RDBMS, KeyValue, Columnar, Graph, Document Collection all these forms of databases will co-exist :)

Data Stack

Ref - Link

Modern Analytics Stack

Ref - Link

Source - Link

New Cardinality Notations and Styles for Modeling NoSQL Document-store Databases

NoSQL data modeling

ERD Plus

Keep Learning!!!

July 21, 2022

Personal Mastery

Speak Less, Authentic makes you be sensitive
Authentic/sensitive, Balance of Above neck / below the neck.
Making authentic conversation, and balancing being sensitive vs authentic is good learning, Everyone's pattern is different but how you balance comes with time
Being diplomatic vs openness, Here sensitivity may be missing
Being Authentic vs openness, Here sensitiveness is balanced

Keep Exploring!!!

Being a Mentor

You have to be careful to communicate as well as not to hurt. It has to result in improvement / fill the gap / not create more gaps
Cautiousness / Respect / Tolerant
Speak up without worrying. Don't overthink
Don't struggle to give tough feedback / Be Authentic and Sensitive

Keep Mentoring!!!

July 20, 2022

Recommendation Systems

A Review of Modern Fashion Recommender Systems

Key Notes
Recommender systems have grown to be an essential part of all large Internet retailers, driving up to 35% of Amazon sales [103] or over 80% of the content watched on Netflix [31].
Localizing fashion items
Determining their category and attributes
Degree of similarity to other products
Product-to-product relationships
Product-to-user uncertainties
Fashion item compatibility - associated image and text data is then used to learn to generalize to stylistically similar products
The fashion item recommendation task, similar to the classical recommendation problem, focuses on suggesting individual fashion items (clothing), that match users’ preferences.
Fashion pair and outfit recommendation: Fashion outfits are sets of 𝑁 items that are worn together, e.g., for an outdoor wedding, graduation party, baby shower, and so forth
Modeling outfits as a sequence to take advantage of the representation power of order-aware models such as LSTMs
Fashion Item Relevancy network (FIR) learns the compatibility of fashion items and learns garment item relevance embeddings
Physical body-related features. The easiest way to make effective sizing recommendations is to use data from certain parts of the body [58, 60] such as bust, waist, and hip
User-item fit feedback. To provide personalized size recommendations, the interaction between the user and the item is essential

Color. The most common means to identify how one looks is achieved via colors, materials, and silhouettes on the body
Brand. Product brands are a critical feature users consider when deciding among items.
Texture. The texture describes the body and surface of a garment.
Context = image + text. In addition to images, users may also include words (textual descriptions) to aid in the recommendation process
Context = image. Images are an important visual tool for users to communicate with a fashion recommender system

Toward Explainable Fashion Recommendation

Influence of the item-feature pair, which we call its Item-Feature Influence Value (IFIV)
CNNs trained for generic image recognition are used to extract features for their respective purposes.

Fashion Recommendation and Compatibility Prediction Using Relational Network

Learning compatibility between "tops" and "bottoms". Treating outfits as a sequence and using an LSTM-based model

Single-Item Fashion Recommender: Towards Cross-Domain Recommendations

Category: Defines the main category of an image, such as top, bottom, footwear, and jewelry.
Subtype: Defines subtypes of the same category, such as boots, high heels, college, and slippers.
Fabric/Texture: Shows the main fabric or garment’s texture, such as denim, leather, smooth, and shiny.
Color: Defines the dominant color of the item, such as red, green, blue, yellow.
Variety: The number of novel items (different category, subtype, or color). Almost on the opposite side of the other criteria, because the higher the variety score is, the lower other scores will be.
Details: The number of results that follow fine details, such as necklines, zipper, pockets, and design.
Shape Difference: The number of items that do not follow the outline of the query item, such as images with different angles, different perspectives, rotations, flips

Key Concepts

Data generation
Embedding generation

Similarity

Cosine
Euclidean
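
The two similarity measures listed above can be written in a few lines of NumPy. This is a minimal sketch with made-up vectors standing in for item embeddings.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (ignores magnitude)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def euclidean_distance(a, b):
    """Straight-line distance between two vectors (sensitive to magnitude)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))


item_a = [1.0, 2.0, 3.0]
item_b = [2.0, 4.0, 6.0]   # same direction as item_a, twice the magnitude
item_c = [3.0, 0.0, -1.0]  # orthogonal to item_a

print(cosine_similarity(item_a, item_b))
print(euclidean_distance(item_a, item_b))
print(cosine_similarity(item_a, item_c))
```

Note how `item_a` and `item_b` point the same way, so cosine treats them as identical while Euclidean distance still separates them.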

Data Size Reduction

Candidate Generation

Building a book Recommendation System using Keras

Collaborative Filtering Models

Ranking

Reranking

Good Read - System Design for Recommendations and Search

The tradeoff between batch vs realtime

What is computed prior?
What is used in real time to adjust prior recommendations?

Offline - creating embeddings for catalog items, and building an approximate nearest neighbors (ANN)
Online - converting the input item or search query into an embedding, followed by candidate retrieval and ranking
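
The offline/online split can be sketched as below. Everything here is an assumption for illustration: random vectors stand in for real embeddings, an exact top-k search stands in for an ANN index, and the `popularity` signal and blend weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Offline: embed the catalog once and L2-normalize so dot product = cosine similarity.
catalog = rng.normal(size=(1000, 32))
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)
popularity = rng.random(1000)  # hypothetical extra signal used only at ranking time


def retrieve(query_vec, k=50):
    """Candidate retrieval: exact top-k by cosine (a stand-in for an ANN index)."""
    scores = catalog @ query_vec
    top = np.argpartition(-scores, k)[:k]
    return top, scores[top]


def rank(candidates, sim_scores, n=5, w_sim=0.8, w_pop=0.2):
    """Ranking: re-score the small candidate set with a richer (here, blended) score."""
    final = w_sim * sim_scores + w_pop * popularity[candidates]
    order = np.argsort(-final)[:n]
    return candidates[order]


# Online: embed the query (here, a slightly perturbed copy of item 42), retrieve, rank.
query = catalog[42] + 0.05 * rng.normal(size=32)
query /= np.linalg.norm(query)
cands, sims = retrieve(query)
recommendations = rank(cands, sims)
```

Retrieval is cheap and coarse over the full catalog, while ranking can afford a more expensive score because it only sees a handful of candidates.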

Let's explore how recsys & search are often split into:

• Latency-constrained online vs. less-demanding offline environments
• Fast but coarse candidate retrieval vs. slower and more precise ranking

Examples from Alibaba, Facebook, JD, DoorDash, etc. https://t.co/zTsfElLw1z
— Eugene Yan (@eugeneyan) June 30, 2021

Real time Recommendations

Transitioning to a real-time serving system has been made possible by two products: Feature Store and Online Inference Platform

Recommender Systems, Not Just Recommender Models

Blueprints for recommender system architectures: 10th anniversary edition

Real World Recommendation System – Part 1

Real World Recommendation Systems – Part 2 (Training Data Generation)

Real World Recommendation Systems – Part 3 (Modeling)

Keep Exploring!!!

July 19, 2022

Model drift examples

evidently - evaluate, test and monitor ML models in production

Detailed Examples - Link

Example notebooks - Link

Compare two distributions and measure / report
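
One simple way to compare a reference and a current distribution is the Population Stability Index (PSI). The sketch below is plain NumPy and is not the evidently API. The function names, bin count, and synthetic data are my assumptions.

```python
import numpy as np


def _bin_fractions(values, interior_edges):
    """Fraction of values falling in each bin defined by the interior edges."""
    idx = np.searchsorted(interior_edges, values, side="right")
    counts = np.bincount(idx, minlength=interior_edges.size + 1)
    return counts / values.size


def population_stability_index(reference, current, bins=10):
    """PSI between two 1-D samples, using quantile bins taken from the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    interior_edges = np.quantile(reference, np.linspace(0, 1, bins + 1)[1:-1])
    eps = 1e-6  # avoid log(0) for empty bins
    ref_frac = np.clip(_bin_fractions(reference, interior_edges), eps, None)
    cur_frac = np.clip(_bin_fractions(current, interior_edges), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
same = rng.normal(0.0, 1.0, 5000)      # drawn from the same distribution
shifted = rng.normal(1.0, 1.0, 5000)   # mean shifted by one standard deviation

psi_same = population_stability_index(reference, same)
psi_shifted = population_stability_index(reference, shifted)
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but the thresholds are conventions and worth validating per feature.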

Key code snippets - Link

Duration of distributions

Generate both predictions

Comparisons

Keep Exploring!!!

Feast - Featurestores

Feast joins these tables
Feast manages deployment to a variety of online stores

Colab Example

A feature repository consists of:
A collection of Python files containing feature declarations.
A feature_store.yaml file containing infrastructural configuration.

You define the features

It generates it and stores it as SQLite

I have a question, Can I do the same in SQL itself :)

Connect and query created features

Key Steps

Load Data - Supports multiple DB connectors
Define features yaml definitions
Save them in SQLite format
Consume in the model development

Tracking and managing features

S3 - Redshift Spectrum - Feature stores created
Redshift - Query Engine
Feature service - Map features to models

Keep Checking!!!

July 12, 2022

Bias and AI

Comparing Human and Machine Bias in Face Recognition

Key Notes

Disparities between groups of people based on perceived gender, skin type, lighting condition
poor light exposure, blurriness, facial obstruction

A survey on bias in visual datasets

Selection bias is the type of bias that “occurs when individuals or groups differ systematically from the population of interest
We refer to framing bias as any associations or disparities that can be used to convey different messages and/or that can be traced back to the way in which the visual content has been composed.
We define label bias as any errors in the labelling of visual data, with respect to some ground truth, or the use of poorly defined or inappropriate semantic categories.

Keep Checking!!!

Good Read Modern Data Stack vs Realistic cost friendly working architecture

Keep Thinking!!!

July 11, 2022

What does from_logits=True do in the SparseCategoricalCrossentropy loss function?

The from_logits=True argument informs the loss function that the output values generated by the model are not normalized
In other words, the softmax function has not been applied to them to produce a probability distribution
To get a prediction, apply softmax and pick the class with the highest probability. Since softmax preserves ordering, the index of the largest logit gives the same prediction
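
The behaviour can be checked with a small NumPy re-implementation of the idea. This is a sketch of the math only, not Keras code. The helper names and the sample logits are assumptions.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into a probability distribution along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def sparse_categorical_crossentropy(y_true, y_pred, from_logits=False):
    """Mean negative log-likelihood of the true class indices."""
    probs = softmax(y_pred) if from_logits else y_pred
    picked = probs[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))


logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
labels = np.array([0, 1])

# Passing raw logits with from_logits=True equals passing softmax outputs without it.
loss_from_logits = sparse_categorical_crossentropy(labels, logits, from_logits=True)
loss_from_probs = sparse_categorical_crossentropy(labels, softmax(logits), from_logits=False)

# argmax over logits and argmax over probabilities pick the same class.
pred_from_logits = np.argmax(logits, axis=-1)
pred_from_probs = np.argmax(softmax(logits), axis=-1)
```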

Keep Learning!!!

July 10, 2022

Learning to know what's missing

Where I spent time

Computer Vision
Vision / Transformer Models
Forecasting Algos
Data / Stored procedures
Dockerization / Streamlit / Deployment

Where I know but spend less time as other folks take care :)

Elyra Pipelines
Ray optimization
Terraform
Kubeflow custom deployment
Ingress
Keycloak
Monitoring tools

It's a never-ending loop of tool learning!!!

July 08, 2022

Vision across domains - Tools vs Solutions

Solutions vs Familiarity vs Awareness vs Skills

You will observe a bunch of startups in each domain/category. Learning the tools vs solving a problem both require different levels of skills. Learn the tool, and master the problem, techniques, and alternatives/solutions.

Vision in healthcare - Xray, MRI, Scan images
Vision for BPO Automation - Automated invoice processing
Vision for Monitoring / Inspection - Quality Assessment
Vision for Safety - Industrial safety
Vision for Fruit Freshness - Automated pricing on quality

Keep Thinking!!!

July 07, 2022

Career in Data Science - Fresher / Lateral moves

A job is like onboarding and getting started. When you get on the train, it doesn't matter whether it's unreserved, sleeper, or AC. First, you need to get started. If you are starting your career in data, BI, reporting, or SQL development, everything connects to getting started in data science.

A job is important, to begin with. Going towards a destination is as important as waiting for the best train.

In every job you are paid to solve problems. There is no setup where the data is good, the upstream is good, and everything is ready for you to come and code. Building solutions with what you have matters more than complaining that there is no good data or work.

After a few years, when you switch into Data Science, focus more on value addition: how your experience can help in data, domain, and selling. An experienced transition should offer more than just a data science developer role.

You cannot learn a new skill every day. You can have primary skills and secondary skills. Collaborate with people, work on joint success. You can never be successful solo anywhere. Be a team player.

Mindset / Focused learning / Get over failures / Remain focused on value creation are key traits of career growth.

Keep Thinking!!!

How to write a short summary on a Topic ?

Forget what you have written, Evaluate your answer/perspective

What is the takeaway you want for the reader
What points will create interest
What 2-3 Buzzwords in it will create curiosity

Keep connecting to your readers!!!

Focus / Being Aligned

Follow through. Don't miss looking at the big picture
Fence your focus, weed your distraction
Cycle around your focus, keep learning, incremental additions
Discuss concerns / issues
Keep passion / interests always focused

Keep Exploring!!!

July 06, 2022

Being ahead of Learning curve

Keep a connecting line for every new concept / idea to the past lessons
Cycle the subject, again and again, Learning needs iterations
Awareness - Code - Practice - Do Often
Aware of trends / Competitors
Provide insights / inference / observations

Tech is interesting, Keep Exploring!!!

OpenCV Notes

Color channels

Hue: Measures the color of the pixel.
Saturation: Measures the intensity of color of the pixel.
Value: Measures the brightness of the pixel.

Hue range (in degrees; OpenCV 8-bit images store H/2, i.e. 0-179)

Red (0-60)
Yellow (60-120)
Green (120-180)
Cyan (180-240)
Blue (240-300)
Magenta (300-360)
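
To relate the hue bands above to actual pixel values, the standard-library `colorsys` module can convert RGB to HSV. This is a small sketch, and the helper names `hue_degrees` and `opencv_hue` are my own. The pure primaries land at the start of each band (red at 0, yellow at 60, green at 120), and the OpenCV value is simply half the degree value for 8-bit images.

```python
import colorsys


def hue_degrees(r, g, b):
    """Hue of an 8-bit RGB color in degrees (0-360)."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0


def opencv_hue(r, g, b):
    """Hue as OpenCV stores it for 8-bit images: degrees / 2, so 0-179."""
    return int(round(hue_degrees(r, g, b) / 2.0)) % 180


colors = [
    ("red", (255, 0, 0)),
    ("yellow", (255, 255, 0)),
    ("green", (0, 255, 0)),
    ("blue", (0, 0, 255)),
]
for name, rgb in colors:
    print(name, hue_degrees(*rgb), opencv_hue(*rgb))
```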

RGB Channels

(0,0,0) is a black color.
(255,0,0) is a pure red color.
(0,255,0) is a pure green color.

	# import the libraries
	import cv2 as cv
	import numpy as np

	# read the image (cv.imread returns None if the path cannot be read)
	img = cv.imread(r"E:\5.jpg")

	# convert the BGR image to HSV colour space
	hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV)
	cv.imshow("hsv", hsv)
	cv.waitKey(0)

	hsv[:,:,2] = 200 # Changes the V value
	cv.imshow("hsv", hsv)
	cv.waitKey(0)

	# convert back to BGR, the channel order cv.imshow expects
	bgrimg = cv.cvtColor(hsv, cv.COLOR_HSV2BGR)
	cv.imshow("bgrimg", bgrimg)
	cv.waitKey(0)
	cv.destroyAllWindows()


Smoothing Functions

Image Smoothing - convolving the image with a low-pass filter kernel. It is useful for removing noise
Bilateral filter - replaces the intensity of each pixel with a weighted average of intensity values. cv.bilateralFilter() is highly effective in noise removal while keeping edges sharp

	# https://docs.opencv.org/4.x/d4/d13/tutorial_py_filtering.html
	import cv2 as cv

	img = cv.imread('opencv-logo-white.png')

	# averaging: convolve with a normalized 5x5 box filter
	blur = cv.blur(img, (5, 5))

	# bilateral filter: diameter 9, sigmaColor 75, sigmaSpace 75
	bilateral = cv.bilateralFilter(img, 9, 75, 75)

	# Adding (blending) two images using OpenCV
	# https://docs.opencv.org/3.4/d5/dc4/tutorial_adding_images.html
	src1 = cv.imread(cv.samples.findFile('LinuxLogo.jpg'))
	src2 = cv.imread(cv.samples.findFile('WindowsLogo.jpg'))
	alpha = 0.5  # blend weight for src1 (0.5 is an assumed value)
	beta = (1.0 - alpha)
	dst = cv.addWeighted(src1, alpha, src2, beta, 0.0)


Color Codes

Mask for custom color range

	#https://stackoverflow.com/questions/60152862/highlight-areas-with-different-colour-image-with-the-area-that-surround-them
	import cv2
	import numpy as np

	# Load image
	image = cv2.imread(r"E:\2.jpg")

	# Color threshold
	#(hMin = 0 , sMin = 0, vMin = 105), (hMax = 179 , sMax = 255, vMax = 255)

	original = image.copy()
	hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
	lower = np.array([0, 0, 105])
	upper = np.array([179, 255, 255])
	mask = cv2.inRange(hsv, lower, upper)
	result = cv2.bitwise_and(original,original,mask=mask)

	cv2.imshow('mask', mask)
	cv2.imshow('result', result)
	cv2.imshow('original', original)
	cv2.waitKey(0)
	cv2.destroyAllWindows()


	def nothing(x):
	    # Trackbar callback; no action needed because positions are polled in the loop
	    pass

	def findranges():
	    # Create a window
	    cv2.namedWindow('image')

	    # Create trackbars for color change
	    # Hue is from 0-179 for Opencv
	    cv2.createTrackbar('HMin', 'image', 0, 179, nothing)
	    cv2.createTrackbar('SMin', 'image', 0, 255, nothing)
	    cv2.createTrackbar('VMin', 'image', 0, 255, nothing)
	    cv2.createTrackbar('HMax', 'image', 0, 179, nothing)
	    cv2.createTrackbar('SMax', 'image', 0, 255, nothing)
	    cv2.createTrackbar('VMax', 'image', 0, 255, nothing)

	    # Set default value for Max HSV trackbars
	    cv2.setTrackbarPos('HMax', 'image', 179)
	    cv2.setTrackbarPos('SMax', 'image', 255)
	    cv2.setTrackbarPos('VMax', 'image', 255)

	    # Initialize HSV min/max values
	    hMin = sMin = vMin = hMax = sMax = vMax = 0
	    phMin = psMin = pvMin = phMax = psMax = pvMax = 0

	    while True:
	        # Get current positions of all trackbars
	        hMin = cv2.getTrackbarPos('HMin', 'image')
	        sMin = cv2.getTrackbarPos('SMin', 'image')
	        vMin = cv2.getTrackbarPos('VMin', 'image')
	        hMax = cv2.getTrackbarPos('HMax', 'image')
	        sMax = cv2.getTrackbarPos('SMax', 'image')
	        vMax = cv2.getTrackbarPos('VMax', 'image')

	        # Set minimum and maximum HSV values to display
	        lower = np.array([hMin, sMin, vMin])
	        upper = np.array([hMax, sMax, vMax])

	        # Convert to HSV format and color threshold
	        # (image is the global loaded earlier in this script)
	        hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
	        mask = cv2.inRange(hsv, lower, upper)
	        result = cv2.bitwise_and(image, image, mask=mask)

	        # Print if there is a change in HSV value
	        if (phMin != hMin) or (psMin != sMin) or (pvMin != vMin) or (phMax != hMax) or (psMax != sMax) or (pvMax != vMax):
	            print("(hMin = %d , sMin = %d, vMin = %d), (hMax = %d , sMax = %d, vMax = %d)" % (hMin , sMin , vMin, hMax, sMax , vMax))
	            phMin = hMin
	            psMin = sMin
	            pvMin = vMin
	            phMax = hMax
	            psMax = sMax
	            pvMax = vMax

	        # Display result image; press q to quit
	        cv2.imshow('image', result)
	        if cv2.waitKey(10) & 0xFF == ord('q'):
	            break

	    cv2.destroyAllWindows()

	findranges()


	# https://gist.github.com/BIGBALLON/cb6ab73f6aaaa068ab6756611bb324b2

	from PIL import Image, ImageOps
	import os

	class padimages:

	    def padding(self, img, expected_size):
	        # Pad the image to a square whose side is expected_size
	        desired_size = expected_size
	        delta_width = desired_size - img.size[0]
	        delta_height = desired_size - img.size[1]
	        pad_width = delta_width // 2
	        pad_height = delta_height // 2
	        padding = (pad_width, pad_height, delta_width - pad_width, delta_height - pad_height)
	        return ImageOps.expand(img, padding)

	    def resize_with_padding(self, img, expected_size):
	        # Pad to (width, height) given by expected_size; the image itself is not rescaled
	        delta_width = expected_size[0] - img.size[0]
	        delta_height = expected_size[1] - img.size[1]
	        pad_width = int(delta_width // 2)
	        pad_height = int(delta_height // 2)
	        padding = (pad_width, pad_height, delta_width - pad_width, delta_height - pad_height)
	        return ImageOps.expand(img, padding)

	imgresizer = padimages()

	images_path = r'E:\Dataset_July_22\Input'
	results_path = r'E:\Dataset_July_22\Results'

	# Walk the input folder and pad every image to a square
	for (dirpath, dirnames, filenames) in os.walk(images_path):
	    for file_name in filenames:
	        input_path = os.path.join(dirpath, file_name)
	        result_path = os.path.join(results_path, 'Res_' + file_name)
	        print(input_path)
	        img = Image.open(input_path)
	        width, height = img.size

	        # The longer side decides the size of the square
	        maxval = max(width, height)

	        img = imgresizer.resize_with_padding(img, (maxval, maxval))
	        img.save(result_path)


Keep Thinking!!!!

July 04, 2022

Myth of Data

I don't like my job
Data is insufficient
There are no good data science use cases

Learn all coding questions, practice and learn all ML maths, solve all Kaggle problems, land your dream job: with the data, you will still face the same problem

Data is insufficient

What you see in learning / Kaggle does not reflect real-world data issues

Wherever you go, you will find bad and incomplete data.

Keep Thinking!!!

Recommendation Systems

Bookmark of Repos for my ongoing learning

Deep Learning based Recommender System: A Survey and New Perspectives

High level thoughts

Data connectors class
SVDclass - Dataload - Algorun - Validate - Results
NMFclass - Dataload - Algorun - Validate - Results
ItemItemrecomclass - Dataload - Algorun - Validate - Results
UserUserrecomclass - Dataload - Algorun - Validate - Results
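
The "Dataload - Algorun - Validate - Results" outline could look like the class below. This is only a sketch of the structure: the class and method names are hypothetical, and the algorithm is a simple mean-imputed truncated SVD in NumPy, not the SGD-trained SVD from the Surprise library.

```python
import numpy as np


class SVDRecommender:
    """Hypothetical class following the Dataload - Algorun - Validate - Results outline."""

    def __init__(self, n_factors=2):
        self.n_factors = n_factors
        self.predictions = None

    def load(self, ratings):
        # ratings: 2-D array, users x items, with np.nan for unrated cells
        self.ratings = np.asarray(ratings, dtype=float)
        self.observed = ~np.isnan(self.ratings)
        return self

    def run(self):
        # Fill missing cells with each user's mean rating, then take a truncated SVD.
        user_means = np.nanmean(self.ratings, axis=1, keepdims=True)
        filled = np.where(self.observed, self.ratings, user_means)
        u, s, vt = np.linalg.svd(filled - user_means, full_matrices=False)
        k = self.n_factors
        self.predictions = user_means + (u[:, :k] * s[:k]) @ vt[:k, :]
        return self

    def validate(self):
        # RMSE on the observed cells only (training error, not a held-out score).
        err = (self.predictions - self.ratings)[self.observed]
        return float(np.sqrt(np.mean(err ** 2)))

    def results(self, user, top_n=2):
        # Recommend the highest-scoring items the user has not rated yet.
        scores = np.where(self.observed[user], -np.inf, self.predictions[user])
        return np.argsort(-scores)[:top_n].tolist()


ratings = np.array([
    [5, 4, np.nan, 1, np.nan],
    [4, np.nan, 5, 1, 2],
    [1, 2, 1, np.nan, 5],
    [np.nan, 1, 2, 5, 4],
])
model = SVDRecommender(n_factors=2).load(ratings).run()
rmse = model.validate()
top_for_user0 = model.results(user=0)
```

The NMF, item-item, and user-user classes from the list could share the same four methods, which keeps comparison code identical across algorithms.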

Structuring Your Project

Keras Reads

July 5th Updates

Building and comparing recommendation systems to scale using scikit-surprise (surprise library)

Sampling Methods to speed up Clustering Algorithms

Recommender System - matrix factorization algorithm.

Recommendation system with Python using Movie Lens Data Set.

Softmax DNN for Recommendation

Colab: Build a Movie Recommendation System

Collaborative Filtering

Matrix Factorization

A Recurrent Neural Network Based Recommendation System

Keep Thinking!!!!

Text to Image generation

min(DALL·E)

Sample input

Output

DALL·E Mini

paper - Zero-Shot Text-to-Image Generation

Colab

A quick thread on "How DALL-E 2, Imagen and Parti Architectures Differ" with breakdown into comparable modules, annotated with size 🧵#dalle2 #imagen #parti

* figures taken from corresponding papers with slight modification
* parts used for training only are greyed out pic.twitter.com/9zsIUq3toU
— Rosanne Liu (@savvyRL) June 25, 2022

Keep Exploring!!!

July 03, 2022

Personalized - Redefined - Recommendations - Paid Service

What if we got customized recommendations based on our needs rather than on what we are forced to see?

Workday recommendations (Articles / Videos based on interests)

Data Science (20%)
Startups (20%)
Stock markets (20%)
Travel Vlog (10%)
Paranormal Vlog (5%)
Food Vlog(5%)
Emotions / Wellness (10%)

Weekend Recommendations

Travel Vlog (20%)
Paranormal Vlog (20%)
Food Vlog(10%)
Emotions / Wellness (20%)
Music (30%)

Tired of seeing the same Swiggy ads, Zomato ads, and ads irrelevant to the context.

Ref - Link

Keep Thinking!!!
