"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 25, 2020

Day #321 - Image Similarity Search

I have been documenting my code poorly on this blog, and I am trying to improve my documentation abilities. This post is based on a reference post.

There are a few key components involved in similarity search:
1. Data Collection
2. Generating Feature Vectors
3. Finding Similar matches based on different approaches

Step #1 - Data collection uses the Caltech 101 dataset, which contains images from many different object classes.

Step #2 - Feature vector generation leverages pre-trained networks: take an existing pre-trained network, remove the final classification layer, and retain the flattened feature vector from the penultimate layer.

Step #3 - scikit-learn's NearestNeighbors is used to find the closest matches. The algorithm options are 'ball_tree', 'kd_tree', 'brute' (brute-force search), and 'auto' (picks the best option for the data).
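Below is a minimal sketch of the overall flow (my own illustration, not the exact code from the referenced post; the dataset paths and parameter values are assumptions):

import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.neighbors import NearestNeighbors

#Pre-trained network with the classification layer removed; pooling='avg' gives a flat 2048-d feature vector
model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

def extract_features(img_path):
    img = image.load_img(img_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

#image_paths is a placeholder list of Caltech 101 image files
image_paths = ['101_ObjectCategories/airplanes/image_0001.jpg', '101_ObjectCategories/airplanes/image_0002.jpg']
features = np.array([extract_features(p) for p in image_paths])

#Index the feature vectors and query the closest matches for the first image
nn = NearestNeighbors(n_neighbors=min(5, len(image_paths)), algorithm='brute', metric='euclidean')
nn.fit(features)
distances, indices = nn.kneighbors([features[0]])
print(indices)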

Results 




Happy Learning!!!

January 22, 2020

Seat and Location based Pricing in Bus :)






  • Single Sleeper price 750 (Better Privacy / No Sharing)
  • Second-row single berth 700 (Can Exit Early)
  • Middle row single berth 650 (Reduced Privileges compared to above two)

  • Similar to Flight Seat Pricing :)

    Keep Thinking!!!

    January 20, 2020

    Catching up with Machine Learning for Experienced Folks

There are a lot of tutorials on the web; leverage whatever works for you.
The sequence I recommend is:
• Stats and Maths for Data Science (Stanford / other YouTube classes)
• Applications in each domain (find applicable use cases for your domain)
• Python (YouTube / Python programming sites) – data loading, analysis, joins, filters, etc.
• ML models (Linear and Logistic Regression, Decision Trees, Random Forests) - code them up in Python with datasets
• Classification vs Regression - code up in Python
• Experiment labelling, handling categorical data, etc.
• Take up one Udemy course after the above list is done (discounted ~$20 courses are often available, e.g., in December)
    Keep Coding!!!

    Quick ride prospects in 2020

Quick Ride will take a share of the rental cab market in 2020. Below are the compelling reasons for Quick Ride's success:
1. Quick Ride is affordable (for me it is 100/- on Quick Ride vs 400/- for a rental cab booking, i.e., 1/4th of the cost)
2. Quick Ride timing is predictable (delays and timings are predictable; predictability is much higher than on rental cab booking platforms)
3. Quick Ride has recurring booking options (and from rides we make friends)
4. Quick Ride connects you with like-minded tech folks; I found friends to mentor and learnt from their experiences
5. No peak charges (your BP stays in control), unlike rental cab booking platforms
6. No cancellation charges
7. More of a service and less of a business (this is an innovative approach and shows understanding of the market)
8. Users are verified, and most are fellow IT workers in Bangalore
9. Risk is minimal (no drunken driving or overcrowded vehicles on Quick Ride); vehicles are well maintained as they are owner-managed
10. Uber Pool and Ola Share failed miserably; Quick Ride instead created a market
11. Cons - some people treat Quick Ride as a business, with targets to make money
12. My prediction: 25% of Ola / Uber's market share in Bangalore will be taken up by Quick Ride in 2020
13. Quick Ride needs to expand to other regions/countries too
All the best Quick Ride team, Happy QuickRide!!!


    January 18, 2020

    January 17, 2020

    Jobs that would disappear before 2030 with AI

    • Security Jobs - We may not see security guards in hotels and apartments; the proportion will come down drastically. AI-driven security and surveillance give real-time alerts - record-and-playback systems will be gone soon. Edge analytics can now act smarter and more proactively.
    • Automated Tolls - Manual toll operators would no longer be needed for vehicle details entry. Everything would be automated: detection, deduction, and traffic regulation
    • Maids - House cleaning robots, pet management robots, kids monitoring robots, elderly assistance robots
    • Basic Health Data Analysis - X-ray diagnosis and report diagnosis would be done by systems based on the parameters observed; they might directly convey their findings
    • Customer Service - Customer service BPO jobs would be replaced by chatbots; massive job cuts would happen in the data entry and customer service space
    • Automated Loan Approval - Manual validations would be replaced with ML models that provide recommendations for loan approval
    • Drivers - As autonomous trucking technology matures, the number of highway truck driver jobs would come down proportionately
    • Vehicle Repair Mechanics - With more software-powered, renewable-energy-powered vehicles, we may move away from traditional fuel-powered vehicles
    • Cooks / Housekeeping Jobs - A lot of tools will come in to assist with automated cooking and automated vessel cleaning
    • Store Associates - AI will power restocking and self-checkout facilities
    A lot of these jobs will disappear and a lot of people will end up jobless. How we are going to upskill, reskill, and balance the economy matters.

    Keep Thinking!!!

    January 16, 2020

    My Favorite Retail Ideas in NRF 2020

    1. Smart Mirror

    2. Scan and go


    3. Vision Based Inventory Tracking


    4. Walmart AI Store Vision Powered


    5. Mobile based checkout

    6. Automated Checkout


    7. Shelf Edge Camera



    Retail in 2025

    • Every Retail Store is an Ecommerce Store
    • Every Offline Retail Store is a Fulfillment Centre
    • Every Offline Retail Store is a Warehouse
    • Every Offline Retail Store is a Returns Centre
    • Every Store will have Sensors, AR Experience, Personalized Experience
    • Specialty Stores, Private labels will be key
    • A lot of mobile-driven interfaces to search/try/buy

    Happy Learning!!!

    January 15, 2020

    Day #320 - Preprocessing Examples


    #https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing
    import pandas as pd
    import numpy as np

    #Note: pd.DataFrame - capital D and F
    data = {'ColA':[10,20,30],'ColB':[5,'',10],'ColC':['A','B','C'],'ColD':[1,2,3],'ColE':[10,40,50],'BinCol':['A','B','A']}
    dataset = pd.DataFrame(data)
    print(dataset.head())

    def ReplaceMissingValues():
        #Column names
        colnames = list(dataset.columns)
        print(colnames)
        #Null stats for all columns
        print(dataset.isnull().sum())
        #Count empty-string values; axis=0 aggregates column-wise (axis=1 would be row-wise)
        print((dataset=='').sum(axis=0))
        #Replace empty strings with NaN, then NaN with a sentinel value
        datasetstd = dataset.replace('',np.NaN)
        print(datasetstd)
        datasetstd = datasetstd.replace(np.NaN,-999)
        return datasetstd

    from sklearn import preprocessing

    def HandleCategoryValues():
        datasetstd = ReplaceMissingValues()
        #One-hot encode ColC with pandas, then drop the original column
        catcol = pd.get_dummies(dataset['ColC'])
        print(catcol)
        datasetstd = datasetstd.drop(['ColC'],axis=1)
        print(datasetstd)
        #Column-wise merge of the remaining columns and the encoded columns
        frames = [datasetstd,catcol]
        result = pd.concat(frames,axis=1)
        print(result)
        #OneHotEncoder transforms each categorical feature with n_categories possible values into n_categories binary features
        rawdata = dataset[['BinCol']]
        ohe = preprocessing.OneHotEncoder()
        print('OneHotEncoder')
        print(rawdata)
        print(ohe.fit_transform(rawdata))
        ode = preprocessing.OrdinalEncoder()
        print('ordinal encoder')
        print(ode.fit_transform(rawdata))

    def StandardizeNumericalData():
        rawdata = dataset[['ColD','ColE']]
        print(rawdata)
        #min_max_scaler
        min_max_scaler = preprocessing.MinMaxScaler()
        print('min_max_scaler')
        print(min_max_scaler.fit_transform(rawdata))
        #standard scaler
        scaler = preprocessing.StandardScaler()
        print('scaler')
        print(scaler.fit_transform(rawdata))
        #max_abs_scaler
        max_abs_scaler = preprocessing.MaxAbsScaler()
        print('max_abs_scaler')
        print(max_abs_scaler.fit_transform(rawdata))

    HandleCategoryValues()
    #StandardizeNumericalData()

    import math
    #Define Data Frames
    data = {'name': ['Raj', 'Siva', 'Mike', 'Dan','New_Joinee'],
            'age': [22,38,26,35,22],
            'location':['Chennai',math.nan,'Bengaluru','Chennai',math.nan]}
    dframe = pd.DataFrame(data)
    print(dframe)
    print('Missing Data Stats')
    print(dframe.isna().sum())
    #Option 1 - Fill with the most frequent value
    from sklearn.impute import SimpleImputer
    imputer = SimpleImputer(strategy='most_frequent')
    newdata = imputer.fit_transform(dframe)
    print(newdata)
    #Option 2 - Fill with a constant placeholder
    imputer = SimpleImputer(strategy='constant',fill_value='missing')
    newdata = imputer.fit_transform(dframe)
    print(newdata)
    #Stratified split keeps the class proportions the same in train and test
    x = np.array([0,1,2,3,4,5,6,7,8,9])
    y = np.array([0,1,0,0,1,0,0,0,1,0])
    from sklearn.model_selection import train_test_split
    x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3,stratify=y)
    print(y_train)
    print(y_test)
    #https://twitter.com/justmarkham/status/1244986650410786817/photo/1
    data = {'name': ['Raj', 'Siva', 'Mike', 'Dan','New_Joinee'],
            'gender': ['M','M','M','M','F'],
            'location':['Chennai',math.nan,'Bengaluru','Chennai',math.nan]}
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.pipeline import make_pipeline
    from sklearn.compose import make_column_transformer
    ohe = OneHotEncoder()
    dframe = pd.DataFrame(data)
    ct = make_column_transformer((ohe,['gender','location']),remainder='passthrough')
    print(ct)
    Happy Learning!!!

    January 14, 2020

    Day #319 - Data Story - Datavision - AI Use cases


    Happy Learning!!!

    Day #318 - Image Comparison Techniques

    An image is a set of points (vectors). We can use different techniques for image similarity comparison; listed below are some out-of-the-box techniques. Dlib-based comparison was discussed earlier.
    • Haar cascade detection, region extraction
    Comparison Techniques
    • Euclidean distance
    • Cosine distance
    • Hamming distance
    • Jaccard-Needham dissimilarity


    Ref - Link

    SSIM (structural similarity) is another approach. The snippet below extracts face regions with a Haar cascade and compares them using the distance measures listed above.
    import cv2

    img1 = cv2.imread(r'D:\PetProject\image_comparison\4.jpg',0)
    face_cascade = cv2.CascadeClassifier(r'D:\PetProject\image_comparison\Haar.xml')
    #Perform Haar detection on the first image
    faces = face_cascade.detectMultiScale(img1, 1.3, 5)
    for (x,y,w,h) in faces:
        #Extract the face region
        face1 = img1[y:y+h, x:x+w]
        #Resize to a fixed size
        face1 = cv2.resize(face1,(120,120), interpolation = cv2.INTER_CUBIC)

    img2 = cv2.imread(r'D:\PetProject\image_comparison\5.jpg',0)
    #Perform Haar detection, extract and resize the face region from the second image
    faces = face_cascade.detectMultiScale(img2, 1.3, 5)
    for (x,y,w,h) in faces:
        face2 = img2[y:y+h, x:x+w]
        face2 = cv2.resize(face2,(120,120), interpolation = cv2.INTER_CUBIC)

    cv2.imshow('face1 1',face1)
    cv2.waitKey(0)
    cv2.imshow('face2 2',face2)
    cv2.waitKey(0)

    #Distance measures
    import scipy.spatial.distance as dist
    import numpy as np
    #Convert to 1D and normalize
    img1_data = face1.reshape(-1)
    img1_data = img1_data/255.0
    img2_data = face2.reshape(-1)
    img2_data = img2_data/255.0
    print(len(img1_data))
    print(len(img2_data))
    #Euclidean distance between the two 1-D arrays
    print(dist.euclidean(img1_data,img2_data))
    #Cosine distance between the 1-D arrays: 1 - A.B/(|A||B|)
    print(dist.cosine(img1_data,img2_data))
    #Hamming distance: the proportion of positions at which the two vectors differ
    print(dist.hamming(img1_data,img2_data))
    #Jaccard-Needham dissimilarity between two boolean 1-D arrays (dissimilarity between sample sets)
    print(dist.jaccard(img1_data,img2_data))
    cv2.destroyAllWindows()
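    As a minimal sketch of the SSIM option mentioned above (my own addition, assuming scikit-image is available; it builds on the face crops produced in the snippet):

    from skimage.metrics import structural_similarity as ssim
    #face1 and face2 are the 120x120 grayscale crops produced above
    #SSIM is 1.0 for identical images and decreases as the structure diverges
    score = ssim(face1, face2)
    print('SSIM score:', score)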
    Happy Learning!!!

    January 13, 2020

    Day #317 - Ensemble Methods

    A summary of ensemble techniques, with Bagging, Boosting, and Voting code snippets.

    #https://scikit-learn.org/stable/modules/ensemble.html#forest
    #Ensemble - combine several techniques
    #averaging - bagging - RandomForest
    #boosting - sequential build, combines weak classifiers - AdaBoost, Gradient Boosting
    #voting classifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    x_data = [[0,0],[1,1],[1,2],[0,4]]
    y_data = [0,1,1,0]
    rfit = RandomForestClassifier(n_estimators = 10)
    rfit = rfit.fit(x_data,y_data)
    scores = cross_val_score(rfit,x_data,y_data,cv=2)
    print(scores.mean())
    # The minimum number of samples required to split an internal node:
    dtree = DecisionTreeClassifier(max_depth=None,min_samples_split=2,random_state=0)
    dtree = dtree.fit(x_data,y_data)
    scores = cross_val_score(dtree,x_data,y_data,cv=2)
    print(scores.mean())
    print(rfit.predict([[3,3]]))
    print(dtree.predict([[3,3]]))
    #boosting
    from sklearn.ensemble import AdaBoostClassifier
    adaboostmodel = AdaBoostClassifier(n_estimators = 10)
    adaboostmodel = adaboostmodel.fit(x_data,y_data)
    print(adaboostmodel.predict([[3,3]]))
    from sklearn.ensemble import GradientBoostingClassifier
    gbmodel = GradientBoostingClassifier(n_estimators = 10,learning_rate=1.0,max_depth=1,random_state=0)
    gbmodel = gbmodel.fit(x_data,y_data)
    print(gbmodel.predict([[3,3]]))
    #VotingClassifier (with voting='hard') would classify based on the majority class label.
    #If ‘hard’, uses predicted class labels for majority rule voting. Else if ‘soft’, predicts the class label based on the argmax of the sums of the predicted probabilities
    from sklearn.ensemble import VotingClassifier
    estimators = [('model1',rfit),('model2',dtree),('model3',adaboostmodel),('model4',gbmodel)]
    votinghardmodel = VotingClassifier(estimators=estimators, voting = 'hard')
    votinghardmodel = votinghardmodel.fit(x_data,y_data)
    print(votinghardmodel.predict([[3,3]]))
    votingsoftmodel = VotingClassifier(estimators=estimators, voting = 'soft')
    votingsoftmodel = votingsoftmodel.fit(x_data,y_data)
    print(votingsoftmodel.predict([[3,3]]))
    Happy Learning!!!

    Day #316 - SVM Classification Examples


    #https://scikit-learn.org/stable/modules/svm.html#classification
    #SVM
    #Classification, Outlier detection
    #Useful for high dimensional spaces
    #two class classification
    from sklearn import svm
    x = [[0,0],[1,1],[3,3],[5,5],[2,3]]
    y = [0,1,2,1,1]
    svmmodel = svm.SVC()
    svmmodel.fit(x,y)
    #property attributes of SVM
    print(svmmodel.support_vectors_)
    #indexes of support vectors
    print(svmmodel.support_)
    #number of support vectors for each class
    print(svmmodel.n_support_)
    #multi-class classification
    #SVC, NuSVC and LinearSVC are classes capable of performing multi-class classification on a dataset.
    #For unbalanced problems, the keywords class_weight and sample_weight can be used to give certain classes or individual samples more importance
    svmweightmodel = svm.SVC(class_weight={0:0.8,1:0.1,2:0.1})
    svmweightmodel.fit(x,y)
    print('Default Prediction')
    print(svmmodel.predict([[1.,1.]]))
    print('Weight Prediction')
    print(svmweightmodel.predict([[1.,1.]]))
    #linear svc
    #LinearSVC implements “one-vs-the-rest” multi-class strategy
    linearsvcmodel = svm.LinearSVC()
    linearsvcmodel.fit(x,y)
    print('linearsvcmodel Prediction')
    print(linearsvcmodel.predict([[1.,1.]]))
    (SVMExamples.py)
    Happy Learning!!!

    Day #315 - ML Notes - Regression

    • L1 (Lasso) can shrink some coefficients to zero
    • L2 (Ridge) shrinks all the coefficients by the same proportion but eliminates none. Because L2 squares the coefficients, it punishes large values more heavily than small values.

    I am bad at reading: I skip content and focus directly on what I am trying to solve. I am going to go through the scikit-learn documentation and try all the code snippets.
    #https://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares
    from sklearn import linear_model
    #Approach #1
    model = linear_model.LinearRegression()
    #initialize x and y
    x_data = [[0,0],[1,1],[2,2]]
    y_data = [0,1,2]
    #fit the model
    model.fit(x_data,y_data)
    params = model.coef_
    print(params[0])
    print(params[1])
    #prediction
    print(model.predict([[2,2]]))
    #Approach #2
    #Lasso
    #L1 (Lasso) can shrink some coefficients to zero
    lassomodel = linear_model.Lasso(alpha=0.1)
    lassomodel.fit(x_data,y_data)
    params = lassomodel.coef_
    print(params[0])
    print(params[1])
    #prediction
    print(lassomodel.predict([[2,2]]))
    #Approach #3
    #LARS
    #Least-angle regression (LARS) is a regression algorithm for high-dimensional data
    larsmodel = linear_model.LassoLars(alpha=0.1)
    larsmodel.fit(x_data,y_data)
    params = larsmodel.coef_
    print(params[0])
    print(params[1])
    #prediction
    print(larsmodel.predict([[2,2]]))
    #Approach #4
    #BayesianRidge
    #BayesianRidge estimates a probabilistic model of the regression problem
    bayesian = linear_model.BayesianRidge()
    bayesian.fit(x_data,y_data)
    #prediction
    print(bayesian.predict([[2,2]]))
    #Approach #5
    #Support Vector Regression.
    from sklearn import svm
    svmmodel = svm.SVR()
    svmmodel.fit(x_data,y_data)
    #prediction
    print(svmmodel.predict([[2,2]]))
    #Approach #6
    ridgemodel = linear_model.Ridge(alpha=.5)
    ridgemodel.fit(x_data,y_data)
    print(ridgemodel.predict([[2,2]]))
    Happy Learning!!!

    Concepts - WeightofEvidence, Information Value

    While checking on FinTech ML projects I came across the two concepts Weight of Evidence and Information Value. I found this link intuitive and easy to understand.

    Basically, when we bucketize, within each bucket range we can in turn sub-divide the other factors based on their distribution. In a retail scenario:

    Take customer age groups (20-30, 30-40, 40-50). Within each bucket, we can find the percentage of fraudulent customers. It may be:

    20-30 - 4% fraudulent
    30-40 - 2.5%
    40-50 - 1%

    This technique helps to assign possible values to a variable and decide its impact. This is my understanding; we can also infer the same from data analysis and the distribution percentages across different classes.
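    As a rough sketch of the computation (the bucket counts below are made up for illustration, not from the linked post):

    import pandas as pd
    import numpy as np

    #Hypothetical counts per age bucket: 'good' = non-fraudulent, 'bad' = fraudulent customers
    buckets = pd.DataFrame({'bucket': ['20-30', '30-40', '40-50'],
                            'good': [960, 975, 990],
                            'bad': [40, 25, 10]})
    #Distribution of goods and bads across buckets
    buckets['pct_good'] = buckets['good'] / buckets['good'].sum()
    buckets['pct_bad'] = buckets['bad'] / buckets['bad'].sum()
    #WoE = ln(%good / %bad); IV sums the weighted differences across buckets
    buckets['woe'] = np.log(buckets['pct_good'] / buckets['pct_bad'])
    buckets['iv'] = (buckets['pct_good'] - buckets['pct_bad']) * buckets['woe']
    print(buckets)
    print('Information Value:', buckets['iv'].sum())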

    More Reads - Link1, Link2, Link3, Link4

    Happy Learning!!!

    Data-flow -> Knowledge-flow -> Future Prospects


    • Data to Datalake
    • Datalake to Collective DataInsights
    • DataInsights to Features
    • Features to Models
    • Models to Predictions
    • Predictions to Preparedness

    Data -> Insights -> Predictions

    Happy Learning!!!

    January 11, 2020

    Decode your personality - Social Media Unplugged

    • Google - knows everything about you: regular searches, restricted searches, places visited
    • Gmail - your friends and their locations, your communications and the type of person you are
    • Facebook - your social circle, and your financial status inferred from where you live and where you work
    • LinkedIn - your average compensation can be estimated from your education, company, and years of experience
    • WhatsApp - all your mood swings, emotional discussions, and your relationship with the outside world
    • Mobile Number - your average calls, the number of contacts among friends/blood relations
    • Bank Account - spending patterns, places you visit, average expenses across food/shopping/travel
    • Uber - where you traveled, and what your pattern was for the last month
    We do not have ownership of the data we share. There is no expiry date for the data collected. This data is good enough to decode a person.

    Google Sued for Secretly Amassing Vast Trove of User Data
    • Consumer browsing history
    • Web activity data 
    • Invasion of privacy and violations
    • Storing geolocation data with its mobile apps
    Today I attended the Al Jazeera discussion on the Cambridge Analytica scandal (April 8th).
    Founder - Brittany Kaiser of OwnYourData
    • 87 million Facebook profiles were screened
    • Data science used to classify/label possible persuaders
    • Leverage all their behavioral data
    • Identify their interests (climate change, national security, refugee issues) based on their Facebook groups/feeds
    • Targeted ads to convert them into positive voters / compromise integrity and privacy
    Sounds scary :( :( - manipulating data



    Tech Talk Link

    What does Swiggy know about you?


    Single vs Family, Brand Centric, More than what I could think of :(

    What a Better Social Network Would Look Like
    • Make social networks nonprofits
    • Ban algorithmic amplification
    • Restrict personal data collection and behavioral advertising
    • Let a field of smaller social networks bloom
    • Stop putting white men in charge
    New Age Social Media - My Perspective
    Social media has to be more realistic and reflect our life. Our daily thoughts need to be known only to our first-level circle of immediate relationships. Our social thoughts, or tagged social posts, should be reflected in the second-level circle. Education and professional updates need to reflect in the connected circles.
    Everything needs to stay within limits to avoid information overload. Today data is business, and connections are the business value of the individual. In the long run, both businesses and consumers will lose value. Companies focused on rapid market share without ethical values will end up creating zombies rather than responsible citizens.

    Awful AI Projects - Link

    Keep Thinking!!!

    January 10, 2020

    Model Documentation and Coding Guidelines - Python

    This paper was very useful. It covers data source, purpose, model accuracy, and recommendations. The key metrics are below (screenshot from the paper).


    Structuring Machine Learning Projects

    ML Experiment Parameters

    • Model Parameters
    • Learning Rate
    • Number of Epochs Run
    • Training Loss
    • Validation Loss
    • CPU %%
    • Memory %%
    • Disk usage
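    As a rough sketch of capturing these per run (the helper name and the psutil calls are my own assumptions, not from the paper):

    import json, time
    import psutil  #assumed dependency for CPU / memory / disk stats

    def capture_run_metrics(model_params, epochs, training_loss, validation_loss):
        #Collect model parameters plus system utilisation for one experiment run
        return {'timestamp': time.time(),
                'model_parameters': model_params,
                'epochs': epochs,
                'training_loss': training_loss,
                'validation_loss': validation_loss,
                'cpu_percent': psutil.cpu_percent(interval=1),
                'memory_percent': psutil.virtual_memory().percent,
                'disk_percent': psutil.disk_usage('/').percent}

    run = capture_run_metrics({'learning_rate': 0.001, 'batch_size': 32},
                              epochs=10, training_loss=0.42, validation_loss=0.51)
    print(json.dumps(run, indent=2))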




    if readable():
        be_happy()
    else:
        refactor()
    #http://msdl.cs.mcgill.ca/people/shahla/misc/PythonConventions.pdf
    #filenames short file names
    #myfile.py
    #class name, CapWords convention
    #class MyClass:
    #private and protected variables with _
    # _myProtectedVar, _myPrivateVar
    #import in seperate lines
    #Bad
    import sys, os
    #Good
    import sys
    import os
    #hierarchy of import
    #standard library
    #major imports
    #App specific imports
    #indentation
    #break lines with \
    #no multiple statements in single line
    #Bad
    if foo == 'blah': doBlahThing()
    #good
    if foo == 'blah':
    doBlahThing()
    #No white space before paranthesis
    #bad
    spam (1)
    dict ['key']
    #good
    spam(1)
    dict['key']
    #no white space before comma, semicolon, or colon
    #bad
    if x == 4 : print x , y ; x , y = y , x
    #good
    if x == 4: print x, y; x, y = y, x
    #do not pad assignments with extra whitespace to align the operators
    #bad
    x         = 1
    operatorA = 2
    cab       = 3
    #good
    x = 1
    operatorA = 2
    cab = 3
    #comparisons - use explicit None checks instead of bare truthiness
    #bad
    if x:
        y = 6
    #good
    if x is not None:
        y = 6
    #http://www.cs.rpi.edu/academics/courses/fall18/csci1200/Good_Programming_Practices.pdf
    #uppercase constants
    GRAVITY
    #capitalize the first word of a class name
    Person()
    #private protected with _ before
    _speed
    #Variables
    #Avoid global variables
    #instead of public variable use getters and setters
    class Person():
        def __init__(self, name):
            self.name = name
        def getName(self):
            return self.name
        def setName(self, name):
            self.name = str(name)
    #Avoid deep nesting
    def work_check(word):
        if len(word) < 5:
            return False
        if len(word) % 2 == 0:
            return False
        if word[0] != 'a':
            return False
        return True
    #Exception Handling
    #https://www.datacamp.com/community/tutorials/exception-handling-python
    try:
        a = 100 / 0
        print(a)
    except ZeroDivisionError:
        print("Zero Division Exception Raised.")
    else:
        print("Success, no error!")
    #https://python.g-node.org/python-autumnschool-2010/_media/materials/day0-haenel-best-practices.pdf
    #https://gist.github.com/ericmjl/27e50331f24db3e8f957d1fe7bbbe510
    #https://github.com/bast/somepackage
    #https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6
    #https://docs.python-guide.org/writing/structure/
    #https://towardsdatascience.com/manage-your-data-science-project-structure-in-early-stage-95f91d4d0600
    #https://github.com/Azure/Azure-TDSP-ProjectTemplate
    #https://drivendata.github.io/cookiecutter-data-science/
    ├── LICENSE
    ├── Makefile <- Makefile with commands like `make data` or `make train`
    ├── README.md <- The top-level README for developers using this project.
    ├── data
    │ ├── external <- Data from third party sources.
    │ ├── interim <- Intermediate data that has been transformed.
    │ ├── processed <- The final, canonical data sets for modeling.
    │ └── raw <- The original, immutable data dump.
    ├── docs <- A default Sphinx project; see sphinx-doc.org for details
    ├── models <- Trained and serialized models, model predictions, or model summaries
    ├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
    │ the creator's initials, and a short `-` delimited description, e.g.
    │ `1.0-jqp-initial-data-exploration`.
    ├── references <- Data dictionaries, manuals, and all other explanatory materials.
    ├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
    │ └── figures <- Generated graphics and figures to be used in reporting
    ├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
    │ generated with `pip freeze > requirements.txt`
    ├── setup.py <- Make this project pip installable with `pip install -e`
    ├── src <- Source code for use in this project.
    │ ├── __init__.py <- Makes src a Python module
    │ │
    │ ├── data <- Scripts to download or generate data
    │ │ └── make_dataset.py
    │ │
    │ ├── features <- Scripts to turn raw data into features for modeling
    │ │ └── build_features.py
    │ │
    │ ├── models <- Scripts to train models and then use trained models to make
    │ │ │ predictions
    │ │ ├── predict_model.py
    │ │ └── train_model.py
    │ │
    │ └── visualization <- Scripts to create exploratory and results oriented visualizations
    │ └── visualize.py
    └── tox.ini <- tox file with settings for running tox; see tox.testrun.org
    #https://www.datacamp.com/community/tutorials/inner-classes-python
    #https://www.datacamp.com/community/tutorials/python-data-type-conversion
    #https://github.blog/2015-01-21-how-to-write-the-perfect-pull-request/
    Happy Learning!!!

    January 09, 2020

    Day #314 - Dynamic Taxing using AI

    Flat tax slabs and income tax rates are the procedure followed today. With the digital economy, the extensive data collected and monitored, and new metrics available, we need to consider dynamic taxing based on several parameters. These parameters need to be chosen from both short-term and long-term perspectives. Depending on industry prospects, growth, and economic factors, their values can be adjusted when predicting/recommending tax. We need to collect several parameters to predict tax numbers.

    Some of the parameters we can leverage are
    • Manpower / Natural resources
    • Renewable energy sources
    • Direct skilled employment
    • Indirect employment 
    • Contribution to Innovation
    • Technology adoption / Sharing for partners
    • Contribution for long term growth / Longevity of the business /company
    • Contribution to Education / R & D / IP
    • Greenhouse impact
    • Sector score
    • Taxing based on domain/industry
    • Profit margins, Balance Sheets, Supplier Balance Sheets / Company global profit margins
    • %% of revenue saved with Automation / Robots
    • %% of materials sourced/imported
    • Export value / quantity of items / demand 
    • Measure and adjust dynamically every quarter
    Data needs to drive the decisions. We need to be more dynamic about tax by understanding demand, market conditions, growth prospects, ecology, and economic impact. To develop sustainable growth, we need to focus on both short-term and long-term benefits.






    Ref - Link

    Keep Thinking!!!

    January 08, 2020

    Day #313 - Data science use cases solved in Indian Startups

    I have personally trained or connected with several Indian startups to understand their AI use cases with respect to the domains/businesses they operate in. Some of the key use cases I observed are:
    • Fintech - OCR, banking statements, documents data extraction (Computer Vision)
    • Textile - Similarity of items, thickness  (Computer Vision)
    • B2B - Shop verification, re-identification  (Computer Vision), Item Level Forecasting (Data)
    • Agritech - Ripening, Fruit quality assessment (Computer Vision)
    • Retail - Device failure predictions, IOT based predictions (Data)
    • SalesData - Sales Analytics, Cross-selling, Upselling (Data)
    Kaggle was one approach to learning. I always start from domain -> data -> use cases. This learning helped me map AI challenges and applications across domains.

    This is the trend I have observed in the past few years. I hope to train/connect with more startups across other domains. Feel free to connect with me for any AI training requirements/discussions.

    Happy Learning!!!

    January 07, 2020

    Data Insights for HelloFresh

    Menu Insights
    • Peak sellers
    • Weekday trends
    • Top trends based on seasonality
    • Review based listings
    Customer Insights
    • Top customers
    • Age Groups
    • By Gender
    • Average revenue per customer by age group
    • Recurring customer patterns
    • Food quality issues / incidents / patterns
    Demand Insights
    • Locality vs Demand
    • Maximum Ordered Items
    • Peak times
    • Lean times
    • Weekend patterns
    • Weekday patterns
    • Peak hour trends
    Delivery Insights
    • Transportation cost / time
    • Maximum Traffic Delay Areas
    • Incidents / Damages
    Basic ML Use cases
    • Forecast on volumes of items based on historical data
    • Segmenting customers based on Age / Gender / Veg / Non-Veg / Cuisine choices and providing recommendations
    • Forecast Order Volumes and assign Delivery partners based on Projected numbers to reduce other delays
    • Recommending a similar item every day from other restaurants based on historical data
    • Balanced diet customized to need /preferences based on user choices for a week
    Happy Learning!!!

    Day #312 - AI for hospitals - Children's Hospitals

    Medical is a very interesting and niche domain; I haven't worked with hospitals yet. The process is again similar to other areas: Data Collection -> Insights -> BI -> AI. Below is a high-level overview you can consider.

    Data Collection
    • Overview of Data Collected from Mothers, Newborns
    • Overview of Symptoms / Medications
    • Overview of Sequence of Medications / Side-Effects
    • Positive / Negative Cases
    • Finance / Insurance Related Insights
    Analytics Insights (BI)
    • Most Observed Issues
    • Rarely Observed Issues
    • Trend of Admissions / Patterns across Cities
    • Financial Insights / Insurance Related
    • Correlation to Past Medical History to Complications
    AI Use Cases
    • Patient historical data-based risk predictions
    • Monitoring for new-borns and proactive alerts
    • Tie-up with AI Companies for post pregnancy monitoring and alert of Kids (https://www.loveys.io/)
    • Use the Same Tech (https://www.loveys.io/) to monitor patients
    • Feedback / Sentiment Analysis from Clients
    Happy Learning!!!

    January 06, 2020

    Day #311 - Key Notes for Airbnb ML talks

    ML Use cases
    • Search Ranking 
    • Smart Pricing (Demand Vs Supply)
    • Fraud Detection (Risk Scoring)
    Link1 - Forecasting Uncertainty at Airbnb - Theresa Johnson (Airbnb)

    Insights tell you how the business works; model the business based on the data flow and decisions involved.

    See the problems as both
    • Demand Problem
    • Supply Problem
    Loved the pool cleaner theory
    • Demand Problem - Population, Household count, Number of cleanings
    • Supply Problem - Number of cleaners, Available hours
    Link2 - ML Airbnb

    Key Lessons
    ML tool for customized prices for each night instead of flat rates. The classic combination of demand, supply, peak pricing

    Link3 - Bighead: Airbnb's end-to-end Machine Learning Platform | Airbnb

    Design Goals
    • Seamless (Easy to prototype)
    • Versatile (All framework support)
    • Consistent Environments
    Tools - Redspot, BigHead, MLAutomator, Deep Thought

    Slides - Link

    Advanced ML Use Cases
    • Categorizing Listings
    • Experience Ranking
    • Object Detection
    • Customer Service Ticket Routing
    Happy Learning!!!

    January 01, 2020

    Day #310 - Handle Data Imbalance, Missing Data

    This post is about
    • Handling blank values
    • Handling missing values
    • Handling data imbalance

    import pandas as pd
    import numpy as np
    #Define Data Frames
    Data = {
    'avgage': [22,38,26,35,22,' ',20],
    'Collections':[5000,np.NaN,6000,np.NaN,np.NaN,'',4000],
    'Category':[1,0,1,0,1,1,1]
    }
    Dataset = pd.DataFrame(Data)
    Columnnames = Dataset.columns.tolist()
    print(Columnnames)
    print(Dataset.isnull().sum())
    print('Data Stats')
    print('=================')
    print(Dataset.info())
    #Missing Value Stats
    print('Null Value Stats')
    print('=================')
    print(Dataset.isnull().sum(axis=0))
    #blank values along the column
    print('Blank Values')
    print('=================')
    print((Dataset == '').sum(axis=0))
    #Replace Blank Values
    DatsetBV=Dataset.fillna("NaN")
    #Replace NaN Values
    DatsetBV = DatsetBV.replace('', np.NaN)
    DatsetBV = DatsetBV.replace(' ', np.NaN)
    DatsetBV = DatsetBV.replace(np.NaN,'-999')
    DatsetBV = DatsetBV.replace('NaN','-999')
    print(DatsetBV)
    #Data Imbalance
    print('Stats')
    print(DatsetBV['Category'].value_counts())
    #assign X and Y
    y = DatsetBV.Category
    x = DatsetBV.drop('Category',axis=1)
    #seperate majority and minority class
    df_majority = DatsetBV[DatsetBV.Category==1]
    df_minority = DatsetBV[DatsetBV.Category==0]
    from sklearn.utils import resample
    #upsample minority class
    df_minority_upsampled = resample(df_minority,replace=True #Sample with replacement
    ,n_samples=5, #to match majority class
    random_state = 123) #reproducible results
    #downsample majority class
    df_majority_downsample = resample(df_majority,replace=False #Sample without replacement
    ,n_samples=2, #to match minority class
    random_state = 123) #reproducible results
    #combine upsampled and majority classes
    df_balanced_class_option1 = pd.concat([df_majority,df_minority_upsampled])
    print('upsampled')
    print(df_balanced_class_option1)
    #combine downsampled and minority classes
    df_balanced_class_option2 = pd.concat([df_minority,df_majority_downsample])
    print('downsampled')
    print(df_balanced_class_option2)
    My request to readers: if you find these code snippets, blogs, and articles helpful, please share your learning with others. We grow only by learning and teaching.

    Happy Learning!!!

    Data Science Experiment - Milk Adulteration

    Data - Link


    import pandas as pd
    from sklearn import tree
    from sklearn.cross_validation import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    import sys
    from mlxtend.classifier import EnsembleVoteClassifier
    from sklearn import svm
    from imblearn.over_sampling import SMOTE
    verbose = False
    ratio = 'auto'
    print (sys.version)
    input_file = "TrainDataBinaryClassification.xls"
    df = pd.read_csv(input_file,header=0,sep=",")
    print(df.head())
    print(df.head(5))
    #Remove insignificant id column
    df.drop(['Id'],1,inplace=True)
    #List all column headers
    print(list(df))
    #Fill missing values
    df = df.fillna(-999)
    features = list(df.columns[:-1])
    print(features);
    y1 = df['class']
    x1 = df[features]
    #Option 1
    #SENN = SMOTEENN(ratio=ratio)
    #x, y = SENN.fit_sample(x1, y1)
    #Option #2
    sm = SMOTE(kind='svm')
    x, y = sm.fit_sample(x1, y1)
    pred_train, pred_test, tar_train, tar_test = train_test_split(x,y,test_size=0.3)
    print('Shape of test data')
    rf = RandomForestClassifier(n_estimators=350) # initialize
    classifier2 = rf.fit(x, y) # fit the data to the algorithm
    pred_train, pred_test, tar_train, tar_test = train_test_split(x,y,test_size=0.3)
    print('Shape of test data')
    classifier = tree.DecisionTreeClassifier(criterion="entropy")
    classifier = classifier.fit(x,y)
    classifier3 = RandomForestClassifier(n_jobs=250)
    classifier3 = classifier3.fit(x,y)
    classifier2 = svm.SVC()
    classifier2 = classifier2.fit(x,y)
    clfs = [classifier, classifier2, classifier3]
    clf = EnsembleVoteClassifier(clfs, voting='hard', weights = (4,4,5))
    clf.fit(x, y)
    input_file = "TestDataTwoClass.xls"
    df = pd.read_csv(input_file,header=0,sep=",")
    df2 = pd.read_csv(input_file,header=0,sep=",")
    df.drop(['Id'],1,inplace=True)
    df = df.fillna(-999)
    x = df[features]
    predictions = clf.predict(x)
    print('predictions')
    i = 0
    for i in range(0,len(predictions)):
        print(predictions[i])
    df['class'] = predictions
    df2['class'] = predictions
    print('count',df['class'])
    header = ["Id","class"]
    df2.to_csv("Results_Binary_Class_Adulteration_Sep18_2.csv", sep=',', columns = header,index=False)
    (Experiment1.py)
    import pandas as pd
    from sklearn import tree
    from sklearn.cross_validation import train_test_split
    import sklearn.metrics
    import sys
    sys.path.append('C:\\Anaconda2\\xgboost')
    import xgboost as xgb
    #2.7.12 |Anaconda 4.0.0 (64-bit)|
    print (sys.version)
    input_file = "TrainDataBinaryClassification.xls"
    df = pd.read_csv(input_file,header=0,sep=",")
    print(df.head())
    print(df.head(5))
    #Remove insignificant id column
    df.drop(['Id'],1,inplace=True)
    #List all column headers
    print(list(df))
    print(df.head())
    #Fill missing values
    df = df.fillna(-99)
    features = list(df.columns[:-1])
    print(features);
    y = df['class']
    x = df[features]
    pred_train, pred_test, tar_train, tar_test = train_test_split(x,y,test_size=0.3)
    print('Shape of test data')
    #classifier = tree.DecisionTreeClassifier(criterion="entropy")
    #classifier = classifier.fit(pred_train,tar_train)
    gbm = xgb.XGBClassifier(max_depth=3, n_estimators=300, learning_rate=0.05).fit(x, y)
    #print('acc', classifier.score(x,y))
    #predictions = classifier.predict(pred_test)
    #print(predictions)
    #print(sklearn.metrics.confusion_matrix(tar_test,predictions))
    #print('Classifier Accuracy')
    #print(sklearn.metrics.accuracy_score(tar_test,predictions))
    input_file = "TestDataTwoClassResults.xls"
    df = pd.read_csv(input_file,header=0,sep=",")
    df2 = pd.read_csv(input_file,header=0,sep=",")
    df.drop(['Id'],1,inplace=True)
    df = df.fillna(-99)
    x = df[features]
    predictions = gbm.predict(x)
    print('predictions')
    #print(predictions)
    i = 0
    for i in range(0,len(predictions)):
        print(predictions[i])
    #print('count',len(predictions))
    df2['class'] = predictions
    #df.to_csv("Results_Adulteration.csv", sep=',',index=False)
    header = ["Id","class"]
    df2.to_csv("Results_Adulteration_Sep15.csv", sep=',', columns = header,index=False)
    (Experiment2.py)
    import pandas as pd
    from sklearn import tree
    from sklearn.cross_validation import train_test_split
    import sklearn.metrics
    import sys
    from sklearn.ensemble import RandomForestClassifier
    #2.7.12 |Anaconda 4.0.0 (64-bit)|
    print (sys.version)
    input_file = "TrainDataMultiClassClassification_Custom_sep18.csv"
    df = pd.read_csv(input_file,header=0,sep=",")
    print(df.head())
    print(df.head(5))
    #Remove insignificant id column
    df.drop(['Id'],1,inplace=True)
    #List all column headers
    print(list(df))
    print(df.head())
    #Fill missing values
    df = df.fillna(-9999)
    features = list(df.columns[:-1])
    print(features);
    y = df['class']
    x = df[features]
    pred_train, pred_test, tar_train, tar_test = train_test_split(x,y,test_size=0.3)
    print('Shape of test data')
    classifier = tree.DecisionTreeClassifier(criterion="entropy")
    classifier = classifier.fit(pred_train,tar_train)
    print('acc', classifier.score(x,y))
    predictions = classifier.predict(pred_test)
    rf = RandomForestClassifier(n_estimators=300) # initialize
    rf.fit(x, y) # fit the data to the algorithm
    input_file = "TestDataMultiClass.xls"
    df = pd.read_csv(input_file,header=0,sep=",")
    df2 = pd.read_csv(input_file,header=0,sep=",")
    df.drop(['Id'],1,inplace=True)
    df = df.fillna(-9999)
    x = df[features]
    predictions = rf.predict(x)
    #predictions = classifier.predict(x)
    print('predictions')
    #print(predictions)
    i = 0
    for i in range(0,len(predictions)):
        print(predictions[i])
    #print('count',len(predictions))
    df['class'] = predictions
    #print('count',len(predictions))
    df2['class'] = predictions
    print('count',df['class'])
    #df.to_csv("Results_Multi_Class_Adulteration.csv", sep=',',index=False)
    header = ["Id","class"]
    df2.to_csv("Results_Multi_Class_Adulteration_2_Sep18_RF.csv", sep=',', columns = header,index=False)
    (Experiment3.py)

    Happy Learning!!!

    Stats Lessons




    Happy Learning!!!

    NOSQL Internals and Design Practices

    Objective – The objective of this paper is to analyze NoSQL internals from an RDBMS developer's perspective and provide design guidelines for NoSQL applications.
    Analysis
    RDBMS – RDBMS came into the picture to ensure that ACID properties are maintained and there is a single version of the truth. RDBMS plays a critical role in OLTP applications across banking, finance, and payment domains.
    Database design – The schema is normalized to avoid data redundancy. Primary keys and indexes are created so that query plans use the indexes to filter the required rows and fetch results in the shortest time.

    Query Execution – Data is typically stored in a B-Tree format and organized physically in the form of clustered indexes. This is why a search on the primary key is quick compared to any other non-indexed column. The database engine optimizes the execution plan by leveraging indexes, statistics, partitioning, and non-clustered indexes. Depending on the query, join and sort operators are applied to produce the execution plan, and the plan is reused if it already exists in memory.
    This paper was very useful for understanding OLTP internals. Reposting notes from my earlier blog post:
    • WAL – Changes are written to the log and committed to disk when a checkpoint is reached
    • Buffer Manager – cache for data fetched / recently used
    • Two-Phase locking – Optimistic/pessimistic locking depending on isolation levels
    • Concurrency control – Based on isolation levels
    NoSQL Databases 
    Similar to the OLTP aspects above, there are a few papers that describe designing NoSQL apps for read-heavy / write-heavy workloads. This paper was very useful for understanding the NoSQL perspective of designing apps on columnar databases.

    For Heavy Writes
    • Tall Skinny Tables
    • Consolidate data into single columns
    For Heavy Reads
    • Fewer column families
    • Use bloom filters
    There are multiple NoSQL databases (Key-Value, Document-based, Columnar Databases, etc...). 

    Happy Learning!!!

    Day #309 - Handle Categorical Columns

    Have a Great, Peaceful and Successful 2020
    This post is on Handling Categorical Columns
    import pandas as pd
    #Define Data Frames
    Data = {'Location': ['Singapore', 'India', 'Japan', 'China','Korea'],
    'avgage': [22,38,26,35,22],
    'Education': ['UG','PG','Phd','UG','PG']
    }
    Dataset = pd.DataFrame(Data)
    #Categorize Location
    location = Dataset['Location']
    catlocation = pd.get_dummies(location)
    print(catlocation)
    #Categorize Education
    education = Dataset['Education']
    cateducation = pd.get_dummies(education)
    print(cateducation)
    #Standardize Avg Age
    from sklearn import preprocessing
    age = Dataset['avgage'].values
    min_max_scaler = preprocessing.MinMaxScaler()
    age_scaled = min_max_scaler.fit_transform(age.reshape(-1, 1))
    agedf = pd.DataFrame(age_scaled)
    print(agedf)
    #Merge all the data
    frames = [catlocation,cateducation,agedf]
    #Merge three frames horizontally
    merged_data = pd.concat(frames, axis=1)
    print(merged_data)
    (catcolumns.py)
    Happy Learning!!!