Data Science, Database, AI Startups and Domain Learnings (Video-Image-Text-Data-Database)

September 24, 2019

AI bubble burst

A good one: a March 2019 article that is still relevant - AI Read
Key Points

“In 40 percent of cases, we could find no mention of evidence of AI.”
“Companies that people assume and think are #AI companies are probably not.”
"AI catnip to investors"

Everything starts with Data, Domain, and value-driven use cases. AI comes after good Domain and Data knowledge; it adds value on top of the existing insights.

Today - How is my business doing? (Transactional Reports)
How did product X sell over the last 3 months? (BI Reports)
What sells together with X? (Apriori / Recommendation)
How many of those items will be purchased together, based on previous historical data? (Forecasting)
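The "What sells together with X" step can be approximated by counting co-occurrences. This is a minimal sketch computing pairwise support and confidence (the measures Apriori builds on), not the full Apriori algorithm; the baskets and product names are made up for illustration, and pair_stats is a hypothetical helper.

```python
from collections import Counter

def pair_stats(baskets, item):
    """For every product bought together with `item`, return (support, confidence).

    support    = share of all baskets that contain both products
    confidence = share of baskets containing `item` that also contain the other product
    """
    n = len(baskets)
    item_count = sum(1 for basket in baskets if item in basket)
    together = Counter()
    for basket in baskets:
        if item in basket:
            for other in set(basket) - {item}:
                together[other] += 1
    # most_common() lists the strongest "sells together" candidates first
    return {other: (count / n, count / item_count) for other, count in together.most_common()}

# Hypothetical baskets: one set of product names per order
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]
print(pair_stats(baskets, "bread"))
```

Pairs whose confidence clears a chosen threshold are the "sells together" candidates; Apriori extends the same counting to larger item sets and prunes candidates whose subsets are infrequent.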

Everything needs to add up to a coherent Data Story that adds value to the business. Without the right use cases and quality data, whatever we demo with sample data or deep networks will never succeed in the real world #intelligence #artificialintelligence #perspectives. Cool AI demos != Cool Real-world Implementation.

Happy Thinking!!!

September 20, 2019

Day #275 - When will #deeplearning finally die out?

An interesting answer on Quora by Sridhar Mahadevan to the question "When will #deeplearning finally die out?"

Key points - Limitations / concerns raised

Key point #1 - Minimizing error over a #training set, no matter how large, is not enough to solve the AI problem.
Key point #2 - The true test of a scientific theory is not its accuracy at making predictions over some #fixeddataset, but the level of #insight it gives us into a problem.
Key point #3 - Human vision is far more complex than the set of images in benchmark #datasets such as ImageNet.

#DeepLearning has a lot of proven success with vision, text, and numerical data. This article questions some of its #assumptions.

#AI is evolving. This article gives perspective on reality vs hype and helps us understand the #possibilities / #limitations / #challenges of #DeepLearning.

Read Sridhar Mahadevan's answer to When will deep learning finally die out? on Quora

Happy Learning!!!

September 17, 2019

Dlib on Windows 64-bit Platform

1. Download and install CMake from https://cmake.org/download/ (Windows 64-bit download)
2. From the Anaconda prompt - pip install cmake
3. Install Dlib - pip install dlib
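After step 3, a quick check can confirm that both pieces are visible from the same environment (for example the Anaconda prompt). This is a sketch that uses only the standard library (shutil.which and importlib.util.find_spec); the function name check_dlib_prereqs is hypothetical.

```python
import importlib.util
import shutil

def check_dlib_prereqs():
    """Report whether the cmake executable and the dlib package are visible
    from the Python environment this script runs in."""
    return {
        "cmake_on_path": shutil.which("cmake") is not None,
        "dlib_installed": importlib.util.find_spec("dlib") is not None,
    }

if __name__ == "__main__":
    for name, ok in check_dlib_prereqs().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```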

Happy Learning!!!

September 15, 2019

The Business of Facebook

When we think of everything in terms of data, we need to look at the data goals, the business goals, and the story behind the data.

High-Level Aggregated Metrics - What would the data metrics at Facebook be?

Number of new users added
Number of new places added
Number of pictures uploaded; growth of video and picture data
Number of conversations for new users
Average time spent by users across ages, professions, country, ethnicity
Most viewed content across age groups
Most viewed contents across domains / Professions
Number of new advertisers added
Sale / Conversion / Influencers
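Most of these high-level metrics are group-by aggregations. This is a minimal sketch of one of them, average time spent per segment, using only the standard library; the session records and field names (age_group, country, minutes) are hypothetical and are not Facebook's actual schema.

```python
from collections import defaultdict

def average_time_by(sessions, key):
    """Average minutes spent per segment, e.g. key="age_group" or key="country".

    `sessions` is a list of dicts with a "minutes" field plus segment fields
    (hypothetical field names, for illustration only).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s in sessions:
        totals[s[key]] += s["minutes"]
        counts[s[key]] += 1
    return {segment: totals[segment] / counts[segment] for segment in totals}

sessions = [
    {"age_group": "18-24", "country": "IN", "minutes": 40},
    {"age_group": "18-24", "country": "US", "minutes": 20},
    {"age_group": "35-44", "country": "IN", "minutes": 15},
]
print(average_time_by(sessions, "age_group"))  # {'18-24': 30.0, '35-44': 15.0}
```

The same function answers the per-country or per-profession version of the question by changing `key`.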

Individual User Specific Information

Your likes/dislikes
Your background, education, ethnicity
Behavioral traits
Your app usage

Unconsciously, we reflect a lot of our interests on social media. Recommendations based on our historical data can make us more narrowly aligned or biased. Always keep changing your patterns of thinking, learning, and experimenting. Instead of Facebook, I prefer to share things on my own private blog. I don't believe pictures reflect a person's emotions. I remember the line from Macbeth: "Fair is foul, and foul is fair."

Your bits and bytes are tracked, and recommendations are made from them. Always timeslice your tasks; we are here only for a limited time. Sometimes we need to create those moments by experimenting and taking our learning to the next level. Life is too precious to get carried away with selfies and pictures. Look beyond them, find what we can do, and learn to make the world a better place.

More Reads - How Facebook got addicted to spreading misinformation

Happy Learning!!!

September 14, 2019

New Age BI = Kimball + Big Data for unstructured data + AI capabilities

The Kimball and Inmon approaches are both built around structuring data

Inmon

Bill Inmon is a proponent of the centralized DW
Inmon defines the data warehouse as a centralized repository for the entire enterprise
The data warehouse is at the center of the Corporate Information Factory (CIF)

Kimball

Kimball defines a data warehouse as “A copy of transaction data specifically structured for query and analysis”
Kimball's approach gained ground as global corporations needed distributed DWs
Kimball is a proponent of distributed data marts
Kimball defines business processes quite broadly

Current Status

Both approaches are changing with Big Data, in-memory processing, columnar databases, and machine intelligence
We need tactical reporting, Big Data, and analytics together

Today BI = DW + AI capabilities + Big Data

BI & Analytics is spread across multiple components; we cannot invest only in a centralized DW, as systems need to be agile enough to accommodate changing data needs
Knowledge discovery is an ongoing process
Data is both structured and unstructured, and the data that matters strategically keeps changing
Splitting the warehouse into smaller data marts is more manageable

Kimball + Big Data for unstructured data + AI capabilities = New Age BI

The goal of analytics is to move from historical reporting to real-time recommendations

New Age #BI = #Kimball + #BigData for unstructured data + #AI capabilities

#Kimball - Build data marts to slice and dice the data
#BigData - Find your digital footprint; look at real-time trends and sentiments
#AI - Use AI to find insights from your customer, users, demands, patterns
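To make "build data marts to slice and dice" concrete, here is a minimal Kimball-style star schema: one fact table of sales measures joined to two dimension tables. It is a sketch using Python's built-in sqlite3 in memory; the table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dimensions describe the "what/where"; the fact table holds the measures.
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (product_id INTEGER, store_id INTEGER, month TEXT, amount REAL);
INSERT INTO dim_product VALUES (1, 'Grocery'), (2, 'Electronics');
INSERT INTO dim_store VALUES (10, 'North'), (20, 'South');
INSERT INTO fact_sales VALUES
  (1, 10, '2019-07', 100.0), (2, 10, '2019-07', 250.0),
  (1, 20, '2019-08', 80.0), (2, 20, '2019-08', 300.0);
""")

def sales_by_category(region=None):
    """Roll sales up by product category, optionally sliced to one store region."""
    sql = """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_store AS s ON s.store_id = f.store_id
    """
    params = ()
    if region is not None:
        sql += " WHERE s.region = ?"
        params = (region,)
    sql += " GROUP BY p.category ORDER BY p.category"
    return conn.execute(sql, params).fetchall()

print(sales_by_category())         # all regions
print(sales_by_category("South"))  # one slice
```

Adding another dimension (time, customer) means another small table and another join, which is why data marts stay manageable as questions change.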

Happy Learning!!!

September 04, 2019

Day #274 - Re-ID Notes from Papers / Analysis - Re-identification of a person from historical data

Approach

Extract Features
Cluster to find similar faces
Approximate k-NN search
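The three steps above can be sketched with plain NumPy. This is a minimal sketch, assuming the features have already been extracted (for example face or body embeddings); it uses basic k-means and then searches only the nearest clusters, which is similar in spirit to FAISS's IVF indexes but is not FAISS itself. All function names and parameters are hypothetical.

```python
import numpy as np

def sq_dists(a, b):
    """Squared L2 distance between every row of `a` and every row of `b`."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

def kmeans(x, k, iters=10, seed=0):
    """Plain k-means: returns centroids and the final cluster label of each row."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = sq_dists(x, centroids).argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, sq_dists(x, centroids).argmin(axis=1)

def build_index(features, k):
    """Cluster gallery features; each cluster keeps the row ids of its members."""
    features = np.asarray(features, dtype=float)
    centroids, labels = kmeans(features, k)
    buckets = {j: np.flatnonzero(labels == j) for j in range(k)}
    return centroids, buckets

def approx_knn(query, features, centroids, buckets, n_neighbors=5, nprobe=1):
    """Search only the `nprobe` buckets whose centroids are closest to the query."""
    nearest = sq_dists(query[None, :], centroids)[0].argsort()[:nprobe]
    candidates = np.concatenate([buckets[j] for j in nearest])
    d = sq_dists(query[None, :], features[candidates])[0]
    return candidates[d.argsort()[:n_neighbors]]
```

Raising `nprobe` trades speed for recall; probing every cluster gives the same answer as exact search.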

Survey on Deep Learning Techniques for Person Re-Identification
Classification Model

Using SIFT, Color Histograms
Determining the individual identity (aka class)
Image Categorization by Age / Gender and Search
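For the hand-crafted feature route, a color histogram is the simplest descriptor. This sketch covers only the color-histogram side (SIFT would need a library such as OpenCV and is not shown); it assumes H x W x 3 uint8 images and uses histogram intersection as the similarity. The bin count and function names are assumptions for illustration.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Concatenated per-channel histogram of an H x W x 3 uint8 image, L1-normalised."""
    per_channel = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    h = np.concatenate(per_channel).astype(float)
    return h / h.sum()

def histogram_intersection(h1, h2):
    """1.0 for identical normalised histograms, smaller for dissimilar ones."""
    return float(np.minimum(h1, h2).sum())
```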

Siamese Network

Learning a similarity function, which takes two images as input and expresses how similar they are.
Triplet Siamese model, Pairwise Model
Triplet models - The triplet loss function takes the face encodings of three images: anchor, positive, and negative. The anchor and positive are images of the same person, whereas the negative is an image of a different person
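In its common form, the triplet loss is loss = max(||f(a) - f(p)||^2 - ||f(a) - f(n)||^2 + margin, 0), where f gives the encoding of an image. This is a minimal NumPy sketch operating on already-computed encodings (one triplet per row); the margin of 0.2 is an assumed default, not a value from the papers above.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean triplet loss over a batch of encodings (rows = examples).

    Zero once the anchor-positive distance is smaller than the
    anchor-negative distance by at least `margin` (margin is an assumed default).
    """
    pos = np.sum((anchor - positive) ** 2, axis=-1)  # same person: want small
    neg = np.sum((anchor - negative) ** 2, axis=-1)  # different person: want large
    return np.maximum(pos - neg + margin, 0.0).mean()
```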

Face Search at Scale: 80 Million Gallery
Key Points

Represent objects with feature vectors
Employ an indexing or approximate search scheme in the feature space

Performance-oriented Design

Fast filtering step (Approximate k-NN search)
Re-ranking step (K Candidates Deep Feature Similarity)
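A minimal sketch of the filter-then-re-rank design. It assumes a random projection to a low dimension as the cheap filtering representation (a stand-in chosen for illustration; the paper's own filtering method is not reproduced here) and cosine similarity on the full deep features for re-ranking. Function and parameter names are hypothetical.

```python
import numpy as np

def normalize(x):
    """Scale vectors (last axis) to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def two_stage_search(query, gallery, proj, k_candidates=50, top=5):
    # Stage 1: fast filtering in a low-dimensional projected space
    coarse_d = ((gallery @ proj - query @ proj) ** 2).sum(axis=1)
    candidates = np.argsort(coarse_d)[:k_candidates]
    # Stage 2: re-rank only the K candidates with full-feature cosine similarity
    sims = normalize(gallery[candidates]) @ normalize(query)
    order = np.argsort(-sims)[:top]
    return candidates[order], sims[order]

rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 128))  # hypothetical deep features
proj = rng.normal(size=(128, 16))       # hypothetical cheap projection
ids, sims = two_stage_search(gallery[42], gallery, proj)
```

The expensive comparison runs on only `k_candidates` items instead of the whole gallery, which is what keeps re-ranking affordable at scale.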

Using Siamese Networks - Retail Use Cases

Scenario #1 – Find a person in Camera1 and find them across all other cameras
Scenario #2 – Find a person at the entrance and track them across in-store video
Scenario #3 – Retrain roughly every 10 minutes to dynamically track every single customer, treating each customer as a class (query image scenario)

Happy Learning!!!

August 31, 2019

Day #273 - Learnings from Program Errors - face_recognition

Lesson #1 - axis 1 is out of bounds for array of dimension 1
While working on face comparisons, I ran into this error with the face_recognition.compare_faces method.

This link was useful: https://github.com/ageitgey/face_recognition/issues/328

The first argument must be a list of known face encodings, not a single encoding:

bad one: face_recognition.compare_faces(encoding, person_encoding)
good one: face_recognition.compare_faces([encoding], person_encoding)
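Why the list matters can be reproduced without the library. This is a simplified NumPy re-implementation, not face_recognition itself, under the assumption that the library computes one Euclidean distance per known encoding along axis 1; with a single 1-D encoding there is no axis 1, hence the error. The 0.6 tolerance is assumed to match the library's default.

```python
import numpy as np

TOLERANCE = 0.6  # assumed default threshold; lower is stricter

def face_distance(known_encodings, encoding_to_check):
    """Euclidean distance from one encoding to each row of `known_encodings`."""
    known_encodings = np.asarray(known_encodings)
    return np.linalg.norm(known_encodings - encoding_to_check, axis=1)

def compare_faces(known_encodings, encoding_to_check, tolerance=TOLERANCE):
    """True for every known face within `tolerance` of the encoding to check."""
    return list(face_distance(known_encodings, encoding_to_check) <= tolerance)

encoding = np.zeros(128)              # one 128-d encoding, shape (128,)
person_encoding = np.full(128, 0.01)  # hypothetical probe encoding

try:
    compare_faces(encoding, person_encoding)  # 1-D input: axis 1 does not exist
except IndexError as err:                     # NumPy's AxisError subclasses IndexError
    print("bad one:", err)

print("good one:", compare_faces([encoding], person_encoding))
```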

Lesson #2 - “RuntimeError: dictionary changed size during iteration”
Reason - The dictionary was modified while iterating over its elements
Fix - Moved the modification outside the loop
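A small illustration of the error and two common fixes: iterate over a snapshot of the keys, or build a new dictionary. The function names and score values are hypothetical.

```python
def drop_below(scores, threshold):
    """Fix 1: iterate over a snapshot of the keys, then delete from the live dict."""
    for name in list(scores):
        if scores[name] < threshold:
            del scores[name]
    return scores

def keep_at_least(scores, threshold):
    """Fix 2: build a new dict instead of mutating the one being iterated."""
    return {name: s for name, s in scores.items() if s >= threshold}

# Hypothetical match scores per person
scores = {"a": 0.9, "b": 0.3, "c": 0.7}
try:
    for name in scores:       # iterating the live dict...
        if scores[name] < 0.5:
            del scores[name]  # ...while deleting from it
except RuntimeError as err:
    print(err)  # dictionary changed size during iteration
```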

Many simple design choices, and the end-to-end thinking itself, keep changing while building the solution

Datasets - Link

Happy Learning!!!

August 30, 2019

Day #272 - Clustering to find Data Insights

For different kinds of data, we need to pick the right columns to find the right insights. Most of them can be picked using domain knowledge and analysis from a clustering perspective.

For sales data, the clustering intent was to find sales insights. The columns used were CustomerId, NumberOfOrders, and TotalOrderValue, clustered into order-value buckets: High Value, Medium, and Low Value.
For loss in a retail store, the intent was to find loss patterns. The columns used were SkuId, LossCount, and LossValue, clustered into High Loss, Medium, and Low Loss buckets.

This helps to address the key loss items and focus on proactive measures to prevent further loss. After a long time, I picked up R again and had forgotten the execution shortcut (Ctrl+A, Ctrl+Enter). Every editor, tool, and language has its own patterns and coding conventions.
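The bucketing was done in R; as a Python sketch of the same idea, here is k-means with k=3 on a single value column (for example TotalOrderValue or LossValue), with clusters named by the order of their centroids. It simplifies the multi-column clustering described above to one column, and the min/median/max initialisation and the function name are assumptions.

```python
import numpy as np

def value_buckets(values, iters=20):
    """Cluster 1-D values into Low / Medium / High buckets with k-means (k=3)."""
    x = np.asarray(values, dtype=float)
    centroids = np.array([x.min(), np.median(x), x.max()])  # deterministic start
    for _ in range(iters):
        labels = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(3):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean()
    labels = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)
    rank = np.argsort(np.argsort(centroids))  # 0 = lowest centroid
    names = ["Low", "Medium", "High"]
    return [names[r] for r in rank[labels]]

# Hypothetical TotalOrderValue per customer
print(value_buckets([10, 12, 11, 100, 105, 98, 1000, 990]))
```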

This clustering approach is closely related to RFM (Recency, Frequency, Monetary value) analysis, which I came across today (7/6/2020). Sometimes what you have already implemented turns out to be a known pattern :)

Recency, Frequency, Monetary Model with Python — and how Sephora uses it to optimize their Google and Facebook Ads
Find Your Best Customers with Customer Segmentation in Python
Introduction to Customer Segmentation in Python
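A minimal RFM scoring sketch, assuming rank-based scores from 1 (worst) to 3 (best) per dimension so that they line up with the High/Medium/Low buckets; real RFM setups often use quintiles instead. The customer records and field names are hypothetical.

```python
from datetime import date

def rank_scores(values, higher_is_better=True, bins=3):
    """Score each value 1..bins by rank; the best-ranked third gets `bins`."""
    n = len(values)
    # Worst first: ascending if higher is better, descending otherwise
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for position, i in enumerate(order):
        scores[i] = 1 + (bins * position) // n
    return scores

def rfm(customers, today):
    """Return {customer id: (R, F, M)} scores."""
    recency = [(today - c["last_order"]).days for c in customers]  # fewer days is better
    r = rank_scores(recency, higher_is_better=False)
    f = rank_scores([c["orders"] for c in customers])
    m = rank_scores([c["total_value"] for c in customers])
    return {c["id"]: (r[i], f[i], m[i]) for i, c in enumerate(customers)}

customers = [
    {"id": "C1", "last_order": date(2020, 7, 1), "orders": 10, "total_value": 500.0},
    {"id": "C2", "last_order": date(2020, 5, 1), "orders": 2, "total_value": 50.0},
    {"id": "C3", "last_order": date(2020, 6, 15), "orders": 5, "total_value": 200.0},
]
print(rfm(customers, today=date(2020, 7, 6)))
```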

Happy Learning!!!

August 29, 2019

Day #271 - FAISS Experiments

Examples of basic indexes, binary indexes, and composite indexes in FAISS

Exact results - Flat indexes guarantee exact results. Types: IndexFlatL2 (L2 distance) or IndexFlatIP (inner product)
Approximate match - Cluster the vectors, then store them in "Flat" buckets (e.g. IndexIVFFlat)
Conda install on Ubuntu. The version that worked: conda install faiss-cpu=1.5.1 -c pytorch -y
A similar project for storing and fetching data - https://github.com/waltyou/faiss-web-service
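What the flat indexes compute can be shown with brute-force NumPy, which is handy for checking FAISS results on small data. This is a sketch, not FAISS: with faiss-cpu installed, the equivalent calls are index = faiss.IndexFlatL2(d), index.add(xb), and D, I = index.search(xq, k), on float32 arrays. The function names here are hypothetical. Note that the L2 scores are squared distances, which is also what IndexFlatL2 reports.

```python
import numpy as np

def flat_l2_search(xb, xq, k):
    """Exact k-NN by brute force: (squared L2 distances, indices) per query row."""
    d = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return np.take_along_axis(d, idx, axis=1), idx

def flat_ip_search(xb, xq, k):
    """Exact maximum inner product search (larger score = closer)."""
    s = xq @ xb.T
    idx = np.argsort(-s, axis=1)[:, :k]
    return np.take_along_axis(s, idx, axis=1), idx

xb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]], dtype=np.float32)  # database vectors
xq = np.array([[0.9, 0.0]], dtype=np.float32)                          # query vectors
D, I = flat_l2_search(xb, xq, k=2)
```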

August 28, 2019

Day #270 - Installing FAISS on Ubuntu

Happy Learning!!!
