"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

September 24, 2019

AI bubble burst

A good read: a March 2019 article, but still relevant - AI Read
Key Points
  • “In 40 percent of cases, we could find no mention of evidence of AI”
  • “Companies that people assume and think are #AI companies are probably not.”
  • "AI catnip to investors"
Everything starts with Data, Domain, and Value-driven use cases. AI comes after good Domain and Data Knowledge. AI is a value addition on top of the existing insights. 
  • How is my business doing today? (Transactional Reports)
  • How did product X sell over the last 3 months? (BI Reports)
  • What sells together with X? (Apriori / Recommendation)
  • How much of those items should I expect to purchase together, based on historical data? (Forecasting)
Everything needs to add up to a data story and value addition for the business. Without the right use cases and quality data, whatever we demo with sample data or deep networks will never succeed in the real world #intelligence #artificialintelligence #perspectives. Cool AI demos != Cool Real-world Implementation.
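
A rough sketch of the "what sells together with X" question, assuming each order is simply a list of item names. This is only the pair-counting first pass of an Apriori-style analysis, not the full algorithm. The function names, the min_support parameter, and the sample baskets are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how often each unordered pair of items shares a basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) drops duplicates and gives each pair a stable order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def sells_with(item, baskets, min_support=2):
    """Items bought together with `item` in at least `min_support` baskets."""
    result = {}
    for (a, b), n in pair_counts(baskets).items():
        if n >= min_support and item in (a, b):
            other = b if a == item else a
            result[other] = n
    return result


# Hypothetical sample data, for illustration only
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "butter"],
]
print(sells_with("bread", baskets))
```

With real transaction data the support threshold would need tuning, and the quality of the baskets matters more than the counting itself.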

Happy Thinking!!!

September 20, 2019

Day #275 - When will #deeplearning finally die out?

An interesting answer on Quora to the question - "When will #deeplearning finally die out?" by Sridhar Mahadevan

Key points - Limitations/ concerns pointed out  
  • Key point #1 - Minimizing error over a #training set, no matter how large, is not enough to solve the AI problem;
  • Key point #2 - The true test of a scientific theory is not its accuracy at making predictions over some #fixeddataset, but the level of #insight it gives us into a problem
  • Key point #3 - Human vision is far more complex than the set of images in the ImageNet benchmarking #datasets.
#DeepLearning has a lot of proven success with vision, text, and numeric data. This article questions some of the #assumptions.

#AI is evolving. This article gives a perspective on reality vs hype and helps in understanding the #possibilities / #limitations / #challenges of #DeepLearning.


Happy Learning!!!

September 17, 2019

Dlib on Windows 64 bit Platform

1. Download and Install cmake from https://cmake.org/download/ (Windows 64 bit download)
2. From the Anaconda prompt - pip install cmake
3. Install Dlib - pip install dlib
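
A small check that can be run after the steps above to see whether the packages are visible to the current Python environment. It only uses the standard library; the helper name is_installed is made up for this note, and it does not build or import dlib itself.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Report on the two packages from the steps above
for name in ("cmake", "dlib"):
    status = "found" if is_installed(name) else "not found"
    print(name, "-", status)
```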

Happy Learning!!!

September 15, 2019

The Business of Facebook


When we think of everything in terms of data, we need to look at the data goals, the business goals, and the story behind the data.

High-Level Aggregated Metrics - What would be the data metrics in Facebook
  • Number of new users added
  • Number of new places added
  • Number of pics uploaded; growth of video and picture data
  • Number of conversations for new users
  • Average time spent by users across ages, professions, country, ethnicity
  • Most viewed content across age groups
  • Most viewed contents across domains / Professions
  • Number of new advertisers added
  • Sale / Conversion / Influencers 
Individual User Specific Information
  • Your likes/dislikes
  • Your background, education, ethnicity
  • Behavioral traits
  • Your app usage 
Unconsciously, we reflect a lot of our interests on social media. When suggestions are based on our historical data, they make us more aligned/biased towards those interests. Always keep changing your patterns of thinking, learning, experimenting. Instead of Facebook, I prefer to share things on my own private blog. I don't believe pics reflect the emotions of a person. I remember the line in Macbeth: "Fair is foul, and foul is fair".

Your bits and bytes are tracked and recommendations are given. Always timeslice your tasks. We are here only for a limited time. Sometimes we need to create those moments by experimenting, taking our learning to the next level. This life is too precious to get carried away with selfies and pics. Look beyond and find what we can do and learn to make the world a better place.

September 14, 2019

New Age BI = Kimball + Big Data for unstructured data + AI capabilities

Both the Kimball and Inmon approaches are driven by structuring data

Inmon
  • Bill Inmon is the proponent of the centralized DW
  • Inmon defines data warehouse as a centralized repository for the entire enterprise
  • Data warehouse is at the center of the Corporate Information Factory (CIF)
Kimball
  • Kimball defines data warehouse as “A copy of transaction data specifically structured for query and analysis”
  • Kimball's approach kept proving more suitable, as global corporations need a distributed DW
  • Kimball is the distributed Data Marts proponent.
  • Kimball defines business processes quite broadly.
Current Status
  • Both concepts are currently changing with Big Data, in-memory processing, columnar databases, and machine intelligence
  • We need tactical BI plus Big Data plus analytics
Today BI = DW + AI capabilities + Big Data
  • BI & Analytics is spread across multiple components; we cannot invest in centralized DW as systems need to be agile enough to accommodate changing data needs
  • Knowledge discovery is an ongoing process
  • Data is both structured and unstructured. Strategically, data is changing.
  • Breaking it into smaller data marts is more manageable
Kimball + Big Data for unstructured data + AI capabilities = New Age BI

The goal of analytics is to move from historical reporting to real-time recommendations

New Age #BI = #Kimball + #BigData for unstructured data + #AI capabilities
  • #Kimball - Build data marts to slice and dice through data
  • #BigData - Find your digital footprint, Look at realtime trends/sentiments
  • #AI - Use AI to find insights from your customer, users, demands, patterns
Happy Learning!!!

September 04, 2019

Day #274 - Re_id Notes from papers / Analysis - Reidentification of person from historical data

Approach
  • Extract Features
  • Cluster to find similar faces
  • Approximate k-NN search
Survey on Deep Learning Techniques for Person Re-Identification
Classification Model
  • Using SIFT, Color Histograms
  • Determining the individual identity (aka class)
  • Image Categorization by Age / Gender and Search
Siamese Network 
  • Learning a similarity function, which takes two images as input and expresses how similar they are.
  • Triplet Siamese model, Pairwise Model
  • Triplet models - The triplet loss function takes the face encodings of three images: anchor, positive, and negative. The anchor and positive are images of the same person, whereas the negative is an image of a different person
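
To make the triplet idea concrete, here is a minimal sketch of one common form of the loss, max(d(anchor, positive) - d(anchor, negative) + margin, 0), using plain Euclidean distance. Some papers use squared distances instead. The margin value and the toy 2-D "encodings" are assumptions for illustration; real face encodings have many more dimensions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is closer to the anchor than the negative by `margin`."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


# Toy encodings, for illustration only
anchor = [0.0, 0.0]
same_person = [3.0, 4.0]   # distance 5 from the anchor
other_person = [6.0, 8.0]  # distance 10 from the anchor

print(triplet_loss(anchor, same_person, other_person, margin=1.0))
print(triplet_loss(anchor, other_person, same_person, margin=1.0))
```

The first call is the good case (the same person is closer), so nothing is penalized. The second swaps the roles and produces a positive loss that training would push down.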
Face Search at Scale: 80 Million Gallery
Key Points
  • Represent objects with feature vectors 
  • Employ an indexing or approximate search scheme in the feature space
Performance-oriented Design
  • Fast filtering step (Approximate k-NN search)
  • Re-ranking step (K Candidates Deep Feature Similarity)
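
A toy sketch of the two steps above. The fast filter here is just a distance on the first few dimensions, which is a stand-in I chose for illustration and not the approximate k-NN index the paper uses. The re-ranking step then compares full feature vectors on the shortlist only. All names, parameters, and vectors are hypothetical.

```python
import math


def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def two_stage_search(query, gallery, shortlist_size=3, top_n=2, coarse_dims=2):
    """Shortlist with a cheap distance, then re-rank with the full distance."""
    # Step 1: fast filter on truncated vectors (stand-in for approximate k-NN)
    shortlist = sorted(
        gallery.items(),
        key=lambda kv: l2(query[:coarse_dims], kv[1][:coarse_dims]),
    )[:shortlist_size]
    # Step 2: re-rank only the K shortlisted candidates on full features
    reranked = sorted(shortlist, key=lambda kv: l2(query, kv[1]))
    return [name for name, _ in reranked[:top_n]]


# Hypothetical gallery of 4-D feature vectors
gallery = {
    "a": [1, 1, 9, 9],  # looks closest in the coarse step, far on full features
    "b": [1, 2, 0, 0],
    "c": [8, 8, 0, 0],
    "d": [2, 2, 0, 1],
}
print(two_stage_search([1, 1, 0, 0], gallery))
```

The point of the design is that the expensive comparison runs on K candidates instead of the whole gallery, at the cost of missing anything the filter drops.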
Using Siamese Networks - Retail Use Cases
  • Scenario #1 – Find a person in Camera1 and Find him across all other cameras
  • Scenario #2 – Find a person at Entrance and Track him across in-store video
  • Scenario #3 – Retrain this for every +/- 10 minutes, Dynamically Track for every single customer, Retrain as Class – Query Image Scenario
Happy Learning!!!

August 31, 2019

Day #273 - Learnings from Program Errors - face_recognition

Lesson #1 - axis 1 is out of bounds for array of dimension 1
While working on face comparisons, I ran into this error when using the face_recognition.compare_faces method.

This link was useful https://github.com/ageitgey/face_recognition/issues/328

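bad one: face_recognition.compare_faces(encoding, person_encoding)
good one: face_recognition.compare_faces([encoding], person_encoding)
The first argument must be a list of known encodings, so a single encoding has to be wrapped in a list.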

Lesson #2 - “RuntimeError: dictionary changed size during iteration” error?
Reason - The dictionary was modified while iterating over its elements
Fix - Modify the dictionary outside the loop
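
A minimal reproduction of Lesson #2 and one way to apply the fix: collect the keys first, then delete after the loop has finished. The function names and the scores dictionary are made up for illustration.

```python
def remove_low_scores_unsafe(scores, threshold):
    """Deletes while iterating the same dict, which raises RuntimeError."""
    for name in scores:
        if scores[name] < threshold:
            del scores[name]
    return scores


def remove_low_scores(scores, threshold):
    """Collect keys during the loop, modify the dict only afterwards."""
    to_delete = [name for name, score in scores.items() if score < threshold]
    for name in to_delete:
        del scores[name]
    return scores


# Hypothetical match scores, for illustration only
scores = {"a": 0.2, "b": 0.9, "c": 0.4}

try:
    remove_low_scores_unsafe(dict(scores), 0.5)
except RuntimeError as err:
    print("unsafe version failed:", err)

print(remove_low_scores(dict(scores), 0.5))
```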

Many simple design choices and the end-to-end thinking keep changing while building the solution

Datasets - Link

Happy Learning!!!

August 30, 2019

Day #272 - Clustering to find Data Insights

For different kinds of data, we need to pick the right columns to find the right insights. Most of them can be picked with domain knowledge and by analyzing the data from a clustering perspective.
  • For Sales Data, Clustering intent was to find sales insights. Data Sources were CustomerId, NumberOfOrders, TotalOrderValue. Clustered the same to find order value buckets. High Value, Medium, Low Value.
  • For Loss in Retail Store, Clustering insights to find loss patterns. Data Sources were SkuId, LossCount, LossValue. Clustered them to High Loss, Medium, Low Loss buckets.
This helps to identify the key loss items and focus on proactive measures to prevent further loss. I picked up R again after a long time and had forgotten the execution shortcut: Ctrl+A, Ctrl+Enter. Every editor, tool, and language has its own patterns and formats of coding.
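
The original work was done in R. Below is only a rough Python sketch of the bucketing idea, simplified to a single column (LossValue) and a tiny 1-D k-means with a deterministic start. The function names, the fixed iteration count, and the sample SKU values are assumptions for illustration.

```python
def kmeans_1d(values, k=3, iterations=20):
    """Very small 1-D k-means (k >= 2); returns the cluster centers, sorted."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)


def label_buckets(value_by_id, names=("Low", "Medium", "High")):
    """Assign each id the name of its nearest cluster center."""
    centers = kmeans_1d(list(value_by_id.values()), k=len(names))
    return {
        key: names[min(range(len(centers)), key=lambda i: abs(v - centers[i]))]
        for key, v in value_by_id.items()
    }


# Hypothetical LossValue per SkuId, for illustration only
loss_value = {"sku1": 10, "sku2": 12, "sku3": 100, "sku4": 110, "sku5": 500, "sku6": 520}
print(label_buckets(loss_value))
```

The same function could be pointed at TotalOrderValue per CustomerId for the sales case. Clustering on two columns together would need proper scaling first.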

This is similar to the 'Recency, Frequency and Monetary value' (RFM) model. I came to know about this today (7/6/2020). Sometimes what you have implemented already exists as a known pattern :)

Recency, Frequency, Monetary Model with Python — and how Sephora uses it to optimize their Google and Facebook Ads
Find Your Best Customers with Customer Segmentation in Python
Introduction to Customer Segmentation in Python

Happy Learning!!!

August 29, 2019

Day #271 - FAISS Experiments

Examples of basic indexes, binary indexes, and composite indexes in FAISS
  • Exact Results - Flat index - an index that guarantees exact results. Types - IndexFlatL2 or IndexFlatIP
  • Approximate Match - Cluster the vectors, then store them in "Flat" buckets
  • Conda install on Ubuntu. The version that works is - conda install faiss-cpu=1.5.1 -c pytorch -y
  • A similar project for storing and fetching data - https://github.com/waltyou/faiss-web-service 
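
To show the difference between the exact and the bucketed approach, here is a pure-Python sketch of the idea. It does not use the FAISS API. The centroids are fixed by hand instead of being learned, and all names and vectors are hypothetical.

```python
import math


def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def flat_search(query, vectors):
    """Exact: compare the query with every stored vector."""
    return min(range(len(vectors)), key=lambda i: l2(query, vectors[i]))


def build_buckets(vectors, centroids):
    """Store each vector index in the bucket of its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        buckets[nearest].append(i)
    return buckets


def bucket_search(query, vectors, centroids, buckets):
    """Approximate: scan only the bucket whose centroid is nearest the query."""
    nearest = min(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = buckets[nearest]
    if not candidates:
        return None
    return min(candidates, key=lambda i: l2(query, vectors[i]))


# Hand-picked toy data, for illustration only
centroids = [[0.0, 0.0], [10.0, 0.0]]
vectors = [[1.0, 0.0], [4.9, 0.0], [9.0, 0.0], [7.0, 0.0]]
buckets = build_buckets(vectors, centroids)
query = [5.2, 0.0]

print("exact:", flat_search(query, vectors))
print("bucketed:", bucket_search(query, vectors, centroids, buckets))
```

In this toy case the true nearest vector sits just across the bucket boundary, so the bucketed search returns a different, slightly farther result. That is the speed versus accuracy trade-off behind the "approximate match" index types.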
More Reads Link1, Link2

Happy Learning!!!