"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 25, 2020

Day #321 - Image Similarity Search

I have been documenting my code poorly on this blog, and I am trying to improve my documentation abilities. This post is based on a reference post.

There are several key components involved in similarity search:
1. Data Collection
2. Generating Feature Vectors
3. Finding Similar matches based on different approaches

Step #1 - Data collection uses the Caltech 101 dataset, which contains 101 object categories.

Step #2 - Feature vectors are generated by leveraging pre-trained networks: take an existing pre-trained network, remove the last (classification) layer, and retain the flattened feature vector.

Step #3 - scikit-learn's NearestNeighbors is the algorithm used to find the nearest matches. The algorithm options are 'ball_tree', 'kd_tree', 'brute' (a brute-force search), and 'auto' (chooses the best option based on the data).
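As a sketch of Step #3, here is a minimal NearestNeighbors example on placeholder feature vectors; in practice the vectors would come from the pre-trained network in Step #2, and the dimensions here are illustrative:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Placeholder feature vectors; in practice these come from a pre-trained
# network with its final classification layer removed (Step #2).
rng = np.random.default_rng(0)
features = rng.random((100, 512))   # 100 images, 512-dim feature vectors

# 'brute' computes all pairwise distances; 'ball_tree'/'kd_tree' build
# spatial indexes; 'auto' picks an algorithm based on the data.
nn = NearestNeighbors(n_neighbors=5, algorithm="auto", metric="euclidean")
nn.fit(features)

query = features[0:1]               # query with the first image's vector
distances, indices = nn.kneighbors(query)
print(indices[0])                   # indices of the 5 most similar images
```

Since the query vector is itself in the index, its own index comes back as the closest match (distance 0), which is a quick sanity check.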

Results 




Happy Learning!!!

January 22, 2020

Seat and Location based Pricing in Bus :)






  • Single Sleeper price 750 (Better Privacy / No Sharing)
  • Second-row single berth 700 (Can Exit Early)
  • Middle row single berth 650 (Reduced Privileges compared to above two)

  • Similar to Flight Seat Pricing :)

    Keep Thinking!!!

    January 20, 2020

    Catching up with Machine Learning for Experienced Folks

    There are a lot of tutorials on the web; leverage anything that works for you.
    The sequence I recommend is:
    • Stats and Maths for Data Science (Stanford / Other Youtube classes)
    • Applications in Each Domain (Find applicable use cases for your domain)
    • Python (Youtube / Python Programming sites) – Data loading, analysis, join, filter etc…
    • ML Models (Linear, Logistic, Decision Trees, Random Forests) - Code up in Python with datasets
    • Classification vs Regression - Code up in Python
    • Experiment Labelling, handling categorical data etc…
    • Take up one Udemy course after completing the list above (discounted ~$20 courses are often available, e.g. in December)
    Keep Coding!!!

    Quick ride prospects in 2020

    Quick Ride will take a share of the rental-cab market in 2020. Below are compelling reasons for Quick Ride's success:
    1. Quick Ride is affordable (for me it's a 100/- Quick Ride vs a 400/- rental-cab booking, 1/4th of the cost)
    2. Quick Ride timing is predictable (delays and timings are predictable; predictability is much higher than on rental-cab booking platforms)
    3. Quick Ride has recurring booking options (from rides we make friends)
    4. Quick Ride connects you with like-minded tech folks; I found friends to mentor me and learned from their experiences
    5. No peak charges (your BP stays in control), unlike rental-cab booking platforms
    6. No cancellation charges
    7. More of a service and less of a business (an innovative approach that understands the market)
    8. Verified users, most of whom are fellow IT workers in Bangalore
    9. Risk is minimal (no drunken driving or overcrowded vehicles on Quick Ride), and vehicles are well maintained since they are owner-managed
    10. Uber Pool and Ola Share failed miserably; Quick Ride instead created a market
    11. Cons - some people turn Quick Ride into a business, with targets to make money
    12. 25% of Ola/Uber's market share in Bangalore will be taken up by Quick Ride in 2020 (my prediction)
    13. Quick Ride needs to expand to other regions/countries too
    All the best Quick Ride team, Happy QuickRide!!!


    January 18, 2020

    January 17, 2020

    Jobs that would disappear before 2030 with AI

    • Security Jobs - We may not see security guards in Hotels, Apartments. The proportion would come down drastically. AI for Security, Surveillance real-time alert - Record and Playback will be gone soon. Now we have Edge Analytics which can act smarter, proactive. 
    • Automated Tolls - Manual toll operators would no longer be needed for vehicle details entry. Everything would be automated: detection, deduction, and traffic regulation
    • Maids - House cleaning robots, Pet management robots, Kids Monitoring robots, Elderly assistance with robots
    • Basic Health Data Analysis - X-Ray Diagnosis, Report Diagnosis would be done by the system based on parameters observed. They might directly convey their findings
    • Customer Service - All the customer service BPO jobs would be replaced by Chatbots, Massive jobs cuts would happen in Data Entry, Customer Service Space
    • Automated Loans Approval - Manual validations would be replaced with ML models which can provide recommendations for Loan approval
    • Drivers - As autonomous-truck technology matures, the number of highway truck-driver jobs would proportionately come down
    • Vehicle Repair Mechanics - With more software-powered, renewable energy powered vehicles, we may go away from traditional fuel-powered vehicles
    • Cooks / House Keeping jobs - A lot of tools will come in place to assist automated cooking, automated vessels cleaning
    • Shopper Associates - AI will power restocking, self-checkout facilities 
    A lot of these jobs will disappear and a lot of people would end up jobless. How we upskill, reskill, and balance the economy matters.

    Keep Thinking!!!

    January 16, 2020

    My Favorite Retail Ideas in NRF 2020

    1. Smart Mirror

    2. Scan and go


    3. Vision Based Inventory Tracking


    4. Walmart AI Store Vision Powered


    5. Mobile based checkout

    6. Automated Checkout


    7. Shelf Edge Camera



    Retail in 2025

    • Every Retail Store is an Ecommerce Store
    • Every Offline Retail Store is a Fulfillment Centre
    • Every Offline Retail Store is a Warehouse
    • Every Offline Retail Store is a Returns Centre
    • Every Store will have Sensors, AR Experience, Personalized Experience
    • Specialty Stores, Private labels will be key
    • A lot of mobile-driven interfaces to search/try/buy

    Happy Learning!!!

    January 14, 2020

    Day #319 - Data Story - Datavision - AI Use cases


    Happy Learning!!!

    Day #318 - Image Comparison Techniques

    An image is a set of points (vectors). We could use different techniques for image similarity comparison. Listed below are some out-of-the-box techniques. Dlib-based comparison was discussed earlier.
    • Haar, Extract Region
    Comparison Techniques
    • Euclidean distance
    • Cosine distance
    • Hamming distance
    • Jaccard-Needham dissimilarity
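A minimal sketch of these comparison metrics using scipy on toy vectors (the vectors stand in for flattened image features; values are illustrative):

```python
import numpy as np
from scipy.spatial import distance

# Two toy binary-ish feature vectors standing in for image descriptors.
a = np.array([1.0, 0.0, 1.0, 1.0])
b = np.array([1.0, 1.0, 0.0, 1.0])

print(distance.euclidean(a, b))  # straight-line distance
print(distance.cosine(a, b))     # 1 - cosine similarity (angle-based)
print(distance.hamming(a, b))    # fraction of positions that differ
print(distance.jaccard(a, b))    # Jaccard-Needham dissimilarity on boolean vectors
```

Lower values mean more similar images; cosine distance ignores vector magnitude, which often suits deep-network features.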


    Ref - Link

    SSIM Approach - the Structural Similarity Index technique
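As a rough illustration of the SSIM idea, below is a simplified global (single-window) SSIM in plain numpy; real implementations such as skimage.metrics.structural_similarity average this formula over local windows:

```python
import numpy as np

def ssim_global(x, y, L=255.0):
    """Simplified single-window SSIM over whole images.
    Production code should use a windowed implementation instead."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2   # standard stability constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / (
        (mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

img = np.random.default_rng(1).integers(0, 256, (32, 32)).astype(float)
print(ssim_global(img, img))          # identical images -> 1.0
print(ssim_global(img, 255 - img))    # inverted image -> much lower score
```

SSIM compares luminance, contrast, and structure rather than raw pixel differences, so it tracks perceived similarity better than plain Euclidean distance on pixels.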
    Happy Learning!!!

    January 13, 2020

    Day #317 - Ensemble Methods

    Summary of Ensemble Techniques, Bagging, Boosting code snippets
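A minimal sketch of the bagging vs boosting snippets, using scikit-learn on a built-in dataset (the dataset and hyperparameters are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Bagging: many trees fit on bootstrap samples, predictions aggregated
bag = BaggingClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

# Boosting: estimators trained sequentially, each focusing on prior errors
boost = AdaBoostClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

print("Bagging accuracy :", bag.score(X_te, y_te))
print("Boosting accuracy:", boost.score(X_te, y_te))
```

Bagging mainly reduces variance; boosting mainly reduces bias — which is why the two are usually discussed together.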

    Happy Learning!!!

    Day #316 - SVM Classification Examples
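A minimal SVM classification sketch with scikit-learn (the iris dataset and the kernel/parameter choices here are just illustrative):

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Toy run on the iris dataset; swap in your own features/labels.
X, y = datasets.load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# RBF-kernel SVM: C controls margin softness, gamma the kernel width
clf = svm.SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_tr, y_tr)
print("Test accuracy:", clf.score(X_te, y_te))
```

For linearly separable data, kernel="linear" is cheaper and often just as accurate.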


    Happy Learning!!!

    Day #315 - ML Notes - Regression

    • L1 (Lasso) can shrink some coefficients to zero
    • L2 (Ridge) shrinks all the coefficients by the same proportion but eliminates none. Because L2 squares the coefficients, it punishes large values much more than small values.
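A quick sketch of the difference: on synthetic data where only a few features matter (the data and alpha values are illustrative), Lasso zeroes out coefficients while Ridge only shrinks them:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first 3 features matter; the other 7 are pure noise.
y = X[:, 0] * 3 + X[:, 1] * 2 + X[:, 2] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)   # L1: drives some coefficients to exactly 0
ridge = Ridge(alpha=0.5).fit(X, y)   # L2: shrinks all coefficients, removes none

print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
```

This is why L1 is often used for feature selection and L2 for plain shrinkage.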

    I am bad at reading. I skip content and focus only on what I am trying to solve. I am going to go through the scikit-learn documentation and try all the code snippets.
    Happy Learning!!!

    Concepts - WeightofEvidence, Information Value

    While checking on FinTech ML projects I came across two concepts: Weight of Evidence and Information Value. I found this link intuitive and easy to understand.

    Basically, when we bucketize, within each bucket range we can in turn sub-divide the other factors based on distribution. In a retail scenario:

    Customer age groups (20-30, 30-40, 40-50). Within each bucket, we can find the percentage of fraudulent customers. It may be:

    20-30 - 4% fraudulent
    30-40 - 2.5%
    40-50 - 1%

    This technique helps to assign possible values and decide their impact. This is my understanding. We can also infer the same based on data analysis and distribution percentages across different classes.
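A sketch of the Weight of Evidence / Information Value computation on hypothetical counts roughly matching the fraud percentages above (the bucket totals are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical counts per age bucket: total customers and fraudulent ones,
# approximating the 4% / 2.5% / 1% example above.
df = pd.DataFrame({
    "bucket": ["20-30", "30-40", "40-50"],
    "total":  [1000, 1000, 1000],
    "fraud":  [40, 25, 10],
})
df["good"] = df["total"] - df["fraud"]

pct_good = df["good"] / df["good"].sum()   # share of non-fraud per bucket
pct_bad = df["fraud"] / df["fraud"].sum()  # share of fraud per bucket

df["WoE"] = np.log(pct_good / pct_bad)          # Weight of Evidence per bucket
iv = ((pct_good - pct_bad) * df["WoE"]).sum()   # Information Value of the feature
print(df[["bucket", "WoE"]])
print("IV:", round(iv, 3))
```

Negative WoE flags buckets with a higher-than-average fraud share; IV summarizes how predictive the whole feature is (rules of thumb treat roughly 0.1-0.3 as medium strength).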

    More Reads - Link1, Link2, Link3, Link4

    Happy Learning!!!

    Data-flow -> Knowledge-flow -> Future Prospects


    • Data to Datalake
    • Datalake to Collective DataInsights
    • DataInsights to Features
    • Features to Models
    • Models to Predictions
    • Predictions to Preparedness

    Data -> Insights -> Predictions

    Happy Learning!!!

    January 11, 2020

    Decode your personality - Social Media Unplugged

    • Google - knows everything about you: regular search, restricted search, places visited
    • Gmail - your friends and their locations, your communications, and the type of person you are
    • Facebook - your social circle, your financial status from where you live and work
    • LinkedIn - your average compensation can be estimated from your education, company, and years of experience
    • WhatsApp - all your mood swings, emotional discussions, and your relationship with the outside world
    • Mobile Number - your average calls, the number of contacts among friends/blood relations
    • Bank Account - spending patterns, places you visit, average expenses across food/shopping/travel
    • Uber - where you traveled, your pattern for the last month
    We do not have ownership of the data we share. There is no expiry date for the data collected. This data is good enough to decode a person.

    Google Sued for Secretly Amassing Vast Trove of User Data
    • Consumer browsing history
    • Web activity data 
    • Invasion of privacy and violations
    • Storing geolocation data with its mobile apps
    Today I attended an Al Jazeera discussion on the Cambridge Analytica scandal (April 8th)
    Founder - Brittany Kaiser of OwnYourData
    • 87 million Facebook profiles were screened
    • Data science to classify/label possible persuaders
    • Leverage all their behavioral data
    • Identify their interests (climate change, national security, refugee issues) based on their Facebook groups/feeds
    • Targeted ads to convert them into positive voters / compromise integrity / privacy
    Sounds scary :( :( Manipulating data



    Tech Talk Link

    What does Swiggy know about you?


    Single vs Family, Brand Centric, More than what I could think of :(

    What a Better Social Network Would Look Like
    • Make social networks nonprofits
    • Ban algorithmic amplification
    • Restrict personal data collection and behavioral advertising
    • Let a field of smaller social networks bloom
    • Stop putting white men in charge
    New Age Social Media - My Perspective
    Social media has to be more realistic and reflect our lives. Our daily thoughts need to be known only by our first-level circle of immediate relationships. Our social thoughts, or tagged social content, should be visible to the second-level circle. Education and professional matters need to be reflected in the connected circles.
    Everything needs to stay within limits to avoid information overload. Today data is business, and connections are the business value of the individual. In the long run, both businesses and consumers will lose value. Companies focused on rapid market share without ethical values will end up creating zombies rather than responsible citizens.

    Awful AI Projects - Link

    Keep Thinking!!!

    January 10, 2020

    Model Documentation and Coding Guidelines - Python

    This paper was very useful. This covers Data Source, Purpose, Model Accuracy, Recommendations. The key metrics (Screenshot from the paper)


    Structuring Machine Learning Projects

    ML Experiment Parameters

    • Model Parameters
    • Learning Rate
    • Number of Epochs Run
    • Training Loss
    • Validation Loss
    • CPU %
    • Memory %
    • Disk usage




    Happy Learning!!!

    January 09, 2020

    Day #314 - Dynamic Taxing using AI

    Flat tax slabs and income-tax rates are the procedure followed today. With the digital economy, extensively collected/monitored data, and new metrics, we need to consider dynamic taxing based on several parameters. These parameters need to be chosen from a short-term/long-term perspective. Depending upon industry prospects, growth, and economic factors, these values can be adjusted for predicting/recommending tax. We need to collect several parameters to predict taxing numbers.

    Some of the parameters we can leverage are
    • Manpower / Natural resources
    • Renewable energy sources
    • Direct skilled employment
    • Indirect employment 
    • Contribution to Innovation
    • Technology adoption / Sharing for partners
    • Contribution for long term growth / Longevity of the business /company
    • Contribution to Education / R & D / IP
    • Greenhouse impact
    • Sector score
    • Taxing based on domain/industry
    • Profit margins, Balance Sheets, Supplier Balance Sheets / Company global profit margins
    • % of revenue saved with Automation / Robots
    • % of materials sourced/imported
    • Export value / quantity of items / demand 
    • Measure and change dynamically after every quarter
    Data needs to drive the decisions. We need to be more dynamic about tax by understanding demand, market conditions, growth prospects, ecology, and economic impact. To develop sustainable growth, we need to focus on both short-term and long-term benefits.






    Ref - Link

    Keep Thinking!!!

    January 08, 2020

    Day #313 - Data science use cases solved in Indian Startups

    I have personally trained/connected with several Indian startups to understand their AI use cases with respect to the domains/businesses they operate in. Some of the key use cases I observed are:
    • Fintech - OCR, banking statements, documents data extraction (Computer Vision)
    • Textile - Similarity of items, thickness  (Computer Vision)
    • B2B - Shop verification, re-identification  (Computer Vision), Item Level Forecasting (Data)
    • Agritech - Ripening, Fruit quality assessment (Computer Vision)
    • Retail - Device failure predictions, IOT based predictions (Data)
    • SalesData - Sales Analytics, Cross-selling, Upselling (Data)
    Kaggle was one approach to learning. I always start from domain -> data -> use cases. This learning helped me map AI challenges/applications across domains.

    This is the trend I observed in the past few years. I am hoping to train/connect more startups across other domains. Feel free to connect with me for any AI training requirements/discussions.

    Happy Learning!!!

    January 07, 2020

    Data Insights for HelloFresh

    Menu Insights
    • Peak sellers
    • Weekday trends
    • Top trends based on seasonality
    • Review based listings
    Customer Insights
    • Top customers
    • Age Groups
    • By Gender
    • Average revenue per customer by age group
    • Recurring customer patterns
    • Food quality issues / incidents / patterns
    Demand Insights
    • Locality vs Demand
    • Maximum Ordered Items
    • Peak times
    • Lean times
    • Weekend patterns
    • Weekday patterns
    • Peak hour trends
    Delivery Insights
    • Transportation cost / time
    • Maximum Traffic Delay Areas
    • Incidents / Damages
    Basic ML Use cases
    • Forecast on volumes of items based on historical data
    • Segmenting customers based on Age / Gender / Veg / Non-Veg / Cuisine Choices and providing recommendations
    • Forecast Order Volumes and assign Delivery partners based on Projected numbers to reduce other delays
    • Recommending a similar item every day from other restaurants based on historical data
    • Balanced diet customized to need /preferences based on user choices for a week
    Happy Learning!!!

    Day #312 - AI for hospitals - Children's Hospitals

    Medical is a very interesting and niche domain, and I haven't yet connected with hospitals. The process is again similar to other areas: Data Collection -> Insights -> BI -> AI. Below is a high-level overview you can consider.

    Data Collection
    • Overview of Data Collected from Mothers, Newborns
    • Overview of Symptoms / Medications
    • Overview of Sequence of Medications / Side-Effects
    • Positive / Negative Cases
    • Finance / Insurance Related Insights
    Analytics Insights (BI)
    • Most Observed Issues
    • Rarely Observed Issues
    • Trend of Admissions / Patterns across Cities
    • Financial Insights / Insurance Related
    • Correlation to Past Medical History to Complications
    AI Use Cases
    • Patient historical data-based risk predictions
    • Monitoring for new-borns and proactive alerts
    • Tie-up with AI Companies for post pregnancy monitoring and alert of Kids (https://www.loveys.io/)
    • Use the Same Tech (https://www.loveys.io/) to monitor patients
    • Feedback / Sentiment Analysis from Clients
    Happy Learning!!!

    January 06, 2020

    Day #311 - Key Notes for Airbnb ML talks

    ML Use cases
    • Search Ranking 
    • Smart Pricing (Demand Vs Supply)
    • Fraud Detection (Risk Scoring)
    Link1 - Forecasting Uncertainty at Airbnb - Theresa Johnson (Airbnb)

    Insights will tell you how it works; model the business based on the data flow and decisions involved.

    See the problems as both
    • Demand Problem
    • Supply Problem
    Loved the pool cleaner theory
    • Demand Problem - Population, Household count, Number of cleanings
    • Supply Problem - Number of cleaners, Available hours
    Link2 - ML Airbnb

    Key Lessons
    An ML tool for customized prices for each night instead of flat rates - the classic combination of demand, supply, and peak pricing.

    Link3 - Bighead: Airbnb's end-to-end Machine Learning Platform | Airbnb

    Design Goals
    • Seamless (Easy to prototype)
    • Versatile (All framework support)
    • Consistent Environments
    Tools - Redspot, BigHead, MLAutomator, Deep Thought

    Slides - Link

    Advanced ML Use Cases
    • Categorizing Listings
    • Experience Ranking
    • Object Detection
    • Customer Service Ticket Routing
    Happy Learning!!!

    January 01, 2020

    Day #310 - Handle Data Imbalance, Missing Data

    This post is about
    • Handling blank values
    • Handling missing values
    • Handling data imbalance
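A minimal sketch of these steps with pandas (the toy DataFrame, median-fill, and naive random oversampling are illustrative choices; libraries such as imbalanced-learn offer smarter options like SMOTE):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 40, 35, np.nan, 30],
    "income": [50000, 60000, np.nan, 52000, 58000, 61000],
    "label": [0, 0, 0, 0, 0, 1],     # heavily imbalanced target
})

# Handling blank/missing values: fill numeric columns with the median
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())

# Handling data imbalance: naive random oversampling of the minority class
minority = df[df["label"] == 1]
majority = df[df["label"] == 0]
oversampled = minority.sample(len(majority), replace=True, random_state=42)
balanced = pd.concat([majority, oversampled])
print(balanced["label"].value_counts())  # classes now equal in size
```

Median fill is robust to outliers compared with mean fill; oversampling should be done only on the training split to avoid leaking duplicates into the test set.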

    My request to readers: if you find these code snippets, blogs, and articles helpful, please share your learning with others. We grow only by learning and teaching.

    Happy Learning!!!

    Data Science Experiment - Milk Adulteration

    Data - Link



    Happy Learning!!!

    Stats Lessons




    Happy Learning!!!

    NOSQL Internals and Design Practices

    Objective – The objective of this paper is to analyze NoSQL internals from an RDBMS developer's perspective and provide design guidelines for NoSQL applications.
    Analysis
    RDBMS – RDBMS came into the picture to ensure the ACID properties are maintained and there is a single version of the truth. RDBMS plays a critical role in OLTP applications (banking, finance, and payments domains).
    Database design – The database design is normalized to avoid data redundancy. Primary keys and indexes are created so that query plans use the indexes to filter the required rows and fetch results within the shortest intervals.

    Query Execution – Data is typically stored in a B-Tree format, organized physically in the form of clustered indexes. This is why a search based on the primary key is quick compared to any non-indexed column. The database engine further optimizes the execution plan by leveraging indexes, statistics, partitioning, and non-clustered indexes. Depending on the query, join operators and sort operators are applied to produce the execution plan. The execution plan is reused if it already exists in memory.
    This paper was very useful to understand OLTP Internals. Reposting notes from my blog post
    • WAL – Changes are written in log and committed to disk when the checkpoint is reached
    • Buffer Manager – cache for data fetched / recently used
    • Two-Phase locking – Optimistic/pessimistic locking depending on isolation levels
    • Concurrency control – Based on isolation levels
    NoSQL Databases 
    Similar to the OLTP aspects above, there are a few papers that describe designing NoSQL apps for read-heavy/write-heavy workloads. This paper was very useful for understanding the NoSQL perspective of designing apps on columnar databases.

    For Heavy Writes
    • Tall Skinny Tables
    • Consolidate data into single columns
    For Heavy Reads
    • Fewer column families
    • Use bloom filters
    There are multiple NoSQL databases (Key-Value, Document-based, Columnar Databases, etc...). 

    Happy Learning!!!

    Day #309 - Handle Categorical Columns

    Have a Great, Peaceful and Successful 2020
    This post is on Handling Categorical Columns
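A minimal sketch of two common ways to handle categorical columns (the toy data is illustrative): one-hot encoding with pandas and label encoding with scikit-learn:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"city": ["Chennai", "Bangalore", "Chennai", "Mumbai"],
                   "size": ["S", "M", "L", "M"]})

# One-hot encoding: one binary column per category (no implied order)
onehot = pd.get_dummies(df, columns=["city"])

# Label encoding: map each category to an integer (fine for tree models,
# but it imposes an artificial order that linear models may misread)
le = LabelEncoder()
df["size_encoded"] = le.fit_transform(df["size"])
print(onehot.columns.tolist())
print(df["size_encoded"].tolist())
```

One-hot suits linear models and low-cardinality columns; label encoding keeps dimensionality down for tree-based models.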
    Happy Learning!!!