"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

May 30, 2019

Day #255 - Analytics Use Cases

DataRobot has a good list of use cases. On top of it, I wanted to add comments on the ML models and the data.






Ref - Link

Happy Mastering DL!!!

May 27, 2019

Day #254 - Upgrade OpenVino on Linux, Install System Studio

1. Follow steps in link

A few additional custom steps
Rename the existing /opt/intel to /opt/intel_bkp (run from /opt)
mv intel intel_bkp
General form for renaming and removing directories
sudo mv old_dir new_dir_name
sudo rm -r directory_to_remove

File to set path of OpenVino
vi /home/ubuntu/.bashrc

pip install protobuf==3.6.1
pip install test-generator==0.1.1

Download from Link




System Studio Download Link


cd ~/Downloads/
mkdir intel_tools
unzip intel-sq-tools-installation-bundle-linux-linux.zip -d intel_tools

tar -xvzf system_studio.tar.gz

cd system_studio
sudo ./install.sh

5.1GB download

Post Installation - Run this script:
/opt/intel/system_studio_2019/iss_ide_eclipse-launcher.sh

First Project - Link


GC++ Hello World App


Run a Custom Model in Python

Happy Mastering DL!!!

May 22, 2019

Day #253 - My Backprop Notes

I had been looking for this post for some time. Backprop for me in one slide :)


Question #4 - The Bias (5 marks): We generally initialize the bias to random numbers larger than 0. Why? What happens if we initialize it to a value below zero? Does this affect our ability to train?

Answer -

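A bias below zero can stall training. With an activation such as ReLU, a negative bias can make the pre-activation negative, so the unit outputs zero and its local derivative is zero. By the chain rule, that zero multiplies every gradient flowing back through the unit, so the derivatives end up as zeros and the unit stops learning. By assigning small random values above zero, we will have derivatives available and can slowly find a local minimum using the gradient descent approach.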
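A minimal sketch of how a bias below zero can zero out the derivatives through the chain rule. It assumes a single ReLU unit and a squared-error loss, neither of which is stated in the question. The function name `relu_unit_grads` and all the numbers are hypothetical and chosen only for illustration.

```python
def relu_unit_grads(w, b, x, target):
    """One unit: y = relu(w * x + b), loss = (y - target) ** 2.

    Returns the gradients of the loss with respect to w and b.
    """
    z = w * x + b                    # pre-activation
    y = max(0.0, z)                  # ReLU output
    dloss_dy = 2.0 * (y - target)    # derivative of the squared error
    dy_dz = 1.0 if z > 0 else 0.0    # ReLU derivative: zero when the unit is inactive
    dloss_dz = dloss_dy * dy_dz      # chain rule
    return dloss_dz * x, dloss_dz    # (gradient for w, gradient for b)


# Bias below zero: the pre-activation is negative, so both gradients are zero.
grads_negative_bias = relu_unit_grads(w=0.5, b=-2.0, x=1.0, target=1.0)

# Small positive bias: the unit is active and the gradients are non-zero.
grads_positive_bias = relu_unit_grads(w=0.5, b=0.1, x=1.0, target=1.0)

print(grads_negative_bias)
print(grads_positive_bias)
```

With the negative bias, gradient descent has nothing to update for this input. With the small positive bias, both the weight and the bias receive a usable gradient.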

Happy Mastering DL!!!

May 21, 2019

Day #252 - Data Science Skills

Data Science = Database + Insight  + Business Acumen + Feature Engineering  + Build your model + Deploy at Scale
  • Database  = Load, query, aggregate, find max and min, group by different dimensions
  • Insight  = Convert the highs and lows in the numbers into "why" questions that lead to insights
  • Business Acumen  = Finding the right use case that balances data availability and business expectations
  • Feature Engineering  = Convert all your Insight Skills into feature variables
  • Build your model = Build the model and evaluate it
  • Deploy at Scale (Productionization)  = Recommend Spark or whatever scalable option fits to deploy it
These are my personal lessons from working on projects. Look at data science from a big-picture view. Learn and master every skill continuously. Learning never ends.

Happy Mastering Data Science!!!

May 17, 2019

Day #251 - Customer Churn Modelling in Simple terms

Scenario
  • Customer A buys 2 beers and 1 bottle of whiskey every Wednesday
  • If he does not buy on Wednesday, he will come on Friday to pick up 2 beers and 1 bottle of whiskey
  • If the customer is out of town, he will not buy
With ML, let us see
  • If the customer usually comes back within a minimum of 3 days and a maximum of 5 days
  • If the customer does not turn up after 5 days, we need to find the reason why
  • Is there some other shop nearby where the customer prefers to buy?
  • Does the customer buy the same thing online with discounts?
We infer the usual pattern and quickly observe any change in it to see if we are losing the customer. This is churn modelling in simple terms.
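The idea of watching for a change in the visit pattern can be sketched in a few lines. This is only an illustration, assuming we have the purchase dates for one customer. The function names `max_gap_days` and `is_churn_risk` and the sample dates are hypothetical, and a real churn model would use many more signals.

```python
from datetime import date


def max_gap_days(purchase_dates):
    """Largest number of days between consecutive purchases (needs at least two dates)."""
    ordered = sorted(purchase_dates)
    return max((later - earlier).days for earlier, later in zip(ordered, ordered[1:]))


def is_churn_risk(purchase_dates, today):
    """Flag the customer when the current gap is longer than their usual maximum gap."""
    days_since_last = (today - max(purchase_dates)).days
    return days_since_last > max_gap_days(purchase_dates)


# Hypothetical history: the customer returns every 3 to 5 days.
history = [date(2019, 5, 1), date(2019, 5, 4), date(2019, 5, 9), date(2019, 5, 13)]

usual_max_gap = max_gap_days(history)
risk_after_3_days = is_churn_risk(history, today=date(2019, 5, 16))
risk_after_7_days = is_churn_risk(history, today=date(2019, 5, 20))
```

Once a customer is flagged, the follow-up questions from the list above (a nearby shop, online discounts) are where the investigation starts.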

Happy Mastering Data Science!!!

Day #250 - Context and Preferences aware recommendations

It is very important to have a feedback loop to provide better recommendations
Basic Recommendation
  1. Buy Product A
  2. The usual ML recommendation is based on what is bought together (A, B, C) and sold together (A, C)
This is great, and everyone will do it. But if I dislike product A, how do we know?
Feedback based Recommendation
  1. Buy Product A
  2. Read the customer's low rating for A
  3. Look at what people who don't like A bought
  4. Recommend those instead of A
Returns based Recommendation
  1. Customer Buys Product A
  2. Customer returns Product A
  3. Look at what people who don't like A bought
  4. Recommend those instead of A
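The feedback-based and returns-based steps above can be sketched as a simple count. This is a minimal sketch, assuming purchase history is available per customer and that a low rating or a return is recorded as a "dislike". The function name `recommend_instead_of`, the data layout, and the sample data are all hypothetical.

```python
from collections import Counter


def recommend_instead_of(product, purchases, dislikes, top_n=3):
    """Recommend what customers who disliked `product` bought besides it.

    purchases: dict mapping customer -> list of products bought
    dislikes:  dict mapping product -> set of customers who rated it low or returned it
    """
    counts = Counter()
    for customer in dislikes.get(product, set()):
        for item in purchases.get(customer, []):
            if item != product:
                counts[item] += 1
    return [item for item, _ in counts.most_common(top_n)]


# Hypothetical data: c1 and c2 disliked A, c3 did not.
purchases = {
    "c1": ["A", "D"],
    "c2": ["A", "D", "E"],
    "c3": ["A", "B"],
}
dislikes = {"A": {"c1", "c2"}}

alternatives = recommend_instead_of("A", purchases, dislikes)
```

Here D comes first because both customers who disliked A bought it, followed by E. Product B is ignored because it was bought only by a customer who had no complaint about A.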
Season based Recommendation
Consider the customer's preferences and the brands picked up in each season, and factor them into the recommendations

Brand based Recommendations
Consider the customer's affinity for a certain brand because of size, variety, etc.

Happy Building Your Data Story!!!

May 14, 2019

Day #249 - Porting Base faster_rcnn_inception_v2_coco_2018_01_28 Model to OpenVino

  • This model was downloaded from link - http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz
  • Openvino says this model is supported - This link contains models supported. https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_TensorFlow.html#inpage-nav-2-1
  • Execute command in Python Console - 
C:\Intel\openvino_2019.1.133\deployment_tools\model_optimizer> python mo_tf.py --data_type=FP32 --tensorflow_object_detection_api_pipeline_config "C:\faster_rcnn_inception_v2_coco_2018_01_28\pipeline.config" --tensorflow_use_custom_operations_config "C:\Intel\openvino_2019.1.133\deployment_tools\model_optimizer\extensions\front\tf\faster_rcnn_support.json" --input_model "C:\frozen_inference_graph.pb"

Porting to the OpenVino format was successful. However, I did encounter other issues in custom training. I will share my learnings after finding a solution.

Happy Mastering DL!!!

Redirect output to log file Windows

Use > File_to_capture_logs to send console output to a file, and add 2>&1 after it to redirect the error logs to the same file.

Example
Test.bat > E:\Runout_log.txt 2>&1
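The same redirection can be done from Python when the batch file is launched by a script. This is a minimal sketch using the standard `subprocess` module. The helper name `run_and_log` is hypothetical, and a small Python child process stands in for Test.bat so the example runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_log(command, log_path):
    """Run `command` and write stdout and stderr to one log file (like `> log 2>&1`)."""
    with open(log_path, "w") as log_file:
        completed = subprocess.run(command, stdout=log_file, stderr=subprocess.STDOUT)
    return completed.returncode


# Stand-in for Test.bat: a child process that writes to both streams.
child = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]
log_path = Path(tempfile.gettempdir()) / "Runout_log.txt"

return_code = run_and_log(child, log_path)
log_text = log_path.read_text()
```

Passing `stderr=subprocess.STDOUT` plays the role of `2>&1`: the error stream is merged into whatever stdout points at.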

Happy Learning!!!

May 11, 2019

Use AI with Caution

  • AI can predict, but it is no replacement for good customer service
  • Feature engineering means identifying insights from historical data and translating them into meaningful features
  • Data imbalance can be handled by upsampling, downsampling, or data augmentation, but remember that good data plus a basic model can overcome data imbalance
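As an illustration of the upsampling option mentioned above, here is a minimal sketch of random oversampling with only the standard library. The function name `upsample_minority`, the row layout, and the sample data are hypothetical. It simply duplicates minority rows, so it adds no new information to the data.

```python
import random


def upsample_minority(rows, label_key, seed=0):
    """Randomly duplicate rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the largest class size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical imbalanced data: 8 rows of class 0 and 2 rows of class 1.
rows = [{"x": i, "label": 0} for i in range(8)] + [{"x": i, "label": 1} for i in range(2)]
balanced_rows = upsample_minority(rows, label_key="label")
class_counts = {
    label: sum(1 for row in balanced_rows if row["label"] == label) for label in (0, 1)
}
```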
Happy Learning!!!

May 10, 2019

Day #248 - Custom Yolo Detection Lessons on Windows


  1. Two versions of Yolo, darkflow-master vs Darknet. Darknet - written in C and CUDA, Darkflow - YOLO on TensorFlow
  2. Darknet Annotations .txt format
  3. Darkflow Annotations xml format
  4. After custom training, the model raised an error - #IOError: [Errno 2] No such file or directory: 'labels.txt'
  5. To fix it, edit the file D:\darkflow-master\darkflow-master\darkflow\defaults.py and provide the full paths of the file locations for labels and checkpoint
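Since the two versions expect different annotation formats, a small converter can help when moving a dataset between them. The sketch below assumes the xml files follow the Pascal VOC layout and that each Darknet .txt line is `class_id x_center y_center width height`, normalized by the image size. The function name `voc_xml_to_darknet_lines` and the sample annotation are hypothetical.

```python
import xml.etree.ElementTree as ET


def voc_xml_to_darknet_lines(xml_text, class_names):
    """Convert one Pascal VOC style annotation into Darknet style text lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.findall("object"):
        class_id = class_names.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        # Darknet stores the box centre and size as fractions of the image size.
        x_center = (xmin + xmax) / 2.0 / img_w
        y_center = (ymin + ymax) / 2.0 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        lines.append(f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")
    return lines


# Hypothetical annotation: one "gun" box in a 200x100 image.
sample_xml = """
<annotation>
  <size><width>200</width><height>100</height></size>
  <object>
    <name>gun</name>
    <bndbox><xmin>50</xmin><ymin>20</ymin><xmax>150</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>
"""
darknet_lines = voc_xml_to_darknet_lines(sample_xml, class_names=["gun"])
```

The order of `class_names` must match the order in the labels file, because Darknet refers to classes by index.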
How to train YOLOv3 on Google COLAB to detect custom objects (e.g: Gun detection)

Happy Mastering DL!!!

Day #247 - Yolo Compile Error - LINK : fatal error LNK1158: cannot run 'rc.exe'

While trying to build Yolo, I encountered the error - LINK : fatal error LNK1158: cannot run 'rc.exe'

Copying the missing executables and files from C:\Program Files (x86)\Windows Kits\8.1\bin\x86 fixed the issue. Below are the two files


Useful StackOverflow answer
Happy Mastering DL!!!

May 09, 2019

Day #246 - Updating OpenVino Latest Version Install - Windows

Build date - 24 Apr 2019

This was quick as all pre-requisites were done earlier

1. Extract and Install to C:\Intel
2. Open the Anaconda shell and activate the environment
3. Run C:\Intel\openvino_2019.1.133\bin\setupvars.bat
4. Run the install pre-requisites scripts in folder -  C:\Intel\openvino_2019.1.133\deployment_tools\model_optimizer\install_prerequisites
I tried - install_prerequisites.bat, install_prerequisites_tf.bat

5. Goto Demo Folder C:\Intel\openvino_2019.1.133\deployment_tools\demo and Run demo_squeezenet_download_convert_run.bat




Happy Mastering DL!!!

May 06, 2019

AI - Key Papers - Timelines


  • Object detection (Redmon et al, 2015)
  • Image captioning - Neural Image Caption Generation with Visual Attention, 2015
  • Random walks in latent space - (Alex Radford, 2015)
  • Semantic segmentation (Long et al, 2015)
  • Auto-captioning (2015)
  • Autonomous cars (NVIDIA, 2016)
  • Future simulation - (Finn et al, 2016)
  • Neural machine translation - (Google’s Neural Machine Translation System, 2016)
  • Drug design and response prediction (Gomez-Bombarelli et al, 2016)
  • Impersonation by encoding-decoding an unknown face. - (Kamil Czarnogórski, 2016)
  • Image super-resolution - (Ledig et al, 2016)
  • Reinforcement learning (Mnih et al, 2014)
  • Segmentation (Hengshuang et al, 2017)
  • Pose estimation (Cao et al, 2017)
  • Music composition (NVIDIA, 2017)
  • Geometric matching (Rocco et al, 2017)
  • Instance segmentation (He et al, 2017)
  • Scene understanding - (Wu et al, 2017)
  • Transfer learning from synthetic to real images - (Inoue et al, 2017)
  • Strategy games (Deepmind, 2016-2018)
  • Speech synthesis and question answering (Google, 2018)
  • Image generation (Karras et al, 2018)
  • Real-time object detection (Redmon and Farhadi, 2018)
  • Sequence Problems
    • Sequence classification - Sentiment Analysis, Activity Recognition, DNA Sequence classification, action selection
    • Sequence Synthesis - Text Synthesis, Music Synthesis, Motion Synthesis
    • Sequence to Sequence Translation - Speech Recognition, Text Translation, POS tagging
    • Generative models - Image and content generation - DRAW: A Recurrent Neural Network For Image Generation, Pixel RNN, ALI

Happy Mastering DL!!!

Data story of Taxi Booking apps

This data story is based on my personal experience using taxi booking apps. I use both Ola and Uber. Below are some common observations. I have tried to outline the data flow, reporting use cases, and ML use cases involved, based on my understanding and usage.

Observations using App
  • On booking cab request we can see
  • Vehicle type and expected time
  • SLA to reach the destination 
  • Real-time message processing, notifying, accepting and notifying rider and driver partner (stream processing, segmentation, notification, acceptances)
  • Display stats of driver during the trip
Pain points Observed
  • You book a trip at price x. The trip gets canceled by the driver. Now when you book again, peak pricing is applied
  • In my personal experience, drivers are more comfortable with cash payments
  • Mileage-conscious drivers are reluctant to switch on the AC
  • Drivers are target driven. I have spoken to driver partners who drive non-stop for 24 hours to meet targets
  • I never had a great experience using cab pooling. It took me 2x the time in most cases, unless it was an odd hour
Data collected 
  • Trip details
  • Passenger details
  • Fare details
  • Ratings of driver and passenger 
  • Cab bookings at each location point
  • Longest routes and the locations with the most bookings
  • Peak booking times across airports, bus stops, and railway stations
  • Driver partner ride and earning details 
  • Data available at the city level, Area level (Slide / Dice)
  • Review / Rating / Feedback on Cancellation
Data at customer Level
  • Trip Details by Each Customer. Expenditure at the customer level
  • Since location is shared they can identify Office, Home, Restaurants, Malls, Airports, Railway stations 
ML use cases for customers / Booking
  • Segment people using services based on trip distance, number of trips, trip expense
  • Classify people in terms of potential weekend travelers, shopper, Stay at home person 
  • Recommend areas for peak pricing
  • Recommend timing for peak pricing
  • Recommend peak pricing with the highest probability of conversion (A/B testing)
  • Predict top 10 cab pickup points and order numbers considering historical data seasonality
  • Predict customer churn
  • Promotions based on segmenting customers (High Value, Medium, Low Spending Customers)
  • A lot of scope for vision apps and audio-based analytics: classic drowsiness detection, distraction, use of the mobile phone (custom object detection models)
  • NLP on Customer feedback / Sentiment Analysis
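The segmentation use cases above can start from something as simple as spend thresholds before moving to clustering. This is a minimal sketch, assuming each trip record carries a customer id and a fare. The function name `segment_by_spend`, the cutoffs, and the sample trips are hypothetical.

```python
def segment_by_spend(trips, high_cutoff, low_cutoff):
    """Total the fare per customer and label each as High Value, Medium, or Low Spending."""
    total_spend = {}
    for trip in trips:
        total_spend[trip["customer"]] = total_spend.get(trip["customer"], 0.0) + trip["fare"]
    segments = {}
    for customer, spend in total_spend.items():
        if spend >= high_cutoff:
            segments[customer] = "High Value"
        elif spend >= low_cutoff:
            segments[customer] = "Medium"
        else:
            segments[customer] = "Low Spending"
    return segments


# Hypothetical trips for three customers.
trips = [
    {"customer": "c1", "fare": 900.0},
    {"customer": "c1", "fare": 1200.0},
    {"customer": "c2", "fare": 600.0},
    {"customer": "c3", "fare": 150.0},
]
segments = segment_by_spend(trips, high_cutoff=2000.0, low_cutoff=500.0)
```

The resulting labels can then drive the promotions for each segment. Trip distance and number of trips could be added as further dimensions in the same way.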
ML use cases for driver partner
  • Predict driver churn
  • Predict the number of trips for next week and set target accordingly 
  • Predict the nearest area where the probability of a booking is higher for the driver partner
  • Predict Acceptance Rate for a Route based on Driver preferences derived from historical data
Promotions
  • Promotions and recommendations for eateries
  • Promotion for a pass for customers 
Data collected from the vehicle (If it is fitted with sensors to collect data) - Car Manufacturers and Ride Sharing App Partnerships - 'Data Access' to understand
  • Access to data which can be used to build predictive models, deep learning models for training Autonomous driving decisions
  • Real-time data pipeline for sensors, devices, software, vision data for building models customized for Indian Conditions
  • Access to Components Utilization patterns for different vehicles running in different Regions / State
  • All this data will help in building Connected Cars, Training better models for better Data-Driven Decisions
  • Driving conditions vs vehicle performance in those road conditions
Other Factors / Emerging Competitors

Quick Ride has come up, which shares the same space as ride-sharing apps but serves a different segment. Quick Ride is more economical and predictable, with recurring rides.

Customers and driver partners would have an Android-based smartphone. Google has all the information needed to offer a cab-sharing app as a kind of social platform. If Google monetizes the sharing of traffic and congestion details, it will also bring significant revenue for the provider.

Autonomous vehicles - Robo taxis are a distant dream for our country. If such a thing happens, I am worried about alternate careers for driver partners. Change is the only permanent thing that never changes

Updated May 28/2020






I have tried to outline certain data stories I observed while using taxi booking apps. Your comments and feedback are welcome!!!