"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

April 30, 2019

Data Story Behind Food Delivery Apps

I use food delivery apps heavily, both Swiggy and UberEats. Here are my views and reflections on the data story, measures, and machine learning use cases behind these applications.

Based on my usage, UberEats highlights the following activities from the historical data it collects:
  • Previously ordered restaurants
  • Previously ordered items
  • Review-based listings
  • Projected estimated delivery times
I personally face challenges when trying to shift to a low-calorie diet, since the recommendations are tuned to past orders:
  • A similar item is recommended every day from other restaurants based on historical data
  • No option to set preferences for the coming week - a balanced diet customized to needs / preferences based on user choices for a week
  • Food quality issues persist no matter how good the review ratings are
I have worked on real-time systems and reporting, and have since moved to AI. We now have the tools to query data in motion, historical data, and future forecasts. This provides a complete end-to-end perspective for understanding the data and the numbers. Some of the metrics / measures below overlap across transactional data, historical data, and AI.

Key Metrics / Measures
  • Average Order delivery time at different times (Morning / Lunch / Evening / Holidays / Weekends)
  • Average order pickup time at different times
  • Order acceptance rate
  • Clicks/ conversions 
  • A/B experiment and conversions 
  • Payments type vs orders
  • Average menu browsing time
  • Frequently searched items across days / restaurants / seasons
  • Predict order delays using Traffic data
  • Peak sellers
  • Top customers 
  • Weekday trends
  • Top trends based on seasonality 
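
Most of these metrics are slices of an order fact table over time. A minimal pandas sketch for the first one - average delivery time by daypart - where the file and column names are assumptions for illustration:

import pandas as pd

# orders.csv is a hypothetical export with one row per order;
# the column names below are placeholders, not a real schema.
orders = pd.read_csv("orders.csv", parse_dates=["order_time", "delivered_time"])
orders["delivery_minutes"] = (orders["delivered_time"] - orders["order_time"]).dt.total_seconds() / 60

# Bucket each order into a daypart, then average delivery time per bucket
dayparts = pd.cut(orders["order_time"].dt.hour,
                  bins=[0, 6, 11, 16, 21, 24],
                  labels=["Night", "Morning", "Lunch", "Evening", "Dinner"],
                  right=False)
print(orders.groupby(dayparts)["delivery_minutes"].mean())
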
Data science use cases
  • Forecast on volumes of items based on historical data 
  • OCR, Recommendation at User Level / Sold together items
  • Deep learning for automated food classification, tagging
  • Segmenting customers based on Age / Gender / Veg / Non-Veg / Cuisine choices and providing recommendations
  • Forecast order volumes and assign delivery partners based on projected numbers to reduce delays
Everything that is measurable can be managed, monitored, and improved. More quality signals need to be integrated, since we take a risk in trusting ratings as a proxy for quality. I hope the quality bar keeps improving and this story evolves into a version that customizes around personal diet plans and choices. Happy finding the data story behind these food delivery apps!!!

April 28, 2019

Day #245 - Exploratory Data Analysis Goals

  • Find insights, turn them into "why" questions
  • Seek surprises (sudden peaks, lows), harness them into "how" questions
  • Plot data across different dimensions (Month / Year / Sales); find insights in every perspective
Learn the story behind the numbers!!!

Happy Mastering DL!!!

Innovation Session Notes

Today was interesting. I attended an innovation session - very inspiring and motivational. Thanks, Akshay Cherian.
  • Communicating to brain
  • Communicating with emotion
Different types of Questions
  • Know How
  • Know When
  • Know What
  • Know Why
Great Learnings / Habits
  • Listening at different levels
  • Insights learnt
  • Find Insights, Turn it into question
  • Seek surprises, harness them
  • Insights + States of Flow loop on each other
  • We perform when the work keeps us excited, not overwhelmed
  • Creativity comes from the presence of constraints
  • Create disproportionate value
  • Solve by doing, Comfortable with failure
  • It will hurt a bit if you are doing something meaningful
Books
Geography of Genius
States of Flow Assessment
Steps
  • List, Cluster, Reorder
  • Start Challenge vs Idea
  • Turn Top barriers into Questions
  • Rate Idea Valuable / Simple
  • Turn Ideas into Steps
  • Review Ideas
  • Create and Share output
Fresher Tips
  • Work for free until they see value
Challenges / Problems / Opportunities
  • Redefinition of problem 
  • Think from Celebration
  • Pissed off is better than passion
  • Treat it as a game
  • Breakthrough tools
  • Breakthrough environment
  • Don't define the problem with a single word
  • Meeting for Evolution not Evaluation of ideas
  • Reduce perceived Risk of Sharing
Futuristic
  • Don't operate from crisis to crisis
  • Anticipate and prepare for next-gen risks
  • Plan a budget for time and money
  • Manage time and invest time differently
  • Learn a combination of skills
Key Lessons
  • Just as we pay monthly bills, how much have you budgeted your time and money for learning, upskilling, or coding? This question felt like a slap in the face.
  • If someone is in a role you want to be in, look at their previous role and see what they did to move into it
  • If someone is successful, see the patterns and practices they follow; don't put it down to luck or motivation. Emulate them
Sharing my Earlier Work

Why it fails


Happy Learning!!!

April 27, 2019

Day #244 - Data Annotation Guidelines

  • Quality of Images and capturing significant traits like styles / shapes / colors
  • Object to annotate captured from nearest possible view (Best Possible Angle)
  • Consider the impact of poor background light / night shots and images taken from too far away. Discard low-quality images and tiny renditions of objects (these occur at the edges of images)
  • Handling partial objects
  • When annotating multiple classes, account for the class imbalance between them and fix it before training. Analyze the number of objects and their occurrence distribution to balance sampling (see the sketch after these guidelines)
  • Check how the dataset is impacted by daylight / night conditions and annotate / train / build the model accordingly
Guideline - The object being trained on should occupy the center of the frame, captured from the nearest and best possible view, so that the side / front or a reasonably good set of features like color / style is visible.
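
A minimal sketch for the class-imbalance check above, assuming labelImg-style Pascal VOC XML annotation files in a folder (the path is a placeholder):

import glob
import xml.etree.ElementTree as ET
from collections import Counter

# Count object occurrences per class across all annotation files.
# 'annotations/' is a placeholder; VOC XMLs store each box under
# an <object> tag with a <name> child.
counts = Counter()
for path in glob.glob("annotations/*.xml"):
    root = ET.parse(path).getroot()
    for obj in root.findall("object"):
        counts[obj.find("name").text] += 1

for label, n in counts.most_common():
    print(label, n)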

Good, quality data is as important as the model / approach we take.

Happy Mastering DL!!!

April 26, 2019

Day #243 - Retail Analytics Opportunities

Notes from Recently Attended Session
Instore Retail
  • Store Managers
  • Associates usage
  • Online / Offline Users
Customer Experience
  • Optimize layout
  • Improve promotion effectiveness
  • Shopper Journey Outcome
Store Performance
  • Segmentation 
  • Forecasting
Store Managers
  • People Person
  • Genuine interest in customers, supervising, nurturing ideas
  • People Skills, Sales Skills, Management Skills
Store Manager
  • Stocking
  • Delegate Activity
  • Customer Service
  • Sales projections
  • Readiness
As Required
  • Forecasting
  • Staffing
  • Hiring
  • Associate Performance
Challenges
  • Difficult Customers
  • Personal issues
  • Customer Expectations
  • Not get yelled at
Sale
  • Conversation that ends in a transaction
  • Know better, focus on training people
  • Product knowledge
Apple Selling Philosophy
  • A - Approach (Welcome Approach)
  • P - Probe - Needs
  • P - Provide Solutions
  • L - Listen Concerns / Issues
  • E - End with farewell / Invitation
Power Hours - highest-sales times / peak hours
  • Sales Split by hour
  • Sales Split by Week
  • Labour vs Traffic Approach
  • % of sales in that hour
Types of Retailers
  • Malls - Retailers - Pantaloon / Shoppers Shop
  • Individual Stores - Kirana Stores
  • Online Sales - Amazon / Flipkart
  • Chain of Stores - Pothys / Chennai Silks (Have own sourcing units) - Brand Conscious
Power Centers
  • Maximum Value Proposition
  • Competitive prices
  • Volume & competitive price
Factors to Setup Stores
  • Demographics
  • Income levels
  • Spending group
  • Frequency of spending
  • Family Area
  • Single People
  • Average basket size
  • Average shopper duration time
  • Cultural aspects
Models / Recommendations
  • Model for income group
  • Model for domain
  • Model for age group
Happy Retailing!!!

April 20, 2019

Artificial Intelligence (AI) Podcast - Lex Fridman

Excellent talk with deep technical conversation, principles, and thought process. Here are some of the questions and interesting lines I liked from the podcast:
  • AI Assisted driving for a safer and better world
  • Dream of Autopilot - Autonomy revolution
  • Design Choices - Instrument Cluster, Display, Sensor suites
  • Display - Health check on the vehicle's perception of reality
  • Inputs - Camera, Radar, Ultrasonics, GPS
  • Information rendered into vector space with lane lines, traffic lights
  • Vector space re-rendered on display for people to understand the system
  • Considered parts / uncertainties - road segmentation, vehicle detection, object detection, and other underlying techniques
  • Debug Views - Augmented Vision with boxes, labels, Visualizer vector space representation from all sensors
  • Technical Aspects, Neural Network, Data, Hardware to allocate resources
  • Data - Vast amounts (12 ultrasonic sensors, GPS, IMU), 400K cars on the road
  • The massive inflow of data
  • Full self-driving computer development in progress
  • Cameras at FULL frame rate, FULL Resolution 
  • Driving - Learn from Edge Cases
  • Autopilot disengagements - Aspects / Ideas
  • Take over for convenience / Optimal spline for traversing the intersection
  • Navigate complex intersection
  • Lane change based /freeway/highway interchange 
  • Automatically overtake slow cars
  • Exit freeway
  • Full Self Driving Computer in Production
  • Tesla is an appreciating asset
  • Navigate Parking Lots
  • Metric (Incidents per mile)
  • Assess the probability of a crash, injury, permanent injury, death
  • Video of faces/body
  • Moving from Elevator support to Automatic Elevator
  • Body Pose, Cognitive Load
  • Camera-based driver monitoring
  • Once the system is more reliable than a human, driver monitoring won't help much
  • Operational Design Domain
  • Instrument Cluster Display, Capabilities
  • Neural Net - Basic Bunch of Matrix Math
  • Learn both on valid and invalid data
  • What is a car
  • What is definitely not a car
  • Key ideas for Artificial General Intelligence
  • Tesla Goal - World's best Self Driving Vehicle
  • AI will convince you to fall in love with it

Happy Mastering DL!!!

April 16, 2019

Day #242 - Working with labelImg

Download labelImg from link

Key things 

  • Open Directory, Create Bounding Box, Label the object, Save the bounding box
  • XML will be generated with the coordinates
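
A minimal sketch of reading one generated XML back in Python - labelImg writes Pascal VOC format; the filename below is a placeholder:

import xml.etree.ElementTree as ET

# labelImg saves one Pascal VOC XML per image; 'image1.xml' is a placeholder.
root = ET.parse("image1.xml").getroot()
for obj in root.findall("object"):
    name = obj.find("name").text
    box = obj.find("bndbox")
    xmin, ymin = int(box.find("xmin").text), int(box.find("ymin").text)
    xmax, ymax = int(box.find("xmax").text), int(box.find("ymax").text)
    print(name, (xmin, ymin, xmax, ymax))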




Happy Mastering DL!!!

Day #241 - Tensorflow on CPU - Object Detection


Tensorflow on CPU
===================
Follow all steps in previous article,
#CUDA is not needed
Step 2 - Cleanup Tensorflow
=============================
#Had to manually go to the folder and remove all packages named tf, tensorflow, tensorboard
#C:\Users\XXXXXX\AppData\Local\Continuum\anaconda3\envs\tflow\Lib\site-packages
pip install --ignore-installed --upgrade tensorflow
conda install jupyter
conda install scipy
Step 3 - Custom Training
=========================
Go to https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
Download http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz
C:\Tensorflow1\models\research\object_detection\faster_rcnn_inception_v2_coco_2018_01_28
Download code from https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
Extract to C:\Tensorflow1\models\research\object_detection\TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10-master
Replace in C:\Tensorflow1\models\research\object_detection
Delete file in C:\Tensorflow1\models\research\object_detection\training, C:\Tensorflow1\models\research\object_detection\inference_graph, C:\Tensorflow1\models\research\object_detection\images (test and training label xls files)
Step#4 - Command
==================
cd C:\Tensorflow1\models\research
protoc --python_out=. .\object_detection\protos\anchor_generator.proto .\object_detection\protos\argmax_matcher.proto .\object_detection\protos\bipartite_matcher.proto .\object_detection\protos\box_coder.proto .\object_detection\protos\box_predictor.proto .\object_detection\protos\eval.proto .\object_detection\protos\faster_rcnn.proto .\object_detection\protos\faster_rcnn_box_coder.proto .\object_detection\protos\grid_anchor_generator.proto .\object_detection\protos\hyperparams.proto .\object_detection\protos\image_resizer.proto .\object_detection\protos\input_reader.proto .\object_detection\protos\losses.proto .\object_detection\protos\matcher.proto .\object_detection\protos\mean_stddev_box_coder.proto .\object_detection\protos\model.proto .\object_detection\protos\optimizer.proto .\object_detection\protos\pipeline.proto .\object_detection\protos\post_processing.proto .\object_detection\protos\preprocessor.proto .\object_detection\protos\region_similarity_calculator.proto .\object_detection\protos\square_box_coder.proto .\object_detection\protos\ssd.proto .\object_detection\protos\ssd_anchor_generator.proto .\object_detection\protos\string_int_label_map.proto .\object_detection\protos\train.proto .\object_detection\protos\keypoint_box_coder.proto .\object_detection\protos\multiscale_anchor_generator.proto .\object_detection\protos\graph_rewriter.proto
python setup.py build
python setup.py install
cd C:\Tensorflow1\models\research\object_detection
jupyter notebook object_detection_tutorial.ipynb




Finally, I was able to train custom object detection.



Notes on Custom Object Detection (Notes - Link)
Step #1 - Define inputs - specify files in TFRecord file format
Step #2 - Configure train_config. Key values are:

  • Model parameter initialization.
  • Input preprocessing.
  • SGD parameters.

Step #3 - fine_tune_checkpoint should provide a path to a pre-existing checkpoint. To speed up training, it is recommended to re-use the feature extractor parameters from a pre-existing image classification or object detection checkpoint
Step #4 - SGD - hyperparameters for gradient descent
Step #5 - Evaluator config
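
For reference, a minimal sketch of the train_config block covered by Steps #2-#4, in the same pipeline format as the eval_config below. The paths and values are placeholders, not the exact tutorial config:

train_config: {
  batch_size: 1
  # Step #3 - re-use feature extractor weights from the downloaded checkpoint
  fine_tune_checkpoint: "C:/Tensorflow1/models/research/object_detection/faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt"
  # Step #4 - SGD hyperparameters
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
        }
      }
      momentum_optimizer_value: 0.9
    }
  }
  num_steps: 200000
}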

To get reasonable mAP@IoU scores for object detection API:

1. Try varying the Intersection over Union (IoU) threshold, e.g. 0.2-0.5, and see if you get an increase in average precision. You would have to modify the matching_iou_threshold parameter in object_detection/utils/object_detection_evaluation.py

2. Try different evaluator classes (the default one is EVAL_DEFAULT_METRIC = 'pascal_voc_detection_metrics'). If you are training on Open Image Dataset it makes sense to use open_images_V2_detection_metrics

3. Check your eval config file and increase the number of examples used in the evaluation set, e.g.

eval_config: {
  num_examples: 20000
  num_visualizations: 16
  min_score_threshold: 0.2
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 1
}

4. Train the object detector for more iterations
5. Check current mAP against reported metrics (e.g. COCO mAP@IoU=0.5)

Step by Step: Build Your Custom Real-Time Object Detector  - Link
Detectron2: Train an Instance Segmentation Model
Installing the Tensorflow Object Detection API

For custom object training, BMW has shared their open-source framework. It is a packaged version of the complete object detection setup (a good set of tools for YOLO / TensorFlow).


I haven't experimented with it yet; it looks like a good way to leverage the setup as a common tool. It was released a few months back, and I have been working in my Windows setup for a while.

Happy Mastering DL!!!

April 15, 2019

Day #240 - Setting up Tensorflow GPU on Windows 10

This post is based on the session link. I have made a few changes and updates to it.

Reference links - https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md
Step #1 Environment Creation
==============================
conda create -n tensorflowgpu python=3.6.3 anaconda
Step #2 - Activation
=====================
activate tensorflowgpu
Deactivation
==============
deactivate tensorflowgpu
Getting rid of it
===================
conda remove -n tensorflowgpu --all
conda env remove --name tensorflowgpu
conda info --envs
Step 3 - Package Setup
========================
pip install Cython
pip install contextlib2
pip install pillow
pip install lxml
pip install jupyter
pip install jupyter notebook
pip install matplotlib
pip install protobuf
pip install pycocotools
conda install git
pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
pip install pandas
pip install opencv-python
pip install jupyter
conda install spyder
conda install tensorflow-gpu
pip install tensorboard==1.12.2
Step 4 - CUDA Setup
=====================
Install Visual Studio required components
https://docs.openvinotoolkit.org/latest/_docs_install_guides_installing_openvino_windows.html
Reinstall NSight for Visual Studio 2017
Step 5 - Model Setup
=======================
Download https://github.com/tensorflow/models and unzip to C:\Tensorflow1\models
cd C:\Tensorflow1\models\research
Step 6 - Setting up Object Detection
=======================================
for /f %i in ('dir /b object_detection\protos\*.proto') do protoc object_detection\protos\%i --python_out=.
Step 7 - Setting Config Paths
==================================
SET PYTHONPATH=C:\Tensorflow1\models\research\slim;C:\Tensorflow1\models;C:\Tensorflow1\models\research
SET PATH=%PATH%;%PYTHONPATH%
ECHO %PATH%
Command
cd C:\Tensorflow1\models\research
Execute Command
protoc --python_out=. .\object_detection\protos\anchor_generator.proto .\object_detection\protos\argmax_matcher.proto .\object_detection\protos\bipartite_matcher.proto .\object_detection\protos\box_coder.proto .\object_detection\protos\box_predictor.proto .\object_detection\protos\eval.proto .\object_detection\protos\faster_rcnn.proto .\object_detection\protos\faster_rcnn_box_coder.proto .\object_detection\protos\grid_anchor_generator.proto .\object_detection\protos\hyperparams.proto .\object_detection\protos\image_resizer.proto .\object_detection\protos\input_reader.proto .\object_detection\protos\losses.proto .\object_detection\protos\matcher.proto .\object_detection\protos\mean_stddev_box_coder.proto .\object_detection\protos\model.proto .\object_detection\protos\optimizer.proto .\object_detection\protos\pipeline.proto .\object_detection\protos\post_processing.proto .\object_detection\protos\preprocessor.proto .\object_detection\protos\region_similarity_calculator.proto .\object_detection\protos\square_box_coder.proto .\object_detection\protos\ssd.proto .\object_detection\protos\ssd_anchor_generator.proto .\object_detection\protos\string_int_label_map.proto .\object_detection\protos\train.proto .\object_detection\protos\keypoint_box_coder.proto .\object_detection\protos\multiscale_anchor_generator.proto .\object_detection\protos\graph_rewriter.proto
Step 8 - Demo
==============
cd C:\Tensorflow1\models\research\object_detection
jupyter notebook object_detection_tutorial.ipynb



Happy Mastering DL!!!

Day #239 - Home Depot Retail Data Science Cases

Key Lessons
  • 45% of online orders are picked up from stores
  • Data Science for better search, recommendation, personalization
  • Product search - similar product search with images
  • Personalization - bought together, sold together
  • Weather, Seasonality, Trends
  • Segmentation by product, division
  • Crowd behavior in-store (retail store-level analytics)
  • Relevancy of Search Engine

Happy Mastering DL!!!

April 12, 2019

Day #238 - Working with coco dataset

COCO - Common Objects in Context

Installation Steps https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md

Packages
pip install Cython
pip install contextlib2
pip install matplotlib
pip install pycocotools
pip install scikit-image
pip install --upgrade scikit-image

Demo Code (Minor Changes)
#https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoDemo.ipynb
#Windows
#Python 3+ Environment
import matplotlib.pyplot as plt
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import numpy as np
from skimage import io
anntype = ['segm','bbox','keypoints']
anntype = anntype[1]
prefix = 'person_keypoints' if anntype =='keypoints' else 'instances'
print('Demo for %s'%anntype)
#Download Annotations from http://images.cocodataset.org/annotations/annotations_trainval2014.zip
#Download to location C:\datadir
#Initialize coco ground truth API
datadir = r'C:\datadir\annotations_trainval2014'
datatype = 'val2014'
annfile = '%s/annotations/%s_%s.json'%(datadir,prefix,datatype)
coco = COCO(annfile)
#display categories
cats = coco.loadCats(coco.getCatIds())
nms = [cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(' '.join(nms)))
nms = [cat['supercategory'] for cat in cats]
print('COCO super categories: \n{}\n'.format(' '.join(nms)))
catids = coco.getCatIds(catNms=['person','dog','skateboard'])
imgids = coco.getImgIds(catIds=catids)
imgids = coco.getImgIds(imgIds=[324158])
img = coco.loadImgs(imgids[np.random.randint(0,len(imgids))])[0]
I = io.imread(img['coco_url'])
plt.axis('off')
plt.imshow(I)
plt.show()
plt.imshow(I)
plt.axis('off')
annids = coco.getAnnIds(imgIds=img['id'],catIds=catids,iscrowd=None)
anns = coco.loadAnns(annids)
coco.showAnns(anns)
plt.show()



Happy Mastering DL!!!

April 11, 2019

Day #237 - Working on Car Detection - OpenVino

Detailed steps are mentioned in https://github.com/intel-iot-devkit/smart-video-workshop/tree/master/object-detection
Part I
========
sudo mkdir -p /opt/intel/workshop/
sudo chown ubuntu.ubuntu -R /opt/intel/workshop/
cd /opt/intel/workshop/
git clone https://github.com/intel-iot-devkit/smart-video-workshop.git
export CV=/opt/intel/workshop/smart-video-workshop/
Part II
========
mkdir -p mobilenet-ssd/FP32
cd /opt/intel/computer_vision_sdk_2018.5.455/deployment_tools/model_optimizer
cd /opt/intel/workshop/smart-video-workshop/object-detection
sudo ./downloader.py --name mobilenet-ssd
cd /opt/intel/computer_vision_sdk/deployment_tools/model_optimizer/
sudo python3 mo.py --input_model /opt/intel/computer_vision_sdk_2018.5.455/deployment_tools/model_downloader/object_detection/common/mobilenet-ssd/caffe/mobilenet-ssd.caffemodel -o /opt/intel/workshop/smart-video-workshop/object-detection --scale 256 --mean_values [127,127,127]
cd /opt/intel/workshop/smart-video-workshop/object-detection
sudo apt-get install libgflags-dev
/home/ubuntu/code/open_model_zoo-master/demos/build/intel64/Release/lib
eval echo "~$ubuntu"
/opt/intel/computer_vision_sdk/inference_engine
Part III
=========
#Edit MakeFile Contents
all:
g++ -fPIE -O3 -o tutorial1 --std=c++11 main.cpp -I. \
-I/opt/intel/openvino/opencv/include/ \
-I/opt/intel/computer_vision_sdk/inference_engine/include/ \
-I/opt/intel/computer_vision_sdk/inference_engine/include/cpp \
-L/opt/intel/computer_vision_sdk/inference_engine/lib/intel64 -linference_engine -ldl -lpthread \
-L/opt/intel/openvino/computer_vision_sdk/opencv/lib -lopencv_core -lopencv_imgcodecs -lopencv_imgproc -lopencv_highgui -lopencv_videoio -lopencv_video -lgflags -I/opt/intel/computer_vision_sdk/inference_engine/include -I/opt/intel/computer_vision_sdk/inference_engine/samples/ -I./ -I/opt/intel/computer_vision_sdk/inference_engine/samples/common/format_reader/ -I/opt/intel/openvino/computer_vision_sdk/opencv/include -I/usr/local/include -I/opt/intel/computer_vision_sdk/inference_engine/samples/thirdparty/gflags/include -I/opt/intel/computer_vision_sdk/opencv/include -I/opt/intel/computer_vision_sdk/opencv/include/cpp -I/opt/intel/computer_vision_sdk/inference_engine/samples/extension -L/opt/intel/computer_vision_sdk/inference_engine/bin/intel64/Release/lib -L/opt/intel/computer_vision_sdk/inference_engine/lib/ubuntu_16.04/intel64 -L/opt/intel/workshop/smart-video-workshop/object-detection/lib -L/opt/intel/computer_vision_sdk/opencv/lib -ldl -linference_engine -lopencv_highgui -lopencv_core -lopencv_imgproc -lopencv_videoio -lopencv_imgcodecs -L/opt/intel/workshop/smart-video-workshop/object-detection/lib
make
./tutorial1 -i /home/ubuntu/code/Cars.mp4 -m /opt/intel/workshop/smart-video-workshop/object-detection/mobilenet-ssd.xml





Happy Mastering DL!!!

April 09, 2019

Day #236 - Papers on Person Re-Identification

Paper #1 - Camera Style Adaptation for Person Re-identification

Key Lessons
  • Person re-identification - given a query person, retrieve that person from multiple camera sources
  • Challenges - Resolution, Environment, Illumination
  • Camera Style Adaptation Approach - unsupervised, camera-invariant property
Techniques
  • Input image pairs are partitioned into three overlapping horizontal parts, which pass through a Siamese CNN model that learns their similarity using cosine distance
Paper #2 - Simple Online and Realtime Tracking with a Deep Association Metric
Techniques
  • Kalman filtering in image space and frame by frame
  • Kalman filter with constant velocity motion
Paper #3 - In Defense of the Triplet Loss for Person Re-Identification
Techniques
  • A plain CNN with a triplet loss 
Triplet Loss
Key Lessons
  • Look at the anchor, its distance to a positive example, and its distance to a negative example
  • 3 images at a time: Anchor, Positive, Negative
  • A / P / N triplets
  • d(A,P) + alpha <= d(A,N) - set a margin alpha to separate positive / negative pairs
  • L(A,P,N) = max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0)
  • Choosing triplets randomly
  • Map the training set into triplets
Example - Link1, Link2
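
A minimal PyTorch sketch of the triplet loss above (batch size, embedding size, and margin are illustrative):

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, alpha=0.2):
    # L(A,P,N) = max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0)
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + alpha).mean()

# Illustrative embeddings: batch of 4, embedding size 128
a, p, n = torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128)
print(triplet_loss(a, p, n))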



Happy Mastering DL!!!

Day #235 - PyTorch developer conference part 1

Session #1 - Engineering Practices for Software 2.0
Key Lessons
  • New Programming Paradigm for Neural Networks
  • SGD writes code in weights of neural network
  • Tune Dataset, Tune model architecture, Tune the optimization
  • NN in Tesla for Autopilot
Best Practices for 2.0 Stack
  • Test Driven Development Workflow - Test set manually created, clean, Carefully curated test set
  • CI Workflow - Automate build - Unit Tests - Automate Deployment
  • Dataset is part of code - Automate Neural Network Training Jobs - Compile into Weights - Automate Deployments
  • Timestamp your data
  • Mono-repos in practice
Session #2 - Applied Deep Learning
Key Lessons
  • Many Research Projects use PyTorch
  • Pytorch - Simple, Extensible, Fast
Projects
  • Deep Learning SuperSampling - New GPU, Realtime better graphics
  • NN for super resolution
  • DL for real time graphics
  • Inpainting. http://research.nvidia.com/inpainting
  • Image and Video Synthesis - https://github.com/NVIDIA/vid2vid, Create videos with temporal consistency
  • Frame prediction, Optical flow, Historical data, Predict Sampling Kernel
  • Wavenet - Model for generating audio samples
  • Pytorch extension Apex for mix precision training
Session #3 - NLP Transfer Learning
Key Lessons
  • Making more general NLP Systems
  • Related tasks tend to help each other
  • Decanlp.com
NLP Projects
  • Question Answering
  • Machine Translation
  • Summarization
  • Sentiment Classification
  • Semantic Role Labeling
  • Semantic Parsing
  • Commonsense Reasoning
Techniques
  • Transfer Learning
  • Weight Sharing
  • Zero Shot Learning
  • Data Augmentation
  • Domain Adaptation
  • Multi-task learning
Approach
  • Seq2seq model
  • Classification, Extraction, Generation
  • Domain Adaptation
  • Some ZeroShot
Session #4 - Deep Universal Probabilistic Programming
Key Lessons
  • Pyro - Probabilistic Programming Language
  • Modern Bayesian ML methods
  • NN for modelling and inference
  • Universal, Scalable, Flexible and minimal
  • 3-Layer Architecture with a Probabilistic Programming interface
  • Inference Algo on top of library
  • Stochastic Variational Inference 
To be continued from 00:55:00 for the rest of the session


Happy Mastering DL!!!

April 06, 2019

How I evaluate data science candidates

  • Different business problems solved and their ML lessons learned; deep dive on implementation, algorithms used, features evaluated
  • Data pipeline setup and challenges faced
  • How do you keep track of new papers, and how do you evaluate and learn different frameworks?
  • How much do you code on a daily basis for work / personal learning?
  • Ability to bring different perspectives / techniques to solving problems
The field is evolving on a daily basis. We need passionate, curious learners with an experimentation mindset!!!

April 05, 2019

Day #236 - Save a Keras model in Tensorflow pb format

This project was useful for conversion from Keras to Tensorflow pb format

Command
python keras_to_tensorflow.py --input_model="path/to/keras/model.h5" --output_model="path/to/save/model.pb"

Example
python keras_to_tensorflow.py --input_model="D:\\classification_3.h5" --output_model="D:\\classification_3.model.pb"
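
To sanity-check the exported .pb, here is a minimal loading sketch using the TF 1.x graph API (the path reuses the example above):

import tensorflow as tf

# Load the frozen graph and list a few of its operations (TF 1.x API)
with tf.gfile.GFile("D:\\classification_3.model.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")
    for op in graph.get_operations()[:5]:
        print(op.name)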

Happy Learning!!!

2.0 Lifestyle Skills

To survive, we need a newer set of skills and better self-awareness:
  • Building Culture of Learning
  • Training and Experimenting Mindset
  • Emotional and Communication Skills
  • Fail and Learn Mindset
  • Balance Attitude dealing with Depression, Life Struggles
Happy Finding Yourself!!!

Finding Great Candidates

  • Communicate at the simplest level
  • Create end to end experiments than certifications
  • Rely on passion, Consistent learning and good team players
  • Look for people who intend to make a change, consistent performance matters 
  • Move beyond puzzles and programs; a project or prototype that requires reasonable design, code, use cases, and an end-to-end implementation matters
  • Puzzles and programs can find a good coder, but end-to-end projects require more skills than just coding
  • People who share what they learn can drive a change in culture more than people who work in silos
  • Great skills take years; being passionate about technology and watching how it evolves matters
Happy Learning!!!

April 04, 2019

Day #235 - Audio Analysis

#pip install librosa
#pip install python_speech_features
#librosa with python_speech_analysis
#Credits - https://github.com/librosa/librosa/issues/573
import librosa
import python_speech_features
from scipy.signal.windows import hann
n_mfcc = 13
n_mels = 40
n_fft = 512 # in librosa, win_length is assumed to be equal to n_fft implicitly
hop_length = 160
fmin = 0
fmax = None
#https://librosa.github.io/librosa/generated/librosa.feature.mfcc.html
# y - Audio Time series
# sr - Sampling Rate
y, sr = librosa.load(r'E:\Audio_Analytics\test_data\1_street_music.wav')
#sr = 16000 # fake sample rate just to make the point
# librosa
#n_mfcc: int > 0 [scalar], number of MFCCs to return
mfcc_librosa = librosa.feature.mfcc(y=y, sr=sr, n_fft=n_fft,
                                    n_mfcc=n_mfcc, n_mels=n_mels,
                                    hop_length=hop_length,
                                    fmin=fmin, fmax=fmax)
#https://python-speech-features.readthedocs.io/en/latest/
# python_speech_features
# no preemph nor ceplifter in librosa, so setting to zero
# librosa default stft window is hann
#winlen – the length of the analysis window in seconds. Default is 0.025s (25 milliseconds)
#winstep – the step between successive windows in seconds. Default is 0.01s (10 milliseconds)
#nfilt – the number of filters in the filterbank, default 26.
#Returns: A numpy array of size (NUMFRAMES by numcep) containing features. Each row holds 1 feature vector.
mfcc_speech = python_speech_features.mfcc(signal=y, samplerate=sr, winlen=n_fft / sr, winstep=hop_length / sr,
                                          numcep=n_mfcc, nfilt=n_mels, nfft=n_fft, lowfreq=fmin, highfreq=fmax,
                                          preemph=0, ceplifter=0, appendEnergy=False, winfunc=hann)
print(list(mfcc_librosa[:, 0]))
print(list(mfcc_speech[0, :]))

Happy Mastering DL!!!

April 03, 2019

Day #234 - NLP with Deep Learning | Winter 2019 | Lecture 1

Key Lessons
  • Get better in finding words that make them feel less alone
  • Writing is the ability to communicate knowledge; knowledge can be sent to other places and times
  • Writing is 5000 years old
  • Meaning - Expression for Idea, Art, Writing
  • Use NLTK for Synonyms and Hypernyms
  • Wordnet fine distinction between senses of word
  • Words represented as one hot vectors
  • Building word similarities tables to map to similar words
  • Dense Vector - Word Embeddings Representation
Word2vec

  • Framework for learning word vectors
  • Every word is represented by a vector
  • c - center word, o - outside (context) word
  • Calculate the probability of outside words given the center word
  • Similarity between words is captured by the dot product (the orange part in the lecture slides)
  • Exp turns any positive or negative score into a positive number
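
A minimal numpy sketch of the word2vec probability, P(o|c) = exp(u_o . v_c) / sum_w exp(u_w . v_c), with random placeholder vectors:

import numpy as np

np.random.seed(0)
U = np.random.normal(size=(5, 8))  # outside (context) vectors u_w for a toy 5-word vocabulary
v_c = np.random.normal(size=8)     # center word vector v_c

# Softmax over the vocabulary: exp makes every score positive,
# and normalizing turns the scores into probabilities
scores = U.dot(v_c)
probs = np.exp(scores) / np.exp(scores).sum()
print(probs, probs.sum())  # the probabilities sum to 1
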
Maths
  • Calculus Chain Rule
  • Vector Dot product
  • Multivariate calculus


Happy Mastering DL!!!

April 02, 2019

Day #233 - Tensorflow 2.0 notes

Summary of Notes
  • Adopted Keras for high level API, tf.keras
  • Common pieces - layers, models, optimizers
  • Keras - Pythonic and easy to learn
  • For larger-scale data, Estimators are used - for fault tolerance
  • Estimators are powerful machinery; all Estimators moved to Keras
  • 1.0 - Sessions; 2.0 - no Session, eager mode
  • Graphs exist even in an eager context
  • Eager execution is a way to train a Keras model without building a graph
  • One set of optimizers, fully serializable
  • Losses consolidated into a single set
  • RNN layers updated in Tensorflow - unified RNN layers
  • Tensorboard for performance profiling, model performance
  • tf.distribute.Strategy API - designed to handle many distribution architectures (multi-GPU)
To Update
pip install -q tensorflow==2.0.0-alpha0
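
A minimal tf.keras sketch in the eager style described above (toy data, purely illustrative):

import numpy as np
import tensorflow as tf

# Toy regression data: the target is the sum of the 4 inputs
x = np.random.rand(100, 4).astype("float32")
y = x.sum(axis=1, keepdims=True)

# Keras as the high-level API: layers, model, optimizer in one place
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss="mse")
model.fit(x, y, epochs=5, verbose=0)
print(model.predict(x[:2]))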



Happy Mastering DL!!!

Day #233 - Pytorch Examples

#Credits - https://github.com/hunkim/PyTorchZeroToAll/blob/master/05_linear_regression.py
import torch
from torch.autograd import Variable
x_data = Variable(torch.Tensor([[1.0],[2.0],[3.0]]))
y_data = Variable(torch.Tensor([[2.0],[4.0],[6.0]]))
class Model(torch.nn.Module):
    def __init__(self):
        # Constructor
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)  # One in and one out

    def forward(self, x):
        # Variable of input data in, Variable of predictions out
        y_pred = self.linear(x)
        return y_pred
#Initialize model
model = Model()
#Loss Function and Optimizer
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(),lr=0.01)
total_loss = 0
#Training Loop
for epoch in range(500):
    y_pred = model(x_data)
    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())
    # Zero gradients, backpropagate, update weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
#After training
hour_var = Variable(torch.Tensor([[4.0]]))
y_pred = model(hour_var)
print('Predict after training',4,model(hour_var).data[0][0])
#Credits - https://www.youtube.com/watch?v=wbJJudn-Xn0&list=PLX5lD3sNR32CELTjbVRNMUCUakEO1Lu0H&index=2
#https://github.com/hunkim/PyTorchZeroToAll/blob/master/10_1_cnn_mnist.py
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.autograd import Variable
import torch.optim as optim
#Load our dataset
train_dataset = datasets.MNIST(root=r'C:\Intel\Data',train=True,transform=transforms.ToTensor(),download=True)
test_dataset = datasets.MNIST(root=r'C:\Intel\Data',train=False,transform=transforms.ToTensor(),download=True)
batch_size=100
epochs = 10
#Dataset Iterable
train_load = torch.utils.data.DataLoader(dataset = train_dataset, batch_size = batch_size, shuffle = True)
test_load = torch.utils.data.DataLoader(dataset = test_dataset, batch_size = batch_size, shuffle = False)
print(len(train_dataset))
print(len(test_dataset))
print(len(train_load))
print(len(test_load))
#model class
class NET(nn.Module):
    def __init__(self):
        super(NET, self).__init__()
        # First layer
        # Grey - one channel
        # Same padding: input size = output size
        # Same padding = (filter - 1) / 2
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, stride=1, padding=1)
        self.batchnorm1 = nn.BatchNorm2d(8)
        # Relu
        self.relu = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
        # After max pool, feature map 28/2 = 14
        self.cnn2 = nn.Conv2d(in_channels=8, out_channels=32, kernel_size=5, stride=1, padding=2)
        # Output remains 14
        self.batchnorm2 = nn.BatchNorm2d(32)
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        # Feature map = 14/2 = 7
        # 32*7*7 = 1568
        # (Input + output) / 2
        # Arbitrary to choose
        self.fc1 = nn.Linear(in_features=1568, out_features=600)
        # Randomly disables some neurons
        # Probability of dropout 0.5
        self.dropout = nn.Dropout(p=0.5)
        self.fc2 = nn.Linear(in_features=600, out_features=10)

    def forward(self, x):
        out = self.cnn1(x)
        out = self.batchnorm1(out)
        out = self.relu(out)
        out = self.maxpool1(out)
        out = self.cnn2(out)
        out = self.batchnorm2(out)
        out = self.relu(out)
        out = self.maxpool2(out)
        # (batch size, 1568)
        # 100 x 1568
        out = out.view(-1, 1568)
        out = self.fc1(out)
        out = self.relu(out)
        out = self.dropout(out)
        out = self.fc2(out)
        return out
model = NET()
print(model)
loss_fn = nn.CrossEntropyLoss()
#Optimizers
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
iteration = 0
correct_nodata = 0
correct_data = 0
#Run for One iteration and check
for i, (inputs, labels) in enumerate(train_load):
    if iteration == 1:
        break
    inputs = Variable(inputs)
    labels = Variable(labels)
    print("for one iteration, this is what happens:")
    print('Input Shape:', inputs.shape)
    print('Labels Shape:', labels.shape)
    output = model(inputs)
    print('Outputs Shape:', output.shape)
    _, predicted_nodata = torch.max(output, 1)
    print('predicted shape', predicted_nodata.shape)
    print('predicted tensor', predicted_nodata)
    correct_nodata += (predicted_nodata == labels).sum()
    print('correct predictions', correct_nodata)
    _, predicted_data = torch.max(output, 1)
    print('predicted shape', predicted_data.shape)
    correct_data += (predicted_data == labels).sum()
    print('predicted tensor', predicted_data)
    print('correct predictions', correct_data)
    iteration += 1
iter = 0
for epoch in range(epochs):
    for i, (images, labels) in enumerate(train_load):
        iter += 1
        images = Variable(images)
        labels = Variable(labels)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        # Test the model every 100 iterations
        if (i + 1) % 100 == 0:
            correct = 0
            total = 0
            for images, labels in test_load:
                images = Variable(images)
                output = model(images)
                # Use the test output here, not the stale training output
                _, predicted = torch.max(output.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum()
            print('total', total)
            print('correct', correct)
            accuracy = 100. * float(correct) / float(total)
            # print('iteration:{}, train loss: {}, test accuracy: {}%'.format(iter, loss.item(), accuracy))
            print('Iteration')
            print(iter)
            print('Loss')
            print(loss.item())
            print('Accuracy')
            print(accuracy)
print('Done!')
Happy Mastering DL!!!

Day #232 - Kafka + Spark Integration - Big Data Setup - Part I

Experimenting with Kafka and Spark using Pyspark

Example 1 - Kafka Publish - Consume
#KAFKA Producer
#Function Definition
from kafka import KafkaConsumer, KafkaProducer
def connect_kafka_producer():
    _producer = None
    try:
        _producer = KafkaProducer(bootstrap_servers=['ip-XX-XX-XX-XX:9092'], api_version=(0, 10))
    except Exception as ex:
        print('Exception while connecting Kafka')
        print(str(ex))
    finally:
        return _producer
#Kafka Publish message function
#Function Definition
def publish_message(producer_instance, topic_name, key, value):
    try:
        key_bytes = bytes(key, encoding='utf-8')
        value_bytes = bytes(value, encoding='utf-8')
        producer_instance.send(topic_name, key=key_bytes, value=value_bytes)
        producer_instance.flush()
        print('Cab Booking Request published successfully.')
    except Exception as ex:
        print('Exception in publishing message')
        print(str(ex))
#Producer and send messages for a topic
#Function Invocation
kafka_producer = connect_kafka_producer()
for i in range(1, 10):
    for j in range(1, 10):
        print(i)
        print(j)
        message = str(i) + ',' + str(j)
        print(message)
        publish_message(kafka_producer, 'cab_request', 'UberGo', message)
#Read messages from topic
from kafka import KafkaConsumer
consumer = KafkaConsumer(bootstrap_servers='ip-XX-XX-XX-XX:9092', auto_offset_reset='earliest')
consumer.subscribe(['cab_request'])
print(consumer.partitions_for_topic('cab_request'))
for message in consumer:
    print(message)
Example 2 - Kafka Publish - Spark Consume
#KAFKA Producer
#Function Definition
from kafka import KafkaConsumer, KafkaProducer
def connect_kafka_producer():
    _producer = None
    try:
        _producer = KafkaProducer(bootstrap_servers=['ip-XX-XX-XX-XX:9092'], api_version=(0, 10))
    except Exception as ex:
        print('Exception while connecting Kafka')
        print(str(ex))
    finally:
        return _producer
#Kafka Publish message function
#Function Definition
def publish_message(producer_instance, topic_name, key, value):
    try:
        key_bytes = bytes(key, encoding='utf-8')
        value_bytes = bytes(value, encoding='utf-8')
        producer_instance.send(topic_name, key=key_bytes, value=value_bytes)
        producer_instance.flush()
        print('Cab Booking Request published successfully.')
    except Exception as ex:
        print('Exception in publishing message')
        print(str(ex))
#Producer and send messages for a topic
#Function Invocation
kafka_producer = connect_kafka_producer()
for i in range(1, 10):
    for j in range(1, 10):
        print(i)
        print(j)
        message = 'Publish to spark from kafka' + str(i) + ',' + str(j)
        print(message)
        publish_message(kafka_producer, 'cab_request', 'UberGo', message)
import os
import time
import sys
os.environ['PYSPARK_DRIVER_PYTHON'] = '/usr/bin/python3.6'
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3.6'
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.sql import SparkSession
from pyspark.streaming.kafka import TopicAndPartition
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.1 pyspark-shell'
conf = SparkConf().setAppName("Kafka-Spark")
sc = SparkContext.getOrCreate(conf)
#2 batches
stream=StreamingContext(sc,2)
print(sc.version)
kafkaBrokers = {"metadata.broker.list": "ip-XX-XX-XX-XX:9092"}
topic = "cab_request"
kafkastream = KafkaUtils.createDirectStream(stream, [topic],kafkaBrokers)
lines = kafkastream.map(lambda x: x[0])
messagedata = kafkastream.map(lambda x: x[1])
lines.pprint()
messagedata.pprint()
stream.start()
stream.awaitTermination()

Happy Learning!!!

April 01, 2019

Day #231 - Evaluating Existing Pytorch - ReId - Models

On Ubuntu System
  • Download Market 1501 Dataset - Link
  • Download Code from Link
  • Comment out CUDA references
  • Run the code

Happy Mastering DL!!!