"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

June 30, 2020

A Survey of Deep Learning Techniques for Autonomous Driving

Key Summary
  • Data Sources - Cameras, radars, LiDARs, ultrasonic sensors, GPS units and/or inertial sensors
AI for different components
  • Perception and Localization - Segmentation
  • High-Level Path Planning  - Road Ahead / Turns 
  • Behavior Arbitration, or low-level path planning - Steering
  • Motion Controllers

Recurrent Neural Networks (RNN) are especially good at processing temporal sequence data, such as text or video streams
Long Short-Term Memory (LSTM) [17] networks are non-linear function approximators for estimating temporal dependencies in sequence data
Reinforcement Learning using the Partially Observable Markov Decision Process (POMDP) formalism
  • Perception and Localization - Segmentation
  • Object detection and recognition, semantic segmentation
  • Tesla® tries to leverage its camera systems, whereas Waymo’s driving technology relies more on LiDAR sensors
  • Waymo - 5 Radars, 8 Cameras
  • Tesla - 8 Cameras, 12 ultrasonic sensors, one forward-facing radar
Driving Scene Understanding
  • Single-stage detectors do not match the accuracy of two-stage detectors but are significantly faster.
  • The main disadvantage of using a LiDAR in the sensory suite of a self-driving car is primarily its cost
Semantic and Instance Segmentation
  • Drivable area, pedestrians, traffic participants, buildings
  • SegNet, AdapNet, Mask R-CNN
Localization
  • Localization algorithms aim at calculating the pose (position and orientation) 
  • The structure of the environment can be mapped incrementally with the computation of the camera pose - Simultaneous Localization and Mapping (SLAM)
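
To make the idea of a pose concrete, here is a minimal sketch assuming a flat 2D world where the pose is (x, y, heading). The names `Pose2D` and `advance` are hypothetical and are not taken from the survey; a real localization stack estimates the pose from sensor data rather than from a known motion.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    x: float      # position along x, in metres
    y: float      # position along y, in metres
    theta: float  # heading (orientation), in radians


def advance(pose: Pose2D, distance: float, turn: float) -> Pose2D:
    """Move forward along the current heading, then rotate by `turn`."""
    return Pose2D(
        x=pose.x + distance * math.cos(pose.theta),
        y=pose.y + distance * math.sin(pose.theta),
        theta=(pose.theta + turn) % (2 * math.pi),
    )
```

SLAM goes one step further: the map is built while poses like this one are being estimated.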



Happy Learning!!!

June 29, 2020

Webinar - Keynotes - From Notebook to Kubeflow Pipelines



Key Notes
  • Kubeflow - Deployment of ML workflow on Kubernetes
  • Different companies different platforms, Leverage Kubeflow as a deployment platform
  • ML Aspects of Deployment
  • CI / CD requirements, Iterate and push code to production
  • Complexities in ML, Diverse tools Notebook / IDE, Notebook to pipeline, CUJ - Critical user journey
  • Kubeflow pipelines. Data versioning, snapshots, Tools - Kale, Arrikto (Data Management)
  • Deploying Challenges, Composable pipeline, Single steps for different hardware


Kubeflow, MiniKF. Choose Deployment Name in Cloud Console, Kubeflow-kale




Code - Link

Happy Learning!!!

June 28, 2020

Computer Vision

Whatever you can detect you can track
Whatever you can detect you can count
Whatever you can count you can find the trends and patterns
Data to insights and uncover newer perspectives 

Happy Learning!!!

Weekend Reads - Detectors - Single / Two Stage Analysis - Papers

Paper #1 - Optimizing the Trade-off between Single-Stage and Two-Stage Deep Object Detectors using Image Difficulty Prediction

Key Notes
Two-stage detectors - Mask R-CNN
  • Stage 1 - Region Proposal Network to generate regions of interests
  • Stage 2 - Pipeline for object classification and bounding-box regression
Comments - Highest accuracy rates, typically slower
Single-stage detectors - YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector)
Treat object detection as a simple regression problem by taking an input image and learning the class probabilities and bounding box coordinates.
Comments - Lower accuracy rates, but much faster

Implementation
  • Step 1 - YOLO works by dividing each image into a fixed grid, and for each grid location, it predicts a number of bounding boxes and confidence for each bounding box
  • Step 2 - The confidence reflects the accuracy of the bounding box and whether the bounding box actually contains an object (regardless of class)
  • Step 3 - YOLO also predicts the classification score for each box for every class in training
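
The three steps can be sketched in a few lines. This is an illustration only, assuming a 7 x 7 grid (the `S=7` default is my assumption) and that the final per-class score is the box confidence multiplied by the class probability. `grid_cell` and `class_scores` are hypothetical helper names.

```python
def grid_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the S x S grid cell that contains the box centre."""
    col = min(int(cx / img_w * S), S - 1)  # clamp centres on the right edge
    row = min(int(cy / img_h * S), S - 1)  # clamp centres on the bottom edge
    return row, col


def class_scores(box_confidence, class_probs):
    """Per-class score for one box: box confidence times class probability."""
    return [box_confidence * p for p in class_probs]
```

A box with low confidence gets low scores for every class, which is how boxes that contain no object are filtered out.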
Novelty 
  • Image difficulty predictor
  • Easy images are sent to the faster single-stage detector
  • Hard images are sent to the more accurate two-stage detector
Image difficulty predictor. We build our image difficulty prediction model based on CNN features and linear regression with ν-Support Vector Regression

This is a very practical approach to choose between models, trading off execution time against accuracy
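
The routing idea can be sketched as below. This assumes a difficulty score has already been predicted for each image; the threshold value, the score range, and the function names are my assumptions, not values from the paper.

```python
def route_image(difficulty, threshold=0.5):
    """Easy images go to the fast detector, hard ones to the accurate one."""
    return "single_stage" if difficulty < threshold else "two_stage"


def split_by_difficulty(scores, threshold=0.5):
    """scores: dict of image_id -> predicted difficulty. Returns both queues."""
    routed = {"single_stage": [], "two_stage": []}
    for image_id, difficulty in scores.items():
        routed[route_image(difficulty, threshold)].append(image_id)
    return routed
```

Moving the threshold up sends more images to the faster detector, so it acts as the knob between speed and accuracy.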


Paper #2 - Light-Head R-CNN: In Defense of Two-Stage Object Detector

Two-Stage Detectors -  like Faster RCNN The two-stage detector divides the task into two steps:
  • The first step (body) generates many proposals, 
  • The second step (head) focuses on the recognition of the proposals.
  • In order to achieve the best accuracy, the design of the head is heavy
  • They first propose potential object locations in an image—region proposals—and then apply a classifier to these regions to score potential detections
  • Earlier sliding window approaches ran into scaling problems
Faster R-CNN [28] introduces Region Proposal Network (RPN) to generate proposals by using network features
Feature Pyramid Networks (FPN) [19], which exploits inherent multiscale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids

Novelty
  • Light-head design to build an efficient yet accurate two-stage detector
  • Large-kernel separable convolution to produce “thin” feature maps with small channel number (α × p × p is used in our experiments and α ≤ 10). 
  • Our Light-Head R-CNN builds “thin” feature maps before RoI warping. RPN (Region Proposal Network) is a sliding-window class agnostic object detector that uses features from C4
  • Non-maximum suppression (NMS) is used to reduce the number of proposals
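
For reference, a minimal sketch of greedy non-maximum suppression in plain Python. Boxes are assumed to be (x1, y1, x2, y2) tuples and the 0.5 IoU threshold is an assumption; the paper's own settings may differ.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```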

Paper #3 - RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free
  • Single Shot Detector Applications -  embedded vision applications, self-driving cars, and mobile phone vision
  • Two-stage detectors significantly beat single stage detectors on the speed-vs-accuracy tradeoff on a standard desktop/workstation + high-end GPU configurations
Novelty
  • Novel instance mask prediction head to the single-shot RetinaNet detector
  • Self-adjusting loss function that improves robustness during training 
  • Smooth L1 and Self-Adjusting Smooth L1
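
A minimal sketch of the plain Smooth L1 loss for a single residual, assuming the common form with a control point `beta` (quadratic below it, linear above it). The self-adjusting variant changes this behaviour during training; that part is not attempted here and `beta` stays fixed.

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1 loss for one residual x with a fixed control point beta."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta  # quadratic region, gentle near zero
    return ax - 0.5 * beta           # linear region, robust to outliers
```

The two branches meet at |x| = beta, so the loss is continuous there.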

Keep Thinking and Happy Learning!!!

June 24, 2020

Tips to Hack Face Recognition

  • Data Privacy Issues / Rights to images
  • Faces have light and dark regions
  • Manipulate Cheeks, Forehead, Chin, Lips
  • Manipulate light and dark areas
  • Different Hairstyles, Makeup, Different patches
Successful Hack Examples








Anti-pattern approach

Keep Thinking!!!

June 15, 2020

Research paper reads - Video Pipeline architecture

Paper #1 - VideoPipe: Building Video Stream Processing Pipelines at the Edge

Key Notes
  • Challenges are resource limitations, computationally expensive machine learning algorithms
  • Edge Devices have limited battery, processing power, network latency
  • Optimize Models - Model quantization and model compression
Edge Analytics Challenges
  • Support heterogeneous devices
  • Support high frame rates
  • Avoid delays perceivable by the user
  • Video frames will flow through the modules
  • Pipeline processing distributed across devices
  • Different services/models - object detection, face detection, activity recognition, and object tracking
Message Transfer Protocol
  • ZeroMQ - high-performance asynchronous messaging library
  • Images that are passed between devices are encoded/decoded and transferred using ZeroMQ
  • Evaluation Metrics - Time for Load Frame, Pose, Activity Detect, Total Duration
Time Analysis
  • Frames per second
  • Encode / decode
  • Model Loading
  • Model Inference
  • Result
  • Network Latency
  • Round Trip Time
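
One way to collect this kind of per-stage breakdown is a small timing helper. This is a sketch using only the standard library; the stage names and the stand-in work inside `process_frame` are hypothetical and do not come from the VideoPipe code.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated seconds


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


def process_frame(frame):
    with timed("encode"):
        payload = bytes(frame)       # stand-in for image encoding
    with timed("inference"):
        result = sum(payload) % 256  # stand-in for a model call
    return result
```

Dividing the number of processed frames by the sum of the stage times gives a rough frames-per-second figure.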
Alternative Solutions
  • Vigil
  • VideoEdge
  • Chameleon
  • GStreamer
  • GStreamer drawback - did not support video processing pipelines across multiple devices
Paper #2 - Enabling Scalable Edge Video Analytics with Computing-In-Network
Requirements
  • Adaptive to real-time video content
  • Leveraging real-time feedback from the consumer
Configuration parameters
  • Resolution
  • Frame rate
  • Object detector
Periodic reprofiling decides the optimal parameters. Video streaming is driven by server-side logic
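
A reprofiling step could be sketched as picking the cheapest configuration that still meets an accuracy target. The dictionary keys (`accuracy`, `cost`) and the selection rule are my assumptions for illustration, not the paper's algorithm.

```python
def pick_config(profiles, min_accuracy):
    """profiles: list of dicts with resolution, fps, detector, accuracy, cost.

    Returns the cheapest profile that meets min_accuracy. If none does,
    falls back to the most accurate profile available.
    """
    good_enough = [p for p in profiles if p["accuracy"] >= min_accuracy]
    if not good_enough:
        return max(profiles, key=lambda p: p["accuracy"])
    return min(good_enough, key=lambda p: p["cost"])
```

Running this again on fresh measurements every few seconds or minutes is what makes the choice adapt to the video content.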

Paper #3 - EdgeEye - An Edge Service Framework for Real-time Intelligent Video Analytics
Key Notes
  • Deployable components with minimal effort
  • Optimized inference engines
  • GStreamer for pipeline
  • Kurento is an open source WebRTC [4] media server
More Reads
Networked Cameras Are the New Big Data Clusters
Edge Enhanced Deep Learning System for Large-scale Video Stream Analytics
Chameleon: Scalable Adaptation of Video Analytics
SIAT: A Distributed Video Analytics Framework for Intelligent Video Surveillance
WebRTC on Raspberry Pi, GStreamer
WebRTC streams for surveillance
aiortc is a library for Web Real-Time Communication (WebRTC) and Object Real-Time Communication (ORTC) in Python

Keep Thinking!!!

Interesting Project - Microsoft Project - Rocket

Project Rocket's goal is to democratize video analytics

Project Link
Git Project codebase - Link
Pipeline Details - Link

Different Pipeline options
  • Smart Camera
  • Encode Data
  • Decode Data
  • Run detections (Yolo / Yololite / Tensorflow)
  • Stored in Azure Cloud

  • Lightweight models run on edge devices
  • Heavy Models run on cloud
  • Images Encoded and Passed as API Call to cloud
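
Encoding an image for an API call can be sketched with the standard library alone. The payload field names (`camera_id`, `image_b64`) are hypothetical and are not taken from the Project Rocket API; the sketch only shows the base64 round trip.

```python
import base64
import json


def encode_frame(image_bytes: bytes, camera_id: str) -> str:
    """Wrap raw image bytes in a JSON payload that is safe to send as text."""
    return json.dumps({
        "camera_id": camera_id,
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
    })


def decode_frame(payload: str) -> bytes:
    """Recover the original image bytes on the receiving side."""
    return base64.b64decode(json.loads(payload)["image_b64"])
```

Base64 grows the payload by roughly a third, which is one reason to keep lightweight models on the edge device and send only selected frames.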

A lot of steps are mentioned. The same approach is possible with a Raspberry Pi as well.

Happy Learning!!!

June 13, 2020

Day #336 - Image Encoding / Decoding Python


Happy Learning!!!

Dreams

There is a constant battle for
Doing what I like to do vs Delivering what my job needs vs Moving further pursuing my dreams

Even if I run a startup, I need to change according to customer needs and projects. Build your dream product from the customer perspective, differentiate it from existing competitors, and make a dent in the market.

Sometimes you need to believe in your dream when no one else does. Money is important, but not more than your passion.
Build on your passion vs Save more money vs Add more memories to Life.

Sometimes I wonder whether it's better to build on a vision than to work on projects. A vision can translate into a bigger goal, but with multiple stakeholders involved, translating a project into a vision is not the end goal of a project.

Keep Thinking!!!

June 12, 2020

Interesting Startup greendeck.co

Interesting Startup  - greendeck 

Price analytics is a very interesting area. Setting the right price depends on multiple aspects: available inventory, demand, supply, and seasonality. The end goal is maximum profits and optimal pricing.

The question then boils down to
  • Monitor competitor products prices
  • Monitor competitor product availability
  • Assess market demand at the SKU level
Web scraping is often restricted by site terms and, in some cases, by law. No free lunch, so how are prices monitored?
  • Bots
  • Crawlers
  • Simulate with random IPs to simulate virtual users
  • Selenium
Different pricing strategies are
  • Competitive pricing
  • Predatory pricing
  • Volume pricing
  • Seasonal pricing
  • Scarcity / Pandemic Pricing (COVID made me realize)
Interesting startup greendeck. Bunch of young minds solving this problem with AI.

My assessment of the key things involved in implementation. The implementation would be specific for a category/segment. The subtasks at SKU level would be
  • Pick Selection - Top 100 products
  • Seller pages
  • Scrape the data
  • Find all competitive SKUs
  • Monitor the pricing on a daily basis
Our area of interest is limited to competitors, vendors, and finding optimal pricing for profits.
  • Data Collection
  • Data Insights / Trend Analysis
  • Analysis of product pricing
  • Competitor pricing analysis
  • Correlate findings
  • AI / ML for price recommendations / Optimization Problems
  • Configurable Rules drive pricing
As long as you can recommend an optimal price and make more profits/sales, A/B experiments let you measure the increase in profits with and without the pricing analytics engine.
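
Measuring that increase can be as simple as comparing mean profit between the two arms of the experiment. A minimal sketch, assuming per-store or per-day profit figures for a control group (no pricing engine) and a treatment group (with the engine); the function name is hypothetical and no significance test is included.

```python
def profit_lift(control_profits, treatment_profits):
    """Relative change in mean profit: treatment versus control."""
    mean_control = sum(control_profits) / len(control_profits)
    mean_treatment = sum(treatment_profits) / len(treatment_profits)
    return (mean_treatment - mean_control) / mean_control
```

A result of 0.2 would mean the treatment arm earned 20% more on average; a proper experiment would also check that the difference is not just noise.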

If we think from the customer perspective we will put the customer first and technology after user experience. Hard to balance both the views. When you balance you will beat your customer expectations.







Interesting Reads
Keep Thinking!!! 

June 09, 2020

Mastering Technology vs Solving Use Cases

If you think computer vision is just object detection, then all of software development is just CRUD. Everything in software development revolves around create-read-update-delete-search. Solve a business use case and learn the required technical skills along the way. Think and solve for the customer, as a customer. The solution that solves customer needs will succeed, not the tech path alone. Technical skills without customer focus end in mastering technology without solving customer problems. Average technical skills with good customer focus can bring in the revenue needed to hire or to upgrade the product in due course. Business needs technology; it is not the other way around.

Keep Thinking!!!

June 07, 2020

Weekend Learning - Image Searching - Papers

Paper #1 - Learning Fine-grained Image Similarity with Deep Ranking
Key Notes
  • Extract features like Gabor filters
  • HOG
Triplet Architecture
  • Query image, positive image, and negative image
  • A triplet contains a query image, a positive image, and a negative image
  • Positive image is more similar to the query image than the negative image
  • Meaningful and discriminative triplets
  • A triplet characterizes the relative similarity relationship for the three images.
  • The deep ranking model employs a triplet-based hinge loss ranking function to characterize fine-grained image similarity relationships
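
The triplet hinge loss can be written out in a few lines. This sketch assumes embeddings are plain lists of floats and uses squared Euclidean distance; the margin value of 1.0 is an assumption.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_hinge_loss(query, positive, negative, margin=1.0):
    """Zero when the positive is closer to the query than the negative
    by at least `margin`; grows linearly otherwise."""
    return max(
        0.0,
        margin
        + squared_distance(query, positive)
        - squared_distance(query, negative),
    )
```

Triplets that already satisfy the margin contribute no loss, which is why picking meaningful and discriminative triplets matters for training.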
Person reid for keras
Person re-ID baseline with triplet loss
Image similarity using Triplet Loss 
BUILDING A REVERSE IMAGE SEARCH ENGINE

Paper #2 - Evaluation of Distance Measures for Feature based Image Registration using AlexNet
Key Notes - Distance measures
  • Cityblock distance: Sum of the absolute differences between two feature points; on a pixel grid it is the path length along a four-connected neighbourhood.
  • Euclidean distance: Most commonly used metric; calculates the square root of the sum of the squared differences between two feature points
  • Cosine distance: One minus the normalized dot product (cosine similarity) of the two feature points
  • Minkowski distance: A generalization of the Euclidean distance
  • Correlation distance: Based on the correlation of two feature points, p and q, with k dimensions
  • Cosine dissimilarity measure, followed by correlation, consistently gives better matching and registration across images of various deformations
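
The distance measures above, written out in plain Python as a sketch. Feature points are assumed to be equal-length lists of numbers, and the cosine and correlation versions assume non-zero (and, for correlation, non-constant) vectors.

```python
import math


def minkowski(p, q, r):
    """Minkowski distance of order r (r=1 cityblock, r=2 Euclidean)."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)


def cityblock(p, q):
    return minkowski(p, q, 1)


def euclidean(p, q):
    return minkowski(p, q, 2)


def cosine_distance(p, q):
    """One minus the cosine similarity of p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


def correlation_distance(p, q):
    """Cosine distance after subtracting each vector's mean."""
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    return cosine_distance([a - mean_p for a in p], [b - mean_q for b in q])
```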

Imagesimilarity

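import numpy as np
from scipy import signal
im1, im2 = np.random.rand(8, 8), np.random.rand(8, 8)  # placeholder grayscale images, replace with real data
cor = signal.correlate2d(im1, im2)  # full 2-D cross-correlation of the two images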

Cosine similarity in Python
Image Retrieval (via Autoencoders / Transfer Learning)
artificio: A suite of computer vision deep learning algorithms
Image similarity using Deep CNN and Curriculum Learning
Forensic Similarity for Digital Images
Geometric Image Correspondence Verification by Dense Pixel Matching
A Fast Compression-based Similarity Measure with Applications to Content-based Image Retrieval
Convolutional neural network architecture for geometric matching

Happy Learning!!!

The Personalization behind push notifications

A few months back I was given an assignment: find the optimal time to send push notifications to customers. I looked at segmenting the shared datasets and figuring out trends/patterns to drive recommendations accordingly. Today, after reading some related articles, I got a better perspective on the ML behind push notifications.

Some Key learnings are
  • Time Zone based Analysis
  • Delivery time vs Notification Open rate duration
Optimal Time
  • Recommend Optimal time based on user app engagement patterns
  • Personalized times based on app usage history
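
As a baseline before any ML, the personalized time can be sketched as the hour in which a user most often opens the app. The default hour for users with no history and the tie-breaking rule (earliest hour wins) are my assumptions.

```python
from collections import Counter


def best_send_hour(open_hours, default_hour=9):
    """open_hours: hours of day (0-23) at which the user opened the app.

    Returns the most frequent hour; ties go to the earliest hour.
    Users with no history get default_hour.
    """
    if not open_hours:
        return default_hour
    counts = Counter(open_hours)
    top = max(counts.values())
    return min(hour for hour, count in counts.items() if count == top)
```

The hours should be in the user's local time zone, which is why the time zone analysis comes first.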
Data Insights / Analysis
  • Windowing of Sent Time
  • Windowing of Read Time
  • Windowing on Count of Emails sent
  • 70% of customers received more than 5 emails
  • 30% of customers received fewer than 5 emails
  • Develop models for the < 5 and > 5 groups to predict Average Read time for different windows
  • Combination of rules based, linear regression, classification based approach
ML Variables
  • Users Login time
  • Users Engagement Window period (Early Morning / Lunch / Evening)
  • How many times user checks the App
  • Earliest login time in a day
  • Login time during weekends
  • Message check duration between weekday vs weekend
  • Response for different product categories / age groups
  • New categories - News / Travel / Games category based analysis
  • Region 
  • Location
  • Carrier / Handset Type
  • Age / Gender
  • Android vs iOS Based Analysis
More Reads
Personalized push notifications enabled by artificial intelligence
What is the best time to send push notifications?
Insights from Analyzing 1.5 Billion Push Notifications

Happy Learning!!!

June 06, 2020

Learning Notes - Action Recognition - Part II

Paper #1 - Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection

Key Notes
  • Extraction of local spatio-temporal features followed by temporal modeling
Spatio-temporal feature extraction
  • Sample consecutive frames
  • Optical flow for temporal modeling
  • Dense Trajectory (IDT), Motion History Image (MHI)
Network Architecture
  • Bi-directional LSTM
  • Spatial-temporal CNN (STCNN) with Segmentation models
  • Temporal convolutional networks (TCN)
  • Temporal deformable residual networks (TDRN) 
Different Convolution Strategies
  • Standard convolution - Standard convolutions use filters with a fixed, box-shaped sampling grid
  • Dilated convolution - Dilating the filter expands its size by filling the empty positions with zeros
  • #out = Conv2D(10, (3, 3), dilation_rate=2)(input_tensor)
  • Deformable convolution - Deformable convolutions learn offsets for the filter's sampling positions, so the filter shape adapts to the input
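
The "filling with zeros" view of dilation can be shown on a 1D filter. A small sketch in plain Python; `dilate_kernel` and `effective_size` are hypothetical helper names.

```python
def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring filter weights."""
    dilated = []
    for i, weight in enumerate(kernel):
        dilated.append(weight)
        if i < len(kernel) - 1:
            dilated.extend([0] * (rate - 1))
    return dilated


def effective_size(kernel_size, rate):
    """Span covered by a filter of kernel_size weights at a dilation rate."""
    return kernel_size + (kernel_size - 1) * (rate - 1)
```

A 3-weight filter with dilation rate 2 covers 5 input positions while still having only 3 learnable weights.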
Implementation
  • Downsampled to 6fps
  • Frames were resized to 224x224 and augmented using random cropping and mean removal
  • Each video snippet contained 16 frames after sampling
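
The sampling step can be sketched as index arithmetic. This assumes the source frame rate is at least the target rate and that snippets do not overlap; both are my assumptions, and the function names are hypothetical.

```python
def sample_indices(num_frames, src_fps, target_fps=6.0):
    """Indices of the frames to keep when downsampling to target_fps."""
    step = src_fps / target_fps
    indices = []
    k = 0
    while int(k * step) < num_frames:
        indices.append(int(k * step))
        k += 1
    return indices


def snippets(indices, length=16):
    """Group sampled frame indices into non-overlapping snippets."""
    return [
        indices[i:i + length]
        for i in range(0, len(indices) - length + 1, length)
    ]
```

For a 30 fps video this keeps every fifth frame, so one 16-frame snippet spans a little under three seconds.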
Key Notes
  • Generative Adversarial Network (GAN) to generate exact joint locations from noisy probability heat maps
  • Detection classification is applied to a continuous sequence of videos of multiple activities
  • Generative adversarial network (GAN) to produce potential body joint locations in an unsupervised manner
Features
  • Optical flow (OF) and feature matching
  • Picking from shelf vs putting back
  • Joint location estimation results using GAN-based approach.
  • Actions - Reach, Retract, Hand in, Insp. Product, Insp. Shelf
  • A similar approach can be leveraged for keypoint detection on fashion datasets

Key Notes
  • Temporal Convolutional Networks (TCNs)
  • Two types of TCNs 
  • First, our Encoder-Decoder TCN (ED-TCN) only uses a hierarchy of temporal convolutions, pooling, and upsampling but can efficiently capture long-range temporal patterns.
  • Second, Dilated TCN uses dilated convolutions
Code Temporal Convolutional Networks
More Reads
An introduction to ConvLSTM
Keras Convolutional LSTM network
Dense-Optical-Flow
Anomaly Detection in Videos using LSTM Convolutional Autoencoder
Attention Based CNN-ConvLSTM for Pedestrian Attribute Recognition

June 04, 2020

Planogram Insights

What are the different ways to think about Product assortment / Planogram? 

At Store Level
  • Key selling items
  • Key profit-making items
  • Volume vs Quantity vs Price
  • Seasonality
  • Brand consciousness
  • Buyer / Consumer Analysis
Historical data
  • Inventory data
  • Sales data
  • Sale patterns 
Assortment Insights
  • Category management
  • Size Decisions
  • Omnichannel distribution
SKU level ML variables
  • SKU
  • Seasonal
  • dimensions
  • Brand
  • Size
  • Colour
  • Price
  • Units sold
  • TimePeriod
  • Seasonallypurchased
  • Frequentlypurchased
  • NewProductCustomerPreference
  • PricingStrategy - (Promotional, Differential, Product Line, Psychological, Competitive, Premium, Optimal, Bundle, Penetration)
  • PriceHistory
  • Stocklevels
  • ProductVisibility
  • FulfillmentMethod
  • ReturnRate
More Products / Reads
How Data-driven Floor Planning Reinforces Merchandising Strategies
What Is Assortment Planning?
Assortment Planning
Assortment Planning
ShelfLogic
mi9Retail
visualretailing
Smartdraw
Repricer
Aptos

Keep Thinking!!!

June 03, 2020

Reading Notes - Clothing Segments / Fashion Related Papers

Paper #1 - Implementation of Real Time Dress-up System based on Image Blending
Key Notes
  • Dominant colors based segmentation method
  • Selected dress is scaled and rendered to fit with the subject’s body
Implementation
  • Face Detection, Torso Detection
  • Segmentation of region of cloth
  • Dress resizing
  • Dress blending
Key Approaches
  • Pixel clustering and region merging
  • K-means algorithm is used to cluster all the pixels values
Dress-up Algorithm
  • Segmentation and replacing pixel values of frame by the pixel values of input dress image
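
The pixel replacement step can be sketched as a masked copy. This assumes the frame, the resized dress image, and the cloth mask are all the same size and stored as nested lists; `apply_dress` is a hypothetical name and no blending at the edges is attempted.

```python
def apply_dress(frame, dress, mask):
    """Copy dress pixels into the frame wherever the cloth mask is set."""
    return [
        [d if m else f for f, d, m in zip(frame_row, dress_row, mask_row)]
        for frame_row, dress_row, mask_row in zip(frame, dress, mask)
    ]
```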

As of 2020, segmentation lets us detect the upper body and lower body. We can then use the dress-up algorithm to replace the target region with the new design as needed.

Paper #2 - Getting the Look: Clothing Recognition and Segmentation for Automatic Product Suggestions in Everyday Photos 
Key Notes
  • Detect the clothing classes present in the query image
  • Image retrieval techniques to retrieve visually similar products belonging to each class found present
  • Pose estimations, with the body parts depicted as colored boxes
  • Clothing segmentation with Fashionista dataset
  • Normalized binary mask for each segment
  • For texture we used Local binary patterns (LBPs)
  • For each search, we get the k most similar products
  • Approximate Gaussian Mixture (AGM) clustering algorithm
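
For the texture feature, a local binary pattern for one pixel can be sketched as below. This is the basic 3 x 3 version; the neighbour order (clockwise from the top-left) and the use of >= for the comparison are conventions I picked, and the paper's exact variant may differ.

```python
def lbp_code(patch):
    """patch: 3x3 nested list of grey values. Returns an 8-bit LBP code."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(coords):
        if patch[row][col] >= center:
            code |= 1 << bit  # set the bit when the neighbour is not darker
    return code
```

A histogram of these codes over a clothing segment then serves as its texture descriptor.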

Paper #3 - Animated Image Cloth Segmentation
Key Notes
  • K-means clustering of image
  • Segmenting the images using Gabor filters for the textured regions 
Texture based segmentation
  • Input Image, Gabor Filtering, Gaussian Smoothing, Clustering, Segmented Image
More Reads
BACKGROUND FOREGROUND SEGMENTATION METHODS IN ANALYSIS OF LIVE SPORT VIDEO RECORDINGS

Paper - Image Based Virtual Try-on Network from Unpaired Data
Key Notes
  • Synthesize images of multiple garments
  • Online shopping does not enable physical try-on
Novelty
  • Inexpensive data collection and training process
  • Online optimization capability
Steps
  • Image-to-image translation network called pix2pix, that maps images from one domain to another (mouth open/closed, beard/no beard, glasses/no glasses, gender)
  • Shape context
  • Determine how to warp a garment image to fit the geometry 
  • Convolutional geometric matcher
Networks
  • Shape Generation
  • Appearance Generation
  • Optimization
Models
  • Pose Estimation
  • Segmentation
  • GAN is used to warp the reference garment onto the query person image
  • SwapNet swaps entire outfits between two query images using GAN
  • DensePose network
Keep Thinking!!!