"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

March 29, 2020

Corona Stats - As of March 28th

Data Source - Link (data as of March 28th)
Case Stats and Growth Trend
Start Date - 2019-12-31
  • Day 69 - 102133
  • Day 81 - 213258
  • Day 84 - 305270
  • Day 87 - 417061
  • Day 89 - 528019
Summary
  • 1st 100K - 69 Days
  • 2nd 100K - 12 days
  • 3rd 100K - 3 days
  • 4th 100K - 3 days
  • 5th 100K - 2 days
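The summary above can be reproduced from the day/case pairs with a few lines of Python. This is a minimal sketch: the function names are my own, and it assumes the start date counts as day 1 (as the death stats list does).

```python
from datetime import date, timedelta

START = date(2019, 12, 31)  # start date from the post, treated as day 1 (assumption)

# (day number, cumulative cases) exactly as listed above
MILESTONES = [(69, 102133), (81, 213258), (84, 305270), (87, 417061), (89, 528019)]


def days_between_milestones(milestones):
    """Return how many days it took to reach each successive milestone."""
    gaps = []
    previous_day = 0
    for day, _cases in milestones:
        gaps.append(day - previous_day)
        previous_day = day
    return gaps


def calendar_date(day_number, start=START):
    """Convert a day number to a calendar date, with the start date as day 1."""
    return start + timedelta(days=day_number - 1)


print(days_between_milestones(MILESTONES))  # [69, 12, 3, 3, 2]
print(calendar_date(89))                    # 2020-03-28
```

Under that day-1 assumption, day 89 falls on March 28th, which matches the "as of" date of the data.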

Death Stats and Trend
  • Day 1 - 2019-12-31 
  • Day 76 - 5407
  • Day 83 - 11251
  • Day 86 - 16365
  • Day 88 - 20991
Summary
  • First 5K - 76 Days
  • Second 5K - 7 Days
  • Third 5K - 3 Days
  • Fourth 5K - 2 Days

Case Distribution by Country


Fatality



I hope we get through this challenge and recover soon. With global lockdown measures in place, I hope we see a downward trend in the coming weeks.

Good Read - Response to COVID-19 in Taiwan Big Data Analytics, New Technology, and Proactive Testing

Key Points
  • Specific approaches for case identification, containment, and resource allocation
Databases Leveraged
  • Immigration and customs database for travel to Risk Areas
  • Health insurance database for proactively seeking out patients with severe respiratory symptoms 
Risk Categorization
  • Low risk (no travel to level 3 alert areas) 
  • Higher risk (recent travel to level 3 alert areas) 
Inference
  • Real-time alerts during a clinical visit based on travel history and clinical symptoms to aid case identification
  • Tracked through their mobile phone from Self Quarantine

Key Summary Points (Implementation)
  • Avoid partial solutions
  • Learning is critical
Key Summary Points (Lessons Learnt)
  • Extensive testing 
  • Proactive tracing
  • Home diagnosis
  • Monitor and protect health care and other essential workers
Corona Perspectives (July 5th 2020)

Covid Cycle
Unlock Cycle
IT Impact
We need to plan carefully and bridge the gaps in the economy, unorganized sectors, and poorly performing domains. I hope the new normal brings more innovation and newer job opportunities.

From NPTEL Lecture Link







Webinar 2 - Link






Keep thinking!!! 

March 21, 2020

Corona impact in Retail

Essentials, medical and food supplies, and eCommerce will see a spike in demand. Clothing, fashion, toys, luxury brands, smartphones, and other non-essentials will be hit, leading to reduced sales and temporary closure of stores.

Business Impact
  • Reduced Store Traffic
  • Revisit on Sales Forecasts
  • Temporary Closure of poorly performing stores
  • Supply chain / Manufacturing Delays / Reduced Demands
Alternatives
  • Omni Channel Support
  • Contactless delivery
  • Equip Store associates with Sufficient safety procedures
  • More Sanitizing efforts for store associates/customers
  • A shift for e-commerce mode
  • Use offline data for online personalization
  • Stock up / Align towards products in demand (Healthcare / Medical / Essentials etc)
It will take time to recover and reset the entire supply chain and manufacturing, and to overcome direct/indirect job losses and the economic impact. Hoping it will be handled well and things will come back to normalcy soon.

These Struggling Retailers May Suffer Their Final Blow From The Coronavirus Lockdown
Challenges
  • Rent and other fixed costs of running
  • Employee Salaries
  • Cash on their balance sheet 
Be Positive!!!
Keep Thinking!!!
Practice Social Distancing!!

Retail AI Landscape

Survey of Retail Landscape, Use cases, Startups




Keep Thinking!!!

March 20, 2020

Day #334 - Lessons Learnt in evaluating SQL 2016 Performance Features

Sharing my lessons on proposing SQL In-Memory table implementation for the product I worked with. I worked with Sunil Agarwal from the SQL product team to evaluate the features, benefits, migration approach, etc.



Happy Learning!!!

SQL 2019 - Interesting Features

SQL 2019 - Interesting Features (Link)

I have a lot of bias for SQL Server. Some SQL 2019 features are awesome. The things I liked are
Query heterogeneous databases with Polybase (the Polybase feature was there in 2016 too, but it supported fewer databases than it does now)
  • Polybase in SQL Server 2019 provides this through a concept called an EXTERNAL TABLE.
  • External tables are just like SQL Server tables except SQL Server only stores the metadata of the table definition
  • Polybase uses ODBC drivers to connect to sources such as Oracle, Teradata, MongoDB, and SQL Server.
  • Support for SQL, NoSQL
  • Support for ML Engine
  • Support for HDFS
These are promising features. Obviously, there will be some product limitations in the early stages.
Highlights
  • Support for unstructured data
  • Heterogeneous database support
  • Schema on read is achieved with external tables
  • SQL wrapper to query both different databases and unstructured data
  • Integration with HDFS
  • ML APIs / Visualization features
Very good move to position SQL Server as an integration database engine for heterogeneous, unstructured, and structured data

Happy Learning!!!

Interesting Product - Intelligent shopping cart

Another interesting product - an intelligent shopping cart

Key features are
  • instore navigation
  • store promotions
  • product suggestions
  • scans and weighs products
  • displays a running tally of purchases 
  • pay on the spot with the cart
Technical Implementation - Link
  • Product Scan (Barcode / RFID)
  • pay/card swipe attached
  • UI to display items scanned / list products
AI Solutions
  • Object Detection 
  • Weight + Object detection for counting
Cons / Concerns
  • Cost of the cart/maintenance 
  • Accuracy of items detected 
  • Accuracy of the count of items for smaller products 
Keep Thinking!!!

March 19, 2020

Day #333 - Deep Learning Guidelines

CI / CD, DL frameworks, Buy vs Develop are different sets of challenges. The more you learn, the more you feel you have a lot to learn :). Learning / doing / debugging / testing is all part of learning. Keep going!!!

Different levels of learning are required for a different set of challenges.
  • Mastering Keras vs Pytorch vs Tensorflow 
  • Knowing advanced features of data pipelines / porting to edge devices
  • Building the end-to-end flow of Edge Analytics -> Data Consolidation -> Reporting
  • Deployment of this overall end to end solution
  • Accuracy / Understanding real-world challenges and next incremental  steps
This link provides a good guideline 

The ML tools landscape is very useful



Key Notes
Step #1 - Data
  • Data Storage
  • Data ETL Process (Workflow / Async Process)
  • Data Labelling (Raw Data -> Modelled)
  • Data Versioning

Step #2 - Development / Training
  • DL Frameworks
  • Source code management
  • Store & Retrieve Results
  • Distributed Training

Step #3 - Deployment
  • Build Tools
  • Web Deployments
  • Monitoring predictions
  • Edge Devices / Custom Hardware Deployment
DL Frameworks


Key Notes
  • Caffe - C++ based (Fintech used Caffe)
  • Tensorflow - Google (Mobile, JS, Scalable Deployment) - Abstraction - Computational Graph
  • Keras - Wrapper on Tensorflow
  • PyTorch - FB product
ML Code Management for Training / Deployment / Serving







Key Lessons
  • Training System (Model Development)
  • Production System (Ready to use Model, Setup)
  • Serving System (Web App or anything that serves model)
At each of these three levels a set of tests is run to validate that layer - Train / Model / Production Serving Tests

Infrastructure (Buy vs Build)




Deep Learning Optimization






Data Versioning



Key Lessons
  • Unversioned Data (file system) (L0)
  • Version with a snapshot - Daily data (L1), Data backup with Date
  • A mix of assets and code (L2), JSON or any other labeled storage 
  • L3 - Specialized solution - DVC, Pachyderm, Quill  
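To make the L1 level concrete, here is a minimal sketch of date-stamped snapshots using only the standard library. The directory layout and the function names are my own assumptions; specialized tools such as DVC handle this far more thoroughly.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to handle large files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(data_file, snapshot_root, today=None):
    """Copy data_file to snapshot_root/<YYYY-MM-DD>/<digest prefix>_<name>.

    The layout is hypothetical: the date gives the daily snapshot (L1),
    and the digest prefix distinguishes different contents on the same day.
    """
    data_file = Path(data_file)
    today = today or date.today()
    target_dir = Path(snapshot_root) / today.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{file_digest(data_file)[:12]}_{data_file.name}"
    if not target.exists():  # identical content on the same day is stored once
        shutil.copy2(data_file, target)
    return target
```

Calling `snapshot("labels.csv", "snapshots")` once per training run gives every run a dataset path that can be logged next to the model results.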
Training Neural Nets: a Hacker’s Perspective
Common Coding Mistakes
  • The incorrect shape of tensors
  • Preprocessing inputs incorrectly
  • Incorrect loss function
  • Numerical computation errors (NaN)
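Two of these mistakes are cheap to guard against in code. The sketch below uses plain Python lists instead of a DL framework so that it runs anywhere; the function names are my own. It shows a shape/NaN check on an input batch and the max-subtraction trick that keeps log-sum-exp (used inside softmax and cross-entropy) from overflowing.

```python
import math


def check_batch(batch, expected_width):
    """Fail fast on wrong row width or NaN values in an input batch."""
    for row in batch:
        if len(row) != expected_width:
            raise ValueError(
                f"expected rows of width {expected_width}, got {len(row)}"
            )
        if any(math.isnan(x) for x in row):
            raise ValueError("NaN found in input batch")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))): subtract the max before exp."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


check_batch([[0.1, 0.2], [0.3, 0.4]], expected_width=2)  # passes silently

# The naive form overflows for large logits; the stable form does not.
try:
    math.log(sum(math.exp(v) for v in [1000.0, 1000.0]))
except OverflowError:
    print("naive version overflowed")
print(log_sum_exp([1000.0, 1000.0]))
```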
Troubleshooting Deep Neural Networks

Happy Learning!!!

Distributed Systems - Session #3 - Aurora

At times I did not feel connected to the session. It needs a lot of focus and patience to stay engaged :)


Key Summary points
  • Amazon early offering EC2
  • Rented out VMs to customers
  • VMM (Virtual Machine Monitors) that run/manage EC2 instances
  • EC2 good for stateless web servers
  • S3 - Scheme for storing large chunks of data (Periodic Snapshots)
  • Disks for EC2 instances - Fault Tolerance (EBS)
  • EBS (Elastic Block Store) - Appears to EC2 instances as a hard drive
  • Databases on EBS send a large volume of data over the network
  • Amount of writes on Network Storage System
  • CPU / Disk space consumption
  • EC2 / EBS are in same availability zone
  • Transaction & Crash Recovery
  • Transaction (Sequence of operations / commands / atomic / ex- bank transfer money between accounts)
  • Reads page from disk
  • Make Changes in local cache
  • Then write changes to disk
  • Log entries describe the transaction
  • Three log records - Modify Operation, Old Value, New Value
  • Aurora is based on MySQL
  • RDS (Database replicated in multiple availability zones)
  • All the transactions mirrored to other databases (EBS Servers)
  • Multiple copies managed and updated to keep everything in sync
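  • Read and write quorums always overlap in at least one server
  • Readers cannot vote on the value, because only one server in the read quorum is guaranteed to hold the latest write
  • So these systems attach version numbers to the data
  • Readers take the copy with the highest version number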
  • Split database into replicas
  • Data Sharding
  • Data across protection groups
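The quorum and version-number points above can be sketched in a few lines. This is a toy in-memory model, not Aurora's implementation; the class and method names are my own. The 6/4/3 configuration used in the example is the one described in the Aurora paper (six copies, write quorum of four, read quorum of three).

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int = 0
    value: object = None


class QuorumStore:
    """N replicas; writes reach W of them, reads consult R, with R + W > N."""

    def __init__(self, n, w, r):
        if r + w <= n:
            raise ValueError("need R + W > N so read and write quorums overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.next_version = 1

    def write(self, value, reachable):
        """Write to the first W reachable replica indexes, tagged with a new version."""
        targets = list(reachable)[: self.w]
        if len(targets) < self.w:
            raise RuntimeError("write quorum not available")
        for i in targets:
            self.replicas[i] = Replica(self.next_version, value)
        self.next_version += 1

    def read(self, reachable):
        """Read R replicas and return the value with the highest version number."""
        targets = list(reachable)[: self.r]
        if len(targets) < self.r:
            raise RuntimeError("read quorum not available")
        newest = max((self.replicas[i] for i in targets), key=lambda rep: rep.version)
        return newest.value


store = QuorumStore(n=6, w=4, r=3)
store.write("balance=100", reachable=[0, 1, 2, 3])
store.write("balance=90", reachable=[2, 3, 4, 5])
print(store.read(reachable=[0, 1, 2]))  # replica 2 carries the newest version
```

Replicas 0 and 1 still hold the stale value, so a majority vote among 0, 1, and 2 would pick the wrong answer; the version number is what lets the reader choose correctly.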
Happy Learning!!!

March 18, 2020

Distributed Systems - Session #2




I paused it a lot as I didn't really get involved much but finally managed to complete it.

Key Lessons
  • Go examples for threading, locking, and RPC; Go is type-safe, memory-safe, and garbage collected
  • Threads - Tools to manage concurrency in programs
  • Stacks are within address space of the program
  • I/O Concurrency - Overlapping the progress of different activities (some waiting, some executing)
  • Parallelism - Parallelize CPU / IO cycles / routines
  • Process is a single program / single address space. Inside process there are multiple threads
  • Process -> memory area -> routines sit inside the process
  • Process implemented by the operating system
  • Thread challenges - Sharing data
  • Mutex / Locks for shared data
  • Data Access - Managing Locks / Deadlocks / Starvation / Blocking
  • Channels (Go Lang) - Send data between threads
  • WaitGroup, Sync.Cond
  • Webcrawlers design for parallel processing using threads
  • Handling concurrency / multiple parallel threads / optimum network capacity utilization
  • Remember doing SSIS ETL parallel tasks for Data pull
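The lecture uses Go, but the same ideas map onto Python's standard library. The sketch below is my own translation, not code from the session: `threading.Lock` plays the role of a mutex, joining threads stands in for `sync.WaitGroup`, and `queue.Queue` is a rough analogue of a Go channel.

```python
import queue
import threading


def count_with_lock(num_threads=8, increments=10_000):
    """Increment a shared counter from several threads, guarded by a mutex."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:  # protects the read-modify-write on the shared counter
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # similar in spirit to WaitGroup.Wait() in Go
    return counter


def squares_via_queue(numbers):
    """Send data between two threads through a queue, like a Go channel."""
    q = queue.Queue()
    results = []

    def producer():
        for n in numbers:
            q.put(n * n)
        q.put(None)  # sentinel: no more items

    def consumer():
        while True:
            item = q.get()
            if item is None:
                break
            results.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(count_with_lock(num_threads=4, increments=1000))  # 4000
print(squares_via_queue([1, 2, 3]))                     # [1, 4, 9]
```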
A multi-threaded Web crawler implemented in Python
Crawler
Multi-Threaded Crawler in Python
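As a rough idea of what such a crawler looks like, here is a minimal sketch of my own (not the code from the linked articles). To keep it runnable offline, a hypothetical in-memory `FAKE_WEB` dictionary stands in for real HTTP fetches; in practice `fetch_links` would download a page and extract its links.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web": page -> links found on that page.
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def fetch_links(url):
    """Stand-in for an HTTP fetch plus link extraction."""
    return FAKE_WEB.get(url, [])


def crawl(start, fetch=fetch_links, max_workers=4):
    """Breadth-first crawl that fetches each level of pages in parallel."""
    seen = {start}
    frontier = [start]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            # Fetch every page in the current frontier concurrently.
            results = pool.map(fetch, frontier)
            next_frontier = []
            # Only the main thread touches `seen`, so no lock is needed here.
            for links in results:
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return seen


print(sorted(crawl("a")))  # ['a', 'b', 'c', 'd']
```

Keeping the visited set in a single thread avoids locking altogether; a design where workers update the set themselves would need a mutex around it.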

Happy Learning!!!

Staying updated in Data science - My 5 Lessons

  1. Follow Reddit, Twitter, and LinkedIn for news, analytics blogs, and links; watch Lex Fridman interviews and updated Stanford / MIT / Cornell courses
  2. Look at Kaggle kernels, understand feature variables, newer features build. Learn domain-specific findings
  3. Read research papers and try to look for techniques in video/text/ audio projects which you can reapply
  4. Look at GitHub examples and code them in your free time. This helps you learn coding practices / best practices
  5. A lot of industry-specific products we can find by digging deep on AI technology and product landscape. Top 100 AI companies, AI product blogs, etc..
Teach and blog in different mediums. This helps you learn and gather different perspectives. If you have observed a technology and know the underlying pattern/architecture, you can better connect the product, purpose, and applications of the tool.
During ML interviews I found most of the interviewers were 6 to 7 years younger than me. I came from the DB / BI world to the AI world. It feels good to continue to code, coach, and teach a younger set of folks.

Good Read (Link)
Reading Research papers

Happy Learning!!!

Data Perspectives

Different perspectives for choosing the right database
  • Strict data types - Schema on write
  • Schemaless data - Schema on read
  • Read-only immutable data
  • Eventually consistent data
  • Dirty read vs Committed data
  • Multi-version concurrency control
  • Replicate data based on logs
  • Replay committed logs
  • Data sharding
  • High reads consistent data - RDBMS
  • High writes low reads - HBase, Cassandra
  • Document-based storage - Mongodb, Couchdb
  • CAP, ACID Properties
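The first two perspectives are easy to contrast in code. This is a minimal sketch with a hypothetical three-column schema and function names of my own: schema on write rejects bad rows at insert time, while schema on read stores raw JSON and only applies structure when querying.

```python
import json

# Hypothetical table definition used only for this illustration.
SCHEMA = {"id": int, "name": str, "price": float}


def insert_schema_on_write(table, row):
    """Validate against SCHEMA before storing; bad rows fail at write time."""
    if set(row) != set(SCHEMA):
        raise ValueError(f"columns {sorted(row)} do not match {sorted(SCHEMA)}")
    for column, expected_type in SCHEMA.items():
        if not isinstance(row[column], expected_type):
            raise TypeError(f"{column} must be {expected_type.__name__}")
    table.append(row)


def read_schema_on_read(raw_lines, fields):
    """Store anything; interpret at read time and skip records missing a field."""
    for line in raw_lines:
        doc = json.loads(line)
        if all(field in doc for field in fields):
            yield {field: doc[field] for field in fields}


table = []
insert_schema_on_write(table, {"id": 1, "name": "tee", "price": 9.5})

raw = ['{"id": 1, "name": "tee", "color": "red"}', '{"id": 2}']
print(list(read_schema_on_read(raw, ["id", "name"])))
```

The trade-off shows up in where errors surface: at insert time for the strict table, and at query time (as silently skipped records here) for the schemaless store.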
Things I Wished More Developers Knew About Databases

Similar, deeper-dive points from the tweet conversation
  • Read heavy vs write heavy. Insert vs updates. Vacuuming
  • Replication or not, transaction logging, why indexes matter, performance tuning, i/o scheduler, unicode, gender isn't binary
  • Locks, cache effects, isolation levels
  • IO bound vs network bound, especially with replication; scaling strategy; concurrency vs distributed.
  • Materialized views, and the dangers of invalidating them unexpectedly.
  • Connection pool, scaling techniques to handle distributed application / system, improve performance, optimization of query etc.
  • I'd be interested in how this applies to a distributed system. Concurrency (specifically MVCC), connections, DB threading, backpressure handling
  • Disk storage implementation and optimization

Keep Thinking!!! 

Analytics Leaders

There are three types of Leaders in my perspective
  1. Technical Leaders  - Coming up with new strategies/solutions, publishing papers, case studies. They propel/push the limits of tech to the next level. It takes time, effort to analyze, perform experiments and publish the observations.
  2. Business Leaders - Able to find business use cases that can be solved with AI. Mapping relevant AI use cases for business/domains
  3. Practitioner Leaders - Early Adopters to apply the techniques in solving the business problems, experimenting and leveraging different techniques, papers and newer approaches to solve business problems.
Keep Thinking!!!

SQL Performance Tuning & Coding Guidelines

This vacation was useful for finding some of my prior work / presentations. Sharing some of my earlier SQL Performance Tuning slides, which I did with Balmukund from the SQL Product Support Team.






Happy Learning!!!

March 16, 2020

stitchfix Blog Post - This post provides Data Strategy for Data Science

This post provides insights into the data science strategy at Stitch Fix

Problem Solving Approach (Use Cases - Data - Models)
  • Step #1 - Business Use Cases -> Finding Relevant Data -> Providing Data with ETL 
  • Step #2 - Data - Multiple Models
  • Step #3 - API to consume results and use data for decision making
Key Lessons
  • Availability of Raw Data
  • Building ETL for data updates
  • Data Pipelines for Feature Engineering
  • Different data science algorithms for different use cases
  • Data science use cases driven by the business context
Data Demands
  • Raw Data Access (Pull Everything to a Data lake)
  • Data updates / Deletes (Data lake updates with events)
  • Feature variables (Custom ETL to select, transform data from raw data)
Experimentation
  • A / B Testing
  • Validating with real-time results
  • Ongoing correction of models
Connecting Data and Science
  • Overlapping functions with Domain, Data and Data Science Knowledge
  • A lot of Experimentation
Algorithms (Data Science use cases)
  • Style Recommenders (Recombining Attributes from existing styles adding feedback), Developing Design with a certain set of attributes
  • Warehouse Assignment (Shipping cost, shipping time, inventory match)
  • Inventory Forecast (Demand, Unit Price, Total Cost, Ordering Cost, Carrying cost, Season, Recently emailed etc)
  • Fashion Design Algorithms
  • Buying Algorithms
  • Engagement Algorithms
  • Messaging Algorithms
  • Capacity Optimization
  • Assignment Optimization
  • Network Optimization
  • Visitor Qual Algorithms
  • Latent Size Algorithms
  • Latent Fit Algorithms
  • Batch Picking Algorithm
  • Global Optimizations
  • Pick Path Algorithm
  • Virtual Warehouses
  • Sizebreak Algorithms
  • Planning Algorithms
  • Assortment Algorithms
  • Replenishment Algorithms
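The inputs listed for the inventory forecast (demand, unit price, ordering cost, carrying cost) are the same ones used by the classic economic order quantity (EOQ) formula, $Q^* = \sqrt{2DS/H}$. The sketch below uses that textbook formula purely as an illustration; it is not taken from the Stitch Fix post, and the numbers are placeholders.

```python
import math


def economic_order_quantity(annual_demand, ordering_cost, carrying_cost):
    """Classic EOQ: sqrt(2 * D * S / H).

    D = annual demand (units), S = cost per order,
    H = carrying cost per unit per year.
    """
    return math.sqrt(2 * annual_demand * ordering_cost / carrying_cost)


def total_annual_cost(annual_demand, unit_price, ordering_cost, carrying_cost, order_qty):
    """Purchase cost + ordering cost + carrying cost for a given order size."""
    purchase = annual_demand * unit_price
    ordering = (annual_demand / order_qty) * ordering_cost
    carrying = (order_qty / 2) * carrying_cost
    return purchase + ordering + carrying


# Placeholder numbers, not real data.
q = economic_order_quantity(annual_demand=1200, ordering_cost=100, carrying_cost=6)
print(q)
for qty in (150, q, 250):
    print(qty, total_annual_cost(1200, 20, 100, 6, qty))
```

At the EOQ the ordering and carrying costs are equal, which is why ordering either smaller or larger batches raises the total.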
Use Case Categorization
  • Customer Context - Style Recommenders, Fashion Design Algorithms, Latent Size Algorithms, Latent Fit Algorithms
  • Retailer Context - Business Use Cases (Inventory Forecast, Replenishment Algorithms)
  • Warehouses Use Cases - Assignment Optimization, Allocation 
  • Clients Use Cases - Style recommendations, Demand Predictions
  • Optimize Supply Chain - Warehouse Assignment, Pick Path Algorithm
Data Science - Algorithm Demands
  • Assortment Algorithms - Apriori / Market Basket Analysis
  • Targeting Algorithms - Recommendations 
  • Replenishment Algorithms - Forecasting
  • Allocation Algorithms - Resource Allocation
  • Virtualized Warehouses - Demand Forecasting
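For the assortment line, the core of market basket analysis is counting how often items appear together. The sketch below only covers item pairs (the first step of Apriori, not the full algorithm) and uses made-up baskets; the function name is my own.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support >= min_support:
            # confidence(a -> b) = P(b in basket | a in basket)
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Made-up baskets for illustration only.
baskets = [
    ["jeans", "tee"],
    ["jeans", "tee", "belt"],
    ["tee"],
    ["jeans", "belt"],
]
for rule in pair_rules(baskets):
    print(rule)
```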
Key Lessons
  • Data Science Use cases in Retail Space
  • Data Science Use cases in Supply Chain
  • Data Science Use cases in Fashion, Ecommerce Segments
  • Data Lake Strategy for Data Science
  • Bird's Eye view for picking right use cases
Keep Thinking!!!

March 15, 2020

Interesting Product - https://www.glisten.ai/

TechCrunch posted an article on this. A great example of Computer Vision in Fashion. A very niche product idea. In fact, I am doing my prototypes for a similar idea :)

My Analysis of Models / Approach Involved
  • Multi-Label Detection - Clothes Combination
  • Landmark Detection - Bounding boxes for Upper Body, Lower Body
  • Pattern Detection - Use the extracted boxes and detect patterns
  • Color Detection - Find the maximum color present in the detected portion
  • Gender Detection Models - Extract Face, Identify Gender
  • OCR - Scan the text content, search for product attributes for Scanned product
A very interesting niche ML Idea :)

Happy Learning!!!

March 13, 2020

How to Track Possible Secondary and Tertiary Contacts of Infected Corona Patient

Tracking Potential Patients
  • Identify places visited by infected patients based on Google history, GPS tracking, and rides taken
  • Identify their movements mapped to mobile signals and the nearest mobile signals. This will also highlight potential mobile numbers in the vicinity
  • Continuously monitor the key factors and screen secondary and tertiary contacts at regular intervals
  • Large-scale screening, complete lockdown, and a ban on travel are the only possible options to control the pace of the virus
Analyze COVID death rates
  • Death rates reported from Jan 2019 to June 2019
  • Death rates reported from Jan 2020 to June 2020
  • % increase in the death rate
  • Number of reported COVID deaths
  • Number of non-COVID deaths
  • Match by age factor/ gender factor
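The comparison above boils down to a few subtractions and a percentage. A minimal sketch, with a function name of my own and placeholder numbers rather than real statistics:

```python
def excess_death_summary(deaths_2019, deaths_2020, reported_covid_deaths):
    """Compare the same period across two years.

    All inputs are total deaths for the period (e.g. Jan to June).
    """
    excess = deaths_2020 - deaths_2019
    return {
        "pct_increase": excess * 100 / deaths_2019,
        "excess_deaths": excess,
        "non_covid_deaths_2020": deaths_2020 - reported_covid_deaths,
        # Excess deaths not accounted for by reported COVID deaths.
        "unexplained_excess": excess - reported_covid_deaths,
    }


# Placeholder numbers for illustration only, not real data.
print(excess_death_summary(deaths_2019=1000, deaths_2020=1200, reported_covid_deaths=150))
```

Running the same function per age band or gender gives the dimension-wise comparison the list asks for.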
To know the exact impact we need to compare, analyze by different dimensions and identify the insights 

Keep Thinking!!!

Computer Vision checklist for Security Camera

  • Safety - Data should be safe even in case of theft. Opt for Network storage, not local storage
  • Sight - The angle of placement for maximum coverage is very important. It should align with already developed computer vision models so they can be reused. When the camera viewing angle is very different from the data the existing models were trained on, we need more sophisticated and powerful algorithms to compensate for these shortcomings.
  • Availability - An alert mechanism should be there in case of an outage due to network/power. Leverage alternate options (battery / wifi / sim card)
  • Models - Deploy both image processing, edge analytics, simple to complex models. Real-world situations need more than one model / algo to validate
  • Scalability - Throughput needs to be there, avoiding duplicate images, detecting only when there is a change of state, sending inferences of edge analytics. Send optimal data not all the data
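The scalability point (send only when the state changes) can be sketched with simple frame differencing. Frames here are plain nested lists of grayscale values so the example runs without OpenCV; the threshold and function names are my own assumptions.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


def frames_to_send(frames, threshold=10.0):
    """Yield (index, frame) only when a frame differs enough from the last one sent."""
    last_sent = None
    for index, frame in enumerate(frames):
        if last_sent is None or mean_abs_diff(frame, last_sent) > threshold:
            last_sent = frame
            yield index, frame


dark = [[0, 0], [0, 0]]
dark_with_noise = [[1, 0], [0, 2]]
bright = [[200, 200], [200, 200]]

sent = [i for i, _ in frames_to_send([dark, dark_with_noise, bright, bright])]
print(sent)  # [0, 2]
```

Comparing against the last frame sent, rather than the previous frame, prevents slow drift from going unnoticed.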
Leverage models built for the real world - Link

Keep Thinking!!!

March 12, 2020

Distributed Systems - Session #1

Key Notes
  • Storage, Big Data, File Sharing
  • The infrastructure that requires more than one computer
  • High Performance, Parallelism
  • Fault Tolerance - Two computers do the same thing. If one fails, the other picks up. Availability / Recoverability, Replication
  • Systems are inherently physically distributed
  • To achieve security goals
  • Handle unexpected failure patterns (Partial Failures)
  • Challenges are Concurrency, Partial Failures
  • Academic Curiosity -> Real-world Examples
  • Lectures, Research papers for ideas, implementation details, labs, exams
  • Map Reduce - Map Function on each of the input files, Obvious Parallelism available. The output is a list of Key-Value Pairs. Maps -> Intermediate Output -> Reducers. Collects all instances, all maps. 
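The Map Reduce flow in the last note can be shown with a single-process word count. This is a minimal sketch of the programming model only (no distribution or fault tolerance), with function names of my own.

```python
from collections import defaultdict


def map_fn(filename, contents):
    """Map: emit a (word, 1) pair for every word in one input file."""
    return [(word.lower(), 1) for word in contents.split()]


def reduce_fn(key, values):
    """Reduce: combine all values emitted for one key."""
    return sum(values)


def map_reduce(inputs, map_fn, reduce_fn):
    """Run map on every input, group the intermediate pairs by key, then reduce."""
    intermediate = defaultdict(list)
    for filename, contents in inputs.items():  # each map call is independent
        for key, value in map_fn(filename, contents):
            intermediate[key].append(value)
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}


files = {"a.txt": "the cat", "b.txt": "The dog the"}
print(map_reduce(files, map_fn, reduce_fn))  # {'the': 3, 'cat': 1, 'dog': 1}
```

Because each map call only sees its own file, the calls can run on different machines; the grouping step is what a real system implements as the shuffle.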

Happy Learning!!!

March 11, 2020

Day #332 - Dress Color Detection

After evaluating a few projects, this Git project was helpful - link

Approach
1. Extract RGB composition for input images
2. Use the pre-trained samples available for White, Black, Red, Green, Blue, Orange, Yellow and Violet
3. KNN to compute nearest Color Match between 1 and 2
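The three steps can be sketched in pure Python. This is my own simplified version, not the code from the linked project: the training samples are hypothetical, and the image's colour is summarised by its mean RGB value as a stand-in for whatever colour composition the project extracts.

```python
import math
from collections import Counter

# Hypothetical labelled RGB samples; the linked project ships its own training data.
TRAINING = [
    ((255, 0, 0), "red"), ((200, 30, 30), "red"),
    ((0, 255, 0), "green"), ((30, 200, 30), "green"),
    ((0, 0, 255), "blue"), ((30, 30, 200), "blue"),
    ((255, 255, 255), "white"), ((0, 0, 0), "black"),
]


def mean_rgb(pixels):
    """Step 1 (simplified): summarise an image region by its average colour."""
    n = len(pixels)
    return tuple(sum(p[channel] for p in pixels) / n for channel in range(3))


def knn_color(rgb, training=TRAINING, k=3):
    """Step 3: majority label among the k training samples closest in RGB space."""
    nearest = sorted(training, key=lambda sample: math.dist(rgb, sample[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


dress_pixels = [(255, 0, 0), (185, 40, 40)]
print("Detected color is:", knn_color(mean_rgb(dress_pixels)))
```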

Input Image -

Detected color is: red

Happy Learning!!!

March 10, 2020

Amazon building Retail Stores AI Tech

Amazon Journey
  • Amazon used its lessons learned in building scalable infra to launch AWS
  • Amazon built a complete cloud stack / big data tools for different domains
  • Amazon leveraged AI with Alexa and cashier-less retail stores
Retail Stores with AI - Cashierless stores are a product in themselves. Now Amazon is pitching this as a disruptive retail product. It will compete against traditional RFID and people-counting software providers like Sensormatic and Checkpoint. Amazon will get more insights. This will be a good testbed to disrupt the retail sector.

Keep watching this space https://justwalkout.com/

Data Collected
We only collect the data needed to provide shoppers with an accurate receipt. Shoppers can think of this as similar to typical security camera footage. Shoppers enter the store with a credit card, grab what they want and just walk out - it's that easy.

Possibilities?
  • Unique Shopper Identification
  • Shopper identification from Credit Card / SSN
  • Face Identification
  • Product Identification / Product Tracking
Will people still be working in stores with Just Walk Out technology?
Yes. Retailers will still employ store associates to greet and answer shoppers' questions, stock the shelves, check IDs for the purchasing of certain goods, and more - their roles have simply shifted to focus on more valuable activities.

Inference
  • Elimination of Cashiers
  • Maybe 10~15% of cashiers will remain per store, which will result in cost savings
  • Customer Assistants for Product queries / Search (Chatbots may complement them)
What are the alternative options to challenge this move? Retailer AI To-Do List
  • Build your own Retail AI portfolio
  • Collaborate with other partners (Google / Microsoft / Nvidia)
  • Build Data Expertise, Long term plan for Edge Analytics
  • Reduce Investments on RFID, Proportionately increase AI investments 
  • Invest in AI for Inventory, Traffic, Loss Prevention Solutions
Keep Thinking!!!

AI - Social cause Use Cases - Papers / Approach / Tech Analysis

Aggression prediction at Rehabilitation Centres (Video + Audio Analytics)
War crimes Analysis from Satellite Images (Video Analytics)
Suicide hotline automated call analysis and forwarding (Audio Analytics)


AI for social cause - AGDC: Automatic Garbage Detection and Collection
Key Summary
  • Object detection to identify objects
  • Distance and robot movement estimation
  • Robotic arm with Gripper

LOW-COST DEEP LEARNING UAV AND RASPBERRY PI SOLUTION TO REAL TIME PAVEMENT CONDITION ASSESSMENT
Key Summary
  • Detected Items
  • Alligator cracking
  • Block cracking
  • Edge crack
  • Pothole
  • Longitudinal crack
  • Transverse crack
  • Weathering and raveling
Techniques
  • RNN, CNN, Faster R-CNN, YOLO, SSD + MobileNet V1 
Images
  • Training and test images were collected using UAV, Infrared thermography camera, and a handy mobile camera
Tools
  • Bulk Rename Utility ("Bulk Rename Utility,"2019) 
  • PixResizer (Groot, 2019) 
  • LabelImg software ("LabelImg," 2018)  
The SSD model is better suited to Android and Raspberry Pi applications


Keep Thinking!!!

March 09, 2020

Day #331 - Database Learning Cards

Database Learning Cards










Happy Learning!!!