Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): July 2021

July 30, 2021

Setup Mac Days - Day #1 - Installation

Download VS Code, copy it to Applications, and run it locally
Install the latest Anaconda with Spyder
Install Office 365
Install Docker on Mac for Intel chip
Get acquainted with Terminal on Mac
Terminal - Spyder one click
Install the GCP Cloud SDK kit with ./install.sh

Installing by copying an app to Applications is something new.

Build, Push, Deploy and run :)

Windows keeps causing trouble with Docker / Kubernetes. I need one more work environment to keep learning GCP on the go and to work on more personal projects....

So many build failures before getting it right. Lessons :)

Keep Coding!!!

July 26, 2021

Business - Technology - Passion

Business Problems

Know how it works
How technology helps
Who are pioneers in the space
What the basic and advanced business processes are
Who are the stakeholders

Technology Problems

What aspects tech solves (Data / Reporting / Ordering)
What tools to pick considering the scale
What POCs you need to work on
What are different smaller tasks (Data Schema / Transactions / Reporting)
Data to services development
Bring in the big picture over the course of time

Passion

In everything in life, relationships or jobs, we will not get 100% of what we like
You need to look at the positive side of things until you know the business + tech landscape well enough to apply it
Everything has a melting point. A long list of experiences leads to the big decisions about how things fit your perspective
Think towards end-to-end solutions / patent opportunities with a mix of research + prototype + code
Great solutions are a collection of simple ideas + good-to-have improvements + lessons from the market/tech that make a difference + constant small improvements

Keep Going!!!

July 24, 2021

Picking up new areas of Learning for Building Solutions

Every time I am tasked with something new, I go through the list below for knowledge gathering

Find evidence / learning from past work
Read through books to understand the fundamentals
Pick / Parse interesting YouTube videos and build your understanding
Experiment with code snippets wherever possible
Build a broader context of understanding
Refine it further based on the latest trends
Look for research papers on the topic
Build a version of the solution / approach with your own learning

Keep Going!!!

Leadership Perspectives

Empower every leader in the team with their own areas / ownership / autonomy to build the best
Ensure there are no territorial borders in solutions. Every team is open-minded and willing to discuss the best solutions
Build the storyline from the customer's perspective: what works best for the customer, and are we thinking in that direction?
A team does not win because of one star performer alone; getting the best from everyone makes it a self-performing team
When teams know their purpose and goals, they will outperform your process and tools
Learning pillar, technology pillar, customer pillar, working product - Everything is a blend of all these pillars

Good Notes from Leadership Session

Diverse Thinking
Humility to re-learn
Come out of Intellectual arrogance
Able to discuss contrarian views
Street Kid lessons - Persistence Pays, Talk to decision-maker, Do not talk about money, More money to be taken from this person
Creativity and Innovation needs a psychologically safe environment
Promote different thinking, Promote diversity
Wisdom - Ability to hold two contrarian ideas
Supplement bookish knowledge with on-ground knowledge
Artificial stimulants for equal participation

Keep Thinking!!!

Growth oriented learning

What is bad? - Only I can do it
What is learning? - He has done it well, let me solve it my way
Sometimes people will share knowledge while holding back the key areas, stressing them less and the rest more
Intentions will stand out in the long run
Everyone can learn everything, Be Genuine, Be King, Stay True!!!

Sometimes live without anything for yourself. Let me kill my ego and build a bigger perspective...

Evolution of best practices

Certain things you have done, experimented with, and proven become best practices
You probe certain implementations after your failures to validate the ideas and propose them
You read up on / reference similar problems solved in the domain
As always it is a mix of code / read / share / build, and experimenting to see what works best
Keep an open mind and a balance of learning/coding to get things to customers quickly and keep improving them

Keep Learning!!!

July 22, 2021

How to learn quickly?

Know your end goal, What you want to accomplish
Know where to look up good short lessons (GitHub / Blogs / Books / YouTube videos)
Follow the path laid out by a few sources
If all steps fail, take a break and come back again. Sometimes we need a break to get a new perspective
Save your steps - Navigation links / Commands
Reach out to StackOverflow / friends who are experts in that area
Document your working Steps
Share it with your wider audience in your books/blogs
Everyone has a way of doing things. When you learn from some source, you also need to be a learning source for someone else

A solution can be built in multiple ways. You must have a working skeleton of your thoughts before you look to optimize it.

Keep Learning!!!

July 21, 2021

Let's build a product

Learn the required skills
Design for scalability
Learn from mistakes
Stay focused for six months
Win or Lose let's face it
Make it bootstrapped
Code and Learn one step at a time
Make some wins, failures, smiles, and emotions

At some point in time, you would have touched every domain through some problem or the other. The bigger picture you get across different domains, your familiarity with current technology, and quickly identifying the right opportunities are all important to building a successful product.

When you are at your best with both technology and domain, you need to build the product. In the early stages of your career you focus on technology; in the later stages you focus on business. There is a time when you are good at both. Use that time and build your idea.

At some point in life, everything you read, worked on, and discussed will help you connect with problems from different domains/areas.

Let's Keep Learning!!!

July 18, 2021

Edge Deployment Optimization thoughts

Deploy lightweight models. Deploy quantized models
Minimal edge processing, Detailed cloud processing
Message loss prevention with Queues and async processing
Transfer only selected frames instead of videos
Offline video upload to cloud vs Real-time selected image upload for real-time notifications
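
The "transfer only selected frames" idea can be sketched as a simple change detector on the edge device. This is a minimal sketch under stated assumptions, not a production approach: frames are flat lists of grayscale pixel values, and the function names and the threshold of 10.0 are hypothetical choices for illustration.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_frames(frames, threshold=10.0):
    """Return indices of frames that differ enough from the last frame that was kept."""
    selected = []
    last_kept = None
    for index, frame in enumerate(frames):
        if last_kept is None or mean_abs_diff(frame, last_kept) > threshold:
            selected.append(index)
            last_kept = frame
    return selected


# Toy "video": each frame is a flat list of pixel intensities (made-up data).
frames = [
    [0, 0, 0, 0],
    [1, 0, 1, 0],      # almost identical to the first frame
    [50, 60, 55, 40],  # scene change
    [51, 60, 54, 41],  # almost identical to the previous kept frame
]
selected_indices = select_frames(frames)
print(selected_indices)
```

Only the selected frames would then be queued for upload, which keeps edge processing minimal and leaves the detailed analysis to the cloud.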

Keep Thinking!!!

July 17, 2021

RetailVisionWorkshop2021 Notes

Links

Key Notes

Physical stores are becoming digital
Products more easily searchable
Better experiences at stores
Minimize loss of sales

Use cases

Product Detection Challenges
Pricing challenges based on data

RetailVisionWorkshop2021 Pricing Challenge - Ehud Barnea

Key Notes

Price from bounding boxes

Remove promotion content and read price content

Country differences
Winning Solution

Dataset Features

RetailVisionWorkshop2021 - Gang Hua

360 degree camera to scan everything in store
3D reconstruction with structure from motion
Shelf detections in 360 cameras
Identify Shelf level information
Optimal robot position to capture shelf images
Create a digital twin (duplicate) of the products

Assortment planning for online vs offline

06 RetailVisionWorkshop2021 - Sean Bell

Detection, Features, Character Embedding

Large scale embedding for product recognition

Loss Functions

Anchor image
Distances corresponding to same product
Same vs Different products
ArcFace Loss
Every product has centres
Compare anchors and centres
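
The "every product has centres, compare anchors and centres" idea can be sketched as an ArcFace-style loss. This is a simplified sketch, not the talk's implementation: the function names are hypothetical, the scale and margin values are illustrative, and the special handling the original formulation uses when the angle plus margin exceeds pi is left out.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def arcface_loss(embedding, centres, target, scale=30.0, margin=0.5):
    """Cross-entropy over scaled cosines, with an angular margin added for the target class."""
    emb = l2_normalize(embedding)
    logits = []
    for class_index, centre in enumerate(centres):
        cosine = sum(e * c for e, c in zip(emb, l2_normalize(centre)))
        cosine = max(-1.0, min(1.0, cosine))  # guard acos against rounding error
        if class_index == target:
            # Penalize the target class by widening its angle to the centre.
            cosine = math.cos(math.acos(cosine) + margin)
        logits.append(scale * cosine)
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[target]


# Two product centres and one anchor embedding (made-up 2-D values).
centres = [[1.0, 0.0], [0.0, 1.0]]
anchor = [0.8, 0.6]
loss_with_margin = arcface_loss(anchor, centres, target=0, margin=0.5)
loss_without_margin = arcface_loss(anchor, centres, target=0, margin=0.0)
print(loss_with_margin, loss_without_margin)
```

The margin makes the loss larger for the same embedding, so training has to pull anchors closer to their own product centre than plain softmax would require.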

Combination of Vision + Word Embedding for product categorization

GeM Pooling

Feature map at top of Network
Average over spatial dimensions
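
GeM (generalized-mean) pooling extends the plain spatial average with an exponent p: each activation is raised to the power p, averaged over the spatial dimensions, and the p-th root is taken. A minimal sketch, assuming a feature map stored as nested lists of shape channels x height x width with non-negative activations; the function name and the value p=3.0 are illustrative choices.

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling: one value per channel from a C x H x W feature map."""
    pooled = []
    for channel in feature_map:
        # Clamp to eps so the fractional power is well defined.
        values = [max(v, eps) for row in channel for v in row]
        mean_of_powers = sum(v ** p for v in values) / len(values)
        pooled.append(mean_of_powers ** (1.0 / p))
    return pooled


# Two channels of a 2 x 2 feature map (made-up activations).
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
average_pooled = gem_pool(feature_map, p=1.0)  # p = 1 is plain average pooling
gem_pooled = gem_pool(feature_map, p=3.0)
print(average_pooled, gem_pooled)
```

With p = 1 this reduces to average pooling, and larger p moves the result towards the maximum activation in each channel.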

Product Recognition

07 RetailVisionWorkshop2021 - Aviv Eisenschtat

Dynamic Shelf Reality
New Visual designs of products

Combination of techniques
Similar products
Product Category
Clustering for similar images

Keep Collecting Ideas!!!!

Keep Learning!!!

ML Lessons from Production Implementation

Good Article Link. The summary is very good

For each lesson, I have added my personal observations on a few points.

1. Subject matter experts have as much impact as data scientists

Fact - "much of the challenge is getting the right data."
Add-on - "much of the challenge is getting the right data, creating the right insights / correct observations, finding hidden patterns with domain knowledge, and looking beyond the data at what drives it"

2. The first iteration is always on the labeling taxonomy - "In vision projects, having correctly labeled data is essential for detection, extraction, analysis, etc."

3. The ROI on fast feedback is huge - rapid prototyping and de-risking of projects. - "People lose confidence when they do not see value realized. Getting the business involved early and understanding their KPIs and measures for analyzing the impact of the ML solution is key to the success of the project"

4. ML tools should be data-centric but model-backed - "It's a tradeoff between learning the domain vs ML vs DevOps vs new tools in the market. Often end customers do not see ML as a standalone item; it has to fit with their existing data warehouse. You need to be practical and pick the tools that make it less complicated to integrate with the current environment and build a successful use case."

#datascience #analytics #domainknowledge

Keep Thinking!!!

July 15, 2021

Pandas Dataframe Query Lessons

Keep Thinking!!!

July 11, 2021

Next Reading To-do List

The reading to-do list never ends. Learn, code, experiment, and add your own learnings.

Keep Thinking!!!

Big Picture Needs Bigger Perspectives

Big Picture needs Big perspectives

How you manage data vs Know the flows
How much you understand data
How much you avoid data duplication
How much you have data lineage
How much you have data privacy handled
How decentralized, flexible, and updated records are present

Getting complete knowledge goes beyond just collecting, streaming, and storing data. Every insight and every piece of domain knowledge matters.

MLOps, feature store tools - “When all you have is a hammer, everything starts to look like a nail.” Learn the domain before using tools. Kaggle data and real-world data are different.

Keep Thinking!!!

Data Ownership - Data Understanding

Database Developer - Designs schema in context of performance, index, tracking
BI Developer - Designs Schema in terms of running aggregations, Reports, Tracking, and Tracing Updates
Machine Learning Engineer - Understands features, picks the relevant ones for Machine learning Algos
MLops - Builds a feature store pipeline to get all the data
Security Engineer / Data Engineer - Makes the data PII-safe, runs before the data pipeline

Reality

With so many perspectives, How do all these folks have the same data understanding?
How many versions of data will we keep?
Where is the data dictionary, and where are rolling updates shared and updated?
Leverage OLAP as ML Feature store, Do not complicate with multiple layers of data, versions etc..

My Perspective - Not every best practice solves everything. We can still have decentralized DBs with a balance of OLTP vs OLAP; feature store and data governance can still be handled with decentralized storage. Having too many data management tools will lead to different perspectives.

Most conferences are far from reality. Their internal practices may be totally different from the projected practices. Treat these conference talks as partly a PR pitch. If everything were so easy, we would have seen a different level of tech maturity.

Keep Thinking!!

Products are built to fail.

In many ways we underestimate the impact of domain knowledge. Can we have one forecasting algorithm for

Retail Product Sales
Oil Sales
Stocks Predictions
Car Sales

If everything could be built with just one algorithm, we would need to close all ML shops in a month. We underestimate domain knowledge and believe fancy tech and tools can read the data and do all the fine-tuning.

Keep Going, Sometimes tech does not understand business, and products are built to fail.

Knowledge is

Mapping business to tech to support future ways of doing business
Making it flexible to scale, port, migrate
Think Business first, Scale next, Tech at last

What is the new learning format

Domain understanding - Technology evolves faster than we think. New forms of business evolve
Data understanding - Know the type of data - fast / slow data
Research paper - Insights / Blogs - Look for Leaders in the space and their tech stack, Look for research papers and insights
Model development / Model implementation

Keep Thinking!!!

July 10, 2021

Technology learning

Technology learning - Sometimes we overrate what we don't know. The fundamentals remain the same. Many times we do not connect past learnings. Spark and SQL Server lessons, for example, can be looked at through concepts, examples, and implementation (making data immutable with RDDs, etc.).

I liked this comparison - "Keep in mind spark uses memory much in the same way as sql server uses the buffer pool by storing frequently used objects in memory it reduces overall I/O and improves performance in large joins, sort and aggregates contrast this with a traditional hadoop based architecture which relies heavily on writing data out to disk between steps."

Every technical concept maps to an advancement over some limitation that existed before. We need more connected learnings!!!

July 08, 2021

Computer Vision Lip Reading - Use Case Analysis

Paper #1 - Computer Vision Lip Reading

Key Notes

Extract Face, Extract Lips / Mouth area
Depth map with an MS Kinect sensor
Dlib based face landmarks
Deep network trained for numbers detection

Paper #2 - Deep Learning for Lip Reading using Audio-Visual Information for Urdu Language

Key Notes

A sequence of T frames is used as input, and is processed by 3 layers of STCNN, each followed by a spatial max-pooling layer
Explore as words, Digits

Lip Reading Datasets

Lipreading Demo by Convolutional Neural Network, Link2

More Reads

HLR-Net: A Hybrid Lip-Reading Model Based on Deep Convolutional Neural Networks
Automatic Lip-Reading System Based on Deep Convolutional Neural Network and Attention-Based Long Short-Term Memory

Keep Thinking!!!

Forecasting Notes

Paper #1 - Time Series Forecasting Principles with Amazon Forecast

Types of Forecasting

Long term - Strategic
Short term - Operations day to day business
Promotions - Seasonal based
Impact of price, promotion on sales numbers

Key parameters in Retail

Sku, Timestamp, units sold at sku level
Sku metadata - color, department, size
Price data - Price at that point in time
Promotional information of sku
Instock or purchased product

Sales forecasts could be done at each SKU level

Forecast (Target) - Units sold = (Day of week) + WeekendFlag + PromotionalFlag + IsSeasonalProduct + IsTop10SellerForseason + IsTop10inOnlinechannel + IsForAllAgegroups + IsforOld + IsforTeens + IsLowAlcoholic + IsAllweatherItem + Weatherofday + ProductPriceontheDay + IsthereBundleOffer

Additional Insights of time

‘Year’, ‘Month’, ‘Week’, ‘Day’, ‘Dayofweek’, ‘Dayofyear’, ‘Is_month_end’, ‘Is_month_start’, ‘Is_quarter_end’, ‘Is_quarter_start’, ‘Is_year_end’, and ‘Is_year_start’.
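
These calendar features can be derived from a date with the standard library alone. A minimal sketch: the function name is hypothetical, the keys follow the names listed above, and ISO week numbering with Monday = 0 for the day of week is an assumption that may differ from the tooling used in the original source.

```python
import calendar
from datetime import date


def date_features(day):
    """Calendar features for one date, keyed by the names used in the list above."""
    last_day_of_month = calendar.monthrange(day.year, day.month)[1]
    is_month_start = day.day == 1
    is_month_end = day.day == last_day_of_month
    return {
        "Year": day.year,
        "Month": day.month,
        "Week": day.isocalendar()[1],  # ISO week number (assumption)
        "Day": day.day,
        "Dayofweek": day.weekday(),  # Monday = 0 (assumption)
        "Dayofyear": day.timetuple().tm_yday,
        "Is_month_end": is_month_end,
        "Is_month_start": is_month_start,
        "Is_quarter_end": is_month_end and day.month in (3, 6, 9, 12),
        "Is_quarter_start": is_month_start and day.month in (1, 4, 7, 10),
        "Is_year_end": day.month == 12 and day.day == 31,
        "Is_year_start": day.month == 1 and day.day == 1,
    }


features = date_features(date(2021, 6, 30))
print(features)
```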

Data Insights

Aggregate sales by week, day, quarter, holidays, weekends

Handling Missing Data

Zero filling
NaN
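
The two options differ in what a gap is taken to mean: zero filling treats a missing day as "nothing sold", while NaN marks the value as unknown (for example, when the item was not available). A minimal sketch of both on a daily series; the function name and the sample data are hypothetical.

```python
from datetime import date, timedelta


def fill_missing_days(sales_by_day, start, end, fill_value):
    """Return a dense daily series; days absent from sales_by_day get fill_value."""
    dense = {}
    day = start
    while day <= end:
        dense[day] = sales_by_day.get(day, fill_value)
        day += timedelta(days=1)
    return dense


# Units sold per day for one SKU, with 2 July missing (made-up data).
sales = {date(2021, 7, 1): 5, date(2021, 7, 3): 2}
zero_filled = fill_missing_days(sales, date(2021, 7, 1), date(2021, 7, 3), 0)
nan_filled = fill_missing_days(sales, date(2021, 7, 1), date(2021, 7, 3), float("nan"))
print(zero_filled)
print(nan_filled)
```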

The weighted quantile loss (wQuantileLoss) calculates how far the forecast is from actual demand in either direction as a percentage of demand on average in each quantile

For the p10 forecast, the true value is expected to be lower than the predicted value 10% of the time

For the p90 forecast, the true value is expected to be lower than the predicted value 90% of the time
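
As a rough sketch, the weighted quantile loss at quantile tau can be computed as below. The formula is my understanding of the Amazon Forecast definition (twice the summed pinball loss divided by the summed absolute demand), so treat it as an assumption to check against the official documentation; the function name and sample numbers are hypothetical.

```python
def weighted_quantile_loss(actuals, quantile_forecasts, tau):
    """Twice the summed quantile (pinball) loss, normalized by total absolute demand."""
    numerator = 0.0
    for actual, forecast in zip(actuals, quantile_forecasts):
        under_forecast = max(actual - forecast, 0.0)  # forecast was too low
        over_forecast = max(forecast - actual, 0.0)   # forecast was too high
        numerator += tau * under_forecast + (1.0 - tau) * over_forecast
    return 2.0 * numerator / sum(abs(actual) for actual in actuals)


# Actual demand and a p90 forecast for three periods (made-up numbers).
actuals = [10.0, 20.0, 30.0]
p90_forecasts = [12.0, 25.0, 28.0]
p90_loss = weighted_quantile_loss(actuals, p90_forecasts, tau=0.9)
print(p90_loss)
```

At tau = 0.9 an under-forecast costs nine times as much as an over-forecast of the same size, which is why a p90 forecast sits above the actual value most of the time.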

Models

ARIMA
Prophet
DeepAR+
Vector Autoregressive Moving Average with eXogenous regressors model

Link #2 - Time series forecasting

Forecast multiple steps:

Single-shot: Make the predictions all at once.
Autoregressive: Make one prediction at a time and feed the output back to the model.
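
The difference between the two strategies is mostly in how the model is called. A toy sketch with hypothetical stand-in models (a simple linear extrapolation, not a trained network): the autoregressive version feeds each prediction back in as input, while the single-shot version returns every step from one call.

```python
def predict_next(history):
    """Hypothetical one-step model: extend the last observed change."""
    return history[-1] + (history[-1] - history[-2])


def forecast_autoregressive(history, steps):
    """Predict one step at a time and feed each output back into the model."""
    window = list(history)
    predictions = []
    for _ in range(steps):
        next_value = predict_next(window)
        predictions.append(next_value)
        window.append(next_value)  # the prediction becomes part of the input
    return predictions


def forecast_single_shot(history, steps):
    """Hypothetical multi-output model: all horizons from the same inputs in one call."""
    slope = history[-1] - history[-2]
    return [history[-1] + slope * k for k in range(1, steps + 1)]


history = [10, 12, 14]
autoregressive_forecast = forecast_autoregressive(history, steps=3)
single_shot_forecast = forecast_single_shot(history, steps=3)
print(autoregressive_forecast, single_shot_forecast)
```

On this perfectly linear toy series both give the same answer. With a real model they generally differ, because errors in early autoregressive steps carry into later ones.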

Evaluation of Time Series Forecasting Models for Estimation of PM2.5 Levels in Air

July 04, 2021

One Liners, Concepts, Slowly Changing Dimensions

SCD Summary

Sometimes one link is good enough to summarize

Type 1 - Overwrite previous value
Type 2 - Add new row, Deactivate old record, activate new one
Type 3 - Add new attribute - Activation Date / Effective Date
Type 4 - Add History Table
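
Type 2 is the one that usually needs code, so here is a minimal in-memory sketch of it. The row layout (key, start_date, end_date, is_current) and the function name are hypothetical; in a real warehouse the same steps would typically be an UPDATE plus an INSERT.

```python
from datetime import date


def apply_scd_type2(rows, key, new_attributes, effective_date):
    """Deactivate the current row for `key` and append a new active row."""
    for row in rows:
        if row["key"] == key and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = effective_date
    rows.append({
        "key": key,
        **new_attributes,
        "start_date": effective_date,
        "end_date": None,
        "is_current": True,
    })
    return rows


# One customer dimension row, then a change of city (made-up data).
customers = [{
    "key": 1,
    "city": "Chennai",
    "start_date": date(2020, 1, 1),
    "end_date": None,
    "is_current": True,
}]
apply_scd_type2(customers, key=1, new_attributes={"city": "Bengaluru"},
                effective_date=date(2021, 7, 4))
print(customers)
```

The old row is kept with its end date filled in, so history can still be queried, and exactly one row per key stays active.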

Docker - Docker is a tool designed to make it easier to create, deploy, and run applications by using containers

Kubernetes - Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services

Docker vs VM

In Docker, the containers running share the host OS kernel
A Virtual Machine, on the other hand, is not based on container technology. It is made up of the user space plus the kernel space of an operating system

More Reads

Kubernetes cheatsheet

Keep Simplifying Concepts!!!

July 02, 2021

Learning vs Knowing vs Experimenting Vs Measure of Skills

A project X needs 10 different things

4 things you have worked on in multiple projects, so you know how they work
3 things for which you did a hello world, so you know the basics
3 things you read up on Stack Overflow to fill the gaps

The goal is to get a working implementation of the idea. Some things you know but never dived deep into. Others you implemented and know in depth because you worked on them in multiple projects.

We may not master or remember all 10, and we cannot wait to master all 10 before building our idea. The measure of knowledge is the ability to experiment and build, not just familiarity with all 10 tools or technologies. Time to change the way we look at skills.

Keep Thinking!!!
