Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): August 2021

August 31, 2021

Building Resilient Supply Chains with AI - Webinar Notes

Key Notes

Covid-19, Suez Canal
New markets, New products
Demand for more WFH essentials
Long term forecast and short term finetuning

Demand Sensing - Adapting to fluctuations
No Historical data
Excel Models, Rules-based models
History keeps changing every few months

Macro economic factors
Consumer price index
Producer price index
Unemployment claims
Mobility Data
Inflation

Country - Dept - Store - Sku - Sales
Department - Product - Delivery Time Delay
Deploying all these models, real-time deployments
SEIRD Models
SEIRD Model Paper
Epidemic analysis of COVID-19 in China by dynamical modeling
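
A minimal sketch of the SEIRD compartments (Susceptible, Exposed, Infectious, Recovered, Deceased), integrated with a simple Euler step. The function name and all parameter values (beta, sigma, gamma, mu) are illustrative assumptions, not numbers from the paper or the webinar.

```python
def simulate_seird(s, e, i, r, d, beta, sigma, gamma, mu, days, dt=1.0):
    """Euler integration of a basic SEIRD model; returns one tuple per step."""
    n = s + e + i + r + d  # total population, held constant in the contact term
    history = [(s, e, i, r, d)]
    for _ in range(int(days / dt)):
        new_exposed = beta * s * i / n * dt   # S -> E
        new_infectious = sigma * e * dt       # E -> I
        new_recovered = gamma * i * dt        # I -> R
        new_deaths = mu * i * dt              # I -> D
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered - new_deaths
        r += new_recovered
        d += new_deaths
        history.append((s, e, i, r, d))
    return history


# Hypothetical parameters, chosen only to make the example run.
trajectory = simulate_seird(s=9990, e=0, i=10, r=0, d=0,
                            beta=0.3, sigma=0.2, gamma=0.1, mu=0.01, days=60)
print([round(x) for x in trajectory[-1]])
```

Every transfer leaves one compartment and enters another, so the five compartments always sum to the starting population.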

H2O.AI also seems to be one more kubeflow customization

Google Vertex also seems a combination of google AutoML + Kubeflow Customization

Keep Exploring!!!

August 30, 2021

Vision - Webinar Notes - Life of Vision Project - OpenCV - LabelImg - Object Detection - Tensorflow Lite - KFServing - MySQL

Some talks summarize all of your work and also cover the aspects you handled implicitly as part of it. A nice talk to connect with vision work :)

Key Notes

Workflow for Image Solutions
Dataset preparation
Data Annotation
Training / Benchmark
Pre-trained + Transfer Vs Custom Model
Metrics for benchmarking

Data Collection

Data set type
Streaming or Image
Data formats
Single image / frames
Video - Frame - Feed model
Image Resolution / Frame rates sampling
Reduce frame rate to support more streams
Preprocessing work
Crop noisy areas
Select areas of interest
Data Generation
Data Augmentation
Simple techniques including vision tricks - rotation, transformation, different angles
GAN / Synthetic data generation techniques
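
The "reduce frame rate to support more streams" note can be sketched as picking which frame indices to keep. This is an illustrative helper (the name `sample_frame_indices` is hypothetical); a real pipeline would apply the indices while decoding the video.

```python
def sample_frame_indices(total_frames, source_fps, target_fps):
    """Return the indices of frames to keep when downsampling the frame rate."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if target_fps >= source_fps:
        return list(range(total_frames))  # nothing to drop
    step = source_fps / target_fps  # keep roughly one frame every `step` frames
    indices = []
    position = 0.0
    while int(position) < total_frames:
        indices.append(int(position))
        position += step
    return indices


kept = sample_frame_indices(total_frames=30, source_fps=30, target_fps=5)
print(kept)  # [0, 6, 12, 18, 24]
```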

Data Annotate

Annotate / Review
Validate with SME
Bounding boxes / Segmentation / Labels
Single / Multiple objects / Classes
Occlusion, Light, settings
Partially available surfaces
Fine-grained annotation or not
Data set representation against bias
Coverage of possible classes
Models for Day time vs Night time

Model Training

Segmentation / Custom Detection
Post-processing
Transfer Learning

Model Optimization

Prune / Quantize
Inference Engines
CPU / GPU / FPGA
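
To make "quantize" concrete, here is a small sketch of symmetric int8 quantization of a weight list. It is one common scheme, assumed here for illustration; the talk did not specify which method was used, and the function names are hypothetical.

```python
def quantize_symmetric_int8(weights):
    """Map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


weights = [-1.0, -0.4, 0.0, 0.25, 1.0]
q, scale = quantize_symmetric_int8(weights)
restored = dequantize(q, scale)
print(q, [round(w, 3) for w in restored])
```

The reconstruction error per weight stays within half a quantization step, which is the accuracy cost traded for a smaller, faster model.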

Benchmark

Testing on Deployable hardware
Number of endpoints
Load vs Response
Re-annotate / Re-train
Ensemble or Single Model

Deploy

Edge vs Cloud
Edge Server - Lightweight models
Address based on the constraint, workloads for edge devices
Hybrid approach both edge + cloud
Model interface with application
Storing Results in DB
Real-time notification or just store

Model Monitoring

Monitoring for data/accuracy of detections
Pick low accuracy results / retrain them
Capture when confidence is less than 50%
Continuous re-learning
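
The "capture when confidence is less than 50%" rule can be sketched as a filter over detection results. The dictionary layout (`label`, `confidence`) is an assumed format for illustration, not the output of any specific framework.

```python
def select_for_review(detections, threshold=0.5):
    """Return detections whose confidence falls below the review threshold."""
    return [d for d in detections if d["confidence"] < threshold]


# Hypothetical detection results.
detections = [
    {"label": "person", "confidence": 0.92},
    {"label": "helmet", "confidence": 0.41},
    {"label": "person", "confidence": 0.50},
]
review_queue = select_for_review(detections)
print(review_queue)
```

Items in the review queue would then go back to annotation and into the next retraining round.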

End to End platform for this

Keep Connecting the Dots!!!

August 25, 2021

Anomaly Reads

Ref - Link

Summary

Point Anomalies - Value is far outside the entirety of the data set
Conditional Outliers - Anomalous only with respect to context; the same value may not be an anomaly at another time
Collective Outliers - A set of points that together deviate from the rest of the dataset

Telemanom

Paper - Machine Learning for Time Series Anomaly Detection

Key Notes

Clustering methods do not require the data to be labeled, making them a good fit for our unsupervised task. They are, however, very sensitive to outlier data points.

Two-Step Process

The number of clusters can be set to 2 (one anomalous and one normal)
Summarized by taking averages across an interval of one hour
Rolling Window Sequences
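
A minimal sketch of the two preparation steps noted above: averaging readings per hour, then cutting the result into rolling window sequences. The function names and the `(timestamp, value)` input format are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(readings):
    """Average (timestamp, value) readings into one value per hour, in time order."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        buckets[timestamp.replace(minute=0, second=0, microsecond=0)].append(value)
    return [sum(v) / len(v) for _, v in sorted(buckets.items())]


def rolling_windows(values, window_size, step=1):
    """Cut a sequence into overlapping windows of fixed length."""
    return [values[i:i + window_size]
            for i in range(0, len(values) - window_size + 1, step)]


readings = [
    (datetime(2021, 8, 25, 9, 5), 10.0),
    (datetime(2021, 8, 25, 9, 40), 14.0),
    (datetime(2021, 8, 25, 10, 15), 20.0),
]
print(hourly_means(readings))
print(rolling_windows([1, 2, 3, 4, 5], window_size=3))
```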

On the Nature and Types of Anomalies: A Review of Deviations in Data

Patent - FAST AUTOMATED DETECTION OF SEASONAL PATTERNS IN TIME SERIES DATA WITHOUT PRIOR KNOWLEDGE OF SEASONAL PERIODICITY

Key Notes

Calculate autocorrelation based on time series values
Identify local maxima
The seasonal trend identification module
Data store for Normal data, Anomaly data
Scoring module
Human in loop feedback system
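
The first two steps (autocorrelation, then local maxima) can be sketched as follows. This is a simplified illustration of the general idea, not the patented method; the function names are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0
    covariance = sum((series[t] - mean) * (series[t + lag] - mean)
                     for t in range(n - lag))
    return covariance / variance


def candidate_periods(series, max_lag):
    """Lags where the autocorrelation is a local maximum."""
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 1)]
    return [lag for lag in range(1, max_lag)
            if acf[lag] > acf[lag - 1] and acf[lag] > acf[lag + 1]]


series = [0, 1, 2, 3] * 6  # synthetic data with a period of 4
print(candidate_periods(series, max_lag=10))  # [4, 8]
```

The smallest candidate is the seasonal period; larger candidates are its multiples.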

Sklearn Models for Supervised Anomaly Detection. Some popular scikit-learn models for supervised anomaly detection include:

KNeighborsClassifier
SVC (SVM classifier)
DecisionTreeClassifier
RandomForestClassifier

Unsupervised and statistical techniques (these are not supervised classifiers):

Interquartile Range
Isolation Forest
Median Absolute Deviation
K-Nearest Neighbours
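
Two of the statistical techniques in the list, Interquartile Range and Median Absolute Deviation, fit in a few lines of standard-library Python. The cutoffs used here (1.5 x IQR and a modified z-score of 3.5) are common conventions, assumed for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


data = [10, 11, 12, 11, 10, 12, 11, 95]
print(iqr_outliers(data), mad_outliers(data))  # [95] [95]
```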

August 22, 2021

The Dark side of Analytics

Interesting read - When algorithms dictate your work: Life as a food delivery ‘partner’

This applies to all aggregators - OLA, UBER, etc..

Key Perspectives

The Trap

The illusion of guaranteed income while the variable incentives seem attractive initially
Incentives riddled with a bunch of terms and conditions

Chasing Dreams

Competition to be top performers
Physical and mental costs of driving to complete those targets
Average of 10 hours of being on the road
A job that demands a large amount of time

Eating outside has costed my health very badly. In a way, this is not a sustainable business model. Until 35 your body will not show any problems. The effects will come up after 35.

This business model takes a toll on workers, and it does not prepare them for a better next job. Doing a good job vs Live your day vs Work for a better tomorrow.

Sustainable business models vs Profitable business models vs Inclusive social growth is always a question.

I am scared if the algorithm starts measuring my time

How long have you looked at PC
How many lines of code written
How many functionalities worked

Life is too short to see everything through the same lens.

Keep Thinking!!!

Transformer - Let's relearn

These topics come up on and off. I was able to catch up with sliding windows, CNN, RNN, LSTM. Then a bit of Transformers, and also how they work in vision too :)

AI / ML shouldn't make us feel guilty, but you still have to learn the basics.

Paper - Attention Is All You Need

Key Lessons

Representation of the sequence
Intra-attention of sequence order
Encoder-decoder structure
Encoder - a sequence of continuous representations
The decoder then generates an output sequence (Positional encoding)
Multi-Head Attention consists of several attention layers
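
Each attention layer in the paper computes scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A small pure-Python sketch of a single head follows; the function names are my own, and the learned projection matrices and multi-head concatenation are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Weight each value by the softmax of the query-key similarity."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# Identical keys give uniform weights, so the output is the mean of the values.
out = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 1.0], [1.0, 1.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
print(out)
```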

Unofficial Walkthrough of Vision Transformer

An image is also just pixels; once pixel representations are learned, the same encoding / decoding can be applied.

Transformers for Image Recognition at Scale

Key Notes

Input image as a sequence of image patches, similar to the sequence of word embeddings
The Vision Transformer treats an input image as a sequence of patches
ViT can learn features hard-coded into CNNs (such as awareness of grid structure)
Image classification with Vision Transformer

AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE

Split an image into fixed-size patches
Linearly embed each of them
Add position embeddings
Feed the resulting sequence of vectors to a standard Transformer encoder
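
The first step, splitting an image into fixed-size patches, can be sketched in pure Python on a single-channel image stored as nested lists. The linear embedding and position embeddings are learned parameters and are not shown; `image_to_patches` is a hypothetical helper name.

```python
def image_to_patches(image, patch_size):
    """Split a 2D image into flattened, non-overlapping square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([image[r][c]
                            for r in range(top, top + patch_size)
                            for c in range(left, left + patch_size)])
    return patches


# A 4x4 image with pixel values 0..15, split into 2x2 patches.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(image, patch_size=2)
print(len(patches), patches[0], patches[-1])  # 4 [0, 1, 4, 5] [10, 11, 14, 15]
```

The sequence length seen by the Transformer is (height / patch_size) x (width / patch_size), which is why larger patches make the model cheaper.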

Do Vision Transformers See Like Convolutional Neural Networks?
pdf: https://t.co/5Yz5F2PZwO
abs: https://t.co/bpHO2rOYDv

The authors find striking differences between the two architectures, such as ViT having more uniform representations across all layers.
- AK (@ak92501), August 20, 2021

Do Vision Transformers See Like Convolutional Neural Networks?

Lower half of ResNet layers are similar to around the lowest quarter of ViT layers
Highest ViT layers dissimilar to lower and higher ResNet layers.

Keep Thinking!!!

August 21, 2021

Man vs Machines

Interesting Read - Link

Perspectives from Stylist

We knew from the beginning we were teaching the algorithm
In recent months, sending out boxes of clothing that were entirely selected by the algorithm
It’s like we’re constantly making the algorithm better by fixing their mistakes

Phases of this Progress

Stylist recommendation (Observed by ML)
Model Training based on recommendation (Learnt by ML)
The model recommends, stylist finetunes it (ML Leads vs Stylist Finetunes)

Other reasons are also cited - changes in flexible working hours, minimum working hours, etc..

I have the same perspective on the Tesla humanoid robot: how many jobs can it automate or eliminate? Machines never get tired or frustrated.

The interesting debate is lives vs cost savings vs empathy. Hope we build machines balancing all these aspects.

Keep Thinking!!!

August 20, 2021

Observations and Perspectives

Sometimes common sense works better than data science

Best practices are #point-in-time best practices with respect to the #data, #technology, #time, availability of #skills, time to fix solutions. It is easier to create but difficult to #evolve from where we are to what we want to become.

Data Science #Skills = #DiversityThinking = #DatabaseSkills + #DomainPerspectives + #DataScience thinking hat + Possibilities with available #data #bigpicture

Keep Questioning!!!

August 18, 2021

Reads - Forecasting

Paper #1 - Hierarchical forecasting with a top-down alignment of independent level forecasts

Key Notes

Deep learning forecasting approach N-BEATS for continuous-time series on top levels
Tree-based algorithm LightGBM for the bottom level intermittent time series
Split the time series into the non-zero component and stochastic component
The lowest level of the hierarchy exhibits a strong intermittent pattern
Upper hierarchy levels contain forecastable components such as the trend or seasonality aggregated by the lowest level
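
One simple way to picture aligning independent bottom-level forecasts to a top-level forecast is proportional scaling, so that the children sum to the parent. This is an illustrative simplification and an assumption on my part, not the paper's algorithm; the SKU names are hypothetical.

```python
def align_to_top(top_forecast, bottom_forecasts):
    """Scale bottom-level forecasts so that they sum to the top-level forecast."""
    total = sum(bottom_forecasts.values())
    if total == 0:
        # No information at the bottom level: split the top forecast evenly.
        share = top_forecast / len(bottom_forecasts)
        return {key: share for key in bottom_forecasts}
    return {key: value * top_forecast / total
            for key, value in bottom_forecasts.items()}


aligned = align_to_top(120.0, {"sku_a": 30.0, "sku_b": 50.0, "sku_c": 20.0})
print(aligned)
```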

Paper #2 - Hierarchical Dynamic Modeling for Individualized Bayesian Forecasting

Key Notes

Models for personalized forecasting should be

able to incorporate predictor information such as price and promotions,
adaptable to time-varying trends, regression effects and unforeseen temporal changes,
interpretable and open to intervention by users and downstream decision makers,
fully probabilistic to properly characterize forecast uncertainties and allow formal model and forecast assessment under multiple metrics,
adapted to hierarchical settings, and amenable to automated, computationally efficient sequential learning and forecasting.

We define three household groups based on total items purchased over the course of the 112 weeks:

Household Group 1: high spending and purchasing households
Household Group 2: moderate spending and purchasing households
Household Group 3: lower spending and purchasing households

Household Group, Proportion Return, Mean Spend, Median Spend, SD Spend

More Reads

Optimal Combination Forecasts on Retail Multi-Dimensional Sales Data

hts: An R Package for Forecasting Hierarchical or Grouped Time Series

Keep Thinking!!!

August 16, 2021

Forecasting Reads - Research papers - Retail

Paper #1 - An industry case of large-scale demand forecasting of hierarchical components

Key Notes

Demand forecasting system of electronic components in manufacturing

Algos leveraged - 1) Adaboost, 2) ARIMAX, 3) ARIMA, 4) Bayesian Structural Time Series (BSTS), 5) Bayesian Structural Time Series with a Bayesian Classifier (BSTS Classifier), 6) Ensemble of Gradient Boosting (Ensemble), 7) Ridge regression (Ridge), 8) Kernel regression (Kernel), 9) Lasso, 10) Matrix Factorization from section VII (MF), 11) Neural Network (NN), 12) Poisson regression (Poisson), 13) Random Forest (RF), 14) Support Vector Regression (SVR).

Techniques on (1) data pre-processing, (2) prediction, and (3) model selection
Symmetric Mean Absolute Percent Error (SMAPE) serves to evaluate the performance of the models
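
A small sketch of SMAPE, using the common definition with the mean of |actual| and |forecast| in the denominator. Several variants of SMAPE exist, so treat this as one convention rather than the exact formula used in the paper.

```python
def smape(actuals, forecasts):
    """Symmetric Mean Absolute Percent Error, in percent (0 to 200)."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, forecast in zip(actuals, forecasts):
        denominator = (abs(actual) + abs(forecast)) / 2
        # Treat a 0/0 term as zero error, a common convention.
        if denominator != 0:
            total += abs(forecast - actual) / denominator
    return 100 * total / len(actuals)


score = smape([100, 200], [110, 180])
print(round(score, 3))  # 10.025
```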

Paper #2 - Learnings from Kaggle’s Forecasting Competitions

Key Notes

High-frequency series at weekly, daily, and sub-daily levels
Frequency data in the form of weekly, daily and hourly data
Three full seasonal periods were required at each frequency
Themes examined: i) complex vs. simple models, ii) cross-learning, iii) prediction uncertainty, and iv) ensembling
Walmart Store Sales and the Rossmann competitions
Sales by store/department/week and store/day
Forecasts of unit sales being required by product/store/day

Data Preprocessing

Set NA or Negative values to zero.
Remove time series with all zero values.
Remove leading zeros.
To calculate the feature vectors, we use the R package feasts
Apply principal components for dimensionality reduction using the prcomp function
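
The first three cleaning steps can be sketched in plain Python. The function name is hypothetical, and the feature extraction and PCA steps (done in R in the paper) are not reproduced here.

```python
import math


def clean_series(series):
    """Zero out NA/negative values, trim leading zeros, drop all-zero series."""
    cleaned = []
    for value in series:
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        cleaned.append(0.0 if missing or value < 0 else float(value))
    # Trim leading zeros; a series with no non-zero value is removed (None).
    for start, value in enumerate(cleaned):
        if value != 0:
            return cleaned[start:]
    return None


print(clean_series([0, None, -3, 5, 0, 2]))  # [5.0, 0.0, 2.0]
print(clean_series([0, -1, float("nan")]))   # None
```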

Most of the top performers used ensembles of global XGBoost models to create forecasts, but a few of them did include local XGBoost models as part of their ensemble
Holidays and promotions turned out to be essential for obtaining high performance in this competition
Global ensemble models outperform local single models

Feature Extraction

Day of Week
Weekend
IsHoliday
IsPromotionDay
IsMonthEnd
IsYearEnd
IsQuarterEnd
IsLocalHoliday
WeekOfYear
Rolling Window
Average of 2 - 3 - Weeks
Moving Average Numbers
Mean Every 2 Weeks
Incremental Differences Everyday
Adding Averages / Means - Weekly Average, Daily Average
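
Most of the calendar flags above can be derived from the date alone. A minimal sketch with the standard library follows; the holiday and promotion calendars are assumed inputs that would come from business data.

```python
from datetime import date, timedelta


def calendar_features(day, holidays=frozenset(), promotion_days=frozenset()):
    """Derive calendar flags for one date."""
    next_day = day + timedelta(days=1)
    month_end = next_day.month != day.month
    return {
        "DayOfWeek": day.weekday(),  # Monday = 0
        "Weekend": day.weekday() >= 5,
        "IsHoliday": day in holidays,
        "IsPromotionDay": day in promotion_days,
        "IsMonthEnd": month_end,
        "IsQuarterEnd": month_end and day.month in (3, 6, 9, 12),
        "IsYearEnd": day.month == 12 and day.day == 31,
        "WeekOfYear": day.isocalendar()[1],
    }


def moving_average(values, window):
    """Trailing moving average; the first window-1 points are dropped."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


features = calendar_features(date(2021, 8, 31))
print(features)
print(moving_average([1, 2, 3, 4, 5], window=3))  # [2.0, 3.0, 4.0]
```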

Paper #3 - An Empirical Analysis of Feature Engineering for Predictive Modeling

Following sixteen selected engineered features:

Counts
Differences
Distance Between Quadratic Roots
Distance Formula
Logarithms
Max of Inputs
Polynomials
Power Ratio (such as BMI)
Powers
Ratio of a Product
Rational Differences
Rational Polynomials
Ratios
Root Distance
Root of a Ratio (such as Standard Deviation)
Square Roots
Counts - count engineered feature counts the number of elements in the feature vector that satisfies a certain condition
Statisticians have long used logarithms and power functions to transform the inputs to linear regression
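
Three of the sixteen feature types, sketched as plain functions: a count, a power ratio (BMI), and the distance between quadratic roots. The function names and sample inputs are my own illustration.

```python
import math


def count_above(values, threshold):
    """Counts: number of elements in the feature vector above a threshold."""
    return sum(1 for v in values if v > threshold)


def bmi(weight_kg, height_m):
    """Power ratio: weight divided by height squared."""
    return weight_kg / height_m ** 2


def quadratic_root_distance(a, b, c):
    """Distance between the real roots of a*x^2 + b*x + c, or None if complex."""
    discriminant = b * b - 4 * a * c
    if a == 0 or discriminant < 0:
        return None
    return math.sqrt(discriminant) / abs(a)


print(count_above([1, 5, 7, 2], threshold=3))        # 2
print(round(bmi(70, 1.75), 2))                       # 22.86
print(quadratic_root_distance(1, -5, 6))             # 1.0
```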

Paper #4 - VEST: Automatic Feature Engineering for Forecasting

sku,wkno,saleqty
cluster and forecast
DTW - Dynamic Time Warping metric for clustering time series
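
A minimal sketch of the classic dynamic-programming DTW distance between two series, using absolute difference as the local cost. This is the textbook formulation without windowing constraints; it is offered as background, not as the code used in the paper.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([1, 2, 3], [2, 3, 4]))     # 2.0
```

Series with the same shape but stretched in time get a distance of zero, which is why DTW suits clustering weekly sales curves (sku, wkno, saleqty) that peak at slightly different weeks.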

My list of feature variables for offline retail stores

Store level stats

Date
StoreId
Items in Store
Traffic Count
Holiday / Festival
Number of Item Categories
Weather
Out of Stock Items
Cost of Products, The price value of SKU
Promotional Offers / Seasonal Information
Weather Information on Store on that Day
Store operational timings
Store Labour Details

Data Product Thinking

With 20% more restock of this item, out-of-stock events might drop by 10% and traffic might improve by 5% (instead of a blind forecast, provide a collated recommendation)
With a 20% reduction in tomorrow's traffic, the corresponding items or % of sales can be presented
Multiple models will run behind these decisions to generate the recommendations

More Reads

Keep Thinking!!!

August 15, 2021

Measure Experience from Projects / Domain / Versatility / Evolving Perspectives

Career perspective at different stages

10 years of Working on the Same project = 10 years of Experience?
10 years Experience in 1 domain vs 5 domains (Multi-domain exposure)
Ability to Translate Bird's eye view to Prototype
Map prototype to Implementation tasks
Work with the storyline in mind rather than just near-focus tasks
Code with Clarity vs Code with limited visibility
Code with Domain knowledge vs Code and refactor based on domain knowledge
Partner with Business and Work vs Work and Rework again for business
Familiarity of Technology vs Visibility of use cases vs Clarity of implementation vs Ability to explain in an implementation perspective
Map trends vs Current architecture vs Time vs Focus
Always keep thinking from multiple diverse perspectives

Keep Thinking!!!

Interesting Reads - Books H1

Some of the books I was able to review in the last 6 months. We need to revise again and again and experiment.

Bird's eye view
30K Perspective
20K Perspective

I am still learning. I take time to build my perspective. Experience is a mix of learning, doing, knowing, connecting with industry experts. Always be open to learning - unlearn - relearn.

Books List for future reference

O'Reilly - A Practical Introduction to Supply Chain
O'Reilly - Agile Conversations
O'Reilly - AI Blueprints
O'Reilly - Architecture Patterns with Python
O'Reilly - Beautiful Code
O'Reilly - Bioinformatics Programming Using Python
O'Reilly - Breaking Out: How to Build Influence in a World of Competing Ideas
O'Reilly - Building Evolutionary Architectures
O'Reilly - Change Your Life with CBT
O'Reilly - Cloud Analytics with Microsoft Azure - Second Edition
O'Reilly - Communicate to Influence: How to Inspire Your Audience to Action
O'Reilly - Data Governance: The Definitive Guide
O'Reilly - Data Lake Analytics on Microsoft Azure: A Practitioner's Guide to Big Data Engineering
O'Reilly - Design Thinking for Training and Development
O'Reilly - Designing Data-Intensive Applications
O'Reilly - Digital Supply Networks: Transform Your Supply Chain and Gain Competitive Advantage with Disruptive Technology and Reimagined Processes
O'Reilly - Empathy (HBR Emotional Intelligence Series)
O'Reilly - Exam Ref AZ-303 Microsoft Azure Architect Technologies
O'Reilly - Fluent Python, 2nd Edition
O'Reilly - Focus (HBR Emotional Intelligence Series)
O'Reilly - Fundamentals of Supply Chain Theory, 2nd Edition
O'Reilly - Graph Algorithms
O'Reilly - Hands-On Vision and Behavior for Self-Driving Cars
O'Reilly - How Stitch Fix uses human-in-the-loop machine learning for personalization
O'Reilly - How to Persuade and Influence People: Powerful techniques to get your own way more often
O'Reilly - Influence and Persuasion (HBR Emotional Intelligence Series)
O'Reilly - Influence in Action: How to Build Your Conversational Capacity, Do Meaningful Work, and Make a Powerful Difference
O'Reilly - Kubernetes in Action
O'Reilly - Learning Python, 4th Edition
O'Reilly - Linear Programming and Resource Allocation Modeling
O'Reilly - Logistics Management
O'Reilly - Machine Learning Design Patterns
O'Reilly - Metaheuristics for Logistics
O'Reilly - Nature-Inspired Optimization Algorithms
O'Reilly - Practical Git: Confident Git Through Practice
O'Reilly - Practical Machine Learning for Computer Vision
O'Reilly - Practical MLOps
O'Reilly - Practical Statistics for Data Scientists, 2nd Edition
O'Reilly - Purpose, Meaning, and Passion (HBR Emotional Intelligence Series)
O'Reilly - Python: Master the Art of Design Patterns
O'Reilly - Resilience (HBR Emotional Intelligence Series)
O'Reilly - Self-Awareness (HBR Emotional Intelligence Series)
O'Reilly - Success in Programming: How to Gain Recognition, Power, and Influence through Personal Branding
O'Reilly - Supply Chain and Logistics Management Made Easy: Methods and Applications for Planning, Operations, Integration, Control and Improvement, and Network Design
O'Reilly - Supply Chain Management and its Applications in Computer Science
O'Reilly - Supply Chain Management For Dummies
O'Reilly - Supply Chain Optimization through Segmentation and Analytics
O'Reilly - The Azure Cloud Native Architecture Mapbook
O'Reilly - The Cloud-Based Demand-Driven Supply Chain
O'Reilly - The Science of Influence: How to Get Anyone to Say "Yes" in 8 Minutes or Less!, Second Edition
O'Reilly - Visual CBT: Using pictures to help you apply Cognitive Behaviour Therapy to change your life

Bookmarks for future reference!!!

Keep Thinking!!!

Research Paper Reads - Logs Monitoring

Ideas need a bird's-eye view of the landscape to understand existing work. Papers are the only way to understand that. Bookmarking a few notes for future reference.

Paper #1 - Log-based software monitoring: a systematic mapping study

Key Notes

The Lifecycle of Log

Possible components would be Elasticsearch, Logstash, and Kibana
Kibana provides an interface for visualization, query, and exploration of log data

LOGGING - (1) empirical studies on logging practices, (2) requirements for application logs, and (3) implementation of log statements
LOG INFRASTRUCTURE - (1) log parsing, and (2) log storage
LOG ANALYSIS - (1) anomaly detection, (2) security and privacy, (3) root cause analysis, (4) failure prediction, (5) quality assurance, (6) model inference and invariant mining, (7) reliability and dependability, and (8) log platforms
Log Parsing - “textual similarity” between the log messages.
Each log is converted to a binary vector, with each element representing whether the log contains that keyword
Transformer - TEMPLATE2VEC (as an alternative to WORD2VEC) to represent extracted templates from logs and LSTMs to learn common sequences of log sequences
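
The binary keyword vector mentioned above can be sketched in a few lines. The vocabulary and the whitespace tokenization are assumptions for illustration; real log parsers extract templates first.

```python
def keyword_vector(message, vocabulary):
    """1 if the log message contains the keyword, else 0, per vocabulary entry."""
    tokens = set(message.lower().split())
    return [1 if keyword in tokens else 0 for keyword in vocabulary]


vocabulary = ["connection", "error", "timeout"]  # hypothetical keyword list
vector = keyword_vector("Connection timeout on node 7", vocabulary)
print(vector)  # [1, 0, 1]
```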

Root Cause Analysis

By correlating log messages and resource consumption, their approach builds relationships between changes in resource consumption and application events.
They propose a technique based on the correlation of console logs and resource usage information to link jobs with anomalous behavior and erroneous nodes.

Failure Prediction

Utilize system logs to predict failures by mining recurring event sequences that are correlated

Paper #2 - Multi-Source Anomaly Detection in Distributed IT Systems

Key Notes

Three categories-modalities: metrics, application logs, and distributed traces
Word frequencies and metrics derived from the logs (e.g TF-IDF)
Decompose the trace in its building blocks, the events/spans, and predict the next span in the sequence

Paper #3 - LogBERT: Log Anomaly Detection via BERT

Key Notes

LogBERT leverages the Transformer encoder to model log sequences and is trained by novel self-supervised tasks to capture the patterns of normal sequences.

Baselines

Principal Component Analysis (PCA) [19]. PCA builds counting matrix based on the frequency of log keys sequences and then reduces the original counting matrix into a low dimensional space to detect anomalous sequences
One-Class SVM (OCSVM) [14]. One-Class SVM is a well-known one-class classification model and widely used for log anomaly detection [5,16] by only observing the normal data.
IsolationForest (iForest) [7]. Isolation forest is an unsupervised learning algorithm for anomaly detection by representing features as tree structures.
LogCluster [6]. LogCluster is a clustering based approach, where the anomalous log sequences are detected by comparing with the existing clusters.
DeepLog [2]. DeepLog is a state-of-the-art log anomaly detection approach. DeepLog adopts a recurrent neural network to capture patterns of normal log sequences and further identifies the anomalous log sequences based on the performance of log key predictions.
LogAnomaly [23]. LogAnomaly is a deep learning-based anomaly detection approach and is able to detect sequential and quantitative log anomalies.

Paper #4 - A Survey on Automated Log Analysis for Reliability Engineering

Log event sequence: A sequence of log events recording system’s activities.
Log event count vector: A feature vector recording the log events occurrence
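
A log event count vector is simply a per-event tally over one sequence. A short sketch, with hypothetical event IDs:

```python
from collections import Counter


def event_count_vector(event_sequence, event_ids):
    """Count how often each known event ID occurs in a log event sequence."""
    counts = Counter(event_sequence)
    return [counts[event_id] for event_id in event_ids]


sequence = ["E1", "E2", "E1", "E5"]   # one session's log events, in order
known_events = ["E1", "E2", "E3", "E5"]
print(event_count_vector(sequence, known_events))  # [2, 1, 0, 1]
```

The sequence keeps ordering information for sequence models, while the count vector discards order and suits methods such as PCA or clustering.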

Analysis Insights / Thoughts

The query for selected values vs Bulk Upload of Data
Usage patterns segmented for weekday/weekend / Trading hours
Usage patterns across different time zones
Usage patterns across different sections of applications
Number of ad-hoc queries
Restrict bulk upload to certain timezones / non-peak hours
Two-stage commit - Upload and commit at a later stage

Big picture Notes

Limit users' app access during peak hours (5 calls)
Limit users' app access during non-peak hours (10 calls)
Refer to replicated data in case of data that has stop-gap 5 hours delay
Pagination of results
Cache/reuse of results
Identify maximum reported errors
Patterns of errors over a weekday
User login activities and queries
User value - Application usage vs Revenue
User Action predictions
Take top 100 users, Plot the sequence of usage and see common flow/patterns
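
The call limits in the notes can be sketched as a small per-user, per-hour counter, reading the two limits as 5 calls at peak and 10 off-peak. The peak window (09:00 to 17:00) and the class name are assumptions for illustration.

```python
PEAK_HOURS = range(9, 17)  # assumed peak window, 09:00 to 16:59


def call_limit(hour, peak_limit=5, off_peak_limit=10):
    """Return the allowed number of calls for the given hour of day."""
    return peak_limit if hour in PEAK_HOURS else off_peak_limit


class CallLimiter:
    """Tracks calls per (user, hour) and rejects calls beyond the limit."""

    def __init__(self):
        self.calls = {}

    def allow(self, user_id, hour):
        key = (user_id, hour)
        used = self.calls.get(key, 0)
        if used >= call_limit(hour):
            return False
        self.calls[key] = used + 1
        return True


limiter = CallLimiter()
peak_results = [limiter.allow("user-1", hour=10) for _ in range(6)]
print(peak_results)  # five True values, then False
```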

Diagnosis Perspective

What is the blocking that happens between
Page load query vs Search query
Search query vs Data upload query
Data upload vs Report download query
Measure potential data conflicts that cause issues

Prediction Algorithms for User Actions

Execution model

Understand problem statement
Understand data sources
Understand data access / permissions
Frame NLP / Data / User level details
Initial Analysis Scope
Application Understanding
Connects / Feedback

Diagnosis

User based - Create APIs / Read APIs / Update APIs - Simple / Bulk / Delete APIs - Single / Bulk
Do we track UserId, NumberOfCalls, AvgTime?
Nature of transactions - Realtime vs Reporting vs Bulk inserts vs Bulk Updates
API calls across day by time
% mix of workflows and the common tables mapped / accessed by them - Time dimension added for pattern
A,B,C @ Time T1
A,B at Time T2

More Reads

Keep Thinking!!!

August 14, 2021

Contributions / Efforts / Building Teams

After years, when we look back beyond the company, roles, and projects, what was the impact, what did we learn, and how do we remember the teams we worked with?

Years you work vs Contributions
Years you work vs Learning over time
Years you work vs Mutual Win / Win projects
Years you work vs Going beyond the comfort zone
Years you work vs Self Realization
Years you work vs Meaningful Work
Years you work vs Influence over Peers
Years you work vs Giving back to the community
Years you work vs Sharing your learnings

Everything is part of building a high-performing team. Great teams come from trust, learning, picking the best ideas, and experimenting in the available time.

The only thing constant is finding/becoming a better version of myself compared to yesterday. Just in time answers vs forgotten memories vs Who you are vs What you learnt vs What you lost in the making is a never-ending quest to understand ourselves.

Keep Going!!!

August 12, 2021

Efforts / Learning / Skills / Dimensions

To File a patent - Domain Knowledge + ML Knowledge + Uniqueness of Patent + Business Benefits
To do an MVP - Study competitive products + Assess ML models + Pick relevant data + Get the ROI / Outcome
To Scale a Model - Study Cloud / Deployment architectures + Scalability + Monitoring aspects
To Teach Something - Getting into specifics, Move from practitioner to researcher lens to get insights
To get Ideas - Check blogs, newsletters, books

Everything adds to the dots of ideas, and everything connects in the long run. Many things are not measurable from profiles, but the impact is visible in the right team mix and focus. Coding after gathering the required knowledge vs searching for code and ideas while you code: both reflect your experience.

Thinking is idea / logic / reading / listening. Coding is the method to connect those dots. A lot more things are there which we do not even think about when we write about our skills.

Keep Going!!!

Data Science Skills

Data Science Skills = Diversity Thinking = Database Skills + Domain Perspectives + Data Science thinking hat + Possibilities with available data

Bridging the bird's-eye view + available features + expectations from customers is a refinement process of picking the best ideas and experimenting quickly. Applying multiple contexts/patterns from a practitioner's perspective is ongoing learning.

Myth of Expertise

Big data - Current Trends
NLP - Current Trends
Vision - Current Trends
GAN - Current Trends
Transformers - Current Trends

Knowing vs Doing

Prototypes in Current Trends
Projects with Current Trends
Go-Live with Current Trends

Knowing vs Doing vs Delivering vs Connecting the Dots is a different Skill.

I look to solve a selected set of problems with four hats

Database Developer/ BI perspectives
Data Science Perspective
ML Algos Perspectives
Customers Perspective what makes sense

It is a mix of all these thoughts that makes up a use case. One skill does not make things happen. Unfortunately, these do not stand out in your profile, and relevant skills vs practical implications are even harder to measure.

Keep Thinking!!!

August 10, 2021

Tech Talk - Causal Inference in Data Science From Prediction to Causation

Key Notes

Games Prediction - Higher Activity logins, Friends
It could be the other way around: they play games and then make friends
How do we increase the activity?
Different segments of people - Games to Friends, Ask Friends and play games
Observational metrics - Be mindful of hidden causes

Measure versions of algos - A/B Testing
Impact of Algo on different types of people
Lower activity - Higher CTR
CTR for different segments of users
Segment people and see the behavior of each segment with experiments
Combination of Experiments / Conversions / Measure of it
ML Recommendations
Split groups into different selections of the same category
The choice for new Algos - Frequent Buyers
Choice of old Algos - Low-frequency Buyers
Purchase behavior trend over years
Purchase behavior of new buyers
Experiment - Conversions - Alerts (Forecast vs Actuals)
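
"CTR for different segments of users" can be sketched as a grouped click-through rate per (segment, algorithm variant). The event tuples below are made-up illustration data, not results from the talk.

```python
from collections import defaultdict


def ctr_by_segment(events):
    """Click-through rate per (segment, variant) from (segment, variant, clicked) events."""
    totals = defaultdict(lambda: [0, 0])  # key -> [impressions, clicks]
    for segment, variant, clicked in events:
        bucket = totals[(segment, variant)]
        bucket[0] += 1
        bucket[1] += int(clicked)
    return {key: clicks / impressions
            for key, (impressions, clicks) in totals.items()}


# Hypothetical impression log.
events = [
    ("frequent_buyer", "new_algo", True),
    ("frequent_buyer", "new_algo", False),
    ("low_frequency", "old_algo", True),
    ("low_frequency", "old_algo", True),
    ("low_frequency", "new_algo", False),
]
rates = ctr_by_segment(events)
print(rates)
```

Comparing rates within a segment, rather than overall, is what guards against one large segment hiding the behavior of the others.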

Frameworks

Causal graphical models
Potential outcome framework

What would have happened if you did that?
What would have happened if you had not done that?
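
In the potential outcome framework, the two questions above are the two potential outcomes for each unit, and only one is ever observed. With randomized assignment, the difference in group means estimates the average treatment effect. A minimal sketch on made-up data:

```python
def average_treatment_effect(observations):
    """Difference in mean outcome between treated and control units.

    Only a valid causal estimate when treatment was randomly assigned.
    """
    treated = [outcome for is_treated, outcome in observations if is_treated]
    control = [outcome for is_treated, outcome in observations if not is_treated]
    if not treated or not control:
        raise ValueError("need at least one treated and one control unit")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical (treated?, outcome) pairs from an A/B test.
observations = [(True, 3.0), (True, 5.0), (False, 2.0), (False, 2.0)]
print(average_treatment_effect(observations))  # 2.0
```

On observational data the same subtraction can be badly biased by hidden causes, which is the caution raised earlier in these notes.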

Evaluate existing systems

Old recommendations vs New Recommendation
Measure forecast deviations against actuals qualitatively

Feedback loop informs current best algo!!!

Keep Thinking!!!
