Windows does not play well with Docker / Kubernetes. I need one more work environment to keep up GCP learning on the go, plus more personal projects.
What aspects tech solves (Data / Reporting / Ordering)
What tools to pick considering the scale
What POCs you need to work on
What the different smaller tasks are (Data Schema / Transactions / Reporting)
Data to services development
Bring in the big picture over the course of time
Passion
In everything in life, relationships or jobs, we will not get 100% of what we like.
You need to look at the positive side of things until you know the business + tech landscape well enough to apply it.
Everything has a melting point. A long list of experiences will lead to big decisions about how things fit into your perspective.
Think towards the possible end-to-end solutions / patent opportunities with a mix of research + prototype + code.
Great solutions are a collection of simple ideas + good-to-have improvements + learning from the market/tech, incorporating things that make a difference + constantly adding small improvements.
At some point in time, you would have touched every domain through some problem or the other. The bigger picture you get across different domains, your familiarity with current technology, and quickly identifying the right opportunities are all important to build a successful product.
When you are at your best with both technology and domain, you need to build the product. In the early stages of your career you focus on technology; in the later stages you focus on business. There is a time when you are good at both. Use that time and build your idea.
At some point in life, everything you read, worked on, and discussed will help you connect with problems from different domains/areas.
For each lesson, I have added my personal observations on a few points.
1. Subject matter experts have as much impact as data scientists
Fact - "much of the challenge is getting the right data."
Add-on - "much of the challenge is getting the right data and creating the right insights / correct observations / finding hidden patterns with domain knowledge / looking beyond the data at what drives it"
2. The first iteration is always on the labeling taxonomy - "In vision projects, having the right labeled data becomes essential for detection, extraction, analysis, etc."
3. The ROI on fast feedback is huge - rapid prototyping and de-risking of projects. - "People lose confidence without seeing the value realization. Getting the business involved early, understanding their KPIs, and measuring the impact of the ML solution are key to the success of the project."
4. ML tools should be data-centric but model-backed - "It's a tradeoff to learn domain vs ML vs DevOps vs new tools in the market. Often end customers do not see ML as a standalone item; it has to work together with their existing data warehouse. You need to be practical and pick the tools that make it less complicated to integrate with the current environment and build a successful use case."
How decentralized, flexible, and up-to-date records are kept
Getting complete knowledge goes beyond just collecting, streaming, and storing data. Every insight and every piece of domain knowledge matters.
MLOps, feature store tools - “When all you have is a hammer, everything starts to look like a nail.” Learn the domain before using tools. Kaggle data and real-world data are different.
Database Developer - Designs schema in context of performance, index, tracking
BI Developer - Designs Schema in terms of running aggregations, Reports, Tracking, and Tracing Updates
Machine Learning Engineer - Understands features, picks the relevant ones for Machine learning Algos
MLOps - Builds a feature store pipeline to get all the data
Security Engineer / Data Engineer - Plays the role of masking PII data; runs before the data pipeline
Reality
With so many perspectives, how do all these folks have the same understanding of the data?
How many versions of the data will we keep?
Where is the data dictionary, and where are rolling updates shared and updated?
Leverage OLAP as the ML feature store. Do not complicate things with multiple layers of data, versions, etc.
My Perspective - Not every best practice solves everything. We can still have decentralized DBs with a balance of OLTP vs OLAP. Feature store and data governance can still be handled by decentralized storage. Having too many data management tools will lead to different perspectives.
Most conferences are far from reality. Their internal practices may be totally different from the projected practices. Take these conferences with a bit of PR pitch in mind. If everything were so easy, we would not see such different levels of tech maturity.
In many ways we underestimate the impact of domain knowledge. Can we have one forecasting algorithm for:
Retail Product Sales
Oil Sales
Stocks Predictions
Car Sales
If everything could be built with just one algorithm, we would need to close all ML shops in a month. We underestimate domain knowledge and believe fancy tech and tools will be able to read the data and do all the fine-tuning.
Keep going. Sometimes tech does not understand business, and products are built to fail.
Knowledge is
Mapping business to tech to support futuristic ways of new business changes
Making it flexible to scale, port, migrate
Think business first, scale next, tech last
What is the new learning format
Domain understanding - Technology evolves faster than we think. New forms of business evolve.
Data understanding - Know the type of data - fast / slow data
Research paper - Insights / Blogs - Look for leaders in the space and their tech stack. Look for research papers and insights.
Technology learning - Sometimes we overrate what we don't know. The fundamentals remain the same. Many times we do not connect past learnings. Many times we look at Spark and SQL Server lessons through concepts, examples, and implementation (making data immutable, RDDs, etc.). I liked this comparison - "Keep in mind Spark uses memory much in the same way as SQL Server uses the buffer pool: by storing frequently used objects in memory it reduces overall I/O and improves performance in large joins, sort and aggregates. Contrast this with a traditional Hadoop-based architecture, which relies heavily on writing data out to disk between steps." Every technical concept maps to an advancement over, or a limitation of, something that already existed. We need more connected learnings!!!
Aggregate sales by week, day, quarter, holidays, weekends
Handling Missing Data
Zero filling
NaN
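The two missing-data options above behave very differently once sales are aggregated. Below is a minimal sketch in plain Python, assuming daily sales keyed by date; the helper names `fill_missing` and `weekly_mean` and the sample numbers are made up for illustration and do not come from any forecasting library.

```python
import math
from collections import defaultdict
from datetime import date, timedelta

def fill_missing(sales, start, end, fill_value):
    """Return one value per calendar day, using fill_value where no record exists."""
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    return {day: sales.get(day, fill_value) for day in all_days}

def weekly_mean(daily):
    """Average daily sales per (ISO year, ISO week), skipping NaN entries."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        iso = day.isocalendar()
        if not math.isnan(value):
            buckets[(iso[0], iso[1])].append(value)
    return {week: sum(vals) / len(vals) for week, vals in buckets.items() if vals}

# Hypothetical data: sales recorded on only 3 of the 7 days in the week.
sales = {date(2024, 1, 1): 10.0, date(2024, 1, 2): 14.0, date(2024, 1, 4): 12.0}
start, end = date(2024, 1, 1), date(2024, 1, 7)

# Zero filling says "nothing was sold" on the missing days.
zero_filled = weekly_mean(fill_missing(sales, start, end, 0.0))
# NaN filling says "we do not know" and leaves those days out of the average.
nan_filled = weekly_mean(fill_missing(sales, start, end, float("nan")))
```

Zero filling pulls the weekly average down because the gaps count as real zero-sales days, while NaN keeps the average over observed days only. Which one is right depends on the domain: a closed store and a lost record are not the same thing.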
The weighted quantile loss (wQuantileLoss) calculates how far the forecast is from actual demand in either direction as a percentage of demand on average in each quantile
For the p10 forecast, the true value is expected to be lower than the predicted value 10% of the time
For the p90 forecast, the true value is expected to be lower than the predicted value 90% of the time
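The quantile loss and the p10 / p90 reading above can be sketched in a few lines of plain Python. This is a minimal sketch, assuming the commonly documented form of the weighted quantile loss (twice the summed quantile penalty divided by total absolute demand); the function names and sample numbers are illustrative and not taken from any forecasting service.

```python
def weighted_quantile_loss(actuals, quantile_forecasts, tau):
    """wQuantileLoss for quantile tau (e.g. 0.1 for p10, 0.9 for p90)."""
    if len(actuals) != len(quantile_forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    total_demand = sum(abs(y) for y in actuals)
    if total_demand == 0:
        raise ValueError("total demand is zero; the loss is undefined")
    penalty = 0.0
    for y, q in zip(actuals, quantile_forecasts):
        under = max(y - q, 0.0)  # forecast below actual demand
        over = max(q - y, 0.0)   # forecast above actual demand
        penalty += tau * under + (1.0 - tau) * over
    return 2.0 * penalty / total_demand

def coverage(actuals, quantile_forecasts):
    """Fraction of periods where the actual value fell below the forecast."""
    hits = sum(1 for y, q in zip(actuals, quantile_forecasts) if y < q)
    return hits / len(actuals)

# Hypothetical demand and one forecast series, scored at two quantiles.
actuals = [100.0, 200.0]
forecasts = [90.0, 220.0]
loss_p90 = weighted_quantile_loss(actuals, forecasts, 0.9)
loss_p10 = weighted_quantile_loss(actuals, forecasts, 0.1)
share_below = coverage(actuals, forecasts)
```

At p90, under-forecasting is penalized nine times more heavily than over-forecasting, and at p10 it is the reverse. For a well-calibrated p90 forecast, `coverage` should come out near 0.9 over a long enough history.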
Models
ARIMA
Prophet
DeepAR+
VARMAX (Vector Autoregressive Moving Average with eXogenous regressors)
4 things you worked on in multiple projects; you know how they work
3 things you did a hello world for; you know the basics
3 things you read up on Stack Overflow to fill the gaps
The goal is to get a working implementation of the idea. You know a few things but didn't deep dive. You implemented a few things and did a deep dive as you worked on them in multiple projects.
We may not master all 10 or remember all 10. We cannot wait to master all 10 to build our idea. The measure of knowledge is the ability to experiment and build; it's not just familiarity with all 10 tools or technologies. Time to change the perspective from which we look at skills.
Welcome Visitor,
I have 20 years of experience (Coder - Empirical Learner - Teacher). I am currently working in the Data Analytics (Video-Image-Text-Data) / Database / BI space. I dabble with "Data". Ping me or send a request to connect if what I do appeals to you and you want to talk about it (Data Science / Databases / Deep Learning / Architecture / Design Discussions / Consulting Projects / Machine Learning Trainings / Strategic Leadership Roles).
Personal Goal - Reach / Teach up to 10 Million Students through various mediums (Catalyst between Academics and Industry)
My request to readers: I hope you find the posts, code snippets, and notes helpful. Please share your learning with others. We can grow only by learning and teaching.
6+ years in AI, working on Image, Video, Text, Numbers - Data
15+ years in Databases
10+ years in developing, deploying, monitoring large scale solutions in Supply Chain, Retail
It's my personal blog. The objective of this blog is to bookmark/share my learnings. Posts reflect my opinions, perspectives and interests. The blog posts presented are my personal views and do not represent my employer's view. I have acknowledged all posts with References/Bookmarks.
For questions/feedback/career opportunities/training / consulting assignments/mentoring - please drop a note to sivaram2k10(at)gmail(dot)com
Coach / Code / Innovate