I have been hearing a lot about feature generation and feature management. There are a couple of tools and frameworks in this space.
For a typical ML product-level use case:
- Who defines the problem - Domain Expert / Product Manager
- Who knows the data sources - BA / Database Developer / Product Manager
- Raw data -> processed data - DB Developer
- Data Exploration / Analysis / Feature Creation - BI / DB / ML Developer
- Model Development / Validation - ML Developer
- Deployment / Monitoring / Improvement - DevOps / ML Developer
Installing Featuretools
Analysis - Basically, you connect a few tables and define the entities; after that, the usual analysis (unique counts, averages and so on) is taken care of for you. It is like a prebuilt analysis based on the identified associations between tables.
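To make the idea concrete, here is a minimal sketch in plain Python of the kind of roll-up that gets generated once the association between two tables is known. It does not use the Featuretools API; the `customers` and `orders` tables, their columns, and the `aggregate_child_features` helper are all hypothetical, and the feature names only loosely follow the naming style Featuretools produces.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per customer.
customers = [
    {"customer_id": 1, "city": "Chennai"},
    {"customer_id": 2, "city": "Bengaluru"},
]

# Hypothetical child table: many orders per customer.
orders = [
    {"order_id": 10, "customer_id": 1, "product": "pen", "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "product": "book", "amount": 100.0},
    {"order_id": 12, "customer_id": 1, "product": "pen", "amount": 30.0},
    {"order_id": 13, "customer_id": 2, "product": "bag", "amount": 500.0},
]


def aggregate_child_features(parents, children, key):
    """Roll child rows up to one feature row per parent (hypothetical helper)."""
    # Group the child rows by the shared key (the "identified association").
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row)

    features = []
    for parent in parents:
        rows = grouped.get(parent[key], [])
        amounts = [r["amount"] for r in rows]
        features.append({
            **parent,
            "COUNT(orders)": len(rows),
            "NUM_UNIQUE(orders.product)": len({r["product"] for r in rows}),
            "MEAN(orders.amount)": mean(amounts) if amounts else None,
        })
    return features


feature_matrix = aggregate_child_features(customers, orders, "customer_id")
for row in feature_matrix:
    print(row)
```

The point is that nothing in the helper is specific to customers or orders: given the key that links the two tables, the count, unique and average features fall out mechanically.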
Experimenting with this on Colab - Colab notebook link
From Link, a feature comparison between different feature stores
Paper - Link
Key Notes
- Handling Data Ingestion
- Aggregating data from diverse sources
- Access controlled and versioned
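As a rough illustration of what "access controlled and versioned" could mean in practice, here is a small in-memory sketch. The `FeatureRegistry` class, the role names and the feature definitions are all hypothetical and are not taken from any particular feature store.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureRegistry:
    """Toy registry: keeps every version of a feature definition and
    checks the caller's role before returning one (hypothetical design)."""
    # feature name -> list of definitions; list index i holds version i + 1
    _versions: dict = field(default_factory=dict)
    # feature name -> set of roles allowed to read it
    _readers: dict = field(default_factory=dict)

    def register(self, name, definition, readers):
        """Store a new version and return its version number."""
        self._versions.setdefault(name, []).append(definition)
        self._readers[name] = set(readers)
        return len(self._versions[name])

    def get(self, name, role, version=None):
        """Return the latest definition, or a specific version if given."""
        if role not in self._readers.get(name, set()):
            raise PermissionError(f"role {role!r} may not read {name!r}")
        versions = self._versions[name]
        return versions[-1] if version is None else versions[version - 1]


registry = FeatureRegistry()
registry.register("avg_order_amount", "MEAN(orders.amount)", ["ml_dev"])
registry.register("avg_order_amount", "MEDIAN(orders.amount)", ["ml_dev"])

latest = registry.get("avg_order_amount", role="ml_dev")
first = registry.get("avg_order_amount", role="ml_dev", version=1)
print(latest, "|", first)
```

Older versions stay readable, so a model trained against version 1 can keep using the same definition after the feature is changed.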
Key Offerings
- Automated Feature Generation
- Access to generated features
- Data Privacy / Data Governance
- Data Visualization
My Thoughts
- Today, with the move to the cloud, all the data (OLTP, OLAP, SQL, NoSQL) sits next to each other
- Generating reports that aggregate all sources in near real time is possible
- Some features/variables can be pulled from OLTP tables
- In a data lake / DW, some of the insights would already be present in computed reports
- Metadata management would already be available in the system, and it would handle the data quality aspects
- ML systems will work as part of a larger data ecosystem comprising OLTP, OLAP, SQL and NoSQL systems. A lot of feature store workloads are already handled by other pieces.
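As a small sketch of the point about pulling features straight from OLTP tables, the example below uses an in-memory SQLite database as a stand-in for a transactional system. The `orders` table and its columns are made up for illustration; a real setup would query the production database or a replica.

```python
import sqlite3

# In-memory SQLite stands in for an OLTP database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount      REAL
);
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 100.0), (3, 2, 500.0);
""")

# A plain GROUP BY already yields per-customer features.
rows = conn.execute("""
    SELECT customer_id, COUNT(*) AS order_count, AVG(amount) AS avg_amount
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()
conn.close()

# One feature dict per customer, ready to join onto a training set.
features = {
    customer_id: {"order_count": order_count, "avg_amount": avg_amount}
    for customer_id, order_count, avg_amount in rows
}
print(features)
```

A query like this is the sort of workload the existing data platform already handles, which is why part of what a feature store offers may overlap with pieces that are already in place.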
Keep Thinking!!