"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

August 22, 2020

Day #334 - Exploring - Featuretools

I have been hearing a lot about feature generation and feature management. There are a couple of tools/frameworks in this space.

For a typical product-level ML use case:
  • Who defines the problem - Domain Expert / Product Manager
  • Who knows the data sources - BA / Database Developer / Product Manager
  • Raw Data -> Processed data - DB Developer
  • Data Exploration / Analysis / Feature Creation - BI / DB / ML Developer
  • Model Development / Validation - ML Developer
  • Deployment / Monitoring / Improvement - DevOps / ML Developer
A feature store handles the steps from raw data through data aggregation to feature generation/feature engineering

Installing Featuretools


Analysis - It is basically like connecting a few tables. Once you define the entities, analysis such as unique counts and averages is all taken care of. It is like a prebuilt analysis based on the identified associations.
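
To make the idea concrete, here is a minimal sketch in plain Python of the kind of roll-up described above: a child table is aggregated up to its parent through a shared key. This is not the Featuretools API. The tables, column names and the aggregate_features helper are all made up for illustration, and the feature names are only chosen to resemble the style such tools tend to use.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per customer.
customers = [
    {"customer_id": 1, "city": "Chennai"},
    {"customer_id": 2, "city": "Pune"},
]

# Hypothetical child table: many orders per customer.
orders = [
    {"order_id": 10, "customer_id": 1, "product": "pen", "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "product": "book", "amount": 40.0},
    {"order_id": 12, "customer_id": 1, "product": "pen", "amount": 30.0},
    {"order_id": 13, "customer_id": 2, "product": "bag", "amount": 100.0},
]


def aggregate_features(parents, children, key):
    """Roll child rows up to the parent level using the shared key."""
    # Group the child rows by the key that links them to a parent row.
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row)

    features = []
    for parent in parents:
        rows = grouped.get(parent[key], [])
        amounts = [r["amount"] for r in rows]
        features.append({
            **parent,
            "COUNT(orders)": len(rows),
            "MEAN(orders.amount)": mean(amounts) if amounts else None,
            "NUM_UNIQUE(orders.product)": len({r["product"] for r in rows}),
        })
    return features


feature_matrix = aggregate_features(customers, orders, key="customer_id")
for row in feature_matrix:
    print(row)
```

Once the association (customer_id) is declared, the count, average and unique-count columns come out mechanically. That is the part an automated feature generation tool takes off your hands.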

Experimenting with this on Colab - Colab notebook link 
From Link, a feature comparison between different feature stores

Paper - Link
Key Notes
  • Handling Data Ingestion 
  • Aggregating data from diverse sources
  • Access controlled and versioned
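
As a rough illustration of the "access controlled and versioned" note, here is a toy in-memory sketch. It is not taken from the paper or from any real feature store. The FeatureRegistry class, its publish/read methods and the user names are all hypothetical.

```python
class FeatureRegistry:
    """Hypothetical in-memory store: every publish creates a new version."""

    def __init__(self):
        self._versions = {}  # feature name -> list of value dicts, oldest first
        self._readers = {}   # feature name -> set of users allowed to read

    def publish(self, name, values, readers):
        """Store a new version of a feature and return its version number."""
        self._versions.setdefault(name, []).append(dict(values))
        self._readers[name] = set(readers)
        return len(self._versions[name])

    def read(self, name, user, version=None):
        """Return the requested version (latest by default) if the user is allowed."""
        if user not in self._readers.get(name, set()):
            raise PermissionError(f"{user} may not read {name}")
        history = self._versions[name]
        if version is None:
            version = len(history)
        if not 1 <= version <= len(history):
            raise KeyError(f"{name} has no version {version}")
        return history[version - 1]


registry = FeatureRegistry()
v1 = registry.publish("avg_order_amount", {1: 30.0, 2: 100.0}, readers={"ml_dev"})
v2 = registry.publish("avg_order_amount", {1: 35.0, 2: 100.0}, readers={"ml_dev"})

latest = registry.read("avg_order_amount", user="ml_dev")
first = registry.read("avg_order_amount", user="ml_dev", version=1)
```

Older versions stay readable, so a model trained on version 1 can still be reproduced after the feature is recomputed. A user who is not in the readers set gets a PermissionError.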
Key Offerings
  • Automated Feature Generation
  • Access to generated feature
  • Data Privacy / Data Governance
  • Data Visualization
My Thoughts
  • Today, with the cloud trend, all the data stores (OLTP, OLAP, SQL, NoSQL) sit next to each other
  • Generating reports that aggregate all sources in near real time is possible
  • Some features/variables can be pulled from OLTP tables
  • In a Data Lake / DW, some of the insights would already be present in computed reports
  • Metadata management that handles data quality aspects would already be available in the system
  • ML systems will work together as part of a larger data ecosystem comprising OLTP, OLAP, SQL, and NoSQL systems. A lot of feature store workloads are already handled by other pieces.
More Reads
Keep Thinking!!
