Key Summary
Spark Snippets
Happy Learning!!!
- Collect Data from Tables
- Perform joins and aggregations to generate feature-engineered data
- OLTP, BI, and data science knowledge all matter when building a model
Checklist for building a pipeline
- Handling late-arriving data
- Pipeline failures
- Handling data quality issues
- App level configurations
- Maximize performance (Indexes / Queries)
- Extract the required data (ETL scripts)
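
Two of the checklist items above, late-arriving data and data quality issues, can be sketched in a few lines. This is a minimal plain-Python illustration, not Spark code and not a definitive implementation. The field names (`order_id`, `amount`, `event_time`), the `ALLOWED_LATENESS` setting, and the helper names are all hypothetical. In Spark Structured Streaming, the lateness cutoff would typically be expressed with `withWatermark` instead.

```python
from datetime import datetime, timedelta

# Hypothetical setting: how far behind the watermark a record may still be accepted.
ALLOWED_LATENESS = timedelta(hours=1)


def is_valid(record):
    """Data quality check: required fields are present and amount is non-negative."""
    return (
        record.get("order_id") is not None
        and record.get("amount") is not None
        and record["amount"] >= 0
    )


def route_records(records, watermark):
    """Split records into on-time, late, and rejected buckets.

    A record counts as late when its event time is older than
    watermark - ALLOWED_LATENESS. Invalid records are rejected first.
    """
    cutoff = watermark - ALLOWED_LATENESS
    on_time, late, rejected = [], [], []
    for record in records:
        if not is_valid(record):
            rejected.append(record)
        elif record["event_time"] < cutoff:
            late.append(record)
        else:
            on_time.append(record)
    return on_time, late, rejected


# Made-up sample data for illustration only.
watermark = datetime(2024, 1, 1, 12, 0)
records = [
    {"order_id": 1, "amount": 10.0, "event_time": datetime(2024, 1, 1, 11, 30)},
    {"order_id": 2, "amount": 5.0, "event_time": datetime(2024, 1, 1, 9, 0)},
    {"order_id": None, "amount": 7.0, "event_time": datetime(2024, 1, 1, 11, 45)},
    {"order_id": 4, "amount": -3.0, "event_time": datetime(2024, 1, 1, 11, 50)},
]
on_time, late, rejected = route_records(records, watermark)
```

Routing late and rejected records into their own buckets, rather than dropping them silently, keeps them available for reprocessing or inspection when a pipeline run is reviewed.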
Business Logic
- Clean up data
- Perform Aggregations / Joins
- Generate feature engineered data
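
The three business-logic steps above can be sketched end to end as follows. This is a plain-Python sketch under stated assumptions, not actual Spark code: the `customers` and `orders` tables, their columns, and the `build_features` helper are hypothetical. In Spark, the same steps would usually map to DataFrame `filter`, `join`, and `groupBy` with `agg`.

```python
from collections import defaultdict

# Made-up sample tables for illustration only.
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "wholesale"},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": None},   # dirty row, dropped during clean-up
    {"customer_id": 2, "amount": 100.0},
    {"customer_id": 3, "amount": 15.0},   # no matching customer, dropped by the inner join
]


def build_features(customers, orders):
    """Clean orders, inner-join them to customers, and aggregate per customer."""
    # Step 1: clean up data by dropping orders with a missing amount.
    clean_orders = [o for o in orders if o["amount"] is not None]

    # Step 2: join on customer_id using a lookup table, then aggregate.
    segment_by_id = {c["customer_id"]: c["segment"] for c in customers}
    totals = defaultdict(float)
    counts = defaultdict(int)
    for order in clean_orders:
        cid = order["customer_id"]
        if cid not in segment_by_id:
            continue  # inner join: skip orders without a customer row
        totals[cid] += order["amount"]
        counts[cid] += 1

    # Step 3: generate one feature row per customer.
    return [
        {
            "customer_id": cid,
            "segment": segment_by_id[cid],
            "order_count": counts[cid],
            "total_amount": totals[cid],
            "avg_amount": totals[cid] / counts[cid],
        }
        for cid in sorted(totals)
    ]


features = build_features(customers, orders)
```

Each output row is one customer with count, total, and average features, which is the shape a downstream model would typically consume.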