"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 12, 2019

Day #262 - Data Modeling for Analytics Translators


Summary Notes
  • A good data model is flexible, extensible, and effectively governed
  • DW layers: Staging -> RAW -> Processing -> Consumption
  • Data is processed multiple times
  • Data is aggregated at the end
  • Operational Data Store (ODS)
Schema on Write
  • Data is checked against the schema when it is written
  • Data is read back using that same schema
  • Same schema, fixed structure
Schema on Read
  • Apply schema when you read
  • Write once, read many times (WORM)
  • Brings together data separated across different businesses, data sets, and databases
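
To make the contrast between the two approaches concrete, here is a minimal sketch. The ORDER_SCHEMA fields and both helper functions are made up for illustration; they are not from any particular tool.

```python
import json

# Hypothetical fixed schema: field name -> required Python type.
ORDER_SCHEMA = {"order_id": int, "amount": float}

def write_with_schema(table, record):
    """Schema on write: reject anything that does not fit the fixed structure."""
    if set(record) != set(ORDER_SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for field, expected in ORDER_SCHEMA.items():
        if not isinstance(record[field], expected):
            raise TypeError(f"{field} must be {expected.__name__}")
    table.append(record)

def read_with_schema(raw_lines, schema):
    """Schema on read: raw text is stored untouched; a schema is applied per read."""
    rows = []
    for line in raw_lines:
        doc = json.loads(line)
        # Keep only the fields this reader asks for, cast to the requested types.
        rows.append({f: cast(doc[f]) for f, cast in schema.items() if f in doc})
    return rows

# Schema on write: the structure is enforced before the row is stored.
table = []
write_with_schema(table, {"order_id": 1, "amount": 9.5})

# Schema on read: the same raw line is written once and read in two ways.
raw_store = ['{"order_id": "2", "amount": "12.0", "channel": "web"}']
as_orders = read_with_schema(raw_store, {"order_id": int, "amount": float})
as_channels = read_with_schema(raw_store, {"channel": str})
```

The raw line is written once and each reader brings its own schema, which is the "write once, read many times" idea in miniature.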
Example
  • Data arrives in JSON format
  • Add time_stamp to relate the record to its data source
  • Add Source_system
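
A rough sketch of that enrichment step. The enrich function, the sample payload, and the "billing" source name are assumptions for illustration; only the time_stamp and Source_system field names come from the notes.

```python
import json
from datetime import datetime, timezone

def enrich(raw_json, source_system, now=None):
    """Parse an incoming JSON message and tag it with arrival metadata."""
    record = json.loads(raw_json)
    # 'now' can be injected to keep the example repeatable; defaults to current UTC time.
    stamp = now or datetime.now(timezone.utc)
    record["time_stamp"] = stamp.isoformat()
    record["Source_system"] = source_system
    return record

# Hypothetical message from a hypothetical "billing" system.
fixed = datetime(2019, 7, 12, tzinfo=timezone.utc)
enriched = enrich('{"customer": "C-17", "amount": 40}', "billing", now=fixed)
```

The original payload fields stay untouched; the two added fields make it possible to trace each record back to where and when it came from.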
Canonical Model 
  • Repeat data for the right reasons
  • Enrich with meta_data and canonical elements
  • Link canonical elements and suppliers together; provide a unique_id
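
One possible way to picture the linking step, assuming suppliers are matched on a normalized name. The matching rule, field names, and sample records below are all hypothetical; real canonical models usually need richer matching.

```python
import uuid

def canonical_key(name):
    """Hypothetical matching rule: ignore case and extra whitespace."""
    return " ".join(name.lower().split())

def build_canonical(records):
    """Group source records per supplier and give each supplier one unique_id."""
    canonical = {}
    for rec in records:
        key = canonical_key(rec["supplier_name"])
        if key not in canonical:
            canonical[key] = {
                # uuid5 is deterministic, so the same supplier always gets the same id.
                "unique_id": str(uuid.uuid5(uuid.NAMESPACE_DNS, key)),
                "supplier_name": rec["supplier_name"].strip(),
                "sources": [],
            }
        # Keep a link back to every source system that knows this supplier.
        canonical[key]["sources"].append(
            {"source_system": rec["source_system"], "local_id": rec["local_id"]}
        )
    return list(canonical.values())

records = [
    {"source_system": "erp", "local_id": "S-001", "supplier_name": "Acme Corp"},
    {"source_system": "crm", "local_id": "9912", "supplier_name": "  acme   corp "},
    {"source_system": "erp", "local_id": "S-002", "supplier_name": "Globex"},
]
suppliers = build_canonical(records)
```

Here the ERP and CRM entries for the same supplier end up under one canonical record with one unique_id, while each source keeps its own local id.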
Data Governance
  • Data problems come in mass scenarios
  • Reports show data discrepancies
  • An IT framework is needed to manage data governance
  • Master Data Management (MDM): a technology that provides a 360-degree view of a user's data coming from different sources
  • Data Quality
  • Data Archival
  • Data Security
MDM
  • Source -> ETL (Clean, Standardize, Transform, MDM) -> Reports, DW, EDW
  • Rules-based
  • Metadata verification
  • Data Collection -> ETL -> Data Quality -> MDM -> DW
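
The flow above can be sketched as a chain of small functions. This is only a toy version under stated assumptions: the email-based matching, the quality rule, and the record fields are invented for illustration, and a real MDM product does far more.

```python
def etl(record):
    """Clean and standardize: trim whitespace, lower-case the email."""
    return {
        "name": record["name"].strip(),
        "email": record["email"].strip().lower(),
        "source": record["source"],
    }

def passes_quality(record):
    """Hypothetical rules-based check: name present and email looks plausible."""
    return bool(record["name"]) and "@" in record["email"]

def master(records):
    """MDM step: merge records sharing an email into one golden record."""
    golden = {}
    for rec in records:
        entry = golden.setdefault(
            rec["email"], {"name": rec["name"], "email": rec["email"], "sources": []}
        )
        entry["sources"].append(rec["source"])
    return list(golden.values())

def run_pipeline(collected):
    """Data Collection -> ETL -> Data Quality -> MDM; the result would feed the DW."""
    cleaned = [etl(r) for r in collected]
    valid = [r for r in cleaned if passes_quality(r)]
    return master(valid)

collected = [
    {"name": " Asha Rao ", "email": "Asha@Example.com", "source": "crm"},
    {"name": "Asha Rao", "email": "asha@example.com ", "source": "billing"},
    {"name": "", "email": "missing-at-sign", "source": "web"},
]
warehouse_rows = run_pipeline(collected)
```

The two records for the same person collapse into one golden record listing both sources, and the record that fails the quality rule never reaches the MDM step.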
Data Quality API / Module
  • Add / remove business rules
  • Field-level validations against messages
  • Return error codes or log failures
  • Audit and report failed messages automatically
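
A small sketch of what such a module could look like. The DataQualityChecker class, the rule names, and the error codes (E001, E002) are hypothetical, chosen only to show the four points above working together.

```python
class DataQualityChecker:
    """Toy data quality module: pluggable rules, error codes, and an audit trail."""

    def __init__(self):
        self.rules = {}      # rule name -> (field, predicate, error code)
        self.audit_log = []  # one entry per failed message

    def add_rule(self, name, field, predicate, error_code):
        self.rules[name] = (field, predicate, error_code)

    def remove_rule(self, name):
        self.rules.pop(name, None)

    def validate(self, message):
        """Run every field-level rule; return the list of error codes (empty if clean)."""
        errors = []
        for field, predicate, error_code in self.rules.values():
            if field not in message or not predicate(message[field]):
                errors.append(error_code)
        if errors:
            # Failed messages are recorded automatically for later reporting.
            self.audit_log.append({"message": message, "errors": errors})
        return errors

checker = DataQualityChecker()
checker.add_rule("amount_positive", "amount",
                 lambda v: isinstance(v, (int, float)) and v > 0, "E001")
checker.add_rule("currency_iso", "currency",
                 lambda v: isinstance(v, str) and len(v) == 3, "E002")

ok = checker.validate({"amount": 10, "currency": "USD"})
bad = checker.validate({"amount": -5})

# Business rules can be dropped at runtime without touching the validator.
checker.remove_rule("currency_iso")
after_removal = checker.validate({"amount": 5})
```

Keeping the rules in a dictionary is one simple way to make them addable and removable; the audit log then doubles as the source for a failed-message report.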
Happy Learning!!!
