"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 11, 2021

Data Ownership - Data Understanding

  • Database Developer - Designs the schema with performance, indexing, and tracking in mind
  • BI Developer - Designs the schema for running aggregations, reports, tracking, and tracing updates
  • Machine Learning Engineer - Understands the features and picks the relevant ones for machine learning algorithms
  • MLOps - Builds a feature store pipeline to get all the data
  • Security Engineer / Data Engineer - Handles PII in the data (for example, masking it); this runs before the data pipeline
Reality
  • With so many perspectives, how do all these folks arrive at the same understanding of the data?
  • How many versions of the data will we keep?
  • Where is the data dictionary, and where are rolling updates shared and kept current?
  • Leverage OLAP as the ML feature store; do not complicate things with multiple layers of data, versions, etc.
My Perspective - Not every best practice solves every problem. We can still have decentralized DBs with a balance of OLTP vs OLAP, and the feature store and data governance can still be handled with decentralized storage. Having too many data management tools leads to diverging perspectives on the same data.
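To make the "same data understanding" point concrete, here is a minimal sketch of one shared data dictionary that every role reads from, instead of each tool keeping its own copy. The `orders` columns, the flag names, and the helper functions are all hypothetical and only for illustration; a real setup would likely keep this next to the OLAP schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Column:
    """One data dictionary entry, shared by all roles."""
    name: str
    dtype: str
    description: str
    is_pii: bool = False      # security / data engineer view
    is_indexed: bool = False  # database developer view
    is_feature: bool = False  # machine learning engineer view


# Hypothetical "orders" table, used only as an example.
DATA_DICTIONARY = [
    Column("order_id", "int", "Primary key of the order", is_indexed=True),
    Column("customer_email", "str", "Contact email", is_pii=True),
    Column("order_total", "float", "Order value", is_feature=True),
    Column("order_date", "date", "Date the order was placed",
           is_indexed=True, is_feature=True),
]


def columns_where(flag: str) -> List[str]:
    """Return the column names for which the given boolean flag is set."""
    return [c.name for c in DATA_DICTIONARY if getattr(c, flag)]


def ml_feature_columns() -> List[str]:
    """Columns exposed as ML features: flagged as features and never PII."""
    return [c.name for c in DATA_DICTIONARY if c.is_feature and not c.is_pii]


if __name__ == "__main__":
    print("PII to mask:", columns_where("is_pii"))
    print("Indexed:", columns_where("is_indexed"))
    print("ML features:", ml_feature_columns())
```

Each role filters the same list by its own flag, so a change to a column description or a PII flag is made once and seen by everyone.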

Most conference talks are far from reality. A company's internal practices may be totally different from the practices it presents. Treat these talks as partly a PR pitch. If everything were that easy, we would see a very different level of tech maturity across the industry.

Keep Thinking!!
