"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 17, 2019

WinSCP - File Copy Linux to Windows

This tool is handy for copying files from Linux to Windows machines.
  • Import PuTTY sessions during installation
  • Drag and Drop for file copy
Download from link
One-Click copy tool



Remote Desktop from Windows to Ubuntu link

Happy Learning!!!

July 12, 2019

The Happy Teaching Moment

Teaching is my passion. Learning, building perspectives, and connecting the dots are my path to wisdom. Personally and professionally, my failures keep teaching me to be more humble, kind and hardworking.
Today, I received feedback for one of my sessions. Below are the scores. A reason to smile and continue the journey.

Keep Going!!!

Data Modelling for Workloads

  • Denormalized way to suit query patterns
  • Data Properties - Concurrent writes 
  • Metadata - No continuous updates (Just store data, no update involved)
  • Indexes - Single Field, Multikey indexes, Text Indexes, Column Store Indexes
  • Sharding - Shard key could be location_id (Distributed Storage)
  • Data Partitioning
  • Time-series info aggregation at the storage level (NoSQL)
  • JSON Modelling - Collections in MongoDB, Establishing Hierarchy and Relationships
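
A minimal sketch of three of the points above: a denormalized JSON-style document, a shard chosen from location_id, and time-series aggregation per hour. The field names, the sample values and the helpers `shard_for` and `hourly_average` are my own assumptions for illustration; this is not how MongoDB implements sharding internally.

```python
import hashlib
from collections import defaultdict

# Hypothetical denormalized reading: location details are embedded in the
# document so a query by location needs no join.
reading = {
    "location_id": "store-042",
    "location": {"city": "Chennai", "region": "South"},
    "sensor": "temp-1",
    "ts": "2019-07-12T10:00:00Z",
    "value": 31.5,
}


def shard_for(location_id, num_shards=4):
    """Map a shard key to a shard number using a stable hash (assumed scheme)."""
    digest = hashlib.md5(location_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def hourly_average(readings):
    """Aggregate time-series values per (location_id, hour)."""
    buckets = defaultdict(list)
    for r in readings:
        hour = r["ts"][:13]  # e.g. "2019-07-12T10"
        buckets[(r["location_id"], hour)].append(r["value"])
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}
```

The same location_id always lands on the same shard, so all readings for one location stay together.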
Happy Learning!!!

Day #262 - Data Modeling for Analytics Translators


Summary Notes
  • Flexible, Extensible, Governed Effectively
  • DW - Staging - RAW - Processing - Consumption
  • Data processed multiple times
  • Aggregated at the end
  • Operational Data Store
Schema on Write
  • Write data
  • Read data
  • Same Schema. Fixed Structure
Schema on Read
  • Apply schema when you read
  • Write once read many times
  • WORM
  • Brings together data separated across different businesses, data sources and databases
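
The difference between the two approaches can be sketched in a few lines. This is only an illustration under my own assumptions: the two-field `SCHEMA`, the list used as a "store" and the function names are hypothetical.

```python
import json

SCHEMA = {"id": int, "amount": float}  # hypothetical fixed structure


def write_with_schema(store, record):
    """Schema on write: validate against the fixed structure before storing."""
    for field, ftype in SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError("bad or missing field: " + field)
    store.append(record)


def write_raw(store, line):
    """Schema on read: store the raw text untouched (write once)."""
    store.append(line)


def read_with_schema(store):
    """Apply the schema only at read time; skip rows that do not fit."""
    for line in store:
        try:
            raw = json.loads(line)
            yield {"id": int(raw["id"]), "amount": float(raw["amount"])}
        except (KeyError, ValueError, TypeError):
            continue
```

With schema on read, the raw lines can be re-read later with a different schema without rewriting the stored data.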
Example
  • Data arrived in JSON format
  • Add time_stamp to relate data source
  • Add Source_system
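
A small sketch of this example. The `enrich` function and its `now` parameter are hypothetical names I made up; only time_stamp and Source_system come from the notes.

```python
import json
from datetime import datetime, timezone


def enrich(raw_json, source_system, now=None):
    """Parse an incoming JSON message and tag it with arrival time and source."""
    record = json.loads(raw_json)
    now = now or datetime.now(timezone.utc)
    record["time_stamp"] = now.isoformat()
    record["Source_system"] = source_system
    return record
```

Passing `now` explicitly keeps the function easy to test; in normal use it defaults to the current UTC time.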
Canonical Model 
  • Repeating data for right reasons
  • Enrich with meta_data, canonical elements
  • Link Canonical elements, suppliers together, Provide unique_id
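
One possible way to link the same supplier across sources is to derive the unique_id from a normalized name. This is a sketch under assumptions: the `NAMESPACE` value, the `supplier_name` field and the normalization rule (lowercase, collapse whitespace) are all hypothetical, and real matching is usually fuzzier than this.

```python
import uuid

# Arbitrary namespace picked for this sketch (hypothetical value).
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def canonical_key(name):
    """Normalize a supplier name: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def to_canonical(record, source_system):
    """Wrap a source record in a canonical envelope with a stable unique_id."""
    key = canonical_key(record["supplier_name"])
    return {
        "unique_id": str(uuid.uuid5(NAMESPACE, key)),
        "supplier_name": key,
        "meta_data": {"source_system": source_system},
        "payload": record,  # original data is repeated on purpose
    }
```

Two records that spell the supplier differently but normalize to the same key get the same unique_id.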
Data Governance
  • Data problems come in mass scenarios
  • Reports Data Discrepancies
  • IT framework to manage Data Governance
  • Master Data Management (MDM is a technology that provides a 360-degree view of a user's data coming from different sources)
  • Data Quality
  • Data Archival
  • Data Security
MDM
  • Source -> ETL (Clean, Standardize, Transform, MDM) -> Reports, DW, EDW
  • Rules Based
  • Metadata verification
  • Data Collection -> ETL -> Data Quality -> MDM -> DW
Data Quality API / Module
  • Add / Remove Business Rules
  • Field Level Validations against messages
  • Return Error codes or log for failures
  • Auditing and Reporting failed messages automatically
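
The four points above could look roughly like this. The `DataQuality` class, its method names and the error codes are my own assumptions for illustration, not an existing library.

```python
class DataQuality:
    """Rule-based, field-level validation with an audit trail of failures."""

    def __init__(self):
        self.rules = {}      # rule name -> (field, check function, error code)
        self.audit_log = []  # failed messages kept for reporting

    def add_rule(self, name, field, check, error_code):
        self.rules[name] = (field, check, error_code)

    def remove_rule(self, name):
        self.rules.pop(name, None)

    def validate(self, message):
        """Return the error codes for a message; log it if any rule fails."""
        errors = []
        for field, check, code in self.rules.values():
            if field not in message or not check(message[field]):
                errors.append(code)
        if errors:
            self.audit_log.append({"message": message, "errors": errors})
        return errors


dq = DataQuality()
dq.add_rule("amount_positive", "amount",
            lambda v: isinstance(v, (int, float)) and v > 0, "E001")
dq.add_rule("id_present", "id", lambda v: v is not None, "E002")
```

An empty list from `validate` means the message passed every rule; failed messages accumulate in `audit_log` for reporting.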
Happy Learning!!!

July 04, 2019

Day #261 - Setting up Spark on my Ubuntu - Big Data Setup - Part V







Happy Data Thinking!!!

Data in Motion - Knowledgebytes


  • Streaming Data - Kafka
  • Consolidate Streaming Data and Analyze - Spark
  • Store Transactional Data Received from Streams - HBase / RDBMS
  • Historical Data Analysis - HDFS / MapReduce
#Knowledgebytes