"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

May 16, 2017

Day #69 - TSQL Skills for Data Pipeline and Cleanup Work

Pivoting is a key step in data preparation tasks. In MSSQL, PIVOT expects an aggregate function, so pivoting without aggregation needs a bit of a workaround. Two things we will see in this post:

Learning #1 - Generating INSERT scripts for table data using SSMS (a hidden gem in MSSQL)

Step 1 -  Database -> Tasks -> Generate Scripts

Step 2 -  Select the database objects to script (tables as needed)


Step 3 -  Specify the save location and choose the "Data only" option. Once the options are set, the next step runs the script and generates the INSERT statements as needed.
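For reference, a data-only script is essentially one INSERT statement per row. Below is a minimal Python sketch of that output shape. The table name, column names and rows are hypothetical, and the formatting of the real wizard output may differ in its details.

```python
def sql_literal(value):
    """Render a Python value as a T-SQL literal (sketch: None, numbers, strings only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings: wrap in N'...' and double any embedded single quotes.
    return "N'" + str(value).replace("'", "''") + "'"


def insert_statements(table, columns, rows):
    """Build one INSERT statement per row, similar in shape to a data-only script."""
    column_list = ", ".join("[" + c + "]" for c in columns)
    return [
        "INSERT " + table + " (" + column_list + ") VALUES ("
        + ", ".join(sql_literal(v) for v in row) + ")"
        for row in rows
    ]


# Hypothetical table and rows, for illustration only.
statements = insert_statements(
    "[dbo].[Customers]",
    ["CustomerId", "Name"],
    [(1, "Asha"), (2, "O'Brien"), (3, None)],
)
for statement in statements:
    print(statement)
```

Doubling the single quote in O'Brien is the detail that matters most if you ever hand-roll such scripts. The wizard takes care of it for you.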




Learning #2 - Pivot for a data preparation scenario
For a given customer/orders scenario, we pivot the data so it is ready for the next level of tasks.
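A common workaround for pivoting without a real aggregation is to number each customer's orders (ROW_NUMBER() in T-SQL) and pivot on that number. Every cell then holds exactly one value, so an aggregate such as MAX changes nothing. Below is a minimal Python sketch of the same idea. The customer/order rows and column names are hypothetical, and this is not the original script from the post.

```python
from collections import defaultdict


def pivot_orders(rows):
    """Pivot (customer, order) pairs into one row per customer.

    Each order gets a per-customer sequence number (like ROW_NUMBER()
    partitioned by customer), which becomes its column position.
    """
    per_customer = defaultdict(list)
    for customer, order in rows:
        per_customer[customer].append(order)

    width = max((len(orders) for orders in per_customer.values()), default=0)
    header = ["Customer"] + ["Order" + str(i + 1) for i in range(width)]
    table = [header]
    for customer in sorted(per_customer):
        orders = per_customer[customer]
        # Pad with None so every row has the same number of columns.
        table.append([customer] + orders + [None] * (width - len(orders)))
    return table


# Hypothetical input rows, for illustration only.
rows = [("C1", "Laptop"), ("C1", "Mouse"), ("C2", "Phone"), ("C1", "Monitor")]
pivoted = pivot_orders(rows)
for line in pivoted:
    print(line)
```

Customers with fewer orders get None in the trailing columns, much like the NULLs a SQL pivot would produce.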



Happy Learning!!!

May 14, 2017

Weekend Seminar - Deep learning in production at Facebook

Good Talk - Deep learning in production at Facebook https://lnkd.in/fX7BZif

Notes from Session
Deep Learning Use Cases
  • Event Prediction - Listing top relevant stories for the user, predicting relevance - Approach - Logistic regression + Deep Neural Networks
  • Machine Translation - Automatically generating machine-translated posts for users - Approach - Encoder-Decoder architecture, using RNNs
  • Natural Language Processing - Understand context of text - DeepText - Approach - CNN for words + RNN for sequences
  • Computer Vision - Understand pics - Approach - CNN @ massive scale. Understand different aspects of pictures - Classification, Detection, Segmentation 
Scaling the models
  • Compute faster - FFT, tiled FFT and Winograd algorithms to reduce convolution computations; NNPACK for CPUs, cuDNN for GPUs
  • Memory Usage - GPU and activation memory is released and reallocated as processing moves through the layers of a deep network
  • Compress models - Exploit redundancy in model designs, prune them
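To give a feel for the kind of saving Winograd-style algorithms offer, the sketch below computes two outputs of a 3-tap 1D convolution with 4 multiplications instead of 6 (the standard F(2,3) minimal filtering form). It is a toy illustration with function names of my own, not how the libraries mentioned in the talk implement it.

```python
def direct_conv(d, g):
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs using 4 data-dependent multiplications."""
    # Filter-side terms can be precomputed once per filter.
    g_sum = (g[0] + g[1] + g[2]) / 2
    g_alt = (g[0] - g[1] + g[2]) / 2
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_sum
    m3 = (d[2] - d[1]) * g_alt
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]


data = [1, 2, 3, 4]
taps = [1, 2, 3]
print(direct_conv(data, taps))
print(winograd_f23(data, taps))
```

Both calls produce the same two values. The saving comes from trading multiplications for cheaper additions, which adds up when the same filter slides over a large input.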
Good Insights!!!

Kaggle vs Enterprise Machine Learning Adoption - Two Sides of the Coin


Reposting a summary of a Quora answer, with my perspective added

What you don't learn in Kaggle Competitions
  • Determining business problem to solve with data
  • Real-world data imbalance, accuracy issues, maintaining models
  • The challenges of data engineering (which features to select, causation vs correlation in the domain context)
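The data imbalance and accuracy point can be seen in a few lines. The sketch below uses made-up labels (5 positives out of 100) and a "model" that always predicts the majority class. The numbers are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the predictions actually found."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Made-up imbalanced data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A useless "model" that always predicts the majority class.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("recall:", recall(y_true, y_pred))
```

The accuracy looks excellent while the model finds none of the cases that matter, which is why accuracy alone is a poor guide on imbalanced real-world data.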
What you learn by experimenting with real-world data science applications in production
  • Identifying / Reusing Existing data for first level models 
  • Identifying pipelines to build for more relevant variables
  • ETL / Data Consolidation / Aggregation, Eliminating outliers / Redundant Data
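As a small illustration of the cleanup step, the sketch below drops redundant values and then removes outliers with the common 1.5 x IQR rule. The sample values and the choice of rule are my own assumptions. Real pipelines would pick a rule that suits the domain.

```python
import statistics


def drop_duplicates(values):
    """Remove repeated values while keeping first-seen order."""
    seen = set()
    unique = []
    for value in values:
        if value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


def remove_outliers(values, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


# Made-up measurements with duplicates and one obvious outlier.
raw = [12, 10, 11, 12, 13, 14, 15, 16, 500, 10]
cleaned = remove_outliers(drop_duplicates(raw))
print(cleaned)
```

Deduplicating before computing the quartiles keeps repeated rows from skewing the bounds.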
Today's systems have enough transactional reporting / BI reports in place. The challenge is evolving from the current system: leveraging current data, building a basic model, then slowly building pipelines and extending to other machine learning use cases.

Happy Learning!!!