"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

May 16, 2017

Day #69 - TSQL Skills for Data Pipeline and Cleanup Work

Pivoting is a key step in data preparation tasks, and an MSSQL pivot without aggregation needs a bit of a workaround. This post covers two things.

Learning #1 - Generating INSERT data scripts from MSSQL tables using SSMS (a hidden gem in MSSQL)

Step 1 -  Database -> Tasks -> Generate Scripts

Step 2 -  Choose the database objects to script (tables as needed)


Step 3 -  Specify the save location and choose the Data only option. After you set the options, the next step runs the script generation and produces the INSERT statements as needed.
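To make the result of the steps above concrete, here is a minimal Python sketch of what a data-only script boils down to: one INSERT statement per row. It is not how SSMS works internally, and the exact text SSMS emits may differ. The table `[dbo].[Customers]`, its columns, the sample rows, and the helper names `sql_literal` and `insert_statements` are all hypothetical.

```python
def sql_literal(value):
    """Render a Python value as a T-SQL literal (simplified, illustration only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings: double any embedded single quote and use an N'' Unicode literal.
    return "N'" + str(value).replace("'", "''") + "'"


def insert_statements(table, columns, rows):
    """Build one INSERT statement per row."""
    column_list = ", ".join(f"[{c}]" for c in columns)
    return [
        f"INSERT INTO {table} ({column_list}) VALUES ("
        + ", ".join(sql_literal(v) for v in row)
        + ");"
        for row in rows
    ]


# Hypothetical sample data, not taken from the post.
rows = [(1, "Alice", None), (2, "O'Brien", 42.5)]
statements = insert_statements("[dbo].[Customers]", ["Id", "Name", "Balance"], rows)
for statement in statements:
    print(statement)
```

For real tables the SSMS wizard is the safer choice, since it handles data types, identity columns, and batching that this sketch ignores.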




Learning #2 - Pivot for a data preparation scenario
For a customer/orders scenario, the data is pivoted so that it is ready for the next level of tasks.
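The idea of pivoting without aggregation can be sketched in Python. A common workaround is to number each customer's orders and use that number as the pivot column, so every value lands in its own cell and nothing has to be summed or counted. The sample rows, the `order_N` column names, and the helper `pivot_orders` are hypothetical, and this is an illustration of the idea rather than the T-SQL used in the post.

```python
from collections import defaultdict

# Hypothetical (customer, item) rows in long format.
orders = [
    ("C1", "Laptop"),
    ("C1", "Mouse"),
    ("C2", "Phone"),
    ("C1", "Monitor"),
    ("C2", "Case"),
]


def pivot_orders(rows):
    """Turn (customer, item) rows into one wide row per customer."""
    grouped = defaultdict(list)
    for customer, item in rows:
        # The position in the list acts as a per-customer row number.
        grouped[customer].append(item)
    width = max((len(items) for items in grouped.values()), default=0)
    header = ["customer"] + [f"order_{i}" for i in range(1, width + 1)]
    table = [
        # Pad shorter rows with None so every row has the same width.
        [customer] + items + [None] * (width - len(items))
        for customer, items in sorted(grouped.items())
    ]
    return header, table


header, table = pivot_orders(orders)
print(header)
for row in table:
    print(row)
```

Customers with fewer orders get `None` in the trailing columns, which plays the same role as NULL in a pivoted SQL result.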



Happy Learning!!!

May 14, 2017

Weekend Seminar - Deep learning in production at Facebook

Good Talk - Deep learning in production at Facebook https://lnkd.in/fX7BZif

Notes from Session
Deep Learning Use Cases
  • Event Prediction - Listing the most relevant stories for the user, predicting relevance - Approach - Logistic regression + Deep Neural Networks
  • Machine Translation - Posts are machine translated automatically for users - Approach - Encoder-Decoder architecture, using RNNs
  • Natural Language Processing - Understand the context of text - Deep Text - Approach - CNN for words + RNN for sequences
  • Computer Vision - Understand pictures - Approach - CNN at massive scale. Understand different aspects of pictures - Classification, Detection, Segmentation
Scaling the models
  • Compute faster - Tweaks in FFT, tiled FFT, and Winograd to reduce convolution computations; NNPACK for CPUs, cuDNN for GPUs
  • Memory usage - GPU and activation memory is released and reallocated across the different layers of processing in deep networks
  • Compress models - Exploit redundancy in the model design and prune it
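As a rough illustration of the pruning idea, here is a minimal sketch of magnitude pruning, which zeroes the weights with the smallest absolute values. This is one common pruning scheme chosen for illustration; the talk notes do not say which scheme is used in production. The weight values and the helper `prune_by_magnitude` are hypothetical.

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the `fraction` of weights with the smallest absolute value."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    count = int(len(weights) * fraction)
    # Indices of the `count` smallest-magnitude weights.
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:count]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


# Hypothetical weights of one small layer.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
pruned = prune_by_magnitude(weights, 0.5)
print(pruned)
```

The zeroed weights can then be stored sparsely, which is where the size reduction comes from.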
Good Insights!!!

Kaggle vs. Enterprise Machine Learning Adoption - Two sides of the coin


Reposting a summary of a Quora answer, with my perspective added

What you don't learn in Kaggle Competitions
  • Determining the business problem to solve with data
  • Real-world data imbalance, accuracy issues, maintaining models
  • The challenges of data engineering (which features to select, causation vs. correlation in the domain context)
What you learn by experimenting with real-world data science applications in production
  • Identifying / Reusing Existing data for first level models 
  • Identifying pipelines to build for more relevant variables
  • ETL / Data Consolidation / Aggregation, Eliminating outliers / Redundant Data
Today's systems already have enough transactional reporting / BI reports in place. The challenge is to evolve from the current system: leverage the current data, build a basic model, slowly build pipelines, and extend to other machine learning use cases.

Happy Learning!!!

April 29, 2017

Day #68 - CNN / RNN and Language Modelling Notes

At the end of every class, I feel there is a lot more to learn. People in the industry often know things only at the application level, while the depth of the topics and mathematics discussed in class is extensive. I always have a guilty feeling that I "need to learn more". Every piece of learning needs a break point to correlate and understand it end to end, and to see the concept from a more familiar perspective. Always keep learning and keep growing.

CNN Notes
  • In a CNN, the lower layers learn generic features like edges and shapes and feed them to the higher layers
  • Earlier layers - Generic features
  • Later layers - Features specific to the corresponding problem
  • For related problems we can leverage an existing network (VGG16, VGG19, AlexNet) and modify the higher layers based on our need
  • ReLU passes an input through the activation function only where it is > 0; everything else becomes 0
  • Vanishing gradient problem - Gradients shrink as they propagate back through the layers, so the weights stagnate over a period of time
  • ∂E/∂W - Gradient of the error with respect to the weights
  • ∂E/∂I - Gradient of the error with respect to the image
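The ReLU and vanishing gradient notes above can be sketched in a few lines of Python. The comparison with sigmoid is my own illustration and was not part of the class notes; it assumes a chain of identical layers with weights fixed at 1, so the backpropagated gradient is just the product of the local derivatives. All function names are hypothetical.

```python
import math


def relu(x):
    """Pass the input through only where it is > 0."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of sigmoid; never larger than 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def chained_gradient(grad_fn, pre_activation, depth):
    """Product of `depth` identical local derivatives (weights fixed at 1)."""
    return grad_fn(pre_activation) ** depth


activations = [relu(x) for x in (-2.0, 0.0, 3.0)]
sigmoid_chain = chained_gradient(sigmoid_grad, 0.0, 10)
relu_chain = chained_gradient(relu_grad, 1.0, 10)
print(activations, sigmoid_chain, relu_chain)
```

Even at its steepest point the sigmoid derivative is 0.25, so ten such layers shrink the gradient by a factor of about a million, while the ReLU chain keeps it at 1 for positive inputs.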
RNN
  • The main point is that the same weights are shared across the whole RNN
  • The weights between successive (unrolled) layers are the same
  • RNNs can be used for document classification, data generation, chatbots, and time series
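Weight sharing is easiest to see in code. Below is a minimal sketch of a scalar RNN forward pass, assuming a tanh activation and a single hidden unit. The parameter names `w_x`, `w_h`, `b` and the sample inputs are hypothetical.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state."""
    h = h0
    states = []
    for x in inputs:
        # The same w_x, w_h, and b are reused at every time step.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


states = rnn_forward([1.0, 0.5, -1.0], w_x=0.8, w_h=0.5, b=0.0)
print(states)
```

Unrolling the loop gives one "layer" per time step, but there is still only one set of weights to learn, regardless of the sequence length.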
LSTM - Long Short-Term Memory

Topics from Language Modelling class


Happy Learning!!!

April 28, 2017

Day #67 - Exploring Tableau Visualization

Visualization examples using Canadian car sales data. The data holds all the details, but the interpretation varies with the representation: exploring the same data from different visualization perspectives leads to different interpretations, as the examples below show.

Visualization #1 - This representation helps us figure out which months usually have high sales numbers

  • Three months of the year (Dec-Jan-Feb) have relatively weak sales figures compared to the rest of the year
  • The March-August trend shows good demand from customers, resulting in increased sales
  • The last few months of the year show decreased demand. This could be due to seasonal factors, holidays, or travel, and needs to be validated
Visualization #2 - Consolidated snapshot comparing yearly sales performance across several years and across all months (this one gives a good big picture)


  • January is the lowest period of sales
  • Sales trend is increasing YOY (year over year)
  • May consistently has the highest sales across many years
The underlying data format is shown below in Visualization #3

Visualization #3 - Data in simple table format



  • Six years of total sales data are represented
  • Partial data is available for the year 2016
Happy Learning!!!

April 27, 2017

Day #66 - Maths behind backpropagation

Today's learning is the mathematics behind neural network fundamentals.
Key notes
  • In a neural network, activations are propagated forward to produce the output, and the error is propagated backward to determine the weight changes
  • Partial derivative - The derivative with respect to one variable while holding the rest constant
  • Backpropagation uses the gradient descent method, which requires calculating the derivative of the squared error function with respect to the weights of the network.
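The notes above can be worked through on the smallest possible network. This is a minimal sketch, assuming one input, one sigmoid hidden unit, a linear output, and the squared error E = 0.5 * (y - t)^2. The network shape, the starting weights, the learning rate, and all function names are hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Forward propagation: input -> hidden activation -> output."""
    h = sigmoid(w1 * x)
    y = w2 * h
    return h, y


def loss(x, t, w1, w2):
    """Squared error for a single training example."""
    _, y = forward(x, w1, w2)
    return 0.5 * (y - t) ** 2


def gradients(x, t, w1, w2):
    """Backpropagation: apply the chain rule from the error back to each weight."""
    h, y = forward(x, w1, w2)
    dE_dy = y - t                       # derivative of 0.5 * (y - t)^2 w.r.t. y
    dE_dw2 = dE_dy * h                  # y = w2 * h
    dE_dh = dE_dy * w2
    dE_dw1 = dE_dh * h * (1.0 - h) * x  # sigmoid'(z) = h * (1 - h), z = w1 * x
    return dE_dw1, dE_dw2


def train(x, t, w1, w2, lr=0.5, steps=200):
    """Gradient descent: step each weight against its partial derivative."""
    for _ in range(steps):
        g1, g2 = gradients(x, t, w1, w2)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2


x, t = 1.0, 0.5
initial_loss = loss(x, t, 0.3, -0.2)
w1, w2 = train(x, t, w1=0.3, w2=-0.2)
final_loss = loss(x, t, w1, w2)
print(initial_loss, final_loss)
```

Each line in `gradients` is a partial derivative: one variable changes while the others are held constant, and the chain rule multiplies them together on the way back from the error to the weight.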
Happy Learning!!!

April 26, 2017

Keep Learning - Good Motivation Note

Interesting Slide from presentation - Dev @ 40



Happy Learning!!!

April 23, 2017

Smart Farming

Product #1 - Automated Farming + Design Layout + Soil Monitoring + Solar powered = "Smart Farming"


Product #2 - Counting Fruits + Finding Weeds + Cattle monitoring

Happy Farming!!!

April 20, 2017

Data Science - Find your Winning use case

I see a lot of technologies discussed for Data Science roles: Big Data, open source and commercial tools, R, Python, MapR, Spark, Azure, various cloud providers, etc.

"Identifying relevant domain/product related use case that helps improve business/numbers is the key"

This LinkedIn post provides great clarity on focusing on relevant use cases, getting small wins, and scaling success.


Happy Analytics!!!

April 17, 2017

Day #65 - Python Package Installation commands - Windows

I had an issue running some code and tried different options. Uninstalling the existing version of keras and reinstalling it worked. Bookmarking the commands.
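The original commands are not preserved in this post, so the following is only a sketch of the uninstall-then-reinstall approach, assuming pip is the package manager. It drives `python -m pip` through `subprocess`, and the helper names `reinstall_commands` and `reinstall` are hypothetical.

```python
import subprocess
import sys


def reinstall_commands(package):
    """Build the pip commands to uninstall and then reinstall a package."""
    # Using sys.executable targets the pip of the interpreter running this script.
    pip = [sys.executable, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", package],
        pip + ["install", package],
    ]


def reinstall(package):
    """Run the commands in order; raises CalledProcessError if one fails."""
    for command in reinstall_commands(package):
        subprocess.run(command, check=True)


commands = reinstall_commands("keras")
for command in commands:
    print(" ".join(command))
# Call reinstall("keras") to actually run them.
```

Going through `python -m pip` helps on Windows machines with several Python installations, where a bare `pip` may belong to a different interpreter.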


Happy Learning!!!