"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

April 29, 2017

Day #68 - CNN / RNN and Language Modelling Notes

At the end of every class, I feel there is still a lot to learn. People in the industry often know things only at the application level, whereas the depth of the topics and mathematics discussed in class is extensive. I always feel a sense of guilt that I "need to learn more". Every learning needs a breakpoint to correlate and understand it end to end, and to see the concept from a more familiar perspective. Always keep learning and keep growing.

CNN Notes
  • In a CNN, lower layers learn generic features like edges and shapes and feed them to higher layers
  • Earlier layer - Generic features
  • Later layer - Features specific to the corresponding problem
  • For related problems, we can reuse an existing network (VGG16, VGG19, AlexNet) and modify the higher layers to suit our needs
  • ReLU passes an activation through only where it is > 0 and outputs 0 otherwise
  • Vanishing gradient problem - Gradients shrink as they are propagated back through the layers, so the weights (especially in the earlier layers) stagnate over time
  • ∂E/∂W - Gradient of the error with respect to the weights
  • ∂E/∂I - Gradient of the error with respect to the image
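To make the ReLU and vanishing gradient notes concrete, here is a minimal sketch in plain Python. It is only an illustration, not how a real CNN is trained: `chained_gradient` is a hypothetical helper that multiplies the same per-layer derivative over and over, which is a crude stand-in for backpropagating through a stack of layers.

```python
import math

def relu(x):
    """Pass the value through only where it is > 0, otherwise output 0."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25 (at x = 0)."""
    s = sigmoid(x)
    return s * (1.0 - s)

def chained_gradient(local_grad, n_layers):
    """Hypothetical helper: product of one local gradient repeated over n layers."""
    return math.prod([local_grad] * n_layers)

# Even in the best case (0.25 per layer) the sigmoid product shrinks quickly,
# while a positive ReLU unit contributes a factor of 1 per layer.
print(chained_gradient(sigmoid_grad(0.0), 10))
print(chained_gradient(relu_grad(2.0), 10))
```

The shrinking product is what makes the weights of the earlier layers stagnate, and it is one reason ReLU is often preferred over the sigmoid in deep networks.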
RNN
  • The main idea is that the same weights are shared across the RNN
  • The weights between successive time steps (the layers of the unrolled network) are the same
  • RNNs can be used for document classification, data generation, chatbots, and time series
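The weight sharing noted above can be sketched with a one-unit RNN in plain Python. The weight values `w_x`, `w_h` and `b` are arbitrary numbers picked for illustration; the point is only that the same three parameters are reused at every time step.

```python
import math

def rnn_forward(inputs, w_x=0.8, w_h=0.5, b=0.0):
    """Run a one-unit RNN over a sequence and return every hidden state."""
    h = 0.0          # initial hidden state
    states = []
    for x in inputs:
        # The same w_x, w_h and b are applied at every step of the sequence.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

print(rnn_forward([1.0, 0.0, -1.0]))
```

Because the hidden state is fed back in, each output depends on the whole history of the sequence, which is why the same structure suits text and time series.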
LSTM - Long Short-Term Memory

Topics from Language Modelling class


Happy Learning!!!

April 28, 2017

Day #67 - Exploring Tableau Visualization

Canadian car sales data visualization examples. The interpretation varies with the representation, as presented below. The data holds all the details, but exploring the same data from different visualization perspectives yields different interpretations.

Visualization #1 - This representation helps us figure out which months usually have high sales numbers

  • Three months of the year (Dec-Jan-Feb) have relatively weak sales figures compared to the rest of the year
  • The March-August trend shows good demand from customers, resulting in increased sales
  • The last few months of the year show decreased demand. This could be due to seasonal factors/holidays/travel. This needs to be validated
Visualization #2 - Consolidated snapshot comparing yearly sales performance across several years and across all months (this one gives a good big picture)


  • January has the lowest sales
  • Sales trend is increasing YOY (year over year)
  • May consistently has the highest sales across many years
The data format looks like below in Visualization #1

Visualization #3 - Data in simple table format



  • Six years of total sales data are represented
  • Partial data is available for the year 2016
Happy Learning!!!

April 27, 2017

Day #66 - Maths behind backpropagation

Today's learning is the mathematics behind neural network fundamentals.
Key notes
  • In a neural network, the network forward-propagates activations to produce an output and back-propagates the error to determine the weight changes
  • Partial Derivative - Derivative with respect to one of the variables, holding the rest constant
  • Backpropagation uses the gradient descent method, which requires calculating the derivative of the squared error function with respect to the weights of the network
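The notes above can be tried out on the smallest possible network: one input, one hidden neuron, and one output neuron. This is a minimal sketch in plain Python, assuming sigmoid activations, no bias terms, and the squared error E = 0.5 * (y - target)^2; the starting weights, learning rate, and training sample are arbitrary values chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    """Forward propagation: input -> hidden activation h -> output y."""
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    return h, y

def squared_error(y, target):
    return 0.5 * (y - target) ** 2

def train(x, target, w1=0.5, w2=0.5, lr=0.5, steps=500):
    """Gradient descent using partial derivatives obtained via the chain rule."""
    for _ in range(steps):
        h, y = forward(x, w1, w2)
        # dE/d(net_out) = (y - target) * sigmoid'(net_out)
        delta_out = (y - target) * y * (1.0 - y)
        grad_w2 = delta_out * h                      # dE/dw2
        # Propagate the error back through w2 to the hidden neuron.
        delta_hidden = delta_out * w2 * h * (1.0 - h)
        grad_w1 = delta_hidden * x                   # dE/dw1
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2

x, target = 1.0, 0.8
before = squared_error(forward(x, 0.5, 0.5)[1], target)
w1, w2 = train(x, target)
after = squared_error(forward(x, w1, w2)[1], target)
print(before, after)
```

Each gradient is a partial derivative: `grad_w2` treats `w1` as constant and vice versa, which is exactly the definition in the notes.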
Happy Learning!!!

April 26, 2017

Keep Learning - Good Motivation Note

Interesting Slide from presentation - Dev @ 40



Happy Learning!!!

April 23, 2017

Smart Farming

Product #1 - Automated Farming + Design Layout + Soil Monitoring + Solar powered = "Smart Farming"


Product #2 - Counting Fruits + Finding Weeds + Cattle monitoring

Happy Farming!!!

April 20, 2017

Data Science - Find your Winning use case

I observe a lot of technologies discussed in Data Science roles. They cover Big Data, open source and commercial tools, R, Python, MapR, Spark, Azure, various cloud providers, etc.

"Identifying a relevant domain/product-related use case that helps improve the business/numbers is the key"

This LinkedIn post provides great clarity on focusing on relevant use cases, small wins, and scaling success.


Happy Analytics!!!

April 17, 2017

Day #65 - Python Package Installation commands - Windows

Had an issue running some code. After trying different options, uninstalling the existing version of keras and reinstalling it worked. Bookmarking the commands.


Happy Learning!!!

April 13, 2017

Day #64 - ETL for Data and Delta Data Management

Custom SSIS sample of an ETL setup for data extraction and update

Scenario
  • Two Databases (Source and Target)
  • Example with Test Table with few columns
  • Ability to get New Data
  • Ability to get Delta Data (Updates)
Steps in SSIS Project

Step 1 - Create a Data Flow Task

Step 2 - Add connection managers for Source and Target Databases



Step 3 - The operators and layout are (Source Data -> Lookup in Target Database -> Insert / Update Target Database)



Step 4 - OLEDB Data Source Settings


Step 5 - Lookup to map the data



Step 6 - Lookup Mapping


Step 7 - Match Non-Matching for Insert / Updates



Step 8 - Match Destination Settings


Step 9 - Non Match Update Query



Step 10 - Non Match Update Params

Reference table script


SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Table_1](
[Col1] [int] NULL,
[Col2] [int] NULL,
[Col3] [int] NULL
) ON [PRIMARY]
GO
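The same flow (Source Data -> Lookup in Target Database -> Insert / Update) can be sketched outside SSIS for quick experiments. The version below is only an illustration in Python with in-memory SQLite databases, not the SSIS package itself. It assumes Col1 is the lookup key, which the original table script does not declare, and `make_db` and `sync` are hypothetical helper names.

```python
import sqlite3

def make_db(rows):
    """Create an in-memory database with the reference table and some rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Table_1 (Col1 INT, Col2 INT, Col3 INT)")
    db.executemany("INSERT INTO Table_1 VALUES (?, ?, ?)", rows)
    db.commit()
    return db

def sync(source, target):
    """Copy new rows and changed rows from source to target, keyed on Col1 (assumed key)."""
    inserted = updated = 0
    for col1, col2, col3 in source.execute("SELECT Col1, Col2, Col3 FROM Table_1"):
        # Lookup step: does this key already exist in the target?
        match = target.execute(
            "SELECT Col2, Col3 FROM Table_1 WHERE Col1 = ?", (col1,)
        ).fetchone()
        if match is None:
            # New data: insert the row.
            target.execute(
                "INSERT INTO Table_1 (Col1, Col2, Col3) VALUES (?, ?, ?)",
                (col1, col2, col3),
            )
            inserted += 1
        elif match != (col2, col3):
            # Delta data: the key exists but the values changed, so update.
            target.execute(
                "UPDATE Table_1 SET Col2 = ?, Col3 = ? WHERE Col1 = ?",
                (col2, col3, col1),
            )
            updated += 1
    target.commit()
    return inserted, updated

source = make_db([(1, 10, 100), (2, 20, 200), (3, 30, 300)])
target = make_db([(1, 10, 100), (2, 99, 200)])
print(sync(source, target))
```

Running `sync` a second time should find nothing to insert or update, which is a handy sanity check for any delta load.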


Happy Learning!!!!

April 08, 2017

Day #63 - Notes from Text processing and Parallel Programming

Quick Summary notes for future reference

Text Processing - Word Sense Disambiguation
  • Relies on knowledge sources such as WordNet
  • from nltk.corpus import wordnet - leverage it
  • Leverage a machine-readable dictionary
Lesk's Algorithm
  • Sense bag (words from the definition of each sense of the ambiguous word)
  • Context bag (words from the different definitions of each context word)
  • The sense with the closest match between the bags will be picked
Walker's Algorithm for word sense disambiguation
  • Use a thesaurus to find scores in context
  • The sense with the highest score will be picked for context relevance
  • Thesaurus library pywordnet, now part of NLTK
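A toy version of the overlap idea can be sketched in plain Python. Note that this is the simplified Lesk variant, which compares each sense definition with the words of the sentence itself rather than with the definitions of the context words. The `SENSES` dictionary, its glosses, and the stopword list are made up for illustration; a real run would take the definitions from WordNet.

```python
# Hypothetical mini-dictionary: sense name -> definition (made up for illustration).
SENSES = {
    "bank": {
        "bank_river": "sloping land beside a body of water such as a river",
        "bank_money": "financial institution that accepts deposits and lends money",
    },
}

STOPWORDS = {"a", "an", "the", "of", "that", "and", "such", "as", "to", "in", "on", "i", "my"}

def bag(text):
    """Turn a piece of text into a bag (set) of lowercase content words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(word, sentence):
    """Pick the sense whose definition shares the most words with the sentence."""
    context_bag = bag(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(bag(gloss) & context_bag)   # sense bag vs context bag
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bank", "I sat on the bank of the river and watched the water"))
print(simplified_lesk("bank", "I went to the bank to deposit money"))
```

NLTK also ships a Lesk implementation in `nltk.wsd`, which works on WordNet synsets once the WordNet corpus has been downloaded.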
Keywords
  • Polysemy - many possible meanings for a word or phrase.
  • Homonym - same spelling or pronunciation but different meanings
Parallel Programming
  • Filter locks
  • Bakery Algorithm
Example Implementation - link

Memory Consistency
  • Strict Consistency
  • Sequential Consistency
  • Relaxed (Weak) Consistency
Linearization Point
From Stackoverflow

Coarse Grained Vs Fine Grained
From Stackoverflow

Peterson's Algorithm
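For reference, here is a minimal sketch of Peterson's two-thread lock in Python. It is a teaching sketch only: it relies on CPython's global interpreter lock to make reads and writes of `flag` and `turn` behave in a sequentially consistent way, whereas on real hardware with a relaxed memory model the algorithm needs memory barriers. `PetersonLock` and `run_counter` are names chosen for this example.

```python
import threading
import time

class PetersonLock:
    """Mutual exclusion for exactly two threads with ids 0 and 1."""

    def __init__(self):
        self.flag = [False, False]  # flag[i] is True while thread i wants to enter
        self.turn = 0               # the thread that gets priority when both want in

    def acquire(self, me):
        other = 1 - me
        self.flag[me] = True
        self.turn = other           # politely give priority to the other thread
        while self.flag[other] and self.turn == other:
            time.sleep(0)           # busy-wait, yielding so the other thread can run

    def release(self, me):
        self.flag[me] = False

def run_counter(iterations=500):
    """Two threads increment a shared counter inside the critical section."""
    lock = PetersonLock()
    total = [0]

    def worker(me):
        for _ in range(iterations):
            lock.acquire(me)
            total[0] += 1           # critical section
            lock.release(me)

    threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

print(run_counter())
```

With the lock in place, the final count should equal twice the number of iterations, since no increment is lost.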


More Reads - Link

Happy Learning!!!

April 07, 2017

TSQL Code formatting tool

A free tool for TSQL code formatting. Added it to SSMS.


Happy Formatting!!!

April 02, 2017

Fundamentals Again - Day #61 - Hypothesis Testing

  • Alternative Hypothesis - There is a difference between groups
  • Null Hypothesis - There is no difference between groups
  • Binomial distribution - Two possible outcomes
  • Sampling distributions, Mode, Median, Mean, Variability in distribution (Standard Deviation), Chi-Square Distribution
  • Conduct a t-test and check the p-value to determine significance
Ref - Coursera
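To see the null and alternative hypotheses in action, here is a small sketch using only the Python standard library. The two samples are made-up numbers. The t statistic is Welch's two-sample version. Since the standard library has no t-distribution, the p-value is computed with an exact permutation test instead, which is a stand-in for the t-test p-value and not the same calculation.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

def permutation_p_value(a, b):
    """Two-sided exact permutation p-value for the difference in means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    n = len(pooled)
    extreme = total = 0
    # Under the null hypothesis the group labels are interchangeable,
    # so every way of splitting the pooled data is equally likely.
    for idx in combinations(range(n), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical measurements for two groups (made-up numbers).
group_a = [5.1, 4.9, 5.6, 5.8, 5.3]
group_b = [6.4, 6.9, 6.1, 7.0, 6.6]

t = t_statistic(group_a, group_b)
p = permutation_p_value(group_a, group_b)
print(f"t = {t:.2f}, p = {p:.4f}")
```

A small p-value (commonly below 0.05) means the observed difference would be unusual if the null hypothesis were true, so the null hypothesis is rejected in favour of the alternative.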

Happy Learning!!!