"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

February 10, 2017

Day #55 - Markov chains Basics

This post is from my notes. I had bookmarked some interesting answers on understanding Markov chains.

What is a Markov chain?
The simplest example is a drunkard's walk (also called a random walk). The drunk might stumble in any direction, but each move is only one step from the current position, and the next step depends only on where he stands now.

The ink drop in a glass of water example

Imagine a traffic light with three states: green, yellow, and red. Instead of cycling Green -> Yellow -> Red at fixed intervals, it switches to any color at any time, at random. (Imagine a die with three colors: throw it and it decides the next color.) Alternatively, suppose you are in a certain color, say green, and the light is not allowed to stay in the same color: flip a coin, and if it is heads go to red, if tails go to yellow.

So to make a "chain" we just feed tomorrow's result back into today. We can then generate a long sequence like: rain, rain, rain, no rain, no rain, no rain, rain, rain, rain, no rain, no rain, no rain. A pattern emerges: there will be long runs ("chains") of rain or no rain, depending on how we set up our "chances", or transition probabilities.
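The rain / no-rain chain above can be simulated in a few lines of Python. The transition probabilities here are made-up values for illustration:

```python
import random

# Two-state weather Markov chain: the next state depends only on the
# current state. Transition probabilities are illustrative assumptions.
TRANSITIONS = {
    "rain":    {"rain": 0.7, "no rain": 0.3},   # rainy days tend to persist
    "no rain": {"rain": 0.2, "no rain": 0.8},   # so do dry days
}

def next_state(current, rng):
    """Feed 'today' back in to pick 'tomorrow' from its transition row."""
    probs = TRANSITIONS[current]
    return rng.choices(list(probs), weights=list(probs.values()))[0]

def simulate(start, days, seed=42):
    rng = random.Random(seed)
    chain = [start]
    for _ in range(days - 1):
        chain.append(next_state(chain[-1], rng))
    return chain

print(simulate("rain", 10))
```

Raising the "stay in the same state" probabilities makes the runs longer, which is exactly the pattern described above.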

Markov Chain - Khan Academy
  • Hidden blueprints of nature / objects around us
  • Run long enough, each sequence converges to the same equilibrium ratio of states
  • First-order and second-order models, as defined by Claude Shannon
Happy Learning!!!

February 07, 2017

Day #54 - Fundamental Concepts - Artificial Neural Networks

Referenced Articles - Link

One liner definitions
  • Image - Represented as an RGB matrix: 3 color channels X Height X Width
  • Each color value lies in the [0, 255] range
  • Kernel - Small matrix consisting of real-valued entries
  • Activation Region - Region of the input where the features specific to the kernel are detected
  • Convolution - Calculated by taking the dot product of the kernel and the corresponding values of the input matrix at selected coordinates
  • Zero Padding - Systematically adding zeros around the input to adjust its size as required
  • Hyperparameter - Properties pertaining to the structure of layers and neurons (spatial arrangement, receptive field values). The main CNN hyperparameters are Receptive Field (R), Zero Padding (P), input volume dimensions (Width X Height X Depth) and Stride Length (S)
  • Convolutional Layer - Performs the convolution of the input with filters and identifies the activation regions. Convolutional layer output - ReLU(activation values)
  • ReLU - Rectified Linear Unit. The most commonly deployed activation function for the output of CNN neurons: max(0, x)
  • ReLU is not differentiable at the origin, so the Softplus function ln(1 + e^x) is sometimes used instead. The derivative of the Softplus function is the sigmoid function
  • Pooling - Placed after convolution. The objective is downsampling (reducing dimensions)
  • Advantages of downsampling
    • Decreased input size for upcoming layers
    • Works against overfitting
  • Pooling slides a window across the input and transforms each window into a representative value. In max pooling that value is the maximum in the observable window
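A minimal NumPy sketch of the pieces above: convolution with zero padding, ReLU, then max pooling. The toy image and kernel are made up for illustration:

```python
import numpy as np

def conv2d(image, kernel, pad=0):
    """Convolution after zero padding: dot product of the kernel with
    each receptive field of the input (stride 1)."""
    img = np.pad(image, pad)
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    """max(0, x), applied elementwise."""
    return np.maximum(0, x)

def max_pool(x, size=2):
    """Downsample by taking the max in each non-overlapping window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[-1.0, 1.0]])            # toy horizontal-difference kernel
feature_map = relu(conv2d(image, edge))   # activation region of the kernel
print(max_pool(feature_map))
```

Note the pooled output is smaller than the feature map, which is the downsampling advantage listed above.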
Happy Learning!!!

February 03, 2017

Day #53 - Tech Talk - Nikhil Garg - Building a Machine Learning Platform at Quora - MLconf SF 2016

Keynotes from Session

Machine Learning Platform - A collection of systems to sustainably increase the business impact of ML at scale

Build or Buy
1. Degree of Integration with the product. Delegation of components
2. Support for Production Systems (cannot outsource business logic to outside platforms)
3. Blurry line between experimentation & production
4. Leverage Open source in an open manner
5. Commercial platforms are not super valuable - most models can often be trained on a single multi-core machine
6. Blurry line between ML & Product Development (in-house tools for monitoring / training / deploying etc.)
7. ML is Quora's core competency

Machine Learning Models Deployed

Machine Learning Use Cases

Happy Analytics!!!

February 02, 2017

Machine Learning Quotes

Quote #1 - "In a Markov model our assumption is that the future state depends only on the current state, not on any other previous states"

Quote #2 - "In Naive Bayes, we make the naive assumption that the current term is independent of the previous terms"

Happy Learning!!!

January 21, 2017

Day #52 - Deep Learning Class #1 Notes

AI - Reverse Engineering the brain (Curated Knowledge)
ML - Machine Learning is a subset of AI: teaching machines to learn

Deep Learning - Rebirth of Neural Networks
  • Multiple layers of neurons
  • Directed graph
  • First layer is the input layer
  • Last layer is the output layer
  • Intermediate layers are hidden layers
  • Deep Learning is inspired by the human brain
  • In Deep Learning, features are learnt rather than hand-engineered
  • Gradient Descent - process of making weight updates in a NN
  • Neural Networks are a discriminative approach
  • Neurons in neural networks end up becoming feature selectors
Discriminative Classifiers - Logistic Regression, SVM (uses kernels for non-linear classification), Decision Trees
Generative Model - Naive Bayes

Types of Neural Networks
  • Autoencoders for dimensionality reduction
  • CNN Convolutional NN
  • RNN Recurrent NN
Interesting Deep Learning Demo Sites Discussed

Happy Learning!!!

January 13, 2017

Day #51 - Neural Networks

Happy New Year 2017. This post is on Neural Networks.

Neural Networks

  • ANN - inspired by biological neural networks; models a network of neurons
  • Key layers - Input Layer, Hidden Layer, Output Layer
  • Neural networks that can learn - perceptrons, backpropagation networks, Boltzmann machines, recurrent networks
  • In the example below we use backpropagation for an XOR implementation

Implementation overview

  • Initialize the edge weights at random (we do not know the exact weights, so we choose them randomly; training finds the right values)
  • Calculate the error - we have training data with expected results (supervised learning); the calculated output differs from the expected output, giving an error term
  • Calculate the changes to the edge weights and update them accordingly (the backpropagation step)
  • The algorithm terminates when the error rate is small enough
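The steps above can be sketched as a small NumPy network trained on XOR. The hidden-layer size, learning rate and epoch count are illustrative choices, not from the class:

```python
import numpy as np

# Minimal 2-4-1 network trained with backpropagation on XOR.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: initialize the edge weights at random
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)

lr = 0.5
for epoch in range(10_000):
    # forward pass: calculated output
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Step 2: error between calculated and expected output
    err = out - y
    # Step 3: backpropagate and update the edge weights
    d_out = err * out * (1 - out)            # sigmoid derivative
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(0)

print(np.round(out).ravel())   # predictions for [00, 01, 10, 11]
```

In a fuller implementation the loop would terminate once the error drops below a threshold (step 4) instead of running a fixed number of epochs.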

Happy Pongal & Happy Learning!!!

December 17, 2016

Day #50 - Recommendation Systems

Recommendation Systems
  • Content Based
  • Collaborative (User-User / Item-Item)
Content Based Key Features - Recommends based on the user's own taste and historical behavior; no data from other users is needed.
  • No Need of Data from Other users
  • Recommend New and Unpopular items
  • Finding Appropriate feature is hard
  • Unable to Exploit Quality judgment from other users
Collaborative Key Features - Recommendation based on similar users / similar items.
  • For item i, find other similar items
  • Item-item generally works better than user-user
  • Needs enough user data
  • Works on any kind of item; no feature selection is needed
  • Find users who have bought / rated similar items
  • Hard to find users who rated the same items
More Advanced Methods - Latent Factor Models

Happy Learning!!!

Day #49 - Clustering Key Notes

Happy Learning!!!

December 16, 2016

December 13, 2016

Day #47 - Deep Dive - Learnings

Tip #1 - Support Vector Machines
  • Performs classification by finding the optimal separating hyperplane that separates the two classes and maximizes the distance to the closest point from either class; this distance is called the margin
  • Training involves non-linear optimization
  • The objective function is convex
  • So the solution to the optimization problem is relatively straightforward
Tip #2 - Regularization - Involves adding a penalty term to the error function. Two types of regularization in linear regression:
  • Ridge
  • Lasso
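As a quick sketch of the penalty-term idea: once the L2 penalty lambda * ||w||^2 is added to the squared error, ridge regression has the closed-form solution w = (X^T X + lambda * I)^{-1} X^T y. The data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=50)

def ridge(X, y, lam):
    """Closed-form ridge solution: the lam * I term is the penalty."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

print(ridge(X, y, lam=1.0))     # mild penalty, weights near the truth
print(ridge(X, y, lam=100.0))   # stronger penalty shrinks the weights
```

Lasso uses an L1 penalty instead, which has no closed form but drives some weights exactly to zero (a form of feature selection).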
Tip #3 - Stochastic Gradient Descent
  • Also called online or incremental gradient descent (in contrast to batch gradient descent, which uses all examples per update)
  • Processes one example at a time and updates the weights immediately
  • Cheaper computation
  • Randomization helps escape shallow valleys and silly local minima
  • Simplest possible optimization
  • SGD is applied in Neural Networks
Tip #4 - Gradient Descent
  • Meant to minimize a non-linear function
  • The error measure is a convex function
  • Finds a local minimum
  • Initialize -> Iterate until termination -> Adjust learning rate -> Terminate on local minimum
  • Return Weights
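The Initialize -> Iterate -> Terminate loop above, as a minimal sketch on the convex function (w - 3)^2. The learning rate and tolerance are illustrative choices:

```python
def gradient_descent(grad, w0, lr=0.1, tol=1e-8, max_iter=10_000):
    """Initialize, iterate until the step is tiny, return the weight."""
    w = w0
    for _ in range(max_iter):
        step = lr * grad(w)     # move against the gradient
        w -= step
        if abs(step) < tol:     # terminate near a (local) minimum
            break
    return w

grad = lambda w: 2 * (w - 3)    # derivative of (w - 3)^2
print(gradient_descent(grad, w0=0.0))
```

Because the error function here is convex, the local minimum found is also the global one, which is why convexity makes the problem "relatively straightforward".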
Tip #5 - Bias and Variance
  • Models with too few parameters may lead to high bias
  • Models with too many parameters may be inaccurate due to large variance
Happy Learning!!!

December 11, 2016

Day #46 - Recursive Feature Elimination

Recursive feature elimination is stepwise backward feature elimination.

Backward Search
  • Start with all features
  • Greedily remove the least relevant feature
  • Stop when the desired number of features remains
Recursive Feature Elimination
  • Train SVM
  • Rank the Features
  • Eliminate Feature with lowest Rank
  • Repeat until required number of features are retained
In each iteration RFE eliminates the one feature with minimum weight. The intuition is that the feature with minimum weight has the least influence on the weight vector.
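A sketch of the loop above. A least-squares linear model stands in for the SVM, and features are ranked by the absolute value of their weight; the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only features 0 and 2 actually drive the target; the rest are noise.
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(scale=0.1, size=100)

def rfe(X, y, n_keep):
    """Train, rank by |weight|, drop the weakest feature, repeat."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        w, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        weakest = int(np.argmin(np.abs(w)))   # lowest-ranked feature
        remaining.pop(weakest)                # eliminate it
    return remaining

print(rfe(X, y, n_keep=2))
```

Retraining after every elimination is what distinguishes RFE from ranking all features once and cutting in a single pass.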

Happy Learning!!!

Day #45 - Handling Imbalanced Classes

  • SMOTE - Synthetic minority over sampling technique
  • Sampling with Replacement
  • Sampling without Replacement
  • Undersampling the majority class, oversampling the minority class
  • Collect more samples
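A simplified SMOTE-style sketch: synthesize new minority samples by interpolating between pairs of minority points. (Real SMOTE interpolates toward one of the k nearest neighbours; the random pairing here is a simplification, and the data is made up.)

```python
import numpy as np

def oversample_minority(X_min, n_new, seed=0):
    """Create n_new synthetic points between random minority pairs."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i, j = rng.choice(len(X_min), size=2, replace=False)
        t = rng.uniform()                          # interpolation factor
        synthetic.append(X_min[i] + t * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_minority = np.array([[1.0, 1.0], [2.0, 1.0], [1.5, 2.0]])
new_points = oversample_minority(X_minority, n_new=5)
print(new_points.shape)
```

Unlike sampling with replacement, which only duplicates existing rows, the interpolated points are genuinely new, which is the idea behind "synthetic" oversampling.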
Happy Learning!!!

December 04, 2016

Day #44 - Real time and Batch Analytics - Vendors - Stack Comparison

Summary of analysis after evaluating different stacks

Happy learning!!!

December 03, 2016

November 26, 2016

Day #42 - Classes in python

Today it's a bit more on classes in Python. They are similar to classes in C# / C++ / Java.
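A minimal sketch of a Python class with a constructor, methods and inheritance; the Rectangle / Square names are made up for illustration:

```python
class Rectangle:
    def __init__(self, width, height):   # constructor, like C++/Java
        self.width = width
        self.height = height

    def area(self):                      # instance method; self ~ this
        return self.width * self.height

    def __repr__(self):                  # like toString() in Java
        return f"Rectangle({self.width} x {self.height})"

class Square(Rectangle):                 # inheritance
    def __init__(self, side):
        super().__init__(side, side)     # call the base constructor

print(Square(4).area())
```

Unlike C# / C++ / Java there are no access modifiers or type declarations; by convention a leading underscore marks an attribute as private.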

November 11, 2016

Day #41 - Machine Learning Interesting Qns

I read through a lot of materials. Some readings are very clear and worth bookmarking. Some of those questions and answers:
  1. How does KNN predict class label for new example ?
    • Find the K nearest neighbours of the example to be classified, then take the majority vote of their class labels
  2. Classification - Map input to discrete outputs
  3. Generative Model - Naive Bayes
  4. Discriminative Model - SVM, Decision Trees, Neural Networks, Boosting, KNN
  5. Regression - Map input to continuous outputs
  6. Decision Trees - embedded implicit feature selection method
  7. PCA
    • Taking Data into a new space
    • Number of Eigen Values = Number of original dimensions
    • Pick the top k Eigen Value Vectors

  8. Linearly non-separable in the normal plane - with the SVM kernel technique we can project into a higher-dimensional space and make it linearly separable
  9. Linearly separable
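Question 1 above (KNN) as a short sketch: find the K nearest neighbours and take the majority vote. The data points are made up:

```python
import numpy as np
from collections import Counter

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8]])
labels = ["a", "a", "a", "b", "b"]

def knn_predict(x, k=3):
    dists = np.linalg.norm(X - x, axis=1)   # distance to every point
    nearest = np.argsort(dists)[:k]         # indices of the K closest
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]       # majority class label

print(knn_predict(np.array([1.1, 1.0])))
```

Choosing an odd k avoids ties in the two-class case.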

Happy Learning!!!

November 05, 2016

Day #40 - Download Images from Web using Python

This post is about downloading images from a URL
  • Read the URLs from the input file
  • Download all the listed files one by one
  • Handle errors with try / except so one failed download does not stop the rest
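A sketch of the steps above using the standard library; the urls.txt input file and images output folder are assumed names:

```python
import os
import urllib.request

def filename_from_url(url):
    """Derive a local file name from the last path segment of the URL."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name or "unnamed"

def download_all(list_file="urls.txt", out_dir="images"):
    os.makedirs(out_dir, exist_ok=True)
    with open(list_file) as f:                 # read URLs from the input file
        urls = [line.strip() for line in f if line.strip()]
    for url in urls:
        target = os.path.join(out_dir, filename_from_url(url))
        try:                                   # one bad URL should not stop the rest
            urllib.request.urlretrieve(url, target)
            print("downloaded", target)
        except OSError as err:
            print("failed", url, err)
```

For anything beyond a one-off script, the requests library with a timeout is the more robust choice.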

October 31, 2016

Day #39 - Useful Tool MyMediaLite for Recommendations

This post is based on learnings from an assignment - link1, link2

Input is User-Items file as listed below

Sample Execution Command

We supply parameter 20 in user20.txt to identify recommendations for user 20. The recommender type is specified via the --recommender parameter.

Happy Learning!!!