"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

September 25, 2016

Persuading an Organization to Embrace Analytics

Products and systems already in production need to review their data collection techniques and identify more relevant data to support better analytics. If a system relies on point-in-time data and overwrites or archives historical records after a period of time, we lose valuable information.
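A small sketch of the difference, assuming a hypothetical customer balance feed. The names `record_balance`, `current_balance` and `balance_history` are made up for illustration and do not come from any real system:

```python
from datetime import date

# Point-in-time store: each update overwrites the previous value.
current_balance = {}

# Append-only history: every change is kept together with its date.
balance_history = []

def record_balance(customer_id, balance, as_of):
    """Update the latest value and also append the change to the history."""
    current_balance[customer_id] = balance
    balance_history.append(
        {"customer_id": customer_id, "balance": balance, "as_of": as_of}
    )

record_balance("C1", 100.0, date(2016, 7, 1))
record_balance("C1", 80.0, date(2016, 8, 1))
record_balance("C1", 45.0, date(2016, 9, 1))

# The point-in-time view only knows the latest figure.
print(current_balance["C1"])

# The history still supports trend features such as month-over-month change.
values = [row["balance"] for row in balance_history if row["customer_id"] == "C1"]
changes = [later - earlier for earlier, later in zip(values, values[1:])]
print(changes)
```

The trend in `changes` is the kind of feature that cannot be rebuilt once the older records have been overwritten.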

Why Analytics?
  • Predict your future based on your past and present
  • Correct your mistakes before it's too late
  • Identify and correct poorly performing segments of the business

How does Analytics differ from Business Intelligence?
  • I have worked on ETL, data marts and schemas for BI projects
  • BI helps to summarize and compare business performance YoY and QoQ
  • Analytics is the next step after BI: it looks at future trends

Where are we lagging?

We need analytics, but we do not have enough data points / features to perform it. Data collection is a key aspect. The lifeblood of data science is collecting meaningful data and building models from it. We need to devote sufficient time to collecting data, pipelining it, and processing and aggregating it for data analysis and modelling.

To evolve from the current product to a system with analytics capabilities, we need to change the way we store and process data. Technical constraints, project deadlines and resistance to change all have to be handled to make this work.

Persist, Persuade, Implement....

Happy Learning!!!

September 05, 2016

Day #31 - Support Vector Machines

SVM
  • Support Vector Machines
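  • Widest-street approach: separate the +ve and -ve classes with a margin (street) that is as wide as possible
  • A basic SVM is a binary classifier (two classes); multi-class problems are usually handled by combining several binary SVMs (one-vs-rest or one-vs-one)
  • Hard-margin SVM (data must be strictly linearly separable)
  • Soft-margin SVM (allows some points to fall on the wrong side; the constant C controls how heavily such violations are penalized)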
  • Kernel functions perform a transformation of the data
  • Using a kernel function, we simulate finding a linear separator in the transformed space
  • Kernels (implicitly) take data into a higher-dimensional space
  • Other Key concepts discussed (Lagrange Multipliers, Quadratic Optimization problem)
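  • Lagrange multipliers fold the margin constraints into the optimization objective; a transformation such as 1D to 2D data can make the classes linearly separable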
  • SVM (Linear way of approximation)
  • Types of Kernels - Polynomial Kernel, Radial Basis Function Kernel, Sigmoid Kernel
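For reference, the soft-margin idea in the list above is commonly written as the optimization problem below. The symbols are standard textbook notation rather than something from these notes: $w$ and $b$ define the separating line, $\xi_i$ is the slack (how far point $i$ is allowed onto the wrong side), and $C$ is the constant mentioned above.

```latex
\min_{w,\,b,\,\xi} \quad \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \,(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
```

The width of the street is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ widens it. A large $C$ punishes violations heavily, while a small $C$ tolerates more of them. The hard-margin case corresponds to forcing every $\xi_i = 0$.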
Maths Behind it - Link
Good Relevant Read - SVM
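A minimal sketch of the three kernels listed above, in plain Python. The parameter names (`degree`, `gamma`, `coef0`) and their default values are my own assumptions for illustration. The toy 1D data is also made up: the +ve points sit in the middle, so no single threshold on x separates them, but after an explicit mapping to a higher-dimensional space a straight cut does.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, degree=2, coef0=1.0):
    return (dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=0.5):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(u, v, gamma=0.1, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)

def phi(x):
    """Explicit feature map matching the degree-2 polynomial kernel for 1D input."""
    return (x * x, math.sqrt(2.0) * x, 1.0)

# Toy 1D data (assumed): +ve class in the middle, -ve class on both sides.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [-1, -1, 1, 1, 1, -1, -1]

# In the mapped space a linear cut on the first coordinate (x squared) separates the classes.
separated = [1 if phi(x)[0] < 2.5 else -1 for x in xs]
print(separated == labels)

# The kernel gives the same number as the dot product in the mapped space,
# without ever building the mapped vectors.
print(polynomial_kernel([2.0], [3.0]), dot(phi(2.0), phi(3.0)))
```

The last line is the kernel trick in miniature: the SVM only needs dot products between points, and the kernel supplies them directly.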

Happy Data Analysis!!!

Day #30 - Machine Learning Fundamentals

Supervised Learning
  • Classification and Regression problems
  • Past data + Past outputs leveraged
  • Regression - Continuous Values
  • Classification - Discrete Labels
Unsupervised Learning
  • Clustering - Discrete Labels
  • Dimensionality reduction - Continuous Values
Classifiers
  • SVM (Linear way of approximations)
  • KNN (Lazy learner)
  • Decision Tree (Rule based approach, Set of Rules)
  • Naive Bayes (Pick class with maximum probability)
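To make "lazy learner" concrete, here is a minimal KNN sketch in plain Python. The function name `knn_predict` and the toy points are assumptions for illustration. There is no training step at all: the stored data is only consulted when a query arrives.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda item: math.dist(item[0], query),
    )
    nearest_labels = [label for _, label in neighbours[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy 2D data (assumed): two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.5, 1.5)))
print(knn_predict(points, labels, (6.5, 6.5)))
```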
Evaluation Methods
  • K-Fold Validation
  • Cross Validation
  • Ranking / Search - Relevance
  • Clustering - Intra-cluster and inter-cluster distances
  • Regression - Mean Squared Error
  • ROC Curve 
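A rough sketch of K-fold validation combined with mean squared error, in plain Python. The helper names and the toy targets are assumptions, and the "model" is deliberately trivial (it predicts the mean of the training targets) so that the splitting logic stays in focus.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, roughly equal folds."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Toy regression targets (assumed).
targets = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

fold_errors = []
for train_idx, test_idx in k_fold_indices(len(targets), k=3):
    # "Train": the stand-in model just remembers the mean of the training targets.
    train_mean = sum(targets[i] for i in train_idx) / len(train_idx)
    actual = [targets[i] for i in test_idx]
    predicted = [train_mean] * len(test_idx)
    fold_errors.append(mean_squared_error(actual, predicted))

average_mse = sum(fold_errors) / len(fold_errors)
print(fold_errors, average_mse)
```

Every sample lands in the test fold exactly once, so the averaged error uses all of the data without ever scoring a model on points it was trained on. In practice the data is usually shuffled before splitting, which this sketch skips.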
Bagging
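  • Build a classifier on a sample of the data (e.g. 30%)
  • Sample again and build another classifier (e.g. on another 30%); standard bagging draws each sample with replacement
  • The classifiers' predictions are combined, e.g. by majority vote
  • Random Forests - an ensemble of randomized decision trees
  • Each tree split considers a randomly chosen subset of attributes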
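A minimal bagging sketch in plain Python. Two things here are my assumptions rather than part of the notes: each sample is a standard bootstrap (same size as the data, drawn with replacement) instead of a 30% slice, and the base classifier is a one-threshold "stump" on 1D data instead of a full decision tree. All function names are made up for illustration.

```python
import random
from collections import Counter

def train_stump(samples):
    """Pick the threshold on x that misclassifies the fewest of the given samples."""
    best_threshold, best_errors = None, None
    for threshold, _ in samples:
        errors = sum((1 if x >= threshold else 0) != y for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagging_fit(samples, n_models=15, seed=0):
    """Train n_models stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]  # sampling with replacement
        thresholds.append(train_stump(bootstrap))
    return thresholds

def bagging_predict(thresholds, x):
    """Majority vote over all stumps."""
    votes = Counter(1 if x >= t else 0 for t in thresholds)
    return votes.most_common(1)[0][0]

# Toy 1D data (assumed): class 0 on the left, class 1 on the right.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
model = bagging_fit(data)

print(bagging_predict(model, 0.5), bagging_predict(model, 9.5))
```

Using an odd number of models avoids tied votes in a two-class problem.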
Boosting
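  • Multiple weak classifiers are combined into a strong classifier
  • Each round reweights the training samples (or resamples them with replacement according to the weights) so that misclassified points get more attention
  • AdaBoost - Adaptive Boosting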
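A compact AdaBoost sketch in plain Python, assuming 1D data, labels of +1 / -1 and one-threshold stumps as the weak classifiers. It uses the reweighting form of the algorithm rather than resampling. The function names and the toy data are assumptions for illustration.

```python
import math

def stump_predict(x, threshold, polarity):
    """Predict polarity at or above the threshold, the opposite sign below it."""
    return polarity if x >= threshold else -polarity

def best_stump(samples, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold, _ in samples:
        for polarity in (1, -1):
            error = sum(
                w for (x, y), w in zip(samples, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost_fit(samples, n_rounds=3):
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        error, threshold, polarity = best_stump(samples, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points gain weight, correctly classified points lose weight.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for (x, y), w in zip(samples, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (assumed): +1 on both ends, -1 in the middle, so no single stump is enough.
data = [(1, 1), (2, 1), (3, 1), (4, -1), (5, -1), (6, -1), (7, 1), (8, 1), (9, 1)]
ensemble = adaboost_fit(data)

print([adaboost_predict(ensemble, x) for x, _ in data])
```

On this data the best single stump still gets a third of the points wrong, while the weighted vote of three stumps classifies every training point correctly.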
Stacking
  • Use the output from one classifier as input for another classifier
  • KNN -> O/P -> SVM
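A small stacking sketch in plain Python. To keep it dependency-free, the second stage is NOT an SVM: it is a simple learned cut-off standing in for it, which is my substitution. The first stage is a KNN that outputs the fraction of neighbours voting for class 1, and the second stage is trained on those outputs using held-out points. All names and data are assumptions for illustration.

```python
import math

def knn_positive_fraction(train, query, k=3):
    """Level-0 model: fraction of the k nearest training points labelled 1."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(label for _, label in nearest) / k

def fit_threshold(scores, labels):
    """Level-1 model (stand-in for the SVM): cut-off on the level-0 score with fewest errors."""
    best_cut, best_errors = 0.5, len(labels) + 1
    for cut in sorted(set(scores)):
        errors = sum((1 if s >= cut else 0) != y for s, y in zip(scores, labels))
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut

# Data for the level-0 KNN (assumed).
base_train = [((1, 1), 0), ((1, 2), 0), ((2, 1), 0), ((6, 6), 1), ((7, 6), 1), ((6, 7), 1)]
# Held-out data for the level-1 model, so it learns from outputs on unseen points.
meta_train = [((2, 2), 0), ((1, 3), 0), ((7, 7), 1), ((5, 6), 1)]

meta_scores = [knn_positive_fraction(base_train, point) for point, _ in meta_train]
meta_labels = [label for _, label in meta_train]
cut = fit_threshold(meta_scores, meta_labels)

def stacked_predict(point):
    """KNN -> output -> second classifier."""
    return 1 if knn_positive_fraction(base_train, point) >= cut else 0

print(stacked_predict((1.5, 1.5)), stacked_predict((6.5, 6.5)))
```

Training the second stage on held-out points matters: if it only saw the KNN's outputs on the KNN's own training data, it would learn from overly optimistic scores.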
Happy Learning!!!