These terms come up frequently when discussing a model's performance on the training and test data sets.
Generalization error = Bias² + Variance + Irreducible error
Bias (Under-fitting)
- Bias is high when the model class cannot represent the true data distribution well; it does not decrease with more training data.
- High Bias will lead to under-fitting
How to identify High Bias
- Training Error will be high
- Cross Validation error will also be high, and close to the training error
Variance (Over-fitting)
- Variance is high when the model is overly sensitive to the particular training set, fitting noise rather than signal
- High Variance will lead to over-fitting
How to identify High Variance
- Training Error will be low
- Cross Validation error will be much higher than the training error
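The two symptom checklists above can be seen numerically by comparing training and cross-validation error for a too-simple and a too-flexible model. This is a minimal sketch using only NumPy on synthetic data; the cubic target function, the split sizes, and the polynomial degrees 1 and 15 are illustrative assumptions, not a prescription.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a cubic function
x = rng.uniform(-3, 3, 30)
y = x**3 - 2 * x + rng.normal(0, 1, 30)
x_tr, y_tr = x[:20], y[:20]   # training split
x_cv, y_cv = x[20:], y[20:]   # cross-validation split

def errors(deg):
    """Fit a degree-`deg` polynomial on the training split,
    return (training MSE, cross-validation MSE)."""
    coefs = np.polyfit(x_tr, y_tr, deg)
    mse = lambda xs, ys: np.mean((np.polyval(coefs, xs) - ys) ** 2)
    return mse(x_tr, y_tr), mse(x_cv, y_cv)

tr_lo, cv_lo = errors(1)    # degree 1: too simple  -> high bias
tr_hi, cv_hi = errors(15)   # degree 15: too flexible -> high variance

# High bias: both errors are high and close together.
# High variance: training error is low, CV error is much higher.
print(f"degree 1 : train={tr_lo:.2f}  cv={cv_lo:.2f}")
print(f"degree 15: train={tr_hi:.2f}  cv={cv_hi:.2f}")
```

The degree-1 line cannot bend to follow the cubic, so both errors stay high together; the degree-15 polynomial memorizes the 20 training points, so its training error collapses while its cross-validation error does not.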
How to Fix?
- High Variance: add more training data, simplify the model, or add regularization; variance decreases with more training data and increases with more complicated classifiers
- High Bias: use a more complex model or add better features; more data alone will not help, since bias does not depend on training set size
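The claim that variance decreases with more training data can be checked directly: fit the same flexible model on a small and on a large training set and watch the gap between cross-validation and training error shrink. Again a minimal NumPy sketch on synthetic data; the degree-9 model and the sample sizes 20 and 2000 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def cv_train_gap(n_train, deg=9):
    """Gap between CV error and training error for a degree-`deg`
    polynomial fit on n_train noisy samples of a cubic function."""
    x_tr = rng.uniform(-3, 3, n_train)
    y_tr = x_tr**3 - 2 * x_tr + rng.normal(0, 1, n_train)
    x_cv = rng.uniform(-3, 3, 500)
    y_cv = x_cv**3 - 2 * x_cv + rng.normal(0, 1, 500)
    coefs = np.polyfit(x_tr, y_tr, deg)
    mse = lambda xs, ys: np.mean((np.polyval(coefs, xs) - ys) ** 2)
    return mse(x_cv, y_cv) - mse(x_tr, y_tr)

gap_small_data = cv_train_gap(20)     # few samples: large gap (overfit)
gap_big_data = cv_train_gap(2000)     # many samples: gap nearly closes
print(f"gap with   20 samples: {gap_small_data:.2f}")
print(f"gap with 2000 samples: {gap_big_data:.2f}")
```

With only 20 points the degree-9 fit chases noise, so the cross-validation error sits well above the training error; with 2000 points the same model class no longer has enough freedom per sample to overfit, and the two errors converge.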
Happy Learning!!!