"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

August 31, 2016

Day #29 - Decision Trees

  • Hierarchical, Divide and Conquer strategy, Supervised algorithm
  • Works on numerical data
  • Concepts discussed - Information gain, entropy computation (Shannon entropy)
  • Pruning based on chi-square / Shannon entropy
  • Convert all string / character into categorical / numerical mappings
  • You can also bucketize continuous variables
Basic Python pointers

#Data pre-process examples
import pandas as pd
#Step 1 - Read Training data
input_file = "Training.xls"
df = pd.read_excel(input_file, header=0)
#Step 2 - Replace with numerical values
d = {'Male': 1, 'Female':2}
df['Gender'] = df['Gender'].map(d)
#Step 3 - Remove insignificant id column
df.drop(columns=['SerialNo'], inplace=True)
#Step 4 - Handle Missing Data
df = df.fillna(-99)
#Step 5
#Get list of all column names
features = list(df.columns)
#Step 6
#Identify X (Features) and Y (Predictor) Values
#Assumes the first 18 columns are the features and 'Predictor' is the label column
features = list(df.columns[:18])
y = df['Predictor']
x = df[features]
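The information gain / Shannon entropy concepts listed above can be sketched in a few lines of Python (the function names here are my own, for illustration, not from any library):

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    """Entropy of the parent node minus the weighted entropy of its child splits."""
    n = len(parent)
    return shannon_entropy(parent) - sum(len(s) / n * shannon_entropy(s) for s in splits)

# A perfectly balanced binary node carries 1 bit of entropy;
# a split into pure children recovers all of it as information gain
print(shannon_entropy(['y', 'y', 'n', 'n']))                             # 1.0
print(information_gain(['y', 'y', 'n', 'n'], [['y', 'y'], ['n', 'n']]))  # 1.0
```

A decision tree greedily picks, at each node, the split with the highest information gain.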
Good Reads
Link1, Link2, Link3, Link4, Link5, Link6

Happy Learning!!!

August 15, 2016

Day #28 - R - Forecast Library Examples

The following examples were discussed, using the R forecast library:
  • Moving Average
  • Single Exponential Smoothing - Uses single smoothing factor
  • Double Exponential Smoothing - Uses two constants and is better at handling trends
  • Triple Exponential Smoothing - Smoothing factor, trend, seasonal factors considered
  • ARIMA

#Load the forecast library (needed for ma())
library(forecast)
#Create Dataset
#Generate sample data of 1000 records with mean = 850 and standard deviation = 900
myvector <- rnorm(1000, 850, 900)
myvector
#Convert it into a weekly time series starting in week 1 of 2014 (frequency = 52 weeks/year)
myts = ts(myvector, start=c(2014,1), frequency=52)
myts
#Plot Data
plot(myts)
#Apply moving average
sm = ma(myts, order=4)#4 week average
plot(sm)
#Plotting with Single Exponential
fit = HoltWinters(myts,beta = FALSE, gamma = FALSE)
plot(fit)
#Plotting with Double Exponential
fit = HoltWinters(myts, gamma = FALSE)
plot(fit)
#triple exponential
fit = HoltWinters(myts)
#Multiplicative triple exponential
fit1 = HoltWinters(myts, seasonal = "multiplicative")
#Additive triple exponential
fit2 = HoltWinters(myts, seasonal = "additive")
#ARIMA (order c(0,0,0) is a mean-only white-noise model; auto.arima(myts) can pick the orders)
arimafit = arima(myts, order = c(0,0,0))
summary(arimafit)
plot(myts, xlim=c(2014,2016),lw=2,col="blue")
lines(predict(arimafit,n.ahead=50)$pred,lw=2,col="red")
#Auto Regression
arfit = ar(myts)
summary(arfit)
pred = predict(arfit,n.ahead=30)
plot(myts,type="l",xlim=c(2014,2016),ylim=c(100,2200),xlab="weeks",ylab="forecast")
lines(pred$pred,col="red")
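To make the "single smoothing factor" idea behind HoltWinters(beta = FALSE, gamma = FALSE) concrete, here is a plain-Python sketch of single exponential smoothing (alpha is the smoothing factor; initializing with the first observation is a common convention, not the only one):

```python
def single_exp_smoothing(series, alpha):
    """Single exponential smoothing: each smoothed value is
    alpha * current observation + (1 - alpha) * previous smoothed value."""
    smoothed = [series[0]]  # initialize with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(single_exp_smoothing([0, 10, 10, 10], 0.5))  # [0, 5.0, 7.5, 8.75]
```

A small alpha smooths heavily (slow to react); alpha near 1 tracks the latest observation closely. Double and triple exponential smoothing add the same recursion for trend and seasonal components.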
Happy Learning!!!

August 08, 2016

Applied Machine Learning Notes


Supervised Learning
  • Classification (Discrete Labels)
  • Regression (Output is continuous, Example - Age, Stock prices)
  • Past data + Past Outputs used
Unsupervised Learning
  • Dimensionality reduction (Data in higher dimensions; remove dimensions without losing much information)
  • Reducing dimensionality makes computation easier (Continuous values)
  • Clustering (Discrete labels)
  • No Past outputs, Only current data
Reinforcement Learning
  • Game playing is a classic reinforcement learning setting (learn from rewards, not from labeled outputs)
  • Learning Policy
  • Negative / Positive reward for each step
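The "negative / positive reward for each step" idea is what a tabular Q-learning update captures; a minimal sketch (the state/action names and the +1 reward here are made up for illustration):

```python
def q_update(Q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q[s][a] toward
    reward + gamma * (best value achievable from the next state)."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])
    return Q[s][a]

# Toy two-state example: taking 'right' in s0 earns +1 and lands in s1
Q = {'s0': {'left': 0.0, 'right': 0.0}, 's1': {'stay': 0.0}}
q_update(Q, 's0', 'right', 1.0, 's1')  # pulls Q['s0']['right'] up toward the reward
```

Over many such steps the Q-values converge, and the learned policy is simply "pick the action with the highest Q in each state".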
Type of Models
  • Inductive (learn a general model / function from training data) vs Transductive (lazy learning, e.g., forming an opinion from like-minded people)
  • Online (learn from every new incoming tweet) vs Offline (learn from the past 1 year of tweets)
  • Generative (model the data distribution, e.g., fit a Gaussian and estimate its mean / variance by maximum likelihood) vs Discriminative (model the decision boundary, e.g., which side of a line)
  • Parametric vs Non-Parametric Models
Happy Learning!!!