"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 31, 2017

Day #94 - Integrate R and Python

#To invoke R code from Python there are two options
#Option #1 - Run the external R file using the subprocess module
#Script.R contains:
#x <- 4
#print(x)
import subprocess
subprocess.check_call(['Rscript', 'E:\\RNotes\\PetProject\\RModel\\Script.R'], shell=False)
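#If the value printed by the R script is needed back in Python, the same call can be
#made with subprocess.check_output - a minimal sketch, reusing the illustrative path above
output = subprocess.check_output(['Rscript', 'E:\\RNotes\\PetProject\\RModel\\Script.R'])
print(output.decode())   #for the script above this prints "[1] 4"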
#Option #2 - Use the rpy2 package to source the script from an embedded R session
import rpy2.robjects as robjects
r_source = robjects.r['source']
r_source("E:\\RNotes\\PetProject\\RModel\\Script.R")
Happy Learning!!!

December 08, 2017

Day #93 - Regularizations

Four methods of regularization for mean (target) encodings
  • Cross Validation inside training data
    • 4 to 5 folds of K-Fold validation
    • Split the data into K non-intersecting subsets
    • Leave-one-out is the extreme version of the scheme
    • Target variable leakage is still present in the K-Fold scheme
  • Smoothing based on the size of the category (a short sketch follows the code below)
    • Big categories have lots of data points, so their mean can be trusted
    • Formula: encoding = (mean(target) * nrows + globalmean * alpha) / (nrows + alpha)
    • alpha = the category size we are willing to trust
  • Add Random Noise (a short sketch follows the code below)
    • Unstable and hard to make work
    • Too much noise spoils the feature
    • Usually combined with LOO (leave-one-out) regularization
  • Sorting the data and calculating an expanding mean
    • Fix the sorting order of the data
    • Use rows 0 to n-1 to calculate the mean for row n
    • Introduces the least leakage
#Cross Validation inside training data
import numpy as np
from sklearn.model_selection import StratifiedKFold

y_tr = df_tr['target'].values                 #target variable
train_new = df_tr.copy()                      #frame that will hold the encoded features
for col in cols:                              #pre-create the encoded columns
    train_new[col + '_mean_target'] = np.nan

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
#iterate over the folds in chunks of train/validation indices
for tr_ind, val_ind in skf.split(df_tr, y_tr):
    x_tr, x_val = df_tr.iloc[tr_ind], df_tr.iloc[val_ind]
    #for each column we want to encode, map the encodings estimated on the
    #training folds onto the validation fold
    for col in cols:
        means = x_val[col].map(x_tr.groupby(col)['target'].mean())
        train_new.loc[x_val.index, col + '_mean_target'] = means

#global mean
prior = df_tr['target'].mean()
#fill NaNs (categories unseen in a training fold) with the global mean
train_new.fillna(prior, inplace=True)

#Expanding Mean - rows 0..n-1 provide the estimate for row n
for col in cols:
    cumsum = df_tr.groupby(col)['target'].cumsum() - df_tr['target']
    cumcnt = df_tr.groupby(col).cumcount()
    #the first occurrence of a category yields 0/0 = NaN and can again be filled with the prior
    train_new[col + '_mean_target'] = cumsum / cumcnt
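
The smoothing formula listed above can be applied directly on top of the per-category statistics. A rough sketch, reusing df_tr, cols and train_new from the code above; alpha here is an assumed, hand-tuned smoothing strength, not a value fixed anywhere above:

#Smoothing based on size of category:
#encoding = (mean(target)*nrows + globalmean*alpha) / (nrows + alpha)
alpha = 100                                   #assumed smoothing strength, needs tuning
globalmean = df_tr['target'].mean()
for col in cols:
    agg = df_tr.groupby(col)['target'].agg(['mean', 'count'])
    smoothed = (agg['mean'] * agg['count'] + globalmean * alpha) / (agg['count'] + alpha)
    train_new[col + '_mean_target'] = df_tr[col].map(smoothed)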
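
For the add-random-noise option, one common realization (an assumption here, the notes above do not spell it out) is to add small Gaussian noise on top of whichever encoding was computed; noise_level is a made-up hyperparameter that has to be tuned:

#Add Random Noise on top of an existing mean-target encoding
import numpy as np
noise_level = 0.01                            #assumed value - unstable and hard to tune, as noted above
for col in cols:
    noise = np.random.normal(0, noise_level, size=len(train_new))
    train_new[col + '_mean_target'] = train_new[col + '_mean_target'] + noise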
Happy Learning!!!