"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

August 31, 2016

Day #29 - Decision Trees

  • Hierarchical, Divide and Conquer strategy, Supervised algorithm
  • Works on numerical data
  • Concepts discussed - Information gain, entropy computation (Shannon entropy)
  • Pruning based on chi-square / Shannon entropy
  • Convert all string / character into categorical / numerical mappings
  • You can also bucketize continuous variables
Basic Python pointers

#Data pre-process examples
import pandas as pd
#Step 1 - Read Training data
input_file = "Training.xls"
df = pd.read_excel(input_file, header=0)
#Step 2 - Replace with numerical values
d = {'Male': 1, 'Female':2}
df['Gender'] = df['Gender'].map(d)
#Step 3 - Remove insignificant id column
df.drop(columns=['SerialNo'], inplace=True)
#Step 4 - Handle Missing Data
df = df.fillna(-99)
#Step 5
#Get list of all column names
features = list(df.columns)
#Step 6
#Identify X (Features) and Y (Predictor) Values
#Assumes the first 18 columns are the features and 'Predictor' is the label column
features = list(df.columns[:18])
y = df['Predictor']
x = df[features]
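The information gain / Shannon entropy concepts listed above can be sketched in a few lines of Python (the function names here are my own, for illustration, not from any library):

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    """Entropy of the parent node minus the weighted entropy of its child splits."""
    n = len(parent)
    return shannon_entropy(parent) - sum(len(s) / n * shannon_entropy(s) for s in splits)

# A perfectly balanced binary node carries 1 bit of entropy;
# a split into pure children recovers all of it as information gain
print(shannon_entropy(['y', 'y', 'n', 'n']))                             # 1.0
print(information_gain(['y', 'y', 'n', 'n'], [['y', 'y'], ['n', 'n']]))  # 1.0
```

A decision tree greedily picks, at each node, the split with the highest information gain.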
Good Reads
Link1, Link2, Link3, Link4, Link5, Link6

Happy Learning!!!

August 15, 2016

Day #28 - R - Forecast Library Examples

The following examples were discussed, using the R forecast library:
  • Moving Average
  • Single Exponential Smoothing - Uses single smoothing factor
  • Double Exponential Smoothing - Uses two constants and is better at handling trends
  • Triple Exponential Smoothing - Smoothing factor, trend, seasonal factors considered
  • ARIMA

#Load the forecast library (needed for ma())
library(forecast)
#Create Dataset
#Generate sample data of 1000 records with mean = 850 and standard deviation = 900
myvector <- rnorm(1000, 850, 900)
myvector
#Convert it into a weekly time series starting in week 1 of 2014 (frequency = 52 weeks/year)
myts = ts(myvector, start=c(2014,1), frequency=52)
myts
#Plot Data
plot(myts)
#Apply moving average
sm = ma(myts, order=4)#4 week average
plot(sm)
#Plotting with Single Exponential
fit = HoltWinters(myts,beta = FALSE, gamma = FALSE)
plot(fit)
#Plotting with Double Exponential
fit = HoltWinters(myts, gamma = FALSE)
plot(fit)
#triple exponential
fit = HoltWinters(myts)
#Multiplicative triple exponential
fit1 = HoltWinters(myts, seasonal = "multiplicative")
#Additive triple exponential
fit2 = HoltWinters(myts, seasonal = "additive")
#ARIMA (order c(0,0,0) is a mean-only white-noise model; auto.arima(myts) can pick the orders)
arimafit = arima(myts, order = c(0,0,0))
summary(arimafit)
plot(myts, xlim=c(2014,2016),lw=2,col="blue")
lines(predict(arimafit,n.ahead=50)$pred,lw=2,col="red")
#Auto Regression
arfit = ar(myts)
summary(arfit)
pred = predict(arfit,n.ahead=30)
plot(myts,type="l",xlim=c(2014,2016),ylim=c(100,2200),xlab="weeks",ylab="forecast")
lines(pred$pred,col="red")
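To make the "single smoothing factor" idea behind HoltWinters(beta = FALSE, gamma = FALSE) concrete, here is a plain-Python sketch of single exponential smoothing (alpha is the smoothing factor; initializing with the first observation is a common convention, not the only one):

```python
def single_exp_smoothing(series, alpha):
    """Single exponential smoothing: each smoothed value is
    alpha * current observation + (1 - alpha) * previous smoothed value."""
    smoothed = [series[0]]  # initialize with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

print(single_exp_smoothing([0, 10, 10, 10], 0.5))  # [0, 5.0, 7.5, 8.75]
```

A small alpha smooths heavily (slow to react); alpha near 1 tracks the latest observation closely. Double and triple exponential smoothing add the same recursion for trend and seasonal components.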
Happy Learning!!!

August 08, 2016

Applied Machine Learning Notes


Supervised Learning
  • Classification (Discrete Labels)
  • Regression (Output is continuous, Example - Age, Stock prices)
  • Past data + Past Outputs used
Unsupervised Learning
  • Dimensionality reduction (Data in higher dimensions; remove dimensions without losing much information)
  • Reducing dimensionality makes computation easier (Continuous values)
  • Clustering (Discrete labels)
  • No Past outputs, Only current data
Reinforcement Learning
  • Game playing is a classic reinforcement learning setting (learn from rewards, not from labeled outputs)
  • Learning Policy
  • Negative / Positive reward for each step
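The "negative / positive reward for each step" idea is what a tabular Q-learning update captures; a minimal sketch (the state/action names and the +1 reward here are made up for illustration):

```python
def q_update(Q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q[s][a] toward
    reward + gamma * (best value achievable from the next state)."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])
    return Q[s][a]

# Toy two-state example: taking 'right' in s0 earns +1 and lands in s1
Q = {'s0': {'left': 0.0, 'right': 0.0}, 's1': {'stay': 0.0}}
q_update(Q, 's0', 'right', 1.0, 's1')  # pulls Q['s0']['right'] up toward the reward
```

Over many such steps the Q-values converge, and the learned policy is simply "pick the action with the highest Q in each state".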
Type of Models
  • Inductive (learn a general model / function from training data) vs Transductive (lazy learning, e.g., forming an opinion from like-minded people)
  • Online (learn from every new incoming tweet) vs Offline (learn from the past 1 year of tweets)
  • Generative (model the data distribution, e.g., fit a Gaussian and estimate its mean / variance by maximum likelihood) vs Discriminative (model the decision boundary, e.g., which side of a line)
  • Parametric vs Non-Parametric Models
Happy Learning!!!