"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

August 31, 2016

Day #29 - Decision Trees

  • Hierarchical, Divide and Conquer strategy, Supervised algorithm
  • Works on numerical data (categorical values need to be encoded as numbers first)
  • Concepts discussed - Information gain, entropy computation (Shannon entropy)
  • Pruning based on chi-square / Shannon entropy
  • Convert all string / character into categorical / numerical mappings
  • You can also bucketize continuous variables
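
To make the entropy and information gain ideas above concrete, here is a minimal sketch in plain Python. The function names and the toy labels are made up for illustration and are not from the session material.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Hypothetical toy data: class labels before a split and after two candidate splits
parent = ["yes", "yes", "yes", "no", "no", "no"]
perfect_split = [["yes", "yes", "yes"], ["no", "no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"], ["yes", "no"]]

print(entropy(parent))                          # 1 bit: classes are evenly mixed
print(information_gain(parent, perfect_split))  # about 1: children are pure
print(information_gain(parent, useless_split))  # about 0: children are as mixed as the parent
```

A tree builder would compute this gain for every candidate split and pick the one with the highest value.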
Basic Python pointers

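#Data pre-process examples
import pandas as pd

#Step 1 - Read Training data
#Note: read_csv expects delimited text; for a real Excel workbook use pd.read_excel instead
input_file = "Training.xls"
df = pd.read_csv(input_file, header=0)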
#Step 2 - Replace with numerical values
d = {'Male': 1, 'Female':2}
df['Gender'] = df['Gender'].map(d)
#Step 3 - Remove insignificant id column
df.drop(['SerialNo'], axis=1, inplace=True)
#Step 4 - Handle Missing Data (fill gaps with -99 as a sentinel value)
df = df.fillna(-99)
#Step 5
#Get list of all column names
features = list(df.columns)
#Step 6
#Identify X (Features, the first 18 columns) and Y (Target, stored in the 'Predictor' column) Values
features = list(df.columns[:18])
y = df['Predictor']
x = df[features]
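
The bucketizing of continuous variables mentioned in the notes can be sketched the same way. This is only an illustration using the standard library: the `ages` values and the cut points in `edges` are hypothetical and not part of the training data above. In pandas, `pd.cut` serves a similar purpose.

```python
from bisect import bisect_right

def bucketize(value, edges):
    """Return the bucket index for value.

    edges must be sorted ascending; bucket 0 holds values below edges[0],
    bucket i holds edges[i-1] <= value < edges[i], and the last bucket
    holds everything from edges[-1] upward.
    """
    return bisect_right(edges, value)

# Hypothetical continuous column and cut points
ages = [5, 17, 18, 42, 65, 80]
edges = [18, 40, 65]  # buckets: <18, 18-39, 40-64, >=65

age_buckets = [bucketize(a, edges) for a in ages]
print(age_buckets)  # [0, 0, 1, 2, 3, 3]
```

The resulting bucket indices can then be used as a categorical feature when building the tree.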

Happy Learning!!!
