- Hierarchical, Divide and Conquer strategy, Supervised algorithm
- Works on numerical data
- Concepts discussed - Information gain, entropy computation (Shannon entropy)
- Pruning based on chi-square / Shannon entropy
- Convert all string / character values into categorical / numerical mappings
- You can also bucketize continuous variables
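The entropy and information gain ideas above can be sketched in a few lines of plain Python. This is only an illustration, not the exact computation any particular library performs. The helper names `entropy` and `information_gain` and the toy data are assumptions made up for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy of labels minus the weighted entropy after splitting on a feature."""
    total = len(labels)
    groups = {}
    # Group the labels by the feature value they were observed with
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: a feature encoded as 1 / 2 and a yes / no target
gender = [1, 1, 2, 2]
target = ["yes", "yes", "no", "no"]

print(entropy(target))                   # 1.0 (two equally likely classes)
print(information_gain(target, gender))  # 1.0 (the split separates the classes fully)
```

A tree-building algorithm would typically compute this gain for every candidate feature and split on the one with the highest value.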
#Data pre-process examples
import pandas as pd

#Step 1 - Read Training data
#read_csv assumes the file is delimited text despite its .xls extension;
#a true Excel workbook would need pd.read_excel instead
input_file = "Training.xls"
df = pd.read_csv(input_file, header=0)
#Step 2 - Replace with numerical values
d = {'Male': 1, 'Female': 2}
df['Gender'] = df['Gender'].map(d)
#Step 3 - Remove insignificant id column
df = df.drop(columns=['SerialNo'])
#Step 4 - Handle Missing Data (-99 acts as a sentinel value)
df = df.fillna(-99)
#Step 5
#Get list of all column names
all_columns = list(df.columns)
#Step 6
#Identify X (Features) and Y (Target, the 'Predictor' column) Values
#Assumes the first 18 columns are the features
features = list(df.columns[:18])
y = df['Predictor']
x = df[features]
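Bucketizing a continuous variable, as mentioned in the notes, is one more pre-processing step you could add. Here is a minimal sketch using only the standard library. The `bucketize` helper, the age values and the bin edges are all hypothetical and chosen just for illustration. If the data is already in a pandas DataFrame, `pd.cut` is the usual way to do the same thing.

```python
from bisect import bisect_right


def bucketize(value, edges):
    """Return the index of the bucket that value falls into.

    edges must be sorted ascending. Bucket 0 holds values below edges[0],
    and bucket len(edges) holds values at or above edges[-1].
    """
    return bisect_right(edges, value)


# Hypothetical continuous column and bin edges
ages = [5, 17, 18, 42, 70]
age_edges = [18, 40, 65]

age_buckets = [bucketize(a, age_edges) for a in ages]
print(age_buckets)  # [0, 0, 1, 2, 3]
```

Note that a missing-value sentinel such as -99 would land in bucket 0 with these edges, so you may want to give it a bucket of its own.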
Happy Learning!!!