"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 01, 2020

Day #309 - Handle Categorical Columns

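Wishing you a great, peaceful, and successful 2020.
This post shows how to handle categorical columns: one-hot encode them with pandas, scale the numeric column, and merge the results into one frame.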
import pandas as pd
# Define the sample data and build a DataFrame from it
Data = {'Location': ['Singapore', 'India', 'Japan', 'China','Korea'],
'avgage': [22,38,26,35,22],
'Education': ['UG','PG','Phd','UG','PG']
}
Dataset = pd.DataFrame(Data)
# One-hot encode Location (one indicator column per country)
location = Dataset['Location']
catlocation = pd.get_dummies(location)
print(catlocation)
# One-hot encode Education (one indicator column per level)
education = Dataset['Education']
cateducation = pd.get_dummies(education)
print(cateducation)
# Scale avgage to the [0, 1] range (min-max normalization, not standardization)
from sklearn import preprocessing
age = Dataset['avgage'].values
min_max_scaler = preprocessing.MinMaxScaler()
age_scaled = min_max_scaler.fit_transform(age.reshape(-1, 1))
# Name the column so it does not show up as "0" in the merged frame
agedf = pd.DataFrame(age_scaled, columns=['avgage'])
print(agedf)
# Merge the three frames horizontally (column-wise)
frames = [catlocation, cateducation, agedf]
merged_data = pd.concat(frames, axis=1)
print(merged_data)
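The same steps can also be sketched more compactly. This is only an alternative sketch, not the approach used in the post: it assumes pd.get_dummies is called on the whole frame with its columns argument, which prefixes each new column with the source column name, and it swaps MinMaxScaler for the equivalent min-max formula written in plain pandas. The lowercase names dataset and encoded are made up for this illustration.

```python
import pandas as pd

# Same sample data as in the post
data = {
    'Location': ['Singapore', 'India', 'Japan', 'China', 'Korea'],
    'avgage': [22, 38, 26, 35, 22],
    'Education': ['UG', 'PG', 'Phd', 'UG', 'PG'],
}
dataset = pd.DataFrame(data)

# One-hot encode both categorical columns in a single call.
# Each new column is named <source column>_<category>, e.g. Location_India.
encoded = pd.get_dummies(dataset, columns=['Location', 'Education'])

# Min-max scale avgage to [0, 1] with plain pandas arithmetic.
# Assumption: the column is not constant, so max - min is non-zero.
age = encoded['avgage']
encoded['avgage'] = (age - age.min()) / (age.max() - age.min())

print(encoded)
```

Because the encoding happens in place on one frame, there is no separate concat step, and the prefixed names make it clear which original column each indicator came from.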
Happy Learning!!!
