Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Day #72- Feature Generation

October 27, 2017

Day #72- Feature Generation - Numeric Features

Feature Generation

Predict Apple Sales (Linear Trend)
Example - Add a feature indicating the week number. A linear model can use it to fit the trend, while GBDT will use it to compute something like the mean target value for each week
Created Generated Tree
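
The week-number idea from the apple sales example can be sketched as follows. This is only an illustration under stated assumptions: the sales figures are synthetic, and the names `day`, `week` and `sales` are hypothetical, not taken from any real dataset.

```python
import numpy as np

# Hypothetical data: 8 weeks of daily sales where each week sells
# roughly 10 more apples per day than the week before.
rng = np.random.default_rng(0)
day = np.arange(56)                  # day index 0..55
week = day // 7 + 1                  # generated feature: week number 1..8
sales = 10.0 * week + rng.normal(0.0, 1.0, size=day.size)

# Linear model view: a straight-line fit on the week number recovers the trend.
slope, intercept = np.polyfit(week, sales, deg=1)

# Tree-style view: the mean target value within each week, which is
# what a split on the week number lets a tree predict.
weekly_mean = np.array([sales[week == w].mean() for w in np.unique(week)])
```

The same generated feature serves both model families, but they use it differently: the linear model extrapolates along it, while a tree can only group by it.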

Numeric Features - Preprocessing

Tree based Methods (Decision Tree)
Non Tree based Methods (NN, Linear Model, KNN)

Technique #1 - Scaling of values

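Regularization should affect all features in equal amounts, which only holds when the features are on a comparable scale
Scale features properly before fitting non-tree models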

MinMaxScaler

To [0,1]
sklearn.preprocessing.MinMaxScaler
X = (X-X.min())/(X.max()-X.min())

Standard Scaler

To mean = 0, std = 1
sklearn.preprocessing.StandardScaler
X = (X-X.mean())/X.std()

Preprocessing (scaling) should be applied to all numeric features, not just a few of them, so that their initial impact on the model is roughly similar
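
Both scalers can be checked against the formulas given above. The sketch below assumes a small made-up matrix `X` with two columns on very different scales; it is meant as an illustration, not a full preprocessing pipeline.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical feature matrix: two numeric columns on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0],
              [4.0, 900.0]])

# Library versions: every column is scaled independently.
X_minmax = MinMaxScaler().fit_transform(X)
X_standard = StandardScaler().fit_transform(X)

# Manual versions of the same formulas, applied column by column.
manual_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
manual_standard = (X - X.mean(axis=0)) / X.std(axis=0)
```

In practice the scaler is usually fitted on the training data only and then reused to transform validation and test data.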
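Preprocessing Outliers

Clip values between a lower and an upper bound (for example, chosen percentiles of the feature)
Rank transformation
A better option than MinMaxScaler when outliers are present, because ranking moves outliers closer to the other values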
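
Clipping to a lower and an upper bound could look like the sketch below. The choice of the 1st and 99th percentiles is an assumption made for illustration, and the data is synthetic with one deliberately extreme value.

```python
import numpy as np

# Hypothetical feature: 1000 ordinary values plus one extreme outlier.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, size=1000), [1000.0]])

# Lower and upper bounds taken as the 1st and 99th percentiles (an assumption).
lower, upper = np.percentile(x, [1, 99])

# Values outside the bounds are replaced by the nearest bound.
x_clipped = np.clip(x, lower, upper)
```

After clipping, the outlier sits at the upper bound instead of stretching the whole feature range.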

Ranking, Transformations

scipy.stats.rankdata
Log transformation - np.log(1+x)
Raising to power < 1 - np.sqrt(x+2/3)
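
A quick sketch of the three transformations listed above, applied to a made-up array with one large value. The array `x` is hypothetical; the functions are the ones named in the notes.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical feature with one value far away from the rest.
x = np.array([1.0, 2.0, 3.0, 1000.0])

# Rank transformation: spacing between sorted values becomes equal.
ranks = rankdata(x)

# Log transformation: np.log1p(x) computes np.log(1 + x).
x_log = np.log1p(x)

# Raising to a power below 1, here a square root with the shift from the notes.
x_sqrt = np.sqrt(x + 2 / 3)
```

All three pull the large value closer to the others, with the rank transformation being the most aggressive.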

Feature Generation (Based on Feature Knowledge, Exploratory Data Analysis)

Creating new features
Engineer using prior knowledge and logic
Example: add price per square foot when the price and the size of the plot are provided
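
The price per square foot example reduces to a single division. The numbers and the names `price` and `area_sqft` below are made up for illustration.

```python
import numpy as np

# Hypothetical listings: total price and plot size in square feet.
price = np.array([300000.0, 450000.0, 200000.0])
area_sqft = np.array([1500.0, 3000.0, 800.0])

# Generated feature: price per square foot.
price_per_sqft = price / area_sqft
```

A model that cannot easily learn ratios on its own, such as a tree ensemble or a linear model, gets this relationship for free once the feature is added.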

Summary

Tree-based methods don't depend on scaling
Non-tree methods depend heavily on scaling
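
The difference can be seen on a tiny example. The sketch below uses made-up data in which the class depends only on the first column, while the second column is noise on a much larger scale; the helper `predict` is hypothetical and exists only to keep the comparison short.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: the class depends only on the first column,
# the second column is noise on a much larger scale.
X_train = np.array([[0.0, 0.0],
                    [0.1, 1000.0],
                    [0.9, 500.0],
                    [1.0, 500.0]])
y_train = np.array([0, 0, 1, 1])
X_query = np.array([[0.2, 500.0]])

scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_query_scaled = scaler.transform(X_query)

def predict(model, X, X_new):
    """Fit the model on X and return its prediction for the single query row."""
    return int(model.fit(X, y_train).predict(X_new)[0])

# The tree gives the same answer with and without scaling.
tree_raw = predict(DecisionTreeClassifier(random_state=0), X_train, X_query)
tree_scaled = predict(DecisionTreeClassifier(random_state=0), X_train_scaled, X_query_scaled)

# The nearest-neighbour answer changes once the large column is scaled down.
knn_raw = predict(KNeighborsClassifier(n_neighbors=1), X_train, X_query)
knn_scaled = predict(KNeighborsClassifier(n_neighbors=1), X_train_scaled, X_query_scaled)
```

Here the tree predicts class 0 in both cases, while KNN predicts class 1 on the raw data and class 0 after scaling, because the unscaled second column dominates the distance.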

Most often used preprocessing

MinMaxScaler - to [0,1]
StandardScaler - to mean==0, std==1
Rank - sets spaces between sorted values to be equal
np.log(1+x) and np.sqrt(1+x)

Happy Learning and Coding!!!
