- Calculate statistics of one feature grouped by another, and features derived from neighborhood analysis
- Example: ad data with User_id / Page_id / Ad_price / Ad_position
- Use label encoder
- Treat data points implicitly
- Add lowest and highest price for the ad's position
- Maximum and minimum price values
- Number of pages the user visited
- Standard deviation of prices
- Most visited page
- Many more features
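A minimal pandas sketch of these group-by features, assuming a small made-up ad log with the columns named above. The derived column names (`min_price_pos`, `n_pages_user`, and so on) are hypothetical. `transform` broadcasts each group statistic back to every row, so the new columns line up with the original data.

```python
import pandas as pd

# Hypothetical toy ad-impression log; column names follow the notes above.
df = pd.DataFrame({
    "User_id":     [1, 1, 1, 2, 2, 3],
    "Page_id":     [10, 10, 11, 10, 12, 12],
    "Ad_price":    [0.5, 0.9, 0.7, 1.2, 0.3, 0.8],
    "Ad_position": [1, 2, 1, 2, 1, 2],
})

# Lowest and highest price seen for each ad position, copied to every row.
pos = df.groupby("Ad_position")["Ad_price"]
df["min_price_pos"] = pos.transform("min")
df["max_price_pos"] = pos.transform("max")

# Per-user statistics.
user = df.groupby("User_id")
df["n_pages_user"] = user["Page_id"].transform("nunique")   # distinct pages visited
df["std_price_user"] = user["Ad_price"].transform("std")    # NaN for single-row users
df["top_page_user"] = user["Page_id"].transform(lambda s: s.mode().iloc[0])  # most visited page
```

When two pages tie for "most visited", `mode()` returns them sorted and this sketch takes the smallest id. That tie-breaking rule is an arbitrary choice.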
- Introduce new information
- Number of houses in 500m, 1000m
- Average price per sq.m
- Number of schools / supermarkets / parking lots in 500m / 1000m
- Distance to closest substation
- Embrace both group-by and nearest neighbor methods
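The neighbor features can be sketched with plain NumPy under some assumptions: a handful of houses with made-up planar coordinates in metres, prices, and areas, plus two hypothetical substations. This uses brute-force pairwise distances, which is only practical for small data. For larger sets a spatial index such as `scipy.spatial.cKDTree` would be the usual swap.

```python
import numpy as np

# Hypothetical data: (x, y) in metres, price and area per house, substation locations.
houses = np.array([[0.0, 0.0], [300.0, 0.0], [0.0, 800.0], [2000.0, 0.0]])
price = np.array([200_000.0, 150_000.0, 300_000.0, 100_000.0])
area_m2 = np.array([100.0, 75.0, 120.0, 50.0])
substations = np.array([[0.0, 600.0], [1800.0, 0.0]])

def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))

d = pairwise_dist(houses, houses)
np.fill_diagonal(d, np.inf)  # a house is not its own neighbor

features = {}
for r in (500, 1000):
    features[f"n_houses_{r}m"] = (d <= r).sum(axis=1)

# Average price per sq.m of the neighbors within 1000 m (NaN when there are none).
ppsm = price / area_m2
near = d <= 1000
n_near = near.sum(axis=1)
features["avg_ppsm_1000m"] = np.where(
    n_near > 0, (near * ppsm).sum(axis=1) / np.maximum(n_near, 1), np.nan
)

# Distance from each house to its closest substation.
features["dist_substation"] = pairwise_dist(houses, substations).min(axis=1)
```

Counts of schools, supermarkets, or parking lots within a radius follow the same pattern. Compute distances from houses to those points and sum `d <= r` per row.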
- Approach for feature extraction
- User / Items mapping matrix
- User - Attributes matrix
- U × M ≈ R (R is approximated by the product of two smaller factor matrices)
- Row and column related features
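One way to obtain the factors is a truncated SVD in NumPy. This sketch assumes a small made-up user × item matrix and an illustrative `k = 2` latent factors. Rows of `U` then serve as row (user) features and columns of `M` as column (item) features.

```python
import numpy as np

# Hypothetical user x item matrix R (rows: users, columns: items).
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 1.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
    [5.0, 5.0, 1.0, 0.0],
])

k = 2  # number of latent factors (illustrative choice)
u, s, vt = np.linalg.svd(R, full_matrices=False)
U = u[:, :k] * s[:k]   # user (row) factors, shape (n_users, k)
M = vt[:k, :]          # item (column) factors, shape (k, n_items)

R_approx = U @ M       # rank-k approximation, U x M ~ R
user_features = U      # one k-dim feature vector per row of R
item_features = M.T    # one k-dim feature vector per column of R
```

Folding the singular values into `U` is one convention. Splitting them as `sqrt(s)` between both factors is equally common.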
- BOW represents a document as a large sparse vector
- Document represented by small dense vector (Dimensionality reduction)
- Matrix Factorizations
- SVD, PCA; TruncatedSVD for sparse matrices
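A short scikit-learn sketch of the BOW-to-dense step, assuming a tiny made-up corpus and an illustrative `n_components=2`. `CountVectorizer` yields a sparse matrix, and `TruncatedSVD` accepts it directly without converting it to a dense array.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

# Hypothetical toy corpus.
docs = [
    "cheap flights to paris",
    "cheap hotels in paris",
    "python machine learning tutorial",
    "machine learning with python and numpy",
]

# Bag of words: one large, sparse count vector per document (scipy.sparse matrix).
bow = CountVectorizer().fit_transform(docs)

# TruncatedSVD works on the sparse matrix as-is.
svd = TruncatedSVD(n_components=2, random_state=0)
dense = svd.fit_transform(bow)  # one small dense vector per document
```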
- NMF (Non-Negative Matrix Factorization): all factor values are zero or positive
- NMF makes data suitable for decision trees
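A minimal sketch with scikit-learn's `NMF`, assuming a small made-up non-negative count matrix and illustrative hyperparameters. Both resulting factors are non-negative, so the row features `W` stay on a count-like scale.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative count matrix (e.g. word counts per document).
X = np.array([
    [3, 0, 1, 0],
    [2, 0, 0, 1],
    [0, 4, 0, 3],
    [0, 3, 1, 2],
], dtype=float)

# NMF factorizes X ~ W @ H with every entry of W and H >= 0.
nmf = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)   # non-negative row features, shape (4, 2)
H = nmf.components_        # non-negative components, shape (2, 4)
```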
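- Used for dimensionality reduction; fit on train and test data together:
import numpy as np
from sklearn.decomposition import PCA
# Fit on train and test together so both are projected onto the same components
x_all = np.concatenate([x_train, x_test])
pca = PCA(n_components=10)  # n_components=10 is an illustrative value
pca.fit(x_all)
x_train_pca = pca.transform(x_train)
x_test_pca = pca.transform(x_test)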
Happy Learning!!!