- Looking at the data, understanding the data
- A thorough understanding of the data is required to build accurate models
- Generate Hypothesis / Apply Intuition
- Top solutions use Advanced and Aggressive Modelling
- Find insights and magic features; start with EDA before hardcore modeling
- Identify Patterns (Visualization to idea)
- Use patterns to find better models (Idea to visualization, Hypothesis testing)
- Domain knowledge (use Google and Wikipedia to understand the data)
- Check that the data is intuitive (validate values against the acquired domain knowledge, correct errors manually, or mark incorrect rows with a label so the model can leverage it)
- Understand how the data is generated (Were the training set and test set generated by the same algorithm? You need to know the underlying data generation process; visualize plots of the training set and test set)
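The last two points can be sketched in pandas. This is only a minimal illustration under stated assumptions: the `age` column, the plausible range 0 to 120, the toy values, and the helper name `flag_out_of_range` are all hypothetical and not taken from any real dataset.

```python
import pandas as pd


def flag_out_of_range(df, column, low, high, flag_name="is_incorrect"):
    """Return a copy of df with a boolean column marking values outside [low, high].

    Missing values are also flagged, because between() returns False for NaN.
    """
    out = df.copy()
    out[flag_name] = ~out[column].between(low, high)
    return out


# Hypothetical toy data: domain knowledge says an age of 0..120 is plausible.
train = pd.DataFrame({"age": [25, 40, 336, 31]})
test = pd.DataFrame({"age": [22, 58, 47, 39]})

# Mark suspicious rows instead of dropping them, so a model can use the flag.
train_flagged = flag_out_of_range(train, "age", 0, 120)
print(train_flagged)

# Side-by-side summary statistics as a first check that train and test look alike.
summary = pd.concat(
    {"train": train["age"].describe(), "test": test["age"].describe()}, axis=1
)
print(summary)
```

A large gap between the two summary columns (here the maximum of the training set) is a hint that something in the generation process differs or that a value is wrong; plotting histograms of both sets is the natural next step.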
Anonymized Data
- Data may be replaced with encrypted text (this will not impact the model, though)
- Columns have no meaningful names
- Find unique values of features, sort them and find differences
- Look at the distance between two consecutive sorted values and the pattern in it
- Guess the meaning of the columns
- Guess the types of the column (Categorical, Boolean, Numeric etc..)
- Find relation between pairs
- Find feature groups
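- df.dtypes (data type of each column)
- df.info() (column types, non-null counts and memory usage)
- x.value_counts() (frequency of each unique value)
- x.isnull() (boolean mask of missing values)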
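As a rough sketch of how these calls fit together on an anonymized dataset, the example below uses a made-up frame (the column names `x1` and `x2` and all values are invented for illustration) and then sorts the unique values of one feature to inspect the gaps between them.

```python
import numpy as np
import pandas as pd

# Hypothetical anonymized frame; names and values are assumptions for illustration.
df = pd.DataFrame({
    "x1": [0.5, 1.5, 0.5, 3.0, 2.0, np.nan],
    "x2": ["a", "b", "a", "c", "b", "a"],
})

print(df.dtypes)                  # data type of each column
df.info()                         # non-null counts and memory usage
print(df["x2"].value_counts())    # frequency of each category
print(df["x1"].isnull().sum())    # number of missing values in x1

# Sort the unique non-missing values and look at the differences between neighbours.
unique_sorted = np.sort(df["x1"].dropna().unique())
gaps = np.diff(unique_sorted)
print(unique_sorted)
print(gaps)
```

If the gaps turn out to be multiples of one constant, the feature may be an integer quantity that was scaled or shifted, which helps when guessing the meaning and the type of the column.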
Happy Learning and Coding!!!