"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

October 29, 2017

Day #80 - Visualizations

EDA is an art, and visualizations are its tools. Use several different plots to test a hypothesis instead of relying on a single one.

Visualization Tools
  • Histograms (split the values into bins and count how many points fall in each bin; vary the number of bins) - plt.hist(x)
  • XGBoost can benefit when missing values are kept explicit (e.g. as NaN) rather than filled in
  • Plots of index versus value - plt.plot(x,'.'), shows whether the values are random over the indices
  • Statistics
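
To make the two plotting calls above concrete, here is a minimal sketch. The feature x is synthetic data made up for illustration, and the bin counts (10 and 100) are arbitrary choices, not values from the course.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic feature (an assumption for illustration, not real data)
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=1000)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Same data, two different bin counts: the picture can change a lot
counts_coarse, _, _ = axes[0].hist(x, bins=10)
axes[0].set_title("hist, 10 bins")
counts_fine, _, _ = axes[1].hist(x, bins=100)
axes[1].set_title("hist, 100 bins")

# Index versus value: horizontal bands or trends would suggest
# the rows are not randomly ordered
axes[2].plot(x, ".")
axes[2].set_title("index vs value")

fig.tight_layout()
# fig.savefig("feature_overview.png")  # or plt.show() in an interactive session
```

Looking at more than one bin count is the point here: a spike that is invisible with 10 bins may stand out with 100.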
Explore Feature Relations
  • Scatter plots (draw one feature against another); plotting train and test sets together helps validate that both are distributed the same way
  • Correlation plots (run K-means clustering and reorder the features) - show how similar the features are
  • Plot (index vs feature statistics)
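
A rough sketch of the scatter plot and the index-versus-statistics plot, assuming two synthetic arrays named train and test (both invented here). Feature 1 is shifted in test on purpose so that the mismatch is visible; reading "feature statistics" as the per-feature mean is my own interpretation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic train/test sets (assumption for illustration)
rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, size=(300, 4))
test = rng.normal(0.0, 1.0, size=(200, 4))
test[:, 1] += 3.0  # deliberately shift feature 1 in the test set

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: one feature against another, train and test overlaid
axes[0].scatter(train[:, 0], train[:, 1], s=8, label="train")
axes[0].scatter(test[:, 0], test[:, 1], s=8, label="test")
axes[0].set_xlabel("feature 0")
axes[0].set_ylabel("feature 1")
axes[0].legend()

# Index vs feature statistics: feature index on x, mean of the feature on y
train_means = train.mean(axis=0)
test_means = test.mean(axis=0)
axes[1].plot(train_means, "o-", label="train mean")
axes[1].plot(test_means, "o-", label="test mean")
axes[1].set_xlabel("feature index")
axes[1].legend()

# Feature whose mean differs most between train and test
shifted = int(np.argmax(np.abs(train_means - test_means)))
fig.tight_layout()
```

In this toy setup the two point clouds separate along feature 1, and the same feature stands out in the right-hand plot.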
Feature Groups
  • Generate new features based on groups
Pairs
  • Scatter plot, scatter matrix
  • Correlation Plot (Corrplot)
Groups
  • Corrplot + clustering
  • Plot (index vs feature statistics)
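
Here is one way the "corrplot + clustering" idea could look in code. It is only a sketch: the data is synthetic (two made-up latent signals driving six features), and kmeans_labels is a tiny hand-written helper with a simple farthest-point initialisation, not a library function. In practice a library K-means would be the natural choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def kmeans_labels(points, k, n_iter=20):
    """Tiny k-means with farthest-point initialisation (illustrative helper)."""
    centers = [points[0]]
    while len(centers) < k:
        # Distance from every point to its nearest already-chosen center
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


# Synthetic data: even-numbered features follow signal a, odd ones follow b
rng = np.random.default_rng(2)
a = rng.normal(size=500)
b = rng.normal(size=500)
X = np.empty((500, 6))
for j in range(6):
    latent = a if j % 2 == 0 else b
    X[:, j] = latent + 0.3 * rng.normal(size=500)

corr = np.corrcoef(X, rowvar=False)        # feature-by-feature correlations
labels = kmeans_labels(corr, k=2)          # cluster the rows of the matrix
order = np.argsort(labels, kind="stable")  # put same-cluster features together
reordered = corr[np.ix_(order, order)]

fig, axes = plt.subplots(1, 2, figsize=(9, 4))
axes[0].imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
axes[0].set_title("original order")
axes[1].imshow(reordered, vmin=-1, vmax=1, cmap="coolwarm")
axes[1].set_title("reordered by cluster")
fig.tight_layout()
```

After reordering, related features sit next to each other, so groups show up as blocks along the diagonal instead of a checkerboard.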

More Read (Link)




Happy Learning and Coding!!!
