"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

November 28, 2017

Day #92 - Mean Encoding

Mean Encoding
  • Add new variables based on certain features
  • Label encoding is what is usually done
  • Mean encoding replaces each category with the mean of the target within that category
  • The proportion (frequency) of each label can also be included in this step
  • Mean encoding vs label encoding
  • Label encoding - No logical order
  • Mean encoding - Classes are separable
  • We can reach a better loss with shorter trees
  • With label encoding, trees need a huge number of splits
  • The model tries to treat all categories differently
Constructing Mean Encoding
  • Goods - Number of ones in a group
  • Bads - Number of zeros
Likelihood = Goods/(Goods + Bads) = mean(target)
Weight of Evidence = ln(Goods/Bads) * 100
Count = Goods = sum(target)
Diff = Goods-Bads
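The four formulas above can be sketched in plain Python. This is only a minimal illustration: the function name `mean_encoding_stats` and the toy data are made up for this note, and returning `None` for Weight of Evidence when a group has no goods or no bads is an assumption, since the logarithm is undefined there.

```python
import math
from collections import defaultdict


def mean_encoding_stats(categories, targets):
    """Per-category Likelihood, Weight of Evidence, Count and Diff for a binary target."""
    goods = defaultdict(int)  # number of ones per category
    bads = defaultdict(int)   # number of zeros per category
    for category, target in zip(categories, targets):
        if target == 1:
            goods[category] += 1
        else:
            bads[category] += 1

    stats = {}
    for category in set(categories):
        g, b = goods[category], bads[category]
        stats[category] = {
            "likelihood": g / (g + b),  # equals mean(target) in the group
            # Assumption: WoE is left undefined (None) when g or b is zero.
            "woe": math.log(g / b) * 100 if g > 0 and b > 0 else None,
            "count": g,                 # sum(target)
            "diff": g - b,
        }
    return stats


# Hypothetical toy data: category per row and a binary target.
cities = ["A", "A", "A", "B", "B", "B", "B"]
clicked = [1, 1, 0, 1, 0, 0, 0]
stats = mean_encoding_stats(cities, clicked)
```

In practice these statistics should be computed on training data only, otherwise the encoding leaks the target into validation.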


Happy Learning!!!

November 24, 2017

Database Sharding and Scalability Basics

Some key considerations for NoSQL vs RDBMS
  • Performance - Latency tolerance: how slowly queries can run on huge data sets
  • Durability - Data loss tolerance: how much in-memory data or how many transactions can be lost when the database crashes
  • Consistency - Weird results tolerance (Dirty data tolerance)
  • Availability - Downtime tolerance
Options for Scalability
  • Replication - Create copies of the database; the application can talk to either copy
  • Sharding - Choose a partition key; a key-value store partitions data based on the key
  • Caching - Results are precomputed and stored; manage cache expiration time and refresh logic
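As a rough sketch of the sharding option, the snippet below routes each key to one of several in-memory dictionaries by hashing the partition key. The names `shard_for` and `ShardedStore` are hypothetical, and hash-modulo routing is just one simple assumption; real systems often use range partitioning or consistent hashing instead.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a partition key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key-value store where each shard is a separate dict."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
store.put("user:42", {"name": "Ada"})
```

The same key always lands on the same shard, so reads know where to look without scanning every partition.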
For streaming data we had already discussed Event Hubs and Apache Kafka. Now there is also KSQL (streaming SQL for Kafka that runs on continuous data).

Great Session Talk

 

RDBMS vs NoSQL Considerations, Quick Summary
  • Performance - Latency tolerance
  • Durability - Data loss tolerance
  • Consistency - Weird results tolerance (Dirty data tolerance)
  • Availability - Downtime tolerance
Happy Learning!!!

November 16, 2017

Day #91- Retail Analytics - Data Mining / Analytics

Running a successful #Retail store involves many data mining / analytics challenges that have to be solved to arrive at data-driven decisions. Some interesting retail data mining / analytics problems are
  • What sells best in each store, with item-level details?
  • What are the shopping times/routines for a particular store?
  • Using web data, how relevant is the shopping district / retail environment?
  • What are the money-making items in the store (quantity vs price)?
  • What is the sales / stock ratio?
  • What is the forecast value of minimum orders for items in each store based on sales/traffic trends?
  • What is the correlation between loss items, shopping days/periods and people movements?
  • What retail price points can be identified based on end-of-season sales?
Forecasts / predictions come as the next steps after data analysis.
Happy Analytics!!!

November 15, 2017

Day #90 - Regression Metrics Optimization

RMSE, MSE, R-Squared (Sometimes called L2 Loss)
Tree-Based
  • XGBoost, LightGBM
  • sklearn.ensemble.RandomForestRegressor
Linear Models
  • sklearn.<>Regression
  • sklearn.linear_model.SGDRegressor
Neural Networks
  • PyTorch
  • Keras
MAE (L1, Median Regression)
Tree-Based
  • LightGBM
  • sklearn.ensemble.RandomForestRegressor
MSPE, MAPE
  • MSPE is a weighted version of MSE
  • MAPE is a weighted version of MAE
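A small sketch of why these are "weighted versions": MSPE matches MSE with per-sample weights 1/y^2, and MAPE matches MAE with weights 1/|y|. The function names and numbers below are made up, the metrics are expressed in percent (an assumption about the convention), and targets must be non-zero.

```python
def mspe(y_true, y_pred):
    """Mean square percentage error, in percent."""
    n = len(y_true)
    return 100.0 / n * sum(((t - p) / t) ** 2 for t, p in zip(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    n = len(y_true)
    return 100.0 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def weighted_mean_error(y_true, y_pred, weights, power):
    """Mean of weight * |error| ** power (power=2 gives MSE, power=1 gives MAE)."""
    n = len(y_true)
    return sum(w * abs(t - p) ** power
               for t, p, w in zip(y_true, y_pred, weights)) / n


y_true = [10.0, 100.0]
y_pred = [9.0, 90.0]  # both predictions are off by 10% of the target

mspe_value = mspe(y_true, y_pred)
mape_value = mape(y_true, y_pred)
weighted_mse = weighted_mean_error(y_true, y_pred, [1 / t ** 2 for t in y_true], 2)
weighted_mae = weighted_mean_error(y_true, y_pred, [1 / abs(t) for t in y_true], 1)
```

This is why libraries that only optimize MSE or MAE can still target MSPE or MAPE when they accept sample weights.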
Happy coding and learning!!!

November 14, 2017

Day #89 - Capsule networks

Key lessons
  • Instead of adding more layers, it nests layers inside a layer
  • We apply non-linearity to grouped neurons (a capsule)
  • Dynamic routing - Replaces the scalar-output feature detectors of a CNN with vector-output capsules, and pooling with routing by agreement
CNN History
  • Latest paper on capsule networks
  • Offers state of art performance for MNIST dataset
  • Convolutional networks - Learn a mapping between input data and output labels
  • Convolution layer - A series of matrix multiplication and summation operations; outputs a feature map (a bunch of learned features from the image)
  • ReLU - Applies non-linearity to it (the network can learn both linear and non-linear functions). Helps with the vanishing gradient problem (as the gradient is backpropagated it gets smaller and smaller; ReLU mitigates this)
  • Pooling - Creates sections and takes the maximum pixel value from each section
  • Each line of code corresponds to a layer in the network
  • Dropout - Neurons are randomly turned off to prevent overfitting (regularization technique)
  • For handling rotations - AlexNet added differently rotated versions of images to generalize across rotations
  • Deeper networks improved classification accuracy
  • VGGNet - Adding more layers
  • GoogLeNet - Convolutions with different sizes processed on the same input, with several of those stacked together
  • ResNet - Instead of only stacking layers, an add operation eased the vanishing gradient problem
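The ReLU and pooling steps from the list above can be sketched on a tiny feature map. This is a pure-Python illustration with made-up numbers; it assumes 2x2 max pooling with stride 2 and is not how a deep learning framework implements these layers.

```python
def relu(feature_map):
    """Element-wise ReLU: negative activations become zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2: keep the largest value of each section."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        pooled_row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            section = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(section))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 output of a convolution layer.
feature_map = [
    [1.0, -2.0, 3.0, 0.0],
    [-1.0, 5.0, -3.0, 2.0],
    [0.0, 1.0, -4.0, -1.0],
    [2.0, -6.0, -2.0, -5.0],
]
activated = relu(feature_map)
pooled = max_pool_2x2(activated)  # [[5.0, 3.0], [2.0, 0.0]]
```

Note how pooling keeps only the strongest value of each section and discards where inside the section it occurred.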

Convolutional Network Challenges
  • As we go up the hierarchy, the features learnt become more complex
  • The hierarchy builds up with each layer
  • Sub-sampling loses spatial relationships
  • Spatial correlations are missed in sub-sampling and pooling
  • Bad for rotated images (invariance issues)
Capsule Networks
  • Basic idea - The human brain attains translational invariance in a better way. Instead of adding layers, it nests layers inside a layer
  • The nested layer is called a capsule, a group of neurons
  • CNNs route by pooling
  • Deeper in terms of nesting
Layer based squashing
  • Based on output neuron we apply non-linearity
  • We apply non-linearity to grouped neurons (a capsule)
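A minimal sketch of a capsule-level non-linearity, assuming the squashing function from the capsule networks paper: the whole output vector is rescaled so its length falls between 0 and 1 while its direction is kept. The small `eps` term is an assumption added only to avoid division by zero.

```python
import math


def squash(vector, eps=1e-9):
    """Shrink short vectors towards length 0 and long vectors towards length 1."""
    squared_norm = sum(v * v for v in vector)
    norm = math.sqrt(squared_norm)
    scale = squared_norm / (1.0 + squared_norm) / (norm + eps)
    return [scale * v for v in vector]


def length(vector):
    return math.sqrt(sum(v * v for v in vector))


short_capsule = squash([0.1, 0.0])    # weak evidence, length stays near 0
long_capsule = squash([30.0, 40.0])   # strong evidence, length approaches 1
```

Unlike ReLU, which acts on each neuron separately, this non-linearity acts on the group of neurons as one vector.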
Dynamic routing
  • Replace scalar output by routing by agreement
  • Hierarchy tree of nested layers
Key difference - Several routing iterations are needed to compute the output, and the operations are applied for every nested capsule
Happy coding and learning!!!

Day #88 - Metrics Optimization

Loss vs Metric
  • Metric - The function we use to evaluate the model, e.g. maximum accuracy in classification
  • Optimization loss - The function our model optimizes, chosen because it is easy to optimize for the given model, e.g. MSE, LogLoss
  • Preprocess the training data and optimize a different metric - MSPE, MAPE, RMSLE
  • Optimize a different metric and postprocess the predictions - Accuracy, Kappa
  • Early stopping - Stop training when the model starts to overfit
  • Custom loss functions
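Early stopping can be sketched as a loop over per-epoch validation scores of the metric we actually care about. The function name, the `patience` parameter and the loss values are hypothetical; real libraries expose their own early stopping options.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, best_loss), stopping after `patience` epochs without improvement."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation metric stopped improving: likely overfitting
    return best_epoch, best_loss


# Hypothetical validation losses per epoch.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.55]
best_epoch, best_loss = early_stopping_epoch(val_losses, patience=2)
```

With `patience=2` training stops after epoch 4, so the later improvement at epoch 5 is never seen; a larger patience trades training time for that risk.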

Accuracy Metrics





Happy Coding and Learning!!!

November 10, 2017

Day #87 - Classification Metrics

  • Accuracy (Essential for classification), Weighted Accuracy = Weighted Kappa
  • Logarithmic Loss (depends on soft predictions, i.e. probabilities)
  • Area Under the Receiver Operating Characteristic Curve (considers the ordering of objects, tries all thresholds to convert soft predictions to hard labels)
  • Kappa (Similar to R Squared)
Notations
N - Number of objects
L - Number of classes
y - Ground truth
y~ - Predictions
[a = b] - indicator function
  • Soft labels (soft predictions) are the classifier's scores - probabilities of objects belonging to each class
  • Hard labels (hard predictions) - A function of the soft labels: argmax fi(x) for multiclass (the class with the maximum soft prediction), or [f(x) > b] for binary classification, where b is the threshold
Accuracy Score
  • Most commonly used measure of classifier quality
  • Higher is better
  • Needs hard predictions
  • Fraction of correctly guessed objects
  • Hard predictions are the argmax of the soft predictions
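A short sketch of turning soft predictions into hard labels with argmax and then scoring accuracy. The function names and the toy probabilities are made up for illustration.

```python
def to_hard_labels(soft_predictions):
    """Pick the class index with the highest score for each object."""
    return [max(range(len(row)), key=row.__getitem__) for row in soft_predictions]


def accuracy(y_true, y_hard):
    """Fraction of objects whose hard prediction equals the ground truth."""
    correct = sum(1 for t, p in zip(y_true, y_hard) if t == p)
    return correct / len(y_true)


# Hypothetical class probabilities for 4 objects and 3 classes.
soft = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.3, 0.3, 0.4],
]
y_true = [0, 2, 0, 2]
hard = to_hard_labels(soft)   # [0, 2, 1, 2]
acc = accuracy(y_true, hard)  # 0.75
```

Accuracy ignores how confident the soft predictions were: the third object counts as fully wrong even though the correct class scored 0.4.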
Logloss
  • Works with soft predictions
  • Makes the classifier output posterior probabilities
  • Penalises wrong answers heavily
  • Best constant prediction - set it to the frequency of each class
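A minimal binary LogLoss sketch that shows both the penalty for confident wrong answers and the class frequency as the best constant. Clipping probabilities with `eps` is an assumption made to keep the logarithm finite; the data is made up.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Binary logarithmic loss; probs are predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


y_true = [1, 0, 0, 0]
confident_right = log_loss(y_true, [0.9, 0.1, 0.1, 0.1])
confident_wrong = log_loss(y_true, [0.1, 0.9, 0.9, 0.9])

# Constant predictions: class-1 frequency versus an arbitrary 0.5.
frequency = sum(y_true) / len(y_true)
constant_at_frequency = log_loss(y_true, [frequency] * len(y_true))
constant_at_half = log_loss(y_true, [0.5] * len(y_true))
```

Here the constant equal to the class frequency (0.25) gives a lower loss than the constant 0.5.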
Area Under Curve
  • A threshold decides which objects fall above and below it (positive vs negative)
  • The metric tries all possible thresholds and aggregates the scores
  • Depends on order of objects
AUC - ROC
  • Compute TruePositive, FalsePositive
  • AUC max value 1
  • Fraction of correctly ordered pairs
AUC = Number of correctly ordered pairs / total number of pairs
 = 1 - (Number of incorrectly ordered pairs / total number of pairs)
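The pair-based view of AUC can be sketched directly. A pair here is one positive and one negative object, and it is correctly ordered when the positive has the higher score. Counting ties as half a pair is an assumption, and the quadratic loop is for illustration only.

```python
def auc_by_pairs(y_true, scores):
    """AUC as the fraction of correctly ordered (positive, negative) pairs."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    correct = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                correct += 1.0
            elif pos_score == neg_score:
                correct += 0.5  # assumption: a tie counts as half
    return correct / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_by_pairs(y_true, scores)  # 3 of 4 pairs are ordered correctly
```

Only the ordering of the scores matters: rescaling all scores without changing their order leaves the AUC unchanged.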

Cohen's Kappa
  • Score = 1- ((1-accuracy)/(1-baseline))
  • Baselines different for each data
  • Similar to R squared
  • Here random predictions for the dataset are used as the baseline
  • Error = (1 - Accuracy)
  • Weighted error score = multiply the confusion matrix by the weight matrix element-wise and sum the results
  • Weighted Kappa = 1 - ((weighted error)/(weighted baseline error))
  • Useful for medical applications
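A sketch of the unweighted score formula above. It assumes the baseline is the accuracy expected from randomly permuted predictions, i.e. the sum over classes of (true class frequency * predicted class frequency). The data is made up, and the function divides by zero if the baseline is exactly 1.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Kappa = 1 - (1 - accuracy) / (1 - baseline)."""
    n = len(y_true)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_freq = Counter(y_true)
    pred_freq = Counter(y_pred)
    # Assumption: baseline = accuracy of predictions shuffled at random.
    baseline = sum(true_freq[c] / n * pred_freq[c] / n for c in true_freq)
    return 1 - (1 - accuracy) / (1 - baseline)


# Imbalanced toy data: 80% of objects belong to class 0.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_zero = [0] * 10

kappa_always_zero = cohen_kappa(y_true, always_zero)  # accuracy 0.8, kappa 0
kappa_perfect = cohen_kappa(y_true, y_true)           # kappa 1
```

Predicting the majority class everywhere reaches 80% accuracy but a kappa of 0, which is why the baseline matters on imbalanced data.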



Happy Learning and Coding!!!

November 09, 2017

Day #86 - Regression Metrics

  • Relative errors are often the most important to us
  • MSE and MAE work with absolute errors, not relative errors
  • MSPE (mean square percentage error) - Weighted version of MSE
  • MAPE (mean absolute percentage error) - Weighted version of MAE
  • RMSLE (root mean square logarithmic error) - RMSE calculated on a logarithmic scale; cares about relative errors
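A minimal RMSLE sketch, assuming the common definition with log(1 + value) so that zero targets are allowed. It shows that the same absolute error is penalised more when the target is small. The numbers are made up.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean square error between log(1 + prediction) and log(1 + target)."""
    n = len(y_true)
    squared = sum((math.log1p(p) - math.log1p(t)) ** 2
                  for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / n)


# Both predictions are off by 10 in absolute terms.
small_target_error = rmsle([10.0], [20.0])
large_target_error = rmsle([1000.0], [1010.0])
```

An equivalent trick when training is to fit a model on log1p(target) with plain MSE and apply expm1 to its predictions.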
Happy Coding and Learning!!!

November 07, 2017

Day #85 - Regression Metrics Optimization

Metrics
  • Metrics used to evaluate submissions
  • Best results come from finding the hyperplane that is optimal for the chosen metric
  • Exploratory metric analysis along with data analysis
  • Own ways to measure effectiveness of algorithms
Regression - Metrics
  • Mean Square Error
  • RMSE
  • R Squared
  • All three are the same from an optimization perspective
Classification
  • Accuracy
  • LogLoss
  • AUC
  • Cohen's Kappa
Regression Metrics
N - Samples
y - target values
y~ - target Predictions
yi - target ith value
yi~ - prediction ith object

Mean Square Error
MSE = (1/N) * sum over i = 1..N of (yi - yi~)^2
- Average of the squared differences between targets and predictions

RMSE - Root Mean square Error = Sqrt(MSE)

  • RMSE is on the same scale as the target
  • RMSE vs MSE
  • Similar in terms of minimizers
  • Every RMSE minimizer is an MSE minimizer, and vice versa
  • MSE(a) > MSE(b) <=> RMSE(a) > RMSE(b)
  • MSE orders models in the same way as RMSE
  • MSE is easier to work with
  • There is a bit of difference for gradient-based models
  • They may not be interchangeable for learning methods (the learning rate differs)
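A small sketch of MSE and RMSE on two hypothetical models, showing that both metrics rank the models in the same order. Names and numbers are made up.

```python
import math


def mse(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of MSE, on the same scale as the target."""
    return math.sqrt(mse(y_true, y_pred))


y_true = [3.0, 5.0, 7.0]
model_a = [2.0, 5.0, 9.0]  # squared errors 1, 0, 4
model_b = [3.0, 4.0, 7.0]  # squared errors 0, 1, 0

same_ordering = (
    (mse(y_true, model_a) > mse(y_true, model_b))
    == (rmse(y_true, model_a) > rmse(y_true, model_b))
)
```

The square root is monotonic, so it never changes which model is better; it only changes the size of the gradients.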
R Squared
  • Measures how much better the model is than a constant baseline
  • R Squared = 1 means the predictions are perfect
  • When MSE is 0, R Squared = 1
  • All reasonable models score between 0 and 1
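A sketch of R Squared, assuming the usual definition: 1 minus the model's squared error divided by the squared error of the constant baseline that always predicts the target mean. The data is made up, and the function is undefined when all targets are equal.

```python
def r_squared(y_true, y_pred):
    """1 - (squared error of the model) / (squared error of the mean baseline)."""
    mean_target = sum(y_true) / len(y_true)
    residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    baseline = sum((t - mean_target) ** 2 for t in y_true)
    return 1 - residual / baseline


y_true = [1.0, 2.0, 3.0, 4.0]
perfect_score = r_squared(y_true, y_true)       # MSE is 0, so the score is 1
baseline_score = r_squared(y_true, [2.5] * 4)   # predicting the mean scores 0
worse_score = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])
```

A model worse than the constant baseline gets a negative score, which is why only reasonable models land between 0 and 1.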
MAE - Mean Absolute Error
  • Average of the absolute differences between targets and predictions
  • Widely used in finance
  • A $10 error is exactly twice as bad as a $5 error
  • MAE is easier to justify
  • The median of the target values is the best constant for MAE
  • The MAE gradient is a step function: -1 when the prediction is smaller than the target, +1 when it is greater
  • MAE is not differentiable when the prediction equals the target
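A small sketch of MAE and its step-function gradient with respect to a single prediction. Returning 0 when the prediction equals the target is an assumption (any value between -1 and +1 is a valid subgradient at that point).

```python
def mae(y_true, y_pred):
    """Average of absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_gradient(target, prediction):
    """Derivative of |prediction - target| with respect to the prediction."""
    if prediction > target:
        return 1.0
    if prediction < target:
        return -1.0
    return 0.0  # assumption: pick 0 at the non-differentiable point


error = mae([10.0, 20.0], [15.0, 10.0])  # (5 + 10) / 2
gradient_below = mae_gradient(target=10.0, prediction=7.0)
gradient_above = mae_gradient(target=10.0, prediction=12.0)
```

The gradient has the same magnitude for a small and a large miss, unlike MSE where it grows with the error.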
MAE vs MSE
  • For outliers - use MAE
  • For unexpected but normal values - use MSE
  • MAE robust to outliers
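To illustrate the robustness claim, the sketch below compares the best constant prediction under each metric on made-up targets with one outlier: the mean (best constant for MSE) is dragged towards the outlier, while the median (best constant for MAE) is not.

```python
import statistics


def mse_of_constant(targets, constant):
    return sum((t - constant) ** 2 for t in targets) / len(targets)


def mae_of_constant(targets, constant):
    return sum(abs(t - constant) for t in targets) / len(targets)


# Hypothetical targets with a single outlier.
targets = [10.0, 11.0, 12.0, 13.0, 500.0]

mean_constant = statistics.mean(targets)      # pulled up to about 109.2
median_constant = statistics.median(targets)  # stays at 12.0

mae_with_median = mae_of_constant(targets, median_constant)
mae_with_mean = mae_of_constant(targets, mean_constant)
mse_with_median = mse_of_constant(targets, median_constant)
mse_with_mean = mse_of_constant(targets, mean_constant)
```

If the large value is a genuine observation rather than a data error, the MSE-optimal answer may still be the one you want.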
Happy Learning and Coding!!!

November 05, 2017

Day #84 - Data Leaks and Validations

  • Set up validation so that it mimics the train / test split of the test data
  • Perform K-fold validation
  • Choose the best parameters for models
  • Submission stage (can't mimic the exact train / test split)
  • Calculate the mean and standard deviation of leaderboard scores
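A rough sketch of K-fold validation with the mean and standard deviation of the fold scores. The helper `k_fold_indices`, the toy targets and the "model" (predict the training mean) are all made up for illustration, and the folds are contiguous without shuffling, which is an assumption.

```python
import statistics


def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for contiguous folds."""
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


targets = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0, 12.0, 11.0]
fold_scores = []
seen_validation = []
for train_idx, val_idx in k_fold_indices(len(targets), n_folds=5):
    # Toy "model": always predict the mean of the training targets.
    constant = statistics.mean(targets[i] for i in train_idx)
    fold_mae = statistics.mean(abs(targets[i] - constant) for i in val_idx)
    fold_scores.append(fold_mae)
    seen_validation.extend(val_idx)

score_mean = statistics.mean(fold_scores)
score_std = statistics.stdev(fold_scores)
```

A large standard deviation relative to the gap between two models suggests the difference between them may just be noise.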
Data Leaks
  • Unexpected information in data that lets you make good predictions
  • Unusable in the real world
  • Result of unintentional errors
Time Series
  • Incorrect time splits still exist
  • Check the public and private splits
  • Missing feature columns can point to data leaks
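A minimal sketch of a time-based split, where everything before a cut-off goes to training and everything from the cut-off onwards goes to validation, so the model never sees the future. The row layout, the `day` field and the cut-off are hypothetical.

```python
def time_based_split(rows, split_day):
    """Train on rows strictly before split_day, validate on the rest."""
    train = [row for row in rows if row["day"] < split_day]
    validation = [row for row in rows if row["day"] >= split_day]
    return train, validation


# Hypothetical daily sales records, deliberately out of order.
rows = [
    {"day": 3, "sales": 12},
    {"day": 1, "sales": 10},
    {"day": 5, "sales": 15},
    {"day": 2, "sales": 11},
    {"day": 4, "sales": 14},
]
train, validation = time_based_split(rows, split_day=4)

latest_train_day = max(row["day"] for row in train)
earliest_validation_day = min(row["day"] for row in validation)
```

A random split on the same rows would mix later days into training, which is the kind of incorrect time split that leaks information.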
Unexpected Information
  • File creation dates can be used as a leak
  • Resizing features / changing the creation date removes this information
  • IDs - It makes no sense to include them in the model
Happy Learning and Coding!!!