"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

March 17, 2016

R Day #5 Tip of the Day - Linear Regression

I have taken up Udemy data science classes. Below are my notes from the linear regression classes.

Linear Regression 
  • Analyze the relationship between two or more variables
  • Goal is to build an equation (outcome Y, predictors X)
  • Estimate the value of the dependent variable from the independent variables using that equation
  • Used for continuous variables that have some correlation between them
  • Use a goodness-of-fit test to validate the model
Linear Equations
  • Explains relationship between two variables
  • X (independent, predictor), Y (dependent)
  • Y = AX + B
  • A (slope) = change in Y / change in X
  • B (intercept) = value of Y when X = 0
  •  Equation becomes predictor of Y
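
To make the slope and intercept concrete, here is a minimal sketch in plain Python. It is not from the course; the function names and the two sample points are made up for illustration.

```python
def slope_between(x1, y1, x2, y2):
    """Slope A: change in Y divided by change in X between two points."""
    return (y2 - y1) / (x2 - x1)


def predict(x, a, b):
    """Evaluate the line Y = A*X + B at a given X."""
    return a * x + b


# Two hypothetical points on the same line: (1, 5) and (3, 9)
a = slope_between(1, 5, 3, 9)  # (9 - 5) / (3 - 1) = 2.0
b = 5 - a * 1                  # rearrange Y = A*X + B at the point (1, 5)

y_at_zero = predict(0, a, b)   # equals the intercept B
y_at_ten = predict(10, a, b)   # the equation now predicts Y for any X
```

Once A and B are known, predicting Y is just plugging a new X into the equation.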
Fitting Line
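  • The fitted line minimizes the sum of squares of the vertical distances between the points and the line
  • Best line = smallest sum of squared residuals
  • The differences between the actual values and the values predicted by the model are called residuals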
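
The fitting idea above can be sketched with the standard closed-form least-squares formulas for a single predictor. This is an illustrative sketch in plain Python, not the course's code; `fit_line`, `residuals`, and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariation of X and Y, and variation of X, around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def residuals(xs, ys, a, b):
    """Actual Y minus predicted Y for every point."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


# Made-up sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
sum_sq_res = sum(r ** 2 for r in res)  # the quantity the best line minimizes
```

Any other choice of slope and intercept gives a larger sum of squared residuals on the same data.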
Goodness of Fit
  • R square measure
  • R square = 1 - (sum of squared residuals / total sum of squares of Y around its mean)
  • Uses residual values
  • Higher R square value means better fit (closer to 1 is better)
  • Stronger correlation (positive or negative) means better fit; with a single predictor, R square equals the square of Pearson's r
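
As a rough illustration of the R square measure, the sketch below fits a line and then compares the residual sum of squares with the total sum of squares. It is plain Python with made-up data; `fit_line` and `r_squared` are hypothetical helper names, not functions from the course.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def r_squared(ys, predicted):
    """R square = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up sample data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
predicted = [a * x + b for x in xs]
r2 = r_squared(ys, predicted)  # about 0.6 for this data
```

A model that predicted every point exactly would have zero residuals and an R square of 1.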
Multiple Regression
  • Multiple predictors involved (X Values)
  • More than one independent variable used to predict dependent variable
  • Y = A1X1 + A2X2 + A3X3 + ... + ApXp + B
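
With more than one predictor, the coefficients can be found by solving the normal equations. The sketch below does this in plain Python with a small Gaussian elimination; it is one possible approach chosen for illustration (statistical packages typically use more robust methods), and `solve`, `fit_multiple`, and the data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are not modified
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_multiple(rows, ys):
    """Least-squares fit of Y = A1*X1 + ... + Ap*Xp + B. Returns ([A1..Ap], B)."""
    design = [list(r) + [1.0] for r in rows]  # constant column for the intercept B
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    coeffs = solve(xtx, xty)
    return coeffs[:-1], coeffs[-1]


# Made-up data generated exactly from Y = 2*X1 + 3*X2 + 1
rows = [(1, 1), (2, 1), (1, 2), (3, 2), (2, 3)]
ys = [2 * x1 + 3 * x2 + 1 for x1, x2 in rows]

slopes, intercept = fit_multiple(rows, ys)  # recovers roughly [2, 3] and 1
```

Because the sample data has no noise, the fit recovers the generating coefficients up to floating-point error; real data would leave nonzero residuals.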

Homoscedasticity - all random variables in the sequence or vector have the same finite variance; in regression, the residuals have the same spread across all predictor values
Heteroscedasticity - the variability of a variable is unequal across the range of values of a second variable that predicts it
Pearson's correlation coefficient (r) measures the strength and direction of the linear association between two variables, ranging from -1 to +1
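
Pearson's r can be computed directly from its definition. The sketch below is plain Python with made-up data; `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up sample data with a moderate positive association
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # about 0.775

# A perfectly linear, decreasing relationship gives r = -1
r_negative = pearson_r([1, 2, 3], [6, 4, 2])
```

The sign of r shows the direction of the relationship, and its magnitude shows how closely the points follow a straight line.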

Happy Learning!!!
