Linear Regression
- Analyzes the relationship between two or more variables
- Goal is to fit an equation relating the outcome (Y) to the predictors (X)
- Estimate the value of the dependent variable from the independent variables using the fitted equation
- Used for continuous variables that have some correlation between them
- Goodness-of-fit tests are used to validate the model
- Simple linear regression explains the relationship between two variables
- X (independent, predictor), Y (dependent)
- Y = AX + B
- A (slope) = change in Y per unit change in X
- B (intercept) = value of Y when X = 0
- The fitted equation becomes the predictor of Y
- The best-fit line minimizes the sum of squared vertical distances (least squares)
- Best line = smallest residual sum of squares
- Differences between predicted and actual values are called residuals
- R-squared measure
- Based on the sum of squared residuals (the same vertical distances that least squares minimizes)
- Uses the residual values
- Higher R-squared means a better fit (values close to 1 indicate a strong fit); see the first sketch after this list
- Higher correlation means a better fit (R-squared will also be high)
- Multiple linear regression involves multiple predictors (X values)
- More than one independent variable is used to predict the dependent variable
- Y = A1X1 + A2X2 + A3X3 + ... + ApXp + B; see the second sketch after this list
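Here is a minimal sketch of simple linear regression in Python. The data values are made up for illustration (they are not from this post); the code estimates the slope A and intercept B by least squares, computes the residuals, and derives R-squared from them.

```python
import numpy as np

# Illustrative data (made-up values for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Least-squares estimates:
# A = covariance(x, y) / variance(x), B = mean(y) - A * mean(x)
A = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
B = y.mean() - A * x.mean()

# Predictions and residuals (actual minus predicted vertical distances)
y_hat = A * x + B
residuals = y - y_hat

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"Y = {A:.3f}X + {B:.3f}, R^2 = {r_squared:.3f}")
```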
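And a similar sketch for multiple linear regression, again on made-up data: the slopes A1..Ap and the intercept B are solved together with ordinary least squares by appending a column of ones to the predictor matrix.

```python
import numpy as np

# Made-up data: 3 predictors (X1, X2, X3) and one outcome Y
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.0, 1.5],
              [3.0, 4.0, 2.0],
              [4.0, 3.0, 2.5],
              [5.0, 5.0, 3.0]])
y = np.array([5.0, 7.5, 13.0, 14.5, 18.0])

# Add a column of ones so the intercept B is estimated alongside A1..Ap
X_design = np.column_stack([X, np.ones(len(y))])

# Ordinary least squares: minimizes ||X_design @ coeffs - y||^2
coeffs, _, _, _ = np.linalg.lstsq(X_design, y, rcond=None)
A = coeffs[:-1]   # slopes A1..Ap
B = coeffs[-1]    # intercept

y_hat = X_design @ coeffs
r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print("Slopes:", A, "Intercept:", B, "R^2:", round(r_squared, 3))
```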
Homoscedasticity - all random variables in the sequence or vector have the same finite variance
Heteroscedasticity - the variability of a variable is unequal across the range of values of a second variable that predicts it
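A rough way to see this in practice is to look at the residuals: if their spread changes systematically along the predictor, the constant-variance (homoscedasticity) assumption is violated. The sketch below uses simulated data and a crude split-half variance comparison; a formal test such as Breusch-Pagan would be more rigorous.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated example: residual spread grows with x (a heteroscedastic pattern)
x = np.linspace(1, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5 * x)  # noise scale increases with x

# Fit Y = AX + B by least squares and compute residuals
A, B = np.polyfit(x, y, 1)
residuals = y - (A * x + B)

# Crude check: compare residual variance in the lower vs upper half of x.
# Roughly equal variances suggest homoscedasticity; a large ratio suggests
# heteroscedasticity.
half = len(x) // 2
var_low = residuals[:half].var()
var_high = residuals[half:].var()
print(f"variance ratio (upper/lower): {var_high / var_low:.2f}")
```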
Pearson's correlation coefficient (r) is a measure of the strength of the association between the two variables
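Pearson's r can be computed directly from its definition (covariance of the two variables divided by the product of their standard deviations). The sketch below, again on made-up values, checks the manual formula against numpy's built-in corrcoef.

```python
import numpy as np

# Made-up paired observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Pearson's r: covariance divided by the product of standard deviations;
# the result always lies between -1 and +1.
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

print(f"r = {r:.3f}")           # manual computation
print(np.corrcoef(x, y)[0, 1])  # same value via numpy's built-in
```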
Happy Learning!!!