"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

March 17, 2016

R Day #5 Tip of the Day - Linear Regression

I have taken up Udemy Data Science classes. Below are my notes from the Linear Regression classes.

Linear Regression 
  • Analyzes the relationship between two or more variables
  • Goal is to build an equation relating the outcome (y) to the predictors (x)
  • Uses that equation to estimate the value of the dependent variable from the independent variables
  • Used for continuous variables that have some correlation between them
  • A goodness-of-fit test validates the model
Linear Equations
  • Explains relationship between two variables
  • X (independent, the predictor), Y (dependent)
  • Y = AX + B
  • A (slope) = change in Y / change in X
  • B (intercept) = value of Y when X = 0
  • The equation then becomes a predictor of Y
Fitting Line
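  • The best-fit line minimizes the sum of squared vertical distances from the points to the line (least squares)
  • Best line = smallest sum of squared residuals
  • Differences between the actual values and the model's predicted values are called residuals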
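A minimal sketch of the least-squares fit described above. The notes are about R (where lm(y ~ x) does this), but the sketch uses plain Python with only the standard library so it runs anywhere. The sample data is made up for illustration. It uses the closed-form formulas A = Sxy / Sxx and B = mean(y) - A * mean(x).

```python
def fit_line(xs, ys):
    """Least-squares fit of y = A*x + B; returns (A, B)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    b = mean_y - a * mean_x
    return a, b


def residuals(xs, ys, a, b):
    """Residual = actual value minus the model's predicted value."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]


# Hypothetical sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
print(f"A = {a:.3f}, B = {b:.3f}")
print("sum of squared residuals:", round(sum(r * r for r in res), 4))
```

With an intercept in the model, the residuals always sum to zero, which is a quick sanity check on the fit.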
Goodness of Fit
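  • Measured with R squared (R²)
  • Compares the sum of squared residuals (vertical distances to the fitted line) with the total sum of squares around the mean: R² = 1 - SS_res / SS_tot
  • Computed from the residual values
  • The higher the R² value, the better the fit (values close to 1 indicate a close fit)
  • Higher correlation means a better fit; in simple linear regression R² equals the square of Pearson's r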
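A small sketch of the R² calculation, R² = 1 - SS_res / SS_tot, in standard-library Python with made-up data. The coefficients 1.99 and 0.05 are the least-squares fit for this particular sample (worked out by hand); in R, summary(lm(y ~ x))$r.squared reports the same quantity.

```python
def r_squared(ys, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    # Unexplained variation: squared residuals around the fitted values
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    # Total variation: squared deviations around the mean of y
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares line for this sample: y = 1.99x + 0.05
predicted = [1.99 * x + 0.05 for x in xs]
r2 = r_squared(ys, predicted)
print("R squared:", round(r2, 4))
```

A model that predicts every point exactly gives R² = 1, while a model that always predicts the mean of y gives R² = 0.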
Multiple Regression
  • Multiple predictors involved (X values)
  • More than one independent variable is used to predict the dependent variable
  • Y = A1X1 + A2X2 + A3X3 + ... + ApXp + B
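A sketch of fitting Y = A1X1 + A2X2 + ... + ApXp + B by least squares, solving the normal equations (XᵀX)β = Xᵀy with small Gaussian elimination in plain Python. The data below is constructed from a known relationship (y = 2x1 + 3x2 + 1, no noise) so the recovered coefficients can be checked. This is an illustration only: R's lm() uses a QR decomposition rather than the normal equations, which is numerically more stable.

```python
def solve(matrix, vector):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(vector)
    # Augmented matrix [M | v]
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        # Swap in the row with the largest pivot for stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - s) / aug[r][r]
    return sol


def fit_multiple(rows, ys):
    """Least-squares coefficients [A1, ..., Ap, B] via the normal equations."""
    # Append a constant 1 column so the last coefficient is the intercept B
    X = [list(row) + [1.0] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical predictors (x1, x2); y follows an exact relationship for checking
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [2 * x1 + 3 * x2 + 1 for x1, x2 in rows]

coef = fit_multiple(rows, ys)
print("A1, A2, B =", [round(c, 6) for c in coef])
```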

Homoscedasticity - all random variables in the sequence or vector have the same finite variance
Heteroscedasticity - the variability of a variable is unequal across the range of values of a second variable that predicts it
Pearson's correlation coefficient (r) measures the strength and direction of the linear association between two variables (from -1 to +1)
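A minimal sketch of Pearson's r in standard-library Python (in R, cor(x, y) computes it). The sample data is made up. For this sample, r squared matches the R² of the simple linear fit, as mentioned in the Goodness of Fit notes.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
print("r =", round(r, 4), " r squared =", round(r * r, 4))
```

An exact increasing line gives r = 1, and flipping the sign of y flips the sign of r while leaving the strength unchanged.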

Happy Learning!!!

March 16, 2016

R Day #4 - Tip for the Day - Learning Resources


I came across this site: NYU Stats R Site

Bookmark it for a good walkthrough of core R topics.

Happy Learning!!!

March 13, 2016

R Day #3 - Tip for the Day

This is based on my reading of the notes from this link

Logistic Regression
  • Applied when the response is binary
  • (0/1, Yes/No, etc.), also known as a dichotomous outcome variable
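Logistic regression models the log-odds of the binary outcome as a linear function of the predictors; the logistic function then maps that linear value back to a probability between 0 and 1. A small standard-library Python sketch, where the coefficients A and B are hypothetical values chosen only for illustration (in R the model itself would be fitted with glm(y ~ x, family = binomial)).

```python
import math


def logistic(z):
    """Map a linear predictor z = A*x + B to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds log(p / (1 - p)): the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, for illustration only
A, B = 0.8, -4.0
for x in [0, 5, 10]:
    p = logistic(A * x + B)
    print(f"x = {x}: P(y = 1) = {p:.3f}")
```

At x = 5 the linear predictor is 0, so the predicted probability is exactly 0.5.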
Binomial probability model
  • Consists of (i) n independent trials, where
  • (ii) each trial results in one of two possible outcomes (Yes/No, 1/0), and
  • (iii) the probability p of success stays the same for each trial
Maximum likelihood - find the value of the parameter(s) (in this case p) that makes the observed data most likely to have occurred
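To see maximum likelihood at work for the binomial model, the sketch below evaluates the log-likelihood over a grid of candidate p values and picks the largest. The counts (7 successes in 20 trials) are hypothetical. A grid search is used here only for illustration; for the binomial model the maximum has the closed form p = k / n.

```python
import math


def binomial_log_likelihood(p, k, n):
    """Log-likelihood of k successes in n independent trials with success probability p."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)


# Hypothetical data: 7 "yes" outcomes out of 20 trials
k, n = 7, 20

# Candidate p values, avoiding 0 and 1 where the log is undefined
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: binomial_log_likelihood(p, k, n))
print("grid-search estimate:", p_hat, " closed form k/n:", k / n)
```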

Poisson Regression
Applied to count data when the following hold:
  • The occurrences of the event of interest in non-overlapping “time” intervals are independent
  • The probability of two or more events in a small time interval is small, and
  • The probability that an event occurs in a short interval of time is proportional to the length of the time interval
  • Heteroscedasticity (unequal error variances) is expected, since the variance of a Poisson count equals its mean
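A short standard-library Python sketch of the Poisson distribution behind this model, checking numerically that its mean and variance are both equal to λ. The value λ = 3 is an arbitrary choice for illustration. Because the variance tracks the mean, counts with a larger expected value also vary more, which is why unequal variances are built into Poisson regression (fitted in R with glm(y ~ x, family = poisson)).

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson count with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 3.0  # arbitrary illustrative rate
# Truncate the infinite support; the tail beyond k = 59 is negligible for lam = 3
ks = range(0, 60)
probs = [poisson_pmf(k, lam) for k in ks]

mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
print("total probability:", round(sum(probs), 6))
print("mean:", round(mean, 6), " variance:", round(var, 6))
```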
Negative Binomial Model
  • The Poisson model does not always provide a good fit to a count response, for example when the variance is larger than the mean (overdispersion)
  • An alternative model is based on the negative binomial distribution
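A quick way to see whether the Poisson assumption (variance equal to mean) is plausible is to compare the sample variance with the sample mean. The counts below are hypothetical, and treating a ratio well above 1 as a signal is an informal rule of thumb, not a formal test. In R, a negative binomial regression can be fitted with MASS::glm.nb.

```python
import statistics

# Hypothetical observed counts, for illustration only
counts = [0, 1, 0, 7, 2, 0, 12, 1, 0, 5, 3, 0]

mean = statistics.mean(counts)
var = statistics.variance(counts)  # sample variance (n - 1 denominator)
ratio = var / mean

print(f"mean = {mean:.3f}, variance = {var:.3f}, variance/mean = {ratio:.2f}")
if ratio > 1:
    # Informal signal only: the variance is larger than Poisson would allow
    print("Variance exceeds the mean (overdispersion); a negative binomial model may fit better.")
```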
Happy Learning!!!

March 11, 2016

Day #2 - Multivariate Linear Regression - R

  • More than one predictor involved in this case

Happy Learning!!!

March 10, 2016

R Day #1 - Simple Linear Regression - Slope

The UCLA notes were very useful.

Linear Regression Model - represents the mean of the response variable as a function of the explanatory variable(s), using slope and intercept parameters. It can be used for predictions. I have previously used a moving average algorithm for forecasting.
  • Simple Linear Regression - one explanatory variable (and one dependent variable)
  • Multivariate Linear Regression - more than one explanatory variable
The notes also give a good summary of data quality issues:
  • Data-entry errors
  • Missing values
  • Outliers
  • Unusual (e.g. asymmetric) distributions
  • Unexpected patterns
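Two of these issues, missing values and outliers, are easy to check before fitting a model. A standard-library Python sketch on hypothetical readings; the 1.5 × IQR rule used to flag outliers is a common convention, not something from the UCLA notes. In R, is.na() and quantile() cover the same checks.

```python
import statistics

# Hypothetical readings; None marks a missing value
values = [12.1, 11.8, None, 12.4, 55.0, 12.0, 11.9, None, 12.3]

# Missing values
missing = sum(v is None for v in values)
present = [v for v in values if v is not None]

# Outliers by the 1.5 * IQR convention (an assumed rule, one of several in use)
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in present if v < low or v > high]

print("missing values:", missing)
print(f"outlier fences: {low:.2f} to {high:.2f}")
print("outliers:", outliers)
```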
The R Cookbook has good step-by-step examples to try out - link

Basic Maths Again

Slope - the line's rate of change in the vertical direction (change in y per unit change in x)

y = mx + b
  • y = dependent variable, as y depends on x
  • x = independent variable
  • m, b = characteristics of the line (m is the slope)
  • b = y-intercept, where the line crosses the y axis
Ref - Link

Slope = Rise / Run
      = Change in y / Change in x

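Equation y = x
x = 1, y = 1
x = 2, y = 2

Slope = (y2 - y1) / (x2 - x1) = (2 - 1) / (2 - 1) = 1
(For a line through the origin like y = x, y/x = 2/2 = 1 gives the same answer, but in general the slope uses the differences.)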

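Slope > 1: the line is steeper than y = x and tilts towards the y axis
0 < Slope < 1: the line is flatter than y = x and tilts towards the x axis
Slope < 0: the line tilts downwards (y decreases as x increases)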
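The slope formula and y = mx + b can be checked with a few lines of standard-library Python. The helper names slope and intercept are just illustrative; the first example reuses the y = x points from above.

```python
def slope(p1, p2):
    """Slope m = (y2 - y1) / (x2 - x1) between two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)


def intercept(point, m):
    """Solve y = m*x + b for b, given one point on the line."""
    x, y = point
    return y - m * x


# The y = x example: points (1, 1) and (2, 2)
m = slope((1, 1), (2, 2))
b = intercept((1, 1), m)
print("y = x example: m =", m, " b =", b)

print("steep line:", slope((0, 0), (1, 3)))        # m > 1, tilts towards the y axis
print("flat line:", slope((0, 0), (4, 1)))         # 0 < m < 1, tilts towards the x axis
print("downward line:", slope((0, 5), (5, 0)))     # m < 0
```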




Ref - Link


Ref - Link

Happy Learning!!!