"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 07, 2016

R and Hypothesis Tests

It took a couple of months to fully analyse and arrive at these hypothesis testing learnings:
  • Formulating and Identifying NULL Hypothesis and Alternate Hypothesis
  • Choosing the tail of the Normal Distribution (Left-Tailed, Right-Tailed, or Two-Tailed Test)
  • Identifying Area under the region (Using pnorm in R language)
  • Compute T value or Z value
  • Compute P value
  • If p value < 0.05 (the usual significance level) then reject the Null Hypothesis
  • If p value >= 0.05 then we fail to reject the Null Hypothesis (we never "accept" it. Or, to put it another way, if the p-value is high, the null will fly)
Finding P-Values
Here we use the pnorm function, where z is the z score of the sample mean.
Usage: P-value = pnorm(z, lower.tail = ).
  • Left-Tailed Tests: P-value = pnorm(z, lower.tail=TRUE)
  • Right-Tailed Tests: P-value = pnorm(z, lower.tail=FALSE)
  • Two-Tailed Tests: P-value = 2 * pnorm(abs(z), lower.tail=FALSE)
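The three cases above can be wrapped in one small helper. This is only a sketch: the name p_from_z and its tail argument are made up for illustration and are not part of base R. Only pnorm itself is.

```r
# Hypothetical helper (not part of base R): p-value for a given z score.
p_from_z <- function(z, tail = c("two", "left", "right")) {
  tail <- match.arg(tail)
  switch(tail,
    left  = pnorm(z, lower.tail = TRUE),            # area to the left of z
    right = pnorm(z, lower.tail = FALSE),           # area to the right of z
    two   = 2 * pnorm(abs(z), lower.tail = FALSE)   # both tails beyond |z|
  )
}

p_from_z(1.96, "two")      # close to 0.05
p_from_z(-1.645, "left")   # close to 0.05
```

Taking abs(z) in the two-tailed case means the same call works whether the sample mean falls above or below the hypothesised mean.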
The two problems below apply the above logic.


Problem #1 - Z Test Case
A rental car company claims the mean time to rent a car on their website is 60 seconds with a standard deviation of 30 seconds. A random sample of 36 customers attempted to rent a car on the website. The mean time to rent was 75 seconds. Is this enough evidence to contradict the company's claim? What is the p-value?

H0: mean time = 60 seconds (no change in mean time)
Ha: mean time <> 60 seconds (the claim is contradicted in either direction)

Population Mean = 60
Population SD = 30
Sample Mean = 75
Sample Size = 36

Using Population SD = 30 and Sample Size = 36:

Standard Error = Population SD / sqrt(Sample Size)
Standard Error = 30 / sqrt(36)
Standard Error = 30 / 6 = 5

Z score = (Sample Mean - Population Mean) / Standard Error
Z score = (75 - 60) / 5 = 3
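The same arithmetic can be checked in R. A minimal sketch, where the variable names are chosen here for illustration and the numbers come from the problem statement:

```r
# Values taken from the problem statement
population_mean <- 60
population_sd   <- 30
sample_mean     <- 75
n               <- 36

# Standard error of the mean: population SD divided by sqrt(sample size)
standard_error <- population_sd / sqrt(n)

# z score: distance of the sample mean from the claimed mean, in standard errors
z <- (sample_mean - population_mean) / standard_error

standard_error   # 5
z                # 3
```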

This is a two-tailed test, since the question asks whether the mean differs from 60 in either direction (the <> symbol)
2*pnorm(75, mean=60, sd=5, lower.tail=FALSE)
p value = 0.002699796

Since the p value is less than 0.05, we reject the null hypothesis: the sample contradicts the company's claim of a 60 second mean
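The pnorm call above works on the original scale (mean=60, sd=5). As a cross-check, the sketch below shows that passing the z score of 3 to the standard normal should give the same p-value. The names p_raw, p_z and alpha are illustrative only.

```r
# p-value on the original scale: sample mean 75 against mean 60, standard error 5
p_raw <- 2 * pnorm(75, mean = 60, sd = 5, lower.tail = FALSE)

# p-value on the standardised scale: z = (75 - 60) / 5 = 3
p_z <- 2 * pnorm(3, lower.tail = FALSE)

all.equal(p_raw, p_z)   # TRUE, both forms describe the same tail area

alpha <- 0.05           # significance level used in this post
p_z < alpha             # TRUE, so the null hypothesis is rejected
```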

Problem #2
An outbreak of Salmonella related illness was attributed to ice cream produced at a certain factory. Scientists measured the level of Salmonella in 9 randomly sampled batches of ice cream. The levels (in MPN/g) were: 0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418. Is there evidence that the mean level of Salmonella in the ice cream is greater than 0.3 MPN/g? What is the p-value?

H0: mean = 0.3 MPN/g
Ha: mean > 0.3 MPN/g

Option #1
Using R t-test
x = c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
t.test(x, alternative="greater", mu=0.3)

p-value = 0.02927. Since the p value < 0.05, we can reject the null hypothesis
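Rather than reading the numbers off the printed output, the pieces of the test can be pulled out of the object that t.test returns. A small sketch: the variable name result is arbitrary, while statistic, parameter and p.value are the standard components of an htest object.

```r
x <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)

# One-sample, right-tailed t-test against mu = 0.3
result <- t.test(x, alternative = "greater", mu = 0.3)

result$statistic   # t value
result$parameter   # degrees of freedom (n - 1)
result$p.value     # one-sided p-value

result$p.value < 0.05   # TRUE means reject the null hypothesis at the 5% level
```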

Option #2
populationmean = 0.3
samplemean  = 0.4564444
standarddeviation  = 0.2128439
9 random samples, so degrees of freedom = 9 - 1 = 8

collectedsample=c(0.593,0.142,0.329,0.691,0.231,0.793,0.519,0.392,0.418)
samplemean = mean(collectedsample)
standarddeviation = sd(collectedsample)
populationmean = 0.3
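sdx = standarddeviation/sqrt(9)   # standard error: sd / sqrt(sample size)
t = (samplemean-populationmean)/(sdx)
t
df = 8
t value is  2.205058

# Right-tailed test (Ha: mean > 0.3), so take the upper tail
pvalue = pt(t, df=df, lower.tail=FALSE)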

pvalue = 0.0292652

Since the population standard deviation is unknown and the sample size is < 30, we use the t distribution (pt) here instead of the pnorm function
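To see why the choice of distribution matters for a small sample, the sketch below feeds the same statistic (2.205058, from Option #2) to both pt and pnorm. The names t_value, p_t and p_norm are illustrative.

```r
t_value <- 2.205058   # t value computed in Option #2

# Upper-tail area under the t distribution with 8 degrees of freedom
p_t <- pt(t_value, df = 8, lower.tail = FALSE)

# Upper-tail area under the standard normal, for comparison only
p_norm <- pnorm(t_value, lower.tail = FALSE)

p_t
p_norm
p_t > p_norm   # TRUE: the t distribution has heavier tails
```

With only 8 degrees of freedom the normal curve understates the tail area, so using pnorm here would make the evidence look stronger than it is.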

Happy Learning!!!
