"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

November 24, 2015

t-test and z-test





Problems to work out (Good Compiled List)

References
Link1
Link2

Z - Scores

Z-scores make it easy to compare scores from distributions that use different scales
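
To make the comparison idea concrete, here is a minimal Python sketch. The two tests and all their numbers are hypothetical, chosen only for illustration; the helper name z_score is also my own.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd

# Hypothetical scores on two tests with different scales (illustrative numbers only).
z_test_a = z_score(82, mean=70, sd=8)      # test A, scored out of 100
z_test_b = z_score(620, mean=500, sd=100)  # test B, scored out of 800

# The larger z-score is the better result relative to its own distribution.
print(z_test_a, z_test_b)
```

Once both scores are expressed in standard deviations, they can be compared directly even though the raw scales differ.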

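Formula #1 (z-score of a single value)
z = (x - mean) / SD

Formula #2 (z-score of a sample mean, sample size n)
z = (sample mean - mean) * sqrt(n) / SD

Formula #3 for raw score computation is defined by
x = mean + z * SD

Formula #4 for Standard Error
SE = SD / sqrt(n)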

Trying out problems in link 

Problem 2. Suppose X is a normal random variable with a mean of 120 and a standard deviation of 20. Determine the probability that X is greater than 135.

Mean = 120
SD = 20
Z score = (135-120)/20 = 0.75

From the z table in the attached link:

P(Z < 0.75) = 0.7734
P(X > 135) = 1 - 0.7734 = 0.2266
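
As a cross-check on the table lookup, the same probability can be sketched in Python. This assumes Python 3.8 or later, where statistics.NormalDist is available in the standard library; the variable names are my own.

```python
from statistics import NormalDist

# X is normal with mean 120 and standard deviation 20 (Problem 2).
x_dist = NormalDist(mu=120, sigma=20)

z = (135 - x_dist.mean) / x_dist.stdev  # z-score of 135
p_greater = 1 - x_dist.cdf(135)         # P(X > 135)

print(round(z, 2), round(p_greater, 4))  # 0.75 0.2266
```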

Problem 4. If the test scores of 400 students are normally distributed with a mean of 100 and a standard deviation of 10, approximately how many students scored between 90 and 110?
Mean = 100
SD = 10

For x = 90, z = (90-100)/10 = -1
For x = 110, z = (110-100)/10 = 1
P(Z < -1) = 0.1587
P(Z < 1) = 0.8413
P(-1 < Z < 1) = 0.8413 - 0.1587 = 0.6826

Multiply this proportion by 400: 0.6826 * 400 = 273.04. After rounding, we get 273 students.
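
The same count can be checked with a short Python sketch, again assuming statistics.NormalDist from the standard library (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

# Test scores are normal with mean 100 and standard deviation 10 (Problem 4).
scores = NormalDist(mu=100, sigma=10)

# P(90 < X < 110) is the difference of the two cumulative probabilities.
proportion = scores.cdf(110) - scores.cdf(90)
students = round(400 * proportion)

print(round(proportion, 4), students)
```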

Problem 16. A traffic study shows that the average number of occupants in a car is 1.5 and the standard deviation is .35. In a sample of 45 cars, find the probability that the mean number of occupants is greater than 1.6.

Mean = 1.5
SD = .35

Applying Formula #2

P(mean > 1.6) = 1 - P(mean < 1.6)
Z = ((1.6-1.5)*sqrt(45)) / 0.35
  = 1.916

P(Z < 1.916) = 0.9719 (z table, read at z = 1.91)
P(mean > 1.6) = 1 - 0.9719 = 0.0281
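
Here is a minimal Python sketch of the same calculation, assuming statistics.NormalDist (Python 3.8+). It uses the unrounded z value, so the probability comes out near 0.028 rather than exactly the table-based figure; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.5, 0.35, 45       # population mean, SD, and sample size (Problem 16)

standard_error = sigma / sqrt(n)   # standard error of the sample mean
z = (1.6 - mu) / standard_error    # same as (1.6 - 1.5) * sqrt(45) / 0.35
p_greater = 1 - NormalDist().cdf(z)

print(round(z, 3), round(p_greater, 3))
```

The small difference from the hand calculation comes only from rounding z before reading the table.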

Happy Learning!!!


November 21, 2015

chi-square test for homogeneity

The chi-square test for homogeneity determines whether several populations are homogeneous (similar or equal) in some characteristic

This link was useful

I tried the problem provided in the link

Problem - Know how to compute the chi-square homogeneity test statistic.

Step 1 


Step 2



Step 3



R command for the p-value:
1-pchisq(19,df=2)
7.485183e-05

Since the p-value is less than 0.05, we reject the null hypothesis
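
For anyone without R at hand, this particular p-value can be sketched in plain Python. It relies on the fact that a chi-square variable with exactly 2 degrees of freedom has upper-tail probability exp(-x/2), so the helper below (a name of my own) is only valid for df = 2.

```python
from math import exp

def chi2_upper_tail_df2(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 2 degrees of freedom."""
    return exp(-x / 2)

p_value = chi2_upper_tail_df2(19)
print(p_value)             # about 7.485e-05, in line with the R output
print(p_value < 0.05)      # True, so the null hypothesis is rejected
```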

Happy Learning!!!

Chi Square Test for Independence

  • Uses a cross-classification table to examine the nature of the relationship between two variables
  • These tables are sometimes referred to as contingency tables
  • Determines whether the variables are dependent on each other or not
Approach
  • H0: the null hypothesis assumes that there is no relationship between the two variables
  • Ha: the alternative hypothesis is that there is some relationship between the variables
The general formula for the degrees of freedom is (number of rows - 1) * (number of columns - 1).

In terms of independence and dependence, these hypotheses can be stated as
  • H0 : X and Y are independent
  • H1 : X and Y are dependent
Expected Frequency = ((row total)*(column total))/Total Population

I liked the example provided in link  

Problem - Test for a Relationship between Sex and Class

Y (Social Class)    X (Sex)
                    Male (M)   Female (F)   Total
Upper Middle (A)        33         29          62
Middle (B)             153        181         334
Working (C)            103         81         184
Lower (D)               16         14          30
Total                  305        305         610

Table 10.12: Social Class Cross Classified by Sex of Respondents

Expected Frequency = ((row total)*(column total))/Total Population



1-pchisq(4.8748,df=3)
0.1811978
Since the p-value is greater than 0.05, we do not reject the null hypothesis

The result matches the one in the linked problem, although the approach is different. The grand total of the table is 610.
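
The expected frequencies and the test statistic can be sketched in plain Python straight from the table. The closed-form tail function below is my own helper and is valid only for 3 degrees of freedom. Note that with the counts exactly as printed in the table, this sketch gives a statistic of about 5.37 rather than the 4.8748 used in the R command above; both values lead to the same decision at the 0.05 level.

```python
from math import erfc, exp, pi, sqrt

# Observed counts from Table 10.12: rows are classes A-D, columns are Male, Female.
observed = [
    [33, 29],
    [153, 181],
    [103, 81],
    [16, 14],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

def chi2_upper_tail_df3(x):
    """Upper-tail probability for a chi-square variable with exactly 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

p_value = chi2_upper_tail_df3(chi_square)
print(grand_total, df, round(chi_square, 4), round(p_value, 4))
```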

Happy Learning!!!

Stats - Chi-Square Goodness of Fit Test

Purpose - Test whether observed frequencies follow a specified (expected) distribution

The chi-square test is defined for the hypothesis:
H0: The data follow a specified distribution
Ha: The data do not follow the specified distribution
This means that if the significance value (p-value) is less than 0.05, you reject the null hypothesis; if it is greater than or equal to 0.05, you do not reject the null hypothesis

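Formula is
chi-square = sum over i of (Oi - Ei)^2 / Ei, where Oi is the observed and Ei the expected frequency

I liked the example mentioned in the notes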

Problem - Testing an octahedral die to see if it is biased

Score 1 2 3 4 5 6 7 8
Frequency 7 10 11 9 12 10 14 7 (Observed)

Degrees of Freedom = Number of entries - 1. Here it is 8-1 = 7
Test the hypothesis H0 - The Die is Fair
H1: Die is not fair
Significance level alpha = 0.05

Expected frequency under a uniform distribution is Ei = Sum of all observed frequencies / 8 (number of scores)
= 80/8 = 10

The expected values will be
Score 1 2 3 4 5 6 7 8
Frequency 10 10 10 10 10 10 10 10 (Expected)

To compute the statistic we need the values of (Oi-Ei) and ((Oi-Ei)*(Oi-Ei))/Ei for each pair of elements in the two arrays, summed over all eight scores:

(9 + 0 + 1 + 1 + 4 + 0 + 16 + 9) / 10 = 4


Compute the p-value for the chi-square statistic (R Command)
1-pchisq(4,df=7)
0.7797774

This p-value is above the 0.05 significance level, so we cannot reject the null hypothesis

Answer - There is no evidence that the die is biased, so we treat it as fair
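
The statistic itself is easy to reproduce with a minimal Python sketch built from the observed frequencies in the problem. The variable names are my own, and the p-value step is left to the R command.

```python
# Observed frequencies for scores 1..8 of the octahedral die.
observed = [7, 10, 11, 9, 12, 10, 14, 7]

# Under H0 (fair die) every score has the same expected frequency.
expected_each = sum(observed) / len(observed)

# Chi-square statistic: sum of (Oi - Ei)^2 / Ei over all scores.
chi_square = sum((o - expected_each) ** 2 / expected_each for o in observed)
df = len(observed) - 1

print(expected_each, round(chi_square, 4), df)  # 10.0 4.0 7
```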

Happy Learning!!!

Good Read on Taylor Series

Two summary points
  • A Taylor Series is an expansion of a function into an infinite sum of terms, for example e^x = 1 + x + x^2/2! + x^3/3! + ...
  • A derivative gives you the slope of a function at any point
Detailed Notes in link
Taylor series Formula Compilation - link
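
To see the "infinite sum of terms" idea in action, here is a small Python sketch that adds up the first few terms of the Taylor series of e^x around 0 and compares the result with math.exp. The function name exp_taylor is my own.

```python
from math import exp, factorial

def exp_taylor(x, n_terms):
    """Sum of the first n_terms terms of the Taylor series of e^x around 0."""
    return sum(x ** k / factorial(k) for k in range(n_terms))

# The partial sums approach e = exp(1) as more terms are added.
for n_terms in (2, 4, 8, 12):
    print(n_terms, exp_taylor(1.0, n_terms), exp(1.0))
```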

Happy Learning!!!

November 08, 2015

K Means Clustering


I'm slowly making progress in Stats, with a lot of learning along the way. This post is from my class notes

K-means clustering

  • Finds groups of objects that are similar to one another
  • Partitioning clustering approach
  • The mean (centroid) of each cluster moves on every iteration (it usually converges within the first few iterations)
  • Classifies a given data set into a chosen number of clusters (K)
  • Does not work well when clusters differ a lot in density (sparse vs. dense clusters)

Great 5 Minute Video



Step 1 - "Figure out centroid of region"
Step 2 - "Select K data points randomly"
Step 3 - "Assign each data point to nearest centre"
Step 4 - "Recalculate the new centroids"
Step 5 - "Repeat Steps 3 and 4" (until the centroids stop moving)
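
The steps above can be sketched in plain Python as follows. This is only a minimal illustration, not the method from the video: the function name, the fixed random seed, the squared Euclidean distance, and the sample points are all my own assumptions.

```python
import random

def k_means(points, k, iterations=100, seed=0):
    """Minimal K-means on a list of equal-length tuples; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # select K data points randomly
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign each data point to the nearest centre (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Recalculate the centroids; an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:          # centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two well-separated groups.
sample_points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = k_means(sample_points, k=2)
print(centroids)
```

Because the starting points are random, different seeds can give different results on less clearly separated data.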

More Reads - K-Means Clustering

DTW - Dynamic Time Warping algorithm. DTW allows similar shapes to match even if they are out of phase along the time axis

Ref - Link
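
To show what "out of phase" matching means for DTW, here is a minimal Python sketch of the classic dynamic-programming formulation. The function name, the absolute-difference cost, and the two sample sequences are my own assumptions, not taken from the reference.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences (absolute-difference cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Same shape, but the second sequence starts one step later (out of phase).
series_a = [0, 1, 2, 1, 0]
series_b = [0, 0, 1, 2, 1, 0]
print(dtw_distance(series_a, series_b))  # 0.0
```

A point-by-point comparison would not even be defined here because the lengths differ, while DTW stretches the time axis and finds a perfect match.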

Happy Learning!!!

November 02, 2015

Quick Tip - Python Stemming Module Installation - Windows


Copy the scripts to the package folder, then run easy_install.py, specifying the package that contains the scripts.

Happy Learning!!!