"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 30, 2016

NoSQL Basics

Getting ready for the second semester with some quick NoSQL basics, drawn from two short reads on NoSQL.

Difference between RDBMS & NoSQL

RDBMS - NoSQL
Scale up - Scale out
Structured data - Semi-structured / unstructured data
Atomic transactions - Eventual consistency
The two also structure their stored data differently on disk

Atomic vs Eventually Consistent Transactions
Atomic - e.g. ATM transactions (either all changes are made or none are made)
Eventual Consistency - There is no guarantee that all changes are applied at a given moment; they will be completed at some point (eventually)




Happy Learning!!!

January 18, 2016

Type I and Type II Error


Type I error, also known as a "false positive": the error of rejecting the null hypothesis even though it is true
Type II error, also known as a "false negative": the error of not rejecting the null hypothesis when the alternative hypothesis is the true state of nature

I liked the comment below from Khan Academy
The easiest way to think about Type 1 and Type 2 errors is in relation to medical tests. A type 1 error is where the person doesn't have the disease, but the test says they do (false positive). A type 2 error is where the person has the disease but the test doesn't pick it up (false negative).
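
To put numbers on the two error types, here is a minimal R sketch for a right-tailed z-test. The Type I error rate equals the chosen significance level by construction, while the Type II error rate depends on what the true mean actually is. Every value below (null mean, standard deviation, sample size, and the assumed true mean) is an illustrative assumption, not something taken from the Khan Academy comment.

```r
# Right-tailed z-test: H0 mean = mu0, Ha mean > mu0 (all numbers are illustrative)
alpha     <- 0.05   # chosen significance level
mu0       <- 60     # mean under the null hypothesis
sigma     <- 30     # assumed known population standard deviation
n         <- 36     # sample size
true_mean <- 70     # assumed true mean, used only to evaluate the Type II error

se     <- sigma / sqrt(n)                 # standard error of the sample mean
cutoff <- mu0 + qnorm(1 - alpha) * se     # reject H0 when the sample mean exceeds this

# Type I error: rejecting H0 although the mean really is mu0 (false positive)
type1 <- pnorm(cutoff, mean = mu0, sd = se, lower.tail = FALSE)

# Type II error: failing to reject H0 although the mean really is true_mean (false negative)
type2 <- pnorm(cutoff, mean = true_mean, sd = se, lower.tail = TRUE)

type1
type2
```

Moving true_mean further from mu0, or increasing n, shrinks type2 while type1 stays at alpha.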

Happy Learning!!

January 12, 2016

LoadRunner script generation from SOAP Web Service Request


I have been learning LoadRunner basics for my work, and I found this site extremely handy and useful

Loadrunner XML Tools

Basically, this utility translates your SOAP request into a LoadRunner request. The result is a handy starting point to customize, parametrize and take further.

Happy Learning!!!

January 07, 2016

R and Hypothesis Tests

It took a couple of months to fully analyse and arrive at these hypothesis testing learnings
  • Formulating and identifying the null hypothesis and the alternative hypothesis
  • Choosing the test direction on the normal distribution (left-tailed, right-tailed or two-tailed)
  • Identifying the area under the curve (using pnorm in R)
  • Compute the t value or z value
  • Compute the p value
  • If p value < 0.05 then reject the null hypothesis
  • If p value > 0.05 then we fail to reject the null hypothesis (this is not the same as accepting it; or, to put it another way, if the p-value is high, the null will fly)
Finding P-Values: here we use the pnorm function, where z is the z-score of the sample mean.
Usage: P-value = pnorm(z, lower.tail = ).
  • Left-Tailed Tests: P-value = pnorm(z, lower.tail=TRUE)
  • Right-Tailed Tests: P-value = pnorm(z, lower.tail=FALSE)
  • Two-Tailed Tests: P-value = 2 * pnorm(abs(z), lower.tail=FALSE)
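
The three pnorm calls above can be wrapped in one small function. This is only a sketch: z_p_value and its tail argument are hypothetical names made up for illustration, and z = 3 is an arbitrary example value.

```r
# Hypothetical helper: p-value of a z statistic for each kind of alternative hypothesis
z_p_value <- function(z, tail = c("left", "right", "two")) {
  tail <- match.arg(tail)
  switch(tail,
    left  = pnorm(z, lower.tail = TRUE),            # Ha uses <
    right = pnorm(z, lower.tail = FALSE),           # Ha uses >
    two   = 2 * pnorm(abs(z), lower.tail = FALSE)   # Ha uses not-equal
  )
}

z <- 3                              # arbitrary example z-score
p_left  <- z_p_value(z, "left")
p_right <- z_p_value(z, "right")
p_two   <- z_p_value(z, "two")

c(left = p_left, right = p_right, two = p_two)
```

The left and right tail areas always add up to 1, and the two-tailed value is twice the smaller tail.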
The two problems below apply the above logic

Problem #1 - P Test Case
A rental car company claims the mean time to rent a car on their website is 60 seconds with a standard deviation of 30 seconds. A random sample of 36 customers attempted to rent a car on the website. The mean time to rent was 75 seconds. Is this enough evidence to contradict the company's claim? What is the p-value?

H0: mean time = 60 seconds (the company's claim holds)
Ha: mean time ≠ 60 seconds (the claim is contradicted)

Population Mean = 60
Population SD = 30
Sample Mean = 75
Sample Size = 36

Considering - Population Mean = 60, Population SD = 30

Standard Error of the sample mean = Population SD / sqrt(Sample Size)
Standard Error = 30 / sqrt(36)
Standard Error = 30 / 6 = 5

Z score = (Sample Mean - Population Mean) / Standard Error
Z score = (75 - 60) / 5 = 3

Two-tailed test, since the question asks whether the claim is contradicted in either direction (≠)
2*pnorm(75, mean=60, sd=5, lower.tail=FALSE)
p value = 0.002699796

Since the p value is less than 0.05, we reject the null hypothesis
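
The arithmetic for Problem #1 can be checked end to end in R. This sketch uses only the numbers given in the problem; the variable names are my own. It also shows that passing the raw sample mean with the standard error as sd gives the same p-value as passing the z-score to the standard normal.

```r
# Problem #1: values taken from the problem statement
pop_mean    <- 60
pop_sd      <- 30
sample_mean <- 75
n           <- 36

se <- pop_sd / sqrt(n)                  # standard error of the sample mean
z  <- (sample_mean - pop_mean) / se     # z-score of the observed sample mean

# Two-tailed p-value, computed two equivalent ways
p_from_z <- 2 * pnorm(abs(z), lower.tail = FALSE)
p_from_x <- 2 * pnorm(sample_mean, mean = pop_mean, sd = se, lower.tail = FALSE)

se
z
p_from_z
all.equal(p_from_z, p_from_x)
```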

Problem #2
An outbreak of Salmonella related illness was attributed to ice cream produced at a certain factory. Scientists measured the level of Salmonella in 9 randomly sampled batches of ice cream. The levels (in MPN/g) were: 0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418. Is there evidence that the mean level of Salmonella in the ice cream is greater than 0.3 MPN/g? What is the p-value?

H0: mean = 0.3 MPN/g
Ha: mean > 0.3 MPN/g

Option #1
Using R t-test
x = c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
t.test(x, alternative="greater", mu=0.3)

p-value = 0.02927. Since the p value is < 0.05, we can reject the null hypothesis

Option #2
Computing the t statistic by hand
9 random samples, degrees of freedom = 9 - 1 = 8

collectedsample = c(0.593,0.142,0.329,0.691,0.231,0.793,0.519,0.392,0.418)
populationmean = 0.3
samplemean = mean(collectedsample)            # 0.4564444
standarddeviation = sd(collectedsample)       # 0.2128439
n = length(collectedsample)                   # 9
sdx = standarddeviation/sqrt(n)               # standard error of the mean
tvalue = (samplemean - populationmean)/sdx
tvalue                                        # 2.205058

# right-tailed test: area to the right of the t value
pvalue = pt(tvalue, df = n - 1, lower.tail = FALSE)
pvalue                                        # 0.0292652

Since the sample size is small (< 30) and the population standard deviation is unknown, we use the t distribution (pt) rather than pnorm here
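
To see why the choice matters, the sketch below computes the right-tailed p-value for the Problem #2 data twice: once with the t distribution and once, incorrectly for this sample size, with the normal distribution. The variable names are my own; the data come from the problem statement.

```r
# Problem #2 data from the problem statement
x   <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3
n   <- length(x)

# t statistic built from the sample standard deviation
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

p_t <- pt(t_stat, df = n - 1, lower.tail = FALSE)   # appropriate: t distribution, 8 df
p_z <- pnorm(t_stat, lower.tail = FALSE)            # for comparison only: normal distribution

c(t = t_stat, p_t = p_t, p_z = p_z)
```

The normal-based value comes out smaller than the t-based one because the t distribution has heavier tails at 8 degrees of freedom, so using pnorm here would overstate the evidence against the null hypothesis.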

Happy Learning!!!

January 02, 2016

R + Stats

The following course material is very useful for combining R and stats, and it is great material for learning R. Captured below are notes from chapters 5, 6, 7 and 8

What is the central limit theorem?

The central limit theorem states that the sampling distribution of the mean of any independent, random variable will be normal or nearly normal if the sample size is large enough. In practice, some statisticians say that a sample size of 30 is large enough when the population distribution is roughly bell-shaped
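
A quick simulation makes the theorem concrete. This is a sketch under my own assumptions: the population is exponential with rate 1 (clearly skewed, with mean 1 and standard deviation 1), each sample has 30 observations, and the seed and replication count are arbitrary.

```r
set.seed(42)      # arbitrary seed so the run is repeatable
n    <- 30        # observations per sample
reps <- 10000     # number of samples drawn

# Mean of each sample drawn from a skewed (exponential) population
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)   # should land near the population mean, 1
sd(sample_means)     # should land near 1 / sqrt(n)
1 / sqrt(n)

# hist(sample_means) # uncomment to see the roughly bell-shaped histogram
```

Even though individual exponential draws are far from normal, the histogram of the sample means looks close to a bell curve centred on the population mean.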

Binomial Probability - Only two mutually exclusive outcomes, often referred to as success and failure. Each such trial is also called a Bernoulli trial
R commands - The dbinom and pbinom functions
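
As a small illustration of the two functions, consider 10 tosses of a fair coin (an example of my own, not from the course material). dbinom gives the probability of exactly k successes, and pbinom gives the cumulative probability of at most k successes.

```r
n_trials  <- 10     # number of Bernoulli trials (coin tosses)
p_success <- 0.5    # probability of success on each trial

p_exactly_3 <- dbinom(3, size = n_trials, prob = p_success)   # P(X = 3)
p_at_most_3 <- pbinom(3, size = n_trials, prob = p_success)   # P(X <= 3)

# pbinom is the running sum of dbinom over 0..k
p_sum <- sum(dbinom(0:3, size = n_trials, prob = p_success))

p_exactly_3
p_at_most_3
all.equal(p_at_most_3, p_sum)
```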

Discrete Probability Distributions

R command - pnorm
Command Syntax - pnorm(x, mean = , sd = , lower.tail= )

Two-Tailed Tests - Testing for the possibility of the relationship in both directions. With a significance level of .05, this means that .025 is in each tail of the distribution

One-Tailed Tests - A one-tailed test allots all of your alpha to testing the statistical significance in the one direction of interest. This means that .05 is in one tail of the distribution of your test statistic.
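
The way alpha is split shows up in the critical z values. The sketch below assumes alpha = 0.05 and uses qnorm, the inverse of pnorm, to find the cutoffs; the variable names are my own.

```r
alpha <- 0.05

# Two-tailed: alpha / 2 in each tail, so the cutoff sits further out
z_two_tailed <- qnorm(1 - alpha / 2)

# One-tailed: all of alpha in a single tail
z_one_tailed <- qnorm(1 - alpha)

z_two_tailed
z_one_tailed

# Check the tail areas beyond each cutoff
area_two <- pnorm(z_two_tailed, lower.tail = FALSE)   # alpha / 2
area_one <- pnorm(z_one_tailed, lower.tail = FALSE)   # alpha
```

A test statistic between the two cutoffs would be significant in a one-tailed test but not in a two-tailed one, which is why the direction has to be chosen before looking at the data.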

Alternative hypothesis has the > operator, right-tailed test
Right-Tailed Tests: P-value = pnorm(z, lower.tail=FALSE)

Alternative hypothesis has the < operator, left-tailed test
Left-Tailed Tests: P-value = pnorm(z, lower.tail=TRUE)

Alternative hypothesis has the ≠ operator, two-tailed (left and right) test
Two-Tailed Tests: P-value = 2 * pnorm(abs(z), lower.tail=FALSE)

pnorm(x, µ, σ), where
  • x is an observation from a normal distribution
  • µ is the mean
  • σ is the standard deviation
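Computing P value from t value
One-tailed (t in the direction of the alternative): pt(-abs(t-value), df=degrees of freedom)
Two-tailed: 2 * pt(-abs(t-value), df=degrees of freedom)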

Happy Learning!!!