Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): January 2016

January 30, 2016

NOSQL Basics

Getting ready for the second semester with a quick look at NOSQL basics, based on two short reads on NOSQL.

Difference between RDBMS & NOSQL

RDBMS - NOSQL

Scaleup - Scale out

Structured Data - Semi / Unstructured data

Atomic Transaction - Eventual Consistency

Data is structured differently on disk

Atomic Transactions vs Eventual Consistency

Atomic - ATM transactions (either all changes are made or none are made)

Eventual Consistency - There is no guarantee that all changes have been applied at a given point in time; they will be completed at some point (eventually)
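
The all-or-nothing behaviour of an atomic transaction can be sketched in a few lines of R. This is only an illustration: the function name atomic_transfer and the account names are hypothetical, and a real database enforces this through its own transaction mechanism rather than through application code like this.

```r
# Hypothetical all-or-nothing transfer between two accounts.
# Either both balances change, or neither does.
atomic_transfer <- function(balances, from, to, amount) {
  updated <- balances
  updated[from] <- updated[from] - amount
  updated[to]   <- updated[to] + amount
  if (updated[from] < 0) {
    return(balances)   # insufficient funds: discard every change
  }
  updated              # "commit": both changes are returned together
}

balances <- c(alice = 100, bob = 50)
ok     <- atomic_transfer(balances, "alice", "bob", 30)
failed <- atomic_transfer(balances, "alice", "bob", 500)
print(ok)       # both balances changed
print(failed)   # identical to the starting balances
```

The failed transfer never leaves the accounts in a half-updated state, which is the guarantee the ATM example relies on.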

Happy Learning!!!

January 18, 2016

Type I and Type II Error

Type I error, also known as a false positive: the error of rejecting the null hypothesis even though it is true
Type II error, also known as a "false negative": the error of not rejecting a null hypothesis when the alternative hypothesis is the true state of nature

I liked the comment below from Khan Academy:
The easiest way to think about Type 1 and Type 2 errors is in relation to medical tests. A type 1 error is where the person doesn't have the disease, but the test says they do (false positive). A type 2 error is where the person has the disease but the test doesn't pick it up (false negative).
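
Both error rates can be estimated with a small simulation in R. The sketch below is an assumption-laden illustration, not part of the Khan Academy comment: the sample size of 20, the true mean of 0.5 for the alternative, the number of simulations, and the seed are all arbitrary choices.

```r
set.seed(42)      # arbitrary seed so the run is repeatable
alpha  <- 0.05
n_sims <- 2000
n      <- 20

# H0 is true (true mean is 0): every rejection is a Type I error.
p_null <- replicate(n_sims, t.test(rnorm(n, mean = 0), mu = 0)$p.value)
type1_rate <- mean(p_null < alpha)

# H0 is false (true mean is 0.5): every non-rejection is a Type II error.
p_alt <- replicate(n_sims, t.test(rnorm(n, mean = 0.5), mu = 0)$p.value)
type2_rate <- mean(p_alt >= alpha)

type1_rate   # should land close to alpha
type2_rate   # depends on the sample size and the size of the true effect
```

The Type I rate is controlled by the chosen alpha, while the Type II rate shrinks as the sample size or the true effect grows.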

Happy Learning!!

January 12, 2016

Loadrunner script generation from SOAP Web Service Request

I have been learning LoadRunner basics for my work. I found this site extremely handy and useful:

Loadrunner XML Tools

Basically, this utility translates your SOAP request into a LoadRunner request. The generated script is a handy starting point to customize, parametrize and take further.

Happy Learning!!!

January 07, 2016

R and Hypothesis Tests

It took a couple of months to fully analyse hypothesis testing and arrive at these learnings:

Formulate and identify the null hypothesis and the alternative hypothesis
Decide the type of test on the normal distribution (left-tailed, right-tailed, or two-tailed)
Identify the area under the curve (using pnorm in the R language)
Compute the t value or z value
Compute the p value
If p value < 0.05, reject the null hypothesis
If p value > 0.05, do not reject the null hypothesis (we fail to reject the null hypothesis; or, to put it another way, if the p-value is high, the null will fly)

Finding P-Values: here we use the pnorm function.
Usage: P-value = pnorm(z, lower.tail = ), where z is the z-score of the sample mean x̄.

Left-Tailed Tests: P-value = pnorm(z, lower.tail=TRUE)
Right-Tailed Tests: P-value = pnorm(z, lower.tail=FALSE)
Two-Tailed Tests: P-value = 2 * pnorm(abs(z), lower.tail=FALSE)
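
The three cases can be wrapped in one small R helper. This is a minimal sketch: the function name p_value_z and its tail argument are hypothetical names introduced here, and 1.96 is just a sample z-score.

```r
# Hypothetical helper: p-value for a z-score, chosen by the tail of the test.
p_value_z <- function(z, tail = c("left", "right", "two")) {
  tail <- match.arg(tail)
  switch(tail,
    left  = pnorm(z, lower.tail = TRUE),
    right = pnorm(z, lower.tail = FALSE),
    two   = 2 * pnorm(abs(z), lower.tail = FALSE)
  )
}

p_left  <- p_value_z(-1.96, "left")
p_right <- p_value_z(1.96, "right")
p_two   <- p_value_z(1.96, "two")

p_left    # about 0.025
p_right   # same as p_left, by symmetry of the normal curve
p_two     # about 0.05
p_two < 0.05   # decision rule: TRUE means reject the null hypothesis
```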

Applying the above logic to the two problems below:

Problem #1 - Z Test Case
A rental car company claims the mean time to rent a car on their website is 60 seconds with a standard deviation of 30 seconds. A random sample of 36 customers attempted to rent a car on the website. The mean time to rent was 75 seconds. Is this enough evidence to contradict the company's claim? What is the p-value?

H0: mean time = 60 seconds (no change from the company's claim)
Ha: mean time ≠ 60 seconds (the claim is contradicted in either direction)

Population Mean = 60
Population SD = 30
Sample Mean = 75
Sample Size = 36

Considering - Population Mean = 60, Population SD = 30

Standard Error = Population SD / sqrt(Sample Size)
Standard Error = 30 / sqrt(36)
Standard Error = 30 / 6 = 5

Z score = (Sample Mean - Population Mean) / Standard Error
Z score = (75 - 60) / 5 = 3

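Two-tailed test, since the alternative hypothesis uses the ≠ (<>) symbol: the question asks whether the claim of 60 seconds is contradicted in either direction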
2*pnorm(75, mean=60, sd=5, lower.tail=FALSE)
p value = 0.002699796

Since p value is less than 0.05, you reject the null hypothesis
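
The whole of Problem #1 can be reproduced in R as a check. The variable names below are my own; the sketch computes the two-tailed p-value both from the z-score and from the raw sample mean, which should agree.

```r
population_mean <- 60
population_sd   <- 30
sample_mean     <- 75
n               <- 36

standard_error <- population_sd / sqrt(n)                 # 30 / 6 = 5
z <- (sample_mean - population_mean) / standard_error     # (75 - 60) / 5 = 3

# Two-tailed p-value from the z-score
p_from_z <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Same p-value from the raw sample mean, letting pnorm standardize
p_from_raw <- 2 * pnorm(sample_mean, mean = population_mean,
                        sd = standard_error, lower.tail = FALSE)

p_from_z
p_from_raw
p_from_z < 0.05   # TRUE: reject the null hypothesis
```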

Problem #2
An outbreak of Salmonella related illness was attributed to ice cream produced at a certain factory. Scientists measured the level of Salmonella in 9 randomly sampled batches of ice cream. The levels (in MPN/g) were: 0.593 0.142 0.329 0.691 0.231 0.793 0.519 0.392 0.418. Is there evidence that the mean level of Salmonella in the ice cream is greater than 0.3 MPN/g? What is the p-value?

H0 = mean is 0.3 MPN
Ha = mean is > 0.3 MPN

Option #1
Using R t-test
x = c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
t.test(x, alternative="greater", mu=0.3)

p-value = 0.02927; the p value is < 0.05, so we can reject the null hypothesis

Option #2
populationmean = 0.3
samplemean = 0.4564444
standarddeviation = 0.2128439
9 random samples, degree of freedom = 8

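collectedsample = c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
n = length(collectedsample)                  # 9 samples
samplemean = mean(collectedsample)           # 0.4564444
standarddeviation = sd(collectedsample)      # 0.2128439
populationmean = 0.3
sdx = standarddeviation / sqrt(n)            # standard error of the mean (sd / 3)
t = (samplemean - populationmean) / sdx
t                                            # t value is 2.205058
df = n - 1                                   # 8 degrees of freedom

# Right-tailed test: area above t under the t distribution
pvalue = pt(t, df = df, lower.tail = FALSE)
pvalue                                       # 0.0292652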

Since the sample size is < 30 and the population standard deviation is unknown, we use the t distribution (pt) rather than the pnorm function here
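
As a sanity check, the two options can be compared directly in R. This sketch uses my own variable names and reads the statistic and p.value fields of the object returned by t.test.

```r
x   <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3

# Option #1: built-in one-sample, right-tailed t-test
result <- t.test(x, alternative = "greater", mu = mu0)

# Option #2: manual t statistic and right-tail area
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
p_manual <- pt(t_manual, df = length(x) - 1, lower.tail = FALSE)

unname(result$statistic)   # t value reported by t.test
t_manual                   # manual t value
result$p.value             # p-value reported by t.test
p_manual                   # manual p-value
```

Both routes should print the same t value and p-value, since t.test performs the same calculation internally.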

Happy Learning!!!

January 02, 2016

R + Stats

The following course material is very useful for combining R and statistics. It is great material for learning R. Captured below are notes from chapters 5, 6, 7 and 8.

What is the central limit theorem?

The central limit theorem states that the sampling distribution of the mean of independent random variables will be normal or nearly normal, provided the sample size is large enough. In practice, some statisticians say that a sample size of 30 is large enough when the population distribution is roughly bell-shaped
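
A quick simulation in R shows the effect. The choices here are assumptions made for illustration: an exponential population (which is skewed, not bell-shaped), samples of size 30, 5000 repetitions, and an arbitrary seed.

```r
set.seed(1)       # arbitrary seed so the run is repeatable
n      <- 30      # size of each sample
n_sims <- 5000    # number of samples drawn

# Exponential(rate = 1) population: mean 1, standard deviation 1, right-skewed
sample_means <- replicate(n_sims, mean(rexp(n, rate = 1)))

mean(sample_means)   # close to the population mean, 1
sd(sample_means)     # close to the theoretical standard error below
1 / sqrt(n)          # population sd / sqrt(n)

# hist(sample_means, breaks = 40)   # uncomment to see the near-bell shape
```

Even though individual observations are skewed, the means of the samples cluster symmetrically around the population mean with a spread of roughly sd / sqrt(n).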

Binomial Probability - Each trial has only two mutually exclusive outcomes, often referred to as success and failure. Such a trial is also called a Bernoulli trial

R commands - The dbinom and pbinom functions
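
A short example of both functions, assuming 10 flips of a fair coin with heads counted as success (the scenario and variable names are mine):

```r
n <- 10    # number of trials
p <- 0.5   # probability of success on each trial

exactly_3   <- dbinom(3, size = n, prob = p)                       # P(X = 3)
at_most_3   <- pbinom(3, size = n, prob = p)                       # P(X <= 3)
more_than_3 <- pbinom(3, size = n, prob = p, lower.tail = FALSE)   # P(X > 3)

exactly_3     # 0.1171875
at_most_3     # 0.171875
more_than_3   # 0.828125
```

dbinom gives the probability of one exact count, while pbinom accumulates the probabilities up to (or, with lower.tail = FALSE, above) that count.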

Discrete Probability Distributions

R command - pnorm (cumulative probability for the normal distribution, which is a continuous distribution)

Command Syntax - pnorm(x, mean = , sd = , lower.tail= )

Two-Tailed Tests - A two-tailed test checks for the possibility of a relationship in both directions. With a significance level (alpha) of 0.05, this means that 0.025 is in each tail of the distribution.

One-Tailed Tests - A one-tailed test allots all of your alpha to testing statistical significance in the one direction of interest. With alpha = 0.05, this means that 0.05 is in one tail of the distribution of your test statistic.

Alternative hypothesis has the > operator, right-tailed test

Right-Tailed Tests: P-value = pnorm(z, lower.tail=FALSE)

Alternative hypothesis has the < operator, left-tailed test

Left-Tailed Tests: P-value = pnorm(z, lower.tail=TRUE)

Alternative hypothesis has the ≠ operator, two-tailed (left and right) test

Two-Tailed Tests: P-value = 2 * pnorm(abs(z), lower.tail=FALSE)

In each case z is the z-score of the sample mean x̄.

pnorm(x, µ, σ), where:

x is an observation from a normal distribution
µ is the mean
σ is the standard deviation
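
A small example of the call, with made-up values (mean 100, standard deviation 15, observation 130). It also shows that passing the mean and standard deviation to pnorm is equivalent to standardizing the observation first.

```r
mu    <- 100   # mean (assumed value for illustration)
sigma <- 15    # standard deviation (assumed value for illustration)
x     <- 130   # observation

below <- pnorm(x, mean = mu, sd = sigma)                       # P(X <= 130)
above <- pnorm(x, mean = mu, sd = sigma, lower.tail = FALSE)   # P(X > 130)

z <- (x - mu) / sigma   # z-score: (130 - 100) / 15 = 2
below_from_z <- pnorm(z)

below          # about 0.9772
above          # about 0.0228
below_from_z   # same as below
```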

Computing P value from t value

pt(-abs(t-value), df = degrees of freedom) gives the one-tailed p-value; multiply it by 2 for a two-tailed test

Happy Learning!!!
