Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): R

March 10, 2016

Regression Basics

This post covers the basics of regression and the steps involved. Linear regression models the relationship between an independent variable and a dependent variable with a straight line.

Steps Involved

Plot the data with the independent variable on the X axis and the dependent variable on the Y axis
Identify whether the relationship is positive or negative (it is positive when Y increases as X increases)
Fit the line that minimizes the errors between estimated and actual values

Y = B0 + B1X (B0, B1 Derived Mathematically)
where B0 is Y Intercept, B1 is Slope
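
The steps above can be sketched in R with the built-in lm function. The data values are made up for illustration, and the variable names (fit, b0, b1 and so on) are my own. The closed-form formulas are included only to show one way B0 and B1 can be derived mathematically.

```r
# Made-up data: x is the independent variable, y the dependent variable.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Fit Y = B0 + B1 * X by least squares.
fit <- lm(y ~ x)
b0 <- unname(coef(fit)[1])  # Y intercept
b1 <- unname(coef(fit)[2])  # slope

# A positive slope means a positive relationship.
relationship <- if (b1 > 0) "positive" else "negative"

# The same coefficients from the closed-form least squares formulas.
b1_manual <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_manual <- mean(y) - b1_manual * mean(x)
```

Calling plot(x, y) followed by abline(fit) draws the scatter plot with the fitted line on top.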

R Squared

R Squared Verification

Measures how well the regression line predicts the actual values
Take the actual values and compute their mean. The distances of the actual values from the mean sum to zero, so squared distances are used instead
For a perfect fit, R squared equals 1
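
A small sketch of these points, assuming the usual definition R squared = 1 - (residual sum of squares / total sum of squares). The data is made up and the variable names are my own.

```r
# Made-up data for illustration.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)

# Deviations of the actual values from their mean sum to zero.
deviations <- y - mean(y)
sum_of_deviations <- sum(deviations)

# R squared = 1 - (unexplained variation / total variation).
ss_total <- sum(deviations^2)
ss_resid <- sum(residuals(fit)^2)
r_squared <- 1 - ss_resid / ss_total

# A line that passes through every point gives R squared equal to 1.
y_perfect <- 3 + 2 * x
fit_perfect <- lm(y_perfect ~ x)
r_squared_perfect <- 1 - sum(residuals(fit_perfect)^2) /
  sum((y_perfect - mean(y_perfect))^2)
```

The hand-computed r_squared should match summary(fit)$r.squared.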

Standard Error of Estimates

Compares the estimated values with the actual values
Based on the distances between estimated and actual values (the residuals)
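
A minimal sketch, assuming the common definition for a one-variable regression: the square root of the sum of squared residuals divided by n - 2. The data is made up.

```r
# Made-up data for illustration.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)

# Distances between actual and estimated values.
errors <- y - fitted(fit)

# Standard error of the estimate; n - 2 because two coefficients
# (B0 and B1) were estimated from the data.
n <- length(y)
se_estimate <- sqrt(sum(errors^2) / (n - 2))
```

R reports the same quantity as summary(fit)$sigma (labelled "Residual standard error" in the printed summary).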

Correlation Coefficient

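Fit the line
Note whether the slope is positive or negative; the correlation coefficient has the same sign
Look at the scatter of the points along the X and Y axes
A high absolute correlation means the line fits the data well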
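
A short sketch with made-up data. For a regression with one independent variable, the square of the correlation coefficient equals R squared, which is one way to see why a high correlation goes with a good fit.

```r
# Made-up data for illustration.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Pearson correlation coefficient, between -1 and 1.
r <- cor(x, y)

# The sign of r matches the sign of the fitted slope.
fit <- lm(y ~ x)
slope <- unname(coef(fit)[2])
same_sign <- sign(r) == sign(slope)

# With a single independent variable, r squared equals R squared.
r_squared <- r^2
```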

In the next post we will look at R examples.

Happy Learning!!!

February 29, 2016

Naive Bayes Classifier

Naive Bayes Classifier Notes and Examples

Works on the assumption that the occurrence of word i does not depend on the occurrence of word i+1
In reality a sentence carries meaning only when words occur with the right neighbouring terms and in the right positions, so the assumption is a simplification
As an example, two classes and a test document to classify are listed below
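
Here is a minimal base R sketch of the idea. The two classes, the training sentences and the test document are all made up for illustration and are not taken from the reference. Add-one (Laplace) smoothing is my own assumption so that unseen words do not give zero probability, and the helper names tokenize and score_class are hypothetical.

```r
# Made-up training documents for two classes.
train <- data.frame(
  text = c("team wins match", "great match today",
           "new phone release", "phone software update"),
  class = c("sports", "sports", "tech", "tech"),
  stringsAsFactors = FALSE
)
test_doc <- "great team match"

# Split a document into lower-case words.
tokenize <- function(text) strsplit(tolower(text), "\\s+")[[1]]

vocab <- unique(unlist(lapply(train$text, tokenize)))
classes <- unique(train$class)

# Log score of a class for a document. Words are treated as independent,
# so the per-word log probabilities are simply added up.
score_class <- function(cls, doc) {
  words <- unlist(lapply(train$text[train$class == cls], tokenize))
  log_prior <- log(mean(train$class == cls))
  log_lik <- sapply(tokenize(doc), function(w) {
    # Add-one smoothing (an assumption of this sketch).
    log((sum(words == w) + 1) / (length(words) + length(vocab)))
  })
  log_prior + sum(log_lik)
}

scores <- sapply(classes, score_class, doc = test_doc)
predicted <- names(which.max(scores))
```

The class with the highest score is the prediction; for this toy data the test document lands in the sports class.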

Ref - Link

Bayesian data analysis is a fundamental concept in data science. But it took me 2 years to understand its importance. In 2 minutes, I'll share my best findings over the last 2 years exploring Bayesian Modeling. Let's go.

1. Why Bayesian Data Analysis? Bayesian modeling is a… pic.twitter.com/nowKIL4AB4
— 🔥 Matt Dancho (Business Science) 🔥 (@mdancho84) February 6, 2024

Happy Learning!!!

February 23, 2016

Hierarchical Clustering

Start with every data point as its own cluster
Compute the distance between every pair of clusters
Merge the nearest pair, and repeat until the number of clusters equals the number of clusters needed
The entire process can be represented as a dendrogram
At the end of the algorithm the dendrogram is plotted

Measuring Distance between clusters

Single (minimum distance between two points, one from each cluster)
Complete (maximum distance between two points, one from each cluster)
Average (average distance over all possible pairs, one point from each cluster)
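
The algorithm and the three distance measures can be tried with the base R functions dist, hclust and cutree. This is a sketch on made-up points; the variable names are my own.

```r
# Made-up 2-D points forming two loose groups.
pts <- matrix(c(1.0, 1.0,
                1.5, 1.2,
                1.2, 0.8,
                8.0, 8.0,
                8.5, 8.3,
                7.8, 8.1),
              ncol = 2, byrow = TRUE)

# Euclidean distance between every pair of points.
d <- dist(pts)

# Same merging procedure, three ways of measuring cluster distance.
hc_single   <- hclust(d, method = "single")
hc_complete <- hclust(d, method = "complete")
hc_average  <- hclust(d, method = "average")

# Stop merging when the needed number of clusters (here 2) is reached.
labels_single   <- cutree(hc_single, k = 2)
labels_complete <- cutree(hc_complete, k = 2)
labels_average  <- cutree(hc_average, k = 2)
```

plot(hc_average) draws the dendrogram. The height of the final merge is the smallest between-group distance for single linkage and the largest for complete linkage.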

Happy Learning!!!

K-medoids, K-means

A great learning exercise. A lot of revision is needed to dig deep and really understand the fundamentals.

K-means

Sensitive to outliers (squared Euclidean distance gives greater weight to more distant points)
Can't handle categorical data
Works with Euclidean distance only

K-Medoids

Restricts each centre to be one of the data points
The cost function keeps the same sum-of-squares form, but the distance is not restricted to Euclidean distance
Use your own custom distance function when the data mixes numerical and categorical variables
Example: a variable with 25 languages needs 24 indicator columns, and M/F/N needs 2 columns. Compute your own custom distance function on these. Each variable needs one column fewer than its number of categories because the all-zero combination already represents one category
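
Two of these points in a small base R sketch with made-up values. The first part contrasts a mean centre with a medoid centre for one cluster that contains an outlier. The second part shows the "one column fewer" rule using model.matrix, with a 3-level factor standing in for the 25 languages.

```r
# Made-up 1-D cluster with one outlier.
values <- c(1, 2, 3, 4, 100)

# K-means style centre: the mean, pulled towards the outlier
# and not necessarily one of the data points.
centre_mean <- mean(values)

# K-medoids style centre: the data point with the smallest
# total distance to all the other points.
d <- as.matrix(dist(values))
centre_medoid <- values[which.min(rowSums(d))]

# Indicator (dummy) columns for a categorical variable.
lang <- factor(c("en", "fr", "de", "en"))
dummies <- model.matrix(~ lang)[, -1, drop = FALSE]  # drop the intercept column
# 3 categories give 2 columns; the all-zero row is the
# remaining category ("de", the first factor level).
```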

Distance measures for numerical variables

Euclidean based distance
Correlation based distance
Mahalanobis distance

Distance measures for categorical variables

Matching coefficient and Jaccard's coefficient
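
A sketch of these measures in base R on made-up data. Two details worth knowing: mahalanobis returns the squared distance, and dist with method = "binary" returns 1 minus Jaccard's coefficient. Using 1 minus the correlation between rows as the correlation based distance is one common choice and an assumption here.

```r
# Made-up numeric data: 5 observations (rows), 3 variables (columns).
num <- matrix(c(1, 2, 3,
                2, 4, 6.5,
                3, 1, 2,
                5, 3, 1,
                4, 6, 2),
              ncol = 3, byrow = TRUE)

# Euclidean distance between every pair of observations.
d_euclid <- dist(num, method = "euclidean")

# Correlation based distance between observations.
d_corr <- as.dist(1 - cor(t(num)))

# Squared Mahalanobis distance of each observation from the column means.
d_mahal_sq <- mahalanobis(num, center = colMeans(num), cov = cov(num))

# Made-up binary attributes for two items.
a <- c(1, 1, 0, 0, 1)
b <- c(1, 0, 0, 1, 1)

# Matching coefficient: share of attributes on which the items agree.
matching <- mean(a == b)

# Jaccard's coefficient: shared 1s over attributes where either is 1.
jaccard <- sum(a == 1 & b == 1) / sum(a == 1 | b == 1)
```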

Happy Learning!!!

February 22, 2016

R and SQL Server

This post is an example of querying SQL Server from R and visualizing the data using twitter. The package used is RODBC. A sample walk-through code snippet is provided.
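
A minimal sketch of what such a query can look like with RODBC. The helper name query_sql_server, the connection string and the server, database and table names in the usage line are all hypothetical placeholders, and the sketch assumes Windows authentication and an installed SQL Server ODBC driver.

```r
# Hypothetical helper: run one query against SQL Server and
# return the result as a data frame.
query_sql_server <- function(server, database, query) {
  conn_string <- paste0("driver={SQL Server};server=", server,
                        ";database=", database,
                        ";trusted_connection=true")
  channel <- RODBC::odbcDriverConnect(connection = conn_string)
  # odbcDriverConnect returns -1 instead of a channel on failure.
  if (!inherits(channel, "RODBC")) stop("Could not connect to SQL Server")
  on.exit(RODBC::odbcClose(channel))
  RODBC::sqlQuery(channel, query)
}
```

Usage would look like query_sql_server("MYSERVER", "MyDatabase", "SELECT TOP 10 * FROM MyTable"), after which the returned data frame can be passed to plot or barplot.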

Happy Learning!!!

February 19, 2016

R Kaggle Exercise - Baby Names

Happy Learning!!!

R plot examples, matrix, aggregates, conditions examples

Happy Learning!!!

January 02, 2016

The following course material is very useful for combining R and statistics, and it is great material for learning R. Captured below are notes from chapters 5, 6, 7 and 8.

What is a central limit theorem?

The central limit theorem states that the sampling distribution of the mean of any independent random variable will be normal or nearly normal, provided the sample size is large enough. In practice, some statisticians say that a sample size of 30 is large enough when the population distribution is roughly bell-shaped.
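
A quick simulation sketch of the theorem. The exponential population, the 5000 repetitions and the seed are my own choices for illustration. The population is skewed, yet the means of samples of size 30 pile up around the population mean with a spread close to sd / sqrt(n).

```r
set.seed(42)  # arbitrary seed so the run is repeatable

# Population: exponential with rate 1 (mean 1, standard deviation 1).
n <- 30
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))

# Centre and spread of the sampling distribution of the mean.
mean_of_means <- mean(sample_means)  # expected to be close to 1
sd_of_means <- sd(sample_means)      # expected to be close to 1 / sqrt(30)
```

hist(sample_means) shows the roughly bell-shaped result.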

Binomial Probability - Each trial has only two mutually exclusive outcomes, often referred to as success and failure. A single such trial is also called a Bernoulli trial (Link)

R commands - The dbinom and pbinom functions
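
A short sketch of the two functions, using a made-up example of 10 coin flips with success probability 0.5. dbinom gives the probability of an exact number of successes and pbinom gives the cumulative probability.

```r
# Probability of exactly 3 successes in 10 trials.
p_exactly_3 <- dbinom(3, size = 10, prob = 0.5)   # 120 / 1024

# Probability of at most 3 successes (0, 1, 2 or 3).
p_at_most_3 <- pbinom(3, size = 10, prob = 0.5)   # 176 / 1024

# Probability of more than 3 successes, using the upper tail.
p_more_than_3 <- pbinom(3, size = 10, prob = 0.5, lower.tail = FALSE)
```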

Discrete Probability Distributions

R command - pnorm

Command Syntax - pnorm(x, mean = , sd = , lower.tail= )

Two-Tailed Tests - A two-tailed test checks for the possibility of the relationship in both directions. With a significance level of .05, this means that .025 is in each tail of the distribution of your test statistic.

One-Tailed Tests - A one-tailed test allots all of your alpha to testing the statistical significance in the one direction of interest. With a significance level of .05, this means that .05 is in one tail of the distribution of your test statistic.

Alternative hypothesis has the > operator, right-tailed test

Right-Tailed Tests: P-value = pnorm(z, lower.tail=FALSE)

Alternative hypothesis has the < operator, left-tailed test

Left-Tailed Tests: P-value = pnorm(z, lower.tail=TRUE)

Alternative hypothesis has the ≠ operator, two-tailed (left and right) test

Two-Tailed Tests: P-value = 2 * pnorm(abs(z), lower.tail=FALSE)

Here z is the z-score of the sample mean x̄.

pnorm(x, µ, σ), where

x is an observation from a normal distribution
µ is the mean
σ is the standard deviation
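
A sketch of the three P-value calculations and of pnorm with an explicit mean and standard deviation. The z-score of 1.96 and the N(100, 15) example are made-up values.

```r
# Made-up z-score of a sample mean.
z <- 1.96

p_right <- pnorm(z, lower.tail = FALSE)           # right-tailed test
p_left <- pnorm(z, lower.tail = TRUE)             # left-tailed test
p_two <- 2 * pnorm(abs(z), lower.tail = FALSE)    # two-tailed test

# Probability that an observation from N(mean = 100, sd = 15) is at most 110.
p_obs <- pnorm(110, mean = 100, sd = 15)

# The same probability after converting the observation to a z-score.
p_obs_z <- pnorm((110 - 100) / 15)
```

With z = 1.96 the two-tailed P-value comes out very close to .05, which is why 1.96 is the familiar cut-off for a two-tailed test at that level.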

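Computing P value from t value

Two-tailed P value: 2 * pt(abs(t_value), df = degrees_of_freedom, lower.tail = FALSE)
On its own, pt(abs(t_value), df = degrees_of_freedom) gives the cumulative probability below the t value, not the P value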
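
A small sketch with a made-up t value and degrees of freedom. By default pt returns the lower-tail cumulative probability, so the upper tail (lower.tail = FALSE) is what gives a P value for a positive t value.

```r
# Made-up test statistic and degrees of freedom.
t_value <- 2.5
dof <- 20

# Cumulative probability below the t value (this is not the P value).
cum_prob <- pt(abs(t_value), df = dof)

# Right-tailed P value.
p_right <- pt(t_value, df = dof, lower.tail = FALSE)

# Two-tailed P value.
p_two <- 2 * pt(abs(t_value), df = dof, lower.tail = FALSE)
```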

Reference

Link1

Link2

Happy Learning!!!
