"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

September 07, 2015

Working with R - InterQuartile Range

Concept - IQR - InterQuartile Range

IQR = Q3 - Q1 = 3rd Quartile - 1st Quartile
  • Median - Arrange the data from lowest to highest and take the middle value
  • Even-sized dataset - Average of the two middle numbers
  • Odd-sized dataset - The single number that is halfway into the set
Dataset - 5,6,12,13,15,18,22,50

Q2 = (13+15)/2 = 14 - Median of the whole dataset

Q1 = (6+12)/2 = 9 - Median of the lower half (5, 6, 12, 13)

Q3 = (18+22)/2 = 20 - Median of the upper half (15, 18, 22, 50)

IQR = Q3-Q1 = 20-9 = 11
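
The hand calculation above can be sketched in R as follows. This is only a minimal illustration of the "median of each half" method, not a general-purpose routine; the variable names (half, q1, q2, q3, iqr_by_hand) are my own, and for an odd-sized dataset this sketch leaves the middle value out of both halves, which is one of several conventions.

```r
# Dataset from the worked example
dataset <- c(5, 6, 12, 13, 15, 18, 22, 50)

x <- sort(dataset)       # arrange from lowest to highest
half <- length(x) %/% 2  # size of the lower and upper halves

q2 <- median(x)              # median of the whole dataset
q1 <- median(head(x, half))  # median of the lower half
q3 <- median(tail(x, half))  # median of the upper half

iqr_by_hand <- q3 - q1

c(Q1 = q1, Q2 = q2, Q3 = q3, IQR = iqr_by_hand)
```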

A boxplot is used to identify outliers. It is drawn from five values.

For the above dataset
  • Minimum Value - 5
  • Q1 - 9
  • Q2 - 14
  • Q3 - 20
  • Maximum Value - 50
These five values are the mathematical concept behind the boxplot, and Q1 and Q3 are what is used for finding outliers.
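
As a quick check, base R's fivenum() returns Tukey's five-number summary (minimum, lower hinge, median, upper hinge, maximum). For this particular dataset the hinges coincide with the Q1 and Q3 worked out by hand; that will not hold for every dataset. The name five is just an illustrative variable.

```r
# Dataset from the worked example
dataset <- c(5, 6, 12, 13, 15, 18, 22, 50)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(dataset)
five
```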

Outlier - A value much larger or smaller than the other values in the dataset. The IQR is obtained by subtracting the first quartile from the third quartile.

Finding Outliers
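1. Any value < Q1 - 1.5(IQR) or > Q3 + 1.5(IQR) is an outlier
2. Lower limit: any value < 9 - 1.5(11) = 9 - 16.5 = -7.5
    Upper limit: any value > 20 + 1.5(11) = 20 + 16.5 = 36.5
3. In this dataset only 50 is above 36.5 and nothing is below -7.5, so 50 is the only outlier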
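
The 1.5 x IQR rule above can be tried out directly. In this sketch Q1 = 9 and Q3 = 20 are typed in from the hand calculation rather than computed by R, and lower_fence, upper_fence and outliers are illustrative names of my own.

```r
# Dataset and quartiles from the worked example
dataset <- c(5, 6, 12, 13, 15, 18, 22, 50)
q1 <- 9
q3 <- 20
iqr <- q3 - q1

# Limits beyond which a value counts as an outlier
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Keep only the values outside the limits
outliers <- dataset[dataset < lower_fence | dataset > upper_fence]
outliers
```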

This video was useful for understanding the concept before trying it out in R


Computing using R

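The IQR is the range between the 25th percentile and the 75th. Note that R's default quantile method (type 7) interpolates between data points, so for this dataset quantile() returns 10.5 and 19 and IQR() returns 8.5, not the 9, 20 and 11 worked out by hand.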

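# Same dataset as in the worked example
dataset <- c(5, 6, 12, 13, 15, 18, 22, 50)
# 25th and 75th percentiles (Q1 and Q3)
quantile(x = dataset, probs = c(.25, .75))
# Interquartile range, Q3 - Q1
IQR(x = dataset)
# Boxplot; points beyond the whiskers are drawn as circles
boxplot(dataset)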
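
R's quantile() and IQR() support several quantile definitions through the type argument, and the default (type 7) does not use the "median of each half" method. The sketch below compares the default with type 2, which averages at the jump points and happens to reproduce the hand-calculated values for this dataset. It also uses boxplot.stats() to list the numbers behind the plot without drawing it. The variable names are my own.

```r
dataset <- c(5, 6, 12, 13, 15, 18, 22, 50)

# Default definition (type 7): interpolates between data points
default_q <- quantile(dataset, probs = c(.25, .75))
default_iqr <- IQR(dataset)

# Type 2: averages the two neighbouring values at a jump
averaged_q <- quantile(dataset, probs = c(.25, .75), type = 2)
averaged_iqr <- IQR(dataset, type = 2)

# Numbers that boxplot() draws: whisker ends, hinges, median, and outliers
stats <- boxplot.stats(dataset)
stats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out    # points plotted individually as outliers
```

Since boxplot() works from the hinges (9 and 20 here), it flags 50 as an outlier in the same way as the hand calculation, even though IQR() with its default type reports 8.5.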

Sample Output


The outlier (50) is drawn as a circle above the upper whisker

Happy Learning!!!
