"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

August 25, 2021

Anomaly Reads

Ref - Link

Summary
  • Point Anomalies - A single value that lies far outside the rest of the data set
  • Conditional Outliers - Anomalous only with respect to context; the same value may not be an anomaly at another time
  • Collective Outliers - A set of points that, taken together, deviate from the rest of the dataset
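
To make the role of context concrete, here is a minimal sketch with made-up temperature readings. The data, the `flag` helper and the cut-off of 2 standard deviations are all assumptions for illustration, not taken from the referenced article.

```python
import numpy as np

# Hypothetical daily temperatures in degrees Celsius, grouped by context (season).
summer = np.array([29.0, 31.0, 30.0, 32.0, 28.0, 30.0])
winter = np.array([2.0, 4.0, 3.0, 5.0, 1.0, 30.0])  # a 30 degree day in winter

def flag(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the (assumed) threshold."""
    z = (values - values.mean()) / values.std()
    return values[np.abs(z) > threshold]

# Ignoring context: 30 sits inside the overall range, so nothing stands out.
global_flags = flag(np.concatenate([summer, winter]))
# Within context: the same value is ordinary in summer and anomalous in winter.
summer_flags = flag(summer)
winter_flags = flag(winter)

print("global:", global_flags)
print("summer:", summer_flags)
print("winter:", winter_flags)
```

Only the winter group flags the value 30, which is the sense in which a conditional outlier depends on its context.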



Key Notes
  • Clustering methods do not require labeled data, which makes them a good fit for this unsupervised task. They are, however, very sensitive to outlier data points
Two-Step Process
  • The number of clusters can be set to 2 (one anomalous and one normal)
  • Data is summarized by averaging over one-hour intervals
  • Rolling Window Sequences
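
The notes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the referenced article's implementation: the sensor data is synthetic, the window length of 3 hours is arbitrary, and `two_means` is a small hand-written NumPy version of k-means with k=2 (in practice `sklearn.cluster.KMeans` with `n_clusters=2` would be the usual choice).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensor: one reading per minute for 48 hours, baseline near 10.
readings = 10 + rng.normal(0, 1, size=48 * 60)
# Assumed incident: readings jump to about 50 during hours 30..37.
readings[30 * 60:38 * 60] += 40

# Summarize by averaging over one-hour intervals (60 readings each).
hourly = readings.reshape(48, 60).mean(axis=1)

# Rolling window sequences of 3 consecutive hourly averages.
window = 3
sequences = np.lib.stride_tricks.sliding_window_view(hourly, window)

def two_means(points, iterations=20):
    """Plain Lloyd's algorithm with k=2; returns a 0/1 label per point."""
    means = points.mean(axis=1)
    # Deterministic start: the windows with the lowest and highest mean.
    centroids = points[[means.argmin(), means.argmax()]].astype(float)
    for _ in range(iterations):
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        new_centroids = np.array([points[labels == k].mean(axis=0) for k in (0, 1)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels

labels = two_means(sequences)
# Treat the smaller cluster as the anomalous one. This is an assumption
# that only makes sense when anomalies are rare.
anomalous_label = np.bincount(labels, minlength=2).argmin()
anomalous_starts = np.flatnonzero(labels == anomalous_label)
print("windows flagged as anomalous start at hours:", anomalous_starts.tolist())
```

Windows that are mostly made up of elevated hours land in the smaller cluster, while windows that only touch the incident at one end stay with the normal cluster.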







Key Notes
  • Calculate the autocorrelation of the time series values
  • Identify local maxima in the autocorrelation
  • The seasonal trend identification module
  • Data store for normal data and anomaly data
  • Scoring module
  • Human-in-the-loop feedback system
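
The first two notes (autocorrelation, then local maxima) can be sketched as below. This is one plausible reading of how a seasonal period might be identified, on a synthetic series with an assumed 24-hour cycle; the helper names `autocorrelation` and `local_maxima` are made up for this example.

```python
import numpy as np

def autocorrelation(values):
    """Normalized autocorrelation for lags 0..n-1 (lag 0 equals 1.0)."""
    centered = np.asarray(values, dtype=float) - np.mean(values)
    full = np.correlate(centered, centered, mode="full")
    acf = full[len(centered) - 1:]  # keep the non-negative lags only
    return acf / acf[0]

def local_maxima(acf):
    """Lags whose autocorrelation is higher than at both neighbouring lags."""
    return [k for k in range(1, len(acf) - 1)
            if acf[k] > acf[k - 1] and acf[k] > acf[k + 1]]

# Synthetic hourly series: 10 days of a daily cycle plus a little noise.
rng = np.random.default_rng(0)
hours = np.arange(240)
series = np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.05, size=hours.size)

acf = autocorrelation(series)
peaks = local_maxima(acf)
# Take the strongest local maximum as the estimated season length.
season_length = max(peaks, key=lambda k: acf[k])
print("estimated season length:", season_length)
```

Picking the strongest peak is a simplification; a real series with trend or multiple seasonalities would need detrending and a closer look at several peaks.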
Sklearn Models for Supervised Anomaly Detection
Some popular scikit-learn classifiers for this include:
  • KNeighborsClassifier
  • SVC (SVM classifier)
  • DecisionTreeClassifier
  • RandomForestClassifier
Statistical and unsupervised techniques (no labels needed):
  • Interquartile Range
  • Isolation Forest
  • Median Absolute Deviation
  • K-Nearest Neighbours
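
Two of the entries above, Interquartile Range and Median Absolute Deviation, are simple enough to sketch with NumPy alone. The sample values are the ones used in the Z-score script in this post. The 1.5 multiplier, the 0.6745 constant and the 3.5 cut-off are common conventions that I am assuming here; they are not from the original notes.

```python
import numpy as np

# Same sample values as the Z-score script in this post.
x = np.array([10, 12, 22, 45, 36, 14, 10, 125], dtype=float)

# Interquartile range: flag values more than 1.5 * IQR beyond the quartiles.
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = x[(x < lower) | (x > upper)]

# Median absolute deviation: a robust analogue of the z-score.
# 0.6745 rescales the MAD so it is comparable with a standard deviation
# for normally distributed data.
median = np.median(x)
mad = np.median(np.abs(x - median))
modified_z = 0.6745 * (x - median) / mad
mad_outliers = x[np.abs(modified_z) > 3.5]

print("IQR fences:", lower, upper, "outliers:", iqr_outliers)
print("MAD:", mad, "outliers:", mad_outliers)
```

Both checks rely on the median and quartiles rather than the mean, so a single extreme value such as 125 does not distort the limits it is being compared against.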
More Reads

# Z Score
# Mean, Variance, Standard Deviation
# Example - Data 2, 4, 6
# Sum = 12
# Average = 12/3 = 4
# Variance = average of the squared differences between each number and the mean
# 2: (2-4)^2 = (-2)^2 = 4
# 4: (4-4)^2 = 0^2 = 0
# 6: (6-4)^2 = 2^2 = 4
# Variance = (4+0+4)/3 = 8/3 = 2.67
# Standard deviation = sqrt(2.67) = 1.63
# Z score = (x - mean) / std
# Ref - https://www.geeksforgeeks.org/z-score-for-outlier-detection-python/
import numpy as np

x = [10, 12, 22, 45, 36, 14, 10, 125]
mean = np.mean(x)
std = np.std(x)  # population standard deviation (divides by n, as in the example above)
print('mean of the dataset is', mean)
print('std. deviation is', std)
# Ranges covered by 1, 2 and 3 standard deviations around the mean
print('1st SD', mean - 1 * std, mean + 1 * std)
print('2nd SD', mean - 2 * std, mean + 2 * std)
print('3rd SD', mean - 3 * std, mean + 3 * std)
# Z score of each value in x
for i in x:
    z = (i - mean) / std
    print(z)
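
The script above prints the z scores but does not flag anything. A minimal sketch of the flagging step follows, assuming a cut-off of 2 standard deviations (my choice, not part of the original script).

```python
import numpy as np

x = np.array([10, 12, 22, 45, 36, 14, 10, 125], dtype=float)
z = (x - x.mean()) / x.std()

threshold = 2.0  # assumed cut-off, not from the original script
outliers = x[np.abs(z) > threshold]
print("z scores:", np.round(z, 2))
print("outliers:", outliers)
```

A cut-off of 2 is used here because, with only 8 values and the population standard deviation, no z score can exceed sqrt(8 - 1), roughly 2.65, so the usual 3-sigma rule could never flag anything in this sample.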

Keep Reading!!!
