Ref - Link
Summary
- Point Anomalies - A single value lies far outside the rest of the data set
- Conditional Outliers - A value is anomalous only with respect to its context; the same value may not be an anomaly at another time
- Collective Outliers - A group of points that, taken together, deviates from the rest of the dataset
Key Notes
- Clustering methods do not require labeled data, which makes them a good fit for our unsupervised task. They are, however, very sensitive to outlier data points
Two-Step Process
- The number of clusters can be set to 2 (one anomalous and one normal)
- Summarized by taking averages across an interval of one hour
- Rolling Window Sequences
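The notes above (hourly averages, rolling window sequences, two clusters) can be sketched roughly as follows. This is a minimal illustration, not the original author's pipeline: the synthetic data, the window length of 3, the choice of mean and standard deviation as window features, and the rule that the smaller cluster is the anomalous one are all assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from sklearn.cluster import KMeans

# Hypothetical data: 48 hours of minute-level readings with one anomalous hour.
rng = np.random.default_rng(0)
minute_values = rng.normal(loc=10.0, scale=0.5, size=48 * 60)
minute_values[30 * 60:31 * 60] += 25.0  # inject a burst into hour 30

# Summarize by taking averages across an interval of one hour.
hourly = minute_values.reshape(48, 60).mean(axis=1)

# Build rolling window sequences of 3 consecutive hours (assumed window length).
windows = sliding_window_view(hourly, window_shape=3)

# Assumption: describe each window by its mean and standard deviation.
features = np.column_stack([windows.mean(axis=1), windows.std(axis=1)])

# Cluster the windows into 2 groups (one anomalous and one normal).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# Assumption: the smaller cluster is the anomalous one.
anomalous_label = np.bincount(labels).argmin()
anomalous_windows = np.flatnonzero(labels == anomalous_label).tolist()
print("Start hours of anomalous windows:", anomalous_windows)
```

Each flagged index is the starting hour of a 3-hour window, so a single bad hour shows up in every window that overlaps it.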
Key Notes
- Calculate the autocorrelation of the time series values
- Identify local maxima
- The seasonal trend identification module
- Data store for normal data and anomaly data
- Scoring module
- Human in loop feedback system
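The first two notes above (autocorrelation, then local maxima) are one common way to estimate a seasonal period. The sketch below is only an illustration under assumptions: the helper names `autocorrelation` and `local_maxima` are hypothetical, and the input is a synthetic, noise-free series with a 24-step cycle.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation of the series for lags 0..max_lag."""
    centered = np.asarray(series, dtype=float) - np.mean(series)
    denom = np.dot(centered, centered)
    return np.array([
        np.dot(centered[:len(centered) - lag], centered[lag:]) / denom
        for lag in range(max_lag + 1)
    ])

def local_maxima(values):
    """Indices whose value is strictly greater than both neighbors."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]

# Hypothetical hourly series: 14 days with a daily (24-step) cycle.
t = np.arange(24 * 14)
series = 10 + 3 * np.sin(2 * np.pi * t / 24)

acf = autocorrelation(series, max_lag=72)
peaks = local_maxima(acf)

# The first local maximum after lag 0 is taken as the season length.
season_length = peaks[0]
print("Autocorrelation peaks at lags:", peaks)
print("Estimated season length:", season_length)
```

On real, noisy data the autocorrelation usually needs smoothing or a minimum peak height before the first local maximum can be trusted.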
Sklearn Models for Supervised Anomaly Detection
Some popular scikit-learn models for supervised anomaly detection include:
- KNeighborsClassifier
- SVC (SVM classifier)
- DecisionTreeClassifier
- RandomForestClassifier
- Interquartile Range
- Isolation Forest
- Median Absolute Deviation
- K-Nearest Neighbours
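Two of the statistical methods listed above, Interquartile Range and Median Absolute Deviation, need nothing beyond NumPy. The sketch below is a minimal example under assumptions: the sample data is reused from the z-score snippet in this post, the 1.5 x IQR fences are the usual convention, and the 0.6745 scaling with a 3.5 cutoff is the commonly used modified z-score convention rather than anything stated in these notes.

```python
import numpy as np

data = np.array([10, 12, 22, 45, 36, 14, 10, 125], dtype=float)

# Interquartile Range: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = data[(data < lower) | (data > upper)]

# Median Absolute Deviation: a robust alternative to the standard deviation.
median = np.median(data)
mad = np.median(np.abs(data - median))
modified_z = 0.6745 * (data - median) / mad  # conventional scaling constant
mad_outliers = data[np.abs(modified_z) > 3.5]  # conventional cutoff

print("IQR fences:", lower, upper)
print("IQR outliers:", iqr_outliers)
print("MAD outliers:", mad_outliers)
```

Both rules rely on the median and quartiles, so a single extreme value shifts them far less than it shifts the mean and standard deviation.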
More Reads
# Z Score
# Mean, Variance, Standard Deviation
# Example - Data 2, 4, 6
# Sum = 12
# Average = 12/3 = 4
# Variance = average of the squared differences between each number and the mean
# 2: (2-4)^2 = 4
# 4: (4-4)^2 = 0
# 6: (6-4)^2 = 4
# Variance = 8/3 = 2.67
# Standard deviation = sqrt(2.67) = 1.63
# Z score = (x - mean) / std
# Ref - https://www.geeksforgeeks.org/z-score-for-outlier-detection-python/
import numpy as np

x = [10, 12, 22, 45, 36, 14, 10, 125]
mean = np.mean(x)
std = np.std(x)  # population standard deviation (divides by n), as in the example above
print('mean of the dataset is', mean)
print('std. deviation is', std)
# Ranges covered by 1, 2 and 3 standard deviations around the mean
print('1st SD', mean - 1 * std, mean + 1 * std)
print('2nd SD', mean - 2 * std, mean + 2 * std)
print('3rd SD', mean - 3 * std, mean + 3 * std)
# Z score of every point in the dataset
for i in x:
    z = (i - mean) / std
    print(z)
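The z-scores only become an anomaly detector once a cutoff is applied. The sketch below shows that last step on the same dataset; the cutoff of 2 is an assumption chosen for this small sample, not a value from the referenced article.

```python
import numpy as np

x = np.array([10, 12, 22, 45, 36, 14, 10, 125], dtype=float)
z_scores = (x - x.mean()) / x.std()

threshold = 2.0  # assumed cutoff; 3.0 is also widely used
outliers = x[np.abs(z_scores) > threshold]

print("z-scores:", np.round(z_scores, 2))
print("outliers:", outliers)
```

With the stricter cutoff of 3, nothing in this dataset would be flagged, because the value 125 itself inflates the standard deviation. This is one reason median-based methods are often preferred on small samples.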
Keep Reading!!!