-
Outlier Detection (Detect Outliers)
In statistics, outliers are data points that don’t belong to a certain population. An outlier is an abnormal observation that lies far away from other values and diverges from otherwise well-structured data.
There are several ways to detect anomalies.
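The notion of detecting outliers is used mostly when preparing data for machine learning.
Removing extreme values is a broader concept. Extreme values are one kind of outlier: first finding the extreme values (outliers) and then removing them makes up a complete "extreme value removal" process.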
-
Detect Anomalies
-
1.Standard Deviation
If a data distribution is approximately normal, then about 68% of the data values lie within one standard deviation of the mean, about 95% lie within two standard deviations, and about 99.7% lie within three standard deviations.
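As a rough illustration of the three-sigma rule above, the sketch below flags values that lie more than three standard deviations from the mean. The function name `three_sigma_outliers`, the threshold parameter `k`, and the sample data are all made up for this example, and the rule is only meaningful when the data is roughly normal.

```python
import statistics

def three_sigma_outliers(values, k=3.0):
    """Return the values lying more than k standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) > k * std]

# Hypothetical data: 20 readings between 10 and 13, plus one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11] * 2 + [50]
print(three_sigma_outliers(data))
```

On this data only 50 is flagged. Note that an extreme value also inflates the standard deviation itself, so on very small samples the rule can fail to flag anything.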
-
2.Boxplots
Interquartile Range
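Boxplots are commonly drawn from the interquartile range (IQR = Q3 - Q1), and points further than 1.5 x IQR beyond the quartiles are treated as outliers. A minimal sketch of that rule, assuming the conventional multiplier of 1.5; the function name `iqr_outliers` and the data are made up for illustration:

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data: 20 readings between 10 and 13, plus one high and one low extreme.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11] * 2 + [50, -20]
print(iqr_outliers(data))
```

Quartile conventions differ slightly between libraries, so the exact fences may not match those drawn by a particular plotting tool.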
-
3.DBScan Clustering
DBScan is a clustering algorithm that’s used to cluster data into groups.
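For outlier detection, the useful property is that DBSCAN leaves points that fit into no dense group unassigned and labels them as noise. A small sketch using scikit-learn's `DBSCAN`; the data and the `eps` and `min_samples` values are assumptions chosen for this toy example and would need tuning on real data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical 2-D data: two tight clusters and one isolated point.
X = np.array([
    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [10.0, 0.0],
])

# eps is the neighbourhood radius, min_samples the density threshold (both illustrative).
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

# Points that belong to no cluster receive the label -1 (noise).
outliers = X[labels == -1]
print(labels)
print(outliers)
```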
-
4.Isolation Forest
Isolation Forest is an unsupervised learning algorithm that belongs to the ensemble decision trees family.
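The idea is that anomalies are easier to isolate: random splits separate them from the rest of the data in fewer steps, so they end up with shorter paths in the trees. A minimal sketch with scikit-learn's `IsolationForest`; the data is invented, and the `contamination` value is an assumed share of outliers rather than something known in advance:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical data: a 5 x 4 grid of closely spaced points plus one far-away point.
grid = np.array([[x, y] for x in range(5) for y in range(4)], dtype=float) * 0.1
X = np.vstack([grid, [[5.0, 5.0]]])

# random_state makes the run repeatable; contamination is an illustrative guess.
model = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
preds = model.fit_predict(X)     # -1 = outlier, 1 = inlier
scores = model.score_samples(X)  # lower score = more anomalous

print(preds[-1], scores.argmin())
```

The last row of `X` is the far-away point, so it should receive the label -1 and the lowest score.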
-
5.Robust Random Cut Forest
The Random Cut Forest (RCF) algorithm is Amazon’s unsupervised algorithm for detecting anomalies.
-
6.Minimum Covariance Determinant
-
7.Local Outlier Factor
The local outlier factor (LOF) is a technique that attempts to harness the idea of nearest neighbors for outlier detection.
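Concretely, LOF compares the local density around a point with the density around its nearest neighbors: a value near 1 means the point is about as dense as its neighbors, and a much larger value marks it as an outlier. A short sketch with scikit-learn's `LocalOutlierFactor`; the data and the choice of `n_neighbors=5` are assumptions for illustration:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical data: a 5 x 4 grid of closely spaced points plus one far-away point.
grid = np.array([[x, y] for x in range(5) for y in range(4)], dtype=float) * 0.1
X = np.vstack([grid, [[5.0, 5.0]]])

lof = LocalOutlierFactor(n_neighbors=5)
preds = lof.fit_predict(X)                # -1 = outlier, 1 = inlier
factors = -lof.negative_outlier_factor_   # flip the sign to get the LOF values

print(preds[-1], factors.argmax())
```

scikit-learn stores the negated factor, which is why the sign is flipped before reading off the largest value.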
-
8.One-Class SVM
-
9.Z-Score
-
Anomaly Detection vs. Outlier detection