Machine Learning Application Directions (2): Concept Drift

Latest recommended article published 2025-05-03 19:56:36

天狼啸月1990


Category: Machine Learning. Tags: machine learning

Article link: https://blog.csdn.net/qq_33419476/article/details/105546094


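This article discusses concept drift, the phenomenon in which the underlying data distribution of a data stream changes over time, analyzes its impact on machine learning and data mining, and introduces two kinds of detection methods, supervised and unsupervised. It proposes a new concept drift detection method based on semantic folding, which identifies concept drift by comparing the semantic similarity of different concepts.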


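1. Concept drift

　　Background: Concept drift refers to unpredictable changes over time in the underlying data distribution of a data stream. These changes make an existing classifier inaccurate or leave a decision system unable to decide correctly. It is common in recommender systems, finance, decision making, and similar areas.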

　　　　Concept drift refers to unforeseeable changes in the underlying data distribution of data streams over time.

　　Definition: Concept drift in machine learning and data mining refers to the change in the relationships between input and output data in the underlying problem over time. (https://machinelearningmastery.com/gentle-introduction-concept-drift-machine-learning/)

　　My understanding: the target function changes unpredictably over time. For example, before drift: input(x1) --> target(x1); after concept drift: input(x1) --> target(x2).

2. Concept drift detection methods
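The mapping change described above can be illustrated with a small synthetic example. This is only a sketch under assumed conditions: the stream generator, the 0.5 threshold, and the names `make_stream` and `fixed_classifier` are hypothetical and are not part of the original post.

```python
import random

def make_stream(n, concept, seed):
    """Generate n (x, y) pairs; the labeling rule depends on the concept."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    if concept == "old":
        ys = [int(x > 0.5) for x in xs]    # old concept: large x -> class 1
    else:
        ys = [int(x <= 0.5) for x in xs]   # drifted concept: labels are flipped
    return xs, ys

def fixed_classifier(x):
    """A classifier that learned the old concept and is never updated."""
    return int(x > 0.5)

def accuracy(xs, ys):
    hits = sum(fixed_classifier(x) == y for x, y in zip(xs, ys))
    return hits / len(xs)

old_xs, old_ys = make_stream(200, "old", seed=0)
new_xs, new_ys = make_stream(200, "new", seed=1)

acc_before = accuracy(old_xs, old_ys)  # inputs look the same, rule matches
acc_after = accuracy(new_xs, new_ys)   # same input range, different targets

print(acc_before, acc_after)
```

The inputs are drawn from the same range in both streams; only the input-to-target relationship changes, which is why the unchanged classifier stops working.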

2.1 Supervised learning method

Supervised methods usually rely on the underlying data distribution to compute statistics such as the classification error rate, the relative entropy, or the linear four rates (true positive rate, true negative rate, positive predictive value, and negative predictive value). Although these methods can achieve high accuracy, they rely too heavily on the distribution of the underlying data and on labeled data.
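As a rough illustration of the four rates mentioned above, the sketch below computes them for a reference window and a recent window of labeled predictions and flags a shift when any rate moves by more than a tolerance. This is a simplified comparison, not the published linear four rates procedure (which uses statistical tests); the function names and the `tolerance` value are assumptions made for the example.

```python
def four_rates(y_true, y_pred):
    """Return TPR, TNR, PPV and NPV for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero

    return {
        "tpr": ratio(tp, tp + fn),
        "tnr": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

def rates_shifted(reference, recent, tolerance=0.2):
    """Flag drift if any of the four rates changed by more than tolerance."""
    return any(abs(reference[k] - recent[k]) > tolerance for k in reference)

# Reference window: classifier is mostly right.
ref_true = [1, 1, 1, 1, 0, 0, 0, 0]
ref_pred = [1, 1, 1, 0, 0, 0, 0, 1]
# Recent window: same labels, much worse predictions.
rec_true = [1, 1, 1, 1, 0, 0, 0, 0]
rec_pred = [1, 0, 0, 0, 0, 0, 1, 1]

reference_rates = four_rates(ref_true, ref_pred)
recent_rates = four_rates(rec_true, rec_pred)
print(reference_rates, recent_rates, rates_shifted(reference_rates, recent_rates))
```

Note that both windows need true labels, which is exactly the dependence on labeled data described above.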

2.2 Unsupervised learning method

Unsupervised methods usually compute the difference between adjacent data blocks to decide whether concept drift has occurred. Examples include the distance in topic feature space, the similarity of time series features, and the Fuzzy Competence Model.

Although these methods need no prior knowledge of the underlying data and can report when, how, and where concept drift occurs, they do not use semantic information in the detection. A small number of samples also limits the applicability of unsupervised methods.
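The idea of comparing adjacent data blocks can be sketched without labels as follows. The example uses a histogram per block and the total variation distance between neighboring histograms; this distance is chosen here only for illustration and is not one of the specific methods named above. The bin count, value range, and `threshold` are assumed values.

```python
def histogram(block, bins, lo, hi):
    """Normalized histogram of a block of values over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for value in block:
        index = int((value - lo) / width)
        index = max(0, min(index, bins - 1))  # clamp out-of-range values
        counts[index] += 1
    return [c / len(block) for c in counts]

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def detect_block_drift(blocks, threshold=0.5, bins=4, lo=0.0, hi=1.0):
    """Return indices of blocks that differ strongly from their predecessor."""
    drift_points = []
    for i in range(1, len(blocks)):
        previous = histogram(blocks[i - 1], bins, lo, hi)
        current = histogram(blocks[i], bins, lo, hi)
        if total_variation(previous, current) > threshold:
            drift_points.append(i)
    return drift_points

blocks = [
    [0.10, 0.20, 0.15, 0.30],  # block 0
    [0.12, 0.22, 0.18, 0.28],  # block 1: similar to block 0
    [0.80, 0.90, 0.85, 0.70],  # block 2: distribution has moved
]
print(detect_block_drift(blocks))
```

With blocks this small the histograms are coarse, which reflects the limitation noted above: few samples make the block-to-block comparison unreliable.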

3. Expected Method

Core idea: use semantic information and algorithms to represent different concepts, then compare their similarity. If concepts with different names are similar, concept drift has occurred between them, because their semantic essence has not changed. For example, 计算机 and 电脑 are two Chinese words for "computer": even if the name has drifted, both refer to the same thing.

To overcome the above limitations, I propose a concept drift detection method based on semantic folding. Semantic folding can represent the semantic information of the underlying context data by generating a 128- or 256-bit hash vector. This should make it more advantageous than topic feature space and maximum likelihood estimation for detecting concept drift. The steps of the method are as follows:

(1) Extract an initial semantic folding vector v1 from the original underlying data.
(2) Generate a new semantic folding vector v2 when new samples are available.
(3) Compute the similarity or distance between the two vectors v1 and v2.
(4) Concept drift occurs when the two vectors differ significantly.
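The four steps can be sketched as below. This is a minimal sketch under strong assumptions: `toy_fingerprint` is a hypothetical stand-in that hashes each token to one bit of a 128-bit vector, whereas real semantic folding derives bit positions from a learned semantic map so that related words share bits. The Jaccard overlap measure and the `min_similarity` threshold are also assumptions, not values from the post.

```python
import hashlib

BITS = 128  # assumed vector width, matching the 128-bit case above

def toy_fingerprint(tokens, bits=BITS):
    """Stand-in for a semantic folding vector: one hashed bit per token.

    NOT real semantic folding: synonyms do not land on nearby bits here.
    """
    fingerprint = 0
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % bits
        fingerprint |= 1 << position
    return fingerprint

def jaccard_similarity(v1, v2):
    """Overlap of set bits divided by the union of set bits."""
    union = bin(v1 | v2).count("1")
    if union == 0:
        return 1.0  # two empty vectors are treated as identical
    return bin(v1 & v2).count("1") / union

def drift_detected(v1, v2, min_similarity=0.5):
    """Step (4): report drift when the vectors differ significantly."""
    return jaccard_similarity(v1, v2) < min_similarity

# Step (1): initial vector from the original data.
v1 = toy_fingerprint(["stock", "price", "market", "trade"])
# Step (2): new vector once new samples arrive.
v2 = toy_fingerprint(["football", "match", "goal", "team"])
# Steps (3) and (4): compare and decide.
print(jaccard_similarity(v1, v2), drift_detected(v1, v2))
```

Because the stand-in hashes surface strings, it would treat 计算机 and 电脑 as unrelated; a real semantic folding encoder is what would make the comparison reflect meaning rather than spelling.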

4. References

[1] FAN D, JIE L, GUANGQUAN Z, et al. Active fuzzy weighting ensemble for dealing with concept drift[J]. International journal of computational intelligence systems, 2018, 11: 438-450.

[2] FAN D, GUANGQUAN Z, JIE L, et al. Fuzzy competence model drift detection for data-driven decision support systems (DSSs)[J]. Knowledge-Based systems, doi: 10.1016/j.knosys.2017.08.018.

[3] GUANG C, XUEGANG H, YUHONG Z. Semantic-based concept drift detection algorithm for data stream[J]. Computer Engineering, 2018, 44(2): 24-30.

[4] RODOLFO C, LEANDRO M, ADRIANO O. FEDD: Feature extraction for explicit concept drift detection in time series[C]. 2016 International joint conference on neural networks(IJCNN), 24-29/07/2016.

[5] SHUJIAN Y, ABRAHAM Z. Concept drift detection with hierarchical hypothesis testing[C]. 2017 SIAM International conference on data mining, 2017.