Covariate shift and Concept drift
I. Covariate shift
What is covariate shift? Reference: (What is Covariate Shift? - Seldon)
Covariate shift, also known as covariate drift, is a specific type of dataset shift and a very common issue in machine learning. It occurs when the distribution of input data shifts between the training environment and the live environment. Although the input distribution P(x) changes, the relationship between inputs and labels, P(y|x), remains the same. Models are usually trained in offline or local environments on a sample of labelled training data, so it’s not unusual for the distribution of inputs in a live and dynamic environment to differ from that of the controlled training environment.
Put simply, it is a data shift caused by the training data having a different distribution from real-world data.
(The algorithms will have been trained to map input to output data, and may not recognise input features drawn from a different distribution.)
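Covariate shift can be a sign that the model lacks sufficient ability to generalise.
Examples: image data that shifts under different lighting conditions, or the effect of real-world accents on speech recognition.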
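One way to check a single input feature for covariate shift is to compare its training and live samples with a two-sample Kolmogorov-Smirnov test. The sketch below is an illustration under assumptions, not a method taken from the referenced article: the function names, the brightness feature, and the synthetic data are all hypothetical, and the threshold uses the standard large-sample approximation.

```python
import bisect
import math
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The gap between two step functions can only peak at an observed value.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation of the rejection threshold at level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


def has_covariate_shift(train_feature, live_feature, alpha=0.05):
    """Flag a feature whose live distribution differs from training."""
    d = ks_statistic(train_feature, live_feature)
    return d > ks_critical_value(len(train_feature), len(live_feature), alpha)


# Hypothetical data: mean image brightness, with darker images in production.
rng = random.Random(0)
train_brightness = [rng.gauss(0.5, 0.1) for _ in range(500)]
live_brightness = [rng.gauss(0.3, 0.1) for _ in range(500)]

print(has_covariate_shift(train_brightness, live_brightness))
```

With a shift this large the check should flag the feature. A per-feature test like this ignores interactions between features, so it is only a first screening step.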
II. Concept drift
Reference link: (Machine Learning Concept Drift - What is it and Five Steps to Deal With it - Seldon)
Concept drift is a specific type of model drift, and can be understood as a change in the relationship between the input and the target output.
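Over time, the properties of real-world variables may change, so the relationship learned by a model trained on static data may no longer hold.
(Concept drift is a harder problem to detect, because the model does not break or fail outright the way it would with corrupted data.)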
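Example: consumer behaviour changes all the time, but the model was trained on historical consumer data. As time passes, the model may no longer be suitable for predicting consumer behaviour.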
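The difference from covariate shift can be written in symbols. This is a common textbook formalisation rather than notation used in the referenced articles; x denotes the inputs, y the target, and the train/live subscripts are labels chosen here for illustration.

```latex
\begin{align*}
\text{Covariate shift:}\quad & P_{\text{train}}(x) \neq P_{\text{live}}(x), \quad P_{\text{train}}(y \mid x) = P_{\text{live}}(y \mid x) \\
\text{Concept drift:}\quad & P_{\text{train}}(y \mid x) \neq P_{\text{live}}(y \mid x)
\end{align*}
```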
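Detection methods:
1. Continuously monitor the accuracy and performance of the machine learning model to see whether performance deteriorates over time.
2. Monitor the average confidence score of the model's predictions over time. This is used especially for models that classify data such as images or text. If the average confidence changes over time, concept drift may be occurring.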
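Both detection methods amount to tracking a rolling average and comparing it with a reference value. A minimal sketch, assuming a fixed baseline measured at deployment time; the class name `RollingMonitor`, the window size, and the tolerance are hypothetical choices, not values from the referenced article.

```python
from collections import deque
from statistics import mean


class RollingMonitor:
    """Track the rolling mean of a metric and flag drift from a baseline."""

    def __init__(self, baseline, window=100, tolerance=0.05, drop_only=False):
        self.baseline = baseline      # value measured when the model was deployed
        self.tolerance = tolerance    # how far the rolling mean may move
        self.drop_only = drop_only    # True: only flag decreases (e.g. accuracy)
        self.values = deque(maxlen=window)

    def update(self, value):
        """Record one value; return True if the full window has drifted."""
        self.values.append(value)
        if len(self.values) < self.values.maxlen:
            return False  # not enough observations yet
        deviation = self.baseline - mean(self.values)
        if not self.drop_only:
            deviation = abs(deviation)
        return deviation > self.tolerance


# Method 1: accuracy, fed with 1.0 for a correct prediction and 0.0 otherwise.
accuracy_monitor = RollingMonitor(baseline=0.90, window=50, drop_only=True)
# Method 2: mean prediction confidence, flagged if it moves in either direction.
confidence_monitor = RollingMonitor(baseline=0.85, window=50)

if confidence_monitor.update(0.84):
    print("average confidence has drifted from the baseline")
```

Accuracy monitoring needs ground-truth labels, which often arrive late in production; confidence monitoring does not, which is one reason to run both.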
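Solutions:
1. Set up a concept drift detection process.
2. Maintain a static model as a baseline for comparison.
3. Retrain and update the model regularly.
4. Weight the importance of new data.
5. Create new models to handle sudden or recurring concept drift.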
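Step 4 can be illustrated with per-sample weights that decay with the age of each observation. This is a sketch under assumptions: exponential decay with a 30-day half-life is one possible weighting scheme chosen here, and the function names and spend figures are hypothetical.

```python
def recency_weights(ages_in_days, half_life_days=30.0):
    """Weight each sample so that its influence halves every half_life_days."""
    return [0.5 ** (age / half_life_days) for age in ages_in_days]


def weighted_mean(values, weights):
    """Average of values in which heavier samples count for more."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: customer spend observed 0, 30 and 60 days ago.
ages = [0, 30, 60]
spend = [120.0, 100.0, 80.0]

weights = recency_weights(ages)
print(weights)  # [1.0, 0.5, 0.25]

# The weighted estimate leans toward recent behaviour (unweighted mean: 100.0).
print(weighted_mean(spend, weights))
```

When retraining, weights like these can be passed to a training routine that accepts per-sample weights, for example the `sample_weight` argument that many scikit-learn estimators take in `fit`.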