Learning with Feature Evolvable Stream: Study Notes

weixin_43700045

Article link: https://blog.csdn.net/weixin_43700045/article/details/103078841

Learning with Feature Evolvable Stream

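Abstract

In real-world applications, some features of the target vanish while new features emerge. We recover the vanished features and train on them alongside the current features, which yields two models: one on the recovered features and one on the current features. Two methods are used for prediction. The first combines the outputs of the two models; the second dynamically selects whichever model gives the better single prediction, so that overall performance stays as good as possible. Experiments with both methods validate the theoretical results.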

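Introduction

Existing learning methods for data streams include the Hoeffding tree, the Bayes tree, the evolving granular neural network (eGNN), and the Core Vector Machine (CVM). These methods share one assumption: the data stream has a fixed feature space, i.e., every sample is always described by the same set of features.
We assume that features do not change arbitrarily: old and new features overlap in time.
Thus, the data stream arrives in a way as shown in Figure 1, where in period T1, the original set of features are valid and at the end of T1, period B1 appears, where the original set of features are still accessible, but some new features are included; then in T2, the original set of features vanish, only the new features are valid but at the end of T2, period B2 appears where newer features come. This process will repeat again and again. Note that the T1 and T2 periods are usually long, whereas the B1 and B2 periods are short because, as in the ecosystem protection example, the B1 and B2 periods are just used to switch the sensors and we do not want to waste a lot of lifetime of sensors for such overlapping periods.
[Figure 1: how the data stream arrives across periods T1, B1, T2, and B2]
We address the FESL problem by analyzing the relationship between the old and new data, so that the vanished data can still be exploited once only new data is available. While the old and new data overlap, we build a mapping from the new data to the old data; with this mapping, the old feature data can be reconstructed from the new data.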

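Preliminaries

In each round, the model predicts an instance; the prediction is then compared with the true label, and the model suffers a loss that reflects the discrepancy between the prediction and the true value.
Feature space, as defined here: a change of feature space means that both the underlying distribution of the feature set and the number of features change.
One cycle: Consider the process with three periods, where in the first period a large amount of streaming data comes from the old feature space; then in the second period, named the overlapping period, a small amount of data comes from both the old and the new feature spaces; soon afterwards, in the third period, data comes only from the new feature space. Each cycle merely includes two feature spaces.
We assume that the old features in a cycle vanish simultaneously, i.e., all the sensors expire at the same moment.
Under these assumptions, let $S_1$ and $S_2$ denote the old and new feature spaces, respectively. During the overlapping period the learner receives $B$ rounds of sensor signals, i.e., $B$ samples. The process can be summarized as follows: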

For $t=1,...,T_1-B$, in each round, the learner observes a vector $x_t^{s_1}\in R^{d_1}$ sampled from $S_1$, where $d_1$ is the number of features of $S_1$ and $T_1$ is the total number of rounds in $S_1$.
For $t=T_1-B+1,...,T_1$, in each round, the learner observes two vectors $x_t^{s_1}\in R^{d_1}$ and $x_t^{s_2}\in R^{d_2}$ from $S_1$ and $S_2$, respectively, where $d_2$ is the number of features of $S_2$.
For $t=T_1+1,...,T_1+T_2$, in each round, the learner observes a vector $x_t^{s_2}\in R^{d_2}$ sampled from $S_2$, where $T_2$ is the number of rounds in $S_2$. Note that $B$ is small, so we can omit the streaming data from $S_2$ on rounds $T_1-B+1,...,T_1$ since they have minor effect on training the model in $S_2$.
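The three-period protocol above can be sketched as a small helper that reports which feature spaces are visible in a given round. This is only an illustration: the function name `observed_spaces` and the string labels `"S1"` and `"S2"` are hypothetical and do not come from the original.

```python
def observed_spaces(t, T1, T2, B):
    """Return the feature spaces observed in round t (rounds are 1-indexed).

    Hypothetical helper: T1 and T2 are the numbers of rounds in S1 and S2,
    and B is the length of the overlapping period at the end of T1.
    """
    if not 0 <= B <= T1:
        raise ValueError("B must satisfy 0 <= B <= T1")
    if not 1 <= t <= T1 + T2:
        raise ValueError("t must lie in 1..T1+T2")
    if t <= T1 - B:
        return ("S1",)        # only old features are observed
    if t <= T1:
        return ("S1", "S2")   # overlapping period: both spaces are observed
    return ("S2",)            # only new features are observed
```

For example, with `T1=10`, `T2=5`, and `B=2`, rounds 9 and 10 form the overlapping period.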

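Algorithm

Because the two feature spaces coexist only for a short time, we learn a linear mapping $\psi: R^{d_2}\to R^{d_1}$ from $S_2$ to $S_1$. The mapping can be obtained by least squares:
$$\min_{M\in R^{d_2\times d_1}}\sum_{t=T_1-B+1}^{T_1}\left\|x_t^{s_1}-M^\top x_t^{s_2}\right\|_2^2$$
The optimal solution of the problem above is:
$$M_*=\left(\sum_{t=T_1-B+1}^{T_1}x_t^{s_2}{x_t^{s_2}}^\top\right)^{-1}\left(\sum_{t=T_1-B+1}^{T_1}x_t^{s_2}{x_t^{s_1}}^\top\right)$$
so that $\psi(x^{s_2})=M_*^\top x^{s_2}$.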
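A minimal NumPy sketch of this step, assuming the $B$ overlapping samples are stacked row-wise into two arrays. The names `fit_linear_mapping` and `recover_old_features` are hypothetical, and `np.linalg.lstsq` is used here as a numerically safer stand-in for forming the inverse explicitly; when the Gram matrix of the new features is invertible, both give the same least-squares minimizer.

```python
import numpy as np

def fit_linear_mapping(X_new, X_old):
    """Fit M (shape d2 x d1) minimizing sum_t ||x_old_t - M^T x_new_t||^2.

    X_new: array of shape (B, d2), overlapping-period samples from S2.
    X_old: array of shape (B, d1), the same rounds observed in S1.
    """
    # Solves X_new @ M ~= X_old in the least-squares sense.
    M, *_ = np.linalg.lstsq(X_new, X_old, rcond=None)
    return M

def recover_old_features(M, x_new):
    """Map one new-space vector (d2,) back to the old space (d1,)."""
    return M.T @ x_new
```

After the overlapping period ends, `recover_old_features(M, x_new)` plays the role of the recovered old-space instance that the old model can still be applied to.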

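When $t>T_1$, two base predictions can be computed from the two models $w_{1,t}$ and $w_{2,t}$: $f_{1,t}=w_{1,t}^\top\psi(x_t^{s_2})$ and $f_{2,t}=w_{2,t}^\top x_t^{s_2}$, where $\psi$ is the learned mapping from $S_2$ to $S_1$. Based on these two base predictions, we propose two methods for making the final prediction.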

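Weighted Combination

The first method is a weighted combination based on cumulative loss. The prediction at time $t$ is the weighted average of the base predictions:
$$\hat{p}_t=\sum_{i=1}^{2}\alpha_{i,t}f_{i,t}=\alpha_{1,t}f_{1,t}+\alpha_{2,t}f_{2,t}$$
where $f_{i,t}$ is the prediction of the $i$-th base model and $\alpha_{i,t}$ is its weight. The weights of the two base models are computed from their losses:
$$\alpha_{i,t+1}=\frac{\alpha_{i,t}e^{-\eta\ell(f_{i,t},y_t)}}{\sum_{j=1}^{2}\alpha_{j,t}e^{-\eta\ell(f_{j,t},y_t)}},\quad i=1,2$$
where $\eta$ is a tuning parameter and $\ell(f_{i,t},y_t)$ is the loss of the $i$-th base prediction against the true label $y_t$. The formula shows that if a model incurs a large loss, its weight decreases exponentially in the next round. This method is called FESL-c. We train the model $w_{1,T_1}$ with online gradient descent on rounds $1,...,T_1$, and learn the mapping $\psi$ on rounds $t=T_1-B+1,...,T_1$. On rounds $t=T_1+1,...,T_1+T_2$, we learn the model $w_{2,t}$ and keep updating $w_{1,t}$ with the recovered data $\psi(x_t^{s_2})$, where $\tau_t$ is the step size:
$$w_{1,t+1}=\Pi_{\Omega_1}\left(w_{1,t}-\tau_t\nabla\ell\left(w_{1,t}^\top\psi(x_t^{s_2}),y_t\right)\right)$$
Here $\Pi_{\Omega_1}$ denotes projection onto the feasible set $\Omega_1$ of $w_1$.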
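The weighting scheme can be sketched in a few lines of Python. This is a minimal illustration, assuming an exponential-weights update in which each weight is multiplied by $e^{-\eta\cdot\text{loss}}$ and then renormalized; the function names are hypothetical, and the gradient steps on $w_{1,t}$ and $w_{2,t}$ are left out.

```python
import math

def combine(alphas, preds):
    """Weighted average of the base predictions."""
    return sum(a * f for a, f in zip(alphas, preds))

def update_weights(alphas, losses, eta):
    """Shrink each weight by exp(-eta * loss), then renormalize to sum to 1."""
    scaled = [a * math.exp(-eta * loss) for a, loss in zip(alphas, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]
```

Starting from equal weights, a model that incurs the larger loss in a round ends up with the smaller weight in the next round, which is the exponential decay described above.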

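Dynamic Selection

When the base models perform well, combining their outputs by weighted combination generally beats any single model. When their performance cannot be guaranteed, however, dynamically selecting the output of the better base model gives better results. This method is called FESL-s. In each round, a model is drawn according to the distribution of the weights:
$$p_{i,t}=\frac{\alpha_{i,t-1}}{\sum_{j=1}^{2}\alpha_{j,t-1}}$$
and the output of the drawn model is used as the prediction, $\hat{p}_t=f_{i,t}$. The weights are updated by:
$$v_{i,t}=\alpha_{i,t-1}e^{-\eta\ell(f_{i,t},y_t)},\quad\alpha_{i,t}=\delta\frac{W_t}{2}+(1-\delta)v_{i,t},\quad i=1,2$$
where $W_t=v_{1,t}+v_{2,t}$, $\delta=1/(T_2-1)$, $\eta=\sqrt{\frac{8}{T_2}\left(2\ln 2+(T_2-1)H\left(\frac{1}{T_2-1}\right)\right)}$, and $H(x)=-x\ln x-(1-x)\ln(1-x)$ is the binary entropy function.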
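As a rough illustration of these quantities, the following pure-Python sketch computes $\delta$ and $\eta$ from $T_2$, draws a model with probability proportional to its weight, and applies a share-style update $v_i=\alpha_i e^{-\eta\ell_i}$, $\alpha_i\leftarrow\delta W/2+(1-\delta)v_i$. The function names, the guard at the entropy endpoints, and the reading of $\eta$ as $\sqrt{(8/T_2)(\cdot)}$ are assumptions of this sketch.

```python
import math
import random

def binary_entropy(x):
    """H(x) = -x ln x - (1-x) ln(1-x), taken as 0 at the endpoints."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def fesl_s_parameters(T2):
    """Return (delta, eta) for a new-feature period of T2 rounds."""
    if T2 < 2:
        raise ValueError("T2 must be at least 2")
    delta = 1.0 / (T2 - 1)
    eta = math.sqrt(8.0 / T2 * (2.0 * math.log(2.0)
                                + (T2 - 1) * binary_entropy(delta)))
    return delta, eta

def select_model(alphas, rng=random):
    """Draw index 0 or 1 with probability proportional to its weight."""
    p_first = alphas[0] / (alphas[0] + alphas[1])
    return 0 if rng.random() < p_first else 1

def update_selection_weights(alphas, losses, eta, delta):
    """Exponential loss update followed by sharing a delta fraction equally."""
    v = [a * math.exp(-eta * loss) for a, loss in zip(alphas, losses)]
    W = sum(v)
    return [delta * W / 2.0 + (1.0 - delta) * vi for vi in v]
```

The sharing term $\delta W/2$ keeps either weight from collapsing to zero, so the selection can move back to a model that starts performing better later in the stream.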