IoU overlap

Latest recommended article published on 2023-06-20 19:01:34

Santiago11



lee philip, neural networks / natural language processing
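Machine learning includes supervised learning, unsupervised learning, and semi-supervised learning.

In supervised learning*, the data is labeled and comes in the form (x, t), where x is the input and t is the label. A correct label t is ground truth; an incorrect label is not. (Some people call all labeled data ground truth.)

The data produced by the model function comes in the form (x, y), where x is the same input as before and y is the value the model predicts.

The labels are compared with the model's predictions. The loss function (also called the error function) compares y with t to compute the loss (error). For example, with the squared-error loss: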

Loss = \frac{1}{2m} \sum_{i=1}^{m} (y_i - t_i)^2

So if the labeled data is not ground truth, the computed loss will itself be wrong, and that degrades the quality of the model.
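The formula above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `squared_error_loss` and the use of plain lists are assumptions made here, not part of the original answer.

```python
def squared_error_loss(predictions, targets):
    """Return (1 / 2m) * sum((y_i - t_i)^2) over m samples.

    predictions: model outputs y_i
    targets:     labels t_i (ideally ground truth)
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    m = len(predictions)
    if m == 0:
        raise ValueError("need at least one sample")
    # Sum the squared differences, then scale by 1 / (2m).
    return sum((y - t) ** 2 for y, t in zip(predictions, targets)) / (2 * m)
```

Because the labels t_i enter the sum directly, any label that is not ground truth changes the value this function returns.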

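For example, suppose the input is a set of three body measurements (bust, waist, hips) and the task is to judge whether the figure is sexy:

1. Incorrect data

Labeled sample 1: ( (84,62,86) , 1), where x = (84,62,86), t = 1.
Labeled sample 2: ( (84,162,86) , 1), where x = (84,162,86), t = 1.

Here labeled sample 1 is ground truth, while labeled sample 2 is not.

Prediction 1: y = -1
Prediction 2: y = -1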

Loss = \frac{1}{2\times 2} ((-1-1)^2 + (-1-1)^2) = 2

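2. Correct data

Labeled sample 1: ( (84,62,86) , 1), where x = (84,62,86), t = 1.
Labeled sample 2: ( (84,162,86) , -1), where x = (84,162,86), t = -1. (Changed to the ground truth.)

Here labeled samples 1 and 2 are both ground truth.

Prediction 1: y = -1
Prediction 2: y = -1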

Loss = \frac{1}{2\times 2} ((-1-1)^2 + (-1+1)^2) = 1

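Because incorrect data was used, the model is judged to be worse than it really is. In addition, the labeled data is also used to update the weights, so mislabeled data leads to wrong weight updates. Using high-quality data is therefore essential.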
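The two cases can be checked numerically with a short sketch. The helper names below are hypothetical, and the per-sample gradient with respect to the prediction, (y_i - t_i) / m, is included only to illustrate how a wrong label also changes the signal used for weight updates; it is not a full training loop.

```python
def squared_error_loss(predictions, targets):
    """(1 / 2m) * sum((y_i - t_i)^2)."""
    m = len(predictions)
    return sum((y - t) ** 2 for y, t in zip(predictions, targets)) / (2 * m)


def loss_gradient(predictions, targets):
    """Derivative of the loss with respect to each prediction y_i."""
    m = len(predictions)
    return [(y - t) / m for y, t in zip(predictions, targets)]


predictions = [-1, -1]          # model output y for samples 1 and 2
noisy_labels = [1, 1]           # case 1: sample 2 is mislabeled
ground_truth_labels = [1, -1]   # case 2: sample 2 corrected

loss_noisy = squared_error_loss(predictions, noisy_labels)
loss_clean = squared_error_loss(predictions, ground_truth_labels)
print(loss_noisy, loss_clean)   # 2.0 1.0

grad_noisy = loss_gradient(predictions, noisy_labels)
grad_clean = loss_gradient(predictions, ground_truth_labels)
print(grad_noisy)               # [-1.0, -1.0]
print(grad_clean)               # [-1.0, 0.0]
```

With the mislabeled sample, the gradient for sample 2 is nonzero even though the model's prediction for it was already right, which is the wrong weight update the answer warns about.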

* In semi-supervised learning, the labeled data is compared in the same way.
Edited on 2014-01-06. The author reserves all rights.

Zhihu user
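It is the reference standard, generally used for error quantification. For example, if you predict the temperature at a certain time from historical data, the ground truth is the actual temperature at that time. The error is (predicted temperature - real temperature).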
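As a minimal sketch of that error computation (the function name and the temperature values are made up purely for illustration):

```python
def prediction_error(predicted, real):
    """Signed error: positive when the prediction is too high."""
    return predicted - real


# Hypothetical values in degrees Celsius, not real measurements.
predicted_temperature = 21.5
real_temperature = 20.0   # the ground truth observed at that time

error = prediction_error(predicted_temperature, real_temperature)
print(error)  # 1.5
```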

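Ground truth can of course also be used in reinforcement learning, that is, learning with a reward mechanism added. For example, the closer the program's output is to the ground truth, the larger the weight given to the data that produced that result.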

Wikipedia's explanation is:
In machine learning, the term "ground truth" refers to the accuracy of the training set's classification for supervised learning techniques. This is used in statistical models to prove or disprove research hypotheses. The verb "ground truthing" refers to the process of gathering the proper objective data for this test. Compare with gold standard (test).
Bayesian spam filtering is a common example of supervised learning. In this system, the algorithm is manually taught the differences between spam and non-spam. This depends on the ground truth of the messages used to train the algorithm; inaccuracies in that ground truth will correlate to inaccuracies in the resulting spam/non-spam verdicts.
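The dependence on ground truth can be seen in a toy sketch. The code below is a tiny multinomial naive Bayes classifier with add-one smoothing, written from scratch; the four training messages, the function names `train` and `classify`, and the test message are all hypothetical, and this is not how any particular spam filter is implemented.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, label) pairs. Returns a simple count-based model."""
    class_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in messages:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.split():
            counts[word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_messages)
        for word in text.split():
            smoothed = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set with correct (ground truth) labels.
clean = [
    ("win money now", "spam"),
    ("win prize", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon", "ham"),
]
# Same messages, but the first one is mislabeled as non-spam.
noisy = [("win money now", "ham")] + clean[1:]

verdict_clean = classify("win money", train(clean))
verdict_noisy = classify("win money", train(noisy))
print(verdict_clean, verdict_noisy)  # spam ham
```

In this toy setting a single mislabeled training message is enough to flip the verdict for "win money" from spam to ham.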
