Practical Machine Learning Notes (2): Data Labeling

Data Labeling

1. Data labeling workflow

  1. Have enough data? Check whether there is enough data to start with.
  2. Improve labels, data, or model? Decide which of the three needs improving.
  3. Enough labels?

    Semi-supervised learning

  4. Enough budget?

    Label via crowdsourcing (when money is no object)

  5. Use weak labels? Decide whether lower-quality labels are acceptable.

    Weak supervision
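
The checklist above can be read as a small decision function. This is only one possible reading of the notes: the order of the checks, the function name `suggest_labeling_strategy`, its boolean parameters, and the `"undecided"` fallback are all assumptions made for illustration.

```python
def suggest_labeling_strategy(enough_labels: bool,
                              enough_budget: bool,
                              weak_labels_ok: bool) -> str:
    """Map the checklist questions to a suggested approach (assumed reading)."""
    if enough_labels:
        # Some labels already exist: use them together with the unlabeled data.
        return "semi-supervised learning"
    if enough_budget:
        # Pay human labelers to produce labels.
        return "crowdsourcing"
    if weak_labels_ok:
        # Accept noisy, cheaply generated labels.
        return "weak supervision"
    # The notes do not say what to do in this case.
    return "undecided"
```

For example, `suggest_labeling_strategy(False, True, False)` returns `"crowdsourcing"`.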

2. Semi-supervised learning (SSL)


  1. Focuses on the scenario where there is a small amount of labeled data along with a large amount of unlabeled data.

  2. Makes assumptions about the data distribution in order to use the unlabeled data:

    • Continuity assumption: examples with similar features are more likely to have the same label.
    • Cluster assumption: data have an inherent cluster structure, and examples in the same cluster tend to have the same label.
    • Manifold assumption: the data lie on a manifold of much lower dimension than the input space.
  3. Self-training

    Self-training is an SSL method: train a model on the labeled data, predict pseudo-labels for the unlabeled data, add the confident predictions to the labeled set, and repeat.

  4. We can use expensive models (a rich man's game: deepen the network or ensemble models; if one model is not enough, use n of them).

    • Deep neural networks, model ensembles/bagging
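
A minimal sketch of the self-training loop, using only the Python standard library. The nearest-centroid "model", the softmax-over-distances confidence score, the `threshold` of 0.9, and all function names are assumptions chosen to keep the example small. In practice the model would be one of the expensive ones mentioned above.

```python
import math

def fit_centroids(X, y):
    """Toy model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_proba(centroids, x):
    """Softmax over negative Euclidean distances to each class centroid."""
    scores = {c: math.exp(-math.dist(x, m)) for c, m in centroids.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and retrain."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = fit_centroids(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            proba = predict_proba(model, x)
            label, conf = max(proba.items(), key=lambda kv: kv[1])
            if conf >= threshold:
                # Confident prediction: treat it as a label.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0 or not pool:
            break  # nothing new was learned, or nothing is left to label
    return fit_centroids(X_lab, y_lab), pool

# Two labeled points, four unlabeled points (made-up data).
model, still_unlabeled = self_train(
    X_lab=[(0.0, 0.0), (10.0, 10.0)],
    y_lab=[0, 1],
    X_unlab=[(0.5, 0.2), (9.5, 9.8), (1.0, 1.0), (5.0, 5.0)],
)
```

Points the model is not confident about, such as `(5.0, 5.0)` here, stay in the unlabeled pool instead of being given a possibly wrong pseudo-label.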

3. Challenges in data labeling

  1. Simplify user interaction: design easy tasks, clear instructions, and a simple-to-use interface.
    • Example: the user instructions and task used by the MIT Places365 dataset
  2. Cost: active learning + self-training (keep the labeling cost in mind)
    • Focuses on the same scenario as SSL, but with human intervention.
    • Uncertainty sampling chooses the example whose prediction is most uncertain and sends it to a human for labeling.
    • Similar to self-training, we can use expensive models.
      • Query by committee trains multiple models and performs majority voting.
  3. Quality control: the quality of labels produced by different labelers varies.
    • Simplest but most expensive: send the same task to multiple labelers, then determine the label by majority voting.
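
Uncertainty sampling and majority voting from the list above can be sketched in a few lines of standard-library Python. Using entropy as the uncertainty measure is an assumption (the notes do not name one), and `committee_disagreement` is an illustrative helper for ranking examples by how much a committee's votes differ. All function names are hypothetical.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy of a class-probability vector; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(pool_probs):
    """Uncertainty sampling: index of the example with the highest entropy."""
    return max(range(len(pool_probs)), key=lambda i: entropy(pool_probs[i]))

def majority_vote(votes):
    """Most common label among the votes (from labelers or committee models)."""
    return Counter(votes).most_common(1)[0][0]

def committee_disagreement(votes):
    """Fraction of votes that differ from the majority label."""
    winner = majority_vote(votes)
    return sum(v != winner for v in votes) / len(votes)

# Predicted class probabilities for three unlabeled examples (made-up numbers).
pool_probs = [[0.9, 0.1], [0.55, 0.45], [0.99, 0.01]]
query_index = most_uncertain(pool_probs)  # example to send to a human labeler

# The same task sent to three labelers (made-up answers).
crowd_answers = {"img_1": ["cat", "cat", "dog"], "img_2": ["dog", "dog", "dog"]}
final_labels = {task: majority_vote(v) for task, v in crowd_answers.items()}
```

The same `majority_vote` function serves both uses in the notes: combining the predictions of a model committee and combining the answers of several human labelers.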