Machine Learning Algorithms, Lecture 3: Types of Learning (study notes)

Learning with Different Output Space

Binary Classification

Applications of binary classification:

credit approve/disapprove

email spam/non-spam

patient sick/not sick

ad profitable/not profitable (will the ad make money?)

answer correct/incorrect (KDDCup 2010)


Multiclass Classification: Coin Recognition Problem

Applications of multiclass classification:

written digits => 0, 1, ..., 9

pictures => apple, orange, strawberry

emails => spam, primary, social, promotion, update (Google Gmail)


Regression: Patient Recovery Prediction Problem

binary classification: patient features => sick or not

multiclass classification: patient features => which type of cancer

regression: patient features => how many days before recovery

Applications of regression:

company data => stock price (predicting tomorrow's stock market)

climate data => temperature


Structured Learning: Sequence Tagging Problem (a very large multiclass problem)

multiclass classification: word => word class

structured learning:

sentence => structure(class of each word)

huge multiclass classification problem (structure = hyperclass) without 'explicit' class definition

Applications:

protein data => protein folding 

speech data => speech parse tree


Mini Summary

binary classification: Y = {-1, +1}

multiclass classification: Y = {1, 2, ..., K}

regression: Y = R

structured learning: Y = structures
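
The first three output spaces can be contrasted with a small sketch. It assumes a linear score over the features, which is an illustrative choice and not something these notes prescribe; the weights and patient features are made up.

```python
# Illustrative only: the linear score is an assumption, not part of the notes.
def score(w, x):
    """Weighted sum of the features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def predict_binary(w, x):
    """Output space {-1, +1}: e.g. sick or not."""
    return 1 if score(w, x) > 0 else -1

def predict_multiclass(ws, x):
    """Output space {1, ..., K}: one hypothetical weight vector per class."""
    scores = [score(w, x) for w in ws]
    return scores.index(max(scores)) + 1

def predict_regression(w, x):
    """Output space R: e.g. days before recovery."""
    return score(w, x)

x = [1.0, 2.0]  # made-up patient features
```

The same input x goes in each time; only the kind of value that comes out changes.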


Learning with Different Data Label

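supervised learning

You are told what each coin is (every example comes with its label).

unsupervised learning

You are not told what each coin is (no labels are given).

unsupervised multiclass classification => clustering

Applications of clustering:

articles => topics

consumer profiles => consumer groups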


unsupervised: Learning without yn

clustering: {xn} => cluster(x)

density estimation: {xn} => density(x)

outlier detection: {xn} => unusual(x)
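
A minimal sketch of two of these tasks on 1-D data, using only the xn and no yn. The choice of k-means for clustering and of a distance threshold for outlier detection are assumptions made for illustration; the names kmeans_1d and unusual are hypothetical.

```python
def kmeans_1d(xs, k=2, iters=10):
    """Cluster 1-D points; centers start evenly spaced between min and max (needs k >= 2)."""
    lo, hi = min(xs), max(xs)
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in xs]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [x for x, l in zip(xs, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers

def unusual(x, centers, threshold=1.0):
    """Flag x as an outlier if it is far from every cluster center."""
    return min(abs(x - c) for c in centers) > threshold

xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]  # made-up data, no labels
labels, centers = kmeans_1d(xs)
```

Here the two groups around 1 and 5 end up in separate clusters, and a point such as 9.0 would be flagged as unusual under the assumed threshold.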


semi-supervised: Coin Recognition with Some yn

Applications of semi-supervised learning:

face images with a few labeled => face identifier (Facebook)

medicine data with a few labeled => medicine effect predictor

Key point: labels are expensive to obtain.


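Reinforcement Learning

a very difficult but natural way of learning

The learner is not shown the correct output; instead it is rewarded or penalized for some other output (the one it actually produced).

Applications of reinforcement learning:

(customer, ad choice, ad click earning) => ad system (the customers train the ad system)

(cards, strategy, winning amount) => blackjack agent (card games)

What is learned from is not the desired output itself, but how good or bad some other output turned out to be.

This feedback usually arrives sequentially.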


Mini Summary

supervised: all yn

unsupervised: no yn

semi-supervised: some yn

reinforcement: implicit yn by goodness(ỹn), where ỹn is the output actually produced


Learning with Different Protocol

batch learning

batch supervised multiclass classification: learn from all known data

The data is fed to training in whole batches.

Applications:

batch of (email, spam?) => spam filter

batch of (patient, cancer) => cancer classifier

batch of patient data => group of patients

batch learning: a very common protocol


online learning

hypothesis 'improves' through receiving data instances sequentially
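
A minimal sketch of this protocol: the hypothesis is updated one instance at a time as data arrives. The perceptron-style update used here is an assumed example of an online learner, not something these notes specify, and the stream is made up.

```python
def online_step(w, x, y):
    """Process one (x, y) instance; correct w only if the current hypothesis errs."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    pred = 1 if s > 0 else -1
    if pred != y:
        w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

# Made-up stream of (features, label); the first feature is a constant bias term.
stream = [
    ([1.0, 2.0, 1.0], 1),
    ([1.0, -1.0, -2.0], -1),
    ([1.0, 0.5, -0.5], -1),
]

w = [0.0, 0.0, 0.0]
for x, y in stream:  # instances are seen sequentially, never as a whole batch
    w = online_step(w, x, y)
```

The learner never needs the full dataset in memory; each instance is used once and then discarded.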


active learning:

improve hypothesis with fewer labels (hopefully) by asking questions strategically

Useful where labels are expensive to obtain.
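
One way to "ask strategically" can be sketched as follows. The strategy shown (query the unlabeled point the current hypothesis is least sure about, i.e. the one with score closest to zero) is an assumption chosen for illustration; the notes do not name a specific strategy, and choose_query is a hypothetical helper.

```python
def score(w, x):
    """Linear score of the current hypothesis (an assumed hypothesis form)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def choose_query(w, pool):
    """Return the index of the unlabeled xn whose score is closest to 0."""
    return min(range(len(pool)), key=lambda i: abs(score(w, pool[i])))

w = [1.0, -1.0]                                 # made-up current hypothesis
pool = [[3.0, 0.0], [1.0, 0.9], [0.0, 4.0]]     # made-up unlabeled xn
i = choose_query(w, pool)                       # the xn whose yn would be requested
```

Only the chosen xn is sent for labeling, so the number of expensive labels stays small.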


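batch: 'duck feeding' (cramming-style teaching)

online: 'passive sequential' (a teacher teaching one item at a time, in order)

Both of the above are passive.

active: 'question asking' (sequentially): the machine asks the questions, i.e. it queries the yn of the chosen xn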


Mini Summary

batch: all known data

online: sequential (passive) data

active: strategically-observed data


Learning with Different Input Space

Credit Approval Problem Revisited

concrete features: each dimension of x ∈ R^d represents a 'sophisticated physical meaning'

These features carry domain knowledge, i.e. descriptions supplied by experts.

Applications:

(size, mass) for coin classification

customer info for credit approval

patient info for cancer diagnosis

often including 'human intelligence' on the learning task


Raw Features: Digit Recognition Problem

digit recognition problem: features => meaning of digit

a typical supervised multiclass classification problem

by concrete features: x = (symmetry, density)

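by raw features: 16 by 16 gray image x = (0, 0, 0.9, 0.6, ...), a vector in R^256

Raw features are more abstract than concrete ones, and the more abstract the features, the harder the problem is for the machine.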

raw features: often need human or machines to convert to concrete ones
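
A minimal sketch of converting raw pixels into the two concrete features mentioned above. The exact definitions used here (density as mean intensity, symmetry as the negated mean difference between the image and its left-right mirror) are assumptions; the notes only name the features. A tiny image stands in for the 16 by 16 one.

```python
def density(img):
    """Mean pixel intensity of a gray image given as a list of rows."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)

def symmetry(img):
    """Negated mean absolute difference between the image and its mirror (0 = symmetric)."""
    diffs = [abs(p - q) for row in img for p, q in zip(row, reversed(row))]
    return -sum(diffs) / len(diffs)

# Made-up raw images.
symmetric_img = [[0.0, 1.0, 1.0, 0.0],
                 [0.0, 1.0, 1.0, 0.0]]
lopsided_img = [[1.0, 0.0, 0.0, 0.0]]

x = (symmetry(symmetric_img), density(symmetric_img))  # concrete feature vector
```

The many raw pixel values are reduced to two numbers that each have a clearer meaning for the task.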

deep learning: extracts more concrete features from a large amount of data


Abstract Features: Rating Prediction Problem

rating prediction problem (KDDCup 2011)

given previous (userid, itemid, rating) tuples, predict the rating that some userid would give to itemid

a regression problem with y ∈ R as rating and x ∈ N × N as (userid, itemid)

'no physical meaning'; thus even more difficult for ML

Other applications:

student ID in online tutoring system (KDDCup 2010)

advertisement ID in online ad system

Humans specify some of the features, and the machine learns the rest on its own.

abstract: again need 'feature conversion/extraction/construction'
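
A minimal sketch of one possible feature construction for abstract IDs: replace each (userid, itemid) pair by the user's average rating and the item's average rating. This particular construction is an assumption for illustration, not the method used in KDDCup 2011; all names and ratings below are made up.

```python
from collections import defaultdict

def average_by_key(ratings, key_index):
    """Average rating grouped by userid (key_index=0) or itemid (key_index=1)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in ratings:
        totals[record[key_index]] += record[2]
        counts[record[key_index]] += 1
    return {k: totals[k] / counts[k] for k in totals}

def to_features(userid, itemid, user_avg, item_avg, default):
    """Map abstract IDs to numbers; unseen IDs fall back to a default value."""
    return (user_avg.get(userid, default), item_avg.get(itemid, default))

ratings = [(1, 10, 4.0), (1, 11, 2.0), (2, 10, 5.0)]  # (userid, itemid, rating)
user_avg = average_by_key(ratings, 0)
item_avg = average_by_key(ratings, 1)
global_avg = sum(r for _, _, r in ratings) / len(ratings)

x = to_features(2, 11, user_avg, item_avg, global_avg)
```

The IDs themselves carry no meaning, but the constructed numbers do, so a regression model can work with them.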


Mini Summary

concrete: sophisticated (and related) physical meaning

raw: simple physical meaning

abstract: no (or little) physical meaning
