Learning with Different Output Space
Binary Classification
Applications of binary classification:
credit approve/disapprove
email spam/non-spam
patient sick/not sick
ad profitable/not profitable (will the ad make money?)
answer correct/incorrect (KDDCup 2010)
Multiclass Classification: Coin Recognition Problem
Applications of multiclass classification:
written digits => 0, 1, ..., 9
pictures => apple, orange, strawberry
emails => spam, primary, social, promotion, update (Google Gmail)
Regression: Patient Recovery Prediction Problem
binary classification: patient features => sick or not
multiclass classification: patient features => which type of cancer
regression: patient features => how many days before recovery
Applications of regression:
company data => stock price (predicting tomorrow's stock market)
climate data => temperature
Structured Learning: Sequence Tagging Problem (a very large multiclass problem)
multiclass classification: word => word class
structured learning:
sentence => structure(class of each word)
huge multiclass classification problem (structure = hyperclass) without 'explicit' class definition
Applications:
protein data => protein folding
speech data => speech parse tree
summary
binary classification: y={-1, +1}
multiclass classification: y={1, 2, 3 ... k}
regression: y=R
structured learning: y=structures
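As a rough illustration of the four output spaces in this summary, the sketch below writes down one label of each kind and a small helper that tells them apart. The helper `output_space`, the variable names, and all values are invented for illustration and are not part of the original notes.

```python
# Hypothetical labels, one per output space; all values are invented.
binary_label = +1                              # y in {-1, +1}, e.g. patient is sick
K = 4
multiclass_label = 3                           # y in {1, ..., K}, e.g. one of K coin types
regression_label = 12.5                        # y in R, e.g. days before recovery
structured_label = ["pronoun", "verb", "noun"] # y is a structure: one tag per word


def output_space(y, num_classes=None):
    """Guess which output space a label belongs to (illustrative only)."""
    if isinstance(y, (list, tuple)):
        return "structured"
    if isinstance(y, float):
        return "regression"
    if y in (-1, +1) and num_classes is None:
        return "binary"
    if num_classes is not None and 1 <= y <= num_classes:
        return "multiclass"
    raise ValueError("label does not fit any of the four output spaces")


for y, k in [(binary_label, None), (multiclass_label, K),
             (regression_label, None), (structured_label, None)]:
    print(output_space(y, k))
```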
Learning with Different Data Label
supervised learning
you are told what each coin is (every xn comes with its yn)
unsupervised learning
you are not told what each coin is (no yn)
unsupervised multiclass classification => clustering
Applications of clustering:
articles => topics
consumer profiles => consumer groups
unsupervised: Learning without yn
clustering: {xn} => cluster{x}
density estimation: {xn} => density(x)
outlier detection: {xn} => unusual(x)
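The clustering line above, {xn} => cluster(x), can be sketched with a plain k-means loop on one-dimensional data. k-means is one possible choice and is not named in these notes. The coin sizes and the initial centers are hypothetical.

```python
def kmeans_1d(xs, centers, iterations=10):
    """Plain k-means on 1-D points; `centers` are the initial guesses."""
    centers = list(centers)
    assignment = []
    for _ in range(iterations):
        # assignment step: each point joins its nearest center
        assignment = [min(range(len(centers)), key=lambda j: abs(x - centers[j]))
                      for x in xs]
        # update step: each center moves to the mean of its points
        for j in range(len(centers)):
            members = [x for x, a in zip(xs, assignment) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = sum(members) / len(members)
    return centers, assignment


# hypothetical coin sizes; no labels yn are given
sizes = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]
centers, clusters = kmeans_1d(sizes, centers=[0.0, 3.0])
print(centers, clusters)
```

No label is used anywhere: the grouping comes only from the xn themselves.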
semi-supervised: Coin Recognition with Some yn
Applications of semi-supervised learning:
face images with a few labeled => face identifier(Facebook)
medicine data with a few labeled => medicine effect predictor
Characteristic: labels are expensive to obtain
Reinforcement Learning
a very difficult but natural way of learning
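the learner is rewarded or penalized for the output it actually produced, instead of being shown the desired output
Applications of reinforcement learning:
(customer, ad choice, ad click earning) => ad system (the customers train the ad system)
(cards, strategy, winning amount) => blackjack agent (game playing)
the learner is not taught the correct output; it is trained on how good or bad some other output turned out to be
--- this usually happens sequentially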
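The ad system example above can be sketched as a learner that never sees the "correct" ad and only observes the earning of the ad it chose. The sketch uses an epsilon-greedy strategy, a standard bandit technique that is not mentioned in these notes. The function name `run_ad_system`, the click probabilities, and the parameter values are all assumptions made for illustration.

```python
import random


def run_ad_system(click_prob, rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy choice among ads, learning only from observed earnings."""
    rng = random.Random(seed)
    shown = [0] * len(click_prob)      # how often each ad was shown
    value = [0.0] * len(click_prob)    # running average earning per ad
    for _ in range(rounds):
        if rng.random() < epsilon:
            ad = rng.randrange(len(click_prob))                        # explore
        else:
            ad = max(range(len(click_prob)), key=lambda a: value[a])   # exploit
        # the environment reveals only how good the chosen ad was
        earning = 1.0 if rng.random() < click_prob[ad] else 0.0
        shown[ad] += 1
        value[ad] += (earning - value[ad]) / shown[ad]
    return value, shown


# hypothetical click probabilities of three ads, unknown to the learner
value, shown = run_ad_system([0.1, 0.5, 0.2])
print(value, shown)
```

The feedback arrives one round at a time, which matches the remark that this kind of learning usually happens sequentially.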
summary
supervised: all yn
unsupervised: no yn
semi-supervised: some yn
reinforcement: implicit yn by goodness(yn)
Learning with Different Protocol
batch learning
batch supervised multiclass classification: learn from all known data
the data is fed to training in whole batches
Applications:
batch of (email, spam?) => spam filter
batch of (patient, cancer) => cancer classifier
batch of patient data => group of patients
batch learning: a very common protocol
online learning
--- hypothesis 'improves' through receiving data instances sequentially
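A minimal sketch of this protocol, assuming a perceptron-style update as the learner: the hypothesis w is corrected each time a newly received instance is misclassified. The function names and the four-instance stream are hypothetical.

```python
def online_perceptron(stream, dim):
    """Update the weight vector after each (x, y) instance, with y in {-1, +1}."""
    w = [0.0] * (dim + 1)                       # w[0] is the bias weight
    for x, y in stream:
        x = [1.0] + list(x)                     # prepend the constant feature
        score = sum(wi * xi for wi, xi in zip(w, x))
        prediction = 1 if score > 0 else -1
        if prediction != y:                     # mistake: correct the hypothesis
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Classify x with the current hypothesis w."""
    score = sum(wi * xi for wi, xi in zip(w, [1.0] + list(x)))
    return 1 if score > 0 else -1


# hypothetical instances arriving one at a time
stream = [((2.0, 0.0), +1), ((0.0, 2.0), -1), ((3.0, 1.0), +1), ((1.0, 3.0), -1)]
w = online_perceptron(stream, dim=2)
print(w, [predict(w, x) for x, _ in stream])
```

Each instance is seen once and in order, so the learner never needs the whole data set in memory.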
active learning:
----improve hypothesis with fewer labels (hopefully) by asking questions strategically
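used where labels are expensive to obtain
batch: 'duck feeding' (cramming-style teaching)
online: 'passive sequential' (a teacher teaching, one item at a time in order)
both of the above are passive
active: 'question asking' (sequentially), the machine asks the questions --- query the yn of the chosen xn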
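To see how asking questions strategically can save labels, consider a simplified setting that is assumed here and is not from the notes: the inputs are one-dimensional and the label is -1 below some unknown threshold and +1 at or above it. Querying the middle point each time finds the boundary with about log2(N) labels instead of N. The names `active_threshold` and `oracle` are hypothetical.

```python
def active_threshold(xs, oracle):
    """Find the smallest positive point in xs by querying as few labels as possible."""
    xs = sorted(xs)
    lo, hi = 0, len(xs)          # index of the first positive point lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1             # ask for the label yn of the chosen xn
        if oracle(xs[mid]) == +1:
            hi = mid
        else:
            lo = mid + 1
    boundary = xs[lo] if lo < len(xs) else float("inf")
    return boundary, queries


# hypothetical labeler: expensive in practice, a simple rule here
def oracle(x):
    return +1 if x >= 37 else -1


boundary, queries = active_threshold(range(100), oracle)
print(boundary, queries)
```

A batch learner would need all 100 labels for the same data, while this learner asks for only a handful.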
Mini Summary
batch: all known data
online: sequential (passive) data
active: strategically-observed data
Learning with Different Input Space
Credit Approval Problem Revisited
concrete features: each dimension of x in R^d represents 'sophisticated physical meaning'
--- described with domain knowledge (expert knowledge)
Applications:
(size, mass) for coin classification
customer info for credit approval
patient info for cancer diagnosis
often including 'human intelligence' on the learning task
Raw Features: Digit Recognition Problem
digit recognition problem: features => meaning of digit
a typical supervised multiclass classification problem
by concrete features: x = (symmetry, density)
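by raw features: 16 by 16 gray image x = (0, 0, 0.9, 0.6, ...), a vector in R^256
raw features are more abstract than concrete ones, and the more abstract the features, the harder the problem is for the machine
raw features: often need human or machines to convert to concrete ones
deep learning: uses a large amount of data to extract more concrete features from the raw ones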
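The conversion from raw to concrete features can be sketched by computing (symmetry, density) from a gray image by hand. The exact definitions used below (density as average intensity, symmetry as one minus the average left-right mirror difference) are assumptions for illustration; the notes do not define them.

```python
def concrete_features(image):
    """Map a raw gray image (list of rows, values in [0, 1]) to (symmetry, density)."""
    rows, cols = len(image), len(image[0])
    total = rows * cols
    # density: average intensity over all pixels
    density = sum(sum(row) for row in image) / total
    # asymmetry: average difference between the image and its left-right mirror
    asymmetry = sum(abs(row[c] - row[cols - 1 - c])
                    for row in image for c in range(cols)) / total
    return 1.0 - asymmetry, density


# hypothetical 16 by 16 image: a vertical bar in the two middle columns
image = [[1.0 if c in (7, 8) else 0.0 for c in range(16)] for _ in range(16)]
raw = [pixel for row in image for pixel in row]    # the raw 256-dimensional x
symmetry, density = concrete_features(image)
print(len(raw), symmetry, density)
```

The 256 raw numbers are reduced to two numbers with a clearer physical meaning, which is the kind of human-designed conversion the notes refer to.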
Abstract Features: Rating Prediction Problem
rating prediction problem (KDDCup 2011)
given previous (userid, itemid, rating) tuples, predict the rating that some userid would give to itemid?
a regression problem with y in R as the rating and x in N x N as (userid, itemid)
'no physical meaning'; thus even more difficult for ML
Other applications:
student ID in online tutoring system (KDDCup 2010)
advertisement ID in online ad system
humans specify some of the features, and the machine learns the rest on its own
abstract: again need 'feature conversion/extraction/construction'
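One simple way to picture feature extraction for abstract IDs is to replace each (userid, itemid) pair with numbers computed from past ratings. The sketch below uses the user's mean rating and the item's mean rating; this choice, the function name `build_feature_map`, and the sample tuples are assumptions for illustration, not the method used in KDDCup 2011.

```python
from collections import defaultdict


def build_feature_map(ratings):
    """Turn abstract (userid, itemid) pairs into (user mean, item mean) features."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append(rating)
        by_item[item].append(rating)
    overall = sum(r for _, _, r in ratings) / len(ratings)

    def mean(values):
        return sum(values) / len(values) if values else overall

    def features(user, item):
        # unseen IDs fall back to the overall mean rating
        return (mean(by_user.get(user, [])), mean(by_item.get(item, [])))

    return features


# hypothetical (userid, itemid, rating) tuples
ratings = [(1, 10, 5.0), (1, 11, 3.0), (2, 10, 4.0)]
features = build_feature_map(ratings)
print(features(1, 10), features(2, 11), features(3, 12))
```

The raw IDs carry no physical meaning, but the extracted averages do, so a regression model can work with them.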
Mini Summary
concrete: sophisticated (and related) physical meaning
raw: simple physical meaning
abstract: no (or little) physical meaning