Machine Learning Series: Coursera Week 6, Advice for Applying Machine Learning

Contents

1. Evaluating a learning algorithm

1.1 Deciding what to try next

1.2 Evaluating a hypothesis

1.3 Model selection and training/validation/test sets

2. Bias vs. variance

2.1 Diagnosing bias vs. variance

2.2 Regularization and bias/variance

2.3 Learning curves

2.4 Deciding what to do next

3. Building a spam classifier

3.1 Prioritizing what to work on: spam classification example

3.2 Error analysis

4. Handling skewed data

4.1 Error metrics for skewed classes

4.2 Trading off precision and recall

5. Using large data sets


1. Evaluating a learning algorithm

1.1 Deciding what to try next

Debugging a learning algorithm:

Suppose you have implemented regularized linear regression to predict housing prices.

However, when you test your hypothesis on a new set of houses, you find that it makes unacceptably large errors in its predictions. What should you try next?

- Get more training examples

- Try smaller sets of features

- Try getting additional features

- Try adding polynomial features

- Try decreasing λ

- Try increasing λ

Machine learning diagnostic:

Diagnostic: a test that you can run to gain insight into what is or isn't working with a learning algorithm, and to gain guidance as to how best to improve its performance.

Diagnostics can take time to implement, but doing so can be a very good use of your time.

1.2 Evaluating a hypothesis

Evaluating your hypothesis:

Figure 1 (from Coursera Week 6, Evaluating a hypothesis)

Figure 2 (from Coursera Week 6, Evaluating a hypothesis)

Training/testing procedure for linear regression:

- Learn parameters θ from training data (minimizing training error J_train(θ))

- Compute test error: J_test(θ) = (1/(2m_test)) Σ (h_θ(x_test^(i)) − y_test^(i))²

 

Training/testing procedure for logistic regression:

- Learn parameters θ from training data

- Compute test set error: J_test(θ)

Alternatively, use the misclassification error (0/1 misclassification error): define err(h_θ(x), y) = 1 if the prediction is wrong (h_θ(x) ≥ 0.5 but y = 0, or h_θ(x) < 0.5 but y = 1) and 0 otherwise; the test error is then (1/m_test) Σ err(h_θ(x_test^(i)), y_test^(i)).
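To make the two procedures concrete, here is a minimal NumPy sketch (the course itself uses Octave); the function names and the assumption that X already contains a bias column are mine.

```python
import numpy as np

def j_test_linear(theta, X_test, y_test):
    """Squared-error test cost J_test(theta) for linear regression."""
    m_test = len(y_test)
    residuals = X_test @ theta - y_test
    return (residuals @ residuals) / (2 * m_test)

def j_test_logistic(theta, X_test, y_test):
    """0/1 misclassification error for logistic regression."""
    h = 1.0 / (1.0 + np.exp(-(X_test @ theta)))   # sigmoid hypothesis h_theta(x)
    predictions = (h >= 0.5).astype(int)          # err = 1 when prediction != label
    return float(np.mean(predictions != y_test))  # average over the test set
```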

1.3 Model selection and training/validation/test sets

Model selection:

Figure 3 (from Coursera Week 6, Model selection and training/validation/test)

E.g., fit polynomial models of degree d = 1, ..., 10 and suppose d = 5 gives the lowest test error. How well does that model generalize? Report the test set error J_test(θ^(5)).

Problem: J_test(θ^(5)) is likely to be an optimistic estimate of the generalization error, because the extra parameter d was itself fit to the test set. (That is, once the test set has been used to pick the best d, the resulting test error is no longer a fair estimate of generalization error; picking d on the training set does not work either.)

Solution: split the data into training/validation/test sets (e.g., 60%/20%/20%), pick d using the cross-validation error J_CV(θ), and report the generalization error on the test set.

Figure 4 (from Coursera Week 6, Model selection and training/validation/test)

Figure 5 (from Coursera Week 6, Model selection and training/validation/test)
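A minimal sketch of the full procedure on synthetic data; the 60/20/20 split and the use of np.polyfit are my choices for illustration, not the course's Octave code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data, split 60%/20%/20% into train / CV / test.
x = np.linspace(0, 3, 100)
y = np.sin(x) + 0.1 * rng.standard_normal(100)
idx = rng.permutation(100)
tr, cv, te = idx[:60], idx[60:80], idx[80:]

def cost(p, xs, ys):
    """Squared-error cost J = (1/2m) * sum((h(x) - y)^2)."""
    return float(np.mean((np.polyval(p, xs) - ys) ** 2) / 2)

# Fit each candidate degree d on the TRAINING set only ...
fits = {d: np.polyfit(x[tr], y[tr], d) for d in range(1, 11)}
# ... pick d by the lowest CROSS-VALIDATION error ...
best_d = min(fits, key=lambda d: cost(fits[d], x[cv], y[cv]))
# ... and report generalization error on the untouched TEST set.
print(f"best d = {best_d}, J_test = {cost(fits[best_d], x[te], y[te]):.4f}")
```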

2. Bias vs. variance

2.1 Diagnosing bias vs. variance

Figure 6 (from Coursera Week 6, Diagnosing bias vs. variance)

Figure 7 (from Coursera Week 6, Diagnosing bias vs. variance)

Diagnosing bias vs. variance:

Suppose your learning algorithm is performing less well than you were hoping (J_CV(θ) or J_test(θ) is high). Is it a bias problem or a variance problem?

Figure 8 (from Coursera Week 6, Diagnosing bias vs. variance)

Bias (underfit): J_train(θ) is high, and J_CV(θ) ≈ J_train(θ).

Variance (overfit): J_train(θ) is low, and J_CV(θ) >> J_train(θ).
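The rule of thumb above is easy to encode; the factor of 2 used to decide what counts as a "large gap" is my own illustrative threshold, not a value from the course.

```python
def diagnose(j_train, j_cv, j_target):
    """Rough bias/variance diagnosis from training and CV error.

    j_target is the error you would consider acceptable (e.g., an
    estimate of human-level performance on the task).
    """
    high_train = j_train > j_target   # cannot even fit the training set well
    big_gap = j_cv > 2 * j_train      # CV error much worse than training error
    if high_train and not big_gap:
        return "high bias (underfit): J_train high, J_cv ~ J_train"
    if not high_train and big_gap:
        return "high variance (overfit): J_train low, J_cv >> J_train"
    if high_train and big_gap:
        return "both high bias and high variance"
    return "looks fine: both errors near the target"

print(diagnose(j_train=0.15, j_cv=0.17, j_target=0.05))  # -> high bias
```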

 

 

2.2 Regularization and bias/variance

Linear regression with regularization:

E.g., model:

Figure 9 (from Coursera Week 6, Regularization and bias/variance)

Choosing the regularization parameter λ:

E.g.:

Figure 10 (from Coursera Week 6, Regularization and bias/variance)

Try a range of values (e.g., λ = 0, 0.01, 0.02, 0.04, ..., 10.24), train a model for each, and choose the λ with the lowest J_CV(θ). Note that J_train(θ) and J_CV(θ) are computed without the regularization term.

Plotting bias/variance as a function of the regularization parameter λ: small λ tends toward high variance (overfitting), large λ toward high bias (underfitting).

Figure 11 (from Coursera Week 6, Regularization and bias/variance)
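A minimal sketch for regularized linear regression via the normal equation; the λ grid matches the doubling sequence on the course slide, while the function names and data layout are mine.

```python
import numpy as np

def fit_regularized(X, y, lam):
    """theta = (X'X + lam*L)^-1 X'y, where L is the identity with the
    bias entry zeroed so the intercept is not regularized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def j_unregularized(theta, X, y):
    """J_train / J_cv are computed WITHOUT the regularization term."""
    r = X @ theta - y
    return float((r @ r) / (2 * len(y)))

# Candidate values from the slide: 0, then 0.01 doubling up to 10.24.
lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]

def pick_lambda(X_tr, y_tr, X_cv, y_cv):
    thetas = {lam: fit_regularized(X_tr, y_tr, lam) for lam in lambdas}
    return min(lambdas, key=lambda lam: j_unregularized(thetas[lam], X_cv, y_cv))
```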

2.3 Learning curves

Figure 12 (from Coursera Week 6, Learning curves)

(To plot a learning curve, the model is trained on artificially reduced training sets, so the m on the horizontal axis is usually smaller than the total number of examples available.)

High bias: J_train(θ) and J_CV(θ) both plateau at a high value, close to each other. If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.

Figure 13 (from Coursera Week 6, Learning curves)

High variance: there is a large gap between J_train(θ) (low) and J_CV(θ) (high). If a learning algorithm is suffering from high variance, getting more training data is likely to help.

Figure 14 (from Coursera Week 6, Learning curves)
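A generic sketch of the procedure behind these plots: train on the first m examples for increasing m, and record the error both on those m examples and on the full CV set. The fit/cost callables are placeholders for whatever model you are diagnosing.

```python
def learning_curve(X_tr, y_tr, X_cv, y_cv, fit, cost):
    """Return (sizes, J_train, J_cv) for plotting a learning curve.

    fit(X, y) -> theta trains the model; cost(theta, X, y) is the
    unregularized error used for diagnosis.
    """
    sizes, j_train, j_cv = [], [], []
    for m in range(1, len(y_tr) + 1):
        theta = fit(X_tr[:m], y_tr[:m])                  # train on first m examples
        sizes.append(m)
        j_train.append(cost(theta, X_tr[:m], y_tr[:m]))  # error on those m only
        j_cv.append(cost(theta, X_cv, y_cv))             # error on the whole CV set
    return sizes, j_train, j_cv
```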

2.4 Deciding what to do next

Debugging a learning algorithm:

Suppose you have implemented regularized linear regression to predict housing prices. However, when you test your hypothesis on a new set of houses, you find that it makes unacceptably large errors in its predictions. What should you try next?

- Get more training examples -----------> fixes high variance

- Try smaller sets of features -----------> fixes high variance

- Try getting additional features -----------> fixes high bias

- Try adding polynomial features -----------> fixes high bias

- Try decreasing λ -----------> fixes high bias

- Try increasing λ -----------> fixes high variance

Figure 15 (from Coursera Week 6, Deciding what to do next)

Using a large neural network usually performs better; if it overfits, use regularization.

 

3. Building a spam classifier

3.1 Prioritizing what to work on: spam classification example

Building a spam classifier:

Supervised learning: x = features of the email, y = spam (1) or not spam (0).

Features x: choose 100 words indicative of spam/not spam.

E.g., deal, buy, discount, andrew, now, ...

Figure 16 (from Coursera Week 6, Prioritizing what to work on: spam classification example)

Note: in practice, take the n most frequently occurring words (n = 10,000 to 50,000) in the training set, rather than manually picking 100 words.
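A minimal sketch of this feature mapping; the toy vocabulary is hypothetical, and in practice the vocabulary would come from the most frequent words in the training set as noted above.

```python
import re

def email_features(email_text, vocabulary):
    """Binary feature vector: x_j = 1 iff vocabulary word j occurs in the email."""
    words = set(re.findall(r"[a-z]+", email_text.lower()))
    return [1 if w in words else 0 for w in vocabulary]

vocab = ["deal", "buy", "discount", "andrew", "now"]  # toy stand-in vocabulary
print(email_features("Buy now!! Huge discount on every deal.", vocab))
# -> [1, 1, 1, 0, 1]
```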

How should you spend your time to make it have low error?

- Collect lots of data

- Develop sophisticated features based on email routing information (from the email header)

- Develop sophisticated features for the message body. E.g., should "discount" and "discounted" be treated as the same word?

- Develop sophisticated algorithms to detect misspellings.

3.2 Error analysis

m_CV = 500 examples in the cross-validation set.

The algorithm misclassifies 100 emails.

Manually examine the 100 errors, and categorize them based on:

(1) what type of email it is

(2) what features you think would have helped the algorithm classify them correctly:

- Deliberate misspellings

- Unusual email routing (where the email came from)

- Unusual punctuation

The importance of numerical evaluation:

Should discount/discounts/discounted/discounting be treated as the same word?

Can use "stemming" software (e.g., the Porter stemmer).

Error analysis may not be helpful for deciding whether this is likely to improve performance. The only solution is to try it and see whether it works.

You need a numerical evaluation (e.g., cross-validation error) of the algorithm's performance with and without stemming.

Perform error analysis on the CV set, not the test set.
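As an illustration, the stemming decision might look like this with NLTK's PorterStemmer (my choice of library; build_and_score is a hypothetical helper standing in for the full train-and-evaluate pipeline):

```python
from nltk.stem.porter import PorterStemmer  # pip install nltk

stemmer = PorterStemmer()

# All four variants collapse to the same stem:
print({stemmer.stem(w) for w in
       ["discount", "discounts", "discounted", "discounting"]})
# -> {'discount'}

# The decision is then numerical rather than intuitive: build the
# features both ways, train both models, keep the lower CV error.
# err_plain   = build_and_score(stem=False)   # hypothetical helper
# err_stemmed = build_and_score(stem=True)
# use_stemming = err_stemmed < err_plain
```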

 

4. Handling skewed data

4.1 Error metrics for skewed classes

Cancer classification example.

Train a logistic regression model h_θ(x) (y = 1 if cancer, y = 0 otherwise).

Suppose you find that you got 1% error on the test set.

But only 0.05% of patients have cancer. When the ratio of positive to negative examples is this extreme, the classes are called skewed classes.

On skewed classes, accuracy is usually not a meaningful metric (always predicting y = 0 would already achieve 0.05% error here), so we need a different error metric.

Precision/Recall:

y = 1 in the presence of the rare class that we want to detect.

Precision = TP / (TP + FP): of all the examples where we predicted y = 1, what fraction actually is positive?

Recall = TP / (TP + FN): of all the examples that actually are positive, what fraction did we correctly detect?

Figure 17 (from Coursera Week 6, Error metrics for skewed classes)
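A minimal sketch of the two metrics from the confusion-matrix counts (treating y = 1 as the rare positive class, as above):

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision and recall with y = 1 as the rare class we want to detect."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted 1s, how many real?
    recall = tp / (tp + fn) if tp + fn else 0.0      # of real 1s, how many found?
    return precision, recall

# The degenerate "always predict y = 0" classifier scores high accuracy on
# skewed data, but its recall of 0 exposes it immediately.
```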

4.2 Trading off precision and recall

Logistic regression: 0 ≤ h_θ(x) ≤ 1

Predict 1 if h_θ(x) ≥ 0.5

Predict 0 if h_θ(x) < 0.5

(1) Suppose we want to predict y = 1 (cancer) only if we are very confident:

------> predict 1 if h_θ(x) ≥ 0.7, predict 0 if h_θ(x) < 0.7

------> higher precision, lower recall

(2) Suppose we want to avoid missing too many cases of cancer (avoid false negatives):

------> predict 1 if h_θ(x) ≥ 0.3, predict 0 if h_θ(x) < 0.3

------> lower precision, higher recall

Varying the threshold traces out a precision-recall (P-R) curve:

Figure 18 (from Coursera Week 6, Trading off precision and recall)

F1 score (F score):

How do you compare different precision/recall numbers? The simple average (P + R)/2 is a poor measure, since a degenerate classifier can score well on it; instead use the F1 score, F1 = 2PR / (P + R), which is high only when both precision and recall are reasonably high.

Figure 19 (from Coursera Week 6, Trading off precision and recall)

Note: compute precision, recall, and F1 on the CV set when choosing the threshold.
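A sketch of that threshold selection; it reuses precision_recall() from the sketch in section 4.1, and the grid of candidate thresholds is my own choice.

```python
import numpy as np

def f1(p, r):
    """F1 = 2PR / (P + R), defined as 0 when both are 0."""
    return 2 * p * r / (p + r) if p + r else 0.0

def best_threshold(h_cv, y_cv, thresholds=np.linspace(0.05, 0.95, 19)):
    """Pick the decision threshold that maximizes F1 on the CV set.

    h_cv holds the hypothesis outputs h(x) on the CV examples;
    precision_recall() is the helper from section 4.1.
    """
    def score(t):
        return f1(*precision_recall(y_cv, (h_cv >= t).astype(int)))
    return max(thresholds, key=score)
```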

 

5. Using large data sets

(1) The features x contain sufficient information to predict y accurately.

Useful test: given the input x, can a human expert confidently predict y?

(2) Use a learning algorithm with many parameters (low bias, so J_train(θ) will be small).

(3) Use a very large training set (unlikely to overfit, so J_train(θ) ≈ J_test(θ)).

Together, (2) and (3) imply that J_test(θ) will also be small.

 
