Hands-On Machine Learning - Chapter 1 - The ML Landscape - Exercises

1. How would you define Machine Learning?

Machine Learning is about building systems that can learn from data. Learning means getting better at some task, given some performance measure.

2. Can you name four types of problems where it shines?

Machine Learning is great for complex problems for which we have no algorithmic solution, for replacing long lists of hand-tuned rules, for building systems that adapt to fluctuating environments, and finally for helping humans learn (e.g., data mining).

3. What’s a labeled training set?

A labeled training set is a training set that contains the desired solution (a.k.a. a label) for each instance.

4. What are the two most common supervised tasks?

Regression and classification.

5. Can you name four common unsupervised tasks?

Clustering, visualization, dimensionality reduction, and association rule learning.

6. What type of Machine Learning algorithm would you use to allow a robot to walk in various unknown terrains?

Reinforcement Learning is likely to perform best if we want a robot to learn to walk in various unknown terrains since this is typically the type of problem that Reinforcement Learning tackles. It might be possible to express the problem as a supervised or semi-supervised learning problem, but it would be less natural.

7. What type of algorithm would you use to segment your customers into multiple groups?

If you don’t know how to define the groups, then you can use a clustering algorithm (unsupervised learning) to segment your customers into clusters of similar customers. However, if you know what groups you would like to have, then you can feed many examples of each group to a classification algorithm (supervised learning), and it will classify all your customers into these groups.
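As a rough illustration of the clustering route, here is a minimal k-means sketch on a single feature. The spending figures, the choice of three groups, and the starting centers are all made up for the example; a real segmentation would use more features and a library implementation.

```python
def k_means_1d(values, centers, n_iters=10):
    """Plain k-means on one feature.

    Repeats two steps: assign each value to its nearest center,
    then move each center to the mean of the values assigned to it.
    Returns the final centers and the clusters from the last assignment.
    """
    clusters = []
    for _ in range(n_iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical yearly spend per customer (no labels are given).
spend = [120.0, 150.0, 130.0, 900.0, 950.0, 1000.0, 5000.0, 5200.0]

centers, clusters = k_means_1d(spend, centers=[0.0, 1000.0, 4000.0])
for center, members in zip(centers, clusters):
    print(f"center {center:.1f}: {members}")
```

No group definitions were supplied here; the three segments emerge from the data alone, which is what makes this unsupervised.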

8. Would you frame the problem of spam detection as a supervised learning problem or an unsupervised learning problem?

Spam detection is a typical supervised learning problem: the algorithm is fed many emails along with their label (spam or not spam).

9. What is an online learning system?

An online learning system can learn incrementally, as opposed to a batch learning system. This makes it capable of adapting rapidly to both changing data and autonomous systems, and of training on very large quantities of data.

10. What is out-of-core learning?

Out-of-core algorithms can handle vast quantities of data that cannot fit in a computer’s main memory. An out-of-core learning algorithm chops the data into mini-batches and uses online learning techniques to learn from these mini-batches.
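The mini-batch idea can be sketched in a few lines. In this example the generator `mini_batches` stands in for reading chunks from disk, and the data (y = 3x + 2 plus noise), the batch size, and the learning rate are assumptions chosen for illustration, not part of the original answer.

```python
import random

def mini_batches(n_samples, batch_size, seed=0):
    """Yield mini-batches of (x, y) pairs one at a time, as if streamed from disk.

    Hypothetical data source: y = 3x + 2 plus small Gaussian noise.
    """
    rng = random.Random(seed)
    for start in range(0, n_samples, batch_size):
        batch = []
        for _ in range(min(batch_size, n_samples - start)):
            x = rng.uniform(-1.0, 1.0)
            y = 3.0 * x + 2.0 + rng.gauss(0.0, 0.1)
            batch.append((x, y))
        yield batch

def partial_fit(w, b, batch, lr=0.1):
    """One gradient step on the mean squared error of a single mini-batch."""
    n = len(batch)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return w - lr * grad_w, b - lr * grad_b

w, b = 0.0, 0.0
for batch in mini_batches(n_samples=20000, batch_size=20):
    # Only the current batch is held in memory; earlier batches are discarded.
    w, b = partial_fit(w, b, batch)

print(f"w = {w:.2f}, b = {b:.2f}")  # should end up close to 3 and 2
```

The model is updated incrementally from each batch, so the full dataset never needs to fit in memory at once.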

11. What type of learning algorithm relies on a similarity measure to make predictions?

An instance-based learning system learns the training data by heart; then, when given a new instance, it uses a similarity measure to find the most similar learned instances and uses them to make predictions.
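A k-nearest-neighbors classifier is one common example of this approach. The sketch below assumes Euclidean distance as the similarity measure and uses a tiny made-up dataset; both are illustrative choices.

```python
import math
from collections import Counter

def knn_predict(training_set, new_point, k=3):
    """Classify new_point by majority vote among its k nearest stored instances."""
    # Similarity measure: Euclidean distance (smaller means more similar).
    by_distance = sorted(training_set,
                         key=lambda item: math.dist(item[0], new_point))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# "Learning by heart": the model is simply the stored labeled instances.
training_set = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.5), "small"),
    ((6.0, 6.5), "large"), ((7.0, 6.0), "large"), ((6.5, 7.5), "large"),
]

print(knn_predict(training_set, (1.8, 1.2)))  # prints "small"
print(knn_predict(training_set, (6.2, 6.8)))  # prints "large"
```

There is no training step beyond storing the data; all the work happens at prediction time.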

12. What is the difference between a model parameter and a learning algorithm’s hyper-parameter?

A model has one or more model parameters that determine what it will predict given a new instance (e.g., the slope of a linear model). A learning algorithm tries to find optimal values for these parameters such that the model generalizes well to new instances. A hyper-parameter is a parameter of the learning algorithm itself, not of the model (e.g., the amount of regularization to apply).

13. What do model-based learning algorithms search for? What is the most common strategy they use to succeed? How do they make predictions?

Model-based learning algorithms search for an optimal value for the model parameters such that the model will generalize well to new instances. We usually train such systems by minimizing a cost function that measures how bad the system is at making predictions on the training data, plus a penalty for model complexity if the model is regularized. To make predictions, we feed the new instance’s features into the model’s prediction function, using the parameter values found by the learning algorithm.
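These three pieces (prediction function, cost function with a penalty, and parameter search) can be sketched for a one-feature linear model. Gradient descent, the L2 penalty on the slope, and the values of `alpha`, `lr`, and `n_steps` are assumptions made for this example. Note that `theta0` and `theta1` are model parameters, while `alpha` is a hyper-parameter fixed before training.

```python
def predict(theta0, theta1, x):
    """Prediction function of a linear model with parameters theta0 and theta1."""
    return theta0 + theta1 * x

def cost(theta0, theta1, data, alpha):
    """Mean squared error on the training data plus an L2 penalty on the slope."""
    mse = sum((predict(theta0, theta1, x) - y) ** 2 for x, y in data) / len(data)
    return mse + alpha * theta1 ** 2

def train(data, alpha, lr=0.05, n_steps=5000):
    """Search for the parameter values that minimize the cost, by gradient descent."""
    theta0, theta1 = 0.0, 0.0
    n = len(data)
    for _ in range(n_steps):
        errors = [(predict(theta0, theta1, x) - y, x) for x, y in data]
        grad0 = 2.0 * sum(e for e, _ in errors) / n
        grad1 = 2.0 * sum(e * x for e, x in errors) / n + 2.0 * alpha * theta1
        theta0 -= lr * grad0
        theta1 -= lr * grad1
    return theta0, theta1

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # exactly y = 2x + 1

plain = train(data, alpha=0.0)        # no regularization
regularized = train(data, alpha=1.0)  # penalty pulls the slope toward zero

print("plain:      ", plain)
print("regularized:", regularized)
print("prediction for x = 4:", predict(*plain, 4.0))
```

With `alpha=0.0` the search recovers the line that generated the data; with a positive `alpha` the slope is smaller, which is the effect of the complexity penalty.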

14. Can you name four of the main challenges in Machine Learning?

Some of the main challenges in ML are the lack of data, poor data quality, non-representative data, uninformative features, excessively simple models that underfit the training data, and excessively complex models that overfit the data.

15. If your model performs great on the training data but generalizes poorly to new instances, what is happening? Can you name three possible solutions?

If a model performs great on the training data but generalizes poorly to new instances, the model is likely overfitting the training data (or we got extremely lucky on the training data). Possible solutions to overfitting are getting more data, simplifying the model (selecting a simpler algorithm, reducing the number of parameters or features used, or regularizing the model), or reducing the noise in the training data.

16. What is a test set and why would you want to use it?

A test set is used to estimate the generalization error that a model will make on new instances, before the model is launched in production.

17. What is the purpose of a validation set?

A validation set is used to compare models. It makes it possible to select the best model and tune the hyper-parameters.

18. What can go wrong if you tune hyper-parameters using the test data set?

If you tune hyper-parameters using the test data set, you risk overfitting the test set, and the generalization error you measure will be optimistic (you may launch a model that performs worse than you expect).

19. What is cross-validation and why would you prefer it to a validation set?

Cross-validation is a technique that makes it possible to compare models (for model selection and hyper-parameter tuning) without the need for a separate validation set. This saves precious training data.
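A minimal k-fold sketch of this idea follows. The two candidate models (a constant baseline and a least-squares line), the synthetic data, and the choice of five folds are assumptions made for the example.

```python
def fit_mean(train):
    """Baseline model: always predict the mean training target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Least-squares line through the training points (closed form, one feature)."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def cross_val_mse(fit, data, k=5):
    """Average validation MSE over k folds; each point is validated exactly once."""
    fold_errors = []
    for i in range(k):
        valid = data[i::k]  # every k-th point, starting at offset i
        train = [p for j, p in enumerate(data) if j % k != i]
        model = fit(train)
        fold_errors.append(sum((model(x) - y) ** 2 for x, y in valid) / len(valid))
    return sum(fold_errors) / k

# Hypothetical data: roughly y = 2x + 1 with a small alternating wobble.
data = [(float(x), 2.0 * x + 1.0 + (0.3 if x % 2 else -0.3)) for x in range(20)]

scores = {
    "mean baseline": cross_val_mse(fit_mean, data),
    "linear model": cross_val_mse(fit_line, data),
}
best = min(scores, key=scores.get)
print(scores)
print("selected:", best)
```

Every point serves as training data in some folds and as validation data in exactly one, so no data has to be permanently set aside for validation.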
