Andrew Ng's "Machine Learning": Concepts (Part 1)

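I have started Andrew Ng's machine learning course. The all-English lectures, quizzes, and programming assignments are quite a challenge; I hope to finish the course and earn the certificate soon.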


Definition of Machine Learning


T is the task that the machine learning program is trying to perform.
Quiz:

“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?

A. Classify emails as spam or not spam.

B. Watching you label emails as spam or not spam.

C. The number (or fraction) of emails correctly classified as spam/not spam.

D. None of the above, this is not a machine learning algorithm.

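Question: what is the task in this setting?
Correct answer:

A. Classifying emails as spam or not spam
is the task T.

B. Watching you label emails as spam or not spam
is the experience E.

C. The number (or fraction) of emails correctly classified as spam/not spam
is the performance measure P.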
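To make the three roles concrete, here is a minimal sketch of a word-counting spam filter. It is not from the course; the function names (`train`, `classify`, `accuracy`) and the tiny email lists are made up for illustration. The point is only that P (accuracy) on T (classifying) goes up once the program has seen E (labeled emails).

```python
from collections import Counter

def train(labeled_emails):
    """Experience E: count how often each word appears in spam vs. non-spam."""
    spam_words, ham_words = Counter(), Counter()
    for text, is_spam in labeled_emails:
        (spam_words if is_spam else ham_words).update(text.lower().split())
    return spam_words, ham_words

def classify(model, text):
    """Task T: predict True (spam) or False (not spam) for one email."""
    spam_words, ham_words = model
    words = text.lower().split()
    spam_score = sum(spam_words[w] for w in words)
    ham_score = sum(ham_words[w] for w in words)
    return spam_score > ham_score

def accuracy(model, labeled_emails):
    """Performance measure P: fraction of emails classified correctly."""
    correct = sum(classify(model, text) == label for text, label in labeled_emails)
    return correct / len(labeled_emails)

# Hypothetical data: emails the user has already marked (True = spam).
experience = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]
# Hypothetical held-out emails used only to measure P.
test_set = [
    ("claim your free money", True),
    ("monday meeting notes", False),
    ("free prize inside", True),
    ("agenda for lunch", False),
]

before = accuracy(train([]), test_set)          # no experience yet
after = accuracy(train(experience), test_set)   # after watching the labels
print(before, after)
```

With no experience the filter marks everything as "not spam", so it only gets the non-spam half of the test set right; after training on the labeled emails it classifies all four test emails correctly.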


Detailed Definition

Two definitions of Machine Learning are offered. Arthur Samuel described it as: “the field of study that gives computers the ability to learn without being explicitly programmed.” This is an older, informal definition.
Tom Mitchell provides a more modern definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Example: playing checkers.
E = the experience of playing many games of checkers.
T = the task of playing checkers.
P = the probability that the program will win the next game.

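Machine learning takes a dataset and a learning algorithm: from the input training data x and the already-labeled results y, the algorithm learns a function that maps x to y.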

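As a rough illustration of "learning a function from x and y", the sketch below fits a straight line by ordinary least squares. This is my own toy example, not code from the course: the helper name `fit_line` and the four data points are assumptions chosen so the arithmetic is easy to check.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept from labeled pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The learned function: takes a new x and returns a predicted y.
    return lambda x: slope * x + intercept

# Hypothetical training set: inputs x with already-labeled results y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

h = fit_line(xs, ys)
print(h(5.0))  # prediction for an input that was not in the training set
```

Because the toy data lies exactly on y = 2x + 1, the learned function predicts 11 for x = 5.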

Types of Machine Learning Problems

Supervised and Unsupervised Learning

In general, any machine learning problem can be assigned to one of two broad classifications:
supervised learning and unsupervised learning.


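Supervised learning is given a dataset for which the correct answers are already known, and the algorithm uses it to predict the answers for new data.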

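Regression problems: predict a continuous value.
Classification problems: the answer is a discrete value such as 0, 1, 2, or 3.
Features:
The examples, described by two (or n) features, are plotted in feature space, with a different symbol standing for each class.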
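Here is a small sketch of a classification problem with two features and a discrete answer. I use a 1-nearest-neighbor rule purely because it is short; it is not the method taught at this point in the course, and the function name `nearest_neighbor` and the data points are hypothetical.

```python
def nearest_neighbor(train, point):
    """Return the label of the training example closest to `point`."""
    def sq_dist(a, b):
        # Squared Euclidean distance; works for 2 or n features.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    features, label = min(train, key=lambda example: sq_dist(example[0], point))
    return label

# Hypothetical labeled examples: (feature_1, feature_2) -> class 0 or 1.
train = [
    ((1.0, 1.0), 0), ((1.5, 2.0), 0), ((2.0, 1.0), 0),
    ((6.0, 6.0), 1), ((7.0, 5.5), 1), ((6.5, 7.0), 1),
]

print(nearest_neighbor(train, (1.2, 1.4)))  # falls among the class-0 points
print(nearest_neighbor(train, (6.8, 6.1)))  # falls among the class-1 points
```

The output is always one of the labels seen in training (0 or 1 here), never a value in between, which is what separates classification from regression.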


Quiz:

You’re running a company, and you want to develop learning algorithms to address each of two problems. Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised. Should you treat these as classification or as regression problems?

A. Treat both as classification problems.

B. Treat problem 1 as a classification problem, problem 2 as a regression problem.

C. Treat problem 1 as a regression problem, problem 2 as a classification problem.

D. Treat both as regression problems.

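Correct answer:

C. Predicting a continuous value is a regression problem; predicting a discrete value is a classification problem. The number of items sold is treated as a continuous quantity, while "hacked or not" is a discrete label.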

Unsupervised learning: the dataset carries no positive/negative labels. Clustering is the typical case.
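To show what "no labels" looks like in practice, here is a minimal k-means sketch in plain Python. k-means is one common clustering algorithm, picked here as an illustration; the function name `kmeans`, the starting centroids, and the six points are assumptions, not material from the notes above.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical unlabeled data: note there is no y, only points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

clusters, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
print(centroids)
```

The algorithm is never told which points belong together; it ends up with the three points near (1, 1) in one group and the three near (8, 8) in the other purely from their positions.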


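An application of unsupervised learning: taking a recording in which a voice is mixed with background music and separating it into the background music and the human voice.
In the course, Octave is used to solve this separation problem.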


Quiz:

Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.)

A. Given email labeled as spam/not spam, learn a spam filter.
B. Given a set of news articles found on the web, group them into sets of articles about the same stories.
C. Given a database of customer data, automatically discover market segments and group customers into different market segments.
D. Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.

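Correct answer:

B and C.

Unsupervised learning works on data that has no labels and discovers the groups automatically.
For example (option C): given a database of customer data, automatically discover market segments and group customers into them. Options A and D come with labels (spam/not spam, diabetes or not), so they are supervised learning problems.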

Quiz:

Suppose you are working on stock market prediction. You would like to predict whether or not a certain company will declare bankruptcy within the next 7 days (by training on data of similar companies that had previously been at risk of bankruptcy). Would you treat this as a classification or a regression problem? (A)

A. Classification

B. Regression

Which of these is a reasonable definition of machine learning? (B)

A. Machine learning learns from labeled data.

B. Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.

C. Machine learning is the science of programming computers.

D. Machine learning is the field of allowing robots to act intelligently.
