Machine Learning Study Notes: 1.1.1.2.2 Supervised learning part 1

预见未来to50

Published 2024-09-13 20:53:01

Category: Machine Learning, Deep Learning (ML/DL). Tags: artificial intelligence, deep learning, machine learning

Article link: https://blog.csdn.net/hpdlzu80100/article/details/142219943

Machine learning is creating tremendous economic value today. I think 99 percent of the economic value created by machine learning today is through one type of machine learning, which is called supervised learning. Let's take a look at what that means. Supervised machine learning, or more commonly supervised learning, refers to algorithms that learn x to y, or input to output, mappings. The key characteristic of supervised learning is that you give your learning algorithm examples to learn from. Those examples include the right answers, where by right answer I mean the correct label y for a given input x. It is by seeing correct pairs of input x and desired output label y that the learning algorithm eventually learns to take just the input alone, without the output label, and give a reasonably accurate prediction or guess of the output.

Let's look at some examples. If the input x is an email and the output y is whether this email is spam or not spam, this gives you your spam filter. Or if the input is an audio clip and the algorithm's job is to output the text transcript, then this is speech recognition. Or if you want to input English and have it output the corresponding Spanish, Arabic, Hindi, Chinese, Japanese, or some other translation, then that's machine translation. Or the most lucrative form of supervised learning today is probably the one used in online advertising. Nearly all the large online ad platforms have a learning algorithm that inputs some information about an ad and some information about you and then tries to figure out if you will click on that ad or not. Because every click is revenue for these large online ad platforms, showing you ads that you're just slightly more likely to click on actually drives a lot of revenue for these companies.

This is something I once did a lot of work on, maybe not the most inspiring application, but it certainly has a significant economic impact in some countries today. Or if you want to build a self-driving car, the learning algorithm would take as input an image and some information from other sensors, such as a radar or other things, and then try to output the position of, say, other cars so that your self-driving car can safely drive around the other cars. Or take manufacturing. I've actually done a lot of work in this sector at Landing AI. You can have a learning algorithm take as input a picture of a manufactured product, say a cell phone that just rolled off the production line, and have the learning algorithm output whether or not there is a scratch, dent, or other defect in the product. This is called visual inspection, and it's helping manufacturers reduce or prevent defects in their products.

In all of these applications, you will first train your model with examples of inputs x and the right answers, that is, the labels y. After the model has learned from these input, output, or x and y pairs, it can then take a brand new input x, something it has never seen before, and try to produce the appropriate corresponding output y.
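The train-then-predict pattern described above can be sketched in a few lines of Python. This is only an illustration, not anything shown in the lecture: the two email features (number of links, number of exclamation marks), the six labeled examples, and the nearest-neighbor rule are all assumptions chosen to keep the sketch tiny.

```python
from math import dist

# Hypothetical labeled examples: x = (links in email, exclamation marks), y = label.
training_pairs = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((0, 1), "not spam"),
    ((7, 5), "spam"),
    ((9, 3), "spam"),
    ((6, 8), "spam"),
]

def predict(x_new, pairs):
    """Return the label y of the training input x that lies closest to x_new."""
    _, nearest_label = min(pairs, key=lambda pair: dist(pair[0], x_new))
    return nearest_label

# A brand new input x that does not appear among the training examples.
new_email = (8, 6)
prediction = predict(new_email, training_pairs)
print(prediction)
```

The "learning" here is just remembering the labeled pairs; real spam filters use far richer inputs and models, but the shape is the same: correct (x, y) pairs go in, and a guess for y comes out for an unseen x.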

Let's dive more deeply into one specific example. Say you want to predict housing prices based on the size of the house. You've collected some data and say you plot the data and it looks like this. Here on the horizontal axis is the size of the house in square feet. Yes, I live in the United States where we still use square feet. I know most of the world uses square meters. Here on the vertical axis is the price of the house in, say, thousands of dollars. With this data, let's say a friend wants to know what's the price for their 750 square foot house. How can the learning algorithm help you? One thing a learning algorithm might be able to do is, say, fit a straight line to the data, and reading off the straight line, it looks like your friend's house could be sold for maybe about, I don't know, $150,000. But fitting a straight line isn't the only learning algorithm you can use. There are others that could work better for this application. For example, instead of fitting a straight line, you might decide that it's better to fit a curve, a function that's slightly more complicated or more complex than a straight line. If you do that and make a prediction here, then it looks like, well, your friend's house could be sold for closer to $200,000.
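To make the line-versus-curve idea concrete, here is a minimal sketch in plain Python. The data points are made up (the plot from the lecture is not reproduced in these notes), so the printed prices will not match the $150,000 and $200,000 read off the slide. The curve is assumed to have the form w * sqrt(size) + b purely for illustration; the lecture does not say which curve was fitted.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares fit of y = w * x + b; returns (w, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

# Made-up training data: house size in square feet, price in thousands of dollars.
sizes = [500, 1000, 1500, 2000, 2500, 3000]
prices = [100, 230, 290, 340, 370, 390]

# Model 1: a straight line, price = w * size + b.
w_line, b_line = fit_line(sizes, prices)

# Model 2: a curve, price = w * sqrt(size) + b, fitted as a line in sqrt(size).
w_curve, b_curve = fit_line([sqrt(s) for s in sizes], prices)

# Predict the price of the friend's 750 square foot house with both models.
friend_size = 750
line_price = w_line * friend_size + b_line
curve_price = w_curve * sqrt(friend_size) + b_curve
print(f"straight line: about ${line_price:.0f}k")
print(f"curve:         about ${curve_price:.0f}k")
```

Both models are trained on the same labeled houses, yet they give different answers for the same 750 square foot input, which is exactly why the choice of function matters.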

One of the things you'll see later in this class is how you can decide whether to fit a straight line, a curve, or another function that is even more complex to the data. Now, it doesn't seem appropriate to pick the one that gives your friend the best price, but one thing you'll see is how to get an algorithm to systematically choose the most appropriate line or curve or other thing to fit to this data. What you've seen in this slide is an example of supervised learning, because we gave the algorithm a dataset in which the so-called right answer, that is, the label or the correct price y, is given for every house on the plot. The task of the learning algorithm is to produce more of these right answers, specifically predicting what is the likely price for other houses like your friend's house. That's why this is supervised learning. To define a little bit more terminology, this housing price prediction is a particular type of supervised learning called regression. By regression, I mean we're trying to predict a number from infinitely many possible numbers, such as the house prices in our example, which could be 150,000 or 70,000 or 183,000 or any other number in between. That's supervised learning: learning input to output, or x to y, mappings.
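The lecture leaves the actual selection method for later in the course. One common approach, assumed here and not taken from the video, is to score each candidate on labeled houses that were held out from fitting and keep the one with the smaller error. In the sketch below, both candidate models, their coefficients, and the three held-out houses are hypothetical.

```python
# Two hypothetical, already-fitted candidates: size in sq ft -> price in $1000s.
def line_model(size):
    return 0.11 * size + 95

def curve_model(size):
    return 8.8 * size ** 0.5 - 69

# Made-up labeled houses (size, price) that were NOT used to fit either model.
held_out = [(800, 190), (1800, 320), (2800, 385)]

def mean_squared_error(model, pairs):
    """Average squared gap between predicted and correct price."""
    return sum((model(x) - y) ** 2 for x, y in pairs) / len(pairs)

candidates = {"line": line_model, "curve": curve_model}
errors = {name: mean_squared_error(m, held_out) for name, m in candidates.items()}

# Choose by prediction error on known answers, not by which price sounds best.
best = min(errors, key=errors.get)
print(errors, "->", best)
```

Because each prediction is a real number rather than one of a few categories, the error can be measured as a numeric distance from the right answer, which is what makes this a regression problem.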

You saw in this video an example of regression where the task is to predict a number. But there's also a second major type of supervised learning problem called classification. Let's take a look at what that means in the next video.
