Supervised Learning: What Does It Entail?

When we think of a machine, we often think of it in the engineering sense — an actual physical device (with moving parts) that makes some work easier. In machine learning, we use the term machine much more liberally, such as in the support vector machine or the restricted Boltzmann machine — do not worry about these for now. Luckily, none of these machines comes anywhere close to the kind we see in the Terminator movies or the Marvel Cinematic Universe.

Instead, what we refer to as a machine is often an unassuming computer programme that you may feed with some kind of data, and that would, in turn, be able to make some predictions about the future, derive some insights about the past, or make some optimal decisions. Such a computer programme may be stored on your PC or your smartphone, or in the brain of a robot — it really doesn’t matter where — and it would still be a machine, regardless. The most basic ingredient, however, is data.

This data could come in many diverse forms: it could be data obtained from a survey or a poll, a physical or chemical experiment, medical records or diagnostics, images of food on the internet, or one’s Facebook posts, really. The data could just as well be biometrics such as one’s fingerprints. For example, you may recall getting a new smartphone and having to set up fingerprint recognition. You provide the computer programme or machine residing inside the phone with your fingerprint data (including prints you rotate and deliberately distort); the machine then identifies a pattern in your fingerprint data that distinguishes it from everybody else’s; subsequently, it is able to predict whether any new fingerprint belongs to you or to an intruder. This is the stuff of a subfield of machine learning known as semi-supervised learning, which combines elements of supervised and unsupervised learning principles. In this post, we will focus only on supervised learning.

To think more broadly of supervised learning, it may be useful to imagine this dialogue with a much younger sibling who encounters a dog for the first time on TV.

“What is this?” your sibling asks you, pointing to a group of dogs on the programme.

“It’s what we call a dog,” you respond.

The innocent child is content, because this is the first time they’re seeing this animal; they can’t disagree with you, at least until one week later when you’re watching the same TV programme again, and they see some cats.

“Look! Here’s a group of small dogs,” they say.

“No, those are cats,” you say, smiling, seeing the confusion on their face. Yet, your sibling raises no objections, because they probably reason, not so incorrectly, that a dog is generally large, and a cat is generally small… until the following week when you watch this programme again, and they see a group of puppies.

“Hey look, here’s a group of brown cats,” your young sibling says.

You smile again. “Those are actually dogs, believe it or not,” you say.

Now they don’t know if you’re messing with them or not, so they lean in closer toward the TV, and then observe that the dogs have prominent snouts, while the cats they saw the week before had more or less flat faces. That must be it, the child decides.

Several things stand out from this analogy: first, and in fact the main thing that distinguishes supervised learning from other fields of machine learning, is the simple fact that you actually tell your sibling what animal it is whenever they come across one. This may seem a rather trivial distinction, but consider the contrasting scenario where your sibling didn’t have you around: they would probably end up assuming that the universe is populated with dogs, and that a cat is just a small dog. We refer to this paradigm of machine learning as supervised learning because you essentially act like some kind of teacher or supervisor who puts a label or an annotation (i.e., “dog” or “cat”) on any new animal (i.e., data) your sibling (who is acting as our machine) comes across. For this reason, we often refer to the data employed in supervised learning settings as labelled or annotated data. In the fingerprint recognition example, the label used to train the machine to detect an intruder’s prints may not be so obvious. But if one considers it carefully, by giving the machine many examples of your fingerprints, you let it learn to associate your prints with a label that is a binary indicator, i.e., 1 for your fingerprints, and 0 for all other fingerprints, which it did not see during the setup phase of the phone. This falls under yet another subfield known as anomaly detection, since an intruder’s prints are considered anomalies relative to what the machine has come to know.
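
To make the idea of labelled data a little more concrete, here is a minimal sketch, in Python, of how the fingerprint example might be represented. The feature names and numbers are entirely made up for illustration; only the idea of pairing each sample with a binary label comes from the text.

```python
# A purely illustrative sketch of labelled data for the fingerprint example:
# each sample is a (feature vector, label) pair, with label 1 for the phone
# owner's prints and 0 for anyone else's. All values are invented.

labelled_data = [
    # (ridge_count, core_x, core_y)   label
    ((38, 0.42, 0.61), 1),  # one of your enrolled fingerprints
    ((41, 0.45, 0.58), 1),  # the same finger, slightly rotated
    ((23, 0.71, 0.30), 0),  # someone else's print, which should be rejected
]

features = [x for x, y in labelled_data]  # the inputs
labels = [y for x, y in labelled_data]    # the binary targets
```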

The second thing that stands out from the analogy is that the child is never explicitly told precisely what defines a dog or a cat; if they were told, it wouldn’t really be learning, but more like memorising. Instead, they have to figure it out themselves by observing the characteristics of the two animals: they identify the size of the animal, as well as the presence of a prominent snout, as being indicative of the target, i.e., whether the animal is a dog or a cat. These things that help in identifying the animal, i.e., the size of the animal and the presence of a prominent snout, are often referred to as features in machine learning. As you may expect, the set of features that are indicative of the target, i.e., of the animal being a dog or a cat, is not limited to just those two, but can possibly be quite large. For example, a meticulous child may also observe differences in features such as the lengths of the tails of the two animals, the sizes of their ears or the lengths of their paws. All these characteristics may constitute the feature set. In machine learning, we think of the set of all features as a vector, and the dimension or size of this vector (which is just the number of features) is referred to as the dimensionality.
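
As a small sketch of what a feature vector and its dimensionality might look like for the dog/cat analogy (the feature names and values below are invented for illustration):

```python
# A hypothetical feature vector for one animal in the dog/cat analogy.
# Each entry is one feature; the number of entries is the dimensionality.
animal = {
    "height_cm": 18.0,         # size of the animal
    "has_prominent_snout": 0,  # 1 if yes, 0 if no
    "tail_length_cm": 9.0,
    "ear_diameter_cm": 2.5,
}

feature_vector = list(animal.values())  # [18.0, 0, 9.0, 2.5]
dimensionality = len(feature_vector)    # 4 features, so the dimensionality is 4
print(feature_vector, dimensionality)
```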

Eventually, the child learns certain rules about these features on their own — we will see in subsequent posts exactly how this is done — with which they are able to predict, by themselves, whether a given animal is a dog or a cat. Such a rule might be: if the height of the animal is less than twenty centimetres, and it has no prominent snout, and its tail is at most ten centimetres long, and its ear is at most three centimetres in diameter, then it is a cat; otherwise, it’s a dog.
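
That rule translates almost word for word into code. The sketch below is just the hand-written rule from the sentence above expressed as a tiny classifier, with the same thresholds.

```python
def classify(height_cm, has_prominent_snout, tail_length_cm, ear_diameter_cm):
    """The hand-written rule from the analogy: a small, snoutless animal with
    a short tail and small ears is a cat; everything else is a dog."""
    if (height_cm < 20
            and not has_prominent_snout
            and tail_length_cm <= 10
            and ear_diameter_cm <= 3):
        return "cat"
    return "dog"

print(classify(18, False, 9, 2.5))  # -> cat
print(classify(45, True, 30, 6))    # -> dog
```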

Learning such rules about the features is usually only the first of three main phases in machine learning, and is known as the training phase. The second phase involves validating how correct the rules learned in the training phase are, and is known as validation; in this validation stage, we test the learned rules on new or unseen data in the hope of tweaking the rules if they don’t really apply. For example, after your sibling saw the cats, they must have learned a rule like: “a dog is generally large, while a cat is generally small”. However, upon coming across a small dog, they changed the rule and then included the presence of a prominent snout. This is what happens in the validation phase of machine learning — adjusting the learned rules, usually by adjusting certain high-level parameters known as hyperparameters. The third and final phase is known as testing and is very similar to the validation phase, in that the rules learned in the training/validation phases are put to the test again on new or unseen data. However, unlike in the validation phase, there are usually no (or only restricted) avenues to tweak the learned rules at this stage, because the rules (which now constitute the machine) are often deployed in a product such as your smartphone or computer system. There are, of course, systems or machines that are designed so that they are capable of constantly training themselves using the data they encounter even during the testing phase.
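
If the labelled animals lived in arrays rather than on television, the three phases might be set up roughly as in the sketch below, which assumes scikit-learn and NumPy are available and uses made-up data; it simply carves one labelled dataset into training, validation and test portions.

```python
# A rough sketch of splitting labelled data for the three phases.
# X holds feature vectors, y the labels; both are invented here.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))     # 100 animals, 4 features each (made up)
y = rng.integers(0, 2, size=100)  # 0 = cat, 1 = dog (made-up labels)

# First hold out a test set, then split the remainder into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```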

So far, it may not be obvious what makes supervised machine learning challenging, if all it entails is learning rules from features about some targets. (Recall from the analogy that the targets are the labels on the animals, i.e., “dog” or “cat”, and the features are the characteristics of the animals by which we can decide that it is a dog or a cat, e.g., its size, the presence of a prominent snout, etc.) Yet, the peculiarities of many real-world problems for which we wish to employ machine learning are such that: (1) the rules we want to learn from the features about the targets may be rather too complex; (2) the targets and/or features may be noisy; (3) the features on which the rules ought to be based may not even be obvious to us. I will now describe these specific challenges in a little more detail.

First, recall the rule by which your sibling learned to distinguish a dog from a cat: if the height of the animal is less than twenty centimetres, and it has no prominent snout, and its tail is at most ten centimetres long, and its ear is at most three centimetres in diameter, then it is a cat; otherwise, it’s a dog. This rule uses only four features, related to height, snout, tail and ears. Now imagine you had a million features — yes, that’s a realistic number in some machine learning applications such as computer vision — with which to train a machine that can identify all the objects in your house; you can very well imagine that the rules about those million features aren’t going to be as trivial as the one we have seen. Just for the avoidance of doubt, these rules are, in fact, mathematical relationships between the features and the targets. Except for simple machine learning problems, these mathematical relationships are rarely simple comparators like “if the height is greater than twenty centimetres, then it is a dog”, but often involve complex operations such as exponentiation of the features.
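
To give a flavour of what such a mathematical relationship can look like beyond a simple comparator, here is a hedged sketch of a logistic-style rule; it is not taken from the article, and the weights and bias are invented, but it shows how features can be combined and passed through an exponential to produce a probability of “dog”.

```python
import math

def probability_of_dog(features, weights, bias):
    """A logistic-style rule: weight and sum the features, then squash the
    score into (0, 1) using the exponential function. In practice the weights
    and bias are what the machine would have to learn; these are invented."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))

# Features: height_cm, has_prominent_snout, tail_length_cm, ear_diameter_cm
p = probability_of_dog([45, 1, 30, 6], weights=[0.05, 2.0, 0.03, 0.1], bias=-3.0)
print(p)  # roughly 0.94, i.e., very probably a dog
```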

It is often joked that the engineer thinks their equations are an approximation to reality, the physicist thinks reality is an approximation to their equations, and the mathematician just doesn’t care. In coming up with these complex mathematical relationships between the features and the target for any given problem, our machine often balances a tradeoff between being an engineer and being a mathematician. If we didn’t care whether our rules or mathematical relationships were close to our understanding of reality, then we might come up with very accurate relationships. But if we insist on the rules being explainable based on our rough approximation of reality, then this may come at the expense of some loss of accuracy in our machine’s output. This is known as the accuracy-explainability tradeoff.

Furthermore, in the analogy we used, we have assumed that you always correctly tell your sibling what the animal is, whenever they encounter one. Thus, your sibling always has the right label or target with which to reason about the features. In practice, this is hardly the case; the targets can be deliberately or inadvertently flipped. For example, if, while watching the TV programme, you were seated quite far from the TV when the puppy came on, your sibling might have shouted, “Is this another cat?” And because you are perhaps myopic and couldn’t see the animal quite clearly, you might have simply responded, “Yes.” Alternatively, you might have actually seen the puppy quite clearly, but when you shouted back to your sibling, “No, it’s a dog!”, this response got lost in some ongoing conversation in the room, and your sibling heard you as saying, “Yes, it’s a cat!”. Thus, your sibling ends up learning the wrong rules for distinguishing a dog from a cat. In this case, we refer to the targets as being noisy, because they are no longer error-free. The features may also be noisy: for example, the images of the cats and dogs your sibling saw on TV might have been distorted, or occluded around the snout of a dog. Due to such noisy observations, we cannot learn rules that are absolutely correct; we can only be probably approximately correct (PAC), which is the name of a mathematical framework for analysing machine learning methods. There are even worse scenarios in which noisy inputs are deliberately introduced into the machine by adversaries with malicious intent. For example, the machine in a self-driving car was fooled by adversarial inputs into driving 50 mph over the speed limit. This has led to research into what’s referred to as adversarial machine learning, which deals with how to simulate and detect adversarial examples.
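
A tiny sketch of what label noise looks like in practice: starting from clean labels, a small fraction is flipped at random, much like your answer being misheard across the room. Everything here is invented for illustration and assumes NumPy is available.

```python
import numpy as np

rng = np.random.default_rng(42)
clean_labels = rng.integers(0, 2, size=20)  # 0 = cat, 1 = dog (made-up labels)

noise_rate = 0.15                           # say 15% of answers get misheard
flip_mask = rng.random(20) < noise_rate     # which labels to flip
noisy_labels = np.where(flip_mask, 1 - clean_labels, clean_labels)

print("flipped", int(flip_mask.sum()), "of", len(clean_labels), "labels")
```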

Finally, in our analogy, we have made a very fundamental assumption: that the child easily picks up on the relevant features by which to distinguish a cat from a dog. First, they considered the sizes of the animals — supposing that a dog is large and a cat is small — and, when presented with a small dog, they adjusted the rules and considered features such as the presence of a prominent snout. While this astuteness may come easily to humans, this is not the case with machines. If we were to replace the child in our analogy with our machine, and then present it with pictures of dogs and cats, the machine would not easily know to focus on the sizes of the animals or the presence of prominent snouts or whiskers as features derived from the image pixels. It could, in fact, consider as a feature the number of legs of the animals — which obviously is irrelevant — if there were an object obstructing one of the dogs’ legs in at least one of the images! In contrast, a human child might not be easily fooled by that.

Thus, one painstaking step in classical machine learning is what we refer to as feature engineering or feature extraction. Basically, we need to tell the machine what features it should look out for; the machine may then, hopefully, come up with relevant rules about these features. For example, in order to train a machine to distinguish, from pictures, between people who identify as male or female, we may need to specify, as features for the machine, certain distances in the face, such as the separation between the eyes, the width of the nose, and the locations of the centres and corners of the eyes. In other words, we have to extract these features from the images for the machine, and sometimes we have to engineer others; for example, we might take the ratio of the x- and y-coordinates of the centres of the eyes, or the logarithm of the separation between the eyes.
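
The engineered features mentioned above could be computed along these lines; the landmark coordinates are invented, and the ratio and logarithm simply mirror the examples in the text.

```python
import math

# Invented facial landmarks (in pixels) for a single face image.
left_eye_centre = (112.0, 140.0)
right_eye_centre = (176.0, 142.0)

# Extracted feature: the separation between the two eye centres.
eye_separation = math.dist(left_eye_centre, right_eye_centre)

# Engineered features, as in the text: a coordinate ratio and a logarithm.
xy_ratio = left_eye_centre[0] / left_eye_centre[1]  # x- to y-coordinate ratio
log_eye_separation = math.log(eye_separation)

print(eye_separation, xy_ratio, log_eye_separation)
```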

Yet, even when we extract features, we do not know the optimal number of features to select. If they are too few, we may lose information necessary to build accurate rules about the problem; if they are too many, certain problems could arise, among them the so-called curse of dimensionality and the ever-present issue of overfitting, to which we will certainly devote another post. For example, in our analogy, while having more features than “size” alone can arguably help us develop more accurate rules to distinguish a dog from a cat, when the features become too many, a lot of them — such as the colour of the eyes or the number of limbs — may be irrelevant, and we may face the risk of overdoing things, or overfitting.
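
One rough way to cope with an overly large feature set is to score the candidate features and keep only the most informative ones. The sketch below assumes scikit-learn and uses made-up data; it keeps the two highest-scoring of four candidate features.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))     # 50 animals, 4 candidate features (made up)
y = rng.integers(0, 2, size=50)  # 0 = cat, 1 = dog (made-up labels)

selector = SelectKBest(score_func=f_classif, k=2)  # keep the 2 "best" features
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)                   # (50, 4) -> (50, 2)
print("kept feature indices:", selector.get_support(indices=True))
```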

Rather than have us engineer or extract features, one of the appeals of the subfield of machine learning known as deep learning is that the machine learns the features itself and then learns the rules about those features. While this promises to resolve the issue of feature engineering, we will later see the unique challenges deep learning itself presents.

Translated from: https://medium.com/ai-in-plain-english/supervised-learning-what-does-it-entail-e7e265ea7868
