
by Drew Breunig

The Business Implications of Machine Learning

It’s not about what it can do, but the effects of its prioritization

As buzzwords become ubiquitous they become easier to tune out.

We’ve finely honed this defense mechanism, for good reason. It’s better to focus on what’s in front of us than the flavor of the week. CRISPR might change our lives, but knowing how it works doesn’t help you. VR could eat all media, but its hardware requirements keep it many years away from common use.

But please: do not ignore machine learning.

Yes, machine learning will help us build wonderful applications. But that isn’t why I think you should pay attention to it.

You should pay attention to machine learning because it has been prioritized by the companies which drive the technology industry, namely Google, Facebook, and Amazon. The nature of machine learning — how it works, what makes it good, and how it’s delivered — ensures that this strategic prioritization will significantly change the tech industry before even a fraction of machine learning’s value is unleashed.

To understand the impact of machine learning, let’s first explore its nature.

(I am going to use deep learning and machine learning interchangeably. Forgive me, nerds.)

Machine Learning Makes Everything Programmatic

The goal of machine learning, or deep learning, is to make everything programmatic. As I wrote in January:

In a nutshell, deep learning is human recognition at computer scale. The first step to create an algorithm is providing a program with lots and lots of data which has been organized by humans, like tagged photos. The program then analyzes the bits of the raw data and notes patterns which correlate with the human organized data. The program then looks for these known patterns in the wild. This is how Facebook suggests friends to tag in photos and Google Photos searches by people.
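
The loop described in that excerpt (learn patterns from human-tagged examples, then look for them in the wild) can be sketched with a toy nearest-centroid classifier. Everything here is illustrative; real systems like Facebook’s face tagger use deep neural networks, not this:

```python
# Toy "human recognition at computer scale": learn from tagged
# examples, then label new, untagged data.

def train(tagged_examples):
    """Average the feature vectors for each human-supplied tag."""
    sums, counts = {}, {}
    for features, tag in tagged_examples:
        s = sums.setdefault(tag, [0.0] * len(features))
        for i, x in enumerate(features):
            s[i] += x
        counts[tag] = counts.get(tag, 0) + 1
    return {tag: [x / counts[tag] for x in s] for tag, s in sums.items()}

def predict(model, features):
    """Label new data with the tag of the nearest learned pattern."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda tag: dist(model[tag], features))

# Humans tag the data (e.g. faces in photos, reduced to 2 made-up features).
tagged = [([1.0, 1.1], "alice"), ([0.9, 1.0], "alice"),
          ([5.0, 4.9], "bob"),   ([5.1, 5.2], "bob")]
model = train(tagged)
print(predict(model, [1.05, 0.95]))  # -> alice
print(predict(model, [4.8, 5.0]))    # -> bob
```

The names and numbers are invented; the point is only the shape of the pipeline: human-organized data in, pattern matcher out.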

So far, most of the deep learning applications people use are essentially toys: smarter photo albums and better speech recognition. These early applications are forgiving. If a learning algorithm misses a face or forces you to edit a tricky word, it’s okay (usually). But as our investment continues and these algorithms become more dependable, we’ll see them deployed in more interesting environments, with more interesting use cases.

The takeaway here is that machine learning allows companies to build better applications that interact with the things people create: pictures, speech, text, and other messy things. This allows companies to create software which understands us. The potential is there to solve the user interface problems that have been keeping people from computing since the ENIAC. And major UI advancements tend to kick off major eras of computing.

The mouse and graphic interfaces made computers accessible, household objects.

Touch interfaces made computers normal, everyday tools.

Interfaces powered by machine learning will make computing omnipresent. (Eventually)

But there’s a catch:

Machine Learning is Only as Good as its Training Data

To make a machine learning model you need three things, in order of importance:

  1. Training Data: Data which has been tagged, categorized, or otherwise sorted by humans.

  2. Software: The software library which builds the machine learning models by evaluating training data.

  3. Hardware: The CPUs and GPUs which run the software’s calculations.

Hardware is easy enough to acquire. Rent it, buy it, whatever.

Software is even easier to acquire! If you rented, you may have accidentally done so already. If not, almost all of it is available free.

Now all you need is training data. And lots of it!

Good luck.

Before we get into how exactly screwed you are, let’s first understand why you need so much training data in the first place.

Our deep and machine learning software is good. Better than it was! But to work well it requires tons of training data to produce good results. This cannot be overstated: the quality of the models you make is directly correlated with the quantity and quality of the training data the software ingests. Until we have better software we’re unable to build good models from small datasets. (And when I say “small” I mean, not ginormous.)

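To see the quantity effect in miniature, here is a toy simulation (purely illustrative, not a real training pipeline): “training” is just averaging noisy observations of a hidden value, and the average error of the learned model shrinks as the number of examples grows:

```python
import random

TRUE_PATTERN = 3.0  # the signal hidden in noisy, human-generated data

def fit(n_examples, rng):
    """'Train' a one-number model by averaging n noisy examples."""
    samples = [TRUE_PATTERN + rng.gauss(0, 1.0) for _ in range(n_examples)]
    return sum(samples) / len(samples)

def avg_error(n_examples, trials=200):
    """Average absolute model error over many training runs."""
    rng = random.Random(0)  # fixed seed for reproducibility
    return sum(abs(fit(n_examples, rng) - TRUE_PATTERN)
               for _ in range(trials)) / trials

print(avg_error(10))    # noticeably worse model from a small dataset
print(avg_error(1000))  # far better model from a big one
```

The same relationship (more good data, better model) holds for real deep learning, just with far more parameters to pin down.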
Unfortunately, better software is not going to arrive overnight. While most software gets incrementally better, as developers squash bugs week by week, machine learning will likely advance in a punctuated equilibrium fashion: in a few, hard-won, big leaps.

The reason for this is that deep learning software is nearly impossible to debug, because we don’t fully understand how it works. To me, this is the weirdest thing about machine learning. We don’t really know what makes it tick. We can’t debug it systematically; we can only guess and check.

Pete Warden, machine learning evangelist extraordinaire, explains:

Even though the Krizhevsky approach won the 2012 Imagenet competition, nobody can claim to fully understand why it works so well, which design decisions and parameters are most important. It’s a fantastic trial-and-error solution that works in practice, but we’re a long way from understanding how it works in theory. That means that we can expect to see speed and result improvements as researchers gain a better understanding of why it’s effective, and how it can be optimized. As one of my friends put it, a whole generation of graduate students are being sacrificed to this effort, but they’re doing it because the potential payoff is so big.

Until we understand how deep learning works, we need to make up for its inadequacies with big piles of training data.

Training data is the lifeblood of machine learning.

So how do we get it?

Learning to Use Every Part of the Buffalo (or User)

If computers are to understand messy, human things, they need to be taught by messy humans. Makes sense. But when we remember how much data we’re going to need to make our models, we’re faced with a challenge: where are we going to find tons of people willing to spend their spare time creating our training data?

If you said, “I’ll hire them,” I have some bad news. At this scale paying them is pretty much out of the question.

If you said, “I’ll trick them,” you’re getting warmer.

A frequent refrain among people who write about the Internet is: “if you’re not paying, you’re the product.” These writers are commenting on ad-supported products (like Facebook, Google, Tumblr, Snapchat, and most everything else online) that package up your attention and sell it to advertisers. But their refrain works just as well for machine learning.

Users of free services are the humans who will train computers in order to build better products and services. The ‘free’ part is crucial because it allows for the massive numbers of users our data needs require.

All of this makes me think of the old line about Native Americans using every part of the buffalo. Online services are learning how to use more parts of their users. Our attention creates their advertising and our knowledge fuels their deep learning models.

The trick to obtaining sufficient training data, then, is twofold. You need to:

  1. Attract a bunch of people.

  2. Convince them to create your training data.

It’s Tom Sawyer and picket fences, just multiplied by several hundred million.

The Rise of Reciprocal Data Applications (RDAs)

A new category of application, or application feature, has emerged to facilitate your fence painting. These applications are designed to spur the creation of training data as well as deliver the products powered by the data captured. People get better apps and companies get better data.

The clearest example of such a reciprocal data application (or RDA, for short) is Facebook Photos.

Facebook Photos has been designed to prompt viewers to tag people in photos, easily and quickly. A clear call to action frames the faces of your friends and family after you upload an image. Tagging provides clear benefits to you, both for later searching and for alerting those tagged in photos. Tagging garners attention and starts a conversation, which (non-coincidentally) are two of the main reasons why people use Facebook.

Meanwhile, all this tagging creates a massive pool of training data which can be used to train machine learning models. With better models, come better tagging suggestions and other features. Thanks to this RDA, Facebook likely has one of the best human image recognition models in the world.

Google Search is another RDA. Your searches and selections provide training data to Google, which helps make its search even better.

Like their other products, both Google Search and Facebook Photos demonstrate how RDAs generate significant network effects. The more people use an app, the more data is generated, the better the app becomes, the more people use the app…

Network effects are the engine needed by venture-backed companies in winner-take-all markets. Previously, the default network effect methods in the Valley were social/chat (you go where your friends are) or marketplaces (sellers go where the buyers are). This is why almost every non-marketplace venture-backed app or service shoehorns in sharing or communication features, even when they don’t make sense in the app.

RDAs are a new method for creating network effects which is just now becoming understood. As awareness of its business value grows, expect RDAs to propagate throughout the landscape.

This propagation of RDAs will be the first major business impact of machine learning. Not only because they’ll divert resources, but because the qualities and requirements of RDAs will influence the hardware and software which deploy them.

Here are the qualities of a Reciprocal Data Application:

  1. Apps must be networked, preferably always. Otherwise, it cannot send the data it captures back home.

  2. Nearly all computation takes place off-device. The bulk of computation is the creation of the models, which requires access to the massive dataset created by all users. Hence, model construction cannot take place on the device. Comparing new data to computed models (for example, identifying an object or person in a picture or recognizing a spoken phrase) is computationally cheap.

  3. Good apps need big audiences. More people equals more workers creating training data.

  4. Good apps need lots of usage. More time spent using the app means each user has more opportunities to create training data.

  5. Good apps encourage the creation of accurate data. If an app is designed in a way that coding errors occur often, the data will be weaker. App design needs to make it easy for users to enter accurate data, quickly.

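Quality #2 above is worth making concrete. Training happens server-side against the pooled dataset; what ships to the device is a small set of precomputed parameters, so on-device “recognition” is just a few multiplies. A minimal sketch, with weights and threshold invented for illustration:

```python
# Server side (expensive): training produced these weights from the
# massive, pooled dataset. The device merely receives them.
WEIGHTS = [0.8, -0.3, 0.5]  # hypothetical precomputed model
THRESHOLD = 0.6

def on_device_predict(features):
    """Cheap inference: one dot product and one comparison."""
    score = sum(w * x for w, x in zip(WEIGHTS, features))
    return score > THRESHOLD

print(on_device_predict([1.0, 0.2, 0.4]))  # True  (score 0.94 > 0.6)
print(on_device_predict([0.1, 1.0, 0.2]))  # False (score -0.12)
```

This asymmetry (heavy training in the data center, trivial inference on the device) is what lets RDAs run on slow, cheap hardware.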
So how do we build a good one?

Paths Toward Building a Valuable RDA

The data value of an RDA can be expressed as a product of the latter three points above.

For example, you can have a relatively meager install base if those users spend hours a day coding data in a reliable fashion (see: Tinder, which is sitting on an amazing training set for determining the attractiveness of photos). Or, you could have a giant install base which only occasionally codes data (Facebook, whose users usually tag photos when they’re uploaded).

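The Tinder/Facebook trade-off can be put into rough numbers. Treat data velocity as users × coding events per user per day × accuracy (my framing of the three qualities; every figure below is invented for illustration):

```python
def data_velocity(users, events_per_user_per_day, accuracy):
    """Illustrative: usable training examples produced per day."""
    return users * events_per_user_per_day * accuracy

# Modest audience, heavy and reliable coding (a Tinder-like app).
focused = data_velocity(users=50_000_000, events_per_user_per_day=100,
                        accuracy=0.95)
# Enormous audience, occasional coding (a Facebook-like app).
giant = data_velocity(users=1_500_000_000, events_per_user_per_day=2,
                      accuracy=0.90)
print(focused > giant)  # the smaller app can out-collect the giant
```

Under these made-up numbers the focused app produces more usable examples per day, which is the point: the product of the three qualities, not any single one, sets your velocity.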
The challenge here is that qualities #3 and #4 are a zero-sum game (like advertising, the other part of the buffalo). If 50% of the world spends 20% of their time on Facebook, there’s not very much oxygen left for you to work with. Even if you scrape up a few hundred million users and borrow 2 minutes of their daily time, Facebook’s data collection will outpace whatever gains you make by many, many factors. Because data is collected constantly, the value of RDAs should not be thought of in absolute terms but as a velocity.

But if, in the above scenario, you’re able to collect training data from your users that Facebook cannot collect by design, you cannot be outpaced, despite your smaller size. Small companies and other upstarts must pursue unique datasets if they want to compete.

We can see three paths towards building a valuable Reciprocal Data Application:

  1. Get Lots of People: Create a compelling app that attracts tons of users. This is the model the Valley knows and loves. Build something disruptive, gain traction, and invest like hell to go big. In a way, this path is the accidental RDA path. Once big, tweaking your app to better collect training data is merely a way to diversify the value you gain from your users. This path is ridiculously hard and requires a ton of luck, then a ton of money. Plus, it’s kind of a catch-22: once you’re this big, advertising is likely the lower-hanging fruit. You probably shouldn’t choose this path.

  2. Get Lots of Time: Create an app that convinces a reasonable number of people to spend an extraordinary amount of time using it. In many cases, these sorts of apps or services will be passively used. Think of a navigation app that captures driver input, or an always-on digital assistant. Ambient apps are always available to observe or prompt users, increasing the velocity of the data they produce.

  3. Collect Unique Data: Create an app which collects training data others can’t collect. Here, your app doesn’t need to be massive at launch, but a vision must exist for how the unique data you collect will later be used to build completely unique functions. These new functions need to be compelling enough to drive increased installs and usage, keeping the velocity of your RDA sufficiently high before a large competitor changes the design of their apps and enters the market. This is how you might outrun Google and Facebook.

You may have noticed that path #2 suggested examples which might not run on smartphones. Good eye! By taking computing into new contexts we can create RDAs which are more persistent, increasing the time spent with them. Better, these new contexts bring access to new types of data, which often merges path #2 into path #3.

Thankfully, since nearly all the functional value of RDAs is produced by faraway servers crunching on massive datasets, individual devices have very little to do. Their brains are elsewhere, so they can fit in more places.

How Machine Learning Impacts Hardware

With most of the thinking taking place in server farms, devices which deliver RDAs can be low powered. Their CPUs can be slow, since comparing input to pre-calculated models requires little computation. Slower CPUs can be small, since they require fewer transistors and less heat dissipation. And slower CPUs require less power, meaning batteries can be smaller (or remain the same size and spend their capacity elsewhere, like on cellular connectivity). Plus: they’re cheap!

All this means devices which can deliver RDAs will propagate madly. If we can fit a cheap computer with WiFi into a product and capture good data from that context, we probably will build it. RDA-capable computers will be injected everywhere: in your car, on your wrist, in your browser, through your portable speakers, in your TV, and more.

The purest example of this is the Pebble Core. Positioned as a device for run tracking and music, the Core is really more of a generic computing dongle. It’s cheap, starting at $69. It has a low-powered CPU, WiFi, cellular connectivity, Bluetooth, a bit of storage, a headphone jack, two buttons, and a battery. That’s it. Its interface is voice controlled and, most importantly for our discussion, Amazon’s Alexa is integrated. Alexa is an RDA.

By moving the computation required for Alexa to the server side, Amazon can deploy Alexa almost anywhere. Alexa is now delivered via Bluetooth speakers, HDMI sticks, and whatever the Core is. Auto integration is inevitable.

Amazon and others are incentivized to diversify their distribution to increase their ubiquity and the time you spend with your app. Further, new integrations bring new data, enabling better models.

Importantly, companies prioritizing machine learning are not incentivized to develop for the most powerful devices. Distribution of powerful consumer devices is limited due to their expense and newness, limiting their value to RDAs, which require massive pools of users. Expect device computing power to stagnate as the industry focuses on diverse, ubiquitous, cheap devices rather than powerful ones.

The Business Implications of Machine Learning

To recap, this is how machine and deep learning investment will likely impact the tech industry:

  1. Winners Will Win More: Existing big players like Facebook and Google have a massive advantage. They have tons of users, tons of their time, and war chests filled with both training data and money. Competing with these companies head on, creating the same training data they generate, is futile.

  2. Successful Startups Will Create Unique Training Data: Challengers can negate much of Google and Facebook’s advantages by pursuing new frontiers of training data. This can involve mobile apps, but will often involve new hardware to bring RDAs to new contexts. Successful challengers might build such a beachhead and be acquired for it before they ever get to develop models (see: Nest). The hard part for these companies will be transitioning from developing a product that generates lots of unique, good training data to building unique RDAs to generate and maintain velocity.

  3. RDAs are a New Network Effect Model: As RDAs emerge and mature, companies and investors will better understand how RDAs can build business models with network effects. Once there’s a clear example, the same explosion of marketplace business (“Uber for X”) and social companies (“Facebook for X”) will occur for machine learning start ups.

  4. Machine Learning Will Accelerate the Internet of Things: Hardware capabilities will stagnate but form factors will diversify. Computers will colonize every context that can fit sensors and network connectivity in search of training data.

Translated from: https://www.freecodecamp.org/news/the-business-implications-of-machine-learning-11480b99184d/
