Everyone Can Use Deep Learning Now

A year ago, a few of us started working on Cortex, an open source platform for building machine learning APIs. At the outset, we assumed all of our users—and all of the companies actually applying ML in production, for that matter—would be large companies with mature data science teams.

We were wrong.

Over the last year, we’ve seen students, solo engineers, and small teams ship models to production. Just as surprisingly, these users frequently deploy large, state of the art deep learning models for use in real world applications.

A team of two, for example, recently spun up a 500 GPU inference cluster to support their application’s 10,000 concurrent users.

Not long ago, this kind of thing only happened at companies with large budgets and lots of data. Now, any team can do it. This transition is the result of many different factors, but one component, transfer learning, stands out as important.

What is transfer learning?

Let’s start with a high-level explanation of transfer learning (if you’re already familiar, feel free to skip ahead to the next section).

Broadly speaking, transfer learning refers to techniques for “transferring” the knowledge of a deep neural network trained for one task to a different network, trained for a related task.

For example, someone might use transfer learning to take a model trained for object detection, and “fine tune” it to detect something more specific—like hot dogs—using a small amount of data.

These techniques work because of the architecture of deep neural nets. The lower layers of a network are responsible for more basic knowledge, while more task-specific knowledge is typically contained in the top layers.

With the lower layers already trained, the higher layers can be fine tuned with less data. An object detection model like YOLOv4, for example, can be fine tuned to recognize something specific, like license plates, with a very small dataset (one such model was fine tuned with fewer than 1,000 images).

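To make the recipe concrete, here is a minimal PyTorch sketch of the general pattern: load a pretrained network, freeze its lower layers, and train only a fresh task-specific head. (The ResNet backbone and two-class head are illustrative stand-ins, not the YOLOv4 setup above.)

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet pretrained on ImageNet; its lower layers already encode
# generic visual features like edges, textures, and shapes.
model = models.resnet18(pretrained=True)

# Freeze the pretrained layers so fine tuning doesn't overwrite them.
for param in model.parameters():
    param.requires_grad = False

# Swap in a fresh, task-specific top layer (two classes is illustrative,
# e.g. "license plate" vs. "no license plate").
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new head is trainable, which is why a small dataset suffices.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```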

The techniques for transferring knowledge between networks vary, but recently, there have been many new projects aimed at making this simpler. gpt-2-simple, for example, is a library that allows anyone to fine tune GPT-2 and generate predictions with a few Python functions:

https://gist.github.com/caleb-kaiser/dd40d16647b1e4cda7545837ea961272

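In case the gist doesn't render, the core workflow looks roughly like this, based on gpt-2-simple's documented interface (the model size, step count, and corpus path are placeholders):

```python
import gpt_2_simple as gpt2

# Download a pretrained GPT-2 checkpoint (124M parameters here; the
# library also supports the larger released sizes).
gpt2.download_gpt2(model_name="124M")

# Fine tune on your own text -- "corpus.txt" is a placeholder path.
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, "corpus.txt", model_name="124M", steps=1000)

# Generate text from the fine tuned model.
gpt2.generate(sess)
```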

How transfer learning unblocks machine learning

Most teams aren’t blocked from using machine learning because of a lack of knowledge. If you’re building an image classifier, for example, there are many well-known models that accomplish the task, and modern frameworks make it fairly straightforward to train one.

For most teams, machine learning is never considered as a realistic option because of its cost.

Let’s use GPT-2, the (until recently) best-of-its-kind language model from OpenAI, as an example.

GPT-2's training cost alone is estimated to be over $40,000, assuming you use a public cloud. Beyond that cloud bill, GPT-2 was trained on 40 GB of text (over 20 million pages, conservatively). Scraping and wrangling that much text is a massive project in and of itself.

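(The page estimate is simple division: 40 GB is roughly 40 billion characters of text, and at a conservative 2,000 characters per page, that works out to 20 million pages.)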

For most teams, this puts training their own GPT-2 out of reach. But what if you fine tuned it? Let’s look at a project that did that.

AI Dungeon is a choose-your-own-adventure game, styled after old command line dungeon crawlers. The game works about as you’d expect—the player inputs commands, and the game responds by advancing their adventure, except in this case, the adventure is written by a GPT-2 model trained to write choose-your-own-adventure texts.

AI Dungeon was developed by a single engineer, Nick Walton, who fine tuned GPT-2 with gpt-2-simple and text scraped from chooseyourstory.com. According to Walton, fine tuning GPT-2 took 30 MB of text and about 12 hours of training time on a DGX-1 — roughly $374.62 on AWS’s equivalent instance type, the p3dn.24xlarge.

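(That figure is just the instance's hourly rate multiplied out: the p3dn.24xlarge bills at roughly $31.22 per hour on demand, and 12 hours × $31.22/hour ≈ $374.62.)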

While $40,000 in cloud bills and 40 GB of scraped text might be beyond most teams, $375 and 30 MB is doable even for the smallest projects.

And the applications of transfer learning go beyond language models. In drug discovery, there often isn’t enough data on particular diseases to train a model from scratch. DeepScreening is a free platform that solves this problem, allowing users to upload their own datasets, fine tune a model, and then use it to screen libraries of compounds for potential interactions.

Training a model like this from scratch would be beyond the resources of most individual researchers, but because of transfer learning, it is suddenly accessible to everyone.

The new generation of deep learning models relies on transfer learning

It’s important to note here that though my examples so far have focused on the economic benefits, transfer learning isn’t just a scrappy tool for small teams. Teams of all sizes use transfer learning to train deep learning models. In fact, new models are being released specifically for transfer learning.

For example, OpenAI recently released GPT-3, the appropriately named successor to GPT-2. The initial demos are impressive:

[Image: GPT-3 demo (source: OpenAI)]

Remember that when GPT-2 was first released, its raw size generated headlines. A 1.5 billion parameter model was unheard of. GPT-3, however, dwarfs GPT-2, clocking in at 175 billion parameters.

Training a 175 billion parameter language model is beyond the scope of just about every company besides OpenAI. Even deploying a model that large is questionable. So, OpenAI broke their tradition of releasing open source, pretrained versions of new models, and instead released GPT-3 as an API—which, of course, enables users to fine tune GPT-3 with their own data.

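For a sense of the interface, here is a minimal sketch of a request using OpenAI's Python client, as the API worked at launch (the engine name, prompt, and key are placeholders):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; issued by OpenAI

# Ask the hosted GPT-3 model to complete a prompt. "davinci" was the
# largest engine exposed by the original API.
response = openai.Completion.create(
    engine="davinci",
    prompt="Once upon a time, in a kingdom of code,",
    max_tokens=50,
)

print(response.choices[0].text)
```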

In other words, GPT-3 is so large that transfer learning isn’t just an economical way to train it for new tasks; it is the only way.

This transfer-learning-first approach is becoming increasingly common. Google just released Big Transfer, an open source repository of state of the art computer vision models. While computer vision models have typically remained smaller than their language model counterparts, they’re starting to catch up — the pretrained ResNet-152x4 was trained on 14 million images and is 4.1 GB.

As the name suggests, Big Transfer was built to encourage the use of transfer learning with these models. As part of the repository, Google has also provided the code to easily fine tune each model.

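A minimal sketch of that workflow, assuming the TensorFlow Hub modules from the BiT release (the module URL follows the published pattern; the 10-class head and hyperparameters are illustrative):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load a pretrained BiT-M backbone from TF Hub (r50x1 shown here; the
# 4.1 GB r152x4 model follows the same URL pattern).
backbone = hub.KerasLayer("https://tfhub.dev/google/bit/m-r50x1/1")

# Stack a fresh classification head for the downstream task and fine
# tune the whole thing on a small dataset.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.build([None, 224, 224, 3])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.003, momentum=0.9),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```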

As the chart below shows, models are only getting bigger over time (GPT-3, were it charted here, would increase the chart’s size 10x):

[Chart: model sizes over time (source: Microsoft)]

If this trend continues, and there are no signs that it won’t, transfer learning will be the primary way teams use cutting edge deep learning.

Designing a platform to handle massive models

We’re biased, but when we look at charts like the above, we immediately think “How are we going to deploy this?”

As models have gotten bigger, and as transfer learning has made them accessible to every team, the number of huge deep learning models going into production has shot up. Serving these models is a challenge—they require quite a bit of space and memory just to serve inference, and they typically can’t handle many requests at once.

Already, we’ve introduced major features to Cortex specifically because of these models (GPU/ASIC inference, request-based autoscaling, spot instance support), and we’re constantly working on more as models get bigger.

Still, the difficulty of the infrastructure challenges is minuscule compared to the potential of a world in which every engineer can solve problems using state of the art deep learning.

Source: https://towardsdatascience.com/everyone-can-use-deep-learning-now-8d683f92bce7
