DS IN THE REAL WORLD

So, you want to serve Deep Learning Algorithms as a service.

You have a really cool algorithmic library, written in Python with TensorFlow/Keras/some other platform, that requires running workloads on a GPU, and you want to serve it at scale and have it up and running fast.

Celery is an open-source asynchronous task queue based on distributed message passing. After reading all possible blog posts and seeing all the YouTube videos about Celery, I decided it’s the right solution for the task at hand.

Please welcome the main characters of our plot:

  1. The API: Gets a request, creates a Celery async task, and puts it in a queue. (I recommend Flask for this task; it’s light but extendable. See the sketch right after this list.)

  2. The message queue: AKA Celery’s Broker. Stores the tasks created by the API in queues. The best practice is to choose RabbitMQ.

  3. Workers: A Python/Celery process which we will run on a GPU and which will take tasks from the queues. This is where all the heavy lifting gets done.

  4. The result backend: Stores the tasks’ return values. The best practice is to use redis, which enables complicated workflows (one task depending on another) without polling.
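
To make the API’s role concrete, here is a minimal Flask sketch that enqueues a task by name. The module path, endpoint, and payload field are illustrative assumptions, not code from the original project:

from flask import Flask, jsonify, request
from api.celery_jobs_app.celery_app import APP as celery_app  # hypothetical module holding the Celery app shown below

app = Flask(__name__)

@app.route('/images', methods=['POST'])
def create_image_task():
    image_url = request.json['image_url']  # the image stays in S3; only its URL travels
    # Enqueue by task name; Celery's routing (see 'task_routes' below) sends it to 'images-queue'
    async_result = celery_app.send_task('calculate-image-task', args=[image_url])
    # Return the task id so the client can poll for the result later
    return jsonify({'task_id': async_result.id}), 202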

The best practice is to use Celery with RabbitMQ as the message broker and redis as the result backend, in order to use all the unique features that Celery can provide. We know software requirements often change faster than we expect; this setup gives us maximum flexibility, so we can use even the most complicated features of Celery. With RabbitMQ and redis, each new task is transformed into a message which Celery posts to a queue in RabbitMQ, and each return value of a task performed by a worker is automatically written back to redis (you can easily host RabbitMQ on GCP using “click to deploy”, and redis using AWS ElastiCache).

Once the message representing the task is in the queue, we need a GPU worker to compute it. The GPU worker will read a message from the queue and perform the task. For example, if it’s a computer vision algorithm, a worker will download the original image from AWS S3, manipulate it, and upload the new image back to S3. The URL of the image will be passed as part of the task.

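As a sketch of what such a task might look like (the bucket layout and the processing step are assumptions for illustration, not the author’s actual code):

import boto3
from urllib.parse import urlparse
from api.celery_jobs_app.celery_app import APP  # hypothetical module holding the Celery app

s3 = boto3.client('s3')

@APP.task(name='calculate-image-task')
def calculate_image_task(image_url):
    # e.g. image_url = 's3://my-bucket/inputs/cat.jpg' (illustrative)
    parsed = urlparse(image_url)
    bucket, key = parsed.netloc, parsed.path.lstrip('/')
    s3.download_file(bucket, key, '/tmp/input.jpg')
    process_image('/tmp/input.jpg', '/tmp/output.jpg')  # stand-in for the actual GPU model call
    output_key = key.replace('inputs/', 'outputs/')
    s3.upload_file('/tmp/output.jpg', bucket, output_key)
    # Return a small JSON-serializable dict; Celery stores it in the result backend
    return {'output_url': 's3://%s/%s' % (bucket, output_key)}
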
But wait, there’s a catch.

There’s always a catch.

GPUs are very expensive machines. One p2.xlarge instance on AWS costs over $2,000 a month ($3.06 an hour at the time of writing), or ~$600 if it’s a spot instance. This obviously means we do NOT want them to be constantly up if there’s no need. They have to be turned on on-demand, and then turned off. The thing is, Elastic Beanstalk has no built-in auto-scaling based on RabbitMQ queue metrics.

What are we gonna do?

We have to write our own custom Auto Scaler. It’s a big name for a small Python script that polls RabbitMQ for the number of tasks in the queue every 30 seconds. If there are messages in the queue, it calls the AWS API to start GPU workers accordingly.
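
A sketch of such a script, assuming the RabbitMQ management plugin is enabled and using boto3 to start pre-created GPU instances (the endpoint, credentials, queue name, and instance ids are placeholders):

import time
import boto3
import requests

QUEUE_API = 'http://rabbitmq-host:15672/api/queues/%2F/images-queue'  # %2F is the default vhost
GPU_INSTANCE_IDS = ['i-0123456789abcdef0']  # a pool of stopped GPU instances

ec2 = boto3.client('ec2')

while True:
    queue = requests.get(QUEUE_API, auth=('guest', 'guest')).json()
    waiting = queue.get('messages', 0)
    if waiting > 0:
        # Start one instance per waiting task, capped by the pool size
        ec2.start_instances(InstanceIds=GPU_INSTANCE_IDS[:waiting])
    time.sleep(30)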

Each worker is booted with the docker container of the algorithmic repository (stored in ECR, the Elastic Container Registry). Once the container is up and running, it connects to RabbitMQ and redis. It then takes a task from the queue and computes it. The worker writes the output to S3. If the task completed successfully, the return value of the Celery task is a JSON containing a URL to the output saved on S3, plus its metadata. That return value is automatically saved to redis by Celery and also saved to a Postgres DB. If the task failed to finish, an exception is saved to redis.

Check out the diagram below to understand the architecture explained above:

[Architecture diagram: Running Deep Learning Algorithms as a Service, by Nir Orman]

Sounds breezy so far? One of the main challenges of using Celery is configuring it the right way.

Here’s a good configuration that’ll save you time and tears when you try to perform Deep Learning tasks at scale. Check it out below, and then we’ll dive into every detail of it:

from celery import Celery
from api.celery_jobs_app.celery_config import BROKER_URI, BACKEND_URI

APP = Celery(
    'celery_app',
    broker=BROKER_URI,
    backend=BACKEND_URI,
    include=['api.celery_jobs_app.tasks']
)

APP.conf.update({
    'imports': (
        'api.celery_jobs_app.tasks.tasks',  # trailing comma makes this a tuple
    ),
    'task_routes': {
        'calculate-image-task': {'queue': 'images-queue'}
    },
    'task_serializer': 'json',
    'result_serializer': 'json',
    'accept_content': ['json'],
    'worker_prefetch_multiplier': 1,
    'task_acks_late': True,
    'task_track_started': True,
    'result_expires': 604800,  # one week, in seconds
    'task_reject_on_worker_lost': True,
    'task_queue_max_priority': 10
})

Note: The configuration has been simplified in order to make it easier to understand.

Let’s break it down.

The first paragraph in the snippet is just some imports; trivial.

The second paragraph defines the Celery app itself, which has a broker and a backend (as stated before, the best practice is to use RabbitMQ and redis).

The third paragraph updates the configuration of Celery. This is the interesting part.

The ‘imports’ section tells Celery in which of our Python packages it should look for tasks.

The ‘task_routes’ part maps between a task’s name and the queue in which it should be stored. In the code snippet above, all tasks of type “calculate-image-task” will be pushed into a queue named “images-queue”. If you do not specify which queue your task should be routed to, it will by default go to the default queue, named ‘celery’. BTW, you can change the name of the default queue if you want by defining the ‘task_default_queue’ property.
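
For example (the queue name and task argument here are just illustrations):

# Rename the default queue; unrouted tasks now go here instead of 'celery'
APP.conf.task_default_queue = 'default-tasks'

# Or pick the queue explicitly at send time
APP.send_task('calculate-image-task',
              args=['s3://my-bucket/inputs/cat.jpg'],
              queue='images-queue')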

FYI: The queue itself is automatically created on RabbitMQ once the first task is routed to it. Cool :)

‘task_serializer’: This is how tasks are serialized when they are put in the queue and deserialized when they reach the worker. In the image processing case, we do not want the image itself to be serialized and deserialized; the best practice is to store it and only pass its location or URL. We’ll use json as the serializer.

‘result_serializer’: Keep in mind, if you declare the serialization type as json and return a result that is an object or an exception (which is the return type in case an exception wasn’t caught), then result serialization will fail, since any object that is not JSON-serializable cannot be serialized. You can always read more about serializers here.
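
In practice this means a task should return plain, JSON-serializable data, and catch its own exceptions if you want a JSON result either way. A sketch (the helper is hypothetical):

@APP.task
def safe_task(image_url):
    try:
        output_url = do_the_work(image_url)  # stand-in for the real computation
        return {'status': 'ok', 'output_url': output_url}
    except Exception as exc:
        # A plain dict serializes cleanly as json; a raw exception object would not
        return {'status': 'error', 'message': str(exc)}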

‘accept_content’: A whitelist of content-types/serializers to allow.

Tip: It’s not recommended to use the ‘pickle’ serializer, since it is known to have security issues. Since Celery version 4.0, json is actually the default serialization option, but “Explicit is better than implicit” (The Zen of Python).

[Image: The Zen of Python]

‘worker_prefetch_multiplier’: Celery’s default is that each worker prefetches 4 tasks and computes all of them before coming back for the next ones. The idea was to optimize network round-trips. In our case, Deep Learning tasks tend to be long ones (way longer than the network time). This means we do not want a worker to grab a bunch of tasks and perform them one after the other. We want each worker to take a single task at a time, and come back for the next task only when it’s done with the previous one. That way, if one task requires a very long computation, other workers can work on the next tasks simultaneously, since as long as the first worker isn’t working on them, they are kept in the queue.

‘task_acks_late’: By default, when a worker takes a task, the task is “acked” just before its execution. In the case of Deep Learning tasks, which take a long time to compute, we want them to be “acked” only after they are computed. This is especially useful when we use spot instances, which lower our average task cost but may be taken away from us if there’s a shortage of GPU instances and our bidding price wasn’t competitive enough.

‘task_track_started’: Good for tracking that a task has started, because when your task is long-running, you want to know it’s no longer in the queue (where it would be marked as ‘PENDING’). I recommend using Flower as the monitoring solution for Celery; it lets you see exactly what the status of every task is.
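
With this flag on, a task’s lifecycle becomes visible through its AsyncResult:

from celery.result import AsyncResult

result = AsyncResult(task_id, app=APP)  # task_id as returned by the API endpoint above
print(result.state)  # PENDING -> STARTED -> SUCCESS (or FAILURE)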

‘result_expires’: By default, Celery keeps your results on redis for only 1 day. If you want them kept longer, define ‘result_expires’ differently in the configuration file. I would recommend keeping results for at most 1 week and writing them to a more organized DB that has a schema, such as PostgreSQL.
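
A sketch of that hand-off, using psycopg2 (the table, columns, and connection string are placeholders):

import json
import psycopg2

def save_result(task_id, result_dict):
    conn = psycopg2.connect('postgresql://user:password@db-host/results')
    with conn, conn.cursor() as cur:  # 'with conn' commits the transaction on success
        cur.execute(
            'INSERT INTO task_results (task_id, payload) VALUES (%s, %s)',
            (task_id, json.dumps(result_dict)),
        )
    conn.close()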

‘task_reject_on_worker_lost’: We’ll set this to True. When we use spot instances, there’s a chance a worker will be lost when a spot instance is taken away from us. We want the task to be put back into the queue and computed by another worker. Be careful: if a worker was lost due to a hardware error like ‘out of memory’, the task will get partially calculated again and again in a loop, since the worker will be lost every time it tries to compute it. If you see a task in an endless loop, this is the configuration setting you should be suspicious about.

‘task_queue_max_priority’: This is where you can make sure important tasks get done first. You can set a priority for every Celery task (by assigning it an int representing its priority). If you set this property, you must also set it on your RabbitMQ queue; it does not get set automatically. If a task with a priority enters a queue that doesn’t have the priority property, an exception will be thrown and the task will not enter the queue. This property is useful if you have a premium customer whose tasks should be computed first.
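
A sketch of sending a prioritized task, reusing the task from the worker sketch above (values are illustrative; with RabbitMQ, higher numbers are delivered first, up to the queue’s x-max-priority):

calculate_image_task.apply_async(
    args=['s3://my-bucket/inputs/cat.jpg'],
    queue='images-queue',
    priority=9,  # 0..10 here, since task_queue_max_priority is 10
)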

If you’re thinking about using this property to prioritize fast-running tasks over slow ones (such as long GPU computing tasks), then consider adding another group of workers that are CPU workers rather than expensive GPU workers. It would be cheaper and faster.

As you can see in the architecture diagram at the top, you can also have workers running on a totally different cloud.

For example, you could run your workers on Azure AKS, Azure’s managed Kubernetes service. But that’s a totally different blog post.

Good luck serving your Deep Learning Algorithms with Celery! If you have any questions, feel free to contact me on LinkedIn.

Translated from: https://towardsdatascience.com/serving-deep-learning-algorithms-as-a-service-6aa610368fde
