How we built an inexpensive, scalable architecture to cartoonize the world!


A lot of people were interested in the architecture behind Cartoonizer. So Tejas and I (Niraj) have tried explaining the process we employed to make it work. Kudos to Algorithmia for powering our video inference pipeline. 😇


In today’s fast-paced world, ML experts are expected to wear multiple hats in the ML workflow. One of the critical tasks in that workflow is to serve the models in production! This important piece of the pipeline tends to get overlooked, and the product thus falters in delivering value to customers.

The engineering discipline clearly can’t exist without the work of the (data) scientists — MLE (Machine Learning Engineering) is built on the work of data science — but the engineering is how the science gets applied to the world. - Caleb Kaiser

This article is going to explain our attempt to not only serve a computationally intensive GAN model in production inexpensively but also scale it horizontally.


ML Woes 😅

If you are familiar with hosting a REST API, it warrants these basic things -


  1. A fast prototype in Flask
  2. Setting up an environment

  • GCP or AWS instance

  • System dependencies as well as Python-specific dependencies (pip)
  • Proxy server

  • Multiple workers to scale horizontally

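The fast Flask prototype from step 1 might look like the minimal sketch below; the endpoint name and the stubbed `cartoonize` function are illustrative, not the actual Cartoonizer code:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def cartoonize(image_bytes):
    # Stand-in for the actual GAN inference on the uploaded image
    return image_bytes

@app.route("/cartoonize", methods=["POST"])
def cartoonize_endpoint():
    # Expect the image as a multipart file upload
    if "file" not in request.files:
        return jsonify(error="no file uploaded"), 400
    result = cartoonize(request.files["file"].read())
    return jsonify(size=len(result)), 200

# In production this app is served by Gunicorn, e.g.: gunicorn -w 8 app:app
```

From here, everything in step 2 is about where and how this app runs, which is exactly the part we wanted to outsource.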

As an ML engineer, the 2nd point is tedious and less than satisfactory in terms of scalability and server costs. Gone are the days when the responsibility of maintaining servers rested on your shoulders! I am talking about outsourcing and automating the 2nd point completely. Enter Google Cloud Run!

Run Cloud Run!

Before I go into how the architecture works, I would like to indulge you in some user statistics and give you a feel of the traffic we could cater to with minimal costs!


Traffic we received 😮

Since we launched our demo web-app on 26th July, we have had around 12,000 users in less than 3 weeks! 6,000 of those came in the first 4 days — most of the traffic coming from our Reddit post and TheNextWeb article, which was then picked up by other blogs from various countries as well.

At any given point during this peak time, we had around 50+ users requesting our image and video services.


[Image: Users over a period of over 2 weeks]

Traffic we are ready for 💪

Out of the box, Cloud Run lets us spawn up to 1,000 instances based on incoming traffic. It defaults to a maximum of 80 concurrent requests per container instance. So ideally we can cater to 80,000 concurrent requests!

BUT since the cartoonization process was already heavy, we limited our program to 8 workers per instance. That means one instance was limited to 8 concurrent requests; a 9th request, if it came, would be routed to a second instance. So essentially we can cater to 8,000 concurrent requests!
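As a sanity check, here is the capacity arithmetic; these are Cloud Run's documented defaults and our own worker cap, not measured throughput:

```python
# Cloud Run defaults: up to 1000 instances, 80 concurrent requests per instance
max_instances = 1000
default_concurrency = 80
ideal_capacity = max_instances * default_concurrency
print(ideal_capacity)  # 80000

# With our Gunicorn setup capped at 8 workers (8 concurrent requests) per instance
workers_per_instance = 8
our_capacity = max_instances * workers_per_instance
print(our_capacity)  # 8000
```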

Video processing on CPU or GPU? 🎥

Our unique selling point was putting together an architecture which would allow us to serve videos along with images at minimal cost. Videos are nothing but a collection of images (frames) and we have to cartoonize each frame.


[Image: Cartoonized at 30 fps with 720p resolution]

On an 8-core i7 CPU, it takes around 1 second to cartoonize a 1080p resolution image. Unfortunately, Google Cloud Run provides a maximum of only 2 vCPUs, which brings the time up to 3 seconds/image! You can imagine the horror of processing a video on that kind of compute! A 10 second video at 30 frames per second (fps) would take 15 minutes! 😱
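The 15-minute figure is straightforward frame arithmetic:

```python
fps = 30
video_seconds = 10
seconds_per_frame = 3  # one 1080p frame on Cloud Run's 2 vCPUs

total_frames = fps * video_seconds           # 300 frames to cartoonize
total_seconds = total_frames * seconds_per_frame
print(total_seconds / 60)  # 15.0 minutes
```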

We employed 2 techniques to bring down the video inference time.

  • Reduce the resolution of the image to 480p: This essentially lessened the load per frame without any conspicuous change in quality. This helped us reach an inference time of 1 second/image.

  • Decrease the frame rate of the video: We dropped it from 30 fps to 15 fps, which drastically reduced our video computation time.
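The two reductions boil down to picking fewer frames and a smaller target size. A minimal sketch of that bookkeeping in plain Python (the actual decoding and encoding, e.g. with OpenCV or ffmpeg, is omitted):

```python
def frames_to_keep(n_frames, fps_in=30, fps_out=15):
    """Indices of the frames to cartoonize after dropping the frame rate."""
    step = fps_in / fps_out
    return [int(i * step) for i in range(int(n_frames * fps_out / fps_in))]

def downscale_size(width, height, target_short_side=480):
    """New (width, height) with the short side capped at 480p, aspect ratio kept."""
    short = min(width, height)
    if short <= target_short_side:
        return width, height
    scale = target_short_side / short
    return round(width * scale), round(height * scale)

print(len(frames_to_keep(300)))   # 150 frames instead of 300
print(downscale_size(1920, 1080)) # (853, 480)
```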

We experimented with tensorflow lite weight quantization to speed up the inference pipeline, but we faced issues serving models with dynamic input image sizes. While it worked for a fixed image size, we didn’t find the latency and computation trade-off justified by the results.

Even after lowering the resolution and reducing the frames per second, video cartoonization was taking 2.5 minutes for a 10 second video. That was still too long from a user-experience standpoint. Hence, converting a video into a cartoon required some additional artillery.

Speed advantage on GPU ⏩

Using a GPU gave a 10x increase in speed for an image. Inference time came down to 50 ms/image. That meant we could cartoonize and serve a 10 second video in 7 seconds! Now we are in business.😉


Or so we thought. There were 2 questions that haunted us -


  1. How do we scale the GPU service to keep up with Cloud-Run-like scaling?

  2. How can we do this cost effectively?


One way would have been to deploy the model on a Google Compute Engine instance as an API but that defeated the purpose of Cloud Run scaling. All the concurrent requests would queue up and GPU would become a bottleneck in our pipeline. Also, running an instance 24/7 is not cheap 💰


How to scale using a GPU (inexpensively)?

Cloud Run, being a managed stateless container service, cannot afford to provide GPU support. Hence, we outsourced our GPU computation to a service called Algorithmia instead of renting out an expensive Google Compute Engine server. The reasons are two-fold -

  1. First of all, it boasts the ability to scale deployed Deep Learning models in production! It can handle 24 concurrent requests per GPU-computation instance. Additionally, it can automatically scale to 80 instances at any given point in time.


  2. Since we were building it as a weekend hack, we wanted to justify our time invested; here Algorithmia surprised us with a superbly flexible platform to easily deploy ML models with GPU support and pre-configured environments, without the hassle of figuring out whether the GPU driver and tensorflow versions are compatible, if you know what I mean. :P

This meant we could satisfy around 1500+ video requests concurrently AND comparatively inexpensively!

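On the Cloud Run side, the hand-off to Algorithmia looks roughly like the sketch below; the API key, algorithm path, and payload shape are placeholders rather than our real values, and the capacity figure is just the 24 × 80 arithmetic from above:

```python
# 24 concurrent requests per GPU instance, auto-scaling up to 80 instances
GPU_CONCURRENCY = 24
MAX_GPU_INSTANCES = 80
print(GPU_CONCURRENCY * MAX_GPU_INSTANCES)  # 1920, hence "1500+"

def cartoonize_video_on_gpu(video_url):
    # Lazy import: only the video path needs the Algorithmia client
    import Algorithmia
    client = Algorithmia.client("YOUR_API_KEY")        # placeholder key
    algo = client.algo("user/cartoonizer/1.0")         # placeholder algorithm path
    algo.set_options(timeout=300)                      # videos take a while
    return algo.pipe({"video_url": video_url}).result  # placeholder payload
```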

Learnings

80 requests on one instance! 🤔

Our Flask API was coded to handle 8 concurrent requests by spawning 8 workers using Gunicorn, BUT we didn’t change the default setting of 80 concurrent requests per instance in Cloud Run.

This meant only one instance was spawned the whole time, and user requests likely queued up on our Flask server. The downside: users had to wait longer to get their cartoonized images and videos ☹️

The upside: we were billed for only one instance. The lower the number of requests per instance, the greater the number of instances spawned, thus increasing your billable instance time. But rerouting requests to separate instances means better and faster user satisfaction. 😉
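In hindsight the fix is a single flag: tell Cloud Run the same concurrency limit that Gunicorn enforces (the service and module names here are illustrative):

```shell
# Container entrypoint: 8 Gunicorn workers, i.e. 8 concurrent requests
gunicorn --workers 8 --bind :8080 app:app

# Match Cloud Run's limit to it, so a 9th concurrent request
# spawns a second instance instead of queueing
gcloud run services update cartoonizer --concurrency 8
```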

[Image: Highest number of requests per second (2 req/sec)]

Future scope

We envision this being used for the following —


  • Churn out quick prototypes or sprites for anime, cartoons and games

  • Since it subdues facial features and information in general, it can be used to generate minimal art


  • Games can import short cutscenes very easily without using motion capture

  • Can be modelled as an assistant to graphic designers or animators.


If you have something interesting to demo, hit us up!


Code for the webapp demo is available on Github! Try the demo here!


Translated from: https://towardsdatascience.com/how-we-built-an-inexpensive-scalable-architecture-to-cartoonize-the-world-8610050f90a0
