Running untrusted JavaScript as a SaaS is hard. Here’s how I tamed the demons.

by Tim Nolet

Imagine the following:

  • You have a SaaS service that allows users to run server-side Node.js code.
  • The code is executed on your servers.
  • The code can download anything from the internet.
  • Any output generated by the code is made available to the user.

This is a performance and security nightmare. It is also the situation I found myself in when building a new tool for my new solo SaaS endeavour.

The use case

One of the key features of my new tool is to let anonymous users run Puppeteer scripts in a sandbox environment. Puppeteer is a project by the Google Chrome team (22k GitHub stars ✨) that allows users to run Chrome headless, as in without a screen, and automate interactions with a web page.

This is very useful for testing, web scraping, monitoring and a whole bunch of other use cases. The new tool’s purpose is to let users quickly try out these scripts without the hassle of installing and running Puppeteer on their own machines. It is very similar to JSFiddle, CodePen and other code playgrounds out there.

The key point here is that the user has full access to JavaScript and Node.js, can download whatever they want from the internet, and that I (well…my servers) will run that code for them! Yikes!

What could possibly go wrong?

Here are some of the ways people (my users) can screw things up with their bits of code:

And, as we’re running a multi-tenant SaaS, there are probably ways to hijack other sessions and peek into other people’s processes and code. Yes, it’s pretty nasty.

Whether this is due to malicious intent or just buggy code doesn’t really matter. The end result is slow or dead servers, your (and possibly other users’) credentials on the street, and just a generally bad time.

Layers, layers…layers!

The solution to this problem that I’ve come up with is as follows.

A request to run some untrusted code is first rate limited at (1), after which it is put into an AWS SQS message queue at (2). Messages are picked up by what I call a launcher process (3) which executes the work. This is a typical fan out / master-worker pattern. The launcher preps and launches a Docker container (4) which in turn executes the user’s code inside a Node.js VM2 “soft container” (5). Let’s look at each of these steps in detail.

1. Rate limiting

To avoid DDoS scenarios, where users pummel your API with HTTP requests, we first need to add rate limiting, also called request throttling. This is even more important in my specific scenario, because each lightweight HTTP request can potentially trigger a much heavier background job. (Puppeteer spins up a full Chrome browser.)

This means not only that the API server could become unresponsive, but also that the job servers could be overwhelmed. As I’m planning to add autoscaling functionality to the job servers, more job requests means more resource usage. This would result in ballooning server costs. Not good for your poor solo-dev startup owner.

There are many rate limiting frameworks and plugins out there. As I’m using the Hapi.js framework, I opted for the hapi-rate-limit plugin. And there’s not much else to say about it. Install it, add it to the API routes you want protected et voilà, it just works. This plugin gives you some great options that cover a lot of rate limiting scenarios:

  • IP and user whitelisting.
  • Limiting per user, per path or both.
  • X-Forwarded-For awareness, handy for running behind a load balancer.

Furthermore, the plugin adds a couple of HTTP headers to each response, showing the status of the rate limiting algorithm.

In the image above, you can see that I’ve made one request. This request is subtracted from the maximum number of requests I can make per my UserPathLimit, which is defined as the total number of requests that can be made on a given path per user per period. The count resets when the period expires.

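To make the per-user-per-path idea concrete, here is a minimal sketch of a fixed-window limiter in plain Node.js. It is not how hapi-rate-limit is implemented; the function name, the `limit` and `windowMs` values, and the shape of the returned object are all assumptions made for illustration.

```javascript
// Minimal fixed-window limiter keyed on user + path (illustrative only).
// `limit` and `windowMs` are assumed values, not hapi-rate-limit defaults.
function createUserPathLimiter({ limit, windowMs }) {
  const windows = new Map(); // "user:path" -> { count, resetAt }

  return function check(user, path, now = Date.now()) {
    const key = `${user}:${path}`;
    let entry = windows.get(key);

    // Start a fresh window on first use or once the old one has expired.
    if (!entry || now >= entry.resetAt) {
      entry = { count: 0, resetAt: now + windowMs };
      windows.set(key, entry);
    }

    entry.count += 1;
    const allowed = entry.count <= limit;

    return {
      allowed,
      remaining: Math.max(limit - entry.count, 0), // what a "remaining" header would report
      resetAt: entry.resetAt,                      // when the window starts over
      statusCode: allowed ? 200 : 429,             // 429 Too Many Requests
    };
  };
}

// Two requests per minute per user per path; timestamps passed explicitly.
const check = createUserPathLimiter({ limit: 2, windowMs: 60000 });
const first = check('alice', '/run', 0);
const second = check('alice', '/run', 1);
const third = check('alice', '/run', 2);          // over the limit
const afterReset = check('alice', '/run', 60000); // new window
```

Passing the timestamp in as a parameter keeps the sketch deterministic; a real deployment would also need shared storage once more than one API server is involved.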

What happens if someone hits the rate limit? We put them in the naughty corner for a bit and serve them cheese. The motto being that the customer is always right, but they should not be allowed to trash your system.

2. Async background jobs

Delegating the actual running of the untrusted code to background jobs is a pretty common pattern. You don’t want to tie up your HTTP server’s request cycle with long-running jobs. The added benefit here is that if anything bad happens while running the untrusted code, it will not take down or otherwise compromise your customer-facing API server.

In my solution, the HTTP POST request that contains the code to run is unwrapped and dumped into an SQS message queue. The message sits there until a launcher node picks it up and attempts to process it. This is where the role of the API server ends. The motto being to never bother your user-facing API server with long-running and potentially dangerous requests.

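The hand-off can be sketched as follows. This is a simplified illustration, not the production code: a plain array stands in for SQS, and `handleRunRequest`, `pollOnce` and `runJob` are hypothetical names.

```javascript
const { randomUUID } = require('crypto');

// In-memory stand-in for the message queue (the article uses AWS SQS).
const queue = [];

// API side: accept the code, enqueue it, and answer immediately.
function handleRunRequest(payload) {
  const jobId = randomUUID();
  queue.push({ jobId, code: payload.code });
  return { statusCode: 202, body: { jobId } }; // 202 Accepted
}

// Launcher side: take one message off the queue and process it.
// `runJob` is a hypothetical callback that does the heavy lifting.
async function pollOnce(runJob) {
  const message = queue.shift();
  if (!message) return null; // nothing to do right now
  try {
    const output = await runJob(message.code);
    return { jobId: message.jobId, ok: true, output };
  } catch (err) {
    // A failing job is reported, but it never reaches the API process.
    return { jobId: message.jobId, ok: false, error: err.message };
  }
}
```

The API handler returns as soon as the message is queued, so a slow or crashing job only ever affects the process that called `pollOnce`.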

3. Process isolation: splitting launcher and runner

Late into building this architecture, I realised I needed to split the launcher and runner code AND stick the runner into a Docker container. The reason for this becomes evident when we look at what the launcher/runner combo needs to do.

These are the tasks assigned to the launcher process:

  • Listen to SQS, unwrap the message and extract the untrusted code from it.
  • Write the code to a dedicated work directory.
  • Launch a Docker container (the runner) with the work directory mounted, using the excellent Dockerode.
  • Read the output from the runner and relay messages via AWS IoT to the waiting user.
  • Monitor the state of the running container.
  • Upload any screenshots to S3.
  • Pass a final message to the database after the run has finished.
  • Clean up files, the temporary work dir and other debris.

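The "write to a work directory, run, clean up" part of that list can be sketched with Node’s standard library alone. The helper name `withWorkDir`, the `job-` prefix and the `script.js` file name are assumptions; the `run` callback is where launching the container would go.

```javascript
const fs = require('fs/promises');
const os = require('os');
const path = require('path');

// Create a throwaway work directory, write the untrusted code into it,
// hand the directory to `run`, and always clean up afterwards.
// `script.js` and the `job-` prefix are assumed names.
async function withWorkDir(untrustedCode, run) {
  const workDir = await fs.mkdtemp(path.join(os.tmpdir(), 'job-'));
  try {
    await fs.writeFile(path.join(workDir, 'script.js'), untrustedCode, 'utf8');
    return await run(workDir);
  } finally {
    // Runs on success and on failure, so no debris is left behind.
    await fs.rm(workDir, { recursive: true, force: true });
  }
}
```

Putting the cleanup in `finally` means a crashing job still gets its directory removed.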

To perform all this work, the launcher has quite a lot of privileges and needs access to a lot of credentials: AWS services, database access, file system access. All of these are attack vectors that are easy to exploit by anyone doing a console.log(.../configuration/config.json), console.log(process.env) or something similar.

Again…yikes!

This is why the untrusted code should never run in the same context as the launcher.

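A small experiment shows why the context matters. The sketch below uses a plain child process rather than Docker, purely as a lightweight illustration of the same principle; `LAUNCHER_SECRET` is a made-up variable name.

```javascript
const { execFileSync } = require('child_process');

// Pretend the launcher holds a credential (hypothetical name and value).
process.env.LAUNCHER_SECRET = 'not-for-users';

// Stand-in for untrusted code that dumps whatever environment it can see.
const snoop = 'console.log(JSON.stringify(Object.keys(process.env)))';

// Child inherits the launcher's environment: the secret is visible.
const inherited = JSON.parse(
  execFileSync(process.execPath, ['-e', snoop], { encoding: 'utf8' })
);

// Child gets an explicit, near-empty environment: nothing to find.
const scrubbed = JSON.parse(
  execFileSync(process.execPath, ['-e', snoop], {
    encoding: 'utf8',
    env: { NODE_ENV: 'production' },
  })
);

console.log('inherited env leaks secret:', inherited.includes('LAUNCHER_SECRET'));
console.log('scrubbed env leaks secret:', scrubbed.includes('LAUNCHER_SECRET'));
```

A scrubbed environment only covers environment variables; files and network access need the stronger isolation that a container provides.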

Stability is also increased by splitting launcher and runner. If the launcher were to hang or die, the whole system would effectively lose capacity. Something like the PM2 process monitor would of course restart the process, but there would certainly be noise and friction due to these crashes.

Ergo, in the current design the launcher is never directly exposed to any untrusted code. The motto being to always protect the server code, even at the expense of the user’s code.

The runner is a bit weird, so let’s have a look.

4. OS Sandboxing with Docker

The runner part of this equation is started by the launcher kicking off a Docker container which holds the runner process. The runner then executes the user’s untrusted code. Using a Docker container brings a couple of benefits:

  1. The Node process has no access to the parent host. Environment variables, files etc. are not accessible, so there is no snooping into sensitive files. In fact, reading files is not possible at all, but more on that later.
  2. Job isolation: jobs from multiple users run on one machine and we want to avoid any possibility of “cross pollination” at all times.
  3. Easy cleanup: every container is destroyed when it finishes running, together with all the horrible downloads, code and whatever malicious bits and bobs it dragged in.

Docker in general provides pretty in-depth security tweaking through the --cap-add flags described in the Runtime privilege and Linux capabilities docs. I was happy to not have to dive into the horrible mess that is SELinux…

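As a rough idea of what a locked-down runner container could look like, here is a sketch that only builds the options object. The field names follow the Docker Engine API’s container-create schema; every value (image name, entry point, limits, mount path) is an assumption for illustration and not the configuration used in this project.

```javascript
// Build the options for one runner container. Field names follow the
// Docker Engine API; every value here is an assumption for illustration.
function buildRunnerOptions({ image, workDir, memoryMb }) {
  return {
    Image: image,
    Cmd: ['node', 'runner.js'],        // hypothetical runner entry point
    Env: [],                           // pass none of the launcher's environment
    HostConfig: {
      Binds: [`${workDir}:/work:rw`],  // only the job's work dir is mounted
      Memory: memoryMb * 1024 * 1024,  // hard memory cap, in bytes
      PidsLimit: 256,                  // bound runaway process spawning
      CapDrop: ['ALL'],                // API counterpart of --cap-drop=ALL
      AutoRemove: true,                // container is deleted when it exits
    },
  };
}

const options = buildRunnerOptions({
  image: 'my-runner:latest', // hypothetical image name
  workDir: '/tmp/job-1',
  memoryMb: 512,
});
```

An object like this is the kind of argument Dockerode’s `createContainer` accepts. Depending on how Chrome is started, some capabilities may have to be added back, which is where the --cap-add documentation comes in.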

Outside of these security benefits, the Docker container also makes shipping and testing a bit easier. Getting Puppeteer to run inside a Docker environment was a bit of a challenge, requiring a lot of extra packages, but there are some excellent guidelines that should help with most Debian / Ubuntu based distros.

5. Node Sandboxing with VM2

The runner-inside-Docker solution is effectively a jail. But we are still allowing the inmates to use all the tools the Node.js standard library gives them to poke around. Would it not be better to severely thin out the toolbox? Yes, it would, and the first stop is Node VM.

Node VM is part of the standard Node distribution and provides sandboxing capabilities within the V8 engine. It just has a very limited interpretation of the term “sandbox”: you can break out of it very easily, add packages and do whatever damage you want. Admittedly, the Node docs say so in big fat letters: the vm module is not a security mechanism and should not be used to run untrusted code.

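How easy is "very easily"? The widely known constructor-chain escape fits on one line. The sketch below runs it against Node’s built-in `vm` module with an empty sandbox object.

```javascript
const vm = require('vm');

// Code that "should" be confined to an empty sandbox object.
const untrusted = "this.constructor.constructor('return process')()";

// The sandbox object was created in the host realm, so its constructor
// chain leads to the host's Function constructor. A function built with
// it runs in the host's global scope, where `process` is available.
const leaked = vm.runInNewContext(untrusted, {});

console.log('escaped the sandbox:', leaked === process);
```

Once the code holds the real `process` object, things like `process.env` and `process.exit()` are within reach, which is exactly what the sandbox was supposed to prevent.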

Luckily, there is VM2, a semi-offshoot that is built to clamp down on untrusted code and the things it can run. Its main party trick is that you can whitelist which modules the code injected into the VM has access to.

For example, you could whitelist just fs.write() but not fs.read(). Or you can block the usage of process to avoid the dreaded process.exit() or process.env. This is pretty amazing and full credit goes to @patricksimek.

External packages can also be whitelisted, giving you the option to allow the use of popular packages like lodash or other utility libraries without giving users access to npm install.

We are now finally at the point where the untrusted code is executed. Using VM2 this is as simple as invoking the run() method with a stringified version of the untrusted code.

    vm.run(untrustedCode)
      .then(output => {
        console.log(output)
      })

We have, however, one problem left. How do we get output back to the user? We are not allowing the runner inside the VM2 process inside the Docker container to have any access to a message bus or anything else outside of its context. The process is also decoupled from the launcher process, so we cannot use a simple callback.

For now, I have solved this problem by allowing the runner to only write logging to stdout and to write images to a shielded-off temporary directory which gets erased after running.

This means the launcher reads the stdout of the runner, basically parsing a long string and chopping out useful data based on prepended and appended control codes. This way the data is sanitized and passed into the upstream channels. Image files are read from disk and directly pushed to S3, taking into account file size and possible file corruption.

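The stdout parsing can be sketched like this. The marker strings, the JSON payload format and the function names are all assumptions; the article does not say which control codes are actually used.

```javascript
// Hypothetical control codes; the real markers are not given in the text.
const START = '\u0002RESULT\u0002';
const END = '\u0003RESULT\u0003';

// Runner side: wrap a payload so it can be found again in noisy stdout.
function frame(payload) {
  return `${START}${JSON.stringify(payload)}${END}`;
}

// Launcher side: pull every well-formed frame out of the raw stdout string
// and drop anything that does not parse as JSON.
function extractFrames(stdout) {
  const results = [];
  let from = 0;
  for (;;) {
    const start = stdout.indexOf(START, from);
    if (start === -1) break;
    const bodyStart = start + START.length;
    const end = stdout.indexOf(END, bodyStart);
    if (end === -1) break; // unterminated frame: ignore the tail
    try {
      results.push(JSON.parse(stdout.slice(bodyStart, end)));
    } catch {
      // malformed payload: skip it
    }
    from = end + END.length;
  }
  return results;
}
```

Because `JSON.stringify` escapes control characters inside strings, a payload cannot contain a raw marker itself. The markers alone do not stop user code from printing a forged frame, so the launcher should still treat every extracted payload as untrusted input.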

Conclusion

Running untrusted code is a bit like building a medieval castle. It is not about one unbreakable magic gate, one deep moat or one high tower that takes care of all your woes. It’s about layers of solutions that are annoying enough to scare off intruders and that catch the mistakes of the layer above or below.

P.S. If you liked this article, please show your appreciation by clapping below and follow me on Twitter! But wait, there’s more!

I’m building an active monitoring solution for developers and startups https://checklyhq.com

Cray cray!

Translated from: https://www.freecodecamp.org/news/running-untrusted-javascript-as-a-saas-is-hard-this-is-how-i-tamed-the-demons-973870f76e1c/
