A Million WebSockets and Go

by Sergey Kamardin

Hi everyone! My name is Sergey Kamardin and I’m a developer at Mail.Ru.

This article is about how we developed a high-load WebSocket server in Go.

If you are familiar with WebSockets, but know little about Go, I hope you will still find this article interesting in terms of ideas and techniques for performance optimization.

1. Introduction

To define the context of our story, a few words should be said about why we need this server.

Mail.Ru has a lot of stateful systems. User email storage is one of them. There are several ways to keep track of state changes within a system and of its events. Mostly this is done either through periodic polling of the system or through notifications about its state changes.

Both ways have their pros and cons. But when it comes to mail, the faster a user receives new mail, the better.

Mail polling involves about 50,000 HTTP queries per second, 60% of which return the 304 status, meaning there are no changes in the mailbox.

Therefore, in order to reduce the load on the servers and to speed up mail delivery to users, the decision was made to re-invent the wheel by writing a publisher-subscriber server (also known as a bus, message broker, or event-channel) that would receive notifications about state changes on the one hand, and subscriptions for such notifications on the other.

Previously:

Now:

The first scheme shows what it was like before. The browser periodically polled the API and asked about Storage (mailbox service) changes.

The second scheme describes the new architecture. The browser establishes a WebSocket connection with the notification API, which is a client to the Bus server. Upon receipt of new email, Storage sends a notification about it to Bus (1), and Bus to its subscribers (2). The API determines the connection to send the received notification, and sends it to the user’s browser (3).

So today we’re going to talk about the API or the WebSocket server. Looking ahead, I’ll tell you that the server will have about 3 million online connections.

2. The idiomatic way

Let’s see how we would implement certain parts of our server using plain Go features without any optimizations.

Before we proceed with net/http, let's talk about how we will send and receive data. The data that sits above the WebSocket protocol (e.g. JSON objects) will hereinafter be referred to as packets.

Let's begin implementing the Channel structure that will contain the logic of sending and receiving such packets over the WebSocket connection.

2.1. Channel struct
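
A minimal sketch of such a structure (the Packet type, the sendQueueSize constant, and the reader/writer methods shown later are illustrative placeholders, not the exact production code):

// Packet represents application-level data (e.g. a parsed JSON object).
type Packet struct {
    Payload []byte
}

// Channel wraps a single user connection.
type Channel struct {
    conn net.Conn    // WebSocket connection.
    send chan Packet // Outgoing packets queue.
}

func NewChannel(conn net.Conn) *Channel {
    c := &Channel{
        conn: conn,
        send: make(chan Packet, sendQueueSize),
    }

    // Launch the reader and writer goroutines for this connection.
    go c.reader()
    go c.writer()

    return c
}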

I'd like to draw your attention to the launch of the two goroutines for reading and writing. Each goroutine requires its own memory stack, which may have an initial size of 2 to 8 KB depending on the operating system and Go version.

With the 3 million online connections mentioned above, we will need 24 GB of memory (with a stack of 4 KB) for all connections. And that's without the memory allocated for the Channel structure, the outgoing packets queue ch.send, and other internal fields.

2.2. I/O goroutines

Let's have a look at the implementation of the "reader":
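
A sketch of the reader goroutine (readPacket and handle are illustrative helpers that parse and process a single packet):

func (c *Channel) reader() {
    // A buffered reader reduces the number of read() syscalls.
    buf := bufio.NewReader(c.conn)

    for {
        pkt, err := readPacket(buf)
        if err != nil {
            return
        }
        c.handle(pkt)
    }
}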

Here we use bufio.Reader to reduce the number of read() syscalls and to read as much as the buf buffer size allows. Within the infinite loop, we expect new data to come. Please remember these words: expect new data to come. We will return to them later.

We will leave aside the parsing and processing of incoming packets, as it is not important for the optimizations we will talk about. However, buf is worth our attention now: by default it is 4 KB, which means another 12 GB of memory for our connections. There is a similar situation with the "writer":
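
A sketch of the writer goroutine (writePacket is again an illustrative helper):

func (c *Channel) writer() {
    // A buffered writer reduces the number of write() syscalls.
    buf := bufio.NewWriter(c.conn)

    for pkt := range c.send {
        if err := writePacket(buf, pkt); err != nil {
            return
        }
        buf.Flush()
    }
}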

We iterate across the outgoing packets channel c.send and write them to the buffer. This is, as our attentive readers can already guess, another 4 KB and 12 GB of memory for our 3 million connections.

2.3. HTTP

We already have a simple Channel implementation; now we need to get a WebSocket connection to work with. As we are still under the Idiomatic Way heading, let's do it in the corresponding way.
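
A sketch of the handler, assuming some WebSocket library that upgrades the incoming request (the import path and the Upgrade signature are illustrative, not a specific library's API):

import (
    "net/http"

    "some/websocket"
)

func main() {
    http.HandleFunc("/v1/ws", func(w http.ResponseWriter, r *http.Request) {
        // Switch the connection to the WebSocket protocol.
        conn, err := websocket.Upgrade(w, r)
        if err != nil {
            http.Error(w, "bad handshake", http.StatusBadRequest)
            return
        }

        // Wrap the connection with our Channel; this starts the
        // reader and writer goroutines described above.
        _ = NewChannel(conn)
    })

    http.ListenAndServe(":8080", nil)
}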

Note: If you don’t know how WebSocket works, it should be mentioned that the client switches to the WebSocket protocol by means of a special HTTP mechanism called Upgrade. After the successful processing of an Upgrade request, the server and the client use the TCP connection to exchange binary WebSocket frames. Here is a description of the frame structure inside the connection.

Please note that http.ResponseWriter allocates memory for a bufio.Reader and a bufio.Writer (each with a 4 KB buffer) for *http.Request initialization and further response writing.

Regardless of the WebSocket library used, after a successful response to the Upgrade request the server receives the I/O buffers together with the TCP connection after the responseWriter.Hijack() call.

Hint: in some cases go:linkname can be used to return the buffers to the sync.Pool inside net/http through the call net/http.putBufio{Reader,Writer}.

Thus, we need another 24 GB of memory for 3 million connections.

So, a total of 72 GB of memory for the application that does nothing yet!

3. Optimizations

Let's review what we talked about in the introduction and remember how a user connection behaves. After switching to WebSocket, the client sends a packet with the relevant events, or in other words subscribes to events. Then (not taking into account technical messages such as ping/pong), the client may send nothing else for the whole connection lifetime.

The connection lifetime may last from several seconds to several days.

So most of the time our Channel.reader() and Channel.writer() are just waiting for data to receive or send. Along with them wait the I/O buffers of 4 KB each.

Now it is clear that certain things could be done better, couldn’t they?

3.1. Netpoll

Do you remember the Channel.reader() implementation that expected new data to come by getting locked on the conn.Read() call inside bufio.Reader.Read()? If there was data in the connection, the Go runtime "woke up" our goroutine and allowed it to read the next packet. After that, the goroutine got locked again while expecting new data. Let's see how the Go runtime understands that a goroutine must be "woken up".

If we look at the conn.Read() implementation, we'll see the net.netFD.Read() call inside it:
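
A simplified sketch of what it does (paraphrased from the Go sources; the exact file layout and details vary between Go versions):

// net/fd_unix.go (simplified)
func (fd *netFD) Read(p []byte) (n int, err error) {
    //...
    for {
        n, err = syscall.Read(fd.sysfd, p)
        if err != nil {
            n = 0
            if err == syscall.EAGAIN {
                if err = fd.pd.waitRead(); err == nil {
                    continue
                }
            }
        }
        //...
        break
    }
    //...
}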

Go uses sockets in non-blocking mode. EAGAIN says there is no data in the socket; instead of getting locked on reading from the empty socket, the OS returns control to us.

We see a read() syscall from the connection file descriptor. If read returns the EAGAIN error, the runtime makes the pollDesc.waitRead() call:
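
Simplified, that call chain looks roughly like this (again paraphrased from older Go sources; the ending of wait() is elided):

// net/fd_poll_runtime.go (simplified)
func (pd *pollDesc) waitRead() error {
    return pd.wait('r')
}

func (pd *pollDesc) wait(mode int) error {
    res := runtime_pollWait(pd.runtimeCtx, mode)
    //...
}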

If we dig deeper, we'll see that netpoll is implemented using epoll on Linux and kqueue on BSD. Why not use the same approach for our connections? We could allocate a read buffer and start the reading goroutine only when it is really necessary: when there really is readable data in the socket.

On github.com/golang/go, there is an issue about exporting netpoll functions.

3.2. Getting rid of goroutines

Suppose we have a netpoll implementation for Go. Now we can avoid starting the Channel.reader() goroutine with the buffer inside, and instead subscribe for the event of readable data in the connection:
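
A sketch of that idea, assuming a poller with an epoll/kqueue-backed Start(conn, event, callback) API; the poller interface here is hypothetical (libraries such as github.com/mailru/easygo/netpoll expose a similar abstraction):

ch := NewChannel(conn)

// Make conn observed by the netpoll instance.
poller.Start(conn, netpoll.EventRead, func() {
    // We spawn a goroutine here so that the poller's wait loop
    // is not blocked while we receive a packet from ch.
    go ch.Receive()
})

// Receive reads a packet from the connection and handles it.
func (ch *Channel) Receive() {
    buf := bufio.NewReader(ch.conn)
    pkt, err := readPacket(buf)
    if err != nil {
        return
    }
    ch.handle(pkt)
}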

It is easier with Channel.writer(), because we can run the goroutine and allocate the buffer only when we are going to send a packet:
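
A sketch of Send() under this scheme (noWriterYet() is an illustrative placeholder for a check that no writer goroutine is currently running):

func (ch *Channel) Send(p Packet) {
    if ch.noWriterYet() {
        go ch.writer()
    }
    ch.send <- p
}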

Note that we do not handle the case when the operating system returns EAGAIN on write() system calls. We lean on the Go runtime for such cases, because they are actually rare for this kind of server. Nevertheless, it could be handled in the same way if needed.

After reading the outgoing packets from ch.send (one or several), the writer will finish its operation and free the goroutine stack and the send buffer.

ch.send (一个或多个)读取传出的数据包后, ch.send将完成其操作并释放goroutine堆栈和发送缓冲区。

Perfect! We have saved 48 GB by getting rid of the stack and I/O buffers inside of two continuously running goroutines.

3.3. Control of resources

A great number of connections is not only about high memory consumption. When developing the server, we experienced repeated race conditions and deadlocks, often followed by the so-called self-DDoS, a situation in which the application clients rampantly tried to reconnect to the server, breaking it even more.

For example, if for some reason we suddenly could not handle ping/pong messages, but the handler of idle connections continued to close such connections (supposing that the connections were broken and therefore provided no data), the client appeared to lose the connection every N seconds and tried to reconnect instead of waiting for events.

It would be great if the locked or overloaded server just stopped accepting new connections, and the balancer in front of it (for example, nginx) passed requests on to the next server instance.

Moreover, regardless of the server load, if all clients suddenly want to send us a packet for any reason (for example because of a bug), the previously saved 48 GB will be in use again, as we will effectively return to the initial state with a goroutine and a buffer per connection.

Goroutine pool

We can restrict the number of packets handled simultaneously by using a goroutine pool. This is what a naive implementation of such a pool looks like:
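
A minimal sketch of such a pool, with a buffered channel acting as a semaphore that bounds the number of workers:

// Pool limits the number of concurrently running tasks.
type Pool struct {
    work chan func()   // Tasks handed over to already-running workers.
    sem  chan struct{} // Semaphore limiting the number of workers.
}

func NewPool(size int) *Pool {
    return &Pool{
        work: make(chan func()),
        sem:  make(chan struct{}, size),
    }
}

// Schedule hands the task to a free worker or, if the worker limit
// has not been reached yet, starts a new one; otherwise it blocks.
func (p *Pool) Schedule(task func()) {
    select {
    case p.work <- task:
    case p.sem <- struct{}{}:
        go p.worker(task)
    }
}

func (p *Pool) worker(task func()) {
    defer func() { <-p.sem }()
    for {
        task()
        task = <-p.work
    }
}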

Now our code with netpoll looks as follows:
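
A sketch of the read path with the pool plugged in (the poller API is the same hypothetical one as above):

pool := NewPool(128)

poller.Start(conn, netpoll.EventRead, func() {
    // The poller's wait loop will block here when
    // all pool workers are busy.
    pool.Schedule(func() {
        ch.Receive()
    })
})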

So now we read a packet not only when readable data appears in the socket, but also when there is a free goroutine in the pool to take it up.

Similarly, we'll change Send():
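
A sketch of the change, assuming the Channel keeps a reference to the pool:

func (ch *Channel) Send(p Packet) {
    if ch.noWriterYet() {
        ch.pool.Schedule(ch.writer)
    }
    ch.send <- p
}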

Instead of go ch.writer(), we want to write in one of the reused goroutines. Thus, for a pool of N goroutines, we can guarantee that with N requests handled simultaneously, when request N + 1 arrives we will not allocate an (N + 1)-th buffer for reading. The goroutine pool also allows us to limit Accept() and Upgrade() of new connections and to avoid most DDoS situations.

3.4. Zero-copy upgrade

Let's digress a little from the WebSocket protocol. As was already mentioned, the client switches to the WebSocket protocol by means of an HTTP Upgrade request. This is what it looks like:
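
A typical exchange looks roughly like this (the key and accept values are illustrative):

GET /ws HTTP/1.1
Host: mail.ru
Connection: Upgrade
Sec-WebSocket-Key: A3xNe7sEB9HixkmBhVrYaA==
Sec-WebSocket-Version: 13
Upgrade: websocket

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Sec-WebSocket-Accept: ksu0wXWG+YmkVx+KQR2agP0cQn4=
Upgrade: websocket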

That is, in our case we need the HTTP request and its headers only for the switch to the WebSocket protocol. This knowledge, together with what is stored inside http.Request, suggests that for the sake of optimization we could probably refuse the unnecessary allocations and copies when processing HTTP requests and abandon the standard net/http server.

For example, http.Request contains a field named Header of the same-named type, which is unconditionally filled with all request headers by copying data from the connection into strings. Imagine how much extra data could be kept inside this field, for example for a large-size Cookie header.

But what do we take in return?

WebSocket implementation

Unfortunately, all libraries existing at the time of our server optimization allowed us to do the upgrade only for the standard net/http server. Moreover, neither of the (two) libraries made it possible to use all the above read and write optimizations. For these optimizations to work, we must have a rather low-level API for working with WebSocket. To reuse the buffers, we need the protocol functions to look like this:

func ReadFrame(io.Reader) (Frame, error)
func WriteFrame(io.Writer, Frame) error

If we had a library with such an API, we could read packets from the connection as follows (the packet writing would look the same):
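
A sketch of such reading code; the buffer reuse via sync.Pool and the parsePacket helper are illustrative, and we assume the Frame carries its payload bytes:

// A pool of *bufio.Reader so that a read buffer exists
// only while a packet is actually being read.
var readBufPool = sync.Pool{
    New: func() interface{} { return bufio.NewReaderSize(nil, 4096) },
}

// readPacket must be called when data can be read from conn.
func readPacket(conn io.Reader) error {
    buf := readBufPool.Get().(*bufio.Reader)
    defer readBufPool.Put(buf)

    buf.Reset(conn)
    frame, err := ReadFrame(buf)
    if err != nil {
        return err
    }
    return parsePacket(frame.Payload) // parsePacket is application logic.
}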

In short, it was time to make our own library.

github.com/gobwas/ws

Ideologically, the ws library was written so as not to impose its protocol operation logic on users. All reading and writing methods accept standard io.Reader and io.Writer interfaces, which makes it possible to use or not to use buffering or any other I/O wrappers.

Besides handling upgrade requests from the standard net/http, ws supports zero-copy upgrade: the handling of upgrade requests and switching to WebSocket without memory allocations or copying. ws.Upgrade() accepts io.ReadWriter (net.Conn implements this interface). In other words, we could use the standard net.Listen() and transfer the connection received from ln.Accept() immediately to ws.Upgrade(). The library also makes it possible to copy any request data for future use in the application (for example, a Cookie to verify the session).

Below are benchmarks of Upgrade request processing: the standard net/http server versus net.Listen() with zero-copy upgrade:

BenchmarkUpgradeHTTP    5156 ns/op    8576 B/op    9 allocs/op
BenchmarkUpgradeTCP     973 ns/op     0 B/op       0 allocs/op

Switching to ws and zero-copy upgrade saved us another 24 GB — the space allocated for I/O buffers upon request processing by the net/http handler.

3.5. Summary

Let’s structure the optimizations I told you about.

  • A read goroutine with a buffer inside is expensive. Solution: netpoll (epoll, kqueue); reuse the buffers.

  • A write goroutine with a buffer inside is expensive. Solution: start the goroutine when necessary; reuse the buffers.

  • With a storm of connections, netpoll alone won't be enough. Solution: reuse the goroutines with a limit on their number.

  • net/http is not the fastest way to handle an Upgrade to WebSocket. Solution: use the zero-copy upgrade on a bare TCP connection.

That is what the server code could look like:
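
A sketch that ties the pieces together; the pool, poller, and ScheduleTimeout (a variant of Schedule that gives up after a timeout) are the same illustrative placeholders used above, and error handling is abbreviated:

// Create the worker pool and listen on a bare TCP socket.
pool := NewPool(128)

ln, err := net.Listen("tcp", ":8080")
if err != nil {
    log.Fatal(err)
}

for {
    // Accept a new connection only inside a free pool worker.
    // If there is no free worker for a while, the overloaded server
    // simply stops accepting new connections.
    err := pool.ScheduleTimeout(time.Millisecond, func() {
        conn, err := ln.Accept()
        if err != nil {
            return
        }

        // Zero-copy upgrade on the raw TCP connection.
        if _, err := ws.Upgrade(conn); err != nil {
            conn.Close()
            return
        }

        // Wrap the WebSocket connection with our Channel struct.
        ch := NewChannel(conn)

        // Wait for readable data on the connection.
        poller.Start(conn, netpoll.EventRead, func() {
            // Do not cross the resource limits.
            pool.Schedule(func() {
                ch.Receive()
            })
        })
    })
    if err != nil {
        time.Sleep(time.Millisecond)
    }
}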

4. Conclusion

"Premature optimization is the root of all evil (or at least most of it) in programming." (Donald Knuth)

Of course, the above optimizations are relevant, but not in all cases. For example, if the ratio between free resources (memory, CPU) and the number of online connections is rather high, there is probably no sense in optimizing. However, you can benefit a lot from knowing where and what to improve.

Thank you for your attention!

5. References

Translated from: https://www.freecodecamp.org/news/million-websockets-and-go-cc58418460bb/
