Developing a microservice to handle over 30k requests per second at Ifood

Ifood is a Brazilian food tech company delivering more than 1 million orders per day and growing around 110% every year. As a food tech, the platform's peak hours are mostly around lunch and dinner, and traffic gets even higher during the weekends.

On some special days (e.g. due to marketing campaigns) we beat the previous record and see the platform hit its highest peak of all time. Last June 12th was one of those days: we saw a microservice reach 2MM requests per minute.

Some background story

I’ve been working on the Account & Identity team in the Platform Engineering area at Ifood for around one and a half years. It’s been quite a journey, and from time to time we face a lot of challenges due to the fast growth of the company. When designing new solutions we always have to keep in mind that in a couple of months the system's usage will grow 2 or 3 times.

The story I’m gonna tell today is an example of the case above. The system was developed around 2018, when the company was delivering 13 million orders per month. Nowadays, it’s over 30 million. That’s not always the case, but here the system's usage grew in the same proportion as the company, and later on it started to grow even more aggressively.

Internally we call this microservice account-metadata. Even though it’s kind of a generic name, it also explains what the service is up to: it deals with the accounts’ metadata. What is account metadata? Well, mostly whatever isn't main/basic customer information. To give you some examples: whether the user prefers to get notifications via SMS or email, their favorite food types (like burger, pasta, Japanese food, etc.), some feature flags, the number of orders that user has placed, and so on. It’s a common place to aggregate data from different sources and serve it easily to the mobile app, but also to other microservices, so they just need to call one microservice instead of ten.

Back in 2018, account-metadata was built mainly to hold some random (and not much used) information that, to be honest, had no other place to live. We needed a bit of structure and query power, and it should be easy to scale, so we picked AWS's provisioned DynamoDB. Just to make it clear: we were aware that the system could grow, the company was quite big already, and the average load was challenging. However, there was no way we could have predicted that we would go from 10k requests per minute (rpm) to 200k and then to 2MM rpm.

After the first release of this microservice, there was not much usage (compared to other microservices in the Account & Identity team). However, a few weeks later, after some business decisions were made, the system became very important, and it would be one of the first calls the mobile app would make to get all that info about the customer.

A few months after that decision, other teams started to see account-metadata as a nice place to put info that was split across multiple places and hard to rely on. Also, we started to create more aggregations that made the life of other microservices much easier, increasing the importance of the system and spreading its popularity across other teams. Now, account-metadata is called every time a user opens the app, and by many teams in very different contexts.

And that’s a very, very short summary of what happened and how the system became so important from 2018 until now. During this period, the team (me plus eight really brilliant people that I’m very lucky to work with) actively worked on it, but not exclusively. We’re still patching, developing, and maintaining the other ten microservices that we own.

We did a bunch of changes, and describing all the scenarios we went through would take too much time, so below I describe the current architecture that lets us healthily serve 2MM requests per minute. Finally, it’s time to dive into the technical part.

Diving into the technical part

As I said, this microservice stores metadata about the customer. In the database, we split this metadata into different contexts (or, as we call them in the code: namespaces). A customer (customer_id as the partition key) can have one to N namespaces (as the sort key), and each one has a fixed, rigid schema that is defined and checked (before insert) by a jsonschema. With that, we can make sure that whoever inserts data into a namespace (more details on this later on) will respect its schema and use it correctly.
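
To make the data model concrete, here is a minimal sketch in Python of what the namespace validation and write could look like. It assumes boto3 and the jsonschema library; the table name ("account-metadata"), the example namespace ("order_counters"), and its schema are illustrative, not the real ones.

```python
import boto3
from jsonschema import validate

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("account-metadata")  # hypothetical table name

# One fixed, rigid schema per namespace, checked before every insert.
NAMESPACE_SCHEMAS = {
    "order_counters": {  # hypothetical namespace
        "type": "object",
        "properties": {
            "total_orders": {"type": "integer", "minimum": 0},
            "favorite_food_types": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["total_orders"],
        "additionalProperties": False,
    },
}

def save_metadata(customer_id: str, namespace: str, payload: dict) -> None:
    """Validate the payload against the namespace schema, then persist it."""
    schema = NAMESPACE_SCHEMAS.get(namespace)
    if schema is None:
        raise ValueError(f"unknown namespace: {namespace}")
    validate(instance=payload, schema=schema)  # raises ValidationError on bad data
    table.put_item(
        Item={
            "customer_id": customer_id,  # partition key
            "namespace": namespace,      # sort key
            "payload": payload,
        }
    )
```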

We used this approach because the reads and writes on this system are done by very different areas of the company.

The inserts are done by the data science team: every day they export millions of records from their internal tools to this microservice, calling it via an API that splits these millions of records into batches of ~500 items. So, at a given time of day this microservice receives millions of calls (in an interval of 10 to 20 minutes) to insert data into DynamoDB. If the batch API that receives the ~500 items wrote them directly into the database, we could have problems scaling Dynamo, and it would also be hard to keep response times low. One way to remove this bottleneck would be for the data team to write their data directly into the database; however, we must check that the items respect the jsonschema defined for the namespace where they will be stored, and that's a responsibility of the microservice.

So the solution was for this API to receive the batch of items and post it to SNS/SQS, to be consumed by another part of the application that validates each item and, if it's ok, saves it to Dynamo. With this approach, the endpoint that receives the batch of items can answer really fast, and we can do the write without relying on the HTTP connection (this is quite important because the communication with Dynamo may fail, and retrying could make the HTTP response time really slow). Another benefit is that we can control how fast or slow we want to read the data from SQS and write it to Dynamo by controlling the consumers.
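
A rough sketch of this asynchronous write path, under the same assumptions as the previous snippet (the topic ARN, queue URL, and payload shape are made up): the batch endpoint only publishes to SNS and returns, while a separate consumer drains the subscribed SQS queue, validates each item (reusing the save_metadata helper from the earlier sketch), and writes to Dynamo at its own pace.

```python
import json
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:account-metadata-batches"  # hypothetical
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/account-metadata-batches"  # hypothetical

def receive_batch(namespace: str, items: list) -> None:
    """API side: acknowledge fast and defer validation and the Dynamo write."""
    sns.publish(TopicArn=TOPIC_ARN, Message=json.dumps({"namespace": namespace, "items": items}))

def consume_forever() -> None:
    """Worker side: the number and pace of consumers controls how hard we hit Dynamo."""
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            envelope = json.loads(msg["Body"])       # SQS body wraps the SNS envelope
            batch = json.loads(envelope["Message"])  # the message published above
            for item in batch["items"]:
                save_metadata(item["customer_id"], batch["namespace"], item["payload"])
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```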

Outside of this workflow, account-metadata is also called by another service every time an order is received by the platform, to update some information regarding that order. Given that Ifood does more than 1MM orders per day, the microservice also receives that amount of calls.

While there is some very heavy write processing on it, 95% of its load comes from API calls to read data. As I said, the writes and reads are done by very different teams, and the read calls come from many, many teams and the mobile apps. Luckily for us, this microservice is asked to read data much more than to write it, so it's a little easier to scale. As any system that reads a lot of data needs a cache, so does this one; instead of using Redis or something like that, we use DAX, which AWS provides as a "kinda built-in" cache for DynamoDB. To use it you just need to change the client and understand the replication delay that may show up in the different query operations.
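
The "just change the client" part can be illustrated with the Python DAX client; the cluster endpoint below is a placeholder, and whether the real service is written in Python at all is an assumption made only to stay consistent with the other sketches.

```python
import boto3
import amazondax

# Plain DynamoDB resource, e.g. for writes that should not go through the cache.
dynamodb = boto3.resource("dynamodb")

# DAX-backed resource for the read-heavy path. It mirrors the boto3 interface,
# but reads are eventually consistent, so data written a moment ago may be stale.
dax = amazondax.AmazonDaxClient.resource(
    endpoint_url="dax://account-metadata.xxxxxx.dax-clusters.us-east-1.amazonaws.com"  # placeholder
)
cached_table = dax.Table("account-metadata")

def get_metadata(customer_id: str, namespace: str):
    """Read one namespace for one customer through the DAX cache."""
    resp = cached_table.get_item(Key={"customer_id": customer_id, "namespace": namespace})
    return resp.get("Item")
```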

With this amount of calls, it's quite normal to get some irregularity. In our case, we started to see some queries in Dynamo taking longer than 2 or 3 seconds, while 99.99% of the calls were under 17ms. Even though these are just a few thousand per day, we wanted to provide a better SLA for the teams, so we decided to do a retry if we got a timeout from Dynamo. We also talked with the teams so they would configure a low timeout when calling our APIs. The default for most of their HTTP clients was 2s, so we changed it to ~100ms. If they get a timeout (let's say the microservice retried against Dynamo but failed again), they can retry and very likely they will get a response.
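
One way to express this fail-fast-and-retry behavior on the service side (an illustrative sketch, not necessarily how the team implemented it) is to tighten the SDK timeouts and allow a single retry via botocore's config; the numbers below are placeholders.

```python
import boto3
from botocore.config import Config

# Tight timeouts so a slow Dynamo query fails fast; "max_attempts" counts the
# original call, so 2 means one retry. The values here are illustrative only.
fast_fail_config = Config(
    connect_timeout=0.05,
    read_timeout=0.05,
    retries={"max_attempts": 2, "mode": "standard"},
)
dynamodb = boto3.resource("dynamodb", config=fast_fail_config)
```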

To deploy it, we're using k8s (reaching around 70 pods) and scaling it as the requests per second grow. The DynamoDB table is set up as provisioned instead of on-demand.

An important step to make sure that the system can work healthily under really high throughput: we run a load/stress test against it every day, to make sure that the previous day's deploys did not degrade performance and that things are still ok and working well. With the results of this load test, we can track whether an endpoint is getting better or worse over time as development continues.
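
The article doesn't say which tool drives this daily load test; as an illustration, a small Locust scenario like the one below could hammer the read endpoint and make per-endpoint latency trends comparable from one day to the next (the host, path, and namespace are assumptions).

```python
from locust import HttpUser, task, between

class AccountMetadataUser(HttpUser):
    # Short wait times to simulate peak traffic against the read endpoint.
    wait_time = between(0.01, 0.1)

    @task
    def read_namespace(self):
        # Hypothetical endpoint shape: /customers/{id}/metadata/{namespace}
        self.client.get(
            "/customers/some-customer-id/metadata/order_counters",
            name="/customers/:id/metadata/:namespace",  # group stats under one label
        )
```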

With time, this microservice became quite important for some teams, and that's a problem if, for any reason, the system goes down. To address that, we asked the teams to call the microservice via Kong (our API gateway) and configured a fallback there. So, if the microservice goes down or returns a 500, Kong activates the fallback and the client still gets an answer. The fallback in this case is an S3 bucket with a copy of the data that the system would provide. It may be outdated, but that's better than having no answer at all.

And that’s, in summary, how it works. There are also some other workflows on it, but nothing as relevant as the ones I’ve described.

The next steps are still not very clear for the team. The usage of the microservice may grow even more, and we could reach a point where it becomes harder and harder to scale. An alternative could be to split it into different microservices (maybe even with different databases) or to aggregate the data further to better serve its consumers. In any case, we will keep doing tests, finding the bottlenecks, and trying to fix them.

Translated from: https://medium.com/swlh/developing-a-microservice-to-handle-over-30k-requests-per-second-at-ifood-3e2d7b822b0e
