The Challenges of Stateless Architecture and How to Monitor Your Serverless Application

Latest recommended article published on 2023-02-26 21:54:06

weixin_26714375

Original article: https://medium.com/better-programming/the-challenges-of-stateless-architecture-and-how-to-monitor-your-serverless-application-94c0e8b8dd1

So you decided to go serverless. Congrats! Welcome to the land of high availability, vertical and horizontal scalability, rapid development, and only paying for what you use.

Business functionality is the name of the game. Software engineers get to spend their time solving business problems instead of solving networking and infrastructure problems. Your application is going to have full focus from the development team.

But serverless apps bring some new and interesting challenges, including ones that used to be solved problems, like forecasting application costs. Since your application is “pay for what you use,” how does that translate to the number of executions and compute time?
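
One rough way to reason about that translation is to multiply it out. The sketch below assumes a pricing model with a per-request charge plus a charge per GB-second of compute. The two default rates are illustrative assumptions rather than quoted prices, and the function name is hypothetical.

```python
def estimate_monthly_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_million_requests: float = 0.20,   # assumed rate; check current pricing
    price_per_gb_second: float = 0.0000166667,  # assumed rate; check current pricing
) -> float:
    """Return an estimated monthly cost in dollars, ignoring any free tier."""
    # Request charge: billed per invocation.
    request_cost = invocations / 1_000_000 * price_per_million_requests
    # Compute charge: seconds of execution scaled by configured memory in GB.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    return round(request_cost + compute_cost, 2)


# Five million invocations a month, 120 ms each, at 512 MB.
print(estimate_monthly_cost(5_000_000, 120, 512))
```

Plugging in your own traffic forecast and measured durations gives a first-order estimate; data transfer and the other services in the application are not covered by this sketch.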

Other problems include how to test your application at scale. How do you test all those Lambda functions, API Gateway endpoints, and SNS topics?

How about application health? You need to know if your application is up and running and doing what it’s supposed to be doing. Traditional development had an easy answer to this problem. Just add a health check endpoint. Create a web service that checks the status of your servers and their connectivity.

But what if your app doesn’t have servers? A serverless application is stateless. It’s distributed. You don’t control the servers your code runs on. Heck, most of the time, your code isn’t even running!

Photo by Sebastian Herman on Unsplash.

Get to Know Your Serverless Issues

You’re going to have different issues with a serverless application than with a traditional client/server or even containerized application. Believe it or not, your days of “Did you try turning it off and on again?” are now over. There’s nothing to turn off.

When an AWS Lambda function is invoked, a small Firecracker microVM is spun up, your code is executed, and then the environment is spun down (unless more invocations quickly follow). Essentially, your entire application is turning itself off and on again all day.

What you are much more likely to see are infrastructure setup issues, configuration problems, and external integration mishaps.

Infrastructure setup issues

If you are using a repeatable infrastructure setup script like CloudFormation, there are many tools in place to make sure your serverless app is configured properly.

AWS SAM is a framework that allows you to easily connect serverless components together. It takes the heavy lifting of CloudFormation and abstracts it away into a few lines of YAML or JSON. This allows you to do things like trigger a Lambda function when you upload a document to S3 or build a DynamoDB table in just a few lines.

What it does not do is validate your IAM permissions. IAM is what AWS uses for identity and access management. If you are following AWS best practices, you are practicing the principle of least privilege (PoLP). This means you have set up your roles to only use what they absolutely need.
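
As a minimal sketch of what “only what they absolutely need” can look like, the snippet below builds an IAM policy document that grants read access to the objects in a single S3 bucket and nothing else. The helper name and the bucket name are hypothetical; the policy shape (Version, Statement, Effect, Action, Resource) is the standard IAM JSON layout.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document that allows reading objects from one bucket only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # A single action instead of a wildcard such as "s3:*".
                "Action": ["s3:GetObject"],
                # Scoped to the objects in one bucket instead of "*".
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# "example-uploads" is a placeholder bucket name.
print(json.dumps(read_only_bucket_policy("example-uploads"), indent=2))
```

A role for a function that only reads uploaded documents would attach something like this rather than a broad managed policy.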

This sounds great in theory, and ultimately it is a security-minded person’s dream. But in the beginning, it is incredibly hard. Finding the right permissions, knowing when to tighten them, and figuring out how to build IAM roles all come with a learning curve.

When you’re building your app for the first time, you will run into issues where one service isn’t allowed to talk to another. It’s easily fixable, but it’s bound to show up time and time again.

Configuration problems

As much as we all like to think we write perfect code, we do not. There will be bugs, and there will be many of them. With serverless applications, these bugs will certainly manifest themselves in configuration issues.

A standard serverless pattern is Start Execution > Load Configuration > Do Business Logic > Save Results > Complete Execution.

If you don’t have an airtight configuration set up, your application is going to go haywire. Take it upon yourself to make sure you log an appropriate amount of details when it comes to saving and loading tenant configuration.
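
The pattern and the logging advice can be sketched together as a single handler. Everything here is a stand-in for illustration: the in-memory `CONFIG_STORE` and `RESULTS` dictionaries replace whatever real configuration table and result store you use, and the event fields (`tenant_id`, `order_id`, `amount`) are hypothetical.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("tenant-config")

# Hypothetical in-memory stand-ins for a real config table and result store.
CONFIG_STORE = {"tenant-a": {"tax_rate": 0.07}}
RESULTS = {}


def load_config(tenant_id: str) -> dict:
    """Load tenant configuration and log exactly what was (or was not) found."""
    config = CONFIG_STORE.get(tenant_id)
    if config is None:
        logger.error("No configuration found for tenant %s", tenant_id)
        raise KeyError(f"missing configuration for tenant {tenant_id}")
    logger.info("Loaded configuration for tenant %s: %s", tenant_id, json.dumps(config))
    return config


def handler(event: dict, context=None) -> dict:
    tenant_id = event["tenant_id"]                                # start execution
    config = load_config(tenant_id)                               # load configuration
    total = round(event["amount"] * (1 + config["tax_rate"]), 2)  # do business logic
    RESULTS[event["order_id"]] = total                            # save results
    return {"order_id": event["order_id"], "total": total}        # complete execution
```

Calling `handler({"tenant_id": "tenant-a", "order_id": "o-1", "amount": 100.0})` logs the loaded configuration before any business logic runs, so a bad or missing tenant setting is visible in the logs rather than surfacing later as a confusing result.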

External integration mishaps

We’ve all been there. Everything is working great. Your application looks and feels bulletproof.

All of a sudden, the wheels have fallen off. Bells and sirens and lights are going off and you have no idea what just happened.

Turns out a third-party dependency went down. You rely on them to perform a task, and their application started failing.

Your SLA is only as good as the worst SLA of your dependencies. Be careful when choosing them and have a plan in place when things go south.

Know your (soft) limits

A new problem you have to worry about is hitting soft limits. A soft limit is a constraint put in place by your cloud vendor to make sure you don’t do anything… stupid.

For example, AWS has a default soft limit of 1,000 concurrent Lambda executions per account in each Region. If you try to run more than 1,000 at the same time, the extra invocations get throttled and you receive HTTP 429 responses.
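
To get a feel for how close you are to that limit, concurrency can be approximated as request rate multiplied by average duration. The sketch below applies that rule of thumb; the function names are hypothetical, and the 1,000 default simply mirrors the number quoted above.

```python
import math


def estimated_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Approximate concurrent executions as arrival rate times time spent per request."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def would_throttle(requests_per_second: float, avg_duration_seconds: float, limit: int = 1000) -> bool:
    """Return True when the estimated concurrency exceeds the account limit."""
    return estimated_concurrency(requests_per_second, avg_duration_seconds) > limit


# 200 requests per second at 500 ms each needs about 100 concurrent executions.
print(estimated_concurrency(200, 0.5))
# 2,500 requests per second at 500 ms each would exceed a 1,000 limit.
print(would_throttle(2500, 0.5))
```

This is only an average; bursts can push real concurrency well above the estimate, so leave headroom.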

Luckily, in this case, soft means adjustable. If you run into a situation where you legitimately need a higher limit, you can request an increase from AWS Support and they will raise it for you.

Throw Errors Into DLQs

In a serverless application, you can’t log into an app server to go look at the logs. There are no more persistent connections. App server logs aren’t a thing.

You have CloudWatch logs, but if you’re just browsing, it can be a nightmare to find what you’re looking for. Each Lambda execution environment that spins up writes to its own log stream, and over the course of the day, you could have thousands of streams to comb through.

You must send errors to a Dead Letter Queue (DLQ). A DLQ is a type of queue where you send exceptions to be reviewed manually. They give you direct access to the details of an error, plus you get the context of what your application was doing when the error occurred.

Your serverless application should almost always attempt to retry operations if they fail.

If your retries fail and there’s not a clear path to resolution, take the context and the details of what you’re trying to do, and send them to a specific queue. You should have a separate queue for each failure point in your application.
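
A minimal sketch of that retry-then-dead-letter flow follows. The `DEAD_LETTER_QUEUES` dictionary is a hypothetical in-memory stand-in for real queues (one list per failure point), and the function name, attempt count, and delays are illustrative assumptions.

```python
import time
from collections import defaultdict

# Hypothetical in-memory stand-in for real queues, keyed by failure point.
DEAD_LETTER_QUEUES = defaultdict(list)


def run_with_retries(operation, payload, failure_point, max_attempts=3, base_delay=0.01):
    """Call operation(payload); retry with exponential backoff, then dead-letter."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(payload)
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
            if attempt < max_attempts:
                # Wait base_delay, then 2x, then 4x, ... between attempts.
                time.sleep(base_delay * 2 ** (attempt - 1))
    # Retries are exhausted: keep the context and error details for manual review.
    DEAD_LETTER_QUEUES[failure_point].append(
        {"payload": payload, "error": repr(last_error), "attempts": max_attempts}
    )
    return None
```

Passing a distinct `failure_point` string per call site (for example `"save-results"`) gives each failure point its own queue, so a message showing up there already tells you which step failed.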

Yes, that does mean a significant number of queues, but it also means you know exactly what is failing should something show up in there.

Configure DLQ Alarms

In an ideal world, there wouldn’t be any issues.

In a close-to-ideal world, a software company would identify and fix any issues before the customer reports them.

In order to be aware of issues the moment they happen, you need to set up alarms on your Dead Letter Queues. Alarms monitor DLQs and alert the responsible parties if there are incoming messages.

You can build alarms directly in AWS CloudWatch or you can use an external service like Datadog to manage them. These services allow you to set alarms on specific Dead Letter Queues, watch for a threshold of incoming messages over a given period of time, and alert the relevant groups of people by sending emails, Slack messages, phone calls, etc.

An example alarm for a DLQ would be:

Over the course of an hour:
- If no messages are queued, everything is OK.
- If 1-4 messages are queued, raise a warning in Slack that something might be going on.
- If 5+ messages are queued, something is wrong. Notify the on-call engineers.
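
Those thresholds can be written down as a small decision function. This is only a sketch of the logic; in practice the thresholds would live in your alarm configuration, and the function name and return values here are hypothetical.

```python
def classify_dlq_depth(messages_last_hour: int) -> str:
    """Map the number of messages queued in the last hour to an alert level."""
    if messages_last_hour < 0:
        raise ValueError("message count cannot be negative")
    if messages_last_hour == 0:
        return "ok"    # nothing queued: everything is OK
    if messages_last_hour < 5:
        return "warn"  # 1-4 messages: raise a warning in Slack
    return "page"      # 5+ messages: notify the on-call engineers


for count in (0, 3, 5):
    print(count, classify_dlq_depth(count))
```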

Once you have a robust set of alarms, you’ll be able to quickly respond to issues as they arise.

Run Automated Business Flows

It’s not realistic to have a QA analyst running tests through your system all the time. But you need tests to make sure the system is healthy and everything is working.

Instead, you can set up automated tests to exercise your application and have them run periodically.

With Postman, you can build workflows that simulate users in your system. You can record all the web requests your application makes in a business flow, parameterize them, and have them play back with random values at regular intervals.

On one of my projects, we configured these workflows (called collections) to run in our test and prod environments. We had a full suite of workflows that tested the entirety of the system that would run every four hours in our test environment and every two hours in production.

The collections run a series of web requests and then execute tests to verify that things like status code, response times, and expected response schemas are correct.
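
The kinds of checks described here can be sketched in plain Python. This is not Postman's test API; it is a hypothetical `check_response` helper that applies the same three checks (status code, response time, response shape) to values you have already collected from a request, with the 500 ms budget as an assumed default.

```python
def check_response(status_code, elapsed_ms, body, expected_fields, max_ms=500):
    """Return a list of failure messages; an empty list means every check passed."""
    failures = []
    if status_code != 200:
        failures.append(f"unexpected status code: {status_code}")
    if elapsed_ms > max_ms:
        failures.append(f"slow response: {elapsed_ms} ms > {max_ms} ms")
    # A very small schema check: each expected field must exist with the right type.
    for field, field_type in expected_fields.items():
        if field not in body:
            failures.append(f"missing field: {field}")
        elif not isinstance(body[field], field_type):
            failures.append(f"wrong type for field: {field}")
    return failures


# Example: a healthy response and one that fails all three kinds of check.
schema = {"order_id": str, "total": float}
print(check_response(200, 120, {"order_id": "o-1", "total": 107.0}, schema))
print(check_response(500, 900, {"order_id": 7}, schema))
```

Running a function like this on a schedule against each step of a business flow gives the same pass or fail signal that can then be forwarded to an alerting channel.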

With the system doing self-checks like this every couple of hours, we had the confidence that the system was operational at all times.

When tests did fail, reporting through a native integration with Datadog sent the team a message in Slack for immediate action.

If you build up these collections, not only will you get the confidence that your system is operational, but you also set yourself up for easy load testing.

Try Managed Monitoring

There are many native tools to monitor your serverless applications. You configure your dashboards, tests, alarms, notifications, limits, thresholds, etc., but you still have to maintain all the data you’re collecting.

An alternative to building it yourself is trying out a managed monitoring service like Datadog, Thundra, or Epsagon. Not only do services like this give you enhanced monitoring data, but they also give you observability of your assembled system by drawing out your architecture diagrams.

Having visual architecture diagrams and infrastructure graphs helps with tasks like spotting bottlenecks and places where requests are being throttled.

In the event things do go wrong, a managed monitoring service will also provide enhanced debugging measures. The faster you can debug, the faster you can get your system operational again.

Conclusion

Although it is often thought of a little too late, monitoring a serverless application is a necessary part of application development.

It provides insights into your system, aids in troubleshooting, and gives you confidence that your setup is correct.

Take the time to set yourself up with the tools to find issues before your customers do. Send all your errors to Dead Letter Queues. Trigger alarms when errors get there. Whether you set them up manually or with a managed service, your future self will thank you.

Build automation wherever you can. Make your system reliable. Feel confident that you’ve built the right software.

Most importantly, let your developers focus on development. Having a fully monitored system allows them to put their efforts into what really counts: solving business problems.

Translated from: https://medium.com/better-programming/the-challenges-of-stateless-architecture-and-how-to-monitor-your-serverless-application-94c0e8b8dd1
