Building low-latency, event-driven applications

Hello, what a full room. It makes me so happy to have you here. My name is Marcia Villalba and I'm here with Brenna Moore, and we are going to be talking about building low-latency, event-driven applications. I couldn't fit more buzzwords into the title.

So yeah, basically what we are going to be talking about is making applications faster. How many times have you heard this from your customers this week, this month, this year? Do they know what it means to make an application faster? Well, I hope today we can answer that question a little bit better, and Brenna and I can give you some ideas and patterns that you can take away to make your applications faster, or, if they are already very fast, you can now tell your customers that they are very fast.

So the first thing we are going to do is talk about some important concepts, so we are all on the same page. Then we are going to talk about synchronous applications, asynchronous applications and multi-Region applications, and that's only half of the presentation, because then Brenna will come in, take all the patterns and things I show you, and show you how they do it at Second Dinner in Marvel Snap, a very snappy game.

So first of all, I will introduce myself. I'm Marcia Villalba, principal developer advocate at AWS. I have been doing serverless since 2016, so quite a long time, and I'm very excited to be here with Second Dinner, because before joining AWS I worked in the gaming industry for many years. So it was thrilling when they told me I would be partnering with Second Dinner today to do this talk, because they are basically doing everything I tried to do a million years ago.

Also, I'm the host of a YouTube channel called FooBar Serverless. So we will be talking a lot about patterns and things, but not very code-heavy things. But don't worry, all the code-heavy things and all the in-depth demos will be provided in a resource at the end of the presentation. You will get a link and you will find everything there, and if you don't find it, you can always reach out to me on social media, so feel free to ping me.

So let's talk about what latency is; it's in the title of the presentation: low latency. So let's get into the topic. Basically, latency is a measure of time: the time from when we make a request from the client until we get a response from the server. Very simple. If you are in the web application world, you might have used the time to first byte term, and basically this measures how responsive the web server is. Again, here we are measuring in time, in milliseconds, the time from an HTTP request in the client until the first byte comes back from the server to populate the page.

But the latency concept I like the most is perceived latency. And what is that? Well, with normal latency, we make a request and then we get a response; for example, when we load a web page or a web app, the response contains everything the page needs to load. What if the client makes a request and the response is just the bare minimum to get the client, the customer, the user going, so they can get entertained, fill in the form or do whatever they need to do, and in the background everything else loads? We see this all the time when we are working with modern web applications. But the world is not synchronous; the world is asynchronous. That's something our CTO Werner Vogels said last year. And what does it mean that the world is asynchronous? That in the real world, nobody really just stands there waiting for things to happen. You go to a restaurant, you order a pizza, and it's not like you're standing frozen while the waiter tells the chef to cook the pizza, thinking, I cannot move, I cannot breathe, I cannot do anything. No, usually you are looking at your phone; even if you are with some people, you are looking at your phone, and that's what you do while the pizza is coming. You are doing things; you are waiting in a more active way. And when the pizza comes, it's put on your table, you eat it and you say, oh good, not much delay. So that's how we should build our applications, and we will get there in a moment.

If you're in the serverless ecosystem, let's also look at where you can find the latency, because there are many places in our architecture that can add latency to our system. The first one is physics: the world is big. You cannot avoid that. You deploy in Ireland, your customers are in Australia; that request has to travel across the ocean, around the world, and no matter how long the server takes to respond, the response has to make the whole trip back. Physics: you cannot fight it. Or can you? We'll look at that later.

Then we have the front door. Are you using API Gateway? Are you using function URLs? Are you using load balancers? Are you using AppSync? What are you using? Then we have the computational layer: are you using Lambda, are you using containers, are you using Step Functions, how can we optimize that and where can we find the latency? And finally, the integration services that we use: it's not the same to use DynamoDB, with its single-digit millisecond performance, as to use Transcribe, which is an asynchronous service where you throw in a video or an audio file and it takes longer depending on how long that video or audio is. So it's very important to understand what you're integrating with.

So let's start talking about synchronous backends, the request/response things. This is a 300-level session, so I'm pretty sure all of you have built a million of these in your life. This is the bread and butter of serverless: API Gateway, Lambda, DynamoDB. We'll see it in Brenna's part of the talk as well; this is what makes up our backends. So I will not show you how to build that. What I will show you is how you can run load tests on this and see how we can optimize this bread and butter so we can get better results from it.

To run the load tests, I picked a library called Artillery (artillery.io). It's an open source library that allows you to configure an API and a scenario and then run load tests. The cool thing with Artillery is that you can run the tests locally on your computer, or, if you don't want your computer to start burning because you are making a lot of requests, you can run them on Lambda or on Fargate. And it's very, very simple to use.

I decided to run five different load tests, and they go from being very chill, one request per second, to 10 requests per second, 100 requests per second and 500 requests per second. You can see that tests number four and five have the same number of requests per second, but test five lasts double the time, because I have this hypothesis that these services usually perform much better when you leave the load on for a longer time. So let's see what happens if I run these five load tests against that simple backend application.

So here are the results: we can see the p95 and p99 of running those load tests. This Lambda function has 128 megabytes provisioned, which is the default you get when you are working with SAM or with CDK if you don't configure the memory of the Lambda function.

So how can we optimize this? Because the latency is kind of all over the place here. In order to optimize it, we need to understand a little bit of how Lambda works. The first thing we want to understand is the invocation modes. There are three in Lambda, and I will cover two of them. The first one is the synchronous invocation mode, and this is how API Gateway connects to the Lambda service: API Gateway connects directly to the execution environment. This means that the response comes back immediately; as soon as the response is ready, it goes back to API Gateway. It also means that errors come back immediately to API Gateway, and that you can hit your concurrency limits and your account limits very easily if you are running at scale with the synchronous invocation mode.

The second one is the asynchronous invocation mode, which is how, for example, EventBridge invokes Lambda. What Lambda does is put the messages in an internal queue; you don't have any control over that queue. The execution environment then pulls messages from the queue when it's ready and processes them. And because the messages are in a queue, you get retries, dead-letter queues, destinations and all these really cool Lambda features that are available.

Another important concept to understand is the execution environment lifecycle. There are three steps in this lifecycle: initialization, invoke and shutdown. We don't care about shutdown, because that's the Lambda service's problem. The initialization phase is when the container, the little function, is being provisioned onto the infrastructure, and here we can do some optimizations: we need to make sure we pick the right runtime and that we have the right memory provisioned on the function, and we will look at some of these things later. Then, in the invoke phase, you can optimize your code. You can move part of the code out of the Lambda handler into the pre-handler code, so it doesn't take time from your invocation, and do some other optimizations as well.

So here you have places to improve your code. If you want to know how long your initialization lasts, you can get it from the metrics and also from X-Ray. And this is something very important to understand when you are doing optimization, so you know your starting point: not all your invocations will have an initialization phase. Whenever you invoke a function that is already warm, this will not happen; but when a function is being initialized, you will see the initialization time in X-Ray. So this is your starting point if you want to improve the initialization.

So let's look at some features that Lambda has that help you reduce latency. These are things that are built into Lambda; you just need to enable or configure them, so they are quite simple to use.

The first one: if you can, choose Graviton2. It allows you to provision more memory for a cheaper price, and more memory means more CPU, and cheaper. Another one is provisioned concurrency: if you want to skip the initialization phase every time a function starts, you can have some warm execution environments waiting for your requests, and this is something that many customers do. The third one: if you are using Java, enable SnapStart; that again makes the initialization phase shorter and faster. And finally, my favorite one, the simplest thing, and it has been there since the day Lambda was created: provision the right amount of memory for your function. You would not believe how many customers don't even think about this, because they assume that if they provision more memory, it will be more expensive. In many cases it's not, because Lambda bills on the duration of the function. When we provision more memory, we also automatically get more CPU, because these two values are connected, so your function runs faster and you can get billed less.

If you want to understand the optimal amount of memory to provision for your function, you should check out this library, Lambda Power Tuning. It's an open source library; all the links are at the end, so you don't need to start searching for them, I will share them with you. It will basically run your function with different memory allocations and tell you which one performs best in time and which one performs best in cost. Then you can pick the right memory allocation for your function and don't need to worry about over-provisioning or under-provisioning your memory.
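To give an idea of what running it looks like, here is a minimal, hedged sketch in Node.js of starting the power tuning state machine through the AWS SDK. The state machine ARN and function ARN are placeholders, and the input fields follow the tool's documented format as I understand it; double-check them against the version you deploy.

```js
// A sketch of kicking off the Lambda Power Tuning state machine from Node.js.
// The state machine ARN and the function ARN below are placeholders.
const { SFNClient, StartExecutionCommand } = require('@aws-sdk/client-sfn');

const sfn = new SFNClient({});

async function tuneFunction() {
  await sfn.send(new StartExecutionCommand({
    stateMachineArn: process.env.POWER_TUNING_STATE_MACHINE_ARN,
    input: JSON.stringify({
      lambdaARN: 'arn:aws:lambda:us-east-1:123456789012:function:my-api-handler',
      powerValues: [128, 256, 512, 1024, 2048], // memory sizes to test
      num: 50,                                  // invocations per memory size
      payload: {},                              // the event to invoke the function with
      strategy: 'balanced',                     // optimize for cost, speed or a balance
    }),
  }));
}

tuneFunction().catch(console.error);
```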

So now I apply this to the previous example, and this is the comparison between running the 128-megabyte Lambda function and the 512-megabyte Lambda function, which is what the power tuning told me to use. You can see that in the tests where the latency was going all over the place, it has now been pulled down quite a lot. But we are not done optimizing yet; we are going to do more to it.

Another thing you can do is not wait for the whole response to be ready before returning it to your client; you can return the response in chunks. For example, if you are working with Bedrock, Bedrock models take quite a while to return an answer, but you can use the streaming functionality and get that response in little chunks, so your customer, your client, gets entertained doing whatever they need with the little pieces of the answer, whether that's a summary of a document or whatever you're using Bedrock for. And there are many examples of how to do this.

If you are using Node, it's very simple to start streaming your responses: just wrap your handler with awslambda.streamifyResponse. What this does is bring a new object into the handler, the response stream. If you write into that response stream, then boom, that goes to the client. It's as simple as that. In the links I will leave you at the end, you will find three demos on how to build this for different use cases: Bedrock, and the PDF one that I will show you now. If you're using function URLs, it's as simple as enabling response streaming in the configuration of the function URL. And then this is the result: a simple, simple demo of downloading a PDF using Lambda. I'm calling the endpoint, the Lambda function URL, and the demo goes very fast because it's a small PDF, but you can see that it's coming in little chunks and I already have it in my download folder. So good; that's the simplest way to do a backend.
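Before we move on, here is a minimal sketch of what that Node.js streaming handler can look like. The chunks written here are placeholders; in the PDF demo you would pipe the file into the stream instead.

```js
// handler.mjs — minimal response-streaming handler for Node.js on Lambda.
// `awslambda` is a global provided by the Lambda Node.js runtime, so no import is needed.
export const handler = awslambda.streamifyResponse(
  async (event, responseStream, context) => {
    // Everything written here is flushed to the client right away,
    // so the first bytes arrive long before the full response is ready.
    responseStream.write('first chunk of the response...\n');
    responseStream.write('...more chunks as they become available...\n');
    responseStream.end(); // close the stream when you are done
  }
);
```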

But what if I tell you that there is an even simpler way? What if I remove Lambda? If you don't have much business logic in your Lambda function, my question is: why do you have a Lambda function at all? You can use something called Velocity Template Language (VTL) templates to connect API Gateway to different services, in this case DynamoDB. So if you're just doing CRUD operations, reads, writes, updates and deletes, against DynamoDB and you don't have very heavy business logic, then you can connect API Gateway directly to DynamoDB. Let's see how this performs: it's quite stable and the latency is very small.

So sometimes you really need to think: do you need a compute service at all? API Gateway provides a lot of validation on the request and the response, and playing with VTL can get you that result. But what if you have a little bit of business logic? What if you need to generate IDs or do some data transformation, play a little bit with arrays, but only a little? What if you use Step Functions as the computational layer for your backend? What if, instead of Lambda, you have a Step Functions state machine? You can do it without invoking Lambda, because even though you can invoke Lambda from the state machine, with intrinsic functions you can do data manipulation, array manipulation, string manipulation, encoding and decoding, UUID creation; so many things that a lot of the use cases we tend to build simple backends for can be solved with Step Functions. In addition, Step Functions connects to over 220 AWS services just through the AWS SDK integrations. Below is a rough sketch of what such a state machine can look like, and then let's see how it performs in those load tests.
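This is a rough illustration, not the exact demo from the talk: a small state machine definition where a Pass state uses intrinsic functions and a Task state writes straight to DynamoDB with no Lambda in between. The table and field names are made up; the intrinsic functions shown (States.UUID, States.Format) and the direct DynamoDB integration are real Step Functions features.

```js
// A sketch of an Amazon States Language definition as a JavaScript object.
// Table and field names are examples only.
const definition = {
  StartAt: 'PrepareItem',
  States: {
    PrepareItem: {
      Type: 'Pass',
      Parameters: {
        'id.$': 'States.UUID()',                             // generate a UUID, no Lambda needed
        'greeting.$': "States.Format('Hello, {}!', $.name)", // string manipulation
      },
      Next: 'SaveItem',
    },
    SaveItem: {
      Type: 'Task',
      Resource: 'arn:aws:states:::dynamodb:putItem', // direct service integration
      Parameters: {
        TableName: 'items',
        Item: {
          id: { 'S.$': '$.id' },
          greeting: { 'S.$': '$.greeting' },
        },
      },
      End: true,
    },
  },
};

// You would pass JSON.stringify(definition) to Step Functions when creating the state machine.
module.exports = { definition };
```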

So here we have the p95 and p99 of that example using Step Functions, API Gateway and DynamoDB. You can see again that the more load we put on it, the faster it performs, and the difference between test five and test one is not that big; it's quite stable.

So now you might be wondering: what about caching, can caching help me out? Because when we cache, we don't go to our computational layer; usually it's the front door, API Gateway or whatever you're using, that responds with whatever is stored in the cache. And yes, caching is a great solution, and if you can, you should enable it. If you're using API Gateway, it's as simple as saying: please enable the cache, give it a size, and you're ready to go. And what happens when we enable the cache in our example, the simplest backend? Well, here you see it: tests three and four, which were quite all over the place, dropped a lot in latency. For the other, better-performing tests, the cache doesn't do much, but where the latency was quite high, you can see a big difference. So those are things you might want to keep in mind when you are building your synchronous application.
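For reference, if you define the API with CDK, enabling the stage cache is just a few properties on the deployment options. This is a minimal sketch assuming a CDK app in JavaScript; the construct names, cache size and TTL are example values, not the ones from the demo.

```js
// A minimal CDK (JavaScript) sketch of a REST API stage with caching enabled.
const { Stack, Duration } = require('aws-cdk-lib');
const apigateway = require('aws-cdk-lib/aws-apigateway');

class CachedApiStack extends Stack {
  constructor(scope, id, props) {
    super(scope, id, props);

    const api = new apigateway.RestApi(this, 'ItemsApi', {
      deployOptions: {
        cacheClusterEnabled: true,       // provision the cache cluster for the stage
        cacheClusterSize: '0.5',         // cache size in GB
        cachingEnabled: true,            // cache method responses by default
        cacheTtl: Duration.seconds(300), // how long responses stay in the cache
      },
    });

    // A placeholder method so the API has something to deploy (mock integration by default).
    api.root.addMethod('GET');
  }
}

module.exports = { CachedApiStack };
```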

But let's go to my favorite type of application: the real-time backend, or as I like to call it, asynchronous communication. We talked about asynchronous communication with waiting for your food in the restaurant, but what exactly does it mean? Usually it means that we have a client that makes a request to a service and there is no response coming straight back. So we, as developers, need to figure out a way for our client to get that response.

So the first option we have is the child in the back of the car, or polling: Mommy, are we there yet? Mommy, are we there yet? Mommy, are we there yet? Our client will do the same: have an API to say, can you do this thing? The backend says, OK, I can do this thing. Then have another API: hey, are you done? Hey, is this done? Hey, is the job done? Have you finished? And when the data is done, it returns the data. This is a very simple solution; I don't know how many of these I have implemented in my life, and maybe you have done this too, because it's very simple to do. The problem here is that you're making a lot of empty requests, and if you're in a mobile application, you are basically wasting bandwidth and wasting your clients' battery. Also, if you have a lot of clients doing this to your backend all the time, you are getting a lot of empty calls and you will need to manage that. And the time from the moment your data is ready to when it arrives at the client can vary a lot, because you need to define how long to wait between polls: is it 30 seconds, one minute, five minutes? And what if the data is ready before that?
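Just to make the pattern concrete, here is a tiny hedged sketch of that polling loop from a Node or browser client. The endpoint and the five-second interval are made up; the point is how many of these round trips come back empty.

```js
// "Are we there yet?" — the polling pattern. Most of these requests return nothing useful.
async function waitForResult(jobId) {
  // The job was kicked off earlier with a separate "please do this thing" request.
  while (true) {
    const res = await fetch(`https://api.example.com/jobs/${jobId}`); // hypothetical endpoint
    const job = await res.json();
    if (job.status === 'DONE') {
      return job.result; // finally, the data we were waiting for
    }
    // Not ready yet: an empty round trip that cost bandwidth and battery.
    await new Promise((resolve) => setTimeout(resolve, 5000));
  }
}
```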

A more optimal way to do this is using WebSockets. The thing I didn't tell you about the previous example is that it uses HTTP and REST, and with those the backend cannot speak to the client; it's always the client that initiates the communication. That's why we end up with that amount of checks from the client: are we there yet? In the case of WebSockets, we open a bidirectional connection. Here the client can say something to the backend, but the backend, when it has the data ready, can also talk back to the client. We reduce the number of empty calls and we get closer to real time, because now when the data is ready, it goes directly to the client. The caveat is that it's a little bit harder to implement, but don't panic: I will show you a couple of ways you can do it if you're on AWS.

The first one: if you're using API Gateway, you can enable WebSockets. It's simple; you just configure the API to use WebSockets.

What you need to do is implement three routes: the connect, the disconnect and the posting of messages. You can implement those with Lambda functions. You will also need a table for all your connections, so whenever a client connects or disconnects, you need to handle that yourself. A rough sketch of those pieces follows.
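This is a hedged sketch of those three pieces in Node.js: a connect handler that stores the connection ID, a disconnect handler that removes it, and a function the backend can call to push a message to a stored connection. The table name, endpoint variable and payload shape are placeholders.

```js
// Minimal handlers for an API Gateway WebSocket API (Node.js, AWS SDK v3).
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, PutCommand, DeleteCommand } = require('@aws-sdk/lib-dynamodb');
const {
  ApiGatewayManagementApiClient,
  PostToConnectionCommand,
} = require('@aws-sdk/client-apigatewaymanagementapi');

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE = process.env.CONNECTIONS_TABLE; // hypothetical connections table

// $connect route: remember who is connected.
exports.connect = async (event) => {
  await ddb.send(new PutCommand({
    TableName: TABLE,
    Item: { connectionId: event.requestContext.connectionId },
  }));
  return { statusCode: 200 };
};

// $disconnect route: forget them again.
exports.disconnect = async (event) => {
  await ddb.send(new DeleteCommand({
    TableName: TABLE,
    Key: { connectionId: event.requestContext.connectionId },
  }));
  return { statusCode: 200 };
};

// Called by the backend when data is ready: push it to one stored connection.
exports.notify = async ({ connectionId, message }) => {
  const client = new ApiGatewayManagementApiClient({
    endpoint: process.env.WEBSOCKET_API_ENDPOINT, // the API's connection management URL
  });
  await client.send(new PostToConnectionCommand({
    ConnectionId: connectionId,
    Data: Buffer.from(JSON.stringify(message)),
  }));
};
```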

If you're using GraphQL, you can take advantage of AppSync. AppSync is our hosted and managed GraphQL implementation that provides queries, mutations and subscriptions, which is what the schema of a GraphQL application defines.

For example, when the client writes into the service, into AppSync, we call that a mutation, and with AppSync that happens over HTTP. The client can also subscribe to different operations; in the example you see on the screen, they are subscribing to new messages. So when there is a new message in the data source, AppSync sends a notification to the client through WebSockets.

So this pattern of using HTTP for the writes and WebSockets for the queries, for the reads, is something you need to pin, because we are coming back to it in a moment. It's a very good pattern.

The third option, and my favorite one for handling WebSockets in web applications, or in applications in general, is to use AWS IoT Core. And many of you will be like, well, Marcia, I'm doing a web app, I'm not doing an IoT thing here; I don't have a sensor, I don't have a light, I'm not working with a phone. What I tell you is that a device in the IoT world can be anything: it can be your front end, it can be your mobile app, it can be your web app. It doesn't matter.

So you define what a device is, and IoT Core allows us to have this bidirectional connection over WebSockets, where devices can publish and subscribe to different topics. Now imagine that the devices are front-end applications, client applications or mobile applications, and they subscribe to a specific topic in IoT Core. Then a message arrives on that topic and it is immediately sent to the front-end applications that subscribed to it. And the cool thing about IoT Core is that it can handle scale, because it's an IoT service; it's not afraid of scaling up, so you can fan these messages out to thousands of subscribers. And it is very, very easy: IoT Core takes care of the authentication, the handshake, the connections and everything in between. So it's a very, very simple solution for developers.

From your front end, from your client, you can use the AWS SDK MQTT client to make the connection, send messages and handle errors. If you're here at re:Invent, you might have seen Serverless Video; Serverless Video uses IoT Core to do a lot of the real-time things in the app.
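For reference, this is roughly what that client-side MQTT connection can look like with the aws-iot-device-sdk in Node. The endpoint, credentials source and topic names are assumptions (the topics mirror the global and per-user topics described a bit later); in a real web app the credentials would typically come from Cognito rather than environment variables.

```js
// A sketch of subscribing a "device" (really a web or Node client) to IoT Core topics.
const awsIot = require('aws-iot-device-sdk');

const device = awsIot.device({
  protocol: 'wss',                                 // MQTT over WebSockets
  host: 'example-ats.iot.us-east-1.amazonaws.com', // your IoT Core endpoint (placeholder)
  accessKeyId: process.env.AWS_ACCESS_KEY_ID,      // in a browser these usually come
  secretKey: process.env.AWS_SECRET_ACCESS_KEY,    // from Cognito, not env variables
  sessionToken: process.env.AWS_SESSION_TOKEN,
});

device.on('connect', () => {
  device.subscribe('broadcasts/global'); // everybody listens here (placeholder topic name)
  device.subscribe('users/marcia');      // per-user topic (placeholder topic name)
});

device.on('message', (topic, payload) => {
  // e.g. refresh the page when a new live stream is announced
  console.log(`message on ${topic}:`, payload.toString());
});

device.on('error', (err) => console.error('MQTT error', err));
```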

Serverless Video is a little demo that we built in the Developer Advocacy team that allows you to live stream from your phone to whoever in the world is watching, and then you can watch all the videos on demand. Let's look a little bit at the architecture.

So here we have our client; it's a web app, very simple, because it's a demo. Then we have an access layer, and you can see here all the different kinds of protocols we are using in that access layer; we are going to look at two of them in more detail. Then we have microservices, and all these microservices talk to each other through events, so this is an event-driven architecture; there are no private APIs, nothing like that. And then there is the service we want to look at now: the publisher service. That is the one that takes care of the real time.

So what happens? I start a new broadcast, a new live stream. I go to my app, I connect to Serverless Video, I start a new broadcast and hello world. That makes a REST API call into the video manager service, creates the broadcast, and I start streaming with OBS. Then events are sent to EventBridge: a new event has come in, a new broadcast is streaming. Boom, that event comes into the publisher service, which notifies everybody connected to Serverless Video right away, and their page refreshes automatically, showing that there is a new live stream.

So all of this happens through events: I start a live stream and in five seconds you all see it appearing on your screen. And this is done with a pattern called CQRS, Command Query Responsibility Segregation, which is what I showed you with AppSync. You can see here that the client makes a command, makes a change, makes a write into the data-modifying service over REST; then events flow through the system, and the query is done through WebSockets. This is a very powerful pattern that can help you scale and help you extend your application.

So let's go one step further in. Now events are coming into our publisher service. In the case of Serverless Video, we have defined two different kinds of topics: one specific to the user, so Marcia has her own topic and so does every one of you that registers with Serverless Video, and then there is a global topic that everybody subscribes to.

So what happens when Marcia starts a new broadcast? A new message arrives on the global topic, and then all of you receive the message in your application and can see that broadcast without refreshing the page. When the video finishes processing, what happens is that I, Marcia, receive a notification on my own topic that my video is ready to watch on demand, and I am able to see it.

So in this way you can see how different users can get very, very specific messages, and this is exactly how we also do it in Serverless Espresso. Whenever you go to get your coffee in the expo, and you can see Serverless Video and Serverless Espresso in the expo, there are different topics for different things, so while you are getting your coffee, all the notifications you receive are inside your particular topic.

So let's go to the last bit of my presentation before I hand it over to Brenna, and that is multi-Region. We talked about how big the world is, so why bring multi-Region into the picture? Usually when we talk at AWS about multi-Region, we talk about going multi-Region for business continuity and disaster recovery, and also when customers have geographically distributed customers around the world. And this is because, again, the world is big.

If we deploy in Ireland and we have customers in Australia, we cannot improve that request and response unless we have another Region, another deployment in, I don't know, Singapore, so that those Australian customers can connect to Singapore and their requests don't need to travel all around the world.

So when we look at multi-Region, I want to show you two different features. There are many; I did a whole 60-minute session on CDNs and multi-Region last year, so you can watch that on YouTube. But there are two features I want to talk about.

The first one is latency-based routing. This is a feature of Route 53 that allows you to deploy in as many Regions as you want; in this example there are two, one in Ireland and one in Virginia, the same stack deployed in different places. Then we place a domain in front of them that uses latency-based routing, and whenever a customer connects, for example from the US, they are directed to Virginia, because that's the one with the least latency for that particular customer. So this is a very important feature that is super useful when you're working in a multi-Region environment.

The second feature I want to share with you, which I find very useful when you're working in a multi-Region environment, is DynamoDB global tables. It's something you can enable on your DynamoDB tables that turns them into a multi-primary, multi-Region database: you can write in any Region and everything gets replicated in the background. So this is very useful for keeping your data in sync all around the world.
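If you use CDK, the newer TableV2 construct lets you declare the replica Regions directly. This is a minimal sketch in JavaScript, assuming the stack is deployed to a specific primary Region; the table and attribute names and the replica Regions are example values.

```js
// A sketch of a DynamoDB global table using CDK's TableV2 construct.
const { Stack } = require('aws-cdk-lib');
const dynamodb = require('aws-cdk-lib/aws-dynamodb');

class AccountsTableStack extends Stack {
  constructor(scope, id, props) {
    super(scope, id, props);

    new dynamodb.TableV2(this, 'AccountsTable', {
      partitionKey: { name: 'accountId', type: dynamodb.AttributeType.STRING },
      billing: dynamodb.Billing.onDemand(),
      // Writes land in any Region and replicate to the others in the background.
      replicas: [{ region: 'eu-west-1' }, { region: 'ap-southeast-2' }],
    });
  }
}

module.exports = { AccountsTableStack };
```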

So let's analyze the latency of this type of application. Here we have the same stack deployed in Virginia and in Ireland, and we have a customer in the US going through Route 53 with latency-based routing. What you see here are the same five tests I showed you at the beginning. The orange one is when the customer connects using the domain, meaning they go to the closest Region. The yellow one is when the customer, in this case a customer from Europe, connects to the API Gateway address in the US. You can see there are 250 extra milliseconds in all those tests just because the world is big. That's extra latency that you can remove simply by deploying in as many Regions as you can.

So now I would like to give the stage to Brenna, because she will show you how all these patterns apply in real life in an awesome game, Marvel Snap. So here you have Brenna.

Brenna: Lambda keeps a cache in the background while the Lambda container is suspended. This works really well for us; we managed to cut out a whole chunk of things we had to deploy, things that were ever, ever growing that we really didn't want to keep supporting, and put them in one single function. Our overall Lambda concurrency dropped with this, because if you have some high-touch APIs and some low-touch routes in the same API, and you had those as two different Lambda functions, two different containers, one of them would have fewer cold starts and one of them would have more. Now that you have all of these in one place, everything that's keeping things warm for the high-touch APIs is also keeping things warm for the low-touch APIs.

What does this look like from the development side? For .NET developers writing ASP.NET, anybody who's used to seeing the code on the bottom right: you just keep writing the code on the bottom, as opposed to what the Lambda handler on top would look like if you were writing a regular handler. So for those of you who are .NET developers, or at least familiar with .NET, this is actually closer to something they already know. It's not a new pattern they have to learn; it's an old pattern they get to go back to.

So what does this look like when you start putting everything together? Global, versioned, ASP.NET-based services. Oh, the other thing about the ASP.NET services: ASP.NET is meant to be a long-running process, so you can run it locally. You ever try to run a Lambda function locally? Yeah. OK. So this is the actual pattern we use for all of our APIs: we have multiple regions, we have multiple versions, and everything sits nicely in parallel. As Marcia mentioned, though, just a single REST or HTTP API is not going to cut it in all scenarios. In our case we have matchmaking, gameplay things; we need to be snappy, and a WebSockets service needs to tell things to the client. Conveniently, we can use the same approach: API Gateway is still the thing handling the connection, whether it's handling the HTTP request or the incoming WebSocket request. We can just lie a little bit to the ASP.NET handler and tell it, hey, this looks like an HTTP request, doesn't it? It's an API Gateway proxy model; that's fine. Now we can write the same code for our HTTP APIs that we write for our WebSocket APIs. ASP.NET isn't normally built to do this when you're running WebSocket APIs, so running this locally does require a little bit of extra work, but to us it's definitely worth it; the trade-off makes everything a lot more similar.

For those of you who are more interested in the numbers, here are some stats from one week in one of our regions, highs and lows. Those API Gateway metrics are summed HTTP and WebSocket, and those Lambda invocations are Lambda invocations. Marcia's example with Artillery running the load test was 500 requests a second; this is 7,666.6 repeating a second. That's roughly 7,600 requests a second, for one region.

Now, earlier we said that client-to-service, or even service-to-client, isn't the only thing we need to do; sometimes backends need to talk to each other as well, and so there are message patterns. When a client connects to a matchmaker and says, hey, I want to play a game, here are my cards, we go: cool, I don't trust you. Are those really your cards? How do I know? We have to verify that. Does the matchmaker know what cards you have? No, progression knows what cards you have. We're not blocking the client, because we have the WebSocket connection; we can say, OK, cool, I acknowledge this, and the client is going to wait until we send something back: hey, here's your game, you've made it through matchmaking. But we also don't want to block that matchmaker Lambda handler, because that would just hold it open. So what we want to do is send a message through an SNS topic and route things through queues. We have this pattern in a bunch of places: we dedicate a topic and we dedicate queues, and we include message attributes to make sure we're subscribing to messages going in the correct direction. And just like with API Gateway, we duplicate these for every version: new version, new topic; new version, new queue. The reason we want to do that is that if we're changing the message attributes, we don't want to worry about whether the previous version's subscription is something different, or how this affects a version that was deployed previously.

Here's an example of what this looks like on the message attribute side of things. This is a little bit of a mock-up of what it looks like to say: hey, did they own all the cards? Absolutely, they owned all the cards. But where is this going? Oh, it's going to us-west-2. OK, cool. We're already in a specific version, so we don't have to worry about that part. We do occasionally need to do this globally, or at least cross-region; this particular message doesn't actually leave the region, but we use this pattern on all of our messages so that everything is consistent. Keeping the code and patterns consistent like this is super important to us, just to keep the cognitive load low for developers.

This is the reply; the request would include the sort of return-to-sender region address that we might need. Sometimes we do need to do this across regions. This is an example of an end-of-game message: we had two players, they were in different regions, and we need to route the result back to their home regions so we can say, hey, you got this from winning that game, and you got this from losing that game, or vice versa; maybe it was a draw.

We keep the same pattern; this is effectively just a fan-out. And the great thing here, from our perspective, is that the code sending this doesn't need to worry about changing the content of the message, changing what it is, or sending two messages; we just attach a little bit to the attributes and the infrastructure handles the rest.
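Second Dinner's services are .NET, but for consistency with the rest of the demos here, this is a hedged Node.js sketch of the idea: publish the reply with attributes that describe where it should go and let filtered subscriptions route it. The topic ARN, attribute names and message shape are all made up for illustration, not the actual ones used in the game.

```js
// Publishing a "yes, they own all the cards" reply with routing attributes on the message.
const { SNSClient, PublishCommand } = require('@aws-sdk/client-sns');

const sns = new SNSClient({});

exports.handler = async (event) => {
  await sns.send(new PublishCommand({
    TopicArn: process.env.MATCHMAKING_TOPIC_ARN, // hypothetical per-version topic
    Message: JSON.stringify({ ticketId: event.ticketId, ownsAllCards: true }),
    MessageAttributes: {
      // Subscriptions use filter policies on these attributes, so only the queue
      // for the right direction / region / version receives the message.
      direction: { DataType: 'String', StringValue: 'reply' },
      targetRegion: { DataType: 'String', StringValue: 'us-west-2' },
      version: { DataType: 'String', StringValue: 'v42' },
    },
  }));
};
```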

So, SNS and SQS: this works super great as long as you can control all of the attributes on the message. What happens when you use a managed service that doesn't include attributes on its messages? Well, we ran into this with GameLift. GameLift is an AWS service that includes a thing called FlexMatch, which is a managed matchmaker: we put people in a pool and it says, hey, here's a pair, here's a pair, here's a pair.

This is great, except, again, we don't control what's coming out of that service; we don't own it, and we certainly don't want to build a matchmaker; grabbing one off the shelf is a lot easier. So this is roughly the message that comes out of FlexMatch. The player information is good, but that player ID is internal to FlexMatch; we didn't set it. The custom event data is a thing we set; it's a version, but it's the same across all messages that come out of that version. The match ID doesn't really tell us anything. But the ticket ID is actually a thing we can set: it's a string we control, something we can modify, so we can just put the regions on there, which is great. Now we have regionality tied to the player, but it's still not in an attribute.

Now, those of you who are familiar with SNS may know that you can now do message-body-based filtering in SNS. At the time this was originally architected, that feature was not available, so EventBridge was really the only solution we could use to parse things out of the body of a message and then route it accordingly. And that looks like this: these are two EventBridge rules for picking the prefix out and then effectively forwarding, or fanning out, those messages, depending on whether they're in the same region or a different region.
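A hedged sketch of one of those two rules, created with the AWS SDK from Node: it matches FlexMatch events whose ticket ID starts with this region's prefix and forwards them on. The field names under `detail` follow the FlexMatch event shape as described above, but treat them as an assumption and compare them with a real event on your own bus; the rule name and prefix are placeholders.

```js
// A sketch of an EventBridge rule that routes on a prefix inside the message body.
const { EventBridgeClient, PutRuleCommand } = require('@aws-sdk/client-eventbridge');

const events = new EventBridgeClient({});

async function createRoutingRule() {
  await events.send(new PutRuleCommand({
    Name: 'forward-matchmaking-results-us-west-2', // hypothetical rule name
    EventBusName: 'default',
    EventPattern: JSON.stringify({
      source: ['aws.gamelift'],
      'detail-type': ['GameLift Matchmaking Event'],
      detail: {
        // The region is encoded as a prefix on the ticket ID we control.
        tickets: { ticketId: [{ prefix: 'us-west-2#' }] },
      },
    }),
  }));
  // Targets (the second bus, or the per-version queue) are attached separately with PutTargets.
}

createRoutingRule().catch(console.error);
```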

So what that looks like in practice is that we get some information from GameLift, and it goes through the event bus. We have these two rules, one or both of which get triggered, and that forwards it to another bus; if you're going across region or across account, you're limited to targeting another bus, you can't go straight to a queue. We don't change the message; we just try to leave everything as the default so we are not modifying messages in flight; we try not to do that with the infrastructure. Now, if you don't modify the message, and you don't route it to a custom bus, and you do route it back to the default bus, and that bus runs the rules that then fan it out, you can absolutely just DDoS yourself and send the same messages back and forth over and over. And I promise you, I didn't do that when I was first working on this infrastructure. I absolutely did that.

So we send it to a second bus that has different rules; we're only forwarding it to the queue at that point. Event buses are one of the few pieces of infrastructure that we do not duplicate per version, so we only have one bus for all the versions, but we do duplicate rules and targets across the different versions.

All right, the scale of SNS, sorry, the scale of SQS and EventBridge. The reason the EventBridge scale is so much higher than the queue scale is that there's other stuff we run through EventBridge that I didn't filter out when I built the slide. Sorry. Cool.

Almost the last section: we're going to talk a little bit about state. Finding a Marvel character that's based on a database is hard, so you get these cool Blueprint variants, which I thought were pretty close.

Now, Marcia was talking about DynamoDB global tables. I'm going to tell you a little bit more about regional tables first, and then we'll talk about global tables. Like I just said, for all of the things we do, we rely on event and message systems to move stuff across regions, so most of the data we have in any given table is region-specific. We don't need to replicate that stuff; it only belongs in one place. A game is hosted in a single region and two players connect to it; once that game is over, if one of those players was in another region, we emit that message and we make sure they get a copy of it.

The one place where we do need to keep things together is account. I told you we would come back to latency-based routing: you can see that we have two different routes on the same domain there, and this is account and registration. When you make an account, we want to make sure we're tying you to a close-by region, and then we can always reference that when you come back. But you might be traveling, maybe for a conference or something, and suddenly you're closer to us-west-2 instead of us-east-1; then we might need to push you back, which is why we have those region-based routes. Putting something like a lookup in a global table works really well for us. Replicating a game to six regions when the players are only in two doesn't make any sense; replicating a tiny string of which region you're in, and a couple of other things, is super easy and makes a lot of sense for us.

All right, last section; I just got to pull some really big cards. We're going to talk a little bit about scale. Here is the summary table of the rest of the stuff I was showing you earlier; again, this is one week and one region. Those minimums drift with the day-to-day: over the course of the day and across the different regions, in a follow-the-sun model, the valleys are at different times. The peaks, though, are synchronized: that's when we release a new card or a new shop or something like that, so everyone logs in all at once, and those peaks are the same in all regions at the same time.

Stress testing: this is how we got there. Marcia mentioned Artillery earlier, which is super great tech for something where you're sending the same requests over and over again. It doesn't work for us: games are stateful. We can't make the same turn over and over again, otherwise we're not exercising the right code path, and making an invalid turn doesn't write an invalid record to the database; it just says you can't do that. So we needed a headless client, something that could actually play the game, join matchmaking and go through all the state machine flows, and in order to do that we also needed to distribute it. So there's a custom stress-test loader that we ended up writing; the link is here in the QR code, and I think it's also in Marcia's docs at the end that she'll tell you about. If this is useful to anyone, great. This is not the headless client; you cannot bot the game with this thing. It's just a stress-test distribution tool. Don't get your hopes up.

So after stress testing, we knew what we were getting into. Launch day: we get through it, we're rotating through an on-call room, we're looking at dashboards, we're watching graphs, we're waiting for alarms to go off. One. One alarm goes off. One. And this was actually a quota we had already said we needed to increase; in the stream of tickets we had to put in to get quotas raised before launch, this one got missed. We were already on a call with AWS for an Infrastructure Event Management event, so we were already sitting in a chat room with them, sitting on the phone with them, and they changed that throttle as soon as we saw a blip on the alarm. Players never noticed; the producer on call for the night was glued to his phone playing Snap, because no one got paged. Launch night: zero pages. We made it months before an infrastructure event where we actually had to start making changes to the infrastructure, and that was because we had more traffic than we expected at that time, so our scaling, provisioned capacity in DynamoDB and provisioned concurrency in Lambda, wasn't set up for that load at that time. This wasn't great from the client perspective; we did see throttles that kicked errors all the way back through the APIs to the client. But as soon as we could get the scaling under control, as soon as we could get ahead of it, everything was back to normal and everything was green.

So that's actually all I've got for you today; this was a really fast review of all the stuff that Marcia covered. I'm going to turn it back over to Marcia. Thank you very much for listening, and I hope you have a great rest of your convention.

Marcia: So let's wrap up, and then I can give you the QR code. First, a question for all of you: do you really need a synchronous backend? As Brenna showed us, sometimes we do, and when we do, it's very important to analyze our compute. Are we using Lambda? Can we optimize it? Can we avoid Lambda altogether and just connect API Gateway directly to the integration service? Can we use something like Step Functions and embrace the intrinsic functions and all the integrations that Step Functions provides? And if possible, build asynchronous applications; with EventBridge and event-driven architectures, they will help you be responsive, be real time, and make that perceived latency feel very small to your users. Remember: event-driven architectures, CQRS patterns. And if you really need low latency, go multi-Region.

So this is the only picture you should take, besides the ones of us in different poses. Grab that QR code; those are the resources. There you will find around 20 different videos on how to build everything I showed you, and all the links from Brenna are there as well. And if for any reason you are not a Node developer, don't worry, I've got you covered: all the demos I have are in Node, but the patterns and the interesting things you can apply in any language. In there you can also find almost every basic pattern, simple things; Serverless Patterns, that's what it's called, and there are over 600 of them, for all kinds of infrastructure as code and all kinds of languages. So that's a place you might want to bookmark right now. And that's it for us; we are reaching the end.
