Scaling on AWS for the first 10 million users

Welcome to the presentation.

I want to start by saying no one builds an application and thinks to themselves, "Wow, I hope no one uses it. I hope I have no users on my platform and I hope it's just me." People build applications with the intention of being useful. You have customer sign-ups, you have internal processes that you're improving, you have external processes. You build it with an intention, and you build it to scale.

So at AWS and at Amazon, we take scale very seriously. Today we're going to walk you through the best practices and give you the tools and the resources so your applications can be future-proof, wherever you are in your scaling journey: whether you're a startup founder building the next big generative AI application scaling to 10,000 users overnight, or you're an enterprise institution building mission-critical applications where, if your app goes down, people will notice. We're here to set you up for success.

My name is Sky Hart and I am a Manager for Solutions Architecture at Amazon Web Services. I will be joined halfway through the presentation by my esteemed colleague Chris Munns for an interpretive dance on scalability. I'm kidding. He's going to beat me up for that. But Chris is a very tenured Amazonian. He's been around for over 12 years, and he is the Startup Lead and Tech Advisor for our entire organization.

So let's begin. Let's pretend we're not sitting in a giant ballroom, but you and I are sitting in a conference room together, and our developer team comes in and says, "I have an idea for a new application." As a leader, you might be asking yourself a few questions. Three questions.

One, where do I start? Amazon has over 200 services that are available to you. So how do I choose the right patterns for my architecture? How do I get started?

What is the return on investment if I build this? What if I don't build it? Asking yourself these business questions is working backwards from your needs and requirements.

The second question is: OK, great, I have this application. How do I build it for scale? Let's talk about everyone's favorite topic, risk mitigation. No, I'm kidding. But how do I make sure it's resilient enough that it can stand the test of time, so that when I scale up to 100,000 users, things aren't going to crash? Which leads me to my last, final, and most important thing. At the center of it all: who are our users, and how do we make sure they're happy?

How many of you have opened up an application, it crashes, and you never go back? Yeah. You're like, oh, latency, I don't want to use this anymore. So how do we make sure that our systems work, that they're always up, that our users are happy and they leave us that five-star review at the end of the day?

Throughout this presentation, we're going to be talking about this concept of an app. You can define an app in a lot of different ways. For the purpose of this presentation, we define an app as the full stack: the front end, the back end, and the data storage.

We have our user interface layer: when you log on to an application, you see the front end. We have our business logic layer, our compute and our engine behind the scenes. And then of course our data storage: how do we leverage the existing data to feed back into our business?

Another piece of groundwork I want to lay before we get into it is acknowledging how much our world is shifting. One is developer experience: these modern frameworks that developers are utilizing. We need to be accommodating; we need to change with the times.

The second one I want to mention is the move to serverless technologies. We're going to keep saying that throughout, but it means shifting all of that heavy lifting and management of underlying infrastructure onto AWS, so you can spend more time innovating.

And the last thing I want to mention is rapid scale. What do we mean by that? It means we have more data than we've ever seen before. And with that comes an expectation of harnessing the power of this data in close to real time, in applications on a global scale.

Let me give you a really famous, really infamous quote: "No architecture is designed for high scalability on day one." But we'll certainly try.

I like to use this analogy: pretend I'm a contractor building a building. I'm not handy, so don't quote me on any of that. But the most important thing is laying the foundation before you can put the roof on, before you can put the windows on. You make sure that your foundation is going to hold up to the test of time, in terms of weather conditions, different systems, and things like that. It needs to be ready.

Now, a mental model that we like to use is this virtuous cycle of building. You might see this in MVP or prototyping processes, with the users in the middle. You start building: your development teams come in, the developers build this application. That's great. But we need to understand what we're building. We need tools for monitoring and observability to see what's happening proactively. Does my app have good latency? What about the compute? What about the utilization? Is it cost-effective? All of these questions we need to be measuring, to be proactive and get in front of it.

Then we learn from this feedback loop from our users, from our customers, and we build again. So wherever you are in your scaling journey, remember that there's constant iteration and process improvement to make your application more efficient and more cost-effective. Start from day one.

So we're in the conference room, right? We're about to build this application, we onboard users, our first beta testers, and we get them utilizing this application. Now, a very modern way of approaching this, and we're going to see this throughout, is that you split your application into the front end and the back end. It's called decoupling. You might have heard this buzzword, and we're going to dive in a little bit deeper in a second.

The reason is that we have this architecture in terms of traditional front-end hosting. You're going to see, really commonly, people might have EC2 on a single host, they might attach an Auto Scaling group and an ELB, then send it through a CDN. This is a very common pattern, but there are some limitations and some pain points that you might be experiencing.

One is: what are you managing? You have to manage a lot of the underlying infrastructure, a lot of the patches and the bug fixes and things like that. It's not a managed service, right? And there are just some limitations in terms of having everything on one single host: your failover, your redundancy points. If everything is coupled together, there's nowhere you can go when disaster occurs.

So we move toward this modern front end, this way of thinking, which is splitting this apart. The benefit here, as I said in the beginning, is meeting developer expectations by moving toward these modern frameworks, with built-in scale and performance and all of these bells and whistles. And I'm going to go into Amplify, which is a really powerful technology here, in a second.

So Amplify Hosting is one of the simplest technologies you can use in our stack. I used it with a bunch of customers back when I was an SA. How it works is, you simply connect your repository, whether you're most familiar with GitHub or something like that. You configure the build settings, which is step two: your users, governance, all your configurations, however you'd like it. And then you deploy your app. It's really that easy, and it's serverless; you don't have to manage that underlying infrastructure or the back end. And the way all of our AWS services work is that we take feedback from developers.

So: what do you see really commonly, what are some tools and things that you need? With that feedback, we actually came up with these Amplify Hosting features. Atomic deployments, feature branch deployments, all of these are built in. And remember when I said earlier in the presentation that, with this ever-growing scale, we want these applications to be globally available?

So I don't want to be configuring EC2 instances across a bunch of different AZs. Instead, we have a front end that's globally available, built on the idea of a CDN: it's built on CloudFront. Now, for those of you who are not familiar with the different front-end frameworks:

There are three that Amplify supports. Client-side rendering: the best way I can describe it is you kind of have a container that runs in the client browser. The second way of hosting that's really common is server-side rendering. That's the opposite: it puts the load back on the servers, and it's popular for Next.js and Gatsby and that kind of thing.

And the third framework that Amplify supports is static site generation. You might see people host static websites on S3, and some customers I talk to still do that, but a lot of customers prefer to do it on Amplify just because it has the full package and the ease of deployment.

So you might say: Sky, what about the back end? We have the front end; what about the back end, and how do I select compute? There are three buckets of compute engines that I'm going to dive into a little bit deeper here.

People know Amazon EC2, a service that was created in 2006. You're managing a lot, and you have the most control over your configurations. Then containers became really, really popular, and we rolled out a few different options here: ECS, EKS for the Kubernetes fans out there, and then AWS Fargate, which is our serverless container option. And the third bucket, of course, everyone's best friend: AWS Lambda, our serverless technology for compute.

You might say: that's a lot, what do I do? Where do I start? How do I evaluate compute options? Like I said before, a lot of times people just say, I want to run EC2, I want to do this on a single host.

But the limitation there is that there's no failover, there's no redundancy. And if you have this architecture right now, that's OK. It's something we see really commonly, but it's kind of like putting all of your eggs in one basket. If something fails, the whole system goes down, your application goes down, users aren't happy. We can only go so far with this. So there's a different pattern that we want to consider. You might see this in terms of both the back end and also the data tier, where people host their self-managed databases on EC2, and it's kind of an anti-pattern that we see. We want to leverage as many managed services as we possibly can for getting our application off the ground.

Another way to look at this is this matrix, and this is a really good slide. I'm going to pause here so you can take a picture, but I'm going to talk you through it a little bit. I want to start on the chart on the left. You'll see the more opinionated and the less opinionated ends; I like hyperbolizing these services. The less opinionated end is going to be Amazon EC2, and what we really mean by that is you have the most control in terms of configuration, but with great power comes great responsibility. You have to configure, control, and debug all of these things, and then make sure that your compute is right-sized. That's another factor from a cost perspective: you want to make sure that you're not scaling out to oblivion, and that you're only utilizing what you really need. On the opposite side of the spectrum is AWS Lambda, and on the customer-managed side it's really just application code, which is very easy.

So when you're launching, you want to think of this as a spectrum: what's purpose-built for both your workload and your experience and comfort level. After you select your compute options, you're going to ask: how do I expose my application to the internet? There are three services I'm going to cover briefly for you. API Gateway is a very purposeful service, I think, for REST APIs. The second one to note is Application Load Balancer, which really acts as kind of a layer 7 proxy. It doesn't have all the bells and whistles of API Gateway, but it's a really powerful service in a lot of different ways. And the third one is AWS AppSync, for those of you familiar with GraphQL. It's really popular with our customers hosting that.

I have a cheat sheet for you for picking an API front end, and there's a lot of overlap here. If you have any questions, find Chris and me in the hall afterwards. But here's a good cheat sheet for you to go off of, in terms of WebSockets, how many requests you're getting, and considerations there. And after I go through this cheat sheet, I'm actually going to recommend something else, and I'll tell you why in a second. I see phones still up. App Runner is the best employee you will ever have. I say this because it provides the entire full stack of what you would need to expose these APIs. You'll see that it's purpose-built: Fargate, auto scaling, and ELB are built into the service. I work with startups a lot, and the fastest way to deploy an application is utilizing App Runner. You don't have to configure all the different components and things like that. It will get you off the ground and running really, really quickly.

Let's pull it all together. I didn't go through the database yet, but we have Amplify Hosting and App Runner. At this point, I am managing none of the underlying infrastructure, which makes me happy. I get to spend more time innovating and focusing on the things that I really care about.

So, the next question: to NoSQL or not to NoSQL. Say that five times fast. This is a question I get all the time. I'm also a data engineer by trade, and this is a controversial topic. For this instance, I'm actually going to recommend you start with SQL databases, and I'm going to tell you why. Why start with SQL? It's a very popular ecosystem where you can get a lot of support from other individuals: PostgreSQL and MySQL and the things that you're familiar with. Also, a lot of applications actually follow this relational data structure.

So you might find me in the hall and say: wait, Sky, I have massive amounts of data. You're building a time series application, you're building a trading application, you're building something with massive scale and petabytes in a couple of weeks. OK, fine, you might need NoSQL or some other purpose-built database. What I mean by purpose-built databases is that we have a suite of databases, and I recommend checking out one of those tracks: time series, graph, you name it, DocumentDB, we have it. Other use cases for NoSQL: if you really don't have those relational data sets or that sort of structure and schema, or if you have those use cases with petabytes of scale really quickly, then you want to consider starting with a NoSQL database. But I'm guessing this isn't most of you in the room. So we're going to start with SQL databases, specifically Amazon Aurora. And the callout here, like I said, is PostgreSQL and MySQL: if you're hosting those on EC2, I really recommend you try out Aurora. Why? Because it automatically scales. There's a lot of performance, availability, and durability built in, and it does the work for you. It does that heavy lifting; AWS manages all of that. It's very easy to spin up in a matter of minutes. I'm an old database guy, I came from that world, and Aurora makes it really easy. At AWS we're constantly coming out with more and more iterations of our services, so you'll see Amazon Aurora Serverless v2.

The really important point to mention here is the decoupling of the compute and storage layers, and why that's important when you're scaling, really, is cost. What this does for you is scale out to your peak workload. Let's say you have a huge Black Friday spike: it will scale out to meet those peak demands and then scale back in by itself. So you're only paying for what you and your users need, which is very compelling.
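The scale-out-to-peak, pay-for-what-you-use point can be sketched with a little arithmetic (the demand numbers and function names below are invented for illustration, not an AWS API):

```python
# Toy comparison: serverless capacity that follows demand vs. a fleet
# provisioned for the peak hour. Capacity is measured in ACUs per hour.

def billed_acu_hours(demand_acus):
    """Serverless: capacity scales out and back in, hour by hour."""
    return sum(demand_acus)

def provisioned_acu_hours(demand_acus):
    """Fixed fleet: sized for the peak hour, running all day."""
    return max(demand_acus) * len(demand_acus)

# A day with a Black Friday-style spike: quiet, a burst, quiet again.
demand = [2, 2, 2, 32, 32, 2, 2, 2]  # ACUs needed in each hour

print(billed_acu_hours(demand))       # 76
print(provisioned_acu_hours(demand))  # 256
```

In this toy trace, provisioning for the peak costs more than three times as much as capacity that follows the curve, which is the cost argument for the decoupled, serverless model.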

So I'm going to bring this all back together. At this point, we're only using serverless: Amplify Hosting, App Runner, and Aurora Serverless v2. I constantly think about TCO, the total cost of ownership, and being able to leverage these managed services to scale to those peak demands, with high availability if you're building a global application. A lot of these features are already built into the services as soon as you stand them up. Wow, we did it: we hit 100 users. I'm probably having a mini party, I'm really excited, I'm getting the energy up. Then I hit 1,000 users and we're starting to get a lot of attention on the application. And then I hit 10,000 users, but stuff starts breaking. All of a sudden my systems go down, my database is handling too many writes, too many different things, and things are breaking.

So I hate to hand it to Chris when things are breaking, but I would really love your help here.

Great, thanks. Thanks for setting everything up for us here. So Sky walked us through the initial architecture and some of the initial patterns we'd be thinking about. Again, the key that we want to start off with, when you're building a new application, is thinking about how we can encourage you to do less with your infrastructure and spend more time on your business application, making use of managed services and serverless here at AWS. When we start to reach this point of scale, say hypothetically in the tens of thousands of users, we potentially start to see some things go wrong, right? And these things will typically lend themselves to very common patterns. Here at AWS, we've been working with startups and businesses building for a long time, so we see a lot of the same kinds of issues trickle in around the same points in an application's scale.

So one of the things that we'll actually see become a pain point is that the business itself has grown, the product has grown, the features and capabilities have grown, and you start to run into the different parts of this application impacting each other. Many, many years ago, in the early two thousands, amazon.com was originally a monolithic application. The entire site was one monolithic application, and what we found was that the demands of the various components of that application were effectively causing negative impacts on others.

The other place this will typically start to make itself apparent is in the database. You'll have some queries that are very intensive, looking across very large amounts of data, and others that are quicker and easier. Again, this conflict and imbalance of resources can sometimes cause challenges.

So what we need to do is go through the stack and understand where we have room to optimize, where we can start to think about how to pull together or pull apart these various components and how we scale them further. And so, just as Sky outlined, we're going to work on the front end, the back end, and our data tier.

Now, before we go too much further into this build, there is something that we need: a way to measure what's happening inside of our infrastructure and architecture. For those of you who have been building and running applications, I'm sure you've heard from, say, a support person, someone else on the team, or a customer: hey, the site is slow. Hey, the app is slow. And you're like, what does slow mean? What's happening? What exactly are you seeing? Can you explain this further to me? So we need some tooling, some things to get the data that we need in order to be able to scale and to grow. Here at AWS, we have a number of products that can help you with this. The two biggest buckets of products, which themselves have multiple components that most of you are probably familiar with, are Amazon CloudWatch and AWS X-Ray.

Now, Amazon CloudWatch has been around since the earliest days of AWS. It's very deeply integrated across almost the entirety of our product portfolio today, and it has a number of different capabilities built into it around logging, metrics, alarms, and dashboards. But there are also some other aspects that can help us with things on the front end. There's a tool called Synthetics canaries that allows us to test our APIs remotely from CloudWatch's infrastructure.

There's also real user monitoring, or RUM, a component that you can include in your application's front end to get performance data from it.

Then with X-Ray, we have the ability to do traces across our architecture. Now, at this point, we don't have too big of a distributed architecture, right? We have App Runner and our database, so there's only so much that we can measure there. But as we break apart and decompose our application, these will become more and more important.

Now, one thing that we've seen over the last several years, and this predates the whole craze this year in the generative AI space, is more and more tools coming out for both ops and dev folks that enable you to make use of machine learning.

And so we've got two kind of core products in this space here today at AWS. I don't know what this space will look like by the end of the week; of course, this is re:Invent, so there's always lots of things happening. But the first is Amazon DevOps Guru, which, as the name might imply, is a useful tool for the DevOps folks inside of your organization. It has the ability to look at your infrastructure and understand the components that you have.

Then, based on data and machine learning models that we've developed internally here at AWS, it offers up guidance. It can look for areas where you're seeing high latency, slow database queries, and other things in your infrastructure that you wouldn't like, and then offer up, basically: hey, this is something you should take a look at, and this is the recommendation that we have.

Next, we have Amazon CodeGuru, which, as the name might imply, is used for looking at your application code. You can point it at your code repository, have it scan all of your code, and then it pulls in an understanding of what your code does, how it might be performing, and where there might be issues that it can help identify.

So again, these two tools bring with them machine learning models based on the knowledge from decades of development and infrastructure inside of Amazon, and they can be quite a powerful multiplier to your efforts and what you can do today.

And this is just a quick example here, a screenshot from DevOps Guru, where it's highlighting slow queries and aspects of what's happening inside of my database. Again, there's a wealth of information that you can pull from these services to help you really understand where the pain points are, right?

We don't want this to be the thing where you just turn the knob and you're like, hey, let's just do bigger instances, let's just run more of those right. There are other options that you can get to before you get there.

Cool. So now that we've got some tools in our toolbox, let's get back to actually how we're thinking about scaling this architecture and breaking it down bit by bit.

Now, on the front end here, if you run your front end on Amplify Hosting, one of the great aspects of this product is that it scales really, really far without you having to do anything. There are effectively no knobs or levers in Amplify Hosting that you can turn or tweak or tune to make it scale further or better for you.

And I've had a number of conversations with the team about this, saying: tell me where it breaks. Where's the break point? Where does it tap out? Where are customers having challenges? And they said, theoretically, they don't really see one. One of the best aspects of Amplify Hosting is that it's built on top of CloudFront.

So CloudFront just celebrated its 15th birthday here for us at AWS. It is a CDN service, and there are over 550 points of presence for CloudFront today. I remember when it was a very small kind of baby service many, many years ago, and now it's got this massive global capacity. It's able to help bring your customers to your site faster and easier than ever before.

And so on the front end, a lot of the performance actually comes from the work that you do. It comes from tuning your front-end code. It comes from looking at how many calls you're making to back-end databases and how slow those might be. It also comes from things like how much CSS, JavaScript, images, and static content you're loading.

And so this is where the CDN aspects of CloudFront become really powerful and that it can help cache those resources at the edge closer to where your customers are.

So this is one area where Amplify does expose the capabilities for you: you have the ability to set custom headers on objects that are hosted by Amplify. And so we see here an example where I'm setting the Cache-Control header, specifying the maximum age, with a regex pattern for everything in the URL that's in the images directory.

So you can imagine: I'm going to cache my static images, and I'm going to say, hey, those don't change that often. I'm comfortable having those cached as long as possible at the edge and in the customer's browser. But beyond that, there's really not a lot more that you can do with Amplify or with CloudFront to change or tweak these things.
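As a rough sketch of what that caching rule does, here's the pattern-to-header idea in plain Python (the paths and header values are hypothetical, and this is not Amplify's actual configuration format):

```python
import re

# Illustrative rules: long-lived caching for static images, a
# conservative default everywhere else.
HEADER_RULES = [
    (re.compile(r"^/images/.*"), {"Cache-Control": "public, max-age=31536000"}),
]
DEFAULT_HEADERS = {"Cache-Control": "no-cache"}

def headers_for(path):
    """Return the response headers for a requested path."""
    for pattern, headers in HEADER_RULES:
        if pattern.match(path):
            return headers
    return DEFAULT_HEADERS

print(headers_for("/images/logo.png"))  # long max-age: cached at the edge
print(headers_for("/index.html"))       # no-cache: always revalidated
```

The long `max-age` on the image path is what lets CloudFront and the browser keep serving those objects without going back to the origin.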

So let's move on to the back end and data storage. I'll say that the database and data storage area is where I find customers actually get the best impact on being able to scale early on. It's typically where there are more pain points, and it's typically where physics becomes a barrier to how you have to think about scaling things.

Now, one of the awesome things that we have today with Aurora Serverless v2, and one of the key aspects of Aurora that's different from a traditional relational database hosting product, is that it has separated compute from storage.

I've been running MySQL and PostgreSQL at scale for many years, pre-Amazon, and what you end up running into is either compute or storage as a bottleneck. What Aurora does is basically detach the two so they can scale independently for you.

Aurora Serverless v2 basically automates this process for you by scaling out compute and storage automatically. Now, the storage you really don't have to think about; it takes care of that all behind the scenes, using essentially a shared storage model for the database storage.

The CPU side of things, though, is where you get a little bit more control. Aurora Serverless v2 is based on the overall Aurora Serverless model, which has a concept called Aurora capacity units. A single ACU, or Aurora capacity unit, represents two gigabytes of memory for an underlying database node.

Today, you can manually configure Aurora for anywhere from a minimum of half an ACU all the way up to 128 ACUs, which today is 256 gigabytes of memory. So at the high end there, you're talking about a fairly large host, or hosts, of capacity.

And again, this is the scaling up of a single node that you could have as part of an Aurora cluster. Aurora takes care of this for you: Aurora Serverless v2 looks at a number of metrics, a number of bits of data about what's happening inside of your database, and automatically adds ACUs to your database. It does it completely transparently for you.

It should not impact transactions, and it shouldn't impact things like buffer pools. These are things that you would typically have to tweak any time you resize a database in a non-Aurora model, where compute and storage are kind of mashed together.

Now, the other option that you have, and this is part of Aurora overall, is that you can also add other nodes to your cluster. So we can continue to add reader nodes to our cluster.

Now, these reader nodes can actually be a mix of Aurora Serverless or regular Aurora. So you can mix these hard-configured, fixed-resource instances or nodes with serverless nodes that can scale up and down based on the overall load that they see.

Today, Aurora supports up to 15 read replicas. So again, that's 15 read replicas that can each go from half an ACU all the way up to 128 ACUs.
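To make the read-scaling idea concrete, here's a minimal sketch of splitting traffic between one writer and a pool of read replicas (all the names are invented; in practice a driver or proxy does this routing for you):

```python
import itertools

class ReadWriteRouter:
    """Send reads round-robin to replicas, everything else to the writer."""

    def __init__(self, writer, readers):
        self.writer = writer
        self._readers = itertools.cycle(readers)  # round-robin over replicas

    def route(self, query):
        # Reads can go to any replica; writes must hit the writer node.
        if query.lstrip().upper().startswith("SELECT"):
            return next(self._readers)
        return self.writer

router = ReadWriteRouter("writer-1", ["reader-1", "reader-2", "reader-3"])
print(router.route("SELECT * FROM orders"))    # reader-1
print(router.route("SELECT * FROM users"))     # reader-2
print(router.route("INSERT INTO orders ..."))  # writer-1
```

This is why read replicas help so much with read-heavy workloads: each new replica adds another target the router can spread SELECT traffic across, while the single writer only sees writes.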

Now, there is an interesting thing that Aurora does here, where you'll see that there are different tiers assigned as part of this. You always have a writer node in a cluster, and then you can have these 15 read replicas as part of it. Those read replicas do have to be given what's called a tier.

This is part of how Aurora handles its availability, such that if the primary node were to fail or die, it then promotes a reader in order of the tier. The other thing that happens as part of this is that tier 0 and tier 1 readers are always the same size as the writer.

So you can get really flexible with this. You could have, say, a reader at a much lower-priority tier that you forcibly scale down and limit how large it can get. That could be useful for things like internal admin panels or BI tools or other data analytics tools that don't need the full size of the Aurora node that's powering all of this.
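A tiny sketch of the tier-based promotion order described above (the node names, tiers, and sizes are invented for illustration):

```python
# On writer failure, the reader with the lowest tier number is promoted.
readers = [
    {"name": "reporting-replica", "tier": 15, "acus": 8},   # small BI reader
    {"name": "reader-a", "tier": 0, "acus": 64},            # same size as writer
    {"name": "reader-b", "tier": 1, "acus": 64},            # same size as writer
]

def promotion_target(readers):
    """Pick the failover candidate: lowest tier wins."""
    return min(readers, key=lambda r: r["tier"])

print(promotion_target(readers)["name"])  # reader-a
```

This is also why tier 0 and tier 1 readers stay the same size as the writer: they're the first in line to take over writes, so they have to be able to absorb the writer's full load.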

And so this is the one area where you do have control: these read replicas are things that you would create and add based on the overall needs of your application.

Now, say you do add these various read replicas and you start to grow this cluster, where your write node is growing automatically thanks to Aurora Serverless, and you add read replicas as needed based on the demands of your application.

There are some customers that see, you know, 20- or 30-to-1 read-to-write demands, where these read replicas become a really important thing. What you're going to want to do then is add some sort of database proxy.

So database proxies have been around in the PostgreSQL and MySQL world for a very long time, and a few years ago we announced RDS Proxy.

RDS Proxy, as the name might imply, is a database proxy that can sit between your application and all of your database nodes: between your application and the write node, and between your application and all the read nodes.

What it does is help simplify things such as connection handling and connection pooling, you know, memory consumption. This is another really important thing, especially when we start talking about how, on our application tier, we might see some aspect of auto scaling or serverlessness happening.

Being able to have RDS Proxy own the connection pools for you actually reduces resource demand on the database. So by using RDS Proxy, you effectively get more scale out of your database without having to change anything too much architecturally: you point your app at the proxy, the proxy connects to the database, and away you go.
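Here's a minimal, illustrative sketch of the pooling idea (this is just the concept, not RDS Proxy's API): many short-lived clients share a few long-lived database connections, so the database sees far fewer connections than the application opens.

```python
class ConnectionPool:
    """Toy pool: a fixed set of connections is borrowed and returned."""

    def __init__(self, size):
        self.connections = [f"conn-{i}" for i in range(size)]
        self.opened = size  # connections the database actually holds

    def acquire(self):
        return self.connections.pop()

    def release(self, conn):
        self.connections.append(conn)

pool = ConnectionPool(size=5)
# 1,000 sequential requests each borrow a connection and return it...
for _ in range(1000):
    conn = pool.acquire()
    pool.release(conn)
# ...but the database only ever held 5 open connections.
print(pool.opened)  # 5
```

That gap, a thousand logical requests served over five physical connections, is the resource saving the proxy gives you, and it matters most when an auto-scaling application tier would otherwise open a fresh connection per instance or invocation.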

So our architecture starts to evolve a small bit here, right? What we've done now is we've implemented RDS Proxy. We have our primary write node in our Aurora Serverless v2 cluster, and we can add some read nodes, again to help with splitting out that read traffic.
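To make the read/write split concrete, here's a minimal sketch of how application code might route queries. The endpoint names are hypothetical; Aurora actually exposes a cluster (writer) endpoint and a reader endpoint that load-balances across replicas, and in practice the proxy or your driver would own this decision.

```python
# Hypothetical endpoints standing in for an Aurora cluster's
# writer endpoint and read-only (reader) endpoint.
WRITER_ENDPOINT = "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

def endpoint_for(sql: str) -> str:
    """Route read-only SELECTs to the reader endpoint, everything
    else (INSERT/UPDATE/DELETE/DDL) to the single writer."""
    is_read = sql.lstrip().lower().startswith("select")
    return READER_ENDPOINT if is_read else WRITER_ENDPOINT
```

A real implementation would also keep transactions pinned to the writer, but the core idea is just this prefix-based routing.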

And so again, what we start to see here is a pattern where we can actually take this really pretty far. All right, if we had 15 read nodes at 128 ACUs, you're talking about multiple terabytes of memory and aligned CPU capacity for a database. This is going to take you really, really, really far.

And again, the storage tier just scales completely transparently for you.

So it's not something that you necessarily have to think about factoring in yourself. Now, Sky of course gave us that awesome quote earlier in the presentation, and there's another one here that I like to lean on, which is that the best database queries you're ever going to make are the ones that you don't make at all.

And so another trick that we have here, and I purposely put this in in this order, is to add caching in front of our database. Now, there have been a number of different ways and models for doing this over the years, and again a bunch of different ways that you could think about this. I'm personally a fan of bringing my database cache off of the database, and not running it inside my application either.

And so today with Amazon ElastiCache, which is another product that's been out for a number of years, you've got two primary options, or two primary engines I should say, for doing caching inside of this product. You've got Memcached. Memcached was first built by LiveJournal almost 20 years ago; it's in all sorts of large infrastructure, works really, really well, and handles really heavy scale.

The other is Redis, and we see Redis also used in incredibly high-scale architectures. Now, the one gotcha here with ElastiCache, compared to everything else that we've said so far, is that with ElastiCache you are gonna have to basically take on scaling the resources yourself. So you do have to think about the size of your clusters for this. You do have to think about how you address objects in it.

And so adding a cache product is probably the first time where you start to see a more significant change in your application code than just splitting up, say, the reads and the writes between two different types of database nodes. So again, what we've done here is we've taken our baseline architecture and we've now added a cache to it. This can help buffer how much we need to think about scaling our database, right? You could move a considerable amount of your read traffic to a cache and end up saving a ton of time and money on scaling out the database further.
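The application-code change mentioned above is usually the cache-aside pattern: check the cache first, fall back to the database on a miss, and populate the cache for next time. Here's a minimal sketch, where a plain dict with a TTL stands in for ElastiCache (Redis or Memcached) and `query_database` is a hypothetical placeholder for the real query.

```python
import time

_cache: dict = {}       # stands in for ElastiCache (Redis/Memcached)
TTL_SECONDS = 60        # how long a cached entry stays fresh

def query_database(key):
    # Placeholder for the expensive real database query.
    return {"id": key, "loaded_at": time.time()}

def get(key):
    entry = _cache.get(key)
    if entry is not None and time.time() - entry["at"] < TTL_SECONDS:
        return entry["value"]                    # cache hit: no DB round trip
    value = query_database(key)                  # cache miss: go to the DB
    _cache[key] = {"value": value, "at": time.time()}
    return value
```

With Redis the dict lookups would become `GET`/`SETEX` calls, but the control flow, and the point that repeat reads never touch the database, is the same.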

And so again, what we kind of see here is this model where we think about scaling first up and then out. All right, so Aurora Serverless v2 is going to scale up automatically; it's gonna make those nodes bigger. Adding more read replicas becomes something that you have to think about. In the case of ElastiCache, you can scale those nodes up bigger, and then you can add more to a pool of nodes that's represented by the service behind the scenes.

Again, you want to lean as much as you can into thinking about how various other things can help reduce load and demand, right? So a proxy in front of your database helping to remove overhead from connections is a real simple, easy win. Finding a way to do off-database caching for your application can be another real simple, easy win.

So let's talk a little bit more here now about the application tier, or the back end. So Sky talked really briefly about what AppRunner does for you in terms of how it's built on top of ECS and Fargate and auto scaling and ELB and all these other components that you yourself don't have to configure, right? If you were to write that in CloudFormation, it would be a couple hundred lines of CloudFormation. In this case, you're not doing any of that.

So behind the scenes, AppRunner has a couple of core components to it. The first is that it puts at your application's front door an NLB, or Network Load Balancer, which is part of our overall Elastic Load Balancing product suite. Behind that we run an L7 request router. So this is a request router for HTTP traffic, and again, AppRunner today only allows you to run essentially web applications, so port 80 or port 443. Then behind that it manages ECS Fargate tasks for you.

So it is effectively using again Fargate behind the scenes and manages the scale of all of those resources for you. Now again, as Sky was saying earlier, you see nothing of any of this. You connect your code repository, you deploy your application, your application runs right? You're not exposed to any of the kind of bits under the hood.

Now, AppRunner, though, a little bit different here from Aurora Serverless v2, does give you some options when it comes to tweaking knobs and levers for scale. So AppRunner refers to the Fargate tasks that it runs on as instances, or AppRunner instances. An AppRunner instance is configured for a certain amount of memory and CPU, and effectively this directly correlates with the cost of the product when you're running it.

So today, you have everything from a quarter of a vCPU, or 0.25 vCPUs, and half a gigabyte of memory, up to four vCPUs and 12 gigabytes of memory. These are hard allocations. So you have to choose one of these configurations; you can't tweak these two things independently. If you wanted to do that, you would choose, for example, two different instance configurations, comparing a compute-intense versus a memory-intense versus potentially a more mixed-mode profile.

Now, you can choose the size of the underlying AppRunner instance, and then, as I'll talk about here in a moment, it scales this up and down for you. Now, by default, every single AppRunner instance has a maximum number of concurrent requests that it can handle. OK, this is not TPS; this is concurrent requests. There's an upper hard bound on this today, which is 200 requests.

And so this becomes one aspect that effectively bounds the scale at an instance size for AppRunner. The second factor that goes into scaling AppRunner is the number of instances per service, or per application, that you have. Today, that's a default soft limit of 25. And I've talked to the team about this one, and that number is an easy thing to request a limit increase for. It's set to 25 more as a safety mechanism, to make sure you don't accidentally create something too huge here.

So if we put these two numbers together, assuming that I can get up to 200 concurrent requests per instance, and a soft limit again of 25 initial upfront instances in my application, that gives me about 5,000 max concurrency in an AppRunner application, again by default.
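The arithmetic above is just the product of the two limits mentioned in the talk; as a quick sanity check:

```python
# Default AppRunner scaling bounds, per the talk:
MAX_CONCURRENT_PER_INSTANCE = 200   # hard upper bound per instance
DEFAULT_MAX_INSTANCES = 25          # default soft limit (can be raised)

max_concurrency = MAX_CONCURRENT_PER_INSTANCE * DEFAULT_MAX_INSTANCES
print(max_concurrency)  # 5000 concurrent requests by default
```

Raising the instance soft limit raises this ceiling linearly, which is why it's worth a limit-increase request once you approach it.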

The next aspect of this is that AppRunner manages scaling these tasks, or again "instances" as it calls them, for you. So it will fire up more of these Fargate instances behind the scenes. It will then run them through the health check that you configure before adding them to the L7 load balancer. And it basically pays attention to how requests are coming in and out.

Now, one thing that it does do for you: it does an intelligent scale-down when it sees that the load has dropped to the point where the instances are no longer as active as they need to be. And so this L7 request router will actually intelligently shift traffic away from some nodes so they can be spun down automatically for you. This is one tricky thing that people have typically seen with auto scaling: in a traditional auto scale model with EC2, when you scale down a node, you may have no idea what that thing is doing. So this L7 request router helps basically drain the load from that AppRunner instance and make it easier for it to be spun down.

Now, one of the tricks that AppRunner also does is it effectively, behind the scenes, keeps the Fargate task essentially in a frozen state. It keeps the memory active, so if there was a sudden surge of traffic, it can fire that back up for you very quickly. And so again, AppRunner is taking care of all of this for you, and by default you could scale an AppRunner application down to a single instance.

So again, in thinking about this aspect of our backend application, what AppRunner is gonna get us is that maximum 5,000 concurrency. Now, people sometimes mix up concurrency and transactions per second, right? And so if you think about the duration, how long your transaction or your action actually takes, you factor that in against that 5,000 concurrency, and then you can kind of think about the scale here. Sometimes thinking about it in terms of TPS is a little trickier.

So I thought here, say I have a request that takes up to two seconds per request. That means in a minute I could do 150,000 requests, right? So it's a decent amount of scale here. If it took one second, then I could do 300,000 requests in that minute of time. Again, that's one way to think about this for your application. One thing that we don't get into here is your application performance tuning, right?
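The conversion from concurrency to throughput used above is worth writing down, since it comes up constantly in capacity planning. With N requests in flight and each taking d seconds, steady-state throughput is N / d requests per second (this is Little's law rearranged):

```python
def requests_per_minute(concurrency: int, seconds_per_request: float) -> float:
    # N concurrent requests, each taking d seconds, sustain N/d req/s;
    # multiply by 60 to get a per-minute figure.
    return concurrency / seconds_per_request * 60

print(requests_per_minute(5000, 2.0))  # 150000.0
print(requests_per_minute(5000, 1.0))  # 300000.0
```

This is also why shaving request latency in half doubles the throughput you get out of the same concurrency ceiling.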

So you're gonna want to use those tools that we talked about before to see: where's my application slow? Where is the code in it running slow? One of the interesting tips and tricks that we see quite often, and one aspect of AppRunner, is that AppRunner has managed runtimes for you. Sometimes moving to a new version of your runtime can actually bring a pretty beefy performance boost. You see this very commonly with Lambda.

So in terms of our scale here, this could probably pretty easily get us over that 100,000-plus user mark, maybe start to get us into our first millions of users. But then what do we have to think about after that? Well, at some point, everything that we've discussed does start to hit a wall, right? Hey, we've reached those 15 read replicas for our Aurora database; they're at the maximum size they could possibly be. But what we'll probably run into before then is that the contention on our write node is maxed out.

Now, in a relational database of any type, basically you are limited to a single write node, even if you've got some sort of master-master type of configuration and read replicas. The other thing here is that we do have limitations on AppRunner, right? So today there is a point, probably, where our application, again due to the complexity of our business, might start to have issues with components that are causing increases of latency and friction on other sides.

And really where you start to see this become a pain point is actually organizationally. So again, going back to the example of Amazon.com, where we had this large monolith in the early 2000s, and we were starting to see all sorts of cracks in the foundation, as it were. One of the biggest ones that we had was with our development teams: the developers kept stepping on each other's toes in that single code base.

And so that monolith became a problem for us to be able to move as fast as we needed to with all of kind of the overlapping entanglements that happened inside of this. And so this is typically where you start to talk about decomposing that application, whether you want to call it a service oriented architecture or a microservices based architecture or what have you. But again, what you have to start thinking about is how do I break apart this monolith?

There are a number of sessions this week that you can go to that talk about different strategies and examples of how to think about it. I think there are primarily two different models for how you think about breaking apart an application. One is data domain mapping. So you start to look at the data in your databases and see how that can be grouped together in terms of commonalities and needs for the business.

The second is business function mapping, typically pretty close to the first, where you start to say, hey, OK, there are different needs of my business inside of my application and my data.

And I'm going to group that together by the business grouping, or line of business, as it might be. This also becomes an area where you might start to think about how to evaluate other technologies, right? If we're going to start building whole new applications as part of this, maybe I want to look at serverless with Lambda, maybe I wanna run some things on EC2, maybe I've got, as Sky was talking about earlier, different database needs.

Now again, one of the simplest, easiest first tricks that you can do when it comes to scaling your database is to just have more databases, right? So: database federation. We take that data domain mapping, we take that business domain mapping, and we separate out that data into completely different clusters. So again, I talked before about how you can have a single write master in Aurora up to 128 ACUs and 15 read replicas; imagine now that I could have three or four of each of those. And again, you keep pushing the bounds on scalability as you break this up.
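In code, federation often starts as nothing more than a routing table from data domain to cluster. Here's a minimal sketch; the domain names and endpoint strings are hypothetical, matching the forums/users/products example that comes up below.

```python
# Hypothetical per-domain Aurora clusters after federating one database.
CLUSTERS = {
    "users":    "users-cluster.cluster-abc.us-east-1.rds.amazonaws.com",
    "forums":   "forums-cluster.cluster-abc.us-east-1.rds.amazonaws.com",
    "products": "products-cluster.cluster-abc.us-east-1.rds.amazonaws.com",
}

def cluster_for(domain: str) -> str:
    """Resolve a data domain to its dedicated cluster endpoint."""
    try:
        return CLUSTERS[domain]
    except KeyError:
        raise ValueError(f"no federated cluster for domain {domain!r}")
```

Each entry can then be scaled (write node size, replica count, caching) independently, which is the whole point of the split.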

Now, this changes a couple of things. One, I'll no longer be able to query all of my data in one place. You're probably not doing that terribly often, and when you are doing it, it's probably for things like business intelligence or analytics needs, and there are better products for that, right? Whether it be Redshift or Athena or Snowflake or something like that, you think about a purposeful BI tool or purposeful analytics tool. And so doing something like this again gives us the ability to continue to stamp out all of the scalability patterns and tips and tricks that we've used before, as needed, for these different areas.

So maybe, for example, forums on my site are particularly heavy, so I have to scale that a little harder. The user database is critical and important; I have to scale that a little bit further. But maybe my products database has a much more manageable amount of writes and reads, and so I don't have to tune that up quite the same. Again, typically the data area is where we focus a lot on scaling challenges with customers.

This does become a point in time where we want to say, hey, do I have very specific data needs where I need to think about a purpose-built database, right? Do I actually have non-relational data that could go in a key-value store? Do I have document data that should go into a document store, or key-value data, or time series data? So: thinking about where you want to put the data, and thinking about the database product that aligns to it. Again, that's the kind of thing where, if you don't have deep expertise on this, lean on the team that you're working with here at AWS. Your solutions architects can do data discovery exercises with you, data mapping exercises with you. We can help you figure out, hey, based on what your use case is, this is the right product to think about.

Then, typically, following breaking up the database tier, you think about breaking up the application tier, right? I always lean toward starting with data and then moving forward to the application tier. It becomes a little bit of the same thing. So if we had a single AppRunner application, now we could have multiple AppRunner applications. The challenge that you run into now becomes: how do I think about gluing these things together? How do I think about how I'm gonna expose the different services out to my client?

API Gateway has a neat thing that it can do, which is called base path mapping. It allows you to map different parts of your API to different backends. It's not just, oh, I have different Lambda functions; you can literally delegate out to different teams: hey, you own this path and you own that path off of our API. The only thing that we might have to think about is moving to completely different technology patterns, right? Maybe exposing APIs inside of our infrastructure for internal microservices isn't what I should do. Maybe I want to think a little bit about moving from synchronous models with APIs to asynchronous models with other ways of connecting these services.
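Conceptually, base path mapping is a prefix-to-backend routing table owned at the gateway. Here's a sketch of that idea as plain code, with hypothetical paths and backend URLs; in API Gateway itself this lives in the custom domain's base path mapping configuration rather than in your application.

```python
# Each path prefix is owned by a different team/backend service
# (all names here are hypothetical).
ROUTES = {
    "/orders":   "https://orders-service.internal.example.com",
    "/invoices": "https://invoices-service.internal.example.com",
    "/users":    "https://users-service.internal.example.com",
}

def backend_for(path: str) -> str:
    """Pick the backend whose base path owns this request path."""
    for prefix, backend in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return backend
    raise LookupError(f"no backend mapped for {path}")
```

The key property is that adding a new team's service is just adding a row to the table; no existing backend changes.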

And this is another area where again you can see some really great talks this week: "Thinking Asynchronously," part of the serverless track, and a couple of others in the serverless and app integration tracks cover this as well. But think of a traditional model where maybe I have two different services: in a synchronous world, service A calls service B, service B replies back to service A, and then back out to the client. There's tight coupling there, right? There's a bunch of brittleness there. Every single one of these blue arrows becomes a potential area where I have to think about how to recover from a failure.

And in the second model, where I have an asynchronous application: the client calls service A, service A calls service B, but service A replies before it waits for any work from service B. Another way to think about this: I have an order service and I have an invoice service. And we can see basically on the top here, in this asynchronous model, the order service calls the invoice service and says, hey client, I've done that, here's an order ID or a code, if you will. And then later on the client can go back and make that request to that next service. And so just by separating this out, we reduce some of the tight coupling that we have inside of our architecture.
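The order/invoice example above can be sketched with an in-process queue standing in for something like SQS; the service names and ID format are invented for illustration. The point to notice is that `place_order` returns to the client immediately, and the invoice work happens whenever a consumer gets to it.

```python
import queue

# queue.Queue stands in for SQS (or SNS/EventBridge) between services.
invoice_queue: queue.Queue = queue.Queue()

def place_order(item: str) -> str:
    """Order service: enqueue invoice work, reply right away with an ID."""
    order_id = f"order-{item}"
    invoice_queue.put({"order_id": order_id, "item": item})
    return order_id          # client gets this without waiting on invoicing

def process_invoices() -> list:
    """Invoice service: drain the queue whenever it runs."""
    done = []
    while not invoice_queue.empty():
        msg = invoice_queue.get()
        done.append(f"invoice-for-{msg['order_id']}")
    return done
```

A real queue adds durability, retries, and dead-letter handling, which is exactly what you're buying by putting a managed service between the two instead of a direct API call.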

I used to lead serverless developer advocacy here for many years, and we would spend a lot of time talking to customers about synchronous versus asynchronous. And when I see some of the best companies that have built serverless applications, companies like Lego, for example, where they've moved heavily to asynchronous models, it really changes the way that you mentally think about your application, and it really lends itself to helping you think about scale very, very differently. So I really encourage you to explore the interactions between components inside of your architecture, and where moving from a synchronous to an asynchronous model can help you really free up some of that tight coupling.

Now, beyond having API call to API call, there's a whole bunch of services that we have that essentially allow you to pass events from service A to service B. As part of our app integration suite, we have services like Amazon Simple Notification Service, or SNS, and Amazon Simple Queue Service, or SQS, technically the first AWS product. That's the trivia question: if you're very curious, the first preview product for AWS was actually SQS, back in 2004. There's Amazon EventBridge, which basically can act as a message bus hub in between different services. And then we have Kinesis Data Streams, which allows you to ingest really fast and spread out lots of information at scale.

Here's another quick cheat sheet. Again, these four products have bits of overlap; there are ways that you can use each of them kind of the same. However, there really are distinct use cases where each of them shines. And again, I would encourage you this week to check out some of the sessions in the serverless and app integration tracks, which go into a lot more depth about how you think about persistence, durability, retries, the cost models, the consumption models. There are a lot of different factors that you have to think through when you're choosing one of these products to sit between different services.

And so eventually our architecture starts to evolve into something much, much more complex, right? This is not a real customer's diagram; this is not something that I directly replicated from a conversation. But this is the kind of architecture that you could expect to see, and you shouldn't feel daunted by it or concerned with it. In this image, what we're showing is a whole bunch of managed services that you don't have to think about the underlying infrastructure for. Almost all of them have automatic capabilities for things like scaling and multi-AZ recovery options, stuff that, again, you don't have to think about.

And so again, where you can get from this is really quite far. So hypothetically here, right, we've decomposed our application, we're leaning on the scalability capabilities of these managed services, we're breaking out our services into different components, and our front end is still continuing to scale away for us without us having to do too much work. And so we can reach our initial goal here of exceeding 10 million users on AWS, right? Where do we go from here? Everything after this point is where it starts to look maybe a little bit more unique.

The first thing, of course, is stamping out more of the patterns that we've talked about. And again, we can talk about companies like Uber, for example, where they had thousands of microservices that all looked exactly the same and would follow the same types of scaling patterns. We do something very similar internally at Amazon: a lot of our products under the hood look exactly the same, right? We're following patterns that we've established over and over and over again. When there are certain break points, we respond to them. You always want to have the ability to dive really deep into your stack's performance. And so your observability tools, your monitoring tools, your code profiling tools become incredibly important, key, and critical to understanding where to scale, how to scale, and so forth.

And then you do potentially reach a point where you say, hey, you know what, actually it makes more sense for me at this scale to operate some of these things myself. So I'll be the first to admit, right: Lambda is the most expensive way to buy compute at AWS. But you use Lambda because the TCO of not having to manage that stuff for a really long time is really great. There could come a point in time where you say, you know what, I do want to take back that responsibility; there are economies of scale that I want to make use of. But again, that's really far out on the scale line before that starts to make sense.

Cool. So, to infinity. Again, here in closing: I've been giving a variation of this talk now for almost a decade here at AWS, and we've been constantly revamping it. You know, the things that I talked about 10 years ago in this talk, and five years ago in this talk, would look quaint compared to what you can do today. And so there are constantly things that you can think about in terms of, you know, what we call serverless, right? Getting you away from managing manual infrastructure, dealing with resources like auto scaling yourself.

The other thing is that speed today in the cloud is so different, right? CPU, storage, memory, networks: so much faster than they were a decade ago. Again, you really do want to think about things like caching; that's where you're gonna save a lot of time when it comes to scale. So where inside of your infrastructure can you cache? At the edge, at the application, at the database tier? Again, reducing those queries to your database, reducing the load on your database, is gonna be one of the easiest ways to win when it comes to thinking about scale.

And then again, when it comes to just kind of quick, easy wins, you have things like federation. People think that it's a cheap hack; it's a cheap hack that works almost every single time. And it will really help you overcome some of those pain points where you're like, hey, I'm at the bounds of what a product can do in terms of scale or concurrency or something like that.

And again, you want to look for best-fit technologies based on need. We've talked heavily here about AppRunner. You might already know today that AppRunner is not gonna work for you, and that's fine, and that's great. But for a lot of people starting off really small, again, not having to think about those resources or how to build something is going to be more beneficial.

So with that, I really want to thank you for coming to this session and joining us here Monday at re:Invent. We hope you have a really great week. This is a really fun event every year, and it's been really exciting to be here. We live by these survey results, so please do fill out the session survey; we really appreciate it. And then, most importantly, a round of applause for Sky. This is her first re:Invent and I think she did great.
