Build your application easily & efficiently with serverless containers

All right. If you haven't visited the expo yet, there are only a couple of hours left, but I'm glad you're here to attend my talk today. Today I'm going to be talking about how to build your application easily and efficiently with serverless containers. Easy and efficient are definitely topics that I love to talk about, and serverless containers are the area that I work in.

I work as a developer advocate for container services at AWS. So to kick things off, I want to introduce an idea that I've developed over years of software development at different startups, and that is that as a software builder, you need the right tools at the right time.

It's similar to growing things: if I were to use a watering can on a full-grown tree, it's not going to do much for that tree. Likewise, if I use a rake on a tiny baby plant, it's probably just going to kill that plant. I need to use the right tool for where that plant is in its life cycle.

The same thing goes for software. If I'm building something that's completely new in a software space, the main thing I generally need is rapid product development: I need to build new features, figure out what fits my intended market, and learn how to build what they need in order for that product to be a success.

But if I'm building on a piece of established software that already has a successful, established user base, maybe what I need more is support, maintenance, and reliability. Feature development maybe slows down a little bit, but I have different needs as a software builder.

So when looking at ease and efficiency, the two dimensions we're going to talk about in this talk, the main thing is finding the right spot on those two dimensions for where your software and your development cycle currently are.

When I look at ease, the questions I ask myself are: does this tool help me build better and faster right now? Or have I outgrown this tool, and is it now starting to get in the way of what I'm trying to build?

In terms of efficiency: is this tool's pricing, its computational model, or its optimization appropriate for where I am right now in terms of scale? Have I outgrown this tool, or does its pricing model still fit my needs?

And I believe that containers are a tool that helps with both of these dimensions, ease and efficiency, all the way from a brand new greenfield project up to established software that's been around for decades.

I'm going to show both dimensions, starting with greenfield projects. What I've found with containers is that for a greenfield project, a container is going to get you started a lot faster. There are literally millions of prebuilt container images on Docker Hub, and many of those are production ready. They're patched and maintained by the core teams that develop the runtimes themselves, like the Python core team or the Node.js core team.

So they've built a production-ready pattern for distributing a particular runtime for your application, and all I have to do as a developer is pull it off Docker Hub and start developing on top of it. That speeds up my development on a greenfield project.

Additionally, building your own deploy tooling isn't the greatest use of your time if you're trying to build a business, and the same goes for local development. If I'm spending a lot of time building out a deploy chain, a build chain, and the chain for shipping my application to production, that's all time that could have been spent on building business features to deliver value to my end customers faster.

Meanwhile, on the other end of the spectrum, with established software, what I've found is that containers help solve a problem of standardization. Often, especially as a company grows, you end up with a lot of different competing tooling, competing standards, and competing formats for how to build an application and ship it to production.

Docker and Docker containers provide a standard format that not only eliminates those special snowflakes where the deployment tool is only known by a couple of people inside the company, but also means you can bring in an employee from anywhere else that uses Docker containers and they're going to be more or less familiar with the same tool chain.

Additionally, for established businesses, you'll find that containers help you pack applications onto infrastructure more efficiently to save money. You're not going to have as many VMs sitting there running at extremely low utilization but still costing a lot of money. And you get reliable deployments and rollbacks.

A container image, once it's deployed, is immutable. Even if something terrible happens in that environment, where my application crashes, goes down, and completely destroys its local file system, I can always bring that container image back onto that machine, boot it from scratch, and restore it to the state it's supposed to be running in.

Now, I've talked a little bit about ease, but I want to talk about efficiency as well, because this really goes to the cost you'll face when you're developing applications, and concurrency is at the root of all efficiency problems.

Computers, throughout their development, have enabled us to do more things more quickly with more concurrency, and I want to walk through a little bit of that timeline, starting with the first computer that I actually wrote code on.

I loved that computer, but it had an extremely low-powered processor, about four megahertz, and it was only really capable of running one program at a time. In fact, if you wanted to run another program, you had to physically turn the machine off, take the disk out, put a new disk in, and turn it back on. So there was really no concurrency going on there.

Now, fast forward a few years, and we get Windows 2.1 and cooperative multitasking: the idea that you could have multiple programs running at the same time on a slightly more powerful processor, but they had to coordinate periodically.

Each program was responsible for checking with the operating system to see if another background program wanted to use the CPU, and if there was one, the program could choose to yield its time to that other program in the background. This required every program to be implemented in this manner and to implement cooperative multitasking properly.

It didn't always work; a lot of the time you would have programs that would freeze up the entire computer. Processors continued to get more powerful, and a few years later, with Windows 95 and Windows 98, we start to get preemptive multitasking.

In this approach, the operating system actually takes control of that CPU, freezing programs and switching out which program is running on the CPU core on the fly. Programs no longer have to do cooperative multitasking.

Instead, the operating system takes over and says: get off the processor, it's time for another program to run on this processor right now. It switches back and forth really quickly to give the illusion of multitasking.

A few years later, we get multicore processors. Now, for the first time, we have processors with two cores that can run two independent programs at the exact same time, in parallel with each other, which led to a whole new level of innovation in speed and concurrency in applications.

Where we are today is client-server architecture. A lot of the programs that we use day to day don't just run locally on your computer. They communicate over the internet, and one interaction on your local client could kick off tens or even hundreds of servers in the background that coordinate to answer your request and make something happen in response to your click or interaction in the application.

And when we talk about concurrency, these servers are often handling concurrent requests from tens, hundreds, thousands, maybe even millions of different connected clients at the exact same time.

So we're seeing this concurrency rising exponentially. As we get more concurrency and more power over time, we have more powerful things we can build.

So how do you actually build an application that can handle this concurrency more efficiently? We want to be able to go from zero users to millions of users and have a compute model that works all along the way. It needs to be easy for us, but it also needs to be efficient.

So, getting back to the idea of builders needing something that's customized for where they are right now, I want to introduce the idea of low-concurrency applications versus high-concurrency applications.

I see a low-concurrency application as something that's a niche product right now; it maybe hasn't quite reached broad product-market fit yet. It has a few users that interact with it throughout the day, but it hasn't reached viral adoption.

The most important things for this application are ease of development and low baseline operational cost. If I only have a few users using the system, I'm probably not making very much money off of it; therefore, I don't want to pay a lot of money to run the system for those users. I need to minimize the cost there.

Meanwhile, on the other side, we have high-concurrency applications. These applications have no shortage of users; maybe they've gone viral and they've got millions of users hitting the system, and what you need at that point is the most efficient compute possible.

They need bulk compute at the cheapest sustained usage rates. You start to have a little bit more leeway, a bit more margin in the system, to operate more complex systems, and you can start to run at a higher baseline cost. But you don't want cost to explode and grow exponentially as your users grow exponentially.

You want to minimize the growth of cost as your users grow. So the three services I'm going to talk about for serverless containers are AWS Lambda, AWS App Runner, and AWS Fargate. These are serverless technologies that will help you grow from a low-concurrency application to a high-concurrency application.

Starting with AWS Lambda: think of AWS Lambda as a containerized event-handling function in the cloud, and that function will look something like the sketch below. It's a simple hello world, obviously not very sophisticated, but imagine you wanted to execute this hello world on demand for many, many clients.
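As a minimal sketch of the kind of handler I mean (this isn't the exact code from the slide, just an illustrative Node.js handler):

```js
// A minimal "hello world" Lambda-style handler in Node.js (illustrative sketch).
// Lambda invokes this exported function once per event or request.
exports.handler = async (event) => {
  return {
    statusCode: 200,
    body: JSON.stringify({ message: 'Hello, world!' }),
  };
};
```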

There are different ways to run this function. One way is with events: I can call the AWS Lambda API directly, or I can hook up Amazon EventBridge. We saw several announcements having to do with Amazon EventBridge today, and I can tie those events in so that a handler function such as this one executes inside of AWS Lambda whenever an event happens.

But for web workloads in particular, this is where things get even more interesting and there's a lot more concurrency. I want to be able to feed web requests to this handler function. I could do that with an Application Load Balancer, I could do that with Amazon API Gateway, and I can even use a Lambda function URL.

These are all different ways of getting a web request off the internet and into my handler function. Now, the interesting thing is what happens when the handler function runs. What Lambda does is package up the handler function inside a microVM: it spins up that microVM and puts the handler function inside it to run your code in response to that event or request.

And these microVMs are strongly isolated from each other, and they're only ever doing one thing at a time. Let me explain how that works. Here's the life cycle of a Lambda function instance.

It starts from what we call a cold start. A cold start is when nothing is happening yet in the system: the code hasn't been run, an event hasn't happened, a web request hasn't arrived yet. So there are no microVMs at the moment a request arrives or an event happens.

Lambda spins up a microVM, downloads your code into it, unzips it, and then begins initializing your code. Initialization code runs from the top of the file down; maybe you need to connect to a database or download some information to get started, or something along those lines. This happens outside of your handler.

From that point forward, Lambda can run the handler function multiple times in response to events or requests. Now, the pricing model for this is that you pay per millisecond for the time spent in the initialization code and the time spent invoking the function code.

The reason this is particularly important for low-concurrency applications is that Lambda optimizes down to zero cost when there's no work to do. Here we see a workload where you pay per millisecond for initialization and invocation. Then there's a second where no request is coming in and there's nothing for the system to do, and there's actually no charge there. Then another request happens, we pay per millisecond, and then no requests arrive for another 500 milliseconds, so there's no charge.
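As a rough back-of-the-envelope sketch of that billing idea (the rate below is a made-up placeholder, not a real Lambda price, and real Lambda billing also depends on the memory size you configure):

```js
// Sketch of per-millisecond billing: you pay only for init time plus invoke time;
// idle gaps between invocations cost nothing.
// PRICE_PER_MS is a hypothetical placeholder, not an actual AWS price.
const PRICE_PER_MS = 0.0000000021;

function estimatedCost(initMs, invocationDurationsMs) {
  const billedMs = initMs + invocationDurationsMs.reduce((sum, ms) => sum + ms, 0);
  return billedMs * PRICE_PER_MS;
}

// One cold start (200 ms of init) followed by three invocations, with idle gaps between them.
console.log(estimatedCost(200, [120, 120, 120]));
```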

So Lambda is micro-optimizing down to charging you only during the milliseconds where you actually have work to be done. But then you start to get more traffic: you start to get that product-market fit, and now you have multiple users using the system at the same time and multiple concurrent requests arriving.

Well, here's where Lambda has to start spinning up multiple function instances, because each of these function instances is only ever working on a single thing at a time. We see one request arrive; there's a cold start and it spins up a function instance. If a second request arrives while the first is being processed, Lambda has to spin up a second function instance in parallel. And likewise, three concurrent requests are going to require three function instances.

Lambda still attempts to optimize pricing. In this case, there's enough traffic to keep two function instances busy with back-to-back requests, but the third one doesn't have enough requests arriving to keep it busy. So we see the cost bounce back and forth between three function instances and two function instances.

But here's the interesting part. You've established that product-market fit, Lambda has helped you build out product features that have become very popular, and you start to get a lot of users. All of these users are talking to your Lambda function, and the number of function instances starts to stack up.

Now it's time to start thinking about alternative models for running your code. To explain this, I want to do a little deep dive into what concurrency actually means and what's happening during the life cycle of a request.

This is a very common request that you'll see time and time again in applications: a simple user sign-up request. It takes about 120 milliseconds to process from beginning to end, and you'll see it's broken up into chunks here.

The first chunk: an input payload arrives, and it takes about one millisecond for the processor to validate it. Then the application needs to insert some data into the database, so it talks to the database over the network and waits 10 milliseconds for the database to persist that data onto a disk.

Then a response comes back, your code handles it, and maybe it makes another call to a downstream service, like an email-sending service or something along those lines. That service takes about 100 milliseconds to finish sending the welcome email or the verify-your-email-address email.

And then finally, control is returned to your application, and the application spends eight milliseconds generating a response to send back to the client. So that's 120 milliseconds in total. But if I look at it closely, I'll realize that out of those 120 milliseconds, only 10 milliseconds were actually spent with the processor running my code.

The other 110 milliseconds were spent doing nothing but waiting. This is very common in I/O-heavy workloads: for a single transaction, the processor spends the vast majority of its time waiting. So when I look at that model of many function instances running independently, each only ever working on one thing at a time, I see a lot of time being spent waiting and very little time actually being spent on the CPU doing things, but I'm paying for those milliseconds.

So what's the alternative? Well, every modern application language has the concept of an event loop, and the event loop is a way to allow a single process to work on multiple concurrent requests at a time. Think of each second as having 1,000 milliseconds of time in which we could do work. If I split my code up into small chunks of one millisecond, one millisecond, and eight milliseconds, I can schedule those chunks of code into each of those 1,000 milliseconds.

And whenever the CPU would otherwise just be waiting, doing nothing, I can grab another piece of code to run and fill that time. Here's how that works: when a request arrives, we start validating that payload for one millisecond. Then, while we're waiting on the database, rather than doing nothing for 10 milliseconds, we can grab another request off the internet and start processing it by validating its input.

And so here you can see that these different transactions stack up, and your code starts working on them in parallel with each other, actually processing multiple requests at the same time in the same process.

Now, I particularly like Node.js; it's my favorite runtime. So I'll show an example in Node.js of how easy it is to write this. Here I have a function, and I just add the async keyword to it, and anywhere my code would be waiting on network I/O, I just add the await keyword. What that does is tell Node.js: this is a point where I'm waiting, so you can do other things while I wait.

And here's what happens: Node.js splits that function up into three blocks, separated by the awaits. The first block of validation takes one millisecond, and its await takes 10 milliseconds. The second block takes one millisecond, and its await takes 100 milliseconds. Then the final block takes eight milliseconds. But if you add up just the blocks themselves, they only add up to 10 milliseconds of CPU time.
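A minimal sketch of what that sign-up handler might look like (the `db` and `emailService` clients and the helper functions here are hypothetical placeholders, not a specific library):

```js
// Sketch of the sign-up handler described above, split into blocks by `await`.
// `db`, `emailService`, and the helper functions are hypothetical placeholders.
async function handleSignup(request) {
  const user = validate(request.body);          // block 1: ~1 ms of CPU work

  await db.insertUser(user);                    // ~10 ms waiting on the database

  const token = buildVerification(user);        // block 2: ~1 ms of CPU work
  await emailService.sendWelcome(user, token);  // ~100 ms waiting on the email service

  return formatResponse(user);                  // block 3: ~8 ms of CPU work
}
```

While the function is parked at either `await`, the event loop is free to start validating the next incoming request.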

And that means that in every second of time, I can run this function 100 times and stack up 100 concurrent requests. Now, this sounds magical, and it definitely is, but it does have some downsides, and I don't want to make it sound like this is the perfect solution to everything. The main downside is what happens when you have too much concurrency.

I said that function could handle 100 requests per second. What happens if my clients start sending 110 requests per second? Well, what's going to happen is that the work to do starts queuing up in memory, and it takes longer and longer for the CPU to get around to doing that work as the queue grows in length.

And so memory usage goes up and response time starts going up as CPU utilization reaches 100%. Eventually, requests are going to start timing out, everybody's going to be unhappy, and clients are going to be asking why the service is down.

So what's the solution here? Well, similar to how Lambda works, we need to start running multiple copies of the application in parallel with each other, each with its own event loop, and load balance across the event loops.

Ideally, these event loops are on different CPU cores, or even on the cores of different machines. And each of those event loops is going to be able to process, in this case, 100 requests per second with that function I showed earlier.
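On a single machine, Node.js can do this with its built-in cluster module. Here's a minimal sketch of running one event loop per CPU core (the HTTP handler and port are just placeholders):

```js
// Minimal sketch: one Node.js event loop per CPU core, all sharing a single port.
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

if (cluster.isPrimary) {
  // The primary process forks one worker (one event loop) per CPU core.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  // Each worker runs its own event loop; incoming connections are balanced across them.
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(8080);
}
```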

This is a great transition to AWS App Runner, because that's exactly how App Runner is designed to function. With AWS App Runner, you set up your application, and then you can tell App Runner how many requests at a time to feed your application, up to a limit that you specify. App Runner handles everything in the stack between your client code and your back-end server code.

That includes load balancing and ingress, getting traffic off the internet to your application. That includes the scaling aspect. That includes the microVMs, similar to Lambda. And then you provide the application container, implemented with an event loop, and you tell App Runner, let's say, that you'd like it to serve up to 100 requests at a time to your container, something like the sketch below.
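As a rough sketch of expressing that concurrency limit with the AWS SDK for JavaScript v3 App Runner client (the configuration name and the numbers here are illustrative assumptions, not recommendations):

```js
// Sketch: an App Runner auto scaling configuration that allows up to 100 concurrent
// requests per container instance and caps the fleet at two instances.
const {
  AppRunnerClient,
  CreateAutoScalingConfigurationCommand,
} = require('@aws-sdk/client-apprunner');

const client = new AppRunnerClient({});

async function createScalingConfig() {
  const result = await client.send(new CreateAutoScalingConfigurationCommand({
    AutoScalingConfigurationName: 'my-web-app', // placeholder name
    MaxConcurrency: 100, // requests fed to each container instance at a time
    MinSize: 1,          // instances kept warm (billed for memory when idle)
    MaxSize: 2,          // never run more than two instances
  }));
  return result.AutoScalingConfiguration;
}
```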

So here's how that stack builds out. I have my client-side code, and AWS App Runner provides an automatic endpoint for my service. All I have to do is have my client send requests to that endpoint. Behind the scenes, the endpoint is powered by an Envoy proxy load balancer that's managed by AWS App Runner.

Inside of that Envoy proxy load balancer, there are limits set up according to the concurrent request limit of 100 that I configured, and a queue keeps track of how many requests are in flight to my container at any given time.

If I start to reach the point where I'm about to hit that limit of 100, App Runner goes ahead and preemptively launches another copy of the application container to distribute requests across. Now I can reach up to 200 requests at a time, because each of these application containers is being fed up to 100 requests at a time. And I can set limits, so I can say: don't go over two application containers.

And if that happens and I start to receive, for example, 300 requests at a time, App Runner can start to shed traffic and return 429 Too Many Requests errors to clients, to prevent the overload from causing a denial of service or high latency and response times for the existing clients connected to this endpoint.

And here's how that model works as you scale out with traffic growing over time. I want to call out one important caveat, too. Lambda has cold starts; App Runner also has cold starts. It takes a certain amount of time to set up that container and get your application started, and the longer your application takes to start up, the less reactive App Runner is going to feel to traffic spikes.

Here we see that the second cold start, which happened as traffic grew, took a little bit too long, and as a result there was a scattering of 429 Too Many Requests status codes returned to clients.

So it's very important to make sure that you allow App Runner to scale out to an appropriate number of containers and that your application starts up as quickly as possible. Once you get those two things working well together, you'll find that App Runner is able to launch containers that add bulk amounts of capacity at a time. In this case, 100 concurrent requests at a time are being added to the overall amount of capacity this application is able to handle.

Now I'm going to talk about the pricing model. Similar to Lambda, App Runner attempts to optimize to reduce cost when you don't have traffic. It does not optimize all the way down to zero, though: App Runner will charge for the memory of your application at all times. However, it will charge for CPU only when you have active requests arriving, and each time an App Runner service is activated by receiving a request, it stays active for at least one minute.

Now, the difference between Lambda and App Runner is that once that process activates, you can serve any number of requests up to the limit that you specify and pay the exact same price. So whether I'm serving one request per second or 100 requests per second, I'm going to be charged the exact same amount, based on the dimensions of CPU and memory.

So you start to get a lower per-request cost at higher traffic. The caveat here is that minimum active time of one minute: if I were to receive, for example, a request every 10 seconds, it would actually keep this container active 100% of the time. It would never be able to dial back to charging only for memory.

So the ideal use case for an App Runner-style service is a business-style application that is busy during daylight hours while people are in the office. Everybody goes home at five, nobody is really using the office application's front end overnight, and as a result App Runner can dial back to charging only for memory, because there are no requests arriving. You'll find this provides a lot of optimization for some business use cases, because memory is an extremely cheap dimension compared to CPU; you can add gigabytes of memory and not even begin to approach the cost of adding cores of CPU.

This works together with scaling. Here we see that as traffic rises over time, the charge starts out as one application instance at the cost of CPU plus memory. Traffic continues to rise and we have to scale out to two, so we're charged for two times CPU and memory. But then, when traffic stops, App Runner dials back to two container instances that are only charging for memory, and then to just one charging for memory. So you can start to imagine how App Runner scales up and down and how costs scale with it.

Now, this leads to the third model: AWS Fargate. With AWS Fargate, we once again have a serverless container, similar to Lambda and App Runner, but there are a lot more pieces of the puzzle that are up to you to handle, and this can be both a blessing and a curse. It can be a little bit hard to configure these things at times, but on the other hand, you have the ability to optimize for exactly how your application functions and potentially reach an even lower price for handling large amounts of traffic.

First, I want to talk about what AWS Fargate does handle for you, and that is patches to the underlying hosts and infrastructure. AWS Fargate provides VMs on demand, which then run microVMs for your application; you don't have to think about those. For example, there have been various SSL patches in the past, and all of those were handled by AWS. All I had to do was keep my application running on AWS Fargate, and the underlying hosts and host operating system were all patched.

I do have to manage the runtime inside the container, though, so that is a little bit different: Lambda will patch the runtime inside the container, but on Fargate I have to patch the runtime inside the container myself. Load balancing and ingress is another big one: if I'm running an internet-connected workload, receiving traffic from the internet, with AWS Fargate I have to choose my ingress and load balancing and pay for it separately, whereas App Runner bakes the price of the ingress into the App Runner service. And then, last but not least, scaling.

Whereas App Runner and Lambda have a built-in scaling model, Fargate does not. You have to choose how many Fargate containers you would like to run at any given time and make sure you're running enough of them to handle the traffic volume that you're actually receiving.

So how do we do that? This is where an orchestrator comes in. An orchestrator is a piece of code that can take your intent for how you would like your application to run and try to fulfill that intent to make sure your application runs properly.

For example, developers can tell the orchestrator: I'd like to run 10 copies of the app; I'd like to register those copies into this load balancer I've provisioned; I'd like you to gather the metrics and stats from my running containers; and if traffic goes so high that CPU goes over 80%, I'd like you to start scaling up and adding more containers. A sketch of expressing part of that intent follows below.
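Here's a rough sketch of handing that intent to Amazon ECS with the AWS SDK for JavaScript v3 (the cluster, task definition, subnet, and target group values are hypothetical placeholders):

```js
// Sketch: asking the ECS orchestrator to run 10 copies of the app on Fargate
// and register them into an existing load balancer target group.
const { ECSClient, CreateServiceCommand } = require('@aws-sdk/client-ecs');

const ecs = new ECSClient({});

async function createService() {
  return ecs.send(new CreateServiceCommand({
    cluster: 'my-cluster',              // placeholder cluster name
    serviceName: 'my-web-app',
    taskDefinition: 'my-web-app:1',     // placeholder task definition
    desiredCount: 10,                   // "run 10 copies of the app"
    launchType: 'FARGATE',
    networkConfiguration: {
      awsvpcConfiguration: { subnets: ['subnet-12345'], assignPublicIp: 'ENABLED' },
    },
    loadBalancers: [{
      targetGroupArn: 'arn:aws:elasticloadbalancing:region:account:targetgroup/my-web-app/abc123', // placeholder
      containerName: 'web',
      containerPort: 8080,
    }],
  }));
}
```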

Now, you have a choice as to which orchestrator you'd like to use. AWS Fargate allows you to use either Elastic Container Service or Elastic Kubernetes Service as the orchestration API for handling these microVM containers inside of AWS.

ECS, or Elastic Container Service, is a fully managed API, and I think it fits best with the serverless model. It is a serverless API: you don't pay anything for it, you use it on demand, and it orchestrates AWS Fargate on your behalf so that you only pay for the AWS Fargate tasks. On the other hand, Elastic Kubernetes Service is an open source deployment: AWS deploys some physical hardware that runs this open source project on your behalf, and you do have to pay for the cost of running that hardware in addition to the AWS Fargate cost.

Another choice you have with AWS Fargate is how you want to receive traffic. An Application Load Balancer is best if you have an HTTP, gRPC, or WebSocket-style application where you want layer 7, application-aware load balancing to your back end.

A Network Load Balancer is best if you want low-level raw TCP or UDP, packet-level and connection-level load balancing. A Network Load Balancer is a little bit cheaper, but you get a lot more features, a lot more power, and a lot smoother load balancing with an Application Load Balancer.

Amazon API Gateway is a serverless ingress. The idea there is that you only pay when a request arrives: you pay per request that hits Amazon API Gateway. The disadvantage is that Amazon API Gateway can start to get a little expensive as you get many, many requests. Compared to that, Application Load Balancer and Network Load Balancer (ALB and NLB) have a higher baseline cost, so if you have low traffic it's going to look like API Gateway is cheaper for you. However, as you get really high traffic, you'll start to notice API Gateway costs adding up compared to an Application Load Balancer or Network Load Balancer, which can handle large bulk amounts of traffic at a fairly low rate.

I'll talk a little bit about the scaling options you have with Amazon ECS. The scaling is integrated into AWS Auto Scaling and Amazon CloudWatch, and you have options. Step scaling is for when you want to define extremely custom scaling rules, like: at this particular CPU threshold, add this many tasks and this many containers. Target tracking is for when you want AWS to figure things out on your behalf: you say you'd like it to try to keep CPU utilization at around 80%, and AWS will adjust your task count, increasing and decreasing it until it finds the right balance. And then scheduled scaling is fantastic if you have known scaling events, for example if you know that at a certain time a lot of people are going to start using your service, or that at a certain time of day traffic increases or decreases. And you can scale on any type of metric.

There are built-in metrics like CPU, memory, and network I/O, but you can also hook in your own custom metrics or metrics that are application-specific. For example, for a worker that's pulling work from an SQS queue, you might be interested in how many messages are in the queue and scale according to that. A target-tracking sketch follows below.
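Here's a rough sketch of a target-tracking policy for an ECS service, using the Application Auto Scaling client from the AWS SDK for JavaScript v3 (the cluster and service names are hypothetical placeholders):

```js
// Sketch: target tracking that asks AWS to hold the ECS service's average CPU
// around 80% by adjusting the number of running tasks.
const {
  ApplicationAutoScalingClient,
  RegisterScalableTargetCommand,
  PutScalingPolicyCommand,
} = require('@aws-sdk/client-application-auto-scaling');

const client = new ApplicationAutoScalingClient({});

async function configureScaling() {
  // Tell Application Auto Scaling which dimension of the service it may adjust.
  await client.send(new RegisterScalableTargetCommand({
    ServiceNamespace: 'ecs',
    ResourceId: 'service/my-cluster/my-web-app', // placeholder cluster/service
    ScalableDimension: 'ecs:service:DesiredCount',
    MinCapacity: 2,
    MaxCapacity: 50,
  }));

  // Target tracking: AWS raises or lowers the task count to keep ~80% average CPU.
  await client.send(new PutScalingPolicyCommand({
    PolicyName: 'cpu-80-target-tracking',
    ServiceNamespace: 'ecs',
    ResourceId: 'service/my-cluster/my-web-app',
    ScalableDimension: 'ecs:service:DesiredCount',
    PolicyType: 'TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration: {
      TargetValue: 80,
      PredefinedMetricSpecification: {
        PredefinedMetricType: 'ECSServiceAverageCPUUtilization',
      },
    },
  }));
}
```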

Now, the fundamental difference between AWS Fargate and the Lambda or App Runner models is that AWS Fargate charges based on time, not activity. AWS Fargate does not actually care whether you have any traffic to the container at all. If I have zero traffic, I'm going to be paying the same rate as if I'm serving 200 or 300 requests per second to that task.

You'll realize, though, that as you max out your Fargate tasks and keep them busy with lots of work and lots of requests, Fargate gives you the lowest per-request cost compared to a Lambda-style model, which at that same level of traffic would add up to a lot more cost.

There are also options you can use to reduce the Fargate price even more, such as adjusting the CPU and memory dimensions down to the smallest size. Traditionally, if you're spinning up a VM, there are discrete sizes, and you can only scale your VM down so far before you can't get a smaller VM with a lower cost. With AWS Fargate, you can go all the way down to 256 CPU units, which is a quarter of a vCPU, and 512 megabytes of memory, so you're paying for a very small slice of compute capacity at a low cost. Fargate is also part of the Compute Savings Plans: if you know you're going to be running this application for a long time, maybe even years, you can pre-commit to paying the cost of that application and receive a flat percentage discount. And the other thing I've been recommending to folks is to consider Graviton, ARM-based tasks. The reason is that when you spin those up, you'll notice you get a much more powerful processor at a lower cost per minute and per second on AWS Fargate.
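As a rough sketch of those knobs in an ECS task definition for Fargate (the family name, image, and port are hypothetical placeholders):

```js
// Sketch: registering a small Fargate task definition — a quarter vCPU, 512 MB of
// memory, and the ARM64 (Graviton) architecture.
const { ECSClient, RegisterTaskDefinitionCommand } = require('@aws-sdk/client-ecs');

const ecs = new ECSClient({});

async function registerTaskDefinition() {
  return ecs.send(new RegisterTaskDefinitionCommand({
    family: 'my-web-app',                // placeholder name
    requiresCompatibilities: ['FARGATE'],
    networkMode: 'awsvpc',
    cpu: '256',      // 256 CPU units = 0.25 vCPU, the smallest Fargate size
    memory: '512',   // megabytes; memory is the cheaper dimension to add
    runtimePlatform: {
      cpuArchitecture: 'ARM64',          // Graviton-based tasks
      operatingSystemFamily: 'LINUX',
    },
    containerDefinitions: [{
      name: 'web',
      image: 'public.ecr.aws/docker/library/node:20', // placeholder base image
      portMappings: [{ containerPort: 8080 }],
      essential: true,
    }],
  }));
}
```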

So now we get to the question, and this is where we start to balance efficiency versus ease: how do I know which serverless option is right for my application? The thing I would encourage everyone to do is think not just about whether it's right for my application now, but also whether it will be right six months or a year from now, and find the right balance between ease and efficiency.

Let me look at some of these dimensions side by side, starting with the pricing model. With AWS Lambda, the pricing is pay per millisecond per invocation, based on the memory size of the function, and every single one of your concurrent invocations stacks up and is charged separately. With AWS App Runner, the pricing is per second, but it's based on both the CPU size and the memory size; you have two dimensions there instead of just memory. App Runner is able to optimize down to charging only for memory when you have no traffic, but when you do receive traffic, it charges a flat rate for the CPU no matter how much traffic is arriving at that container. AWS Fargate has a constant price based on CPU and memory, whether you're receiving any traffic or not.

The resolution is important too. AWS Lambda can scale to zero between requests, even at millisecond resolution. So if one request arrives and then there's a gap of seconds or milliseconds before the next request, AWS Lambda is able to optimize all the way down to zero in between those gaps.

With AWS App Runner, every time the application instance activates, it activates for a minimum of one minute, and the duration is rounded up to the nearest second, so you'll see a little less resolution there. But if your traffic has a pattern where it goes up for a period of time and then drops back down to very little, App Runner is going to be able to fit that pattern.

And then for AWS Fargate, the pricing is calculated per second with a one-minute minimum, once again similar to App Runner, but it charges from the time the task starts until the time the task stops, and it's up to you to optimize the starts and stops to make sure you're not running a container that isn't actually doing anything.

On the scaling model: Lambda has built-in scaling. It will only ever serve a single invoke at a time from a function instance, but it increases the number of function instances on the fly, extremely quickly and extremely reliably, and is able to handle spikes very, very quickly.

App Runner also manages the number of container instances for you. It can be a little bit slower to respond: I've seen Lambda cold starts as low as 10 milliseconds, while with App Runner it's not uncommon to have a cold start of about 30 seconds.

The difference, though, is that when you add an App Runner application instance, you're adding a much larger pool of capacity, potentially adding 100 concurrent requests to your available capacity pool. So App Runner can be slower to respond, but when it does respond, it adds a larger chunk of capacity, and this still allows it to keep ahead of traffic, as long as the traffic doesn't spike from zero to very high instantly. As long as there's some kind of curve to it, App Runner can respond in time.

AWS Fargate has no built-in scaling whatsoever, but it can handle really high concurrency. You have to define your own custom scaling rules, decide how many containers are appropriate to run at any given time, and make sure you're not running more than you need and paying more than you need. But there's help: you can use an orchestrator and AWS Auto Scaling, which provide some tools out of the box, as well as custom logic that you can implement yourself. The next dimension is web traffic and ingress.

With AWS Lambda, you have a choice. There's a built-in function URL that doesn't cost anything additional. There's API Gateway, a serverless model that charges per request that arrives, so there's no cost when there are no requests. Or you can even hook an Application Load Balancer up to AWS Lambda, although an Application Load Balancer obviously has an hourly charge whether you're receiving traffic or not. With AWS App Runner, all the ingress is built in; you don't pay anything additional for it, because that Envoy proxy load balancer is managed by AWS App Runner behind the scenes for you. With AWS Fargate, there's no included option.

So whereas Lambda has the function URL and App Runner has a built-in Envoy proxy, Fargate does not come with a built-in ingress. You have to provide your own, and it's up to you whether you want to use a serverless model, for example API Gateway, or a serverful model like an Application Load Balancer. The choice there really comes down to traffic level: if I have low traffic, maybe API Gateway makes sense for me; if I have really high amounts of concurrent traffic, maybe an Application Load Balancer makes sense.

So here's the map of how I think about sweet spots for your application when you're looking at ease versus efficiency. If we're looking at concurrency, the dimensions are concurrency intensity and concurrency stability. Intensity is how much concurrency you're dealing with: are you getting a scattering of requests, maybe one or a few requests at a time, or are you receiving large amounts of traffic, hundreds or even thousands of requests at any given time?

The other dimension is concurrency stability: how predictable is your traffic? For example, I worked in the past at a social media company where there was a very predictable pattern. In the morning, traffic would start to rise; it would bump up at lunchtime; and then as people got off work in the evening, around five, across each time zone, the traffic would rise and rise throughout the evening until people went to bed. Every day I could rely on that particular pattern, and if I didn't see that pattern, I knew something was terribly wrong with the application.

So that's an example of very stable, predictable concurrency. Whereas imagine something like a ticketing solution: Taylor Swift is about to have her concert, and a million people are trying to buy Taylor Swift tickets. That's a very, very spiky workload; out of the blue, I could have millions of people hitting the back end trying to send a request.

So AWS Lambda excels at workloads that are extremely spiky, and it excels at workloads where there are few requests at a time or large gaps in between requests, because of that pricing model, that concurrency model, and that scaling model. On the other hand, you'll see AWS Fargate at the far end of the spectrum: it excels when you have constant high levels of traffic, hundreds or thousands of requests at a time, and a fairly stable and predictable pattern.

That's because it's easier to build your own scaling if you can predict what the scale is going to look like. AWS App Runner sits somewhere in the middle. As I mentioned before, it's ideal for something like a back-office application that's used during the day: you have stable, predictable traffic during the day, but when people go home at night, the traffic dwindles down to nothing, and App Runner can scale down to costing only for memory.

So this is how I personally think about it. The other dimension to consider is what's included out of the box. Obviously, building your own scaling rules for AWS Fargate can be a little bit challenging, so that doesn't really hit the ease dimension. But by the time you have hundreds or thousands of requests at a time, it's OK to spend a little bit of development effort figuring out scaling, because that effort is worth it to you. Whereas if you're just building your first greenfield project and you're focusing on building business features first and foremost, maybe it doesn't make sense to spend as much time thinking about scaling and ingress; maybe I just want to use the easier approach of AWS Lambda.

One more dimension I want to talk about, and this one is important to consider if you have an application with regulatory requirements or high security requirements: AWS Lambda is the only option that has per-request isolation. What that means is that when one request arrives and another request arrives, each of those requests is processed in its own independent execution environment, dedicated to handling that request and that request alone.

That can be important if your workload is dramatically different from request to request; for example, maybe one request consumes a lot of CPU and another request only consumes a little bit, or perhaps the requests are untrustworthy. An example I built in the past: I was asked to build a solution that would take a screenshot of an arbitrary user-submitted URL. I would say that's scary, because they could put any URL on the internet in there. I don't know what the website I'm screenshotting is; it could have malicious code that's trying to break out of the sandbox and get into the system.

So I definitely ran that in Lambda, because it's perfect for it: each particular screenshot happens in its own sandboxed microVM, independent of anything else happening in the system. Now, the downside of that per-request isolation is that you can't take advantage of certain optimizations, like in-memory caches. With Lambda, if I have 100 concurrent requests, each of those has its own function instance, so there are 100 different in-memory caches. An in-memory cache is not going to perform the same there as it will on App Runner or AWS Fargate, where one process is handling 100 concurrent requests and can take advantage of a cache of recently fetched results stored in memory.
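A minimal sketch of the kind of in-process cache I mean (the fetchUserFromDatabase call is a hypothetical placeholder): because a single App Runner or Fargate process serves many concurrent requests, all of them share this one map, while on Lambda each function instance would hold its own copy.

```js
// Sketch: a tiny in-process cache shared by every request handled by this process.
// `fetchUserFromDatabase` is a hypothetical placeholder for a real data access call.
const cache = new Map();
const TTL_MS = 60 * 1000; // keep entries for one minute

async function getUser(userId) {
  const hit = cache.get(userId);
  if (hit && Date.now() - hit.storedAt < TTL_MS) {
    return hit.value; // served from memory, no network round trip
  }
  const value = await fetchUserFromDatabase(userId);
  cache.set(userId, { value, storedAt: Date.now() });
  return value;
}
```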

So definitely consider that; once again, you'll notice that App Runner and Fargate are a little bit better for super-high-volume requests because of it. On that ease and efficiency spectrum, here's roughly where I would place these products. AWS Fargate is very configurable and gives you a lot of control; the downside is that it's a little more complex. It's not going to be as easy to set things up with AWS Fargate, because you have to set up rules and configuration for everything you want to happen.

AWS Lambda is on the other end. It's very opinionated, you don't have to set up a lot of configuration, it's fully managed by AWS, and it's a little bit simpler and easier to get started with. The downside is that you may not be able to optimize cost down as low as you could, at large amounts of traffic, on AWS Fargate or App Runner.

App Runner is a little bit in the middle. I would put it a little closer to Lambda than to AWS Fargate in terms of the things it's trying to take off your plate, like scaling and ingress, but you can still configure more of those dimensions and settings as you see fit.

So thank you very much for your attention to this presentation. I'd be happy to take questions if anybody has some that I can help answer.
