Demystifying and mitigating AWS Lambda cold starts

All right, everyone. Welcome to re:Invent. I'm so excited to be here with you to talk about demystifying and mitigating your AWS Lambda cold starts.

If we haven't met before, my name's AJ Stuyvenberg. I'm a newly minted AWS Community Hero, but I'm not really new to the serverless space. I've been building applications with serverless software and AWS since probably late 2016, early 2017. Midway through that time, I started working on a bunch of these open source serverless libraries, developer tools, ways that you can improve the developer experience on top of AWS with serverless.

One of those companies I worked for was Serverless Inc., where I worked on the Serverless Framework. Maybe some of you have heard of it. These days I work at Datadog, on serverless observability, but I'm also sort of known in the community for having a couple of weird hobbies that I've picked up over the course of a number of years.

One of them, the thing I do when I'm not working on serverless or cold starts, is skydiving and BASE jumping. And one of the reasons I like it so much is that it really brings intent and focus to the things that you're doing, because, well, you have to do it right.

So I want to talk a little bit today about how some of the mentality that we bring into this sport can apply to how you approach managing AWS Lambda performance.

All right. Here we go. 3 2 1...

So hopefully no one just ate lunch. But the point that I want to make right away is that nobody starts skydiving or BASE jumping by throwing a triple backflip off of a bridge in Twin Falls, Idaho. And for the same reasons that we start small and work our way up, we do the same thing with our serverless approach to building applications, right?

When we're teaching new skydivers to skydive, we focus on this priority system, this framework of priorities for thinking about what you need to do to stay alive. And the reason is that we want you to be able to skydive more than once, right?

So the very first priority is to pull your parachute, deploy that parachute. And we teach this because we tell you to do lots of other things. When you're learning to skydive, we're teaching you how to fall stably, we're teaching you how to read your altimeter, we're teaching you how to be cognizant of yourself and the people around you while you're in the air. But to be honest, for most people, when they first jump out of that aircraft, their brain turns to mush and none of that is working.

So we kind of jam this into people's heads: the first priority is pull. Now after that, if you're aware of the fact that you need to save your own life, we would like for you to pull at a predetermined altitude. This does a lot for you. Most importantly, it gives you time to deal with any malfunctions you might end up with, and it gives your instructors time to safely deploy their parachutes, because we're in the air with you as well.

So if you know you're ready to pull, and you know you're ready to pull at altitude, and you're checking your altimeter and responding to the correct checks that we're giving you, we'd love it if you could pull stable. Pulling stable is a very simple moniker for an important thing: it just means please don't be doing a bunch of crazy backflips or turns or be completely out of control in the air. This maximizes the chance that your parachute will come out in a safe and effective manner and you won't need to cut away. The final priority we teach you is to have fun, which I made up because it conveniently fits this talk, but I'll tell you why in a second.

So why talk about cold starts at all? Why get on the stage and talk about this, or why are you all here to hear about it? The reason is sort of twofold: there is a lot of fear, uncertainty and doubt in the space about serverless cold starts, and there's also some real pain.

Before we jump into that, though: if you're new to Lambda, if you're new to serverless, you may not be aware that scaling is an online activity. It happens while your customers are waiting for the Lambda function to respond to a request. This is different than what you're probably used to if you're coming from a container-based background, or even something like Fargate, where you scale those systems and initialize the application before it can respond to a request.

I'm not going to spend a lot of time on exactly what this is, because this is a 300-level session. But at a high level, Lambda has to fetch your source code or container image (and we'll disambiguate that in great detail shortly), it has to initialize your runtime if you're running Python or Node.js or Ruby or Go, those sorts of things, and then it has to run any static initialization code that is in your handler file before it can actually execute. Finally, after that, it can actually pass your handler the event and run your code.
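To make that concrete, here is a minimal sketch of where those phases land in a Node.js function; the DynamoDB client is just an illustrative dependency, not something from the slides.

```js
// Everything at module scope runs once, during the cold start ("init" phase).
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb"); // loaded at init
const client = new DynamoDBClient({}); // constructed at init, reused by every later invocation

// The handler runs on every invocation, after the sandbox has been initialized.
exports.handler = async (event) => {
  // Only per-request work should live here.
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};
```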

So let's talk about FUD. Every other week I'm on the internet and I see a blog post or a hot take that summarizes as simply as this: cold starts. That's the end of the post. Pros and cons of serverless, "well, I'll never use serverless," all these types of things.

So I want to sort of blow this away from the popular culture a little bit. But at the same time, I need to acknowledge the fact that this impacts people. This is real. People struggle with this. People don't understand how to solve these problems all the time, or they find themselves in impossible situations.

This is a screenshot of a GitHub issue from a popular OpenTelemetry library for Lambda, or rather, it's our OpenTelemetry library for Lambda. And this specific user said, look, I'm moving away from serverless entirely because of cold start issues, including this one but not just this one.

So on the one hand, there's a lot of FUD, right? There's a lot of misinformation and just plain incorrect data. And on the other hand, there are actually real people suffering who don't know how to solve this problem.

So if you haven't seen one of these before, this is a histogram of application performance, and what it's showing you is the frequency of requests which are occurring and their relative duration. So p50 means that 50% of requests finished in this time frame or earlier, and p99 would say 99% of requests finish in this time or earlier. A typical application curve looks a lot like this, so this is sort of our benchmark, and the reason is that most requests are fine and run very, very quickly. However, when you have an outlier, for any sort of reason, it's way out there: you go from tens of milliseconds to minutes or more to complete a request.

When users first approach Lambda and they're first building their application, they typically see a quantile metric graph that looks a little bit like this, with a fatter tail out at the p99. What they're seeing here, typically, is the impact of a cold start on the overall latency curve for their application.

At the same time, I can't get up here and tell you that this is what you should mostly focus on. What I want from this talk is: we're going to spend 45 minutes on this topic, we'll answer some questions, and hopefully that's the most you ever have to think about cold starts. And the reason is that Amdahl's law governs how we think about application performance: if our applications truly have 1% of invocations, or less than 1% of invocations, being cold starts, then if we spend a bunch of time on that, we're actually only optimizing a tiny sub-portion of our application in total.

The law itself is on the screen: the overall performance improvement gained by optimizing a single part of the system is limited by the fraction of time that part is actually used. So if your cold starts are already a tiny fraction of the time, we can improve them by 99% and that still only barely shrinks your median latency.
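As a rough back-of-the-envelope version of that argument (using the 1% figure from above as the example), Amdahl's law can be written as:

```latex
S_{\text{overall}} = \frac{1}{(1 - f) + \frac{f}{s}},
\qquad f = 0.01,\; s \to \infty
\;\Rightarrow\; S_{\text{overall}} \le \frac{1}{1 - 0.01} \approx 1.01
```

In other words, even eliminating cold starts entirely buys at most about a 1% improvement in aggregate, which is why the priorities that follow matter more than squeezing out the last millisecond here.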

Ok. Let's talk about your priorities and how they relate to cold starts. Your priority zero, pulling your parachute, is to know and reduce the code you load. This is the most important thing. If you're only going to take one thing out of this session before you leave today, this is the most important thing I want you to know. If you already know and reduce the code you load, then accurately observing your service, having a true understanding of how Lambda's online scaling process affects the actual overall performance of your system, is the second most important thing. And that is our pull at altitude.

Finally, number three is don't fight the platform. This is pull stable. So if you're already ready to pull your parachute, you know and reduce the code you load, you've accurately observed your service and you understand how these are impacting your users, the next one is to not fight the platform and to use the tools we have. And finally, having fun is checking out containers.

So let's talk about these in order. Knowing and reducing the code you load: you have to pull your parachute, folks. This is some JavaScript code that I see everywhere. I see it on blog posts, I find it in code samples, and I see it in code reviews every single day. It's the single most common mistake I see people make, and what these two lines are doing is loading the AWS SDK and then initializing a new SNS client.

And right here is the single biggest mistake I see with Lambda in Node.js. If this code is anywhere in your application, I hope to convince you that it's a problem and it needs to be fixed.
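For reference, the two lines being described probably look something like this; a sketch of the v2-style pattern, not the exact code from the slide:

```js
// Anti-pattern (AWS SDK v2): this one require pulls in every service client in the SDK
const AWS = require("aws-sdk");
const sns = new AWS.SNS();
```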

So what's actually happening here? When we load the entire SDK, we're actually loading every single client that comes in that directory. What we're looking at here is a flame graph for a feature I worked on at Datadog that we call cold start tracing. What it's doing is measuring the time it takes to require or import any dependency you have in your application, and all of its children. And it's an easy enough assumption to make.

So we can actually track and trace all the way through the dependency tree that you're loading, and then we can append that to a flame graph and show you the time this adds before your user's request can be serviced. With that same line of code I just showed you, where you're loading the entire SDK, that's going to take around 540 milliseconds, for Node 14 and 16 that is, not Node 18, because Node 18 will be on SDK v3. We'll get to that here in a second.

So what's the single fastest thing you can do to improve this? You can just grab the client you need. So this is the exact same functional code...

And what I'm doing is reaching into that SDK and just grabbing the SNS client and initializing it. Now, before, we had a cold start cost of around 540 milliseconds. We make this change, and now just loading that client itself is only going to cost us 110 milliseconds. This is no webpacking, this is nothing fancy. This is literally just changing one line of code. It's 12 characters, and you can get a 400 millisecond improvement on your cold starts.
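The one-line change being described likely looks like this, reaching into the v2 SDK for just the SNS client via its per-client entry point:

```js
// Load only the SNS client instead of the entire SDK
const SNS = require("aws-sdk/clients/sns");
const sns = new SNS();
```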

Of course, for Node.js you have other tools: bundling, tree shaking. Those have some advantages, but they have some disadvantages too, which we'll talk about in a second.

Now, the second most common mistake I see is some code that looks a little like this. We're using, in this case, the v3 SDK. That's great: it's already pre-client-ified, I think is the term I would use; it's already broken down into individual clients, so you have to pick which client you want, and it prevents you from making the mistake I talked about earlier. But in this case, a user prepared a little library of helper functions. Every single company I've worked with, with a few tiny exceptions, has some sort of global helper, right? They provide a library, they provide a logging utility, they provide authentication or authorization, but no one really knows what's lying in these packages in a lot of cases.

So if we want to actually look into that and figure out what's going on, we see all sorts of crazy things. This is the flame graph for that exact bit of code, this is the cold start trace, and what I'm looking at here is actually two really big sub-chunks. The first, on the left, is loading the v3 Kinesis client, and that loads in about 280 milliseconds on Node 18, right out of the runtime. But the second, the utility class, actually loads the entire AWS SDK v2, with that same mistake we talked about earlier. And this is super common, because people might have something like an EventBridge integration or an SNS integration that they're using as part of the core of their platform, and then it just kind of comes with them every time they use that library in any Lambda function. This is so, so common, and it's so rare for people to really understand what is all being loaded at runtime.
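A sketch of the kind of shared helper being described here; the file and export names are made up for illustration. The v3 Kinesis client is already scoped down, but the utility quietly drags the entire v2 SDK back in:

```js
// utils.js -- a shared helper module that every Lambda function in the company imports
const AWS = require("aws-sdk"); // oops: the entire AWS SDK v2 comes along for the ride
const { KinesisClient } = require("@aws-sdk/client-kinesis"); // v3, already per-client

const eventBridge = new AWS.EventBridge(); // the integration that justified the v2 import
const kinesis = new KinesisClient({});

module.exports = { eventBridge, kinesis };
```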

Now, we talked about esbuild and webpack, and those are super powerful technologies that you should definitely explore. But there are cases where that may not help you, or you may have to do extra work to maximize the benefit. In this code example, we have a very simple API, a CRUD API, and we have a DynamoDB client and we have an SNS client. Now, a lot of folks are using sort of a mono-Lambda pattern, where they're packaging many, many routes, or many individual sub-chunks of features, in one Lambda function. But they may not use all of those dependencies all of the time.

And I want to show you an example of what this looks like when you actually load this code. So here we have a total span of loading the handler of about 480 milliseconds, and loading that DynamoDB client is going to contribute 350 milliseconds of that. But there's a pretty good chunk under that span that we also have to load, and that's loading our SNS client. In this specific invocation, it loaded in about 50 milliseconds. But in so many cases we don't need one chunk or the other: maybe we're on the write path and not the read path, or we're on the read path and not the write path.

So we don't have to publish something to SNS. What we can do is wrap our import statement in a method and call that method when we need it. Node, Python, Ruby all have this concept of a global import or global require, so you can load that dependency when you need it, and then it's available for further invocations. And the way I think of this is amortizing some of the cold start penalty over multiple subsequent requests. So instead of paying for it all at once, you pay for it a little bit at a time until that function is fully warmed up.
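A minimal sketch of that lazy-loading pattern, using the DynamoDB and SNS clients from the example above; the event shape, table name, and topic ARN are hypothetical:

```js
const { DynamoDBClient, PutItemCommand } = require("@aws-sdk/client-dynamodb");
const dynamo = new DynamoDBClient({}); // needed on every path, so load it at init

let snsClient; // not loaded until the first request that actually needs it
async function publish(message) {
  if (!snsClient) {
    // require() caches the module, so later invocations in this sandbox reuse it
    const { SNSClient } = require("@aws-sdk/client-sns");
    snsClient = new SNSClient({});
  }
  const { PublishCommand } = require("@aws-sdk/client-sns");
  return snsClient.send(
    new PublishCommand({ TopicArn: process.env.TOPIC_ARN, Message: message })
  );
}

exports.handler = async (event) => {
  await dynamo.send(
    new PutItemCommand({ TableName: process.env.TABLE_NAME, Item: { pk: { S: event.id } } })
  );
  if (event.notify) {
    await publish("item created"); // only this path pays the SNS load cost, once per sandbox
  }
  return { statusCode: 200 };
};
```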

But what can that save you? So in this case, we're back down to about 400 milliseconds; we saved 80 or so milliseconds off of that initialization. And then, when we do have a request which comes in and needs to publish to SNS, maybe here we're doing a put and also a publish, only that request pays the price of that first-time initialization, in that sandbox which is already warm.

So instead of having an extra 80 milliseconds on top of an already 400 millisecond cold start time, you can shave that 80 milliseconds off and pay for it later, in a later request. We call this lazy loading. I think this probably has the biggest impact for folks in here who are building React applications or Next.js applications, where by default they're large bundles, but they're per route and they're already sort of split up, and you're able to make this change and import only what you need when you need it.

While I was working on this talk, I had a lot of questions from other AWS Heroes and people in the community who wanted to know what the overall impact of just plain old dependencies was on their application cold starts. Now, I need a lot of motivation to get something going, so I decided to start live streaming this project where I would build a different version of the same application with different dependencies at different sizes.

So I took a Node application and I just started shoving dependencies into it, requiring them, and seeing how large they are, from 0 to 250 megabytes. And we get a pretty linear curve. I don't want you to focus too much on the exact numbers here, because they will vary, and we'll talk about that. But I do want you to know that we did the exact same thing with Python and got a very, very similar curve.

For the most part, this is the runtime interpreting the bytes and also just loading them into the sandbox, which is not nearly as straightforward as what occurs on your laptop, so you have to test this in Lambda. What can you take away from this? Well, if you measure the curve, you can see that roughly every 10 megabytes costs about 100 milliseconds.

So what do I mean by "roughly"? Why am I up here throwing caveats at you already, a few slides into this talk? The reason is that this is going to vary wildly, and I'll show you an example. If you are struggling with slow load times for your application, there are a few things that I see all the time. For compiled languages like Go or Rust, if you're allocating a bunch of memory out of the heap, that actually takes a fair amount of time: the runtime has to go and find available bytes, especially if they're contiguous bytes, allocate them, and hand them back to you and your application code.

If you're running Java or .NET, I see a lot of initial delay with reflection-based libraries and dependency injection. Some of this does actually happen at runtime as well, not just static initialization, so if you're seeing overhead on those first couple of invocations, that can be a part of it.

Then, if you're making HTTP requests in your static initialization, the code outside your handler, and this includes making a connection to a database or getting a secret from SSM or decrypting something from KMS, you're going to pay a little bit of a penalty there.
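One common mitigation, sketched here with the v3 SSM client, is to move that network call out of static initialization and cache the result for the lifetime of the sandbox; the parameter name is hypothetical:

```js
const { SSMClient, GetParameterCommand } = require("@aws-sdk/client-ssm");
const ssm = new SSMClient({});

let cachedSecret; // survives across invocations in the same sandbox

async function getSecret() {
  if (!cachedSecret) {
    // Only the first invocation in this sandbox pays for the round trip
    const res = await ssm.send(
      new GetParameterCommand({ Name: "/my-app/api-key", WithDecryption: true })
    );
    cachedSecret = res.Parameter.Value;
  }
  return cachedSecret;
}

exports.handler = async () => {
  const apiKey = await getSecret();
  // ... use apiKey for the real work ...
  return { statusCode: 200, keyLength: apiKey.length };
};
```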

Finally, one mistake I don't see that often, but it does happen, is when you have a really large chunk of something like JSON or a CSV that you're trying to parse and split. Typically, if you're doing something like a require statement in Node, the runtime will try to allocate bytes for all of that up front and take a really long time.

How long does that take? To illustrate the difference between roughly equivalent Lambda function sizes, we've got 100 megabytes for the AWS SDK v2, and 114 megabytes for Material UI, which includes React and the virtual DOM, plus an IMDb database, which is about 55 megabytes of JSON. We measured both of these, and we saw it earlier: the SDK will initialize in five or six hundred milliseconds. But this application, with only a few more megabytes of actual dependencies, requires about 2,600 milliseconds to load.

So the key takeaway here is that it's not enough to look at the size of the function. You actually have to profile the code, you have to measure this yourself. And if you're ever like me and you find yourself too low and too unstable, the most important thing is to pull your parachute. You have to pull your parachute.

Ok. Point number two: accurately observing your cold starts. This is pulling at altitude. So when is a cold start actually cold? There's actually a lot of discussion and conversation about this, and the way I choose to define it is: a cold start occurs when user-facing latency is impacted as a result of online scaling.

So if you have to initialize your application while a user waits, that's a cold start. If that happens proactively, or if it happens as a result of provisioned concurrency, I don't really count that as a cold start, because typically that can happen, or does happen, well before the request ever comes in. While I was working on these cold start tracing features for Datadog, we started seeing a number of invocations which looked like this.

And this is a really weird graph. We know here that the request doesn't come in until all the way on the right, at 12 seconds; that's when the client initiated the request. However, the sandbox warmed up 12 seconds earlier. The sandbox was initialized before the request was even made. Now, that I would expect from something like provisioned concurrency, right, where you're paying extra for AWS to keep a certain number of sandboxes warm, but this is running on demand.

So after collecting a lot of information about this and going to AWS, they confirmed that this can happen and it's expected, and they talk a little bit about why in the documentation. Lambda is already separated into multiple planes: control planes, then there's the invocation plane, and then the worker plane where your code actually runs.

So when a request comes to the invocation plane, the control plane for the invocation plane knows that there's a sandbox idle here, right? A request comes in, the control plane says, oh, I've got a warm sandbox, I can run that request. That's a warm start.

So request A gets routed straight to sandbox one and it's running. Now, what happens when request B comes in? We know what happens, right? There's no available sandbox, so Lambda says, wait just a second, I'm going to queue this request while I spin up a new sandbox. So I'm going to start spinning that up, initializing it. And this is our cold start.

But the control plane has gotten a lot smarter. What we learned was that at this point, when request A finishes, sandbox one becomes idle. And at this point, if sandbox two, the one we warmed up for the second request, is still initializing, the control plane says, wait a second, I can just send your request right away over to sandbox one.

So now there was some overhead waiting for that sandbox or waiting for that other request to finish. But at this point, sandbox two is still initializing. So by definition, it's less overhead than if you had a full cold start.

So what happens to sandbox two? At some point it completes the initialization process and it's ready to receive requests. So guess what happens when request C comes in? The third request enters the picture, and it can get routed straight to sandbox two with no cold start.

So when we found this and went back and forth with AWS, I decided to call this a proactive initialization. And when I looked for these, I found them all over. I put this test code into a couple of different APIs that I was running, and over a day, for this specific example, almost 80% of my initializations weren't causing any user-facing latency at all.

They were just running in these warm sandboxes. When I talked to a couple of other folks in the Heroes program about this, I spoke with Ken Collins, who runs the Ruby on Rails project for Lambda. He put the same code into his Lambda functions and confirmed a very similar result: over the course of thousands and thousands of invocations, many of these were not causing any user-facing latency.
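If you want to look for these yourself, the detection idea is simple enough to sketch; this is an illustration rather than the exact test code, and the 10-second threshold is just a heuristic I'm assuming here:

```js
const initTimestamp = Date.now(); // captured during static initialization
let firstInvocation = true;

exports.handler = async (event) => {
  if (firstInvocation) {
    const secondsSinceInit = (Date.now() - initTimestamp) / 1000;
    // If the sandbox was initialized well before this first request arrived,
    // it was very likely a proactive initialization, not a user-facing cold start.
    const proactiveInit = secondsSinceInit > 10;
    console.log(JSON.stringify({ coldStart: true, proactiveInit, secondsSinceInit }));
    firstInvocation = false;
  }
  return { statusCode: 200 };
};
```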

So, number one, you have to understand when a cold start actually occurs, and when it doesn't but you might think it does. A lot of people think that any time they see an init duration in the REPORT log, that's a cold start. And if you follow that logic, you can get really confusing results.

I took the same examples that I was working with before, between zero and 250 megabytes of dependencies in a zip-based Lambda function, and then I just deleted all the handler code. So it's the same hello-world handler code at every single step, but different steps have different levels of dependencies. If you're just relying on the init duration to tell you how long your cold start is, you're not going to see the whole picture, because what this chart shows you is that the init duration is reported as the exact same time over and over again, about 170 milliseconds.

So I asked about this, and I learned that, buried in the documentation, there's a note that says the init duration is the time required for static initialization: for extensions, the runtime, and any other handler code to initialize inside of the microVM. Now, there are a few things that are notably missing from that definition.

The time spent in the Lambda control plane services, the placement service and the assignment service. Julian Wood already gave a really good talk about what these are and how they work; I've linked that at the end, so you can reference that if you'd like. But for you, the most important thing is what they don't mention: it doesn't include the time it takes to copy your code from S3 for zip-based Lambda functions.

So then we have to dig deeper and ask, ok, what is the actual initialization time for my Lambda function, given that whether I ship zero megabytes of dependencies or 250 megabytes of dependencies, I'm seeing the same metric?

I took the zero megabyte function and I grabbed a trace, in this case with X-Ray. We can see that the overall time spent in the Lambda service was about 266 milliseconds, but that initialization duration, which is what you get in CloudWatch logs, was 172 milliseconds.

So then I said, ok, let's grab that 250 megabyte zip-based function and drop it into the same system. What do we get? A much different result. In this case, we spent 845 milliseconds in the Lambda service, but our initialization duration is still 172 milliseconds. It's the exact same as the zero megabyte function.

So what's happening? It has to copy those bits, right? Those bits have to land in your sandbox before your function will run, even if you don't use them at all.

So I set up a project where I have a Lambda invoker service living in the same region as a bunch of different Lambda functions. A CloudWatch event triggers that invoker service; the invoker service goes out and updates the function configuration to invalidate all the running sandboxes for my test fleet, and then, after that's completed, it invokes the function and measures the round-trip request time instead of just the initialization duration.

The reason, of course, is that the init duration doesn't include all of the time it takes to get your function running, because Lambda has to copy the bits. One caveat: this approach doesn't account for the fact that I might get a proactive initialization, and I don't know exactly what the rate of those will be, because that depends on other factors happening in the service.
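A rough sketch of what that invoker looks like with the v3 Lambda client; the function name and the environment-variable trick for invalidating sandboxes are illustrative, and in practice you would wait for the configuration update to finish before invoking:

```js
const {
  LambdaClient,
  UpdateFunctionConfigurationCommand,
  InvokeCommand,
} = require("@aws-sdk/client-lambda");

const lambda = new LambdaClient({});

async function measureRoundTrip(functionName) {
  // Any configuration change invalidates every warm sandbox for the function
  await lambda.send(
    new UpdateFunctionConfigurationCommand({
      FunctionName: functionName,
      Environment: { Variables: { FORCE_NEW_SANDBOX: Date.now().toString() } },
    })
  );
  // (A production version would poll GetFunctionConfiguration until the update is "Successful".)

  const start = Date.now();
  await lambda.send(new InvokeCommand({ FunctionName: functionName, Payload: Buffer.from("{}") }));
  return Date.now() - start; // round-trip request time, not just the reported init duration
}
```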

So what are the results? I let this run for thousands of invocations over a week, and we took a look at the p99 statistic, because that's where interesting things happen. The top row is when we're actually loading all of those bytes; the bottom row is where we're not loading any of them. They're the exact same dependencies in each column, but the handler code differs, in that on the top row we actually load those bytes and on the bottom row we don't.

So on the far left side, zero megabytes, hello world, we get a p99 time of around 354 milliseconds round-trip request time in the same region. I don't know which availability zone my Lambda function will run in; no one does except for AWS. So we control for this by just creating many, many invocations over the course of a week. On the heavy side, with 250 megabytes of unloaded dependencies, we see a p99 time closer to 700 milliseconds.

So if you're not looking at the overall round trip time, you're probably not fully understanding and observing your service and understanding how that initialization time impacts your users.

It's a little easier to see this on a graph, and that's what I want to do here. The top line is the loaded cold start request time, if we're loading every single byte up to that 250 megabyte limit, and the bottom line is when we don't load it.

Now, we saw there's a 300 to 400 millisecond gap in the unloaded line, the red bottom line, between the low end and the high end. But it pales in comparison to what's happening if you actually have to load all of those bytes. And this is what I have taken to calling your opportunity zone.

If you're running a mono-Lambda API, if you're running a Next.js application or a React application, this is your opportunity zone: to slowly load those bytes over the course of many serial requests, to amortize the cost of your cold starts and reduce that overall pain.

So we pulled our parachute but now even when the world is spinning and we're maybe not in full control, it's important that we pull at altitude. Ok. Number three, don't fight the platform. This is pulling stable.

So we pulled our parachute; we're going to save our lives, right? We're at the predetermined altitude, we know we don't want to get hurt, and finally we want to pull stable and maximize the chance of our parachute coming out happily.

This is where we want to embrace the platform and not fight Lambda. One thing I see all the time when I'm dealing with new users who are embracing Lambda is they're using defaults and most of the defaults that you get with AWS SAM and CDK and other toolkits are 128 megabyte Lambda function size.

People don't realize that your runtime also has to fit in that memory. So if you just deploy an empty Lambda function for Python or Node, typically that consumes between 60-80 megabytes of your overall allotted memory.

So what I did was I took a function that has a few dependencies and I started testing it at very granular levels of memory: 128 megabytes, 130, 150, 200, 210, and so on and so forth. And we can clearly see that the duration quickly flattens out after about 250 megabytes of RAM.

But the interesting thing happens around the initialization duration. At 128 megabytes you're somewhere between a total timeout and a failure to initialize: Lambda will try to initialize the service, it'll fail, and then it'll try doing it again during the runtime phase, and that will also take us all the way to the timeout limit I had configured, which I think was 20 seconds.

But at 150 megabytes, Lambda will happily swap, in the same way your laptop or another server will swap: it'll swap memory between disk and RAM and free up some space to allocate more things onto the heap. This is super, super painful.

I don't need to spend too much time on telling you how to fix this, because there's already a really good tool to do it. Check out Alex Casalboni's tool, AWS Lambda Power Tuning. This tells you not just that on the far left side, at those low memory settings, we have a very long duration, but that in most cases that's also more expensive, because you're paying by the millisecond here.

So when we increase memory, in this specific case to around 1.5 gigabytes, we find that that's actually the most cost-performant memory set point to run your Lambda function. And that's because in Lambda, memory is really the only knob you have for performance, and it includes CPU as well as RAM.

So even if you're not maximizing the RAM, if you're looking at that REPORT log and you're saying, well, I'm under that number, so it should be fine: A, you might be swapping, and B, you may not be getting all of the CPU you need and you may be running longer than you need to be.

So just check out this tool, it's great. The most important thing is to run it more than once, because code changes, right? The other thing I hear all the time is: I have a cold start problem, can you take a look? And users typically think that this latency is driven by cold starts, right?

So they come to me and they say, I have a cold start problem, can you look? And the answer is: you're not having a cold start problem, you have to read the documentation. And this includes things like concurrency controls.

So they're running into the maximum allotted concurrency for their account on some other function, and that's limiting their ability to consume off of an asynchronous queue or something. It includes things like not correctly setting batch size and windowing controls.

I think this is the biggest one. I just created a blog post about using ReportBatchItemFailures. So many users will start to throw errors in their Lambda functions which are consuming from async sources, not realizing that if you throw an unhandled runtime error, the event source mapping service, which is responsible for polling your queue, will slow down the concurrency at which your Lambda function can run.
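For an SQS event source, that pattern looks roughly like this, assuming the event source mapping has the ReportBatchItemFailures function response type enabled; processMessage stands in for your real business logic:

```js
async function processMessage(body) {
  // stand-in for your real business logic
}

exports.handler = async (event) => {
  const batchItemFailures = [];

  for (const record of event.Records) {
    try {
      await processMessage(JSON.parse(record.body));
    } catch (err) {
      // Report just this message as failed instead of throwing and failing the whole batch,
      // which would also cause the event source mapping to throttle your concurrency.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }

  // Lambda returns only the failed messages to the queue for retry
  return { batchItemFailures };
};
```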

So you're actually limiting your ability to scale out when you're not using this feature. And finally, there are users who are not configuring the maximum concurrency on their function and are running away with all of the allotted invocations for their account.

So again, especially for asynchronous processes, you need to make sure you're reading the docs, because each of the services that you're going to use has different integration mechanisms which you should be using correctly. Otherwise you're going to experience this pain that most people think is a cold start, and it's not at all a cold start.

So: you've known and reduced the code you load, you're going to pull your parachute, right? You've accurately observed your service, so you're going to pull at altitude. And finally, you're reading the docs and you're aware of what you need to do, which means that you're going to be able to pull your parachute stably.

So these are the most important pull priorities or the most important things in priority order that you need to think about when you're deploying Lambda.

So what's the next thing on the list? Well, let's have a little fun. Let's talk about containers on Lambda. Containers on Lambda have been out for a couple of years now, but they only recently got really, really good.

This is a photo of me and Marc Brooker at the AWS Hero Summit earlier, where I was supposed to be watching some baseball and I spent the whole time asking questions about this paper. He and his co-authors wrote a paper called "On-demand Container Loading in AWS Lambda," and it's really incredible work. It comes down to sort of four key points.

They did some work on the EXT4 file system to deterministically serialize the different layers of a Docker or container image, and then they chunk it all up into 512 kilobyte chunks. They use a technique called convergent encryption, which derives an encryption key from a hash of that chunk. Therefore, they can implement a multi-tier cache that's shared amongst all of us.
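The convergent encryption idea is simple enough to sketch; this is just an illustration of the concept, not Lambda's actual implementation. Because the key is derived from the chunk's own contents, two identical chunks always produce identical ciphertext, so they can be deduplicated in a shared cache without the cache ever holding usable plaintext:

```js
const crypto = require("crypto");

function encryptChunk(chunk) {
  // Key is derived from the chunk itself: identical plaintext yields an identical key
  const key = crypto.createHash("sha256").update(chunk).digest(); // 32 bytes
  // A deterministic IV (here derived from the key) keeps the ciphertext deterministic too
  const iv = crypto.createHash("md5").update(key).digest(); // 16 bytes
  const cipher = crypto.createCipheriv("aes-256-ctr", key, iv);
  const ciphertext = Buffer.concat([cipher.update(chunk), cipher.final()]);
  // The cache can be addressed by a hash of the ciphertext; anyone who already has the
  // original chunk can re-derive the key, fetch the cached ciphertext, and decrypt it.
  return ciphertext;
}

// Two images shipping the same chunk produce byte-identical, cacheable ciphertext
const a = encryptChunk(Buffer.from("identical 512 KB layer chunk"));
const b = encryptChunk(Buffer.from("identical 512 KB layer chunk"));
console.log(a.equals(b)); // true
```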

This leads to a staggering statistic: if, right now, you walk out of this conference hall and you deploy an application using a container image on Lambda, there is an 80% chance that everything you ship to Lambda includes no unique bytes.

Lambda has seen them all before. Now, that is mind-boggling scale. But what does it get you? It gets you some very impressive performance numbers.

So I took that exact same application that I was working on before, right? Zero megabytes, 50, 100, 150, 200, 250. And I grabbed it out of my zip-based Lambda function, and then I started putting it into a container image and testing it at every step. And what I found was really interesting.

The round-trip time for cold starts between containers and zip files actually inverts after about 30 megabytes. So containers have gotten really good on Lambda, to the point where in certain cases they're outperforming zip-based Lambda functions. And that's all because of this giant multi-tenant cache and this crazy amount of work that the Lambda team has put into serverless. That's one of the things I love about serverless: I didn't have to do that. They went out there and did this work, they figured out how to improve EXT4, they figured out this multi-tenant cache, they figured out how to do that securely, and I just get faster cold starts. That's amazing. That's what I'm here for. But it's not just the initialization for a managed runtime. Bun is a new JavaScript runtime that competes in the same kind of space that Node.js runs in.

And right when they went 1.0, the team released a layer and code to create your own Lambda layers which include the runtime. Now, that would actually initialize code in about 640 milliseconds. If you take Bun and compile it yourself, and actually compile your handler code as well, you can get about a 350 millisecond initialization time. This is taken from my colleague Max, who is doing a talk here on Wednesday about this exact project, lambda-perf. If you haven't seen it, I highly recommend you check it out. It's in the dev lounge.

So I built on this with Bun and said, well, we've already seen that container images can be faster than zip-based Lambda functions at certain levels, and I know that the Bun runtime is around 40 megabytes, so let's test it out. Now, we were able to get an init duration down to 140 milliseconds.

Now, a couple of you out there are frowning, because I just got done telling you not to pay as close attention to an init duration, because it doesn't tell you the whole story. So I did the same thing: I put it inside of our same invoker test harness to show you that at scale, after three days of thousands of invocations, Bun running the same exact application is within about 100 milliseconds of the managed Node 18 runtime at the p99 round-trip request time.

So I'm not telling you to leave this room and rebuild everything using a container image. That's not what I'm here to tell you, folks. However, if you have a large kind of mono-Lambda or Lambdalith API and you're running into that 250 megabyte limit all the time, maybe it's time to explore containers. Or maybe you're in the room because you have a legacy application that you want to just sort of offload, to move the cost of running that service onto Lambda; maybe it's infrequently used and you're worried about the cold starts for this container service because you've heard that in the past they were really bad. They've gotten a lot better. Give containers another look and have a little fun. All right.

So in the process of writing this talk, I was streaming and talking about this on Twitch, talking about it on Twitter and YouTube, and I had a lot of questions from lots of folks all over the community about some common Lambda myths. So I want to address some of those here.

The first myth was just: what if we increase the memory? What happens, right? So we already have our test service that we saw before, where after about 250 megabytes it started running with a pretty typical, you know, couple-hundred-millisecond initialization duration. So it's really easy for us to just take this all the way to the extreme and say, ok, if 250 megabytes is good, 10 gigabytes is better, right?

So I'll give it 10 gigabytes. You can see that this one's pretty busted: there's really not a lot of change in the init duration once you get past that initial cliff where you're not swapping and you're not using any more CPU than you need. There is some ambiguity about how much CPU you get with Lambda during the initialization phase versus the runtime phase. They do say that at 1.7 gigabytes you get a full vCPU; they don't specify what you get during initialization. But in our experiments, you sort of get that full vCPU, or you get up to whatever your provisioned number is, during that initialization phase.

The problem, as you can see here, is that most runtimes during their static initialization are not multi-threaded or multi-core, so they're not taking advantage of multiple processors at all. They're loading bytes one at a time: you load file A, you see file A requires file B, you load file B, it requires file C, and so on and so forth. That's not really something that can be spread across multiple cores. So I think this one's pretty busted.

So then I got this question, which was really interesting. The theory is: hey, we know Lambda is this massively scaled-out distributed system, and they have all these individual workers that are running your code or my code, your function, my function, all at the same time. So these systems somewhere down there have servers, right? They're running on an actual machine, and I'm sorry if I offended some of you, but there are servers here, deep below the abstraction layer.

We know that there are only so many slots for a function to run on a host. I don't know that number; I'm not even sure if AWS knows that number, but it's there. So the theory is that if I have a function which requires more memory or more CPU, it's harder to slot onto a machine, it's harder to assign.

So maybe that will increase your initialization duration, the time spent in those services. So I plugged it into the same test service and said, ok, let's start at 128 megabytes and work our way all the way up to 10 gigabytes, look at the round-trip request time, and see if there's any impact going from lower to higher memory. And we can see that this is pretty busted.

They're all around 300 milliseconds. I mean, 128 megabytes is 318, 10 gigabytes is 305, in us-east-1. As far as I can tell, there's pretty ample capacity for Lambda, so this doesn't really affect your cold starts. This one's busted.

Along the exact same lines, I had the same question about what happens if my Lambda function needs to run for 15 minutes. Lambda functions can run between, you know, zero and 15 minutes, and the idea is that we know each of the workers that run our code has a limited lease; after that, it can no longer run functions, and they do that so they can deploy new worker code or, you know, move hardware in and out of the fleet.

So again, the theory is that a five-second function is a lot easier to find a slot for than a 15-minute function. I did the same thing: 300 milliseconds, no difference here. I'm pretty much calling this one busted as well.

All right, let's review these cold start priorities. The most important thing when you leave this room today is to know and reduce the code you load. You have to pull your parachute if you want to live to make another skydive. You don't have the same fatal consequences in Lambda, but if you're fighting with cold starts, this is the first thing to look at.

Next, number two: accurately observing your service. If you're only looking at the init duration, you're not getting the full story. For one, you're probably missing the cases where you're getting a proactive initialization and it's not impacting the user-facing latency as much as you would expect, right?

So you see that in the REPORT line, but it's not actually causing the latency. The other side of this is that the init duration isn't including the time it takes to put bytes into that sandbox. So if you're only looking at that, you're actually missing some of the impact that your function is driving into your user-facing latency. This is pull at altitude, the second most important priority.

The third: don't fight the platform. This is where I'd also probably mention something like provisioned concurrency or SnapStart. SnapStart is pretty much a free thing you should turn on and just use if you're running Java; I'd love to see it happen for more runtimes. Provisioned concurrency with auto scaling, I think, is a useful feature, and I'll talk about that in just one second.

What I want to address here is the warm-up plugin sort of thing that has kind of proliferated. I really don't advocate for warm-up plugins, because they're only going to give you warm sandboxes for the exact number that you set, and beyond that you're going to pay a cold start anyway.

So overall, you may have been able to amortize a little bit of that cost, because you're just warming up a couple of sandboxes. But in reality, users are going to face that latency at some point in your system and you're going to have to address it, so you may as well work through these other priorities first.

Finally, having fun: take another look at containers. They've gotten a lot better, with massive performance improvements; in the paper they claim a 15x performance improvement, which is super worthwhile. So if you're new to Lambda, or you have a really large function, look into containers.

The final takeaway here is: establish and spend a p99 budget. We're all busy professionals; there's only so much time in the day. We talked about Amdahl's law, and we know that this can only improve the time we actually spend initializing functions, which hopefully isn't a lot of time.

So I want you to work through these cold start priorities, then move on and solve your problems where they lie elsewhere. All right, thank you. That's all I have. I really appreciate the time today.
