Good afternoon, everyone. We have a packed crowd today. I'm really excited to be here with you all. My name is Abhishek Nautiyal. I'm a Senior Product Manager on the Amazon Elastic Container Service team. And my name is Maish Saidel-Keesing. I'm a Developer Advocate on the ECS and Container Services team.
Abhishek is going to start off the session. If you want to take pictures, you're more than welcome. The session is being recorded and the slide deck will be available online afterwards, so you don't have to take pictures. If you do, please do me a favor and tag us on Twitter (you'll see our Twitter handles afterwards), because I'd like to see pictures of the room and the session as well. With that, I'm going to hand it over to Abhishek to get us started. Thanks.
Thanks, Maish. So today we'll talk about scaling containers to millions of users. We'll start with an overview of Amazon ECS for folks who are not very familiar with the service. We'll go over some of the key scaling considerations on ECS. We'll discuss API throttling and troubleshooting tips, and then we'll go over performance optimization tips as well as best practices for scaling your containers to support millions of users.
Finally, through a sample application, we will demonstrate the kind of decisions you can make when you think about designing applications to withstand millions of users.
So starting off with an overview of ECS: ECS is a fully managed, AWS-native container orchestration service. That means that with ECS, you do not have any control plane to manage, upgrade, or install. ECS takes care of that for you, and ECS by default is highly performant and scalable, and also configurable to support massive scale for the diverse needs of your business and applications.
You can use ECS with a diverse set of compute options: on EC2, which works across all AWS Regions, on AWS Wavelength and Local Zones, as well as on Outposts for your on-premises workloads. There is the serverless paradigm with AWS Fargate, which is very popular with our customers. And finally, with Amazon ECS Anywhere you can use ECS to deploy even on your own on-premises infrastructure and have a single control plane for all your applications, whether they are on premises or in the cloud.
So here we have some interesting statistics on the unprecedented scale at which ECS operates. We are really proud of these numbers. On a weekly basis, ECS supports over 2 billion task launches, with tens of thousands of API requests being served every second. ECS exists in pretty much every AWS Region, and it's a region-blocking service, which means that any new Region will only launch when ECS can support it.
Also, the last point here: about 65% of all new AWS container customers are actually using ECS. And this is something we are really excited about as well as proud of, because it tells us that customers trust ECS for building highly scalable and reliable services.
Also, it's not just the horizontal scale of ECS presence across all Regions and a wide set of workloads; ECS can also support an unprecedented scale for your massive workloads. The largest single-account production workload that exists on ECS today is serving over 5 million concurrent vCPUs. This is for a customer doing medical research to come up with new models to treat diseases, and it's a single production workload that runs across multiple Regions.
Also, ECS can support up to 750 task launches per second. This is, of course, for accounts with increased limits; the default that you have is lower. But here we want to demonstrate the massive scale that ECS can support for you as and when you need it.
Also, a vast majority of new use cases that we are seeing, specifically on EC2 instances, are for adoption of GPU and high-memory workloads, which really tells us that customers trust ECS for large-scale, resource-intensive workloads.
This slide might look a little confusing, but it really demonstrates what ECS can do. Here we are showcasing an application which scaled out to over 2 million vCPUs, across hundreds of thousands of EC2 instances. The key points to note here are the speed of the scale-out as well as the scale-in, both taking about 45 minutes. And this is a workload that is supported across multiple clusters in a single Region on the same account.
Also, this really demonstrates not only the scalability of ECS but also its elasticity, which helps you use only the resources that your application needs, so that you can save on costs by scaling down to zero instances when your workloads do not require any infrastructure.
Our customers love ECS for its powerful simplicity, and we see adoption from customers of all sizes as well as all stages of their cloud adoption journey. We have customers like Capital One, who use ECS for their digital banking solutions, streaming services like Disney, grocery delivery services like Instacart and DoorDash, as well as gaming companies like Ubisoft. All of these diverse workloads are being built on top of ECS.
Also, it's not just our customers that are building innovative solutions on ECS; a lot of Amazon services that you may already be using run ECS under the hood as a building block. This includes traditional database services like Redshift and RDS, ML services like Amazon Transcribe or Amazon Polly, as well as the batch processing service AWS Batch, which actually runs ECS under the hood.
So now that we've covered the overview of ECS and some of its scaling capabilities, let's talk a little more specifically about scaling and performance on ECS and what that means for your workloads.
But before we dive into specific considerations for ECS scaling, I want to mention some of the improvements that we've made in the last year in order to support our customers' growing needs of deploying applications at a much larger scale and a much faster speed.
So this year, we have increased the scheduling rates on ECS for launching new tasks, and now ECS can support up to 500 tasks launched per minute, which is a 16-times improvement on AWS Fargate and over a four-times improvement on EC2 compared to last year. Linked on this slide is a blog post from our colleague Nathan Peck which dives deeper into how all of this works under the hood and benchmarks the performance for different kinds of use cases.
I wanted to talk about two examples here, where ECS on both Fargate as well as EC2 can launch up to 1,000 tasks in under four minutes and 30 seconds. A quick caveat here: as many of you who might be deploying container services today, with or without ECS, would know, it's not just the scheduling rate of an orchestration service like ECS; there are multiple considerations when you deploy your applications that inform the overall deployment timelines.
Things like the size of your container image, the container image pull behavior, configuring load balancers, and defining health checks to deem your services healthy all end up contributing to the deployment times, and you might see variations. We'll talk about most of these considerations in the latter half of this session, where Maish will take us through some of the best practices around them.
So now to the main topic at hand today, where our customers want to build a service that supports millions of users during peak hours. The questions that customers typically have are: how do we build such applications, and how do we build them on ECS?
Before we dive into ECS-specific scaling considerations, I want to quickly mention that scaling and performance tend to be complex topics, regardless of whether it's for designing a net new service or scaling an existing service to support more customers.
Typically, the way we think about scaling our services is in terms of application-specific units. For example, you might have a machine learning service that is processing requests from a queue to derive inferences, and the scaling unit you might think of would be the number of jobs processed per hour or per minute.
Likewise, there might be a web service application serving users accessing your application through a website or an app, and the scaling unit you might think of would be the throughput of your API or the number of users that you can support concurrently per second.
So here we encourage you to start thinking from these business-driven, application-specific units and then identify at an application level how these units map to the resources required to run your applications.
Typically, different applications have different resource requirements. Some applications might be more memory bound or memory intensive, for example Java applications that require more memory to hold the JVM, or applications that process large data sets in memory.
Likewise, some applications require more CPU processing power; a typical example would be machine learning applications that require GPUs for computation. Other resource bottlenecks can be around your input/output operations, storage, network bandwidth, and so on. Here we really recommend that you first test your applications under different load scenarios and simulate what your application output would be when you increase load to achieve, for example, 80% utilization of these resources. What you will often find is that one or more of these resources is the bottleneck for your application.
And that helps you in two ways. First, it helps you find opportunities to better tune your application along these resources, so that you don't have to wait until you hit millions of users and can proactively design your applications to be resilient to high user traffic.
The other way this really helps you is in mapping your resource requirements to ECS-specific units, which are the resource requirements that you specify on an ECS task. For those who are not very familiar with ECS constructs, an ECS task is the primitive that you use to define one or more containers that comprise your application and to express the resource requirements for those containers.
So once you understand the resource requirements of a single container and a single task, you can reason about how many tasks you would need to run when you are supporting millions of users. And we'll dive deeper into exactly this in the second half of the session.
But right now, I just want you to think about how you map your resource requirements to ECS tasks, and about the number of task replicas you would require to serve the growing needs of your users during peak times.
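As a hedged illustration of what that mapping can look like in practice (the task family name, image, and the 2 vCPU / 4 GiB sizing below are hypothetical values you would derive from profiling), here is a minimal boto3 sketch that registers a task definition expressing per-task resource requirements:

```python
import boto3

ecs = boto3.client("ecs")

# Hypothetical sizing: 2 vCPU / 4 GiB per task, derived from load testing.
ecs.register_task_definition(
    family="web-api",                      # assumed task family name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="2048",                            # 2 vCPU, expressed in CPU units
    memory="4096",                         # 4 GiB
    containerDefinitions=[
        {
            "name": "web",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-api:latest",
            "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
            "essential": True,
        }
    ],
)
```

With the per-task footprint pinned down like this, the number of replicas needed for a given user load becomes a straightforward division, which is the exercise we walk through later in the session.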
The second thing you need to think about when building for scale and performance is that it's not only at the application level, by creating replicas of your application; alongside the replicas, the underlying compute infrastructure that is hosting your application also needs to seamlessly scale out.
If it doesn't do that, then regardless of how much you tune your application, you will be bottlenecked by the underlying compute resources when launching more tasks.
Here you have the wide list of compute options we saw earlier on ECS, from traditional VMs on EC2, to serverless with AWS Fargate, to edge and on-premises workloads with AWS Outposts, Amazon ECS Anywhere, Local Zones, and Wavelength.
Typically, unless you have requirements for running your workloads at the edge, things like data residency requirements or very latency-sensitive applications, you would not consider on premises or edge and 5G.
The typical choice our customers are faced with is whether to use serverless or Amazon EC2 as the compute layer. And it is here that we highly recommend our customers to start with AWS Fargate.
Fargate is a serverless compute engine which takes care of multiple responsibilities on your behalf, so that you don't have to think about managing your compute infrastructure at all. Without Fargate, you would have to plan for scaling out your compute capacity, selecting the right types and sizes of instances, and keeping the underlying instances and capacity up to date with operating system updates as well as security patches.
Fargate takes care of all of this on your behalf and provides you a simple interface where you just think about your application's resource requirements and express them in a task definition, and ECS with Fargate takes care of provisioning the right compute and maintaining it for you.
However, there are times when our customers want to use EC2 compute. Some of the reasons might be that customers want control over deeply customizing the instance types they select. For example, you might have tested your deep learning workloads on a specific GPU-centric EC2 instance, and you want to use those instances for your application to achieve better performance. In that case, choosing EC2 gives you more control over the infrastructure that is provisioned.
Other examples could be that you want more control over the underlying EC2 instances, for example to apply security patches or add them to your Amazon Machine Images, and so on. Those customizations are enabled by EC2.
But along with the deep customization, EC2 comes with a lot of compute capacity management concerns that our customers have to deal with, and capacity providers is a feature that we highly recommend our customers use for that reason. Capacity providers are how we at ECS think about compute capacity configuration for all workloads, and we have capacity providers for both EC2 and Fargate.
For Fargate, the capacity providers for both Fargate as well as Fargate Spot are completely managed by ECS, and you just have to select them. For EC2, you can create your own capacity providers, bring in your instances configured in an Auto Scaling group, and then ECS takes care of auto scaling those instances within the capacity provider to closely match the resource requirements, to launch new tasks, and to scale your capacity out or in as your task load changes over time in response to customer demand.
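As a minimal, hedged sketch of that setup (the capacity provider name, cluster name, and Auto Scaling group ARN below are placeholders), here is roughly how an EC2 capacity provider with managed scaling might be created with boto3:

```python
import boto3

ecs = boto3.client("ecs")

# Placeholder ARN of the Auto Scaling group backing your container instances.
asg_arn = "arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:...:autoScalingGroupName/ecs-asg"

ecs.create_capacity_provider(
    name="ec2-capacity",                       # assumed name
    autoScalingGroupProvider={
        "autoScalingGroupArn": asg_arn,
        "managedScaling": {
            "status": "ENABLED",
            "targetCapacity": 100,             # keep the ASG sized to the task load
        },
        # Requires instance scale-in protection enabled on the ASG itself.
        "managedTerminationProtection": "ENABLED",
    },
)

# Associate the capacity provider with a cluster so services can use it in their strategy.
ecs.put_cluster_capacity_providers(
    cluster="production",                      # assumed cluster name
    capacityProviders=["ec2-capacity", "FARGATE", "FARGATE_SPOT"],
    defaultCapacityProviderStrategy=[{"capacityProvider": "ec2-capacity", "weight": 1}],
)
```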
This is a space that we are very invested in and excited about. Today, capacity providers give you cluster auto scaling on EC2, where you do not have to perform any of the scaling operations on your own and can again focus just on the application constructs and building your container images, and let ECS perform the managed, operational heavy lifting on your behalf.
And this is our vision for capacity providers going forward: to take more of the management out of your hands, so that you can really focus on what brings value to your business, which is building innovative applications, as ECS does more on your behalf.
And you can, of course, use Fargate and Fargate Spot with Compute Savings Plans, as well as Spot, to reduce your compute costs.
So now that we've looked at ECS, the different use cases for customers, the scalability and performance that ECS can support, and some of the considerations for choosing the right compute type and designing for scale, we'll dive into some of the scaling considerations on ECS and the best practices for you.
And I'll let Maish take over. Thank you, Abhishek.
So, besides the platform in the background, which you have seen what it's capable of doing, I want to go into a couple of recommendations and best practices that you should look into when you start scaling applications to really, really large workloads.
We're going to be talking about service quotas, how many ENIs or VPCs you can run, and we'll go into more detail on throttling, which concerns how many API requests you can make to the AWS APIs.
Some performance optimization tips.
And lastly, we'll go into a hypothetical application and how this kind of exercise would be done, or how I would do it with a customer when they came to us with some kind of requirement. It might look a little bit daunting.
There are all different kinds of quotas, all different kinds of areas where you need to make sure that you have enough capacity or resources within your account, be it at the account level, the cluster level, the service level, or the task level. We try to differentiate between these different entities, firstly to give you more flexibility, so that you don't have to raise everything in one specific location for all of your applications; it can be done on an application-by-application basis.
And what I would also like to state and emphasize is that ECS is a service which interacts with other AWS services underneath. When you bring up an ECS task, it's not only bringing up a container. If you're using EC2 as your capacity provider, you're also bringing up an EC2 instance, allocating a network interface, allocating an IP address, and registering it to a load balancer.
So besides the quotas that you have to look into, and we will give you links at the end of this session to where you can find those quotas in the documentation and best practices, you might have to consider that when you want to get to really large-scale applications and you go over the default quotas, you might have to raise other limits as well, not necessarily resources allocated under the ECS console, but other things like load balancers, network interfaces, subnets, et cetera.
For example, one of the items that you might have to think about, and we'll go into a little bit more detail afterwards, is load balancers. By default, a load balancer has the ability to have 1,000 targets added to an ALB. If you are going above that with the number of tasks within your ECS service, then you will have to ask for a limit raise for that specific quota. And it's not that difficult to do: all you need to do is go into the Service Quotas console, the self-service portal, and request additional resources for your account for that specific quota.
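As a hedged example, the same request can be made programmatically. The quota code below is a placeholder; in practice you would first look up the real code for the ALB targets quota (for example with `list_service_quotas(ServiceCode="elasticloadbalancing")`):

```python
import boto3

quotas = boto3.client("service-quotas")

# Placeholder quota code for "Targets per Application Load Balancer".
response = quotas.request_service_quota_increase(
    ServiceCode="elasticloadbalancing",
    QuotaCode="L-XXXXXXXX",        # hypothetical; look up the real code first
    DesiredValue=5000.0,
)
print(response["RequestedQuota"]["Status"])
```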
If by chance you come across a quota which you don't find inside the console for your service limits, feel free to cut a ticket to AWS Support and they will help you raise that specific limit if it's not available in the console.
The AWS services are publicly available from anywhere in the world, and as a result they serve all of our customers. In order to protect not only the service itself but other customers as well, we limit the number of API requests any one of our customers, in any one of their accounts, can make to any specific API.
When we talk about API requests and API throttling, we look at two different categories. The first one is synchronous, which means that you, the customer, make an API call from your account, either through the CLI, the console, or your automation tooling. For example, if you call the RunTask API to start a new ECS task, that would be one synchronous call from your tooling, your CI, or the console.
If you start to do that too many times in too short a period, we have a throttling mechanism in the background. It protects you, for example from a malicious actor inside your account running up extensive costs, and it protects the service so that it doesn't get overrun and impact another account, or even worse, so that somebody else doesn't impact your account, which you don't want; having an impact on somebody else is less of a problem than somebody impacting you, at least for the customer experiencing those issues.
Besides the synchronous API calls, ECS, as we said, makes additional API calls in the background to other AWS services on your behalf. So when you call the RunTask API to start a new task on an ECS cluster running Fargate, you made one synchronous call, but in the back end ECS makes additional asynchronous calls, for example to start up a new instance in the background, attach a new ENI to that instance inside your VPC, and register that instance into a load balancer. All those API requests also adhere to the same throttling methodology and rules.
If you make too many requests, you will start being throttled, and you can of course monitor these things, because every API call made in your account is logged in CloudTrail.
So for example, on screen you can see that ECS has made a request on your behalf and there was a throttling exception, because too many API calls were made for this specific request. Therefore you get throttled and the request, in this case, will not be successful.
But as a result of the retry mechanism in the background, the API calls will continue to be retried until you do get those resources allocated to you. We'll see a bit about those retry mechanisms in a minute.
How do you make heads or tails of all these throttling events? CloudTrail, of course, can be forwarded into CloudWatch, which means that once it's in CloudWatch you can set alerts, measure metrics, and, for example, parse and analyze these events in more detail: who was the actor, the IAM user, or the role performing these requests, to understand exactly what is making all these requests and why you're getting throttled.
And of course, you also have the option of using, for example, Amazon Athena to query your logs directly, using standard SQL with SELECT statements, if you're a good DBA, which I am not. You can do that to understand exactly where your multitude of requests is coming from and address that specific problem.
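As a rough, hedged sketch (the Athena database, the CloudTrail table name, the start time, and the S3 output location are all assumptions about your setup), a query like this surfaces the noisiest throttled callers:

```python
import boto3

athena = boto3.client("athena")

# Assumes a CloudTrail table has already been created in Athena for this account.
query = """
SELECT useridentity.arn AS caller, eventname, count(*) AS throttles
FROM cloudtrail_logs
WHERE errorcode = 'ThrottlingException'
  AND eventtime > '2022-11-28T00:00:00Z'   -- placeholder start time
GROUP BY useridentity.arn, eventname
ORDER BY throttles DESC
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/"},
)
```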
If you are using the AWS SDKs, we have retry mechanisms built into the code. So if you're using, for example, Boto3 in Python, or the SDK for another one of our programming languages, these mechanisms will automatically retry and back off, meaning the requests will slow down gradually until the API allows you to make more requests and complete the original request that you made.
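As a minimal sketch of turning that behavior on explicitly in Boto3 (the cluster name, task definition family, and subnet ID are placeholders):

```python
import boto3
from botocore.config import Config

# "adaptive" adds client-side rate limiting on top of exponential backoff;
# "standard" is the simpler alternative.
retry_config = Config(retries={"max_attempts": 10, "mode": "adaptive"})

ecs = boto3.client("ecs", config=retry_config)

# Throttled RunTask calls are now retried with backoff by the SDK itself.
ecs.run_task(
    cluster="production",                # assumed cluster name
    taskDefinition="web-api",            # assumed task definition family
    launchType="FARGATE",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-0123456789abcdef0"],   # placeholder subnet ID
            "assignPublicIp": "DISABLED",
        }
    },
)
```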
If you are not using these kinds of tools, for example if you're writing your own bash scripts or you have some kind of workload which is not using the AWS SDK, you will need to build those mechanisms yourself: detect that you're being throttled and receive a message, which you can actually do with EventBridge.
So for example, your event will be forwarded to EventBridge, and once it is received by EventBridge, you can trigger another action. If any of you know the tool IFTTT, if this then that: if something happens, do something else. EventBridge allows you to do the same thing, for example kick off a Lambda function or a Step Function to start slowing down the requests based on my original action, so that I don't get continuously throttled and end up with an application that will not scale.
These will usually show up as events in EventBridge which we call an ECS operation throttle and a Service Discovery throttle, the latter when you're using Service Discovery, which is pretty much everybody who is using ECS today.
And by using these triggers and events, you can pretty much automate yourself out of your own problem by implementing the logic and applying exponential backoff in your applications, in order to continue to deploy your applications in a safe way.
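A hedged sketch of wiring that up with boto3 follows. The rule name and Lambda ARN are placeholders, the event names are the throttle-related ECS service action events as I understand them, and the Lambda would also need a resource policy allowing EventBridge to invoke it:

```python
import json
import boto3

events = boto3.client("events")

# Match ECS service action events that indicate throttling.
events.put_rule(
    Name="ecs-throttle-events",
    State="ENABLED",
    EventPattern=json.dumps({
        "source": ["aws.ecs"],
        "detail-type": ["ECS Service Action"],
        "detail": {"eventName": ["ECS_OPERATION_THROTTLED",
                                 "SERVICE_DISCOVERY_OPERATION_THROTTLED"]},
    }),
)

# Send matching events to a Lambda that slows down or pauses the caller.
events.put_targets(
    Rule="ecs-throttle-events",
    Targets=[{"Id": "slow-down-lambda",
              "Arn": "arn:aws:lambda:us-east-1:123456789012:function:slow-down"}],
)
```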
And again, just to remind you, these retries are already built into the SDKs; you don't have to do that on your own unless you're making your own tools in your own language or your own workflow engines, for example.
Let's talk a little bit about performance optimization. When you deploy multiple instances of your task in a service, Amazon ECS registers them to a load balancer. The load balancer is an external service which we use and make use of as part of the offering to you.
By default, there are two settings which could be too slow for you when you are registering new tasks, and which could delay the scale-up and, further down the road, the scale-down of your applications if you stick to the defaults. The two settings are the health check interval and the healthy threshold.
This means that, by default, it waits 30 seconds between checks to see whether the application has come into service, and with the healthy threshold we check this five times before the application is considered healthy.
If you know your application and you've written your code properly, like you all should be doing, you can make this a lot faster, and those defaults are customizable. For example, if you reduce the interval to 10 seconds and the threshold count to two, instead of taking 2.5 minutes until your application becomes live in your load balancer, it will take you 20 seconds. That allows you to scale up a lot, lot faster.
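As a minimal sketch of tightening those two settings (the target group ARN is a placeholder):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Tighten the health check so new tasks enter service in ~20 seconds instead of ~2.5 minutes.
elbv2.modify_target_group(
    TargetGroupArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/abc123",
    HealthCheckIntervalSeconds=10,
    HealthyThresholdCount=2,
)
```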
And on the other end, when you want to scale down, connection draining on the load balancer is five minutes by default. That means that from the minute you send the command to the load balancer to remove one of those targets from the target group and take it out of service, it will wait for up to five minutes while draining, to see that there's no traffic still going through.
If you have written your applications in the correct way, you can of course reduce that amount of time as well, so it can go down to 20 seconds or one minute. It will depend, of course, on your application, your requirements, and how it behaves. That's one way to speed things up.
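A minimal sketch of lowering the deregistration delay (again, the target group ARN is a placeholder):

```python
import boto3

elbv2 = boto3.client("elbv2")

# Lower connection draining (deregistration delay) from the 300-second default to 20 seconds.
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/abc123",
    Attributes=[{"Key": "deregistration_delay.timeout_seconds", "Value": "20"}],
)
```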
ECS, of course, is built on Docker, and the engine's primitives allow you to signal the container to shut down with a SIGTERM. By default, per the ECS configuration, the agent running on the machine, or on Fargate, will wait 30 seconds for the container to shut down. If it doesn't shut down within 30 seconds, it will kill it forcefully with a SIGKILL.
If you know that your application for some reason will not adhere to this, or you would like to optimize it, you can of course reduce that amount of time, so scaling down your tasks will be a lot faster than waiting 30 seconds for each and every task to stop. Above and beyond the draining on the load balancer, this all adds up to the amount of time for scaling in, which is cost, money, and anguish from the people in your CFO's office who come and ask why we are spending so much.
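As an illustration (the family, image, and resource values are the same hypothetical ones as before), the per-container stop timeout can be set in the task definition like this:

```python
import boto3

ecs = boto3.client("ecs")

# Give the container only 10 seconds between SIGTERM and SIGKILL instead of the 30-second default.
ecs.register_task_definition(
    family="web-api",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="2048",
    memory="4096",
    containerDefinitions=[
        {
            "name": "web",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-api:latest",
            "essential": True,
            "stopTimeout": 10,     # seconds to wait before force-killing the container
        }
    ],
)
```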
There was one thing which I forgot to mention, sorry: you can also use a feature which we released not so long ago called task scale-in protection. If you have tasks running a job or a batch or some kind of workload that should not be interrupted with a SIGTERM or a SIGKILL, you can incorporate this feature into your code so that it automatically lets the ECS API know: I'm busy doing work, do not shut me down, because if you do, I'm going to lose the processing I've done.
Once the application has finished processing the information, it sends a new API call to the ECS control plane which marks it as safe to be scaled down and killed, and then the task can be stopped. So this is a way to make sure that workers or tasks running long-living work are protected. OK, I'll get to questions in a few minutes.
We'll go through Q&A at the end, sorry, at the end. This allows you to run long-living applications and make sure they process the data until the end, without being killed in the middle and having to restart the work again.
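A hedged sketch of that flow from a worker, using the UpdateTaskProtection API (the cluster name, task ARN, and the 60-minute expiry are assumptions):

```python
import boto3

ecs = boto3.client("ecs")

# Placeholder cluster and task ARN for the worker task doing the long-running job.
task_arn = "arn:aws:ecs:us-east-1:123456789012:task/production/0123456789abcdef0"

# Protect the task while the job is running.
ecs.update_task_protection(
    cluster="production",
    tasks=[task_arn],
    protectionEnabled=True,
    expiresInMinutes=60,       # safety net in case the worker never finishes
)

# ... process the queue item / batch job ...

# Release the protection so the scheduler is free to scale this task in.
ecs.update_task_protection(
    cluster="production",
    tasks=[task_arn],
    protectionEnabled=False,
)
```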
Let's talk about container caching. Containers run from container images, and images are stored in a repository of some sort. I'm going to differentiate between the two compute platforms that you have for running containers, the serverless Fargate and EC2, because they're different. For EC2, you can configure the ECS agent to pull the image once and only once; it will be cached on the machine, and then subsequent pulls for that specific task or container will use the local cache, instead of having to pull every single image each time, which takes time to transfer over the network, unpack, and start up. This will speed up your image launches, or rather the task launches, on EC2.
For Fargate it's different, because Fargate is a serverless compute engine, and for every single task that we start, we start up a new micro virtual machine, a micro VM, sized specifically for that task. So there's no option of caching.
So first, I want to tell you we're working on a solution for that as well in the future, to make it faster. That's, I think, one of the most requested features on our public GitHub roadmap at the moment, if not the top one.
And secondly, in order to speed this up on Fargate: number one, make your images as small and as optimized as possible.
You can also use a larger task size to unpack these images quicker. So if you use more vCPUs, with faster or more network bandwidth, they can process these large images in a much faster way, which will speed up the deployments.
And the third thing, of course: keep your images local to the specific Region where you're working. Don't start pulling images from one Region across the world to the other. It's better to replicate those images through ECR, which is possible today on every single push, instead of pulling images from one central Region into workloads in another Region.
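A minimal sketch of turning on that push-based replication (the destination Region and account ID are placeholders):

```python
import boto3

ecr = boto3.client("ecr")

# Replicate every pushed image to a second Region so workloads there pull locally.
ecr.put_replication_configuration(
    replicationConfiguration={
        "rules": [
            {
                "destinations": [
                    {"region": "eu-west-1", "registryId": "123456789012"}
                ]
            }
        ]
    }
)
```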
The next tip or optimization point that I would like to touch on is less something specific to containers or ECS, but it is related: you have to understand your application and how it behaves. Abhishek touched on this before, with knowing how much CPU, memory, disk, and network bandwidth you're using.
You also have to understand where your application is performing its actions and what is taking that amount of time. If there is a slow start-up for your application, because you're downloading dependencies, initializing variables, or unpacking information from S3 at the beginning, there are ways, of course, to measure this. You can instrument these calls in your code, and using the ECS task metadata endpoint you can send these metrics to CloudWatch to understand what is taking long and how long it took to start your container, from launch time to a healthy state.
Understand those metrics to see where you can optimize your code and make your applications launch faster. This is relevant for EC2 or Fargate, and it's a general recommendation we give to our customers as well.
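As a rough, hedged sketch of that instrumentation, assuming the task metadata v4 endpoint exposes image pull timestamps for your platform (the metric namespace and name are made up):

```python
import json
import os
import urllib.request
from datetime import datetime

import boto3

def parse_ts(ts: str) -> datetime:
    # Timestamps look like 2022-11-28T12:00:00.123456789Z; trim to microsecond precision.
    ts = ts.rstrip("Z")
    if "." in ts:
        head, frac = ts.split(".")
        ts = f"{head}.{frac[:6]}"
    return datetime.fromisoformat(ts)

# The task metadata endpoint v4 is exposed to containers through this environment variable.
metadata_uri = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
with urllib.request.urlopen(f"{metadata_uri}/task") as response:
    task_metadata = json.load(response)

pull_seconds = (
    parse_ts(task_metadata["PullStoppedAt"]) - parse_ts(task_metadata["PullStartedAt"])
).total_seconds()

# Publish the measurement as a custom CloudWatch metric.
boto3.client("cloudwatch").put_metric_data(
    Namespace="MyApp/Startup",
    MetricData=[{"MetricName": "ImagePullDuration", "Value": pull_seconds, "Unit": "Seconds"}],
)
```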
Network mode: the two main options you have today are awsvpc mode and, second, bridge or host mode. awsvpc mode provisions a network interface, an ENI, for each and every task, which is, I would say, a first-class citizen in your VPC: it has its own IP address, you can attach security group rules to it, and you can, if you really want, apply NACLs, because it's a full, proper IP address.
It does have a drawback, for example on Fargate, and in this case also when you deploy on EC2: it takes a bit longer to provision those ENIs, because it's an additional asynchronous call, as we said, that goes in the back end to another service to provision that ENI and attach it to the task or the EC2 instance. It will depend on your flexibility and your requirements, and what you value or need to adhere to more: the security or the speed.
If you need things to scale up very, very quickly, you can use bridge mode, which lets your tasks share the local network interface of the EC2 instance instead of provisioning an ENI per task. In this case, it doesn't work on Fargate, because Fargate does not support bridge mode, only awsvpc mode. This will allow you to scale up your tasks a lot faster if you need to on EC2, by using bridge and not awsvpc.
OK, choose the correct instance type. Once you have your unit of work for your task size, make sure that you're choosing the right kind of instance in the right multiples of the number of tasks you can fit onto that instance. You don't want to end up in a state where, for example, 10% of your EC2 instance sits unused because you can't launch a new container, as there's not enough space.
And you don't want to get to the case where, once that happens on EC2, you need to scale out your EC2 cluster with your Auto Scaling group and capacity providers, which will take longer for your task to launch. Try to make sure that the tasks fit exactly, what is called bin packing, inside your EC2 instances. You will need to do some testing, of course, for this to work, and make sure that the resources you're allocating for your clusters in the back end fit the workload that you're running in your tasks as well.
When using awsvpc mode for the networking on your EC2 instances, you should look into something called ENI trunking. For each EC2 instance there's a limit on how many ENIs you can attach to it based on the instance size, and the number of IPs and ENIs you can attach will also limit the number of tasks you can launch on your EC2 instance. By using ENI trunking, you can increase that amount significantly, depending on the actual instance size and instance type you're using.
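A minimal sketch of opting in to ENI trunking for the account:

```python
import boto3

ecs = boto3.client("ecs")

# Opt the calling IAM principal into ENI trunking so supported EC2 instance types
# can host more awsvpc-mode tasks per instance.
ecs.put_account_setting(name="awsvpcTrunking", value="enabled")

# Or set it as the account default for all principals:
ecs.put_account_setting_default(name="awsvpcTrunking", value="enabled")
```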
Let's go over a sample application. Before we start, I want to make kind of a statement, and that is that most of our customers will not hit the limits we're going to be talking about. ECS has sane defaults which allow you to provision your workload, and again, it's very simple to ask for those quota increases. For most customers, this hypothetical scenario will not really be relevant.
So let's have a look. Say I have a customer that, after profiling their application, doing all their testing, and working through the exercise we described, found that their application uses two vCPUs and two gigabytes of RAM, and the maximum number of concurrent requests that this application can take is 200, no matter how much more CPU you give it, what kind of instance size you use, or anything else. The application will not go over 200.
My first recommendation to the customer would be: optimize your code so that giving it more resources lets it process more requests. That would be the first thing. But for this example, we'll say the customer cannot do that, because of all kinds of legacy reasons or whatever; the code is already as fast as it is going to get.
So in order for them to get to 1 million requests, they would need to run 5,000 tasks in their cluster: simple math. Let's go through the exercise and the limits which we currently have. The default limit for the number of tasks per service in ECS is 5,000, so that should be pretty much OK: 5,000 tasks, each handling 200 requests, gets you to a million requests. Assuming that they have a big enough subnet, not a /16, probably a lot smaller, but still, for the number of ENIs and the backing IP addresses, they should be pretty much fine.
They don't need to worry about provisioning a huge subnet, but if you're using a /24 you're going to hit a really big problem very, very quickly. So you have to have a right-sized subnet and split it, of course, over Availability Zones in order to provide redundancy for your workers. That should be OK. You will, of course, need to ask for vCPU limit increases in your account; that is a simple request. You can't run that many tasks or that many EC2 instances in your account by default, so you need to request a limit increase, and that will need to be done for this account specifically. I didn't go into the numbers because it's not really relevant.
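For reference, the back-of-the-envelope math behind those numbers (the subnet size used for comparison is illustrative):

```python
# Sizing for the hypothetical application in this example.
requests_target = 1_000_000
requests_per_task = 200
vcpu_per_task = 2

tasks_needed = requests_target // requests_per_task   # 5,000 tasks
vcpus_needed = tasks_needed * vcpu_per_task           # 10,000 vCPUs

# Each awsvpc-mode task consumes one ENI / IP address, so the subnets across AZs
# together need well over 5,000 free IPs; a single /24 is nowhere near enough.
ips_in_a_24 = 2 ** (32 - 24) - 5   # AWS reserves 5 addresses per subnet -> 251 usable
print(tasks_needed, vcpus_needed, ips_in_a_24)
```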
We mentioned the limit of 1,000 targets per ALB. That's a soft limit, and with a support request, in this case, I could raise it to 5,000. We'll see why that might not be a good idea in a second, but in theory I could put 5,000 targets into an ALB, and that should solve my problem for this specific application. By default, a VPC has a limit of 5,000 ENIs per Region per account.
That one is in yellow here because I would suggest this customer raise it slightly, because there's probably something else, in another VPC, from another user or another application running in that account, which will probably cause them to hit the limit before they get to 5,000 tasks. That can also be raised with a support request.
The next limit is Cloud Map instances per namespace: how many instances will I need to register in my namespace for my services? Probably 1,000 should be OK. The one underneath is Cloud Map instances per service. If I wanted to put 5,000 instances into my service, I wouldn't be able to, because this is a limit which you will not be able to raise to 5,000.
I would suggest that you look into a new feature which we released that could provide you a solution for this: Amazon ECS Service Connect. It was announced two days ago and was also in the keynote today, and it allows you to do service discovery in a much simpler and easier way, which should probably solve this problem. But for this exercise, we're going to say that it is not possible.
So I've hit a blocker: I can't register more than 1,000 instances in my service in Cloud Map. Which brings me to the topic of cells. Cellular architecture is something which we do not always recommend to our customers, because it has, I would not say significant, but substantial overhead associated with it. You need to do additional engineering work to provide added layers, routing between those layers, and load balancing between those layers, which is not built into the services that we provide today by default.
So you will need to do some wiring and work on your own in order for this to work. But essentially what it does is take your big service and split it into pieces, which allows you to go from 5,000 tasks in one service to, say, five services with 1,000 tasks each. Again, this has operational overhead, because you do need to provide the routing layer in the middle, with a hashing mechanism, for example, to manage the routing of requests between these different cells, and requests have to stay in the right one with some kind of stickiness.
These are things which will have to be done by you in order for this to work, plus some kind of front layer or front-end proxy to route all these things in between, which is pretty simple to do with a load balancer today with some logic in the back end, as sketched below. But this allows you to overcome what we call hard constraints, which in my case, for this example, was the 1,000 instances in a service in Cloud Map.
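A very small, hedged sketch of the kind of routing logic that front layer might implement (the cell endpoints and the number of cells are hypothetical):

```python
import hashlib

# Hypothetical cell endpoints: five services of up to 1,000 tasks each instead of
# one service with 5,000 tasks.
CELLS = [
    "https://cell-0.internal.example.com",
    "https://cell-1.internal.example.com",
    "https://cell-2.internal.example.com",
    "https://cell-3.internal.example.com",
    "https://cell-4.internal.example.com",
]

def cell_for(customer_id: str) -> str:
    """Deterministically pin a customer to one cell (stickiness via hashing)."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return CELLS[int(digest, 16) % len(CELLS)]

# A thin routing layer (for example, a small proxy fleet or a Lambda behind the ALB)
# would call cell_for() and forward the request to the chosen cell.
print(cell_for("customer-42"))
```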
But it could also be throughput to a database, or throughput on a network, which you cannot overcome: the speed of light, or the number of reads or writes a specific database can handle. By going to a cellular architecture, which some of our own services in AWS actually use, you can scale pretty much endlessly.
And one more time, I'm going to say it's not always recommended to do this, because it has some operational overhead, some engineering overhead, and some complexity involved. You need to design this properly, and it's not something that we give you complete, perfect guidance on how to do. There are some white papers on cellular architecture on our builder hub, which you can find on the internet and dive deeper into.
And this would be the way that I would suggest the customer overcome their hard requirement, or rather their hard limit, of 1,000 instances in a Cloud Map service.
We're coming close to the end of the session, so before we finish, I would like to leave you with a couple of resources. The first one is the blog which Abhishek mentioned before, about the improvements we've implemented within Fargate over the years and how this actually works in the background: a very detailed blog post from one of my colleagues, Nathan Peck.
You can also see three documents which make up the best practices guide for ECS; a lot of what I talked about today and what Abhishek talked about today appears in that best practices guide. It's publicly available. We like giving our customers self-service tools and documentation so that they can do the work themselves instead of having to come to us all the time. That's the way we work at Amazon.
The last one is a blog post from one of our community heroes, whom I would like to give a shout-out to. These community heroes are people who contribute to the AWS community; they're not AWS employees. A gentleman by the name of Vlad Ionescu. Unfortunately, he's not at re:Invent, because I really would have liked to shake his hand, but he wrote a very detailed blog post. It's a 30-minute read, but a very, very fascinating 30-minute read, on how Fargate and ECS have improved over time, over the years.
He keeps checking this once a year to see how fast he can launch tasks, how fast things work within ECS, and how long it takes. And the interesting thing in this blog post is that all of these improvements which you see over time, based on the data which he has collected, have been made without you, the customer, having to do anything. It's all work which we've been doing in the background to make things easier and simpler for you, so you don't have to worry about all these kinds of things in the background, such as provisioning EC2 instances and all the other things which we actually take care of for you through the service.
And these are our emails at the top of the screen, and our Twitter handles if you would like to give us a ping.