Dive deep into Amazon ECR

Welcome, everyone. Thank you all for being here during your lunch hour. Rafa and I are super excited for you to be here so we can tell you about our service. My name is Mike Alfree. I'm a principal engineer with AWS. I'm going to turn it over to Rafa; he's going to start us off.

Hi, everybody. Happy to be here too. I am Rafa Bre, one of the PMs on ECR, and like Mike said, we're going to talk a bit more about what the service is, how it works, and some of the inner workings of the service.

To do this today, we're going to think about a fictionalized containerized application and the journey from my laptop to ECR and on to the different services, and how ECR intervenes in each one of those steps. So for the sake of having this conversation, let's think about this application we create. Again, this is fictional, but since we are in Vegas we decided that we wanted to play some poker, and because we're not very good at it, we used gen AI to build a nice card whisperer bot that is going to help us with that.

So the thing here is that because we are modern application developers, we're going to build this in a containerized way. What's on the screen right now is probably very familiar to everybody. I have my code, which again is deceptively simple, and then I have my Dockerfile, which details how the application is going to be built and how the container image is going to be created. Using well-known tools like Docker itself, I can build a container image, and as we all know, the container image contains everything it needs to run. What I have right now is basically a container image on my laptop which I can use to test, to run, et cetera.

What is the next step? The next step, of course, is that I need to get it into my workloads, EKS or ECS. To do this, I need a registry, a tool that is going to help those services retrieve the container image at scale whenever they need it. That's the job of ECR, Elastic Container Registry: we basically support EKS, ECS, or any other container service to be able to pull the images quickly, at scale, in any of the AWS locations.

So let's do that. I have the image on my laptop; we're going to push it to ECR. Now, what's interesting about container images is that there is a very well-known industry standard for how you manage them: OCI. So if I go into ECR right now and create a new repository, I can click that button at the top and it will tell me the OCI commands I need to use to push my image. I tag it, I push it, and then it gets stored in my registry.

So let's do just that. Top left, you can see my terminal: I tagged my image and then I pushed it. As I go and build new versions, I continue to push new images to ECR. For the purposes of this talk, I created a bunch of versions using semantic versioning so I can clearly know when I make major or minor changes. What I have at this point is a series of images, all of them stored in my repository in ECR, available for my workloads to run.
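
As a rough sketch of what that tag-and-push step can look like outside the console, the snippet below uses boto3 to exchange IAM credentials for a temporary registry token and then shells out to the usual Docker commands. The region, repository name, and tag are assumptions made up for the example.

```python
import base64
import subprocess

import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # region/repo/tag are assumptions

# ECR speaks OCI, but auth still goes through AWS: exchange IAM credentials for
# a temporary registry token, then use the normal docker tag/push commands.
auth = ecr.get_authorization_token()["authorizationData"][0]
user, password = base64.b64decode(auth["authorizationToken"]).decode().split(":")
registry = auth["proxyEndpoint"].removeprefix("https://")

subprocess.run(["docker", "login", "-u", user, "--password-stdin", registry],
               input=password.encode(), check=True)
subprocess.run(["docker", "tag", "card-whisperer:3.1.4",
                f"{registry}/card-whisperer:3.1.4"], check=True)
subprocess.run(["docker", "push", f"{registry}/card-whisperer:3.1.4"], check=True)
```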

So how did that work? The interesting thing about ECR, as opposed to maybe other AWS services, is that in this particular case I mostly (not only, but mostly) did not talk directly to the front end. I talked to the proxy service, which interprets OCI commands. So when I do docker push, what's happening is that the proxy service is receiving the command, turning it into an API call that a front-end AWS service can understand, and then working with the rest of our back end: S3 for storage, DynamoDB for metadata. I don't think this is news to anybody.

But that's a detail that makes us a little bit different here and requires us to manage that communication in a way that works for customers. So Mike can tell us a bit more about that. I'll pass the mic.

Thank you, Rafa. On the previous slide you can see our simplified architecture, and I'm going to use that model as we dive in more. But here I want to take a step back and just talk about what a container image is. It's pretty simple: there's a manifest, which is typically a JSON object that describes what's in it (the layers, how many layers there are, what the sizes of the layers are), and then the layers themselves make up what a container image is. You can think of a layer kind of like a file system; the layers get merged, and you run your container from there. So that's, at a high level, what a container image is and what ECR stores. We're going to walk through how the services we saw on the previous slide take this from Rafa's laptop and store it in AWS.

So here you can see inside the manifest. This is an example manifest, Rafa's manifest. You can see it has a schema version, it tells us it's JSON, and it has a digest; we'll get into what a digest is a little bit later. Then there are the layers themselves, and the layers have digests too, along with the sizes of the layers. Container images can be big or small: they can be tens of megabytes, they can be gigabytes or tens of gigabytes, and ECR stores them as efficiently as we can.
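
To make that concrete, here is an illustrative manifest shaped like the one on the slide; the digests and sizes below are placeholders, not Rafa's real values.

```python
import json

# Illustrative Docker/OCI image manifest: a schema version, a media type,
# a config blob, and a list of layers, each identified by a digest and a size.
manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": 7023,
        "digest": "sha256:" + "a" * 64,  # placeholder digest
    },
    "layers": [
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 31_357_624,              # layers range from megabytes to gigabytes
            "digest": "sha256:" + "b" * 64,  # each layer has its own digest
        },
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 1_202_724,
            "digest": "sha256:" + "c" * 64,
        },
    ],
}

print(json.dumps(manifest, indent=2))
```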

Another concept in container images is tags. Like Rafa mentioned, he was using semantic versioning. Tags can also be many-to-one: here's an example of a v3 tag where there are different formats depending on whether you want to accept the latest minor version, the latest patch version, or the very latest version, and you can use any number of these tags. Latest is the default, but you can define anything you want as a tag. Again, ECR has to store and index all that information.
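
One way to see that many-to-one relationship is DescribeImages: every tag that points at the same digest comes back on a single image record. A minimal boto3 sketch, where the region, repository, and tag are assumptions.

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # region/repo/tag are assumptions

# One image (one digest) can carry many tags, e.g. "3", "3.1", "3.1.4", "latest".
resp = ecr.describe_images(
    repositoryName="card-whisperer",
    imageIds=[{"imageTag": "latest"}],
)
for detail in resp["imageDetails"]:
    print(detail["imageDigest"], detail.get("imageTags", []))
```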

So going back to our architecture diagram here: like Rafa mentioned, from his laptop he was using Docker, and Docker talks to our proxy service. Our proxy service is like an HTTP server that understands the OCI protocol. On the flip side, we have our front end service, and that's a pretty typical pattern at AWS: that's what you would talk to with the SDKs or APIs. So we take the request from Docker to our proxy service over HTTP and then forward that to our front end service, calling the same API you could call directly from an SDK if you wanted to.

Behind that we have what we call our metadata service. That's the thing that does all the indexing and storing of the tags and all the metadata; that information we keep in DynamoDB. Then there are the blob objects themselves: the manifest is considered a blob, it's a JSON object, and we keep that in S3, and the large layers are also stored in S3.

So I'm going to pivot the diagram a little bit into more of a sequence diagram so we can walk through the flow of how a call makes it from Rafa's laptop all the way up to ECR.

Like I mentioned, this is a generic slide about how all calls work. Calls go to the proxy service and then the front end, and that allows clients like Docker, Helm, anything OCI compliant, to talk to our proxy service, and proxy will do the translation to front end.

The job of the front end service is three parts, these three things here: authenticate, authorize, and throttle. So we're checking IAM policies, things like that, making sure the token is valid and making sure you can take the actions that the registry is configured to support for your registry and your repos. And a big component of front end is throttling the requests.

We'll get into some of the numbers; actually, I'll just jump to it right now. This is on the push side. ECR processes quite a lot of data a month pushing into AWS: we serve about 200 million pushes per month, which translates to roughly 250,000 just during this one-hour talk, and it's hundreds of petabytes of data. Like I mentioned, throttling is important because all AWS services do this: we set service limits that protect us and customers from interfering with each other, so that you get a fair share of the compute that we run. These are some examples of the service limits that power what is effectively the push API. That push Rafa did on his laptop looked like one step, but under the covers there are actually multiple calls happening from the Docker client to ECR, and these are the names of our APIs. We'll get into what they are in the OCI spec in the next slides, but this is the list of them.

What's interesting about this is it actually starts with the layers first and ends with the manifest. You can think of push as sort of a progress thing: it allows the client, in this case Docker, to push the data to us in a way that if it gets interrupted, it can resume. So it's sending up all the layers first and then it completes with the manifest, and that's what tells us the image is now an image.

Starting with the first one: on our side we call this check layer availability. The Docker client wants to know if we have the layer. If we have it, it doesn't want to send it again, because that would just waste bandwidth and a lot of extra processing. So it makes this call to us to say, hey, do you have it? The way it does that is an HTTP HEAD request from Docker to our proxy service. Our proxy service then calls the API BatchCheckLayerAvailability. It's called batch, but there's only one item in it when it comes from the proxy service; as a customer, you can actually put multiple things in it and call it directly if you want to, but Docker talking to proxy will only send one. The front end service then forwards the call to the metadata service, and the metadata service checks DynamoDB to see if we have that digest. Digests are unique and immutable, so they can't change. So if we have it in the database we can tell the client we do, or that we don't: 200 means we found it, we have it; 404 means not found, we don't have it. Just typical HTTP protocol stuff.

So that layer there that you see, the HEAD and then the responses, that's the OCI spec; everything else is the inner workings of ECR.
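
Here's roughly what that step looks like if you call the API yourself with boto3 instead of letting Docker's HEAD request do it; the region, repository name, and digest are placeholders.

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # region/repo/digest are placeholders

# Ask ECR whether it already has these layer blobs. Docker's HEAD request ends
# up as a single-digest call like this; calling directly, you can batch several.
resp = ecr.batch_check_layer_availability(
    repositoryName="card-whisperer",
    layerDigests=["sha256:" + "b" * 64],
)
for layer in resp["layers"]:
    print(layer["layerDigest"], layer["layerAvailability"])  # AVAILABLE or UNAVAILABLE
for failure in resp["failures"]:
    print(failure["layerDigest"], failure["failureCode"])
```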

Now on to the next step: if Docker is told we don't have it, a 404, it's going to initiate the first layer upload to us. What's interesting is most clients will do these in parallel, so if there are 20 layers, it'll do up to five at a time in parallel. We'll look at it as if one is happening, but keep in mind there could be multiple happening in parallel here.

In this case, Docker will send a POST request to tell us, hey, I want to create a blob inside ECR. It sends us a POST, we translate that to our InitiateLayerUpload API and send that off to the metadata service. In this case, instead of DynamoDB, we're going to prepare S3 for the blob data. So we're telling S3 kind of the same thing: hey, get ready, we're going to upload a blob to you. At this point there are no bytes in the request; it's just telling us they're about to send bytes. Like I mentioned before, there's sort of a progress tracking thing happening, so we give the client an upload ID, and then they use that upload ID to send bytes to us.
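
The equivalent API call through boto3 looks like this; note that no layer bytes are sent yet, you just get back the upload ID and the part size ECR wants. The region and repository are assumptions.

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # region/repo are assumptions

# Prepare ECR (and, behind the scenes, S3) for a new blob. No bytes yet.
resp = ecr.initiate_layer_upload(repositoryName="card-whisperer")
upload_id = resp["uploadId"]   # correlates the parts that follow
part_size = resp["partSize"]   # the part size ECR expects, in bytes
print(upload_id, part_size)
```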

The next step is to send us the bytes, and we call that UploadLayerPart. So now the client has a potentially 10 gigabyte layer they want to send us.

What Docker will do is initiate a PATCH request to this URL, using that upload ID, to our proxy service. That's a single HTTP connection over which it will stream all 10 gigabytes to us. What we do between proxy and front end is break that up into parts, and parts are typically around 20 megabytes. So the proxy service will hold that connection open to Docker and send the bytes up in 20 megabyte chunks. It does that in sequence; according to the spec, that's kind of how it has to work.

So it's sending each part up to front end, and here's where we get into what the digest is. The digest is a cryptographic hash, typically SHA-256; you can see the digest there at the bottom. This is what we calculate to know that the bytes we're receiving are what the client intended to send us. They calculate it on their side and they'll send us the digest later in the manifest, but while they're sending us the layer data, we have to compute the digest.

We're doing this digest calculation for each part as it happens, meaning it's a partial calculation: we don't know the full digest yet, because we've only received the first part and then we'll receive the second. So we have to store that progress in Dynamo. This is running inside a cluster, sometimes upwards of 300 hosts, so requests could be bouncing around between hosts. We have to store that progress, pull it back, and continue the calculation until we've received all the parts, and then we know the final digest.
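
A client-side sketch of the same idea: stream a layer up in parts while computing the SHA-256 incrementally, which mirrors the partial digest calculation described above (on the service side the partial state lives in Dynamo rather than a local variable). The file name and repository are assumptions.

```python
import hashlib

import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # region/repo/file are assumptions

# Start a new upload, then send the layer in parts while updating the digest.
resp = ecr.initiate_layer_upload(repositoryName="card-whisperer")
upload_id, part_size = resp["uploadId"], resp["partSize"]

digest = hashlib.sha256()
first_byte = 0
with open("layer.tar.gz", "rb") as f:              # hypothetical layer tarball
    while chunk := f.read(part_size):
        digest.update(chunk)                        # partial digest so far
        ecr.upload_layer_part(
            repositoryName="card-whisperer",
            uploadId=upload_id,
            partFirstByte=first_byte,
            partLastByte=first_byte + len(chunk) - 1,
            layerPartBlob=chunk,
        )
        first_byte += len(chunk)

layer_digest = "sha256:" + digest.hexdigest()       # final digest, known only at the end
print(layer_digest)
```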

And then we upload the parts to S3 as well; of course, we've got to get the bytes to S3. We use S3 for blob storage, it's an efficient blob store, and we'll get into how that's beneficial on the pull side. But right now, we put all the bytes in S3 and let the client know we've created the layer. There are other parts of the spec we're not going to get into here: if the upload got interrupted, the client can resume, or they could start over if they wanted to. There's a whole bunch of ways to resume things, but for this we'll keep it simple and move on to the next step. Assuming everything went well, we now have the data, but we don't actually know they're done.

So the next step is for the client to tell us: we're done, we've sent you all the bytes, and we don't intend to resume and send any more. The way they do that is this CompleteLayerUpload call. Again, it just flows through: it's a PUT request, translated into an API request, all the way up to S3, and we tell S3 we have a full object. Then the manifest, the JSON blob, is stored in S3 like I mentioned, and now we have a container image. So we return 201, telling the Docker client we've created the image and it's now ready to be served.
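
Continuing the sketch above: once all the parts are up, the client seals the blob with CompleteLayerUpload and then pushes the manifest itself with PutImage, which is the point at which the uploaded blobs become a pullable image. The upload ID, digest, manifest file, and tag below are placeholders.

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # all identifiers below are placeholders

upload_id = "example-upload-id"                      # from InitiateLayerUpload
layer_digest = "sha256:" + "b" * 64                  # computed while uploading the parts

# Tell ECR no more parts are coming; ECR verifies the digest and seals the blob.
ecr.complete_layer_upload(
    repositoryName="card-whisperer",
    uploadId=upload_id,
    layerDigests=[layer_digest],
)

# Finally push the manifest (the JSON blob); this is what makes it an image.
with open("manifest.json") as f:                     # hypothetical manifest file
    ecr.put_image(
        repositoryName="card-whisperer",
        imageManifest=f.read(),
        imageTag="3.1.4",
    )
```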

Now, back to Rafa. Cool. When Mike and I were preparing these slides, one of the things that came up very clearly is that the theme of the presentation is that everything is more complex than it seems: on one side because of the scale, and Mike showed some numbers, but in general because all of these processes are more involved than they look, beyond just push and pull.

And this is another example. This is a fun slide because it's basically just: stuff goes from ECR to ECS. The reason we're doing this, obviously, is because if you go back to our poker-playing bot, we managed to get the image into ECR and now we need to deploy it. So we're going to get the image from ECR to ECS, and to do so we just create a task definition where we include, you can see it in the second half, the image URL, and we have to manage permissions to make sure that ECS has access to it. But this just works. Our image is now running; I triggered it and then we scaled it a little bit. We can see that whenever ECS needs it, it can go and fetch the image at will; it doesn't need me to be there, it doesn't need me to do anything. This is again a deceptively simple process.

One of the things that is interesting in this one is that, as opposed to the previous case where we talked about how we work through the proxy, the layer data itself is not going through ECR at all. It flows directly to the customer, in this particular case to containerd, which is the runtime for ECS, but it doesn't go through ECR. So once again we have a bit of a twist on the regular mode of operation that we need in order to support the scale and type of workloads that we deliver.

So let's talk a bit about what's going on here. Some more numbers. We looked at push, which was around 200 million a month; now we're in the billions. Pull is somewhere around 200 times the amount of traffic we receive on the push side. Typically, customers will push an image one time, but then they might have a big fleet, a big ECS cluster or EKS cluster, where each of those nodes will get a copy of the image. So we have a very big difference between push and pull.

Every time I change the slide, roughly every 15 seconds, ECR has served another 1.5 million images globally; that's how many images we serve somewhere. Similar to push, pull has multiple steps, though it's a little simpler: there are only three steps. And as you'll notice, it kind of flips the order: now it's going to get the manifest first and then get the layers second. Again, when you say docker pull, it's actually three steps, and actually more, because the layer step happens per layer.

The first step is getting the manifest. Similar to push, but now everything is on the GET side, so it's making a GET request for the manifest. You can use your tags if you want to look up by tag, or if you know the digest, you can use the digest. Tags are more friendly, you kind of know what they are; digests are those long strings.

ECR will do the lookup either way and find it in our DynamoDB table. In this case, since it's the manifest, we'll go to S3 to get that JSON blob and then return it back to Docker. Then Docker, or in Rafa's case containerd, can read the manifest and know all the layers it now needs to ask ECR for in the next step.
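
The API version of that first pull step is BatchGetImage, which returns the manifest JSON. A minimal sketch, with the region, repository, and tag made up for the example.

```python
import json

import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # region/repo/tag are assumptions

# Fetch the manifest by tag (you could pass {"imageDigest": ...} instead).
resp = ecr.batch_get_image(
    repositoryName="card-whisperer",
    imageIds=[{"imageTag": "3.1.4"}],
)
manifest = json.loads(resp["images"][0]["imageManifest"])  # the JSON blob from S3

# The layer digests in the manifest are what the client asks for next.
for layer in manifest.get("layers", []):
    print(layer["digest"], layer["size"])
```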

Like I mentioned, the next step is what we call GetDownloadUrlForLayer; OCI calls it get blob by digest. Because Docker has the manifest, the digests are in there, so it reads that JSON object, pulls out a digest, and makes a request back to us at this endpoint. At this point it's not actually going to get the bytes; it's going to get a URL from us, basically where we've stored the layer in S3.

We put it in S3 during push, and that location is stored in Dynamo; you can see here this step doesn't really interact with S3 much. We have the layer location in Dynamo, and what we're doing is sending a redirect back to the client, which in this case is containerd, to tell it: don't come to us to get it, go to S3 to get it, and here's where you can find it. That allows us to offload a lot of the data flow; it doesn't have to flow through our services. S3 is kind of built for this, so we offload that blob data download to S3.

Another interesting thing here is that because it's our S3 bucket, we can't permission the world to access it. So what we do is presign a URL, meaning the Docker client is going to get a URL it can access that is short-lived, expires, and is only for that image. It's part of the job of the front end service and the metadata service here to generate that URL and send it back to the client as a redirect.

So we send a URL back to the client and say, go here to get it. When it does that, it goes right to S3, skipping all the services that we have. It's basically a simple curl or a simple HTTP GET and you can get the layer data. So you could build your own client that works like this and just download directly using curl or something from the command line.
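
Here's what that "build your own client" idea can look like: ask ECR for the presigned URL, then fetch the bytes straight from S3 with a plain HTTP GET. The repository, digest, and output file name are placeholders.

```python
import urllib.request

import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # region/repo/digest are placeholders

# Ask ECR where the blob lives; the answer is a short-lived presigned S3 URL.
resp = ecr.get_download_url_for_layer(
    repositoryName="card-whisperer",
    layerDigest="sha256:" + "b" * 64,
)

# The bytes come straight from S3, not through ECR's services.
with urllib.request.urlopen(resp["downloadUrl"]) as r, open("layer.tar.gz", "wb") as out:
    out.write(r.read())
```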

And again, the reason we do this: like I mentioned, it's about 70 billion pulls per month, and that translates to over three exabytes of data pulled from ECR. That's how much data we serve from ECR in a month, which works out to about four petabytes during this one hour that ECR is serving globally. I put a little graphic up here: that's about 150 Libraries of Congress worth of data that ECR serves a month in container images.

Alright, cool. So we've talked about push, we've talked about pull, basically the two core things that we have to do. We'll keep on the theme of it being a bit more complex than that, because there are more things we can do with that storage. From now on, we're going to talk a bit about how the other features in ECR work and what the architecture looks like behind the scenes.

And we start with this. Our image is doing very well, people are happy playing with our bot, and we're starting to see people deploy it to other regions, which is not the greatest: it means that there is some latency.

We're also having to pay for data transfer between those regions, and if we are scaling those pulls, that can add up. So we need to find a mechanism to avoid this, and we're going to do that with replication.

Replication allows us to select a particular ECR repository and indicate that all of the images pushed to it are going to be automatically replicated to other repositories in other accounts or other regions.

So again, showing how it works in ECR: I've now created a handful of different repositories. Let's imagine that we've kept working on this, it's doing very well, so we have additional workloads here. We don't want to replicate everything.

What I do in my replication rule is specify which repositories I want replicated and to which other regions. The way it works is with a prefix filter: I'm basically telling ECR that for any repository that starts with card whisperer, any new image has to be automatically replicated to those two regions. ECR will do that from that point on for any image in those repositories.

If there is no repository in the target region, it will create it for me. It will just keep happening from that point on, and that ensures that I will have my images locally.
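
Expressed through the API, that rule looks roughly like this: a prefix filter on the repository name plus a list of destinations. The account ID and regions here are made up for the example.

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # source region; all values are examples

# Replicate any repository whose name starts with "card-whisperer" to two other
# regions in the same account (registryId is the destination account ID).
ecr.put_replication_configuration(
    replicationConfiguration={
        "rules": [
            {
                "destinations": [
                    {"region": "us-west-2", "registryId": "111122223333"},
                    {"region": "eu-west-1", "registryId": "111122223333"},
                ],
                "repositoryFilters": [
                    {"filter": "card-whisperer", "filterType": "PREFIX_MATCH"}
                ],
            }
        ]
    }
)
```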

Now, this is not a live demo, but it's a screenshot of a demo. I did this on my laptop a few days ago, and you can see that I pushed an image to my main repository in one of the regions and, I think, eight seconds later it was live in one of the others.

Once again, we're going to go back to the notion of complexity behind the scenes. Eight seconds sounds awesome, but it might not always be eight seconds; we might have to manage that workload. So we'll see how that works now.

Going back to our architecture diagram, it's a very similar flow here: Rafa's laptop pushing up through the proxy, a normal push flow. This is the same diagram we looked at in the beginning. The only difference is we're adding in a bit of asynchronous processing; push and pull are synchronous operations.

The client is waiting for our response, in this case on the push, but we want to take actions based on that push, and we don't want to hold the line up while it replicates around the globe. Like Rafa said, in this case it took eight seconds, but there might be more targets, there might be multiple places.

So this is all done asynchronously. The way we do that is we basically listen to our DynamoDB table to know when the push happens, and that generates events that we put on an SQS queue. Then, using that queue, a replication service built on Lambda has workers that pull off those events and act on them based on those rules.

Those are the rules Rafa showed earlier, which tell it where it needs to replicate to. In this case, we're sending it from one source region to a destination region.

Now, like Rafa mentioned, this can be a time-consuming process; it's relying on the globe and the physical links between regions. So we have a nice internal service, call it transport, that is not a public service; it's something we can use internally, and it allows us to move data efficiently between regions.

We tell it: take from this S3 bucket and move it to that S3 bucket, and it can move that data efficiently and as quickly as we can based on the channel capacity. We'll get into channel capacity in a second.

The other thing that's happening here: that gets the blobs across from region to region, but we also need to get the metadata across. The way we do that is our replication service just calls our front end in the other region and tells it about the tags, the digests, the locations, all that stuff, and stores it in the destination region.

With replication, there are other APIs you can call in our SDK and CLI: you can check the status to see if replication is complete or not, and you can also change the rules or create new rules using our config APIs. That would happen from your laptop or wherever, through our console or through an API.
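
For example, checking the status per destination can be done with DescribeImageReplicationStatus; the repository and tag here are assumptions.

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # region/repo/tag are assumptions

# Check whether a pushed image has finished replicating to each destination.
resp = ecr.describe_image_replication_status(
    repositoryName="card-whisperer",
    imageId={"imageTag": "3.1.4"},
)
for status in resp["replicationStatuses"]:
    print(status["region"], status["status"])  # e.g. IN_PROGRESS, COMPLETE, FAILED
```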

With replication, like I mentioned, you can send it to multiple places, up to 25 different targets. If you think about this screen, from one source we could have 25 destinations, and those destinations can be a mix of different regions, or if you want to stay in the same region, you could go to multiple accounts. You might be using multiple accounts for different reasons but want the same image in all of them; you can set up any combination up to 25 in your replication rules.

Now, I mentioned channels. Channels are the concept we use to refer to the link between one region and another, and there's a channel capacity that transport gives us: it tells us you can use up to 10 gigabytes per second, or whatever it is, and it can vary by region.

It depends on how much capacity there is between one region and another. A further-away region might have slightly less capacity, and networking events can reduce capacity: if a physical link goes down, someone cuts it underwater, something happens, that capacity has to be reduced.

They can tell us that dynamically, and when they tell us, we back off. If we don't back off, we can overload the entire AWS backbone; we would send more data than it could handle.

So we're in a constant back and forth, making sure we can get the bytes across efficiently. We want to get it there as fast as we can, but we also don't want to overwhelm the network.

That's why it can take eight seconds sometimes, or sometimes it could take 20 minutes; it's variable, and we try to get there as fast as we can. It also depends on the image size itself, obviously: a 10 megabyte image will go a lot faster than a 10 gigabyte image.

So there's just some variability there, but keep in mind we're trying to avoid overwhelming the network, and we do a lot on our side to balance and scale that.

And like we've seen with the numbers, ECR is pretty heavily used. One of our main mottos is that availability is job one, beyond security, of course: we want to be secure, but after that, availability. We want to be up, we want to meet our nines, we want to serve data, because if we can't serve data, that means you can't run your Fargate task or your EKS pod or your container.

So we want to make sure we're up so that we can serve data, and this is one way we do that: by making sure we don't harm the network.

Cool. One more: I don't think I need to tell anybody here to scan your images, but if I have to, please scan your images. They package up your entire workload, and you want to make sure there are no vulnerabilities, or at least you want to be aware they are there in case you want to do something about them.

ECR has two modes of scanning, basic and enhanced. They have many similarities: they both allow you to scan an image when it's pushed, which we'll talk about more in a minute, and you can also trigger scans manually.

Enhanced also allows you to trigger continuous scanning, so you can maybe continuously scan the repositories that go to production but not the ones you're using for development. And they use different scanning engines: basic uses Clair, which is an open source scanner with a vulnerability database that we maintain, and enhanced uses Amazon Inspector, which is another AWS service.

It has broader coverage and more sources; it's more complete. Once again showing how it works: there is a console experience, and of course you can do this through the API, where you choose what type of scanning you want to do.

Similarly to what we did with replication, you can create rules about which repositories you want scanned. In this particular case, once again, I'm saying: scan the repositories whose names contain card whisperer, because those are the ones I'm considering critical to production, and we're going to scan them continuously.

I'm just going to scan on push for everything else, so whenever a new image is uploaded, I get a chance to see what's going on with it. Just for the sake of the demo, we can see that this particular image is not great, but I have my results, I can start to work on them, I can make some changes.
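
Configured through the API, those two rules look roughly like this: enhanced scanning, continuous for the card whisperer repos, scan on push for everything else. The filters and region are illustrative.

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # region and filters are illustrative

# Enhanced scanning: continuously scan the production card-whisperer repos,
# scan on push for everything else.
ecr.put_registry_scanning_configuration(
    scanType="ENHANCED",
    rules=[
        {
            "scanFrequency": "CONTINUOUS_SCAN",
            "repositoryFilters": [
                {"filter": "card-whisperer*", "filterType": "WILDCARD"}
            ],
        },
        {
            "scanFrequency": "SCAN_ON_PUSH",
            "repositoryFilters": [{"filter": "*", "filterType": "WILDCARD"}],
        },
    ],
)
```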

So how does this work? It's a variation on the previous diagram, because if you think about what just happened, it's very similar to the replication story: I push an image to the repository and then something has to happen. So this is pretty much the same diagram, I think exactly the same diagram, that Mike showed before.

What's happening under the hood is a bit different. We now have a different service, a scanning service built on Lambda, which is waiting for an event to trigger it: a new image has been pushed. Part of the job of that scanning service is to redirect the scan to the appropriate engine, whether that is the Clair engine that we manage or Amazon Inspector.

There's a bit more complexity here, because if we send the image to Amazon Inspector and you've configured continuous scanning, we rely on Inspector to continuously trigger new scans as well. In both cases, the output of the scan then gets redirected to our metadata service, and from there customers can see the results of the scan, take action on them if they were based on Inspector, or at least analyze what's going on.

It's also the mechanism that customers will use to trigger changes in the scanning configuration like they did before.

Now I want to talk about another feature, which we call lifecycle policies. You can see here in our card whisperer repo, maybe our pipeline has been running, producing lots of images, and we've versioned it a bunch of times. Now we have over 7,000 images in this repository. You probably don't want to keep them all; it's doubtful you're running all 7,000 of them.

So ECR has a feature called lifecycle policies that allows you to define a policy that tells us when to clean images up. Here's a pretty simple one: it says if an image was pushed more than 30 days ago, go ahead and delete it. There are no other conditions here.

If you put this in place as is, it'll just delete it: if it's 31 days old, it's gone, whether you're using it or not. So this is a pretty simple one; don't use it verbatim, but for demo purposes we'll keep it simple. This is the lifecycle policy, that's what we call it.
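
The 30-day policy from the slide, written out as lifecycle policy JSON and applied with boto3. As the talk warns, don't use it verbatim: it expires images whether or not they're still in use. The repository name is a placeholder.

```python
import json

import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # region/repo are placeholders

# "If it was pushed more than 30 days ago, expire it" and nothing else.
policy = {
    "rules": [
        {
            "rulePriority": 1,
            "description": "Expire anything pushed more than 30 days ago",
            "selection": {
                "tagStatus": "any",
                "countType": "sinceImagePushed",
                "countUnit": "days",
                "countNumber": 30,
            },
            "action": {"type": "expire"},
        }
    ]
}

ecr.put_lifecycle_policy(
    repositoryName="card-whisperer",
    lifecyclePolicyText=json.dumps(policy),
)
```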

Now, how does that work?

This one is asynchronous too, but it's not based on a push event. Behind the scenes, we're continuously running jobs that take the policies: the lifecycle policy service will look through the list of policies and send them off to a worker. The worker will take a policy, look at the repo, see what the image ages are, try to find candidates to be deleted, and call our BatchDeleteImage API, usually with five or ten images per batch, to delete them as quickly and efficiently as we can.

One of the interesting things that happens with BatchDeleteImage is that it doesn't actually delete instantaneously. If you were to call it and then ask us if the image is deleted, we would say yes, but under the covers we're doing more of a mark and sweep pattern. We mark it for deletion, so you'll think it's deleted, but we actually haven't physically deleted it yet in S3 or even Dynamo; we've just marked it and said this needs to be cleaned up later. Then we have background processes that are continuously running as well that come through and clean up those old images, and not just the ones that went through lifecycle policies. If you call BatchDeleteImage yourself, same thing: they'll all get cleaned up at some point behind the scenes.
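
Calling the same API yourself looks like this, deleting by digest in a small batch, which is roughly how the lifecycle workers use it; the repository and digests are placeholders.

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # region/repo/digests are placeholders

# Delete (really: mark for deletion) a small batch of images by digest.
resp = ecr.batch_delete_image(
    repositoryName="card-whisperer",
    imageIds=[
        {"imageDigest": "sha256:" + "d" * 64},
        {"imageDigest": "sha256:" + "e" * 64},
    ],
)
print(resp["imageIds"], resp["failures"])
```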

From the time you call BatchDeleteImage and it's marked deleted is when we stop metering for it. So while we keep it around in the background, we're not charging you for that; that's on us, to keep it there for a sort of grace period in case we need to recover it or something goes wrong. Otherwise, it's conceptually deleted, or expired, you might say. Again, just some numbers.

This is just through lifecycle policies. We do serve deletes directly too, but through lifecycle policies we delete, or expire, over 60 million images per month. There are two main benefits to this for customers. One, obviously, the more you clean up, the less storage you use and the cheaper ECR is. If those 7,000 images are one-gigabyte images, that would be 7,000 gigabytes, a lot of data to keep around. If you expire them, you won't be charged for them.

Also, like Rafa mentioned, images accumulate vulnerabilities as they get older, and you don't want to accidentally run an old one that exposes you to some vulnerability. Expiring old images helps ensure you don't run old ones that might have vulnerabilities in them. So those are some of the benefits of our lifecycle policies.

One more, and this one is near to my heart: I spend a lot of time working on ECR Public, and this is about public images. I'm glad with how this is going when I talk to customers, but in general, the rule is what it says at the top of the slide: take ownership of the public images.

If you go all the way back to the second slide in the presentation, I showed you my Dockerfile and it looked like this: it's pulling from ECR Public. Now, I'm here to tell you that ECR Public is available, secure, and scalable, but still, don't pull directly from a public registry into your pipelines or your workloads. Pull through private. Once the image is in your private registry, you can do things to it: you can scan it, you can expire it after a certain period of time, you can replicate it as needed, and you are not taking a dependency on the upstream content being available, whether the upstream registry fails or the creator of the image makes a change that you are not happy with.

To do that, we have a feature called pull through cache. What pull through cache allows you to do is map an entire upstream registry to a particular namespace in your ECR registry. What that means is that if you try to pull an image using that particular namespace, ECR will try to get it from upstream for you and then keep a local copy that you now manage. We can see this here: I changed my Dockerfile, and it's exactly the same content, just the FROM line is now different. It still ends in python:3.12, but everything else is different, and the process is exactly the same. The build is the same; Docker doesn't know the difference. It's all ECR doing the job behind the scenes.

So how does this work? It's yet again another small iteration on what we have here. I'm making the pull, I'm calling proxy, proxy is calling front end, but at this point the mirroring service intervenes and says: hold on, this is using one of the mirror namespaces, so I need to check. What it does is first evaluate whether or not the image is available in ECR. It also evaluates whether the image was last pulled from upstream a long time ago; it varies, but at a minimum 24 hours. If the image is missing or stale, it goes upstream, finds the image in the upstream registry, redirects you to it, and then pulls the image to keep a local copy. If not, it's served from ECR. We try to minimize latency: we're not going to keep the customer waiting for the upstream image to go to ECR and then go to you. But once you've done that, once the image is locally available, it can again be scanned, replicated, whatever you need to do with it. And that's pretty much it.
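
Creating the namespace mapping itself is one API call. A minimal sketch for an ECR Public upstream, where the prefix name and region are assumptions; the prefix is what ends up in the FROM line.

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # region and prefix are assumptions

# Map the "ecr-public" namespace in this registry to ECR Public upstream.
# For example, pulling <account>.dkr.ecr.us-east-1.amazonaws.com/ecr-public/docker/library/python:3.12
# would fetch from public.ecr.aws on first use and be cached locally after that.
ecr.create_pull_through_cache_rule(
    ecrRepositoryPrefix="ecr-public",
    upstreamRegistryUrl="public.ecr.aws",
)
```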

The last slides we have are to talk about some recent and upcoming features. We've mentioned a lot of things that ECR does; we wanted to cover some of the things that are changing recently.

Starting with pull through cache, since we just talked about it: up until basically last week, pull through cache supported ECR Public and Quay.io as upstream registries. We now support additional registries that require authentication, including Docker Hub, GitHub Container Registry, and Azure Container Registry. You can store your credentials in Secrets Manager, and ECR will use those to authenticate to the upstream registry on your behalf and pull those images for you.
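
For the authenticated upstreams, the rule also takes a Secrets Manager secret; here is a sketch for Docker Hub, where the secret ARN, prefix, and region are placeholders (the secret would hold the upstream username and access token).

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")  # ARN, prefix, and region are placeholders

# Map the "docker-hub" namespace to Docker Hub and authenticate upstream using
# credentials stored in Secrets Manager.
ecr.create_pull_through_cache_rule(
    ecrRepositoryPrefix="docker-hub",
    upstreamRegistryUrl="registry-1.docker.io",
    credentialArn="arn:aws:secretsmanager:us-east-1:111122223333:secret:ecr-pullthroughcache/docker-hub-example",
)
```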

For scanning, we're going to be working throughout next year on enhancing the capabilities of basic scanning, covering more operating systems and delivering more support for customers. Just recently, we released support for AL2023, one of the Amazon Linux operating systems, in basic scanning. As for lifecycle policies, at the core of lifecycle policies is the need to make sure that you control exactly what is deleted. We want customers to be able to choose very granularly which images are deleted; it's a bad experience if you delete something you didn't want to. So we're adding a bunch of things: wildcard support is close, and it will allow customers to choose more carefully which tags are matched by a lifecycle policy and thus which images get deleted.

We're also working on a last recorded pull time selector. This is something that frequently comes up from customers because it gives you an additional level of security, or at least confidence, that you are not going to be deleting an image that is in use. And there are lots of exciting core registry features.

First, repository creation templates. You heard me say it before: when you replicate an image cross-region or pull an image through pull through cache, if the local repository doesn't exist, it will be created for you. That can result in a repository that is not configured like the other ones; it might not have the encryption, the type of scanning, or the lifecycle policies that you want. Repository creation templates allow you to configure those by default for those new repositories. They're in preview right now, working for authenticated pull through cache, and we're going to be working on extending the functionality to replication as well.

OCI reference types: I don't know how closely people are following this, but of course ECR implements the OCI specification, and the OCI community has been working for a while on version 1.1 of the specification, which includes reference types. We are working with them, we are keeping track of it, and as soon as the specification is finalized and released, we'll make it available in ECR shortly after. Create on push: that's another thing that frequently comes up from customers, and other registries that you might be thinking of support this. If you try to push an image to a repository that doesn't exist, it will be automatically created for you. That's also something we are working on. And finally, performance.

Every time I talk to a customer, I ask them about performance: are you happy with ECR? Is it moving fast enough for you? Honestly, I don't get a lot of answers beyond "it's fine," but you know what? Everybody wants things to move faster, so we're working on that too. We recently released a change that increased pull performance by 2x, and we have a bunch of things on the roadmap for next year that are going to increase push and pull performance by between 2x and 4x.

We're making changes to things under the hood that are going to speed this up. And finally, filtering support for replication is also on the roadmap. Same story as with lifecycle policies: the more ways we can give customers granular controls to choose very carefully what it is they want to replicate, the better it is for the customer, and that includes inclusion and exclusion lists. This is for the particular use case where you might have a repository that you want replicated, but you don't want a handful of the tags to be replicated. Maybe latest is a rolling tag that you're using just to push stuff, but it's never the one that makes it into the deployment pipeline, so you don't want to replicate it. That will save you some money, and it will save you from having to manage it in all of the downstream repositories.

So those are all things that are on our roadmap. This is not everything, and you can imagine that the roadmap is in flux; we'll continue changing things based on customer feedback. But we're focused on these things for now, and we're planning to deliver some of this next year.

That's it. We have some time, so anybody can ask questions out loud, or just catch us on the side and ask any other questions that you want.
