Advanced AWS CDK: Lessons learned from 4 years of use

Welcome, everybody. My name is Matt Bonig. I am the Chief Cloud Architect at Defiance Digital. We are an MSP based out of Charlotte, North Carolina. Our job is to help small and medium businesses, startups, things like that, who are just getting started on their AWS journey. We help them manage their accounts and help them with architectures and things like that.

Today, I'm going to try to distill down for you the best practices and the lessons I've learned for building good CDK applications over my last four years of using it.

There's a pretty packed agenda; we're going to be talking about a lot of stuff today. As you can see here, there is a little camera icon in the lower right. (The laser pointer is not going to work today.) If I think a slide is probably important and useful, I put that icon up there for you to take a picture. A lot of you already have cell phones out. The slides are going to move really fast, though, so if you're the kind of person who likes to take pictures, make sure your phone is ready at the very end.

We're going to have a Q&A, so if there's anything I wasn't clear about, save that for the end and we'll get it answered there.

As a backdrop to everything I'm going to be talking about today, we're going to look at building this application. It's a basic ECS-based website that is multi-region: there is a us-east-1 deployment and a us-west-2 deployment. There's some extra stuff in there as well, like a little serverless media transcoder. And at the very top there is a Route 53 latency-based record, so that whether you're here or over on the east coast, you're always hitting the closest servers.

This is demo-ware, though, so we aren't going to talk much about the actual functionality. It's just going to serve as an example for a lot of different things through the talk.

So, some basics to get us started: every CDK app has stacks, stages, and constructs. And when you get those things all built, you synthesize them into CloudFormation.

The whole thing altogether makes up an app. Everything that gets deployed into one of the two regions is a stage; everything deployed into the regions is redundant, copied back and forth, so it's basically the same stage in both regions.

A stage is made up of multiple stacks. So there's a stack for the website: the ALB, the bucket, and the Route 53 record that's specific to that region. There's a stack for the DynamoDB tables. And then there is a stack for the little media transcoder, and inside that stack I've also built a little construct that is specifically for doing that transcoding work.

Once I have all this built, I pass it through a synthesizing routine and I get CloudFormation. And because we have both a dev and a prod environment, and because I'm going to two regions, there are actually going to be four sets of CloudFormation templates, with three or four templates per instance. So a lot of templates get generated out of this code.
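
To make that structure concrete, here's a minimal sketch of how those pieces nest. The stage and stack names mirror the diagram, but their contents are my own illustration, and in the real app the stages get added through the pipeline rather than directly, as we'll see shortly.

```ts
import { App, Stage, StageProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';

// One stage = everything deployed into one region.
class ApplicationStage extends Stage {
  constructor(scope: Construct, id: string, props: StageProps) {
    super(scope, id, props);
    new WebsiteStack(this, 'Website');        // ALB, ECS service, bucket, regional DNS record
    new DatabaseStack(this, 'Database');      // DynamoDB tables
    new TranscoderStack(this, 'Transcoder');  // holds the transcoder construct
  }
}

const app = new App();
new ApplicationStage(app, 'Primary', { env: { account: '111111111111', region: 'us-east-1' } });
new ApplicationStage(app, 'Secondary', { env: { account: '111111111111', region: 'us-west-2' } });
app.synth(); // emits one set of templates per stage instance
```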

Now, to get started building all of this, I use projen. Projen is a construct-based API just like the CDK, built by a lot of the same people who built the CDK. But whereas the CDK turns your constructs and the code that you write into CloudFormation resources, projen turns the things that you write into files in your repository. So it makes it really easy to have a templated code base that you can share and reuse. Think of something like Yeoman, except where those tools are one-and-done, projen is something you can use and maintain over time. I like using it for all of my projects, and I'll bring it up a couple of times here.

Based on the CDK community survey that I run every year, not a lot of people actually use projen. I think that's really unfortunate, because it provides a tremendous amount of value for all the same reasons the CDK provides a lot of value to you. So I highly recommend you check it out.

Getting a new project started is as simple as running this command. It replaces the cdk init command you may already be familiar with. In this case, I'm using npx to execute the projen CLI, and I'm saying that I just want to create a new TypeScript-based CDK app. Out of that, I get a whole bunch of generated files: .gitignore, GitHub Actions workflows, sample files, tests, all sorts of stuff built for me.
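
For reference, here's roughly what that looks like: the command, plus the .projenrc.ts it generates and that you maintain going forward. The project name and CDK version are illustrative.

```ts
// Run once: npx projen new awscdk-app-ts
// That generates a .projenrc.ts along these lines:
import { awscdk } from 'projen';

const project = new awscdk.AwsCdkTypeScriptApp({
  cdkVersion: '2.100.0',        // illustrative
  defaultReleaseBranch: 'main',
  name: 'reinvent-demo',        // illustrative
});
project.synth(); // writes .gitignore, workflows, tsconfig, sample tests, etc.
```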

Now, before I actually start writing any application code, I'm probably going to start by writing a CI/CD pipeline for it. If I take a look at my main application, this is actually the only stack I'm going to see directly within the app itself: my pipeline stack. I deploy that out to a management account in my primary region, give it a couple of values, and then within that stack I start creating the CodePipeline construct. Now, don't confuse this with the CodePipeline service and the construct that's part of that service. Although they are named the same, they're different. This is a very purpose-built construct for building and deploying your CDK application to an environment. And yes, it does actually use CodePipeline behind the scenes. Unfortunately, naming things is very hard, so there's a lot of confusion around that.

I just tell it: here's how you synthesize my app. I tell it where the code comes from (in this case, my GitHub mbonig/reinvent repo), I give it the basic commands to synthesize my app, and then it takes care of a lot of the wiring and plumbing for me.
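
A minimal sketch of that wiring, assuming a GitHub source; the branch and build commands are placeholders.

```ts
import { CodePipeline, CodePipelineSource, ShellStep } from 'aws-cdk-lib/pipelines';

const pipeline = new CodePipeline(this, 'Pipeline', {
  synth: new ShellStep('Synth', {
    // Where the code comes from:
    input: CodePipelineSource.gitHub('mbonig/reinvent', 'main'),
    // How to synthesize the app:
    commands: ['yarn install --frozen-lockfile', 'npx projen build'],
  }),
});
```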

Now, the next thing I have to do is define those environments and the things I want to deploy. So I create what's called a wave, and a wave is a group of stages or stacks that will be deployed in parallel. I create a wave for my dev environment; that's the very top, line 39. Then I create new instances of the application stage construct that I've built and add them to that wave: my primary region, my secondary region, things like that.

Here's my secondary region. I provide a couple of parameters to it, like what account it's going to, what region it's going to, and things that are specific to that particular region's instance. When I'm all done with this, I have my application deployed to us-east-1 and us-west-2.

But I also need to put out that latency-based record that handles routing to one of the two regions based on where you are. That is its own stage and stack, called the DNS stage. And I'm going to take an extra step once that's deployed: I want to run the first of many tests I'm going to build throughout this whole thing. This test is a very simple shell command, on line 83, that just tries to curl the website. If that succeeds, I can proceed with the pipeline and move on to my prod account. But if it fails for any reason, because the site's not available and there's a problem, the pipeline cannot proceed. This is the first of many safety checks I'm going to put in the system to make sure that when things go out to an environment, I'm not just throwing them over some wall and hoping things are fine, or deploying and then having to manually open the page in a web browser. This is some good automation to put in.
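
Putting the wave and that smoke test together, a sketch; the accounts, stage classes, and URL are placeholders carried over from the earlier sketches.

```ts
// Stages in a wave deploy in parallel.
const dev = pipeline.addWave('Dev');
dev.addStage(new ApplicationStage(app, 'DevPrimary', {
  env: { account: DEV_ACCOUNT, region: 'us-east-1' },
}));
dev.addStage(new ApplicationStage(app, 'DevSecondary', {
  env: { account: DEV_ACCOUNT, region: 'us-west-2' },
}));

// The latency-based DNS record, gated by a simple curl check:
pipeline.addStage(new DnsStage(app, 'DevDns', {
  env: { account: DEV_ACCOUNT, region: 'us-east-1' },
}), {
  post: [new ShellStep('SmokeTest', {
    commands: ['curl -Ssf https://dev.example.com'], // failure stops the pipeline here
  })],
});
```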

Now, when it comes to production, though, I get paranoid. I get really paranoid about things like an RDS database or any sort of stateful resource, because if I make a mistake with one of those, I'm going to have an outage. And then people call me up at two in the morning and say there are problems, and then they start yelling at us and say: we want our money back, you aren't doing your job. And I hate giving money back to people.

So what I'm going to do is add some additional steps to my production environment to give me a little more comfort. It starts by adding this public property on my application stage called stackSteps. Then I identify the stack I want to be paranoid about, in this case my database stack, and I add a new object to that stackSteps array that says: for the database stack only, add a manual change approval process, so that when the pipeline runs, I can check things out.

So back here, when I'm adding this application stage for the prod primary region, I add this additional little step at the end that says: go ahead and use the stackSteps that the stage defined. And when I deploy this out to the environment and look at CodePipeline, I'm going to see something like this, with all the stacks and everything going out. But for that production database only, I'm going to have this little review button that CodePipeline gives me. I can hit that, go look at the change set that CodePipeline has set up, and say: OK, I know that everything in here is safe, or: no, I don't like what I'm seeing, and either approve or reject it at that time. It just gives me a little more comfort before I have a lot of robust testing in my application. Kind of a nice little safety check.
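
In code, that hookup is small. A sketch, assuming the stage exposes its database stack as a public property:

```ts
import { ManualApprovalStep } from 'aws-cdk-lib/pipelines';

const prodPrimary = new ApplicationStage(app, 'ProdPrimary', {
  env: { account: PROD_ACCOUNT, region: 'us-east-1' },
});

pipeline.addStage(prodPrimary, {
  stackSteps: [{
    stack: prodPrimary.databaseStack, // the stack to be paranoid about
    changeSet: [new ManualApprovalStep('ReviewDatabaseChangeSet')], // review before execution
  }],
});
```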

Now, some of you don't like CodePipeline, and that's totally fine. I actually do like GitHub Actions for a lot of stuff. So if you want, you can use a GitHub Actions-based pipeline as well. There is a construct you can get readily off the Construct Hub that has the same API surface area as the one I just showed you. So it's pretty much a drop-in replacement, and it will generate a GitHub Actions workflow that does all the same things. The only difference is that because GitHub Actions doesn't have that manual approval step I just showed you, you'd have to remove that step, but otherwise it'll do everything else just the same.
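
That construct is cdk-pipelines-github. A sketch of the swap; the API shape here is from its docs, so double-check the package README.

```ts
import { ShellStep } from 'aws-cdk-lib/pipelines';
import { GitHubWorkflow } from 'cdk-pipelines-github';

// Same synth-centric API, but it generates a .github/workflows file instead:
const pipeline = new GitHubWorkflow(app, 'Pipeline', {
  synth: new ShellStep('Synth', {
    commands: ['yarn install --frozen-lockfile', 'npx projen build'],
  }),
});
```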

OK. So I've got my pipeline going. I basically have things squared away, and I can start writing actual code.

So how do I organize my code? It's a very common question. We see it come up all the time on the cdk.dev Slack server, and a lot of people ask: how do I organize things? Where do I put them? I just don't know.

So this is a general pattern I've gotten into the habit of using on new projects. I start off with everything in a source directory; this is the projen convention, everything is in source. In the root of that source directory I put my main.ts, which is my main application, and then a constants.ts file which holds all the constant values I'm going to share across a lot of different files: what's my account number for prod, what's my account number for dev, what are my primary and secondary regions? Any sort of thing I'm going to share a lot, I'll put in constants.

All of my constructs go into a constructs directory. If I've got a construct that spans multiple files (very often Lambda functions, where you have the CDK code in one file and the handler for that function in a separate file), I'll put those in their own directory for good organization. Stages go into a stages directory, and finally stacks go into a stacks directory.

But over time, after I build out an application and it gets to any significant scale, this no longer works. I end up with more than about five or six files in a directory, and once you get past that limit it gets a little hard to navigate those directories and see where things are. So I start to split things out a little more. I'll go into my stacks directory and maybe put a backend directory in there, and everything that's a backend-related stack goes there. And if I have constructs that are specific to the backend, I move them over as well.

One of the things that's really important about this is that it actually doesn't matter what I do or what you do, as long as it works for your team. It makes no functional difference to the CDK whatsoever. You can organize your projects any way you want. Follow your favorite language conventions, follow your team conventions. It doesn't really matter.

Alright. So I go off and start building out my stacks. The first question is: do I put everything into a single stack, or do I split it into multiple stacks? The benefit of a single stack is that you really don't have to worry about cross-stack dependencies, cases where you define something in stack A and you want to reference it in stack B. That's a cross-stack dependency, and the CDK does a lot to automatically handle the complications of it through things like CloudFormation exports and imports.

But if you've ever removed something from a stack that has been shared between them, you know there's a lot of pain that comes with it. It's called the deadly embrace, and it's a giant pain. There are ways to work around it, but they're often very manual and tedious, with multi-step deployments, and it's not fun.

So that's one good reason to go with a single stack: you never have to deal with that. You're also assured that every time you deploy that stack, and that stack is in a healthy state, everything within your application is on the same version. You don't have to worry about this part of my app being on an older codebase while that part is on the latest, and the problems that could come from that.

But the downside is that you could still potentially hit resource limits. They upped it recently, probably a year and a half or two years ago now, to 500 resources per stack by default. That's pretty hard to hit, but if you're using a lot of L2 and L3 constructs, over time it's just going to keep growing and growing. Eventually CloudFormation is going to complain that you've hit your resource limit, and now you have to split things out into multiple stacks anyway.

It also means that any time you do a stack update, you're going to have longer updates, because CloudFormation has to do more work to figure out what's changing within your stack; there are more resources to consider. If you're looking at diffs of your stacks, you're going to see a lot of noise and really have to sort through what's happening here and what's happening there, whereas if you have things isolated, it's a little bit easier.

And then finally, you have what I call the atomic block problem. This is the case where you have a really large application and you have a problem with one small portion of the stack. Maybe you have 200 resources in that stack and one of them cannot be updated: maybe someone did some ClickOps on it and now it's in a bad state, or maybe you were referencing an IAM role or policy that was deleted behind the scenes, and now you can't make that change until you go fix that problem.

While you're trying to fix that problem, no other changes can go out in that stack. You are blocked, because a stack change is an atomic change. And that can be problematic if you need to get a bug fix out to an environment and you can't, because, well, someone wasn't really doing their IaC very well.

Now, I typically go with a multi-stack deployment, because I've dealt with the cross-stack dependency issue enough times that I know how to work around it. The pros and cons of multi-stack are basically just the inverse of single stack, so I won't go into them too much here, but I'll leave this up, because I did put the camera icon on it.

Alright. I want to talk a little bit about assets. Assets are anything in your CDK code where you're building those ancillary things that aren't directly IaC but are part of your application. In my case, this is the ECS service I built, where I have to reference the local directory that holds the contents of the Docker image I'm going to build. And I said: well, I'll just use ContainerImage.fromAsset.

But there's a big problem with this: I'm going to build this stack four times, which means I'm going to build this asset four times. That means the asset I deploy to the dev environment in the primary region isn't exactly the same asset I put out in the secondary region, and when I go to prod I have two new assets as well. They're technically different. And while they were all probably built at about the same time and are probably very similar to each other, there can be technical differences between them, and you could run into problems.

So we like this build-once, deploy-many feature. How do we get that? Well, instead of building it here, I'm going to take the image as a property that I pass in. That property is given to this construct, given to the stack, and it actually bubbles all the way up into my application. Now, at the top of my app, before I do any of my real CDK work, I build the asset there. I build it once, I pass it in, and every stack, every resource, every region uses that same asset. I don't have to worry about slight differences between them. Build once, deploy multiple times.
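
A sketch of the idea: build the image once in main.ts and thread the same object down through props. The prop and stage names are mine, and sharing assets across stage boundaries has caveats worth testing in your own setup.

```ts
import * as path from 'path';
import * as ecs from 'aws-cdk-lib/aws-ecs';

// main.ts: built exactly once, before any stages are created.
const websiteImage = ecs.ContainerImage.fromAsset(path.join(__dirname, '..', 'website'));

new ApplicationStage(app, 'DevPrimary', {
  env: { account: DEV_ACCOUNT, region: 'us-east-1' },
  websiteImage, // the same object handed to every stage, stack, and region
});

// ...and inside the website stack, the service just consumes it:
// taskDefinition.addContainer('web', { image: props.websiteImage, ... });
```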

Now, I mentioned before using a GitHub pipeline. If you want to do that, then for authorization, don't use long-lived keys. You don't want to be putting an access key ID in a GitHub secret and those sorts of things; you want to use OIDC providers and roles. To get those things created, I'm going to create a secondary app. This isn't really part of the app as a whole: it's not going to change at the same pace, and it shouldn't be deployed every single time I make a small change to the website.

So I'm going to build a whole new app. At the very top there, this is going to be in a file called source/github-support. It's a separate app with its own stacks in it. I'm going to build and deploy this thing manually, and to make that extremely easy to do, I'm going to go into projen and create a new task.

Now, to the CDK, an app is one of two things: either a directory that holds all of those synthesized assets and templates, or a command that generates them. If you open up the cdk.json file on any one of your projects, you're going to see a line very similar to line 39 there at the top. It just says we want to run ts-node, or the Python CLI, to go generate the output.

So this is a task that simply says: go use the github-support file as the app and generate the content. And the bottom line just lets me run it, deploy it, and synthesize it just like before. On a lot of our projects we end up having anywhere from two to six or seven different apps within the same repository. It's very common for us.
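
A sketch of that projen task, using the CDK CLI's --app flag; the task name is my own.

```ts
// In .projenrc.ts: each extra app gets a task pointing --app at its entry file.
project.addTask('deploy:github-support', {
  exec: 'cdk deploy --app "npx ts-node src/github-support.ts" --all',
});
```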

Alright. So, constructs. We talked about apps, we talked about stacks; let's talk about constructs themselves. Constructs are supposed to be things that you are building. It's not just a case where you've got a stack and you start throwing a lot of L2 or L3 constructs from the CDK library directly into it. You're actually supposed to be building your own constructs.

They can stay in the same repository; you don't have to put them into a separate library and package and distribute them. But they're supposed to represent business functionality unique to your problem. They should be small, they should be discrete, and they should reflect your particular project and your particular business. I recommend you do this early and often, because if you decide later to do this work of collapsing things into constructs, you're likely going to change the logical ID of the CloudFormation resources. And when you change a logical ID, CloudFormation isn't smart enough to understand that you moved something; all it sees is that you deleted something and created something new. If you do this work early and often, you'll have less of a problem with resources changing long-term. And we'll talk a little later about how you can work around this if you need to.

For example, when I go back and look at the video transcoder I built for this and its stack, this was originally the code that I wrote. And it's not just hard to read because of the small font; it's hard to read because there's a lot going on in here. If you came back to this code six months later, or handed it to another developer and said go make such-and-such change, it would be very hard, and it would take time to read through this and understand how things are related, so that the change you make is isolated to just the feature you want to tweak.

So I'm going to refactor this. I'm going to take all of this code and turn it into this: one construct at the top that is my transcoder, and then, on line 28, a second construct that encapsulates everything that is the test of that transcoder. (Later on I'll be talking a lot about testing; testing is my favorite aspect of the CDK.) This is now a lot easier to read. And if I dig into the transcoder construct, I know that everything in there is related to just the transcoder and nothing related to the test, because I've got good isolation. So if I have to make a change, I know exactly where to go to do it.
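
The resulting shape is something like this sketch; the class and prop names are mine.

```ts
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as s3 from 'aws-cdk-lib/aws-s3';

export interface TranscoderProps {
  readonly mediaBucket: s3.IBucket;
}

// All of the transcoder wiring lives here and nowhere else.
export class Transcoder extends Construct {
  public readonly triggerFunction: lambda.Function;

  constructor(scope: Construct, id: string, props: TranscoderProps) {
    super(scope, id);
    this.triggerFunction = new lambda.Function(this, 'Trigger', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('src/constructs/transcoder'),
    });
    // ...EventBridge rule watching props.mediaBucket, MediaConvert role, etc.
  }
}

// In the stack, the intent is now legible at a glance:
// const transcoder = new Transcoder(this, 'Transcoder', { mediaBucket });
// new TranscoderTest(this, 'TranscoderTest', { transcoder });
```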

It's also a lot simpler and easier to read my test. Same sort of thing: I know that everything in this construct is related to just the test, and if I need to change that test in any way, I know exactly where to go to do it.

Now, very often when I'm giving talks about the CDK, or when we wrote The CDK Book, I don't talk about specific constructs, and that's because there are 150 or so AWS services now, and every one of them has tons of different constructs. I could spend hours and hours talking about any particular one. But there is one in general that I'm going to highlight.

This is the AWS Secrets Manager Secret construct. It represents a secret in Secrets Manager that will hold some sort of configuration value that I may use with Lambdas, or an ECS service, or something in general. And at first glance it has a really neat feature: this generateSecretString property.

Now, if I don't use this property, what's going to happen is the secret gets created and the content, the value, of that secret is going to be a bunch of random numbers, characters, and symbols. That's never really all that useful, so you're probably going to have to update it sometime later anyway. But if I need to put some structure behind it, because maybe this is a JSON value that's going to be used by ECS or something else, I can use generateSecretString and say: here's the basic shape of the value I want to store. I can say it has an API key and an API token, and then line 43 says the API token should be that randomly generated value.
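
Concretely, the shape being described looks like this; the key names follow the talk's example.

```ts
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';

new secretsmanager.Secret(this, 'ApiSecret', {
  generateSecretString: {
    secretStringTemplate: JSON.stringify({ apiKey: 'replace-me-later' }),
    generateStringKey: 'apiToken', // this key receives the random value
  },
});
```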

But the problem with this is that it gives you the impression you can actually manage the value of the secret here within your code. And you can't. For one, you shouldn't be putting anything that is truly secret in your IaC, because it's too exposed: it's in your CloudFormation templates, it's in your CloudFormation console, things like that. But here's what's going to happen: sometime later, someone's going to come down the line and say, we need another field in that secret, something else we want to store in it. And you think you can just go in here and add that thing. But as soon as you deploy that code, everything that was in that secret has been completely replaced. So the things you updated manually later, like the API key or the API token, are now gone, and you're back to a broken system.

So my one construct-specific recommendation is: just don't do this. Don't ever use this property; it gives you the wrong impression. There are still some edge cases where you might have to, especially with ECS services where you're typically referencing specific fields within that secret. But if you want to talk more about this, either ask in the Q&A or hit me up in the hallway after the talk and we can chat about it a little more.

Alright. External data. This is one of my favorite parts of the CDK, because you're using general-purpose languages like Python, Java, C#, TypeScript, or Go, so you may think to yourself: well, can't I just go out and get some data from an API I've got, or from some external system, and use that to control the generation of my CDK code?

So let's say this application has been out in the wild, and a team comes to you and says: we want access to those DynamoDB tables; we need a service account to do it. Now, I don't like doing things manually, so I build an application that lets all my various teams manage their service accounts for that DynamoDB access. The internals of that application don't really matter. The gist is that the service account information is now stored in a DynamoDB table, and I want to use the data in that table to drive CDK code that will generate IAM users for those service accounts. Now, you could go and do something like this.

This was the first thing I ever tried doing with the CDK as far as external data goes. I wrote lines 19 through 22, which go out to that table and retrieve the information, and then lines 24 through 28, which take the data from that API call and just hand it over to my CDK code. But this is actually really, really bad. The reason is that it obfuscates your inputs. You never really get to see what that input is, what the result from that DynamoDB call was, before it gets passed into your CDK code. You could console.log it and do other things, but by the time it comes back from the API and gets handed over to your CDK code, it's already there; it doesn't matter if you look at it afterwards. And this can lead to nondeterminism in your code. Determinism is really important, not just in the applications we write, but especially in your infrastructure as code. You have to trust that every single time you synthesize your CDK code, you're going to get the same outputs given the same inputs. And this obfuscates those inputs; you can't see them.

So if they change, you may have different outputs, and when that happens, things could easily go boom. You could have outages, you could have customers complaining at two in the morning asking for their money back, and I don't like giving money back to people.

So how do we do this right? Well, we start by taking the logic that makes that call, either to DynamoDB or to my external API, and putting it into a completely separate file. That file goes out, gets the data, and just writes it into a data.json file. That's all it does. I rerun this script any time I think that data might have changed, and I add a task in projen that makes it really easy to run. Then, in my CDK code, instead of making that API call directly, I just take the data out of the data.json file and hand it over to my CDK code.
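
A sketch of the two halves; the table and file names are placeholders.

```ts
// scripts/fetch-service-accounts.ts: run on demand, e.g. via a projen task.
import { DynamoDBClient, ScanCommand } from '@aws-sdk/client-dynamodb';
import { unmarshall } from '@aws-sdk/util-dynamodb';
import { writeFileSync } from 'fs';

async function main() {
  const client = new DynamoDBClient({});
  const result = await client.send(new ScanCommand({ TableName: 'service-accounts' }));
  const items = (result.Items ?? []).map((item) => unmarshall(item));
  writeFileSync('src/data.json', JSON.stringify(items, null, 2)); // checked into git
}
main();

// src/stacks/service-accounts.ts is then fully deterministic; it only reads the file:
// import * as serviceAccounts from '../data.json';
// for (const account of serviceAccounts) { new iam.User(this, account.name); }
```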

So, a little bit of indirection here. Not a lot, though; it still fundamentally does very similar things. But the main advantage is that my inputs are now visible within my code repository. I know what they are, and they're part of the PR review process. This actually replicates what the context API within the CDK does itself. If you've ever used the imports, the .from methods like Vpc.fromLookup, in your CDK code, that's using the context API. What it does is go to a local cache, the cdk.context.json file, and look for the information there. If it can't find the information about your VPC, or whatever you're trying to import, it then makes an API call to AWS, gets that information, and writes it into the file. That file should be checked in as part of your code review process, so in your PRs you get to see your inputs. This largely replicates what's going on there.

Now, I wish you could just use the context API and not have to roll your own version of it. Unfortunately, the current state of things is that the context API is not easy to interface with; it's not really meant to be a public API. So you can't really do it, but this is a much better way if you do have to get down to that. And if you're thinking, yeah, but now I've lost a lot of the automatic parts of the lookup: that's a bigger discussion I didn't want to cover in this talk, but come talk to me in the hallway after we're done and we can talk about ways to automate some of these steps so you can still get some nice automation behind them.

Alright. So at this point I've got everything built and it's all generally running, but I've got to do all the little polishing steps, the buttoning-up of things along the way, and I'm going to use aspects to do it. Aspects are a really cool feature of the CDK that comes from the term aspect-oriented programming, where you build functionality that is applied widely across your entire code base in one shot, rather than at individual points.

The first thing we're going to look at is the transcoding service. This is a very simple thing that sits there and monitors an S3 bucket, and when an object, an MP4 file, gets put into the uploads prefix, it triggers a Lambda function via EventBridge. That Lambda function asks MediaConvert to convert the file in some way and write the resulting media file back into the converted prefix.

Now, this Lambda function is something I'm going to want to monitor, and we like using DataDog, so we'll use the DataDog construct that they ship. On line 36 I can say transcoder.triggerFunction, add that to my monitored functions, and everything sort of works. But there's a problem with this that I don't like: I have to know that that Lambda function exists within my transcoder construct, and that sort of breaks the contract of an L3. When I use an L3 construct, something someone else has built or even something I built, I shouldn't need to know what's inside of it, those individual parts. But in this case, I do. And if it's a construct I didn't build, something I pulled off Construct Hub, I may not have that public property I can access for the Lambda function; I can't hand it over.

But there's a way I can get around this. Instead, I'm going to build a DataDog aspect, and then I say: add that aspect to this scope. In this case the scope is a stack, but it could potentially be any scope: a stage, a construct, a stack, whatever. What this aspect does is visit every single node, every single resource, every single construct within the CDK scope I gave it, and if that node is a function, a Node.js function or a Python function, I call addLambdaFunctions on it instead. The big advantage is that I can add this to any part of my application, and now all of my Lambda functions get monitored, logs forwarded, metrics, all this stuff. I write it one time and cover all of my stuff.
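
A sketch of such an aspect. The Datadog type and its addLambdaFunctions method come from the datadog-cdk-constructs-v2 package, so check its docs for the exact surface.

```ts
import { Aspects, IAspect } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { IConstruct } from 'constructs';
import { Datadog } from 'datadog-cdk-constructs-v2';

class DatadogAspect implements IAspect {
  constructor(private readonly datadog: Datadog) {}

  visit(node: IConstruct): void {
    // NodejsFunction and PythonFunction both extend lambda.Function:
    if (node instanceof lambda.Function) {
      this.datadog.addLambdaFunctions([node]);
    }
  }
}

// Apply at any scope: a construct, a stack, or a whole stage.
Aspects.of(stack).add(new DatadogAspect(datadog));
```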

So even in those cases where I've used a construct off Construct Hub and I don't even know what's inside of it, I'm still going to get my monitoring set up. And that's really great.

I mentioned before that if you move constructs around, you're going to change their logical IDs.

"Well, what happens in the case where you don't want to recreate that resource, but you still need to do that refactoring? There is an option within the CDK to override the logical ID that was generated and you can do this directly in your code. And it's part of an escape hatch into some L1 constructs. But I like using this aspect instead. It's publicly available. Uh it's on the Construct Hub, it's on NPM JS, things that I built it a while back. What I do is I give it the new logical ID that my new code is generating, that's on the left, that's the new cluster, logical ID. And then I say I want you to rewrite it to this old ID. So if I refactor my cluster and I stuck it into some other constructs, if I move things around, even if maybe I stuck it into another uh whole area of my application, um I can still do this rewrite on the ID and I don't have to worry about recreation of things and it's a little bit easier to manage, a little bit easier to write along the way.

Tags you can apply on a per-resource level, but if you do it with aspects it's a whole lot easier. I can apply an aspect to my entire stack, or even to an entire stage, that says: here's the environment I run in, and every single resource automatically gets that tag. I can do it at any level I want, any scope I want. Very simple, very easy.
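
Tags.of() is itself aspect-based, so this is a one-liner at whatever scope you choose:

```ts
import { Tags } from 'aws-cdk-lib';

// Every taggable resource under this stage gets the tag automatically.
Tags.of(devStage).add('environment', 'dev');
```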

And then finally, I get this whole thing built, and my compliance and security officer comes to me and says: yeah, but is this thing secure at all? And I go: I don't know, I'm not a very good developer. And they go: OK, let's do this, let's add the CDK Nag aspect to my code. Very simple, one little line at the very top app level. The next time I synthesize, my code is going to produce a whole bunch of errors, and now I can go through them one by one with my compliance officer. Is this something we can safely ignore and suppress, or do I need to fix it? We work through those problems one by one, and at the very end we know everything is nice and secure.
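
That one little line, using cdk-nag's AWS Solutions rule pack:

```ts
import { Aspects } from 'aws-cdk-lib';
import { AwsSolutionsChecks } from 'cdk-nag';

// Every synth now reports findings until they are fixed or explicitly suppressed.
Aspects.of(app).add(new AwsSolutionsChecks({ verbose: true }));
```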

Now, one of the things that really gravitated me toward the CDK early on was the fact that I could write my IaC code and my Lambda functions and other things in the same language. There are some benefits to that. Go back to the media transcoder: part of what that Lambda function does is tell MediaConvert an IAM role to use during the conversion process. That role has to have access to read from and write to the media bucket. So my IaC code defines that IAM role, and I need to tell the Lambda function which IAM role I created in my IaC to hand over to MediaConvert. So I write some code like this: at the top, in my CDK code, line 30 says the environment variable mediaConvertRole is set to the role ARN. And line 72, inside my handler, is where I read that value off the environment variables and move on. But the eagle-eyed viewer will notice I have a typo in here. If you didn't notice it, you're probably like me, and you'd beat your head against a wall for about an hour wondering why nothing was working, because you forgot to add that little "Arn" at the end of the environment variable name.

So how can we work around this? How can we fix it? That's actually really easy when everything is in the same language. The top two lines are my Lambda handler, where I've got an exported constant value. It doesn't really matter so much what the value of that constant is; I defined it as basically the variable name itself because it makes things readable, but it could be gibberish at this point. The Lambda handler uses that constant for reading things off the environment variables at runtime. The lower portion is my CDK code, where I import that same constant and use it to set the environment variable in my IaC. Now it cannot be wrong; it cannot be different. These sorts of things, sharing values and data back and forth between your IaC code and your runtime code, can be really, really powerful. And I love it, because I hate beating my head against a wall like this. You always feel so dumb when you spot it an hour later.
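
A sketch of the pattern; the file and constant names are illustrative.

```ts
// src/constructs/transcoder/handler.ts (the runtime side)
export const MEDIA_CONVERT_ROLE_ENV = 'MEDIA_CONVERT_ROLE_ARN';

export const handler = async (): Promise<void> => {
  const roleArn = process.env[MEDIA_CONVERT_ROLE_ENV]!; // same constant, typo-proof
  // ...submit the MediaConvert job with roleArn...
};

// src/constructs/transcoder/index.ts (the IaC side imports the same constant)
import { MEDIA_CONVERT_ROLE_ENV } from './handler';

triggerFunction.addEnvironment(MEDIA_CONVERT_ROLE_ENV, mediaConvertRole.roleArn);
```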

So some best practices when working with stacks:

  • Put stateful resources in their own stacks. This is probably a little controversial. Again, I'm a multi-stack guy; I don't mind it. I like putting all my stateful resources in their own stacks because it means I never run into an atomic block problem, and I never have to worry about making a change that could affect, and potentially destroy, a stateful resource, because I can always control when that stack change happens.

  • Always provide environments, meaning the account and the region. Every instance of a stack or a stage that you create should be for a very specific account and region. Never use anonymous, environment-agnostic stacks. There's almost no point to them anymore, and they will only get you in trouble.

  • Refactor your code early and often, and use snapshot testing for your stacks. We'll talk about testing a little more in a moment.

  • I wouldn't put more than about seven constructs into a stack. If you start to approach that limit, it probably means you need to start refactoring: create some new constructs and move those things into them. Again, refactor early and often.

  • And the last don't ties back to do number two: don't reuse a template in multiple accounts or multiple regions. Every time you synthesize a stack, it should be for a specific account and region. If you're not doing that, you're not really using the CDK correctly.

As far as constructs go: constructs are supposed to represent business functionality. They should be small and discrete, you should, again, refactor early and often, and you should use fine-grained assertions for testing them. And finally, do not use imports within constructs. Stacks are responsible for importing resources; the constructs you build should always be given objects, not just, say, a role ARN as a string value. Inside, you should be passing objects back and forth. Everything works a lot better if you do, as the sketch below shows.
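
(The names here are from the transcoder example; the interface itself is illustrative.)

```ts
import * as s3 from 'aws-cdk-lib/aws-s3';

export interface TranscoderProps {
  readonly mediaBucket: s3.IBucket;   // take the object...
  // readonly mediaBucketArn: string; // ...not the string
}

// The *stack* does any importing, then hands the object in:
const mediaBucket = s3.Bucket.fromBucketName(this, 'Media', 'my-media-bucket');
new Transcoder(this, 'Transcoder', { mediaBucket });
```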

Alright. So, my favorite part of the CDK: testing. There are a lot of different ways you can do testing, and we're going to hit quite a few. The first is unit testing, where you can use snapshot tests or fine-grained assertions. Snapshots work really well for stacks, and fine-grained assertions work really well when you're dealing with constructs specifically. We'll also talk about how you can mock assets in your tests so you can speed them up and make them more reliable.

Now, if we take a look here, this is the snapshot test of the website stack that I built. Pretty basic; it looks just like a normal application. But right here at the very end, rather than synthesizing my code, I use this Template class that comes from the assertions library that ships with the CDK. I say: here's my stack, give me back an object that makes it easier to test. And the thing I use on it is this .toJSON() call, which basically takes everything and renders out a big JSON blob for me. Then I use Jest's toMatchSnapshot call, and this is where the snapshot term comes from.

Now, the way a snapshot test works, if you've never done one before: the very first time you run the test, whatever the result of that stack was is just considered correct. It's a passing test by default. It takes that result and stores it in a local file; that's your snapshot. The next time I run the test, it compares the latest output of that toJSON call to the stored snapshot. If they are the same, the test passes. If anything has changed, the test fails, and it shows you very clearly what the differences are. You can look and say: oh, this was a mistake, I'll go fix my code. Or you can say: no, this was expected, because I was making changes to my stack; these all look good, this is exactly what I expected to change in my CloudFormation. So you hit a button, you update your snapshot, and you move on.
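
The whole test is only a few lines; the stack and props are placeholders.

```ts
import { App } from 'aws-cdk-lib';
import { Template } from 'aws-cdk-lib/assertions';
import { WebsiteStack } from '../src/stacks/website';

test('website stack matches the snapshot', () => {
  const app = new App();
  const stack = new WebsiteStack(app, 'Website', { /* ...props... */ });

  // The first run records the template; later runs diff against it.
  expect(Template.fromStack(stack).toJSON()).toMatchSnapshot();
});
```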

But assets can cause a problem when you're doing snapshot testing, because every time you generate an asset you're going to get slightly different S3 objects. The bucket is not going to change, but the object keys and things will. So you can use mocks within Jest. Now, some of you may never have come across mocks before; mocking is simply a way of replacing functionality in your code. In this case, on line 11 there at the top, Jest is going to replace the ContainerImage.fromAsset function with the code I have there, in between the mock return value. Basically: don't do what the fromAsset call normally does; do this thing in its place. Mock it out, swap it out. And this gives me a static result every single time.

That's for the case where I'm doing container images with the AssetImage class; if you're doing Lambda functions, it's Code.fromAsset. This little bit of code is in almost every single one of my test cases for every single stack, because I always end up having a Lambda in there somewhere. It means no zip files get generated when I run my tests, which makes them faster. And because I'm statically giving back an object key of "my-key", my snapshot test isn't going to change every single time I run it. So if you're going to do snapshot tests, it's really important to have this ability to mock assets out.
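
Here's a sketch of the Lambda flavor of that mock. The stubbed bucket name and object key are arbitrary, and the cast is needed because the stub isn't a real AssetCode.

```ts
import * as lambda from 'aws-cdk-lib/aws-lambda';

beforeAll(() => {
  // Skip bundling and zipping entirely; return a stable, fake S3 location.
  jest.spyOn(lambda.Code, 'fromAsset').mockReturnValue({
    isInline: false,
    bind: () => ({ s3Location: { bucketName: 'my-bucket', objectKey: 'my-key' } }),
    bindToResource: () => undefined,
  } as unknown as lambda.AssetCode);
});
```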

Now, when it comes to testing constructs, that's where I like to use fine-grained assertions. Fine-grained assertions test for very specific things within my constructs and the generated code. In this case, I'm testing my transcoder construct, and I have a group of tests, specifically on line 15, around the MediaConvert role that the construct generates. I've got four tests. The first checks whether the role that was generated can be assumed by the MediaConvert service. If we dive into that, it looks like this:

"So line 17 is just sort of hiding all of the, you know, generate the stack, give it, you know, initial stuff and things and i get back that same template object. And now on line 18, i say i'm expecting there to be an i am roll somewhere within this code that has this assumed roll policy document. And if it finds that, then my test passes, but if it can't find that role somewhere that has that assumed roll policy document exactly as i have it there, then it fails.

Now, if you notice, I'm not testing anything else. This is a very specific test that only checks one particular aspect of that IAM role. The three other tests you saw before check the other aspects, like: does it have the correct policy document for reading from and writing to the bucket? But once you get done with all of your unit tests, you're not really done, to be honest, because there are plenty of things you could write wrong in your code and then simply replicate the same wrong assumption in your unit tests.
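
A sketch of that assume-role assertion; the resource shape matches what the CDK emits for a role assumable by MediaConvert, and buildTemplate() stands in for the hidden setup.

```ts
import { Template } from 'aws-cdk-lib/assertions';

test('role is assumable by MediaConvert', () => {
  const template = buildTemplate(); // hides the stack setup, as in the talk

  template.hasResourceProperties('AWS::IAM::Role', {
    AssumeRolePolicyDocument: {
      Statement: [{
        Action: 'sts:AssumeRole',
        Effect: 'Allow',
        Principal: { Service: 'mediaconvert.amazonaws.com' },
      }],
      Version: '2012-10-17',
    },
  });
});
```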

So we move on to doing integration tests, and integration tests are really easy to enable with projen: there's a flag on your projen construct called experimentalIntegRunner. It installs all the dependencies needed to get the integ-runner that comes with the CDK, and now I can go and write integration tests. Integration tests look just like normal apps, as you can see here. In this case, I'm going to do an integration test on the database stack, but line 13, instead of being a synth call, creates this IntegTest class, and I give it the test stack, the database stack. What happens here is very similar to a snapshot test, but it takes an additional step that snapshot testing and unit testing don't: it actually tries to deploy to your AWS account.
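
The file itself is tiny. A sketch, with the stack name as a placeholder:

```ts
// test/integ.database.ts
import { App } from 'aws-cdk-lib';
import { IntegTest } from '@aws-cdk/integ-tests-alpha';
import { DatabaseStack } from '../src/stacks/database';

const app = new App();
const stack = new DatabaseStack(app, 'IntegDatabase');

// Instead of a plain app.synth(): register the stack as a deployable test case.
new IntegTest(app, 'DatabaseIntegTest', {
  testCases: [stack],
});
```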

Now, the very first time I run the integ command with projen, it's actually going to fail, because unlike Jest, this doesn't automatically assume the first run is correct. I have to specifically go in and run the update command. When I run it, it's going to try to deploy this into three regions (you can override that if you don't want all three), and if it can do that successfully, it's considered a pass. This will uncover a lot of potential errors that unit testing simply cannot. So it's a great next step if you've got all your unit testing done and you're still running into problems. And what I really like about it is that not only will it tell you what changed, just like snapshot testing normally does, but it will also tell you if something is going to be destructive: oh, you made a change to that DynamoDB table, you changed the sort key on it; well, you're going to wipe out that table. That's a big problem. This can be a great thing to add to your PR process and your build process to make sure nothing is going to go haywire on you.

But even if you get things deployed out into an environment, that doesn't mean things are actually working. Testing our applications is very hard. So what can we do? Well, first, you can use the triggers module from within the CDK itself, or you can use the cdk-intrinsic-validator, a third-party construct created by a guy named Josh Kellendonk. He is fantastic. I love this thing; I've been using it on a lot of projects for quite a while now.

The triggers module is very simple; we'll kind of start from the end and work backwards. On line 28 there at the bottom, I create this Trigger construct. Behind the scenes this is a custom resource within CloudFormation, and a custom resource is just your ad hoc, do-whatever-you-want sort of thing. It's going to call the Lambda function that I defined on line 14. If that function fails for whatever reason, it's considered a failed test: the custom resource fails creation, which causes the stack to roll back, leaving my application in a healthy state. If the Lambda function executes just fine without error, the custom resource completes creation, the stack continues on with whatever it's doing, and everything is good.
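
A sketch of that wiring with aws-cdk-lib/triggers; the handler entry point and the ordering dependency are illustrative.

```ts
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import * as triggers from 'aws-cdk-lib/triggers';

const checkSite = new NodejsFunction(this, 'CheckSite', {
  entry: 'src/constructs/check-site.handler.ts', // does the fetch, throws on failure
});

new triggers.Trigger(this, 'SmokeTest', {
  handler: checkSite,
  executeAfter: [websiteService], // run once the service has finished deploying
});
```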

If we take a look at the actual Lambda function, it's a very simple fetch against the endpoint. For a website, doing a fetch against an endpoint might be a sufficient test. But when we're talking about testing things like the media transcoder, things get a little bit more difficult.

Now, this is the cdk-intrinsic-validator. It comes with a lot of out-of-the-box checks. If you want to do a simple HTTP check, you can do that: you can tell it what status code you expect back, and how long to keep retrying before it fails or you run out of time. But you can also give it Step Functions state machines.

So, going back to the transcoder: there's no endpoint I can hit to make sure this thing is working, but I can use a state machine to test it. I create a state machine and hand it to the intrinsic validator. What the state machine does is copy a test file into that bucket under the uploads prefix, wait about 30 seconds, and then check the converted prefix of the bucket for the expected transcoded video file. If it finds it, everything is good, everything works just fine: we clean up the resources we dropped in for the test, the stack continues, and everything is gravy. But if something didn't work right, if my Lambda function wasn't coded right, if the IAM role didn't have the right permissions, things like that, then when it tries to read that object out of the bucket after 30 seconds, it fails. The stack again rolls back, and I'm left in a healthy state, because the previous code worked, so things are probably working just fine.
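
Here's a sketch of that state machine using CallAwsService tasks. The bucket and key names are placeholders, I've left out the cleanup step, and handing the machine to the validator follows the cdk-intrinsic-validator docs.

```ts
import { Duration } from 'aws-cdk-lib';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

const copyFixture = new tasks.CallAwsService(this, 'CopyFixture', {
  service: 's3',
  action: 'copyObject',
  parameters: {
    Bucket: mediaBucket.bucketName,
    CopySource: `${mediaBucket.bucketName}/fixtures/test.mp4`,
    Key: 'uploads/test.mp4', // landing in the watched prefix kicks off the transcode
  },
  iamResources: [mediaBucket.arnForObjects('*')],
});

const wait = new sfn.Wait(this, 'WaitForTranscode', {
  time: sfn.WaitTime.duration(Duration.seconds(30)),
});

const checkOutput = new tasks.CallAwsService(this, 'CheckOutput', {
  service: 's3',
  action: 'headObject', // fails the execution if the converted file is missing
  parameters: { Bucket: mediaBucket.bucketName, Key: 'converted/test.mp4' },
  iamResources: [mediaBucket.arnForObjects('*')],
});

new sfn.StateMachine(this, 'TranscoderValidation', {
  definitionBody: sfn.DefinitionBody.fromChainable(copyFixture.next(wait).next(checkOutput)),
});
```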

So this is a great way to do a lot of end-to-end testing: incorporate it directly into your CloudFormation and your CDK code so that you don't have bad stuff lingering in environments. You test as part of the deployment process, as part of the stack update process, and you know that at all times your code is working well.

So I'd like to wrap this up with a little bit of fun. One of the things I really like about the CDK: I used to be a .NET developer, so I used to do a lot of object-oriented programming, and with object-oriented programming you can often replace functionality in other people's code. It's kind of the nature of OOP, and the CDK allows you to do this as well. In the CDK, the thing that actually does all the work of turning your CDK code into CloudFormation is called the stack synthesizer. And you can replace it.

So I created my own, and it's called the Heroes Synthesizer. It actually does the same thing as the normal default stack synthesizer, but it is additionally going to dump a couple of JSON files into a given path. And if you notice, that given path is actually the same website that I build and deploy as my demo website.

Now, if you pull out your phones, you can go to reinvent.matthewbonig.com. That QR code is not trying to phish you or scam you; it's just a QR code for the website. I'll give everyone a second to load it up.

Now, when you get to that website, at first glance you're going to see the session name for today, the time for today, and a very shameless plug at the bottom for my advanced CDK talk. But if you click on that middle line, the time for today, you're going to see this.

Now, for those of you who do not recognize these faces: the gentleman on the left is Ben Kehoe. He doesn't like the CDK. The gentleman on the right is Jeremy Daly. He also doesn't like the CDK. So I decided to put their faces in my talk.

Now, if you click on one of their faces, their mouths are going to open up, South Park Canadian-style. I think one of them might actually be Canadian. No, I think they're both from Boston; it's close enough. And they're going to start spewing out all of those CloudFormation resources that were generated by my CDK code.

Now, you're probably wondering: is this useful at all? No. No, this is completely pointless. This is really just having some fun with some fellow AWS Heroes. But the point of all this is that you can write your own stack synthesizer. If you really wanted to, you could have that stack synthesizer turn all of this into internal documentation or diagrams, and it could be automatically uploaded to other systems later on. You could do pretty much anything you want, because that's the injection point at which you can make your own modifications to the CDK.

One thing I actually came really close to building for this talk, instead of my beautiful animation with CSS, was an IAM policy generator. It would create an IAM policy that you could hand to CloudFormation during the deployment of a stack, which would only allow it to create, update, and delete the resources your CDK code defined. Because I guarantee that anyone in this room who has ever deployed a CDK stack probably used AdministratorAccess to do it. Raise your hand if you've done that at any point. Yeah. That's really against the Well-Architected Framework: it's not least-privilege, and it's generally a bad idea, but we don't really have a better option. So I thought: what if I created that better option? Well, after a discussion with one of my CDK Book co-authors, we realized it actually didn't buy us anything. If you want to know more about that, I'd be glad to talk afterwards out in the hallway. But that's it. That's the end.

We're about to go into Q&A. If you want to see any of the code that I built, it's available publicly on my GitHub page at mbonig/reinvent. Again, it's a demo: when I mentioned the things about a media transcoder, there's no way to actually upload any videos to that bucket. I just had to have something in there to talk about.

I do want to give a special shout-out to our cdk.dev community. They've been great over the last few years, helping everybody learn. Unfortunately, one of the people who really wanted to attend this year, a gentleman who's been really great about helping everybody in the CDK community, Glib, cannot be here with us today. I really wish he could have been.

But that is it. We are now open for Q&A. There is a microphone here.
