Scaling AWS Well-Architected Reviews through the enterprise

taibaili2023

Article tags: aws

Article link: https://blog.csdn.net/weixin_46812959/article/details/134595878

Andrew: All right, good afternoon, folks. It is. Yes, it is afternoon. I hope everybody is enjoying being back at re:Invent. I know I definitely am, and it's fantastic seeing so many of our customers, partners and AWS colleagues here as well.

So, quick, that went loud. So, a quick introduction. I'm Andrew Robinson, and I'm a senior manager for solutions architecture on the AWS Well-Architected team. I'm joined today on stage by Ray.

Ray: Hi. My name is Ray. I'm an enterprise cloud architect working for TUI.

Andrew: And also by Duncan.

Duncan: My name's Duncan Bell. I'm a specialist solutions architect on the AWS Well-Architected team.

Andrew: Awesome. And I'm now gonna ask you guys to vacate the stage so that you don't distract the audience. Ok.

So in this session today, we are going to be talking about scaling AWS, sorry, one of the monitors has just turned off. Oh no, it's back, it's back. OK. All right. Fair enough, we're working again. So we're going to talk today about scaling AWS Well-Architected reviews through the enterprise. We're going to hear a little from Ray about how TUI have adopted the AWS Well-Architected Framework, some of the lessons they've learned, and a bit about the success story they've been able to build from using the framework.

And then Duncan's going to share with us a little about some of the new features and functionality that we've brought into the Well-Architected Tool and the framework, and talk about how some customers have adopted those. But before all of that, I just want to make sure that everybody here has the same level of understanding.

So, with a quick show of hands, how many people are aware of the Well-Architected Framework? OK, that's a good number. How many people have conducted a Well-Architected Framework review of a workload? OK, pretty good. How many people have made improvements based on the outcome of that? OK, slightly fewer people.

OK. So for those of you who aren't aware, the Well-Architected Framework consists of six pillars: operational excellence, security, reliability, performance efficiency, cost optimization and sustainability. Each pillar consists of what we call design principles. These exist to help you adopt a mental model for how to architect, build and operate workloads on the cloud.

Those design principles then lead us to questions, and those questions lead to the best practices that we share for how you should build, architect and operate that workload on the cloud. So Well-Architected really exists as a mechanism for you to use on your cloud journey to help you build better workloads. It exists to help you learn.

As I mentioned, we have these best practices and we also have these design principles. They're there to help you learn, and to help you understand those best practices and how you can adopt them in what you're building, designing and operating. These best practices are not AWS standing in our ivory tower saying this is how you must build on the platform. They are what we've learned from working with customers, partners, professional services consultants, our solutions architects, our TAMs and our support engineers: gathering data from them about where they have been successful and putting that together to form this framework.

As I mentioned, we have questions in the framework as well, and these questions allow you to measure your workload against these best practices. This gives you an idea of where there might be gaps, where there may be trade-off decisions you had to make, and why you had to make them. The outcome then allows you to generate an improvement plan and make those improvements to the workload: make it more secure, more reliable, more cost optimized, more sustainable. Whatever the priorities of your business are, you can then use the output from the review to make those trade-off decisions.

So you'll see this word mechanism cropping up quite a lot, and anybody who's been to re:Invent before or knows about AWS has probably heard the word mechanism thrown around a few times. We like to think of Well-Architected as a cycle that we go through.

First, identify a workload and prepare for the review. Make sure you've got the right people with the right level of knowledge about that workload or design, and make sure you've got architecture diagrams, low-level or high-level designs, or even RFI/RFP documents, so that you have all the data you need to make informed decisions.

Then you actually go through the review process. This normally takes four hours or so per workload. For some companies it takes less time; for some it takes more. It really depends on the complexity of the workload and the level of knowledge you have; there might be some missing information that means it takes a little longer. Once you've done the review and identified where the gaps are, you can start to plan those improvements. There might be some best practices that you've not implemented, and you've got a really good reason for that. Maybe there's a trade-off decision you had to make, and you decided to prioritize one best practice over another, and that's fine. You don't have to implement all of them, but you at least know that you've made an informed decision, and you can record the reason for it. Then you cycle back and review that workload again over time.
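When a best practice is skipped on purpose, the tool can record that as a deliberate trade-off rather than a gap. A minimal Python sketch, assuming the UpdateAnswer API's `ChoiceUpdates` parameter behaves as we understand it (each skipped choice marked `NOT_APPLICABLE` with a reason code such as `BUSINESS_PRIORITIES` and a free-text note). The helper name `tradeoff_answer` and all workload, question and choice IDs are hypothetical placeholders. It only builds the request, so it runs without AWS credentials.

```python
# Reason codes we understand ChoiceUpdates in the UpdateAnswer API to accept.
REASONS = {"OUT_OF_SCOPE", "BUSINESS_PRIORITIES", "ARCHITECTURE_CONSTRAINTS", "OTHER", "NONE"}


def tradeoff_answer(workload_id, question_id, followed, tradeoffs, notes,
                    lens_alias="wellarchitected"):
    """Build kwargs for boto3's wellarchitected update_answer call (hypothetical helper).

    followed:  choice IDs for best practices the workload implements.
    tradeoffs: {choice_id: (reason, note)} for best practices deliberately
               not implemented; each is marked NOT_APPLICABLE with its reason.
    """
    overlap = set(followed) & set(tradeoffs)
    if overlap:
        raise ValueError(f"choices both followed and traded off: {sorted(overlap)}")
    choice_updates = {}
    for choice_id, (reason, note) in tradeoffs.items():
        if reason not in REASONS:
            raise ValueError(f"unknown reason code: {reason}")
        choice_updates[choice_id] = {"Status": "NOT_APPLICABLE", "Reason": reason, "Notes": note}
    return {
        "WorkloadId": workload_id,
        "LensAlias": lens_alias,
        "QuestionId": question_id,
        "SelectedChoices": list(followed),
        "ChoiceUpdates": choice_updates,
        "Notes": notes,
    }


# Placeholder IDs throughout; real question/choice IDs come from get_answer or list_answers.
kwargs = tradeoff_answer(
    workload_id="example-workload-id",
    question_id="example-question-id",
    followed=["example_choice_track", "example_choice_process"],
    tradeoffs={"example_choice_automate": ("BUSINESS_PRIORITIES", "Manual teardown is acceptable for now")},
    notes="Automation deferred; revisit at next quarterly review.",
)
# With credentials configured: boto3.client("wellarchitected").update_answer(**kwargs)
```

Recording the reason alongside the skipped choice is what lets the next review cycle revisit the decision instead of rediscovering the gap.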

Because your priorities change, that workload might change. AWS releases new features, new functionality and new services, and all of that together means that the workload is not going to stay static. So we see some customers reviewing workloads every quarter, some every six months, some every year; it depends on you, your business and how quickly that workload evolves.
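To keep that cycle going across many workloads, you need to know which reviews are due. A hedged Python sketch: the cadences, workload names and dates are hypothetical, and in practice the last-review date might come from a timestamp such as the `UpdatedAt` field that ListWorkloads returns for each workload.

```python
from datetime import date, timedelta

# Hypothetical cadences; pick whatever matches how quickly each workload evolves.
CADENCE_DAYS = {"quarterly": 91, "six-monthly": 182, "yearly": 365}


def reviews_due(workloads, today):
    """Return (name, due_date) for workloads whose next review is due, oldest first.

    workloads: iterable of (name, last_reviewed, cadence) tuples, where
    last_reviewed is a datetime.date and cadence is a key of CADENCE_DAYS.
    """
    due = []
    for name, last_reviewed, cadence in workloads:
        next_review = last_reviewed + timedelta(days=CADENCE_DAYS[cadence])
        if next_review <= today:
            due.append((name, next_review))
    return sorted(due, key=lambda item: item[1])


inventory = [
    ("booking-api", date(2022, 6, 1), "quarterly"),
    ("data-lake", date(2022, 9, 1), "yearly"),
    ("crm", date(2022, 4, 15), "six-monthly"),
]
print(reviews_due(inventory, today=date(2022, 11, 30)))
```

Sorting by due date puts the most overdue review first, which is a simple way to decide where to start.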

And then as you learn these lessons, you can start to bring other workloads in. The lessons you've learned from reviewing this workload can then be applied to all of the other workloads operating within your environment, and that really helps to scale this knowledge out.

One of the challenges that we sometimes see with this is how to communicate what these best practices are. As a simple case in point, let's take a very small start-up with two people in it. Between those two people there is one single communication channel: they talk to each other. So for them to share knowledge, information, best practices or an understanding of that workload is fairly straightforward. There's one communication channel.

However, if we scale that up and look at a small business unit of six teams, each with six people, then within each team there are 15 different communication channels, and across that whole business unit there are 630. So even to get the state of a workload that just one team is working on, there are 630 possible channels through which that information can flow.
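The figures follow the usual formula for pairwise channels among n people, n(n-1)/2. The talk doesn't state the formula, but it reproduces each number quoted, assuming anyone in the business unit can talk to anyone else:

```python
def channels(people: int) -> int:
    """Number of distinct pairwise communication channels among `people` individuals."""
    return people * (people - 1) // 2


startup = channels(2)             # two founders: one channel
per_team = channels(6)            # one team of six people
business_unit = channels(6 * 6)   # six teams of six, everyone reachable

print(startup, per_team, business_unit)  # 1 15 630
```

Because the count grows roughly with the square of headcount, a single shared record of the workload's state scales far better than conversations do.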

So Well-Architected can help here, because the Well-Architected Tool in the console gives you a single place to go to review and record the state of that workload.

The Well-Architected Tool is available at no cost in the AWS console, for anybody who's not familiar with it or hasn't used it yet. It allows you to review your workload and then presents you with areas of high risk and medium risk, along with an appropriate improvement plan for how to implement those best practices. You define a workload in the tool, go through the questions for each of the pillars, and answer each one by selecting the best practices that you are or are not following. At the end, the tool gives you an improvement plan recommending where you can make improvements.

Just as an example of what this looks like, this is one of the questions in the cost optimization pillar: COST 4, "How do you decommission resources?" That's the question to get you thinking: how do you decommission resources within this workload? Underneath, there's a description that provides a little additional context to help you understand the question. With a question like "How do you decommission resources?" it's generally fairly easy to understand what we're asking for, but some of the questions in the other pillars get a little more complex, so we want to make sure there's a description there so you understand what we're trying to get to.

Then for each question, there are best practices; these are the four best practices that apply here. You'll also see that next to each one it says "Info". If you click on that, it gives you extra information about what the best practice means, some of the reasons it exists and what you should be doing to follow it. Anybody who's familiar with the tool may have noticed an extra section at the top here now, called Trusted Advisor checks. This is a feature that we brought into the tool just a few weeks ago, and it allows you to do some level of automated checking.

Now, this isn't available for every question or every best practice; it's available for selected best practices in selected pillars. In this example, the best practice is "Implement a decommissioning process", and using the built-in Trusted Advisor checks, we've looked for idle load balancers, unassociated Elastic IP addresses and so on, and you can see the status of each. There's a green tick next to idle load balancers, so Trusted Advisor detected no idle load balancers and we're good. However, it has detected one unassociated Elastic IP address. So you can now make a more informed decision, and it's going to help you reduce the amount of time you need to spend on the review.
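This is not how Trusted Advisor implements its check; as a rough illustration of the kind of signal involved, this Python sketch flags unassociated Elastic IP addresses in a response shaped like boto3's `ec2.describe_addresses()` output, where an associated address carries an `AssociationId`. The function name and sample data are hypothetical.

```python
def unassociated_eips(describe_addresses_response: dict) -> list:
    """Return public IPs of Elastic IP addresses that have no association.

    Expects the shape of boto3's ec2.describe_addresses() response: a dict with
    an "Addresses" list; associated addresses include an "AssociationId" key.
    """
    return [
        addr.get("PublicIp", addr.get("AllocationId", "?"))
        for addr in describe_addresses_response.get("Addresses", [])
        if "AssociationId" not in addr
    ]


# Example response trimmed to the relevant keys; values are made up.
sample = {
    "Addresses": [
        {"PublicIp": "203.0.113.10", "AllocationId": "eipalloc-aaa", "AssociationId": "eipassoc-111"},
        {"PublicIp": "203.0.113.20", "AllocationId": "eipalloc-bbb"},
    ]
}
print(unassociated_eips(sample))  # ['203.0.113.20']
```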

As I said, this is only available for selected questions and best practices; it's not across the whole framework. However, if you go into the tool and look at the Trusted Advisor checks, you'll see there's a little feedback button. If you click that, you can tell us what you'd like to see in there: for example, that you'd like automated checks for a given best practice, and which checks you think would be valuable to you as a company.

So my ask there is: please go and have a look at that. I will say that we can't do this for all of them. For example, one of the questions in the reliability pillar is about how you define an RTO and an RPO. Unfortunately, there's no API that we can query for that. Well, as humans we do have an API, it's called talking to each other, but we can't programmatically check that. Not yet, at least; maybe we will.

But yes, that's my ask: go and have a look at what's available with the Trusted Advisor checks, and if there's something that would be useful to you, put that in there. That will help guide our service team on which integrations to bring in next.

So I talked about the framework, the pillars and some of the questions in there. Now I want to go through what we call a hierarchy. It starts on the left-hand side with more general guidance and moves through to more specific guidance on the right-hand side.

On the left-hand side, we've got the AWS Well-Architected Framework, which is what I just spoke about. These are general, workload-wide best practices. They're not specific to an industry, a technology area or a vertical; they're the broad, general best practices that we think all of our customers should adopt across their workloads.

Moving to more specific guidance, we have what are called Well-Architected Lenses. We've got 15 different lenses now, and these are specific to a domain, technology or industry: for example, financial services, SAP or serverless applications. These provide more specific guidance for that type of workload.

Then at re:Invent last year, we launched a feature called custom lenses. This allows you to create your own lens with your own pillars, questions, best practices, improvement plan, and your own logic for high-risk and medium-risk issues. So it really lets you build your own version of a best practice framework based on what your organization needs.
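Custom lenses are authored as a JSON document, and the high-risk/medium-risk logic lives in per-question risk rules. A simplified Python sketch, assuming the general shape of that format (questions with `choices` and `riskRules`, each rule pairing a `condition` over choice IDs with a risk level, plus a `default` rule). The IDs are invented, and the evaluator supports only `&&` and `!` with first-match ordering; the custom lens specification defines the real grammar and evaluation.

```python
# A trimmed question in the general shape of the custom lens JSON format
# (pillars -> questions -> choices + riskRules). IDs and titles are hypothetical.
question = {
    "id": "audit_logs",
    "title": "How do you retain audit logs?",
    "choices": [
        {"id": "audit_logs_central", "title": "Logs are shipped to a central account"},
        {"id": "audit_logs_retention", "title": "Retention meets regulatory minimums"},
    ],
    "riskRules": [
        {"condition": "audit_logs_central && audit_logs_retention", "risk": "NO_RISK"},
        {"condition": "audit_logs_central && !audit_logs_retention", "risk": "MEDIUM_RISK"},
        {"condition": "default", "risk": "HIGH_RISK"},
    ],
}


def evaluate_risk(question, selected):
    """Return the risk of the first rule whose condition holds for `selected`.

    Simplified evaluator: supports only `default` and conjunctions of choice
    IDs joined by `&&`, each optionally negated with `!`.
    """
    for rule in question["riskRules"]:
        condition = rule["condition"].strip()
        if condition == "default":
            return rule["risk"]
        terms = [term.strip() for term in condition.split("&&")]
        if all(
            (term[1:] not in selected) if term.startswith("!") else (term in selected)
            for term in terms
        ):
            return rule["risk"]
    return None  # no rule matched


print(evaluate_risk(question, {"audit_logs_central", "audit_logs_retention"}))  # NO_RISK
print(evaluate_risk(question, {"audit_logs_central"}))                          # MEDIUM_RISK
print(evaluate_risk(question, set()))                                           # HIGH_RISK
```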

As an example, take a financial services customer with a serverless workload. They can use the broad Well-Architected Framework, the Financial Services Lens and the Serverless Lens, and then write their own custom lens on top of that for any specific regulatory or compliance requirements, or any specific best practices they need to follow. That way, when they review that workload, they're comparing it not just against our broad best practices but against the specific ones for their industry and type of workload, as well as their own.

So this is a quick overview of some of the lenses that we've got available. Some are available as whitepapers, and some are available within the AWS Well-Architected Tool.

What we've actually started doing is working with some of our specialist SA teams within AWS to create custom lenses for each of these. So if there's a specific one that you'd like to see, reach out to your SA, your TAM, your account manager, your partner development manager or some other contact at AWS, and they'll be able to get you connected with somebody who might have already written one of these custom lenses.

In the same vein, if there's a lens that you'd like to see for a specific industry, please let us know. We're always looking to publish new lenses, and we'd love to get feedback from you about what they should be and what topic areas they should cover.

So having said all that, I'm now going to hand over to Ray, who's going to talk us through how TUI have adopted the AWS Well-Architected Framework. I can see Ray's got some colleagues in the audience, so give him an easy time, please. Thank you.

Ray: Thank you so much, Andrew.

Hi, everyone. I'm really excited to be here today to share with you how we apply Well-Architected to our workloads. I'll also show you how we apply configuration consistency across our enterprise, also based on Well-Architected.

But first of all, who is TUI? TUI is a travel and tourism company that has been operating for more than 50 years. Not many people know that, as a listed company, we own our hotels: today we have more than 400 hotels. We also own our cruise ships, and we own our airlines: we have five airlines with more than 100 aircraft between them.

Today we also have more than 1,000 retail stores across Europe that offer our holidays, taking our customers to more than 100 destinations.

So, back to our journey. Our Well-Architected journey started back in 2017. At that time, we did not have the Well-Architected Tool like today. We had been using Well-Architected best practices from AWS, and we combined them with TUI's own standards, such as access management or how you do encryption. We combined these together and called it the TUI Cloud Standard.

We published the TUI standard through just a normal SharePoint site, so that our community, such as solution architects or any of our developers, could use it as a guideline. In 2018, we began to adopt the Well-Architected Tool. We deployed it for some of our workloads across our enterprise, and at this point we began to see the benefit of using Well-Architected for our workloads in the cloud and how Well-Architected helps us to operate within the cloud.

But most importantly, we began to see the metrics. We began to see technology risk metrics, such as on the security and reliability pillars. We also saw engineering metrics, covering cost, operational excellence and performance. Fast forward to 2019.

At the beginning of COVID-19, we had a global re-platform program called TRIPS. At this point, we were using Well-Architected to assess our TRIPS program, using Well-Architected in a distributed way, which I'm going to show you later on. We also have a central view to control the maturity level. So how did we get to this point?

We didn't just start straight away. Our cloud adoption started back in 2017, and at that time we created a program with AWS called the Cloud Migration Access Program. This program stood up quite a few capabilities to help us with adoption: we stood up a landing zone capability and quite a few new tools to help with our adoption. We also stood up a team called the GCDA, which stands for Global Cloud Design Authority; you might know it as a CCoE or a cloud office.

Today, this is a very important team: it's the team that helps you scale out Well-Architected across your enterprise. In 2019, when we adopted the Well-Architected Tool across our enterprise and applied it to our global re-platform program, TRIPS, we also introduced a new principle: the owners of the workloads create and manage their own maturity. This is very important, and today there is no doubt that Well-Architected has helped us to come out of the adoption phase.

80% of our workloads have come out of the adoption phase and are now moving to optimization, and some of our communities are moving into the innovation phase, accelerated by using Well-Architected. In terms of scale, Well-Architected has been applied across our seven master domains, and those seven master domains are spread across 12 countries.

A master domain in this context is, for example, a customer domain, an engagement domain or a data analytics domain. Today we have more than 1,000 of our IT colleagues applying Well-Architected, and even more, we have more than 700 AWS accounts that apply Well-Architected to their workloads.

I've talked a lot about our global re-platform program, so I'd like to give you a little more information about its scale and what TRIPS is. Think of TRIPS as our engine room for TUI. It allows our holidaymakers to customize their holiday plans exactly the way they want: whether they want to book a ready-made package holiday, add extras such as excursions, or just buy a flight only. This is all available through the TRIPS platform, which is a greenfield deployment in the cloud combined with our existing technology stack.

As a result, we have a single platform for our reservation, inventory, pricing and sourcing systems. Together these make up the TRIPS platform, and with it we created a scalable single platform and a travel technology ecosystem.

Let's go back to our journey. The Well-Architected Tool provides you with many benefits. One of the benefits we found the most useful comes when you integrate the Well-Architected Tool API with your reporting system; in our case, Tableau. As soon as you integrate them, you begin to get different dimensions of visibility for different communities.

This report, for example, is for our senior management community, our CIO and our CTO. They can now see the maturity level of our technology risks, and also the level of engineering maturity across our enterprise. They can also see the maturity timeline.

The product owners or environment owners can drill down into their own maturity: they can see how well they are doing, and they can compare their maturity with other products across our enterprise.

The next view is the one that I like the most. It provides workload owners with in-depth information about where they are doing well and where they are not, the areas where they need to improve their maturity, so that workload owners can manage their own roadmap and their own timeline for remediation.
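Ray doesn't describe how the maturity figures are computed. A hypothetical Python sketch of the roll-up step, assuming workload summaries shaped like the ListWorkloads API's `WorkloadSummaries` items (an `Owner` and a `RiskCounts` map). The "maturity" ratio is invented for illustration; the result is the kind of table a tool like Tableau could chart.

```python
from collections import defaultdict


def maturity_by_owner(workload_summaries):
    """Roll up Well-Architected risk counts per workload owner.

    Each summary is assumed to follow the WorkloadSummaries items returned by
    ListWorkloads: an "Owner" string and a "RiskCounts" map keyed by HIGH,
    MEDIUM, NONE, UNANSWERED and NOT_APPLICABLE. The "maturity" ratio
    (NONE / answered questions) is a hypothetical metric, not TUI's formula.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for summary in workload_summaries:
        for risk, count in summary.get("RiskCounts", {}).items():
            totals[summary["Owner"]][risk] += count
    report = {}
    for owner, counts in totals.items():
        answered = counts["HIGH"] + counts["MEDIUM"] + counts["NONE"]
        report[owner] = {
            "high": counts["HIGH"],
            "medium": counts["MEDIUM"],
            "maturity": round(counts["NONE"] / answered, 2) if answered else None,
        }
    return report


summaries = [  # made-up data
    {"WorkloadName": "booking-api", "Owner": "reservations", "RiskCounts": {"HIGH": 3, "MEDIUM": 5, "NONE": 42}},
    {"WorkloadName": "pricing-engine", "Owner": "reservations", "RiskCounts": {"HIGH": 1, "MEDIUM": 4, "NONE": 45}},
    {"WorkloadName": "crm", "Owner": "customer", "RiskCounts": {"HIGH": 10, "MEDIUM": 10, "NONE": 30}},
]
print(maturity_by_owner(summaries))
```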

So with all the visibility that we have from the Well-Architected Tool API, we went one step further and raised the bar by introducing platform standards. The objective of platform standards is to help increase your maturity faster, and also to help deploy configuration consistently across the enterprise.

Most importantly, platform standards now help us apply security exactly the same way across our enterprise. Today we have four platform standards. The first is the infrastructure standard. One example policy in the infrastructure standard is the VPC standard, which helps you deploy VPCs exactly the same way, and with high availability, across our enterprise. Another example policy is the hybrid DNS standard, which ensures that your DNS configuration is identical and consistent. We also have application standards and security standards. And we have sustainability standards; policies in the sustainability standards cover, for example, how you review the footprint of your objects in S3, and how you select the right instance to help reduce carbon emissions.
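TUI's standards aren't published in the talk, so this Python sketch only illustrates how one rule in a hypothetical VPC standard ("subnets span at least two Availability Zones") could be checked, using input shaped like boto3's `ec2.describe_subnets()` response. The threshold and function name are assumptions.

```python
from collections import defaultdict


def vpcs_below_az_minimum(describe_subnets_response, min_azs=2):
    """Return {vpc_id: [AZs]} for VPCs whose subnets span fewer than `min_azs` AZs.

    Input follows the shape of boto3's ec2.describe_subnets() response: a
    "Subnets" list whose items carry "VpcId" and "AvailabilityZone".
    min_azs=2 is a hypothetical high-availability threshold.
    """
    azs_per_vpc = defaultdict(set)
    for subnet in describe_subnets_response.get("Subnets", []):
        azs_per_vpc[subnet["VpcId"]].add(subnet["AvailabilityZone"])
    return {vpc: sorted(azs) for vpc, azs in azs_per_vpc.items() if len(azs) < min_azs}


sample = {  # trimmed, made-up describe_subnets output
    "Subnets": [
        {"SubnetId": "subnet-1", "VpcId": "vpc-aaa", "AvailabilityZone": "eu-west-1a"},
        {"SubnetId": "subnet-2", "VpcId": "vpc-aaa", "AvailabilityZone": "eu-west-1b"},
        {"SubnetId": "subnet-3", "VpcId": "vpc-bbb", "AvailabilityZone": "eu-west-1a"},
        {"SubnetId": "subnet-4", "VpcId": "vpc-bbb", "AvailabilityZone": "eu-west-1a"},
    ]
}
print(vpcs_below_az_minimum(sample))  # {'vpc-bbb': ['eu-west-1a']}
```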

Overall, our global transformation program and our cloud adoption journey with AWS allow us to concentrate more on our customers and to curate unforgettable moments when our customers go on holiday with us. With that, I'll pass it on to Duncan. Thank you so much; I'll be happy to answer questions afterwards.

Duncan: Cool, thanks, Ray. It's awesome to hear how TUI have been using Well-Architected to really help their customers and make data-driven decisions. I don't know about anyone else, but while I'm loving being in Vegas, all those pictures just made me want to go to the airport, fly somewhere to the beach and go on holiday.

So, following on from the awesome stuff Ray has been doing at TUI, I'm going to deep dive into three areas to help you in this space.

First, we're going to look at Well-Architected custom lenses, and how you can develop and automate that process. Second, we're going to look at reporting and dashboards: how you can identify risks and improvement areas and get centralized visibility.

And lastly, we're going to look at streamlining your review process with templated answers. All three of these areas have reference architectures, so you can go away and look at implementing them yourself.

Earlier, Andrew talked about a feature in the tool called custom lenses. Has anyone used custom lenses yet? A couple of people have been building some lenses; that's cool. So I'd like to share a little more about how our customers have been using this feature.

When you review a workload in the Well-Architected Tool, you use the standard framework. We've also got three lenses in the tool: the Serverless Lens, the FTR Lens (which is the Foundational Technical Review) and the Software as a Service Lens. So you can use the main framework and one of those three lenses. And at re:Invent last year, as Andrew said, we released the custom lenses feature, which allows you to create and author your own custom lenses with your own pillars, best practices, HRI (high-risk issue) logic and improvement plans, all directly aligned to your specific requirements.

So you can basically build out your own best practice framework. For example, you may have a specific governance or compliance standard you need to follow, or some development best practice standards that you want your teams to use, and you can use a custom lens to help your teams follow those guidelines.

To help you with this approach, we've also released a new feature in the tool this year, so you can share these lenses using AWS Organizations. Who's using AWS Organizations in the audience? Quite a few people? Awesome.

Custom lenses can be shared between AWS accounts using IAM principals: you can use users, roles and account numbers, but you can also use AWS Organizations. So you can share custom lenses with either an entire organization or a subset of organizational units.

We have a recommended pattern here. We create a new AWS account, which is going to be our central custom lens repository. From there, we can share our lenses either to the entire AWS organization or to individual organizational units. This means we get a consistent approach: the same lens is shared across each of the individual accounts, and we don't end up with divergent versions and things like that.
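A minimal Python sketch of the sharing step from the central lens account, assuming boto3's `wellarchitected` client and its `create_lens_share` call (`LensAlias` plus `SharedWith`). The ARNs are hypothetical, and a stand-in client is included so it runs without AWS access; sharing with an organization or OU may also require organization sharing to be enabled in the tool's settings.

```python
def share_lens(client, lens_arn, principals):
    """Share one custom lens with each principal and return {principal: ShareId}.

    `client` is assumed to be boto3.client("wellarchitected"). A principal can
    be an AWS account ID, an IAM user or role ARN, or an AWS Organizations
    organization or organizational unit ARN. boto3 fills in the
    ClientRequestToken idempotency token automatically.
    """
    share_ids = {}
    for principal in principals:
        response = client.create_lens_share(LensAlias=lens_arn, SharedWith=principal)
        share_ids[principal] = response["ShareId"]
    return share_ids


class StubClient:
    """Stand-in for the real client so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def create_lens_share(self, **kwargs):
        self.calls.append(kwargs)
        return {"ShareId": f"share-{len(self.calls)}"}


# Hypothetical ARNs for the central lens and two target OUs.
lens = "arn:aws:wellarchitected:eu-west-1:111122223333:lens/example-lens-id"
ous = [
    "arn:aws:organizations::111122223333:ou/o-example/ou-example-prod",
    "arn:aws:organizations::111122223333:ou/o-example/ou-example-dev",
]
stub = StubClient()
shares = share_lens(stub, lens, ous)
print(shares)
```

Swapping `StubClient()` for `boto3.client("wellarchitected")`, run with credentials in the central lens account, would make the same calls against the real service.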

To evolve this further, we should follow our own AWS best practices when we implement something like this. We want to make sure our lenses are in version control, and we build an automated CI/CD pipeline that deploys each new version of a lens.

This architecture enables automatic publishing and sharing of new versions of your custom lens. You commit the new version of the lens to your CodeCommit repository. This triggers an AWS Lambda function, which interacts with the Well-Architected Tool API and shares the lens across all of our accounts, and our users can then use that new lens to review their workloads. Is anyone using the Well-Architected Tool API to do anything? Cool. OK.
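The Lambda step in that pipeline might look roughly like this Python sketch, assuming boto3's `import_lens` (to update an existing custom lens's draft from a JSON document) and `create_lens_version` (to publish that draft). The event shape, function names and version label are hypothetical, and the stand-in client lets it run without AWS access.

```python
import json


def publish_lens_version(client, lens_arn, lens_json, version, major=False):
    """Import an updated lens document into an existing custom lens, then publish it.

    `client` is assumed to be boto3.client("wellarchitected"). Passing the lens
    ARN as LensAlias to import_lens updates that lens's draft; create_lens_version
    then publishes the draft under the given version label.
    """
    json.loads(lens_json)  # fail fast on malformed JSON before calling the API
    imported = client.import_lens(JSONString=lens_json, LensAlias=lens_arn)
    if imported.get("Status") == "ERROR":
        raise RuntimeError(f"lens import failed for {lens_arn}")
    # A production version might poll here until an IN_PROGRESS import completes.
    client.create_lens_version(LensAlias=lens_arn, LensVersion=version, IsMajorVersion=major)
    return imported["LensArn"]


def handler(event, context):
    """Hypothetical Lambda entry point; the event shape is an assumption."""
    import boto3  # imported here so the sketch runs without boto3 installed

    client = boto3.client("wellarchitected")
    return publish_lens_version(client, event["lens_arn"], event["lens_json"], event["version"])


class StubClient:
    """Records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def import_lens(self, **kwargs):
        self.calls.append("import_lens")
        return {"LensArn": kwargs["LensAlias"], "Status": "COMPLETE"}

    def create_lens_version(self, **kwargs):
        self.calls.append(("create_lens_version", kwargs["LensVersion"]))
        return {"LensArn": kwargs["LensAlias"], "LensVersion": kwargs["LensVersion"]}


stub = StubClient()
arn = "arn:aws:wellarchitected:eu-west-1:111122223333:lens/example-lens-id"
placeholder_lens = '{"name": "Example lens"}'  # placeholder, not a complete lens document
published = publish_lens_version(stub, arn, placeholder_lens, "1.1")
print(stub.calls)  # ['import_lens', ('create_lens_version', '1.1')]
```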

Cool. Hopefully you'll have something to take away from the session and go and build. When we're developing a custom lens, we need to think about it the same way as any development cycle: we need to do small, continuous improvement iterations.

We want to deliver the most value with the least amount of effort, and rather than making assumptions, we want to test our hypotheses by testing our lenses with our end users. So we're not just building a big, massive lens and applying it to people; we want to get feedback on it.

Andrew talked a little earlier about our cycle, how we apply Well-Architected as a mechanism, and we've got a similar pattern here. We've got a plan phase, where we sit down and define the scope of the lens: what outcomes we need to deliver, which stakeholders we need to talk to in order to get the relevant information, and who we're going to apply the lens to.

In the implementation phase, we need to get all the relevant people together, do a bit of a brainstorming session, and find all the relevant documentation for the improvement plan, what tools we're going to use and things like that. Then we measure success: we need to actually test the lens against a subset of users. Find the people who are going to benefit from it, get them to use the lens and get some feedback from them.

Finally, we take all of those findings, apply them to the lens and iterate. By keeping the lens in version control, we can see how it develops over time.

Having a centralized view of your architectural health across all of your AWS workloads provides the metrics you need to make informed decisions. Insights into patterns and into your improvement areas can really drive strategy, such as budget allocation, or whether you need to invest in certain areas of enablement for your teams.

Or whether you need to bring in an AWS Partner to help focus on a specific area. So having that dashboard view can really help.

An event-driven solution like this, again using the Well-Architected Tool API and a Lambda function in a very similar architecture to before, exports the data from the Well-Architected reviews into an Amazon S3 bucket.

We use AWS Glue and Amazon Athena to process that data, and then Amazon QuickSight to build visualizations, similar to the way Ray was working.

We now have a dashboard where we can see patterns across all of our Well-Architected reviews from all of our accounts. This example uses Amazon QuickSight, and there's a link at the end of the presentation to a Well-Architected lab where you can deploy this and try building something yourself.

I think in Ray's example you're using Tableau, so you can use any tool you want to visualize the data afterwards.
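The export step could flatten each review into rows before writing them to S3. A Python sketch under stated assumptions: the input mimics `AnswerSummaries` from the ListAnswers API (`PillarId`, `QuestionId`, `QuestionTitle`, `Risk`, `SelectedChoices`), the output column names are hypothetical, and JSON Lines is chosen because Athena can query it through a Glue table. This is not the lab's actual code.

```python
import json


def answers_to_jsonl(workload_id, lens_alias, answer_summaries):
    """Flatten ListAnswers-style AnswerSummaries into JSON Lines, one row per question.

    Newline-delimited JSON objects are a format Athena can query through a Glue
    table pointed at the S3 prefix. Output column names are hypothetical.
    """
    rows = (
        {
            "workload_id": workload_id,
            "lens": lens_alias,
            "pillar": answer.get("PillarId"),
            "question_id": answer.get("QuestionId"),
            "question": answer.get("QuestionTitle"),
            "risk": answer.get("Risk", "UNANSWERED"),
            "selected_choices": answer.get("SelectedChoices", []),
        }
        for answer in answer_summaries
    )
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


# Sample input trimmed to the fields used above; values are made up.
sample_answers = [
    {
        "PillarId": "costOptimization",
        "QuestionId": "example-q1",
        "QuestionTitle": "How do you decommission resources?",
        "Risk": "MEDIUM",
        "SelectedChoices": ["example_choice_track"],
    },
    {
        "PillarId": "security",
        "QuestionId": "example-q2",
        "QuestionTitle": "How do you manage identities?",
        "Risk": "NONE",
        "SelectedChoices": ["example_choice_idp"],
    },
]
jsonl = answers_to_jsonl("example-workload-id", "wellarchitected", sample_answers)
print(jsonl)
```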

Next, we're going to look at how you can scale and streamline your Well-Architected reviews. One piece of feedback we've had from some of our larger customers is that when they're running Well-Architected reviews on many, many workloads, they often have repeat answers that apply across all of those workloads.

For example, you may use a third-party identity provider, or have a centralized logging and monitoring solution, or maybe you've deployed AWS Control Tower for vending accounts and have certain compliance controls deployed across all of your accounts using that. Those answers are going to be the same across all of those accounts.

So you're wasting time in reviews by having to answer the same questions over and over. This AWS Solution (again, there's a link at the end) allows you to pre-fill a templated AWS Well-Architected review.

Again, it works in the same pattern: we have a central Well-Architected management account where we can prefill the answers.

And this is fully automated: when you update the template, it propagates out to all of the child accounts. When our users then come along to do a Well-Architected review, it really streamlines the process, because they've already got some of the content and data filled out for them.
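The AWS Solution's code isn't shown in the talk; this minimal Python sketch of the core idea, assuming boto3's `update_answer` call, applies one template of shared answers to several workloads. The template format, IDs and function name are hypothetical, and a stand-in client records the calls instead of contacting AWS.

```python
def apply_answer_template(client, workload_id, template, lens_alias="wellarchitected"):
    """Pre-fill answers that are identical across workloads.

    `client` is assumed to be boto3.client("wellarchitected"); `template` maps a
    question ID to {"choices": [...], "notes": "..."} and is a hypothetical
    format, not the AWS Solution's. Returns the question IDs that were updated.
    """
    updated = []
    for question_id, answer in template.items():
        kwargs = {
            "WorkloadId": workload_id,
            "LensAlias": lens_alias,
            "QuestionId": question_id,
            "SelectedChoices": answer["choices"],
        }
        if answer.get("notes"):
            kwargs["Notes"] = answer["notes"]
        client.update_answer(**kwargs)
        updated.append(question_id)
    return updated


class StubClient:
    """Records update_answer calls so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def update_answer(self, **kwargs):
        self.calls.append(kwargs)
        return {}


# Placeholder question/choice IDs for answers shared by every account,
# e.g. a central third-party identity provider and centralized logging.
template = {
    "example-identity-question": {"choices": ["example_choice_central_idp"], "notes": "Org-wide IdP via SSO"},
    "example-logging-question": {"choices": ["example_choice_central_logs"]},
}
stub = StubClient()
for workload in ["example-workload-a", "example-workload-b"]:
    apply_answer_template(stub, workload, template)
print(len(stub.calls))  # 4
```

Reviewers then only need to answer the questions the template leaves open.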

Cool. So these are some links to the four areas we've talked about. We've had an overview of Well-Architected, we've looked at how Ray and TUI used the Well-Architected Framework, and we've looked at custom lenses: how we can automate their development and follow a best-practice development process.

We also looked at templated answers and how we can use them to streamline our review process.

Cool. Thank you so much.