Scaling AWS Well-Architected best practices across your organization

Article link: https://blog.csdn.net/just2gooo/article/details/135090313

I am sure we've all had this thing where we want to follow best practices. We want to make sure that we are building cost-efficient, reliable systems, right? But it can get overwhelming, it can get confusing, it can get complicated.

What if I told you there is a solution, and that's the Well-Architected Tool and the Framework? It can really help you structure how you operate and manage in the cloud. I'm Samir Kopal, and I'm going to take you on this journey with my colleague, Elana Morris, here. Let's get started.

First, let's talk about the history of Well-Architected. Where did Well-Architected come from? Why did it come about? A long, long time ago, in 2012, there was an event with one of our popular services, and unfortunately some of our customers were impacted. We asked the questions: who was impacted, what happened, and how can we make sure this doesn't happen again? Just the standard questions that everybody asks.

But there was one more question, which was: who was not impacted? That led us to recognize that there were certain customers who were resilient to this event, who managed it by having the proper redundancy built into their systems. And when we saw that, we recognized the need to share it with everybody else.

We had learnings as AWS, and we had learnings from our customers. So we got a bunch of solutions architects together and said, what can we do? They came up with what today is known as the Framework. In 2012, it was launched as a white paper.

What happened next? We started using this framework to educate our customers, to teach them about these best practices. We had four pillars back then: security, cost optimization, performance, and reliability.

What we recognized later as we started doing these reviews and talking to our customers and learning more is that architecture doesn't operate in a silo. What we need with it is people and processes. And that led to the introduction of the operational excellence pillar in 2016.

When we started putting these together, we had our solutions architects, our account teams, and our field teams starting to use this with customers. The next need we recognized was: how do we start documenting some of these things? How do you actually measure whether you're improving? I mean, these are conversations, and that's great. And that's when, probably exactly five years ago to this particular minute, the Well-Architected Tool was launched, in 2018.

So happy fifth birthday to the Well-Architected Tool today. We continued to evolve the tool based on the feedback that we heard. One of the biggest things that we did was increase the size of the notes field. I'll touch upon why that was important, but it was the first feature request. As an engineering leader back then, I was like: do people not want automation? Do they not want all of these other things? Why the notes field? So hold on to that thought; I'll come back to it.

And in today's environment, it's very important that as you're architecting, as you're building, you think about the environment, right? You think about the climate. That's when sustainability became an important thing for us to introduce, and in 2021 we introduced the sustainability pillar.

And we added more features as more content came in. We keep reviewing what the new best practices are. Our customers are getting more mature in the cloud, and the kind of guidance they need is different, so we continue to evolve the framework. You'll see a lot of new versions of the framework with a lot of new content coming up. We also added features to see a consolidated view, and we'll get into all of that later in the session.

So you'll get to actually experience how that can be used to scale some of these best practices. I'm also very happy to announce that earlier this week we launched a new feature called the lens catalog. As the cloud landscape has evolved, we've realized that it's very important for us to not rely only on these foundational best practices. We need industry and technology deep dives. So we introduced a lens catalog that brings in a lot of these lenses, for example a healthcare industry lens that has to deal with compliance. It's a very regulated industry, so you can dive deeper into how these compliance requirements impact your architecture. So that was the lens catalog. That's a brief history of the Well-Architected Framework.

But what is the Framework? One of the things we also wanted to get right is how our customers consume this content. Think about it: would anyone here, by show of hands, love to have a single page with all the best practices in there? One page that has everything? I see some hands there. That's awesome.

The thing is, we wanted to make sure that we categorize these, and pillars were a great way to do it. But why pillars? If you look at an organization, there are teams that work on security, there are teams that deal with operations, and there are teams that talk about budget and other things. So those pillars became important, because now the content could be consumed in the categories that we put together. That was where we got pillars.

Now, pillars have something called design principles; those are subcategories. For example, you'll have observability as one of the design principles, which is a deep dive into much more detailed concepts. And then you have incident response, and identity and access management for security. So there's a whole bunch of these design principles that allow you to consume this content in a manner that makes sense.

And now you might ask me: why questions, though? Well, the reason for questions is that we want Well-Architected to be a conversation. It's not an audit or a checklist. Guess what happens with an audit or a checklist: you make the same mistake over and over and over again. For every workload that you build, operate, and architect, you'll see the same things popping up, because you're not actually educating your teams.

Let me give you an example of one such question that helped me when we were building the Well-Architected Tool. We said we'd do a Well-Architected review. At this point, you know, you are the engineering manager for the Well-Architected Framework, you've probably read it a few times, and you're ready to say there are going to be no high risks in my system, right? Guess what? It was far from true.

We started having this conversation. One of the things that came up was: why do you name your resources? I looked blankly around the room at my engineers and said, I don't know. But that led us to say: you can manage it better if you don't name it. You don't really need to name it. Let the auto-naming happen and it will work well. And we did that. It was easier for us to deploy. It was easier to maintain. There were no typos, right? We didn't run into issues because somebody typed the name of a resource wrong. Those are things that don't come up in your code reviews; no one's asking those questions in your design reviews. This helps you ask those questions. That's why the framework is structured so that you make this a conversation. You have those questions and, of course, the core of the framework is best practices, right? This is what you should be doing and how you should be doing it.

So that's the Well-Architected Framework. Let's talk about what the Well-Architected Tool is. Once we had the framework, the next bit was to say: how do we actually track some of these things? How do you measure? How do you make sure this is consistent, adopted, et cetera? With that, it's very obvious that the tool helps you identify those risks.

So you go in, there's a question, you have the best practices, and you check the box to say whether you're following the best practice or not. There is guidance available in the form of Trusted Advisor checks that come in and tell you: here's the resource, here's the check, and it's red, yellow, or green. So you can identify those risks.

Now, the other thing that the tool actually does, and this is where I want to circle back to my conversation on notes, is give you the ability to document decisions and trade-offs. Let's do another show of hands: how many people have been in a room or a discussion or a meeting where someone says, why did we make this decision? And the response is, I don't know, this was three years ago. There you go, lots of hands there, right? And then you say, OK, let's ask the person who made the decision, and the answer is, oh, well, they don't work here anymore, right? And you're like, OK, so how do we reverse engineer this? I don't know why we traded off cost for performance. But now with the Well-Architected review, you get the ability to actually make those decisions and document those decisions. That's the reason the notes were so important, right? Because now it became a tool where these decisions were being tracked. It's historical context for future decisions.

And then you can now track and improve workload health. You often go ask somebody, how's the security posture of your workload? It's great. Nothing wrong with it, until there is an issue, right? How do you manage cost? Have you improved on your cost risks or have you not? And the answer to that is, I think I'm improving, right? But now there is a way to actually measure that in the tool; you can see what you're doing. The conversation becomes more structured. The conversation is about: did you address the risk that we had? Are you sure that we are programmatically managing access to our resources? Are we rotating our keys? Do we have multi-factor authentication enabled? So now it's a very structured conversation rather than these vague conversations that say, how's your cloud posture? How's your security posture? Right.
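To make "actually measure that" concrete, here is a minimal sketch of comparing risk counts between two saved points in time for one workload. The snapshot structure, question names, and risk labels are hypothetical illustrations, not the tool's data model or API.

```python
from collections import Counter


def risk_counts(answers):
    """Count answers by risk level for one snapshot of a review."""
    return Counter(answer["risk"] for answer in answers)


def risk_delta(before, after):
    """Change in count per risk level; a negative number means fewer risks."""
    levels = set(before) | set(after)
    return {level: after.get(level, 0) - before.get(level, 0) for level in sorted(levels)}


# Hypothetical snapshots of the same workload at two points in time.
at_launch = [
    {"question": "rotate-keys", "risk": "HIGH"},
    {"question": "enable-mfa", "risk": "HIGH"},
    {"question": "manage-access", "risk": "MEDIUM"},
]
one_quarter_later = [
    {"question": "rotate-keys", "risk": "NONE"},
    {"question": "enable-mfa", "risk": "NONE"},
    {"question": "manage-access", "risk": "MEDIUM"},
]

delta = risk_delta(risk_counts(at_launch), risk_counts(one_quarter_later))
print(delta)
```

With numbers like these, "I think I'm improving" becomes a statement about which specific risks were closed.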

So the value of the tool is to improve and actually measure consistently across all your workloads. We touched upon what the content does. We touched upon what the tool does. What it brings, though, is data.

Now, we all know Amazon is a very data-driven company, so we put a lot of emphasis on data. Now you have these trade-off decisions that you made, you have these risks that you have identified, and you have the improvement plan on how you can actually improve, right?

When these things come together, you can make informed decisions using data. Think about where you were having these conversations without the tool: are we doing this? Are we not doing this? There was no one looking across a portfolio. The data allows you to do that. You can look at it as an IT leader and say, oh, I see we have security risks across the board. Can we run a security campaign? We are spending way too much money. Can we optimize here? Can we rearchitect? Those decisions are now data driven. And when you put the content, the tool, and the data together, what you get is what I call a Well-Architected mechanism. We'll learn a lot more as we proceed today and talk about how it actually scales, right? But the Well-Architected mechanism is what allows you to manage and govern your cloud resources in a more structured, efficient manner. It allows your teams to learn, measure, and improve their workload health.

So it's no longer a conversation like, how's your workload doing? Think about when you go to your doctor. Does your doctor diagnose you by just asking, how are you doing, are you feeling OK? Or do they actually run some diagnostics and say, well, I can see you have some issues here, right? Those are things that become important, because now these are data-driven conversations, and these come from recommended resources.

So AWS learns these things from things that happen within AWS, things that happen with our customers, and new features that we launch. It's an ever-changing landscape of best practices, and the tool is the way you can look and figure out what's going on with that.

I'll hand it over to Elana to talk us through the how: how do you actually scale this? Elana.

Thank you. So before we actually get into the how, I want to take a step back and think about what's on some of our customers' minds. When we think about the video that we saw at the beginning of the session, we saw that they were talking about compliance reports, CCoE reports, and security reports, all tracked in different places.

Now, one of the biggest questions that we get is: how do I actually standardize all of these best practices across my organization? We also get questions like: how do I bring my business context into my review? How do I make sure that my engineering teams and my development teams are focusing on the right outcomes and business needs? And how do I make sure that I'm adhering not only to the architecture best practices in the Well-Architected Tool, but also to some of our internal best practices, whether that's compliance or internal security needs?

So all of these things are on our customers' minds. And today I'm gonna walk you through how you can actually answer those questions using the Well-Architected Tool.

So to get started, you can find the Well-Architected Tool in the AWS Management Console. You start by defining a workload. When you define a workload, you fill out some metadata about what your workload is. We ask you for a name, a description, who's reviewing it, and some other information about what type of workload you're reviewing. A little bit later on, as you're defining the workload, you can enter the account IDs and/or the application ID that your workload runs in. This is significant because that's where you can use those account numbers to activate Trusted Advisor.

When you activate Trusted Advisor, once you start performing the review, we'll pull in resource-level checks compared against the AWS Well-Architected best practices and show you evidence to help you decide whether you are or are not following the best practice that's listed there.

So I've already gone ahead and defined a workload. When you go into your workload, you can see all of the metadata that I just walked you through populated in this workload properties section here. Once you're ready to start reviewing best practices, you can click "Continue reviewing" and start answering questions.

So Elana, one of the things that I constantly hear from customers when I'm having this conversation is, you know, how do you scale this? Right? We have large enterprise customers, customers that are scaling to thousands of workloads that they run in the cloud. There is a lot of data that they want to reuse. For example, they have a platform for security, so they have authentication and authorization and those things built in, and they apply across all workloads, right?

So how do we use this to scale for that challenge?

Yeah, that's a great question, and a lot of our customers tell us, you know, we have common best practices that apply to all of our workloads or to multiple workloads, right? Whether that's security best practices or best practices from the cloud center of excellence team. What they really want is to reduce redundancy, right? Answering the same questions over and over again for every single workload review is really tedious, and it's not a good use of time.

So we have a feature called review templates that helps you create a template where you can fill out the different questions that you need.

I'm just gonna pause for one second here. Give me a second. So you can actually create a template where you can select all the different lenses that you want to include in your review and go ahead and prefill the answers to those questions.

So I've already gone ahead and created a review template, and you can see I've included the Well-Architected Framework, a custom lens that I've created for my organization, which includes some of those internal compliance best practices, and I also included the SaaS lens.

Now, I'm really excited about where the SaaS lens came from, because it came from the lens catalog that Samir mentioned we launched earlier this week. This lens catalog is a collection of different lenses across different industry and technology domains. We talked a little bit earlier about the healthcare lens. You can see there's a serverless lens, IoT, data. These are more specific topics that are most relevant to your workload.

So this allows you to customize your review even further. Now you can do a Well-Architected review using the Well-Architected Framework and any custom best practices that you want to include, and you can also add additional lenses from this catalog to your review. This is currently available and launched earlier this week, so we're really, really excited about this.

So this is great, right, where you can go ahead and add these review templates and add the lenses that you want your teams to follow. It drives consistency in terms of how people do these reviews. But how do I actually scale? How do I actually share this?

Yeah. So before I tell you how to scale, I also wanna show you how you can actually answer some of the questions in the review template. Again, just to remind you, you can fill out a preexisting template there, and you can select those best practices that are common.

So I'm gonna show you, for the Well-Architected Framework, how you actually answer those questions. The experience looks very similar to what you would see when you're performing a Well-Architected review after just defining a workload.

So you see here, I've already answered some questions from the operational excellence pillar. These are all best practices, the first four, that I don't want the engineering teams worrying about; these are already taken care of. And I can also add notes. So this is really great for adding context on why these best practices were selected.

So, Samir, to answer your question: when you want to share this review template, you can go to the Shares section, and this is where you can share your review template with an individual account or an AWS organization.

So what this does is then anybody who has access to the review template can then go ahead and define a workload from that template. So they're starting their review with a baseline. So some of these questions will already be answered for them. They don't have to worry about it and they can focus on what they need to focus on for their specific workload.

So I will show you what that looks like. When you go to a workload that already has a review template attached, you'll see the same thing. We see the exact same answers that I had in my review template, already filled out for me in my actual workload.
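The template flow described above (prefill common answers, share the template, start each workload from that baseline) can be sketched as a small model. This is an illustration under stated assumptions: the dictionary layout, question IDs, choice names, and account numbers are all hypothetical, and the function is not the Well-Architected Tool API.

```python
def define_workload_from_template(question_ids, template, account_id):
    """Start a workload review from a shared template (illustrative model only)."""
    if account_id not in template["shared_with"]:
        raise PermissionError(f"template not shared with account {account_id}")
    prefilled = template["answers"]
    # Shallow-copy each prefilled answer; questions the template skips stay open (None).
    return {
        qid: dict(prefilled[qid]) if qid in prefilled else None
        for qid in question_ids
    }


# Hypothetical question IDs, choices, and account numbers.
template = {
    "shared_with": {"111111111111"},
    "answers": {
        "ops-1": {"choices": ["priorities-evaluated"], "notes": "Owned by the CCoE team."},
        "ops-2": {"choices": ["ownership-defined"], "notes": "Tracked centrally."},
    },
}
questions = ["ops-1", "ops-2", "ops-3", "sec-1"]

workload = define_workload_from_template(questions, template, "111111111111")
remaining = [qid for qid, answer in workload.items() if answer is None]
print(remaining)  # questions the engineering team still has to answer
```

The point of the design is that the central team answers the common questions once, and every workload created from the template inherits those answers along with the notes explaining them.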

So this is great. Now you've got it down to the teams to say, this is common. This is driving that consistency, right? This is essentially helping your teams not guess: "I think I'm using a third-party federated security service here, and I think they do that right." There is a team that's responsible, and it's the same answer across all workloads that are doing this.

So the other thing that we commonly hear is, you know, this is allowing the central teams to send that out and scale those consistent best practices. But there is a constant tension. Putting on my engineering hat: I have nonfunctional requirements to take care of. I have product managers telling me that we need this feature out on this date. I have compliance needs and all these operational best practices and guidance that I need to follow. My engineers typically ask me, which one do you want me to do first? Do you have something for that?

I thought you might ask me this question.

So we know that at different points in the development life cycle, there are different priorities that you wanna focus on. If you're launching a new product or a new service, you might wanna focus on security more than you focus on cost.

So how do we actually translate that into prioritizing what's most important to you in your workload review at any given point? The answer is profiles.

Profiles allow you to provide more business context on what is important to you at any given point when doing a workload review. You go in and answer a short list of questions about where you are in your cloud adoption journey, what kind of business value this workload represents, and what improvements are most important to you at this time. You can create your profile and share it with anybody that needs access to it, right?

So we talked about engineering teams needing to know the priorities to focus on. Great: create a profile and share it with them. Then they can go back to the workload and attach that profile to it. What that does is prioritize a list of questions to focus on.

So when you go back into your Well-Architected review, you'll see a specific section that gives you a prioritized list. Now you know that, based on the profile that I attached, I have 50 questions that I need to focus on. Of those 50 questions, some are already answered because of review templates. So now you have some of those common best practices already filled out, taking the guesswork out for those engineers, and you know what to focus on based on what's important to you and what is important to the business, because that profile was added to your workload.
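A rough sketch of how a profile and a review template combine: the profile narrows the full question set to a prioritized list, and the template removes the ones already answered. The mapping from a profile to priority pillars is an assumption made for illustration; the talk does not describe how the tool derives priorities from the profile's questionnaire, and all IDs below are hypothetical.

```python
def prioritized_questions(questions, priority_pillars):
    """Keep the questions that belong to the profile's priority pillars."""
    return [q for q in questions if q["pillar"] in priority_pillars]


# Hypothetical question catalog, profile, and template-answered set.
questions = [
    {"id": "sec-1", "pillar": "security"},
    {"id": "sec-2", "pillar": "security"},
    {"id": "cost-1", "pillar": "costOptimization"},
    {"id": "ops-1", "pillar": "operationalExcellence"},
]
profile = {
    "name": "launch-readiness",
    "priority_pillars": {"security", "operationalExcellence"},
}
answered_by_template = {"ops-1"}

focus = prioritized_questions(questions, profile["priority_pillars"])
still_open = [q["id"] for q in focus if q["id"] not in answered_by_template]
print(still_open)  # what the engineers actually need to work through first
```

The questions outside the prioritized list are not deleted from the review; they are simply left for later.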

So that's perfect, right? Now, this is where you connect what your leadership wants all the way to how you actually implement it. I can take business context, like the security campaign I was talking about earlier, right? You want your teams to focus on security. It's important, it's required for compliance, it's required for an audit. It is something that you want to pay extra attention to. You go fill out this profile, identifying security as your top priority, share it with all workloads that this applies to, and then you've prioritized a set of things that you want the engineering teams to work on. So that's great, right?

One of the things here is, you prioritize the questions for them, but how do they know what to work on?

Yeah, so that's the most important part, right? We wanna make sure that they actually make improvements. You'll see here that we have overall risks; that's all the risks from your workload review. But in addition to the prioritized questions, we're also giving you prioritized risks. So now you know that out of the 18 risks that you have, 12 are the ones that you should really focus on.

Now, it's important to know that profiles are not telling you "don't focus on the other risks" or "don't focus on the other questions." They're helping you make a prioritized decision about where to focus first. So now you know: OK, I'm gonna focus on these 12, get that accomplished, align to the goals of the profile, and then go back and address the other risks as well.
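The "focus first, not focus only" idea can be shown with a stable sort: prioritized risks move to the front and every other risk stays on the list. This is a minimal sketch; the risk records and the set of prioritized question IDs are hypothetical.

```python
def order_risks(risks, prioritized_ids):
    """Put prioritized risks first while keeping every other risk in the list."""
    # sorted() is stable and False sorts before True, so original order is
    # preserved within the prioritized group and within the remaining group.
    return sorted(risks, key=lambda risk: risk["question"] not in prioritized_ids)


# Hypothetical risks from one workload review.
risks = [
    {"question": "cost-1", "risk": "MEDIUM"},
    {"question": "sec-1", "risk": "HIGH"},
    {"question": "rel-1", "risk": "HIGH"},
    {"question": "sec-2", "risk": "MEDIUM"},
]
prioritized = {"sec-1", "sec-2"}

ordered = order_risks(risks, prioritized)
print([risk["question"] for risk in ordered])
```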

Alright. So now engineers are starting to go from "here's everything that you should be doing" to "here are the 10 things that I need to work on and address for a launch," or whatever, and they can save that as a milestone as well.

Alright, let's go back to the cloud center of excellence persona here, right? This is great for a workload, where a dev engineer can go in and start working on the right priorities. But what about a central team managing a portfolio? You saw that Oscar-worthy performance in the video, right, which had this big portfolio of things. How do I manage that? It works well when it is one workload, but I would like to see a macro-level view of what's going on.

I'm so glad you wanna see that, because I want to show you. We have a consolidated view where you can see your risks across all of the workloads within your portfolio.

So you can see here, I have four workloads, and I see that most of my risk lies within security. OK, what does that actually mean? How do I take action on that? Why does security have the highest risk across all of my workloads?

So what I can do is scroll down and see that if I configure service and application logging for the security pillar, it will apply to two of my four workloads. What that's doing is increasing your efficiency: you make the improvement once, but it's applicable to two of my workloads.

So this dashboard gives you that macro-level view where you can see, across your entire portfolio, where do my risks lie and where should I focus my efforts?
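The two questions the dashboard answers (which pillar carries the most risk, and which single improvement touches the most workloads) amount to two small aggregations. The sketch below uses made-up workload names and risk items shaped to mirror the demo, four workloads with logging affecting two of them; it does not reproduce the tool's report format.

```python
from collections import Counter, defaultdict


def risks_by_pillar(workloads):
    """Count open risks per pillar across every workload in the portfolio."""
    return Counter(pillar for risks in workloads.values() for pillar, _ in risks)


def improvement_reach(workloads):
    """Map each improvement item to the sorted list of workloads it applies to."""
    reach = defaultdict(set)
    for name, risks in workloads.items():
        for _, item in risks:
            reach[item].add(name)
    return {item: sorted(names) for item, names in reach.items()}


# Hypothetical portfolio: workload name -> list of (pillar, improvement item).
logging_item = "Configure service and application logging"
workloads = {
    "checkout": [("security", logging_item), ("security", "Rotate keys")],
    "search": [("security", logging_item), ("costOptimization", "Right-size instances")],
    "billing": [("reliability", "Test recovery procedures")],
    "reports": [("security", "Enable MFA")],
}

top_pillar, top_count = risks_by_pillar(workloads).most_common(1)[0]
reach = improvement_reach(workloads)
print(top_pillar, top_count, reach[logging_item])
```

Sorting improvement items by how many workloads they reach is one way to pick the fixes worth handing to a small central team.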

So this is great, right? Because you're also scaling your resources with this. You're not just scaling in terms of saying, now I have a portfolio view; you can look at this and ask, where do these risks lie across your organization? And when you look at that, you start realizing that there are some things we can do that will impact 4, 6, 8, 10, thousands of workloads. That becomes a priority. You can form a little team that goes in and addresses this for everybody else. You can build architectures, you can build tooling.

So this is the data aspect that I was talking about earlier. This data becomes important in making those decisions. You're not making decisions based on one workload, or based on the two teams that you're aware of. You are looking at a much broader picture to say, across our organization, how do I scale to be more cost effective, more secure? Right? Or how do I improve performance for certain things? And you will realize that certain things that you do in one workload can be used and reused in other workloads.

So that's great. This view is something that is very valuable. But why should you take our word for this? What we're going to do is bring on Melissa Cazalet. She is the VP of software engineering at MuleSoft, Salesforce, to talk about how they used Well-Architected across their organization to safeguard.

Welcome, Melissa.

OK, thanks. I think the only thing we're missing is some really cool walk-on music. I got really inspired by the big band that was playing yesterday during the keynote, and I decided that that needs to be in my future for presentations.

So thank you very much for the introduction. I'm Melissa Cazalet. I run software engineering for MuleSoft in a VP role there. I am responsible for all things trust, which is a Salesforce term. This includes security, GRC oversight, cost to serve, all of our reliability engineering, incident management, problem management, and the like. I'm the person that is responsible for making sure that MuleSoft is secure, compliant, and reliable.

There's no stress or pressure whatsoever in my job. Just kidding. Alright, so just a little bit about MuleSoft and who we are. MuleSoft was founded in 2006 as a project to give users an easier way to integrate applications and data. In 2013, we released our Anypoint Platform, which is a full life cycle API management platform that delivers API design, integration, security, and governance.

In 2017, we had an IPO, and a year later Salesforce bought us for a cool $6.7 billion. We keep adding features to our platform, and we're really excited that we have added our robotic process automation capabilities, which give our users the opportunity to automate repetitive tasks and processes where traditional API-led strategies may not be feasible.

So that's a little bit about who we are. Today we're gonna talk a lot about our cloud operating model. For us, having a mature cloud operating model is really critical to being able to scale Well-Architected across our entire organization. We're able to build that Well-Architected culture from the top down by using this.

So taking a look at this graphic here, what we're looking at is the cloud stages of adoption. Whether you're born in the cloud, or on premises and making that cloud migration journey, a lot of similar behaviors and capabilities come into play.

One of the things that we discovered is that as you build over time, you eventually get to a point where your value realization stalls, and your ability to continue to innovate in an agile way sometimes just flattens out. This is really where we found ourselves in April of 2022.

This can happen for a lot of different reasons. You can have resiliency issues due to disruptions. You could be spending a ton of resources on tasks that are manual and not automated, like patching. Or, probably the most popular one, and the one that affected us the most: you get to a point where your cloud consumption costs start to overrun your budget.

So for us, this was a critical point in our timeline. We discovered in April of 2022 that we were about to overrun our budget by the end of the year, to the tune of $12 million. This was a really big uh-oh. So of course, what do you do in these times of crisis? You spin up a bunch of teams and say, all right, we're going to have 85 new cost savings initiatives. Stop what you're doing. We need to make this a huge priority.

As you can imagine, this was hugely disruptive for engineering teams. All of a sudden they had to figure out how to deal with cost and how to work all of these cost savings programs into their roadmaps in order to recoup the $12 million.

We simply did not have the monitoring and visibility needed to get these signals sooner, or to empower our engineers to be involved in the cost management process and make good decisions. And so that's when we decided, look, we have to be more proactive about this.

Fortunately, we weren't the first company to go through this, and we certainly won't be the last. We were able to use our strong partnership with AWS to think about how we become more proactive, and how we can create a cloud operating model that has a foundation in the Well-Architected Framework.

As we went through this journey in 2022, we had several key takeaways that really helped influence our strategy going forward.

Number one, we realized that governance in the cloud is really critical as you're dealing with scaling and growth and new services coming online. We really have to think about how to do this and govern it properly.

The second area is around cost reduction. Our key takeaway here had many parts. One of them was: look, this is a cultural change. We have to educate folks on how to think about costs differently and how to be more advanced and proactive in making their decisions. We also realized that we needed to serve up data to our teams that's accurate.

We needed to serve them actionable intelligence, making sure that it was easy and a self-service model in order for them to be able to do it. Because if it's hard, the engineering teams are certainly not going to pay attention to it.

And lastly, we learned that long term operational efficiency is really key to being able to move forward and continue to innovate in a way that we want to.

So to do this, we decided to pin our cloud operating model to the Well-Architected Framework and come up with some Well-Architected goals.

So our first step in this journey was to figure out what team we were going to create to drive change in this area. This is where we created our MuleSoft Cloud Oversight Engineering team, which was really there to make sure that we engaged the right stakeholders, that we got buy-in from our executive leadership team, and that all of our teams were really on the same page for moving forward with the Well-Architected Framework.

This team is responsible for engaging with stakeholders like product management, platform and infrastructure engineering, service teams, security and compliance, and finance, to make sure that everything we were doing and all the decisions we were making were founded in our strategic business goals. Equally as critical was ensuring we had that strong partnership with AWS.

Not only did they teach us about how to come up with a mature cloud operating model, but they also taught us how to automate it and how to make it easy, like I talked about before, so that our teams would really drive it forward.

So when we were thinking about our cloud operating model, we came up with basically a four-tenet model. We call this our Cloud Oversight Engineering framework, and it all starts with preparation. That's really the first step, where you get your leadership buy-in, you make sure that you're attached to the right business goals, and you decide which data is most important for your teams to see and make that visible. Then we move on to the data enrichment piece.

For us, we developed an in-house data lakehouse, which is really the foundation for this step, where we took in all of that cost data and operational data from AWS tooling, enriched it, and made it visible to our teams via the Cloud Central portal.

And then from there, we move over to action. So after you've decided what data you need, after you make it visible and serve it up to your teams, then you need to get to the piece where you need to say, ok, what do you do with it?

And so for us, we use the Trusted Advisor recommendations and basically build them into our model, and allow our teams to take those under consideration and take action on each of the recommendations.

From there, we figure out, ok, we've done x number of things. What's the impact? Was this successful at all? If it was, we celebrate those wins. If it wasn't, we say, ok, what are the opportunities for improvement? And we go back and forth through this model. It's really a continuous cycle.

So I mentioned that we attached our Cloud Operating Model to the Well-Architected Framework. You can see the pillars here. So we have the six pillars. From my intro, you probably know now that I run pretty much everything in the Well-Architected Framework. So every pillar is critical to me and every one should be prioritized. But in real life, you can't prioritize everything at the same time.

So here's an example of some of the projects that we've been working on that fall into each of these pillars over the last 18 months. And keep in mind, we're not going to be mature in every single pillar at the same time. So for example, for the last year, we've really been focused on the first half of the Well-Architected Framework and working through these recommendations, and we'll be transitioning to other areas once we've achieved a lot of those goals.

Of utmost importance to us is how we are executing against our goals. So what we've laid out here is, pillar by pillar, some of the largest projects that we've put forward in the last 18 months that have had a lot of success.

So going back to that first example that I mentioned about our $12 million cost overrun, the first pillar focus was of course cost optimization. We were laser focused on cloud financial management. What we did was basically overhaul our analytics infrastructure by creating our lakehouse. We took the cost and usage data from AWS and pulled that into our data lakehouse.

We also really amped up our usage of CloudWatch to give our teams that full operational health overview, which was really critical to understand things from a big-picture perspective. And we drove all of those cost savings initiatives really hard over an eight-month period. And at the end of the day, we did get back that $12 million in cost savings.

While that was a really tactical effort, like I talked about, we're really interested in moving away from the tactical, moving towards the strategic, and making sure that we don't have to be in that reactive mode. So that's where a lot of the work that we're doing, which I'll show you in the next couple of slides, comes in for the other pillars.

For operational excellence, we incorporated AWS Systems Manager into our patching. This allows us to do live patching for about 400,000 EC2 instances on our CloudHub product, which is our big revenue generator for MuleSoft. This is so critical for us, not just because it automated a really painful manual process for a ton of hosts, but because it also really uplifted our security posture: we moved from being constrained to patching once a month to being able to patch whenever a vulnerability becomes available. This really helps improve customer trust, helps us hit our security SLAs, and is really effective.
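To make the move from monthly to on-demand patching a bit more concrete, here is a minimal sketch of the request one might pass to the Systems Manager `send_command` API using the AWS-managed `AWS-RunPatchBaseline` document. This is not MuleSoft's actual setup, which the talk does not describe. The tag key and value, the concurrency limits, and the helper name `build_patch_command` are all assumptions made for illustration.

```python
def build_patch_command(tag_key, tag_value, max_concurrency="10%", max_errors="5%"):
    """Build keyword arguments for the SSM send_command call.

    Hypothetical helper: targets every managed instance carrying the given
    tag and asks it to install patches from its patch baseline.
    """
    return {
        "DocumentName": "AWS-RunPatchBaseline",  # AWS-managed patching document
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {
            "Operation": ["Install"],            # "Scan" would only report
            "RebootOption": ["RebootIfNeeded"],
        },
        "MaxConcurrency": max_concurrency,       # roll out in small waves
        "MaxErrors": max_errors,                 # stop if too many hosts fail
        "Comment": f"On-demand patch run for {tag_key}={tag_value}",
    }


# Assumed tag scheme; a real fleet would use its own tagging convention.
request = build_patch_command("patch-group", "cloudhub-workers")
print(request["Targets"])

# With credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("ssm").send_command(**request)
```

Keeping the request builder separate from the API call makes it easy to review or test the rollout limits before anything touches real hosts.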

We have also been spending a lot of time investing in our folks, investing in talent and their skill set. So in the reliability pillar, you see that we participated in a two-day AWS School of Resiliency session, which was really fantastic. And for those of you who are struggling with or really interested in amping up your reliability and resiliency, I highly recommend this. This was an opportunity for us to dive deep with our AWS SMEs in reliability for two days, bring our custom workloads to the table, bring our challenges, and really figure out how we amp up our resiliency architecture and patterns for MuleSoft specifically. Really, really invaluable.

I had 15 of my most senior architects and engineers come back from this really enthusiastic about diving into this problem. We're also utilizing the AWS Learning Needs Analysis questionnaires to identify gaps in skills for all of our engineers. So basically, each team goes through this process of answering a long questionnaire about their skills. And then we take that and work with AWS to say, ok, here are our gaps, here's what we want to prioritize. What content can you help deliver to us? And this is a process that we're really excited about.

And then the last pillar here is sustainability. Soon we will be able to serve up our carbon consumption data to our engineering managers, which is really going to help them be able to say, ok, I want this type of balance between performance and sustainability. Sustainability is one of the core values at Salesforce. So this is something that we're really passionate about and excited to serve up to our teams.

So this is where it all comes together for us. This is our MuleSoft Cloud Central portal. This is where we provide, in a self-service model, all of the information that we've been talking about today from the Well-Architected Framework in one single pane of glass.

So we worked with AWS and their Trusted Advisor recommendations to really pull all of this data together in one place. So you see at the top here, we have Asset Inventory. This is basically an overview of this particular service's complete asset profile, with all of the costs associated with it on a daily basis. This really helps us see those fine-tuned patterns in our cost consumption and allows our engineering teams to go, wait a minute, this isn't right, or, you know what, I'm really not tracking well against my goal here, I need to make some changes. And you can click on every single thing in this graphic and dive down deep into things on an individual asset level, which is really fantastic and super empowering for our teams.
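As a rough illustration of the "wait a minute, this isn't right" moment on a daily cost view, the sketch below flags days whose cost jumps well above the trailing average. It is not the Cloud Central implementation. The 7-day window, the 1.5x threshold, the function name, and the sample figures are all assumptions made for the example.

```python
from statistics import mean


def flag_cost_spikes(daily_costs, window=7, threshold=1.5):
    """Return (date, cost, baseline) for days that look like cost spikes.

    A day is flagged when its cost exceeds `threshold` times the mean cost
    of the preceding `window` days. `daily_costs` is a list of
    (date_string, cost) tuples in chronological order.
    """
    flagged = []
    for i in range(window, len(daily_costs)):
        day, cost = daily_costs[i]
        baseline = mean(c for _, c in daily_costs[i - window:i])
        if baseline > 0 and cost > threshold * baseline:
            flagged.append((day, cost, round(baseline, 2)))
    return flagged


# Hypothetical daily totals (USD) for one service: a flat week, then a jump.
costs = [(f"2023-03-{d:02d}", 100.0) for d in range(1, 8)]
costs.append(("2023-03-08", 240.0))
costs.append(("2023-03-09", 110.0))

print(flag_cost_spikes(costs))  # → [('2023-03-08', 240.0, 100.0)]
```

A trailing average is only one simple baseline. A real portal might compare against a budget or a team goal instead.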

You'll also see on the bottom half here the six pillars from the Well-Architected Framework. Again, these are coming from AWS Trusted Advisor, and these are tuned specifically to those Well-Architected goals that I talked about earlier in the presentation.

So we have decided that we have certain goals for every single one of these pillars, and we're able to tune this to show teams: how well are you executing against those goals? Where do you have room for improvement? Where are you winning?

So you see here, going through the first column, we've got Operational Excellence. Asset tagging was a massive project for us this year. We wanted to make sure we had 100% coverage of proper asset tagging across MuleSoft's half a million assets. And so you can see teams can dive in here and say, ok, I've got a little more work to do, or I'm good to go.
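A tagging-coverage number like the one described here can be sketched as a simple check over an asset inventory. The required tag keys, the record shape, and the team names below are hypothetical, since the talk does not say which tags MuleSoft requires or how its inventory is stored.

```python
from collections import defaultdict

# Assumed tagging policy, for illustration only.
REQUIRED_TAGS = {"service", "environment", "cost-center"}


def tag_coverage(assets, required=REQUIRED_TAGS):
    """Return {owner: percent of that owner's assets with every required tag}."""
    totals = defaultdict(int)
    compliant = defaultdict(int)
    for asset in assets:
        owner = asset["owner"]
        totals[owner] += 1
        # An asset is compliant when all required keys appear among its tags.
        if required <= set(asset["tags"]):
            compliant[owner] += 1
    return {o: round(100 * compliant[o] / totals[o], 1) for o in totals}


# Hypothetical inventory records.
assets = [
    {"id": "i-001", "owner": "runtime",
     "tags": {"service": "a", "environment": "prod", "cost-center": "42"}},
    {"id": "i-002", "owner": "runtime", "tags": {"service": "a"}},
    {"id": "vol-003", "owner": "gateway",
     "tags": {"service": "b", "environment": "prod", "cost-center": "7"}},
]

print(tag_coverage(assets))  # → {'runtime': 50.0, 'gateway': 100.0}
```

Reporting per owner rather than one global percentage is what lets a team see "I've got a little more work to do" for its own assets.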

You'll see recommendations covering the other pillars like Security and Compliance, Reliability, Performance Efficiency, Cost Optimization is obviously a huge one and then the Sustainability one will be coming soon.
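One way to picture the per-pillar view is a small scorecard that counts open recommendations by pillar and compares each count with a team goal. The sketch below assumes a simplified record shape and treats a goal as "at most N open items". Neither is taken from the Trusted Advisor API or from MuleSoft's portal.

```python
from collections import Counter


def pillar_scorecard(recommendations, goals):
    """Summarize open recommendations per pillar against a maximum-open goal.

    `recommendations` is a list of dicts with "pillar" and "status" keys.
    `goals` maps each pillar name to the most open items a team accepts.
    """
    open_counts = Counter(
        r["pillar"] for r in recommendations if r["status"] == "open"
    )
    return {
        pillar: {
            "open": open_counts[pillar],
            "goal": limit,
            "on_track": open_counts[pillar] <= limit,
        }
        for pillar, limit in goals.items()
    }


# Hypothetical recommendations and goals for one team.
recs = [
    {"pillar": "Cost Optimization", "status": "open"},
    {"pillar": "Cost Optimization", "status": "open"},
    {"pillar": "Security", "status": "resolved"},
    {"pillar": "Reliability", "status": "open"},
]
goals = {"Cost Optimization": 1, "Security": 0, "Reliability": 2}

for pillar, row in pillar_scorecard(recs, goals).items():
    print(pillar, row)
```

With these sample inputs, Cost Optimization is over its goal while Security and Reliability are on track, which is the kind of "where do you have room for improvement, where are you winning" signal described above.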

As I mentioned, this Cloud Central portal is highly automated. We leveraged Amazon Redshift for this. And we're really working with Amazon to think about how we can amplify automation with this even more.

So we're looking at integrating things like Amazon Lex for a chatbot and Amazon Kendra for our intelligent search. We're even looking at how we integrate large language models into the picture to further refine our search capability and amplify it.

So we're really excited by all of the things that this drives. We're really excited about how we've been able to operationalize and serve up the basics of the Well-Architected Framework and manage our execution of each of the pillars in a really succinct place.

Alright. So the last slide here is our Cloud Operating Model maturity diagram. As we go through these processes that I've talked about, we start with those Well-Architected goals. This is really a snapshot for your team at a particular point in time. How are you maturing? And where do you want to go from there?

We add that infrastructure data and our Trusted Advisor Well-Architected recommendations, and we serve those up to teams on a team-by-team basis. They don't need to search through a ton of stuff. It's right there in the Cloud Central portal for them. They can see how they're doing. From there, they can choose to take actions to make improvements, to really be accountable for how we're maturing as an organization and for how we're managing our cloud costs.

We decide, like I said before, did this work? Yes or no? What do we need to iterate on? And we make those iterations, and then we move to the next wheel that goes along the gold line of this chart. But this is a continuous process.

So as we go through each of these phases of assigning goals, executing, and improving, we move up to the next level of maturity. And that's it. Thank you all. This is how MuleSoft has put this into place. Again, we're super grateful to AWS for our partnership there and for helping us really automate this process.

That's super cool. I mean, yeah, what do you guys think? Was it cool? All those numbers, 400,000 instances? That's huge, right?

And that's the key that we want to talk about today: when you look at Well-Architected, it becomes a structured conversation. Look at how they set goals, how those goals were then turned into actual actions and then scaled to multiple things. If you look at that graph, I was just looking at it as Melissa was presenting, this particular graph right there, daily cost analysis. How cool is that for an engineer? Right? You know, we have these graphs that talk about latency, your latency has gone up or you have a spike somewhere. But look at cost, right? That's what allowed them to get to that $12 million in savings.

So, really cool story. Thank you for sharing with us. I want to conclude this by letting you know that Well-Architected is a mindset. It's not just a tool, it's not just a framework. You use it to build a mechanism and iterate. You use it to make sure that you're operating reliably and securely, making sure you're looking at those costs, making sure that you are looking at how sustainable the things you're building are. Are you making an impact?

So, with that, thank you so much for taking the time today and spending the time here. We'll open it up for questions now if you have any questions. Thank you so much, everybody.