Optimize cost and performance and track progress toward mitigation

Hi, everyone, and thank you so much for coming. I hope you are as excited as we are about all the launches and announcements, and that you feel motivated to get back home and try all the new services and features you learned about during this re:Invent.

And I have a question for you: when you run workloads on AWS, how confident are you that you run them in the most cost-effective way? Maybe raise a hand if you would like to be more confident and stay on top of it.

OK, nice. Cost is not just a number; it's a reflection of our decisions, priorities, and ability to optimize value in a complex world.

My name is Yuri Prykhodko. I'm a Principal Technical Account Manager at AWS, and I'm one of the founders and leads of the Cloud Intelligence Dashboards framework. I'm super excited today to share the stage with JR and Mike.

Hey, I'm JR Storment, Executive Director of the FinOps Foundation. We are a cloud-neutral organization, which is always a weird thing to say at an AWS conference, but we're across all the clouds, focused on community for those who manage the value of cloud. We do education, training, and standards like the FOCUS specification, which Yuri is going to talk about in a bit.

And I'm Mike Graf. I'm an infrastructure architecture director at Dolby Laboratories. We're based in San Francisco.

All right, let's kick off. Today we are going to talk about how to optimize and track the cost efficiency of your workloads and their operational performance. It's not a surprise that global cloud spend is increasing, and therefore it's not a surprise that managing that rapidly increasing spend has become a top priority for organizations.

In fact, this year marks the first time in a decade that managing cloud spend has overtaken security as the top challenge organizations face across the board. At the same time, many customers acknowledge that successful implementation and adoption of cloud financial management and the FinOps discipline at scale helped them drive their business and grow their revenue.

Cost is like an iceberg: what you see is just the tip. The real challenge lies in understanding the underlying drivers and making informed decisions that shape the unseen but significant portion beneath the surface.

If you take a look at any of your workloads on AWS, any architectural detail, like security or reliability or performance, is reflected in the cost, and vice versa. By deeply understanding every cost detail and cost insight of your workloads, you can find plenty of optimization opportunities across all Well-Architected pillars.

Cost is an operational metric, just like performance or availability, and it's a scalable metric. With the growth of your business, your cloud usage naturally grows because you need more infrastructure to serve your customers. We want to make sure that your business and customer base grow faster than your AWS bill, and therefore you need to make sure that your workloads are not just well performing and fault tolerant, but also cost effective.

But cost effectiveness is not about paying less than the previous month. Cost effectiveness is about getting maximum business value from your investment in the cloud. Simply comparing your AWS invoice with previous months won't work here, because cost is also a variable metric. We know from operational excellence that any metric is only useful when it's connected to business goals and outcomes, and the same holds for cost. By measuring not the cost of infrastructure itself, but the cost per business driver or business unit, like cost per customer or cost per transaction depending on your business domain, you can relate the cloud usage of your workloads to the business value these workloads provide.
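
As a minimal sketch of that idea (the numbers and the transaction metric here are hypothetical), unit cost is just spend divided by the business driver for the same period:

```python
def unit_cost(monthly_spend_usd: float, business_units: int) -> float:
    """Cost per business unit (e.g., per transaction) for one period."""
    return monthly_spend_usd / business_units

# Hypothetical months: spend grows, but unit cost still falls
# because transactions grow faster than the AWS bill.
print(unit_cost(100_000, 4_000_000))  # Oct: $0.0250 per transaction
print(unit_cost(110_000, 5_500_000))  # Nov: $0.0200 per transaction
```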

Every architectural decision is a purchase decision, and there are many ways you can improve the architecture of your workloads and maximize their business value.

You can right size your instances and use the most appropriate and cost-effective instances by type, for example Graviton-based instances, or by size, aligning the size of the instance with your CPU and memory utilization. You can improve elasticity and use resources only when you need them, scaling with demand. For example, ask yourself: do I really need to run this dev or test environment 24/7, or can I implement something like instance scheduling and turn off these workloads when I don't need them? You can also identify and eliminate idle resources and infrastructure waste; based on the data we have, we know that about 30% of workloads fall into this category, and it's a gold mine of optimization opportunities.
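
As one illustration of instance scheduling, here is a minimal boto3 sketch that stops every EC2 instance carrying a hypothetical Schedule=office-hours tag; in practice you would run something like this from a scheduled Lambda, or use a packaged solution:

```python
import boto3

ec2 = boto3.client("ec2")

def stop_office_hours_instances() -> None:
    """Stop running instances tagged Schedule=office-hours (tag is hypothetical)."""
    pages = ec2.get_paginator("describe_instances").paginate(
        Filters=[
            {"Name": "tag:Schedule", "Values": ["office-hours"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    instance_ids = [
        i["InstanceId"]
        for page in pages
        for r in page["Reservations"]
        for i in r["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)

if __name__ == "__main__":
    stop_office_hours_instances()  # e.g., invoked at 7 PM on weekdays
```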

Once you optimize the baseline, you can take a look at pricing options with reduced rates, like Spot, or cover your workloads with Savings Plans or Reserved Instances. You can optimize storage, or move your architecture to more cloud-native patterns like event-driven design or serverless workloads.

So there are plenty of ways to actually get into that fully optimized state and reduce your unit cost even as your usage grows. But understanding which cost optimization opportunities you have, and how effectively your engineering teams and technology leaders are leveraging them, is challenging at scale. To achieve that, you really need to bring together people, tools, and processes.

And that's where FinOps comes into play as an enabler for a cost-aware culture. FinOps is a cloud financial management discipline that helps maximize business value by bringing together engineering, product, technology, and business teams to collaborate on data-driven spending decisions.

In simple words: you probably all know about DevOps and the concept of "you build it, you run it," meaning you're responsible for its operations. The same applies in FinOps: you build it, you run it, and you're responsible for the cost effectiveness of your workloads and for keeping cost under control and predictable.

FinOps, similar to security, is everyone's responsibility, not just the finance team's. It starts with great visibility and cost awareness, which will guide your teams and every stakeholder toward optimization opportunities. Then you act on those opportunities, optimize your workloads, and incorporate the best practices you learned into your operational guidelines, architectural blueprints, and release-readiness playbooks.

So while you optimize workloads that are already deployed on AWS, your future workloads get optimized at the review stage or release-readiness assessment stage, and they arrive in the cloud already optimized and FinOps-ready.

It's very important to understand that FinOps is not a project or a time-bound activity. It's a process of continuous improvement and collaboration between all the stakeholders within the organization.

I'd like to invite JR to tell us more about FinOps and the trends he sees in the industry across companies that have implemented it.

Thank you, Yuri. Yuri mentioned something earlier that I think summarizes it well: every architectural decision is a purchasing decision. The important thing to understand about FinOps is that it's not really a technology discipline as much as a cultural transformation, particularly within engineering and finance teams.

A lot of the time, when people start to look at this discipline, they think, "I'm going to use FinOps to save money, to cut my bill." Cost optimization is a part of it, but the business value portion is more important, because a lot of the time you're using those architectural decisions to decide when and where to invest, and sometimes you're investing more, sometimes less.

We run an annual survey, the State of FinOps; we're going into the fourth edition of it. When we were doing data collection in 2022, during the economic downturn, there was a big jump in reducing waste and getting rid of unused architecture. But in the last year or so we've seen a big increase in what Yuri was talking about: the unit economics aspect. How do we get a better cost per transaction, per value, or per customer?

The best story I can share from the community right now, and there are a ton of great ones: we run an annual conference called FinOps X; it's in San Diego in June. One of the keynoters at that conference last year, Rich Steck, who runs cloud at Adobe and is here around the conference, shared the story of how Adobe launched Firefly. I hate to say it, it's the buzzword: an AI-powered product. I was trying to get through the talk without saying AI. Firefly essentially came to be because they realized that in order to build it they needed more engineering resources, and the way they engineered this product was to get leadership support to do a big cost take-out from their cloud.

They worked with finance to figure out how much it would cost to fund this project and build the entire new product they were going to launch. They had a central efficiency team that found opportunities, looking for areas where they could reduce. They embedded cost engineers in the individual product teams, and those product teams made a bunch of architectural decisions, found a bunch of areas of optimization, and took a bunch of cost out. But instead of just trying to increase margins, they were able to use that, with leadership and finance support, to fund an entirely new engineering team that built and launched Adobe Firefly. He didn't talk about this because they're a public company, but if you look at their stock after Firefly launched, you'll see a big jump. It was a great example of using FinOps in the real world, not just to reduce cost, but to take that cost and invest in innovation. That is very much what FinOps is about.

This list loosely correlates to the core FinOps principles. At the top is organizational alignment: how do we get different people in the organization, finance teams, leadership, engineering teams, procurement teams, and now sourcing teams, together to have conversations about how we want to change the architecture, how we want to invest in cloud or change resources, and how we want to mix those things up to make better decisions? The cost visibility and awareness piece cannot be overstated. In the Maslow's hierarchy of FinOps, you might want to get to optimization and unit economics at the top, but underneath all of it is: how do we understand where our costs are going? This has been the hardest challenge for 12 years now in FinOps: understanding where your cost is going, through tagging, labeling, linked accounts, projects, subscriptions, or whatever you're using, so that you can generate clear KPIs and performance tracking for individual teams, get the data to the engineering teams at the right time and in the right way, and drive data-driven decision making.

So there is a cost and usage optimization component there, absolutely. But ultimately we want to reset the conversation to be about the cultural shift that takes place over time.

Now, I can't repeat it enough: FinOps is not a one-and-done exercise. This is something that ultimately requires broad organizational and executive support. One of the questions we get all the time is, "How do I get my executives to care about this initiative?" And I wish I had a better answer, because ultimately where this usually starts is with a champion, often in an engineering org, who starts to care about this area.

"But the larger implementation of the cultural disciplines of the processes ultimately comes when the bill gets large enough that the CFO or CIO or CEO depending on the size of the organization starts to care about it and starts to implement the processes. So we've been in this for about four years now. A lot of the work is cultural transformation I mentioned, but it's also educating the teams to understand how many of your FinOps certified. Oh, nice. See when I used to do the question of how many of you knew about FinOps years ago, no one raised their hand. So it's cool to see there's more of those now.

Underlying that awareness, getting teams certified and educated, comes the next phase: how do we start to integrate the FinOps process into ongoing engineering work? One of the trends we've seen this last year is that FinOps has become a part of engineering orgs, and I don't mean reactive cost optimization, but actually spending the time to get cost-related changes into each sprint, doing the shift left (sorry for the buzzword) into cost conversations at the point of architectural decisions.

So you're looking at cost as another performance metric. Engineering teams often push back initially, because cost is seen as one more thing to worry about next to scalability, reliability, deadlines, and performance. But if you reposition cost as another first-class efficiency metric, you can introduce it as a new constraint that lets them be more innovative: can you deliver this feature in this time, within these cost constraints?

One of the things that's changed in the four years since we started the FinOps Foundation: initially we started with just a core set of practitioners doing this work, and often the FinOps teams were kind of siloed, which is ironic because one of the core focuses of FinOps is to break down silos between teams. You had these FinOps teams chasing the rest of the organization to make change.

We have now seen these teams become a core part of engineering disciplines, finance disciplines, sourcing, procurement, et cetera. But we've also seen, which is very exciting, and some of you may have seen the news a month or two ago, the clouds themselves leaning into the practice. I can't tell you how happy it makes me, Yuri, to be on stage with you talking about FinOps, because that's something AWS historically has not gotten as deeply into. AWS joined the FinOps Foundation this last year, and I want to give Yuri a shout-out because he's a big supporter of this.

The dashboards that Yuri has built, and that you should consider looking at, the CUDOS dashboards, are now one of the top tools we see people in the ecosystem use. The critical thing there, as I mentioned, is getting that cost visibility out to teams, but also having the ability to customize the data in a way that makes sense to each team, and to put that data into the path of the engineers so they can make better decisions along the way.

I will say there is no one right tool for FinOps. In the last State of FinOps survey, we saw companies use about 4.1 tools on average within their FinOps practice. I'm going to mess up the numbers, but roughly 97% use some form of cloud-native tooling, 65% use some sort of platform, observability, et cetera. But one of the fastest growing segments is customizable and homegrown tooling, because you really need to put together the right tool for the right job.

And within the FinOps practice there are now about 18 different capabilities, right? It's not just cost optimization; there's forecasting, there's organizational alignment, there's chargeback, there are all these areas. So whichever way you go, it's about finding the right way to get the data in there.

So with that, I think Yuri is going to talk a bit about some of the AWS tools on offer.

Absolutely, thanks JR. I'm super excited and just cannot agree more: joining the FinOps Foundation is all about being closer to the community and to our customers, and helping to solve FinOps challenges at scale. We have already started to contribute to the FOCUS specification, the FinOps Open Cost and Usage Specification, which aims to simplify reasoning about cloud cost and usage for everyone.

But let's have a look at what the FinOps journey looks like on AWS. Please raise your hand if you have a dedicated FinOps practitioner role in your organization. OK. Don't worry if you don't; the session is still for you.

If you don't have a dedicated practitioner role, I just wanted to let you know it could also be a shared responsibility, a hat someone wears. A FinOps practitioner is someone who actually drives this change within the organization, helps with organizational alignment, helps with better visibility and cost awareness, and helps engineers see exactly what they could start acting on, building this habit of continuous improvement.

AWS provides many tools for FinOps practitioners. Usually the FinOps practitioner journey starts with services like Cost Explorer, which provides an entry point to cost visibility and details. They can also enable AWS Budgets and keep their spend and usage within boundaries and under control. Then they can use Savings Plans and Reserved Instances recommendations to understand which purchase options are available to them.
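
As a hedged illustration of setting such a boundary programmatically (the budget name, amount, and email address are hypothetical), a monthly cost budget with an 80% alert might look like this with boto3:

```python
import boto3

budgets = boto3.client("budgets")
sts = boto3.client("sts")

# Create a monthly cost budget and alert by email at 80% of actual spend.
budgets.create_budget(
    AccountId=sts.get_caller_identity()["Account"],
    Budget={
        "BudgetName": "monthly-cloud-budget",  # hypothetical name
        "BudgetLimit": {"Amount": "10000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "finops@example.com"}
            ],
        }
    ],
)
```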

Then they can dive deeper into right sizing recommendations and use Compute Optimizer, which is a free service for everyone, so I really recommend you have a look at it. And if they are on the Business, Enterprise On-Ramp, or Enterprise support levels, they can use Trusted Advisor to get visibility into idle resources.

Then, if they want to dive deeper into resource-level insights and do some data mining on top of the CUR data, they can enable Cost and Usage Reports, or AWS Data Exports as we call it now after the launch of Cost and Usage Report 2.0.

The reality is that cost in the cloud is also a decentralized metric, and there are many other stakeholders who have their own goals and requirements around cost and FinOps.

Finance and procurement teams require better visibility and more accurate budgeting and forecasting. They require consolidated reporting across all business entities, including the many AWS Organizations a company might own.

Engineering teams require better visibility into spend per application or spend per environment. Product owners want to know things like spend per feature, or maybe spend per customer or per transaction, to build a successful business case. And executives look at spend versus revenue, organizational margins, and the overall growth of the business.

Another interesting fact: to calculate your unit cost, you need to bring together and join your business metrics, like number of customers or revenue, with AWS data. And to allocate cost, you need to bring together your organizational taxonomy, like cost centers, business units, and departments, with AWS data.

Another challenge is that not every stakeholder even has access to the AWS console to use these services, and some stakeholders should have access only to the cost and usage data from a subset of accounts, not all of them.

So I think we clearly see the pattern here: there is a big need for a collaborative environment, adjustable to the needs of every stakeholder, where they can all work together on data-driven decision making.

To help with that, we've created the Cloud Intelligence Dashboards framework. It's a collection of open-source, customizable dashboards in Amazon QuickSight that every customer can deploy in their own accounts by following our Well-Architected Labs guides and CloudFormation templates, and then use as a self-service tool.

You can scan this QR code to bookmark the entry point to Cloud Intelligence Dashboards. I just want to highlight a key feature: when you deploy them, the data and all the resources stay within your organization, within your AWS accounts. You do not need to share data with any third parties, and it's very secure.

The dashboards provide comprehensive, actionable insights and in-depth, resource-level details for your cost and usage data. And we do not charge you for the use of the dashboards themselves; you pay only for the underlying services, which is really a fraction of the cost of third-party tools.

Let's take a closer look at Cloud Intelligence Dashboards. Once you provision and deploy the dashboards in your accounts, you effectively turn all the services we talked about into data sources that start delivering data to an S3 bucket, and you also get table definitions and views in Amazon Athena. So you turn all of this into an optimization data lake.
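
As a hedged sketch of what querying that data lake can look like (the database, table, partition, and output bucket names are hypothetical; the actual CID deployment provisions its own), here is a boto3 Athena query that sums monthly cost per service from a CUR table:

```python
import time
import boto3

athena = boto3.client("athena")

# Hypothetical names; adjust to the database/views your deployment created.
QUERY = """
SELECT line_item_product_code AS service,
       SUM(line_item_unblended_cost) AS monthly_cost
FROM cur_table
WHERE billing_period = '2023-11'
GROUP BY line_item_product_code
ORDER BY monthly_cost DESC
"""

qid = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "cid_cur"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

# Poll until the query finishes, then print the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    for row in rows[1:]:  # first row is the header
        print([col.get("VarCharValue") for col in row["Data"]])
```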

You also get, out of the box, a collection of dashboards to visualize that data lake, bringing your decision makers closer to the data and minimizing the time they need to extract useful information, so they can spend that time acting on optimization instead.

You can share the dashboards with any stakeholder in the organization using native QuickSight functionality; they don't even need access to the AWS console. Or you can use the row-level security feature in QuickSight to give particular stakeholders access only to the accounts and data they need to see or that they own.

You can deploy foundational dashboards, which are built on top of the Cost and Usage Report and are structured to help you get quickly to the most important parts of your spend and usage, such as cost-efficiency KPIs, or to in-depth insights and details with the FinOps dashboards.

You can also deploy advanced dashboards, which give you visibility into recommendations from services like Compute Optimizer, Cost Anomaly Detection, and AWS Trusted Advisor, consolidating all of those recommendations in a single place across your multiple payer accounts.

And to get resource-level granularity and resource-level recommendations, you do not need to switch between accounts or regions; everything is in one single place.

We have thousands of customers already using Cloud Intelligence Dashboards, and this concept of bringing ready-to-use visualizations in dashboards works very well, so we started to go beyond just cost on AWS.

I'm excited to announce that we recently released the Cloud Intelligence Dashboard for Azure, so you can bring cost and usage data from another cloud provider and visualize it in Amazon QuickSight together with your AWS spend and usage.

We also released the Sustainability proxy metrics dashboard, which lets you visualize the proxy metrics we talk about and recommend tracking in the Sustainability white paper.

All right, who wants to see it in action? OK.

So now we are in the KPI dashboard. As a FinOps practitioner, or for example a technology leader, I want to see what optimization opportunities are available for me and my organization, and what I can track. I can go to the KPI goals tab and see a collection of foundational KPIs that we learned from our customers, the ones they usually track. We recommend customers start with those; these KPIs give you hints about what you can track in your organization and what you can optimize.

For example, for EBS you can migrate to GP3 volumes, which are the most cost-effective volume type. For RDS, you can potentially migrate your workloads to Graviton. Or, if we talk about optimizing idle or unused resources, you can reduce the number of EBS snapshots that are older than one year.
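
As a small sketch of that last KPI (the one-year threshold mirrors the example above; deletion is commented out on purpose), this lists self-owned EBS snapshots older than a year:

```python
from datetime import datetime, timedelta, timezone
import boto3

ec2 = boto3.client("ec2")
cutoff = datetime.now(timezone.utc) - timedelta(days=365)

# Find snapshots we own that are older than one year.
old = [
    s
    for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"])
    for s in page["Snapshots"]
    if s["StartTime"] < cutoff
]

for snap in old:
    print(snap["SnapshotId"], snap["StartTime"].date(), snap.get("Description", ""))
    # After review (and checking AMI/backup dependencies), you could delete:
    # ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])
```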

For every KPI you can set your own goals, depending on your situation. Then, in the KPI tracker tab, you can track your progress toward these goals over the previous couple of months, so you can clearly see the direction you're heading in. And what I like most in the KPI dashboard is that we also show the potential savings you can get if you achieve the goal you set.

That allows you to prioritize actions toward optimization and focus either on the most impactful items, or on the ones with, for example, the lowest effort that still provide decent savings opportunities.

Now I can share this dashboard with the engineering teams who own the resources and actually need to take the optimization actions, right? I share it in QuickSight, and they can open exactly the same view, but now they can go to the EBS tab, for example, and get additional context for optimization.

They can see the progress of the GP3 migration over time, and they can see how their optimization actions and this progress impact the infrastructure unit cost, the cost per gigabyte of data. You can see that the migration to GP3 allowed us, in this case, to reduce the average cost per gigabyte for EBS.

Then there is a view of all the volumes, with resource details, current GP2 costs, and potential savings with GP3. So now engineers can prioritize resources and know where to start to deliver impact very quickly.
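
As a hedged sketch of reproducing such a view outside the dashboard (the price constant is the us-east-1 list price, and the 20% figure is the typical GP3 saving mentioned later in this talk, used here as a rough estimate), this lists GP2 volumes with an estimated saving per volume:

```python
import boto3

ec2 = boto3.client("ec2")
GP2_PRICE_PER_GB_MONTH = 0.10  # us-east-1 list price; check your region
EST_GP3_SAVING = 0.20          # GP3 is typically ~20% cheaper than GP2

pages = ec2.get_paginator("describe_volumes").paginate(
    Filters=[{"Name": "volume-type", "Values": ["gp2"]}]
)
for page in pages:
    for v in page["Volumes"]:
        monthly = v["Size"] * GP2_PRICE_PER_GB_MONTH
        print(
            f'{v["VolumeId"]}: {v["Size"]} GiB, '
            f"~${monthly:.2f}/mo on gp2, "
            f"~${monthly * EST_GP3_SAVING:.2f}/mo potential saving on gp3"
        )
```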

What about right sizing recommendations? In the Compute Optimizer dashboard, you have a consolidated view of your right sizing recommendations in one place, which you can track over time. You can see savings opportunities for right sizing EC2 instances, Lambda functions, EBS volumes, and Auto Scaling groups.

And because we consolidate the data and keep its history, you get historical context: you can see how much savings opportunity you have over time and compare your current progress with previous months, which is very powerful for understanding where you're heading.

Again, if you own resources or you're on an engineering team, you can use the respective service tabs and work with this dashboard interactively. Select, for example, over-provisioned instances and the accounts you want to focus on, and the rest of the visuals filter based on your selection; you get a list of instances you can optimize, ordered by savings opportunity.

And when you select any of the EC2 instances, you get right sizing options and recommendations with estimated savings, current and projected memory and CPU utilization, and historical context on the status of the instance, whether it became over-provisioned just recently or has been for a while, plus CloudWatch metrics for the preceding period provided by the Compute Optimizer service. It's all about giving you additional context for data-driven decision making.
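
The same recommendations the dashboard consolidates can be pulled directly from the Compute Optimizer API; here is a minimal boto3 sketch that prints the top recommendation option per over-provisioned instance:

```python
import boto3

co = boto3.client("compute-optimizer")

# Fetch EC2 right sizing recommendations (first page shown for brevity).
resp = co.get_ec2_instance_recommendations()
for rec in resp["instanceRecommendations"]:
    if rec["finding"] != "OVER_PROVISIONED":
        continue
    best = rec["recommendationOptions"][0]  # options are ranked
    print(
        rec["instanceArn"].split("/")[-1],
        rec["currentInstanceType"],
        "->",
        best["instanceType"],
    )
```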

You can also use the Trusted Advisor dashboard, which provides recommendations around idle resources in the Cost Optimization tab. You can find insights and resource-level details for your idle RDS instances, idle load balancers, or underutilized Redshift clusters.

Again, you have trends over time and resource-level details, so for every cluster you can see how many days it has gone without any connection, when it was first flagged, and the estimated savings per cluster. In the Trusted Advisor dashboard you can also see other check categories, like security, fault tolerance, and performance, so you can share the dashboard with more stakeholders and they can leverage those recommendations as well.
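
Under the hood these come from Trusted Advisor checks; as a hedged sketch (it assumes a support plan that includes the Support API and full Trusted Advisor checks), the flagged resources for cost optimization checks can be read like this:

```python
import boto3

# The Support API is served from us-east-1 and requires Business,
# Enterprise On-Ramp, or Enterprise support.
support = boto3.client("support", region_name="us-east-1")

checks = support.describe_trusted_advisor_checks(language="en")["checks"]
for check in checks:
    if check["category"] != "cost_optimizing":
        continue
    result = support.describe_trusted_advisor_check_result(
        checkId=check["id"], language="en"
    )["result"]
    flagged = result.get("flaggedResources", [])
    if flagged:
        print(f'{check["name"]}: {len(flagged)} flagged resources')
```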

You might ask: what about all the other services, and how do I stay on top of those? Let me show you the CUDOS dashboard, one of the most popular dashboards; it follows a similar concept. It starts with three executive tabs which give you a high-level overview.

For example, the billing summary might be popular with finance teams, because it provides details about spend and usage, like invoiced spend, amortized spend, and all the savings and discounts you get, in one single place. Additionally, you can consolidate data from multiple payer accounts or multiple organizations, so you see consolidated reporting across all the organizations you own. In the Reserved Instances and Savings Plans summary, you can find insights into how you're leveraging savings opportunities, how much you're saving, and whether you have any unused commitments, plus some visuals that help you charge back your organizational Savings Plans and Reserved Instances to linked accounts.

If you want to dive deeper into the most-moving parts month over month in your accounts, you can go to the interactive month-over-month trends tab, where you see insights and details of your differences, top movers and bottom movers, per service and per account. You can dive into the table view of month-over-month trends and sort it by percentage difference or by absolute values.

Then you can focus on any service and interactively select it. Now, when I select RDS, I can immediately see in which accounts RDS increased the cost. I can select an account to zoom in on that particular account, and I see other dimensions here, like spend per region and per operation.

What I can see clearly here is an increase in the system operation product family. What is that? I don't know; let's have a look. I can select system operation and scroll to the bottom, and here I see a visual that shows me daily spend per resource ID. So I've tracked down the resource generating most of the cost, some RDS cluster, and when I select it I can see its usage details: it was 30 billion IOPS, which generated $6.5K of spend. In a matter of a few clicks, I got from high-level insights and trends to resource-level details and tracked down my top cost generators.

When you start using the CUDOS dashboard, you can use only these first three tabs to get into cost and usage details at a high level. But if you want more recommendations for each of the services, you can use the other tabs here, like compute, storage, and S3. It would be pretty hard to cover all of them, and Mike will talk about data transfer, but I just want to show you an example of the AI/ML tab.

You probably all heard about generative AI this week, right? In the AI/ML tab we provide details for services like SageMaker, where you can see different dimensions and breakdowns of your SageMaker spend and usage, such as whether you're leveraging Spot for training your models, again with resource-level details and granularity.

A couple of weeks ago we added a section here called Amazon Bedrock Summary, which gives you a consolidated view of your Bedrock spend and usage, regardless of whether you're purchasing models from third parties on the Marketplace or using foundation models from Amazon. You can see a breakdown by pricing model, whether you're leveraging provisioned throughput or on-demand inference, again with operational insights down to the resource level, with spend per model, where you can select a particular model and focus on its usage insights. So in a matter of a few clicks you can stay on top of your spend and usage on a daily basis, and when you start adopting these new services, you can monitor and track the cost and usage with confidence.

All of this is available out of the box once you deploy Cloud Intelligence Dashboards, but I mentioned that you can customize the dashboards, so let me show you one example of what you can build on top, just the art of the possible. As a product owner, I deployed a CUDOS dashboard and then customized it: I created an additional tab, business unit cost, and brought my daily transaction counts into QuickSight. Say I'm in the fintech business, processing transactions that bring me revenue. By joining daily transactions with the spend data, you can see both cost and transaction volume on the same visual.

I can see that at some point my business grows while my cost stays under control, maybe because of my optimization efforts, or maybe because of something else. But what does it mean for our business? I can also calculate the cost of infrastructure per transaction, and I can see that my optimization efforts actually helped reduce my cost per transaction below the average. I can also build a forecast and expand these use cases further. That's what you can do with Cloud Intelligence Dashboards: adjust the dashboards to the business needs of any stakeholder in your organization. This is very powerful.
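
As a hedged sketch of the same join outside QuickSight (the file names and columns are hypothetical; in the dashboard this is done with a joined dataset), daily spend and daily transactions combine into a unit cost series like this:

```python
import pandas as pd

# Hypothetical inputs: daily AWS spend (e.g., exported from the CUR data lake)
# and daily business transactions from your own systems.
spend = pd.read_csv("daily_spend.csv", parse_dates=["date"])         # date, cost_usd
txns = pd.read_csv("daily_transactions.csv", parse_dates=["date"])   # date, transactions

unit = spend.merge(txns, on="date")
unit["cost_per_transaction"] = unit["cost_usd"] / unit["transactions"]

# Compare each day against the overall average, as in the demo.
avg = unit["cost_per_transaction"].mean()
unit["below_average"] = unit["cost_per_transaction"] < avg
print(unit.tail())
```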

I'd like to invite Mike to talk about FinOps adoption at Dolby and how Cloud Intelligence Dashboards help improve operations and workload architecture.

Thanks, Yuri. As I mentioned, I'm Mike Graf, an infrastructure architecture director at Dolby Laboratories. For those who might not be familiar with Dolby, we're an entertainment technology company founded almost 60 years ago by the pioneering inventor Ray Dolby.

When Ray founded Dolby back in 1965, movies and television featured just one channel of sound, and record producers had just a handful of audio tracks. Much of what has happened since then to improve the sound of entertainment can be traced directly back to Ray, not just his technical innovations but also the impact he had on artists. Ever since our founding, our mission has been to revolutionize the science of sight and sound. Through our innovative research and engineering, we empower creatives to elevate their stories, and we give fans unforgettable experiences.

Our history of innovation began in the 1960s. For those of you old enough to remember what a cassette player is, you may remember the little Dolby button; you pressed it and it just made the tape sound amazing. For those of you who aren't old enough, ask your neighbor and they'll tell you about it. Then in the eighties we brought out Dolby Surround, which really enhanced the movie experience, and we continue to set the standard for high-quality entertainment today with our groundbreaking Dolby Atmos and Dolby Vision technologies.

Dolby has been using the Cloud Intelligence Dashboards for over three years now, and we focused our initial deployment on the CUDOS dashboard that Yuri was talking about. We felt it gave us the broadest complement of top-level spend visualizations, as well as the ability to deep dive into specific service areas and resources, as Yuri demonstrated. Most importantly, it gave our users the ability to self-serve that data: they could go get the information they needed themselves, look at just the things they're interested in, and filter out the rest.

And lastly, CUDOS, with all the rich visualizations available, gave us the ability to customize those dashboards. At this point we've actually gone so far as to deploy multiple customized versions of CUDOS within our environment, targeted at different service teams, where they basically keep the tabs they're interested in and remove the rest.

How do we provide access to CUDOS and the other dashboards? Well, we were already AWS Identity Center users, providing SSO access to the AWS console, and the QuickSight integration with Identity Center made it really easy. We basically just add users to an Active Directory group, and they get the QuickSight button on their AWS SSO console. It allowed me to give people access to QuickSight without having to give them console access to the payer account, which is something I'd like to avoid.

In addition, Yuri talked about row-level security. We use row-level security to control what data users have access to: we take our AWS accounts, map them to groups in QuickSight, and then assign users to those QuickSight groups. When they log in, they only see cost data for the accounts they have access to.
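
As a hedged sketch of the mapping behind that (the group names and account IDs are hypothetical), QuickSight row-level security is driven by a rules dataset where each row ties a group to the account IDs it may see; generating that rules file might look like this:

```python
import csv

# Hypothetical mapping of QuickSight groups to the AWS accounts they may see.
GROUP_ACCOUNTS = {
    "team-media": ["111111111111", "222222222222"],
    "team-research": ["333333333333"],
}

# QuickSight RLS rules dataset: a GroupName column plus the dataset column
# to filter on (here, the account ID column used by the dashboards).
with open("rls_rules.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["GroupName", "account_id"])
    for group, accounts in GROUP_ACCOUNTS.items():
        for account in accounts:
            writer.writerow([group, account])

# Upload this CSV as a dataset in QuickSight and attach it as the
# row-level security rules for the cost dataset.
```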

These capabilities have allowed us to expand this offering to dozens of distinct cloud teams within Dolby; we have almost 100 finance and engineering personas leveraging these dashboards today.

How do we market them? We include the dashboards in our new cloud user orientation, so when somebody is given a cloud account, we train them on what these dashboards are and how they can leverage them. In addition, we roll out new dashboards, or new features on existing dashboards, via an internal cloud development community, which is an email newsletter I write, as well as a Teams channel where we post this information.

And lastly, through our Cloud Center of Excellence, we created a series of self-paced training videos that walk users through how to access the dashboards, how to use them, and how to tweak them to get the most benefit.

So I've talked about how we deployed it and how we use it.

Let me give you some examples of how we've been able to get benefit from putting these tools in the hands of our engineering teams.

The first thing I want to talk about is internet data transfer. This is often a source of great consternation among account stakeholders, but it's also one of the most opaque areas of AWS costs, right?

Using the data transfer visualizations within the Cloud Intelligence Dashboards, users can see data transfer costs by transfer type, as well as drill into account-level details on what their data transfer spend is. And a lot of the time it can help you find architectural improvements in your environment.

I'm going to tell you a story about one of those right now. We had a team using Databricks on EC2, and as you can imagine, there's a lot of data transfer going on sending data to S3. Now, you may not be aware of this, but when you send data to S3 from EC2, by default that actually goes out to the public internet and hits the public S3 endpoint.

If you're doing small amounts of data transfer, you may not even notice. But if you're doing large amounts, like this team was, it can result in, as it did here, a $7,000-a-month gateway bill.

So how did we solve that? We deployed a construct AWS offers called the Amazon S3 gateway endpoint. This puts a network endpoint inside your VPC, and all traffic to and from S3 gets routed to that endpoint instead of out to the public internet.

So it stays inside the VPC. It's more secure because you're not going over the public internet, it's usually faster and it's a heck of a lot cheaper, right?

In our case, we made a five-minute configuration change and saved the company $84,000 a year.
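
The change itself is small; as a hedged sketch (the VPC, region, and route table IDs are hypothetical), creating an S3 gateway endpoint with boto3 looks like this:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")  # hypothetical region

# Create a gateway endpoint for S3 and attach it to the VPC's route tables;
# S3-bound traffic then stays on the AWS network instead of exiting the VPC.
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",  # hypothetical VPC
    ServiceName="com.amazonaws.us-west-2.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0aaaaaaaaaaaaaaa1", "rtb-0bbbbbbbbbbbbbbb2"],
)
# Gateway endpoints for S3 carry no hourly or data processing charge.
```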

Next, let's talk about inter-AZ traffic. It's usually not as big a cost impact as internet traffic, but it's still something that needs to be looked at.

In some architectures, inter-AZ traffic is expected and normal; you might have a Kubernetes cluster deployed across multiple availability zones, and you'll see traffic between them. That's normal. But some inter-AZ traffic can be a sign of bad architecture, or maybe even a configuration mistake.

In our case, we had a team deploying NAT gateways, and being good AWS customers, they deployed for high availability: two NAT gateways, one for each availability zone, with route tables for each availability zone pointing at each NAT gateway.

But oops: somebody deployed the second NAT gateway in the same AZ as the first one. So you've basically got a route table sending all the traffic from AZ2 over to the NAT gateway in AZ1.

They had good intentions; the result was just not what was intended.

So how do you fix that? Delete the second NAT gateway in AZ1, deploy a new NAT gateway in AZ2, and then update your route tables. Again, a 10-minute fix that can result in significant cost savings.
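
As a hedged sketch of that fix (all IDs are hypothetical), the remediation with boto3 is a handful of calls:

```python
import boto3

ec2 = boto3.client("ec2")

# 1. Create a replacement NAT gateway in a subnet that is actually in AZ2.
nat = ec2.create_nat_gateway(
    SubnetId="subnet-0az2az2az2az2az20",        # hypothetical public subnet in AZ2
    AllocationId="eipalloc-0123456789abcdef0",  # hypothetical Elastic IP
)["NatGateway"]
ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat["NatGatewayId"]])

# 2. Point AZ2's route table at the new NAT gateway.
ec2.replace_route(
    RouteTableId="rtb-0az2routetable0001",  # hypothetical AZ2 route table
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=nat["NatGatewayId"],
)

# 3. Remove the misplaced duplicate NAT gateway in AZ1.
ec2.delete_nat_gateway(NatGatewayId="nat-0duplicateinaz1000")  # hypothetical
```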

Lastly, I want to talk about how you can use the dashboards to drive best practices. Yuri talked about the KPI dashboard and GP3 volumes.

GP3 was announced at re:Invent 2020, and you get basically the same performance as GP2 but typically 20% lower cost. And it's easy to change: it's a settings change, you can do it live, and it updates in the background.
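
That live settings change is a single API call; as a minimal sketch (the volume ID is hypothetical), migrating a volume to GP3 with boto3:

```python
import boto3

ec2 = boto3.client("ec2")

# Change the volume type in place; the volume stays attached and in use
# while the modification proceeds in the background.
ec2.modify_volume(VolumeId="vol-0123456789abcdef0", VolumeType="gp3")

# Optionally watch the modification state (modifying -> optimizing -> completed).
state = ec2.describe_volumes_modifications(
    VolumeIds=["vol-0123456789abcdef0"]
)["VolumesModifications"][0]["ModificationState"]
print(state)
```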

But we still found low adoption among certain teams; they hadn't actually gone through and made the change.

So we decided to leverage the KPI dashboard to drive that change. As Yuri described, we set a target: we want to get everybody to at least 80% of volumes on GP3.

Then we used the visualizations in the KPI dashboard to show people exactly how much money they'd save by making the change, down to an individual volume basis.

And then we used it to drive a bit of a gamification effort, a competition between cloud teams. You can see in the bottom screenshot: team A is doing great, with 98% of their volumes on GP3. Team B still has a little work to do; we're reaching out to them after the conference.

That's a great way to leverage the KPI dashboard: put the decision in the hands of the engineers, give them the tool, and let them drive their own improvement.

So what are we doing next with dashboards at Dolby? We have been using the Compute Optimizer dashboard to drive the right sizing conversation for our own workloads.

And we're planning to push that out to the remaining teams at Dolby using the same kind of marketing effort I talked about: teaching people, making the videos, and publicizing it so people use it for themselves.

We feel that putting that dashboard directly in the hands of engineers will make it a lot easier for them to see specific areas where they can improve their compute workloads; they can get specific instance right sizing recommendations and see exactly how much money they'd save by implementing those changes.

In addition, Dolby is pretty focused on sustainability, like most companies, and with more and more of our footprint running in the cloud, we're keen to understand the carbon footprint of those workloads.

The current carbon footprint tool in the native AWS console has somewhat limited capabilities; I've given lots of feedback to the product team on that. So I'm eager to put the Sustainability dashboard to the test and see how it can help us have those sustainability conversations.

And again, put it in the hands of the engineers. I talked to several people at the FinOps meetup the other night, and I think sustainability is something engineers sometimes care about even more than cost.

Right? Because at the end of the day, cost is the company's money, but we all have kids, we all have families, we all want to have a future. I think sustainability is going to drive a lot of future FinOps conversations.

So that's my story. I hope you found it useful and have something you can take back to your own environment. I'm going to hand it back over to Yuri.

Thank you, Mike. With Cloud Intelligence Dashboards, you can build this collaborative environment for data-driven decisions, share it with any stakeholder in the organization, and they can customize and adjust it to their business needs.

And as you saw in the examples Mike provided, cost visibility lets you identify optimization opportunities across high availability, performance, and many other areas.

So with Cloud Intelligence Dashboards, you can prepare your organization for the next step: getting into a cost-aware culture.

I'd like to take a moment to discuss what the next steps would be to get an organization into a cost-aware culture.

So Mike, what would you say?

Yeah, I mean, I think what I just talked about: putting these dashboards directly in the hands of the engineers so it becomes something that helps them understand how their design decisions affect cost. So it's not just a retroactive thing; it's proactive, being able to see immediately how their decisions are impacting things, and making that FinOps review, like you talked about, part of the application lifecycle. It's part of a holistic process, not just something that happens after the fact.

What about you, JR? What do you think?

I actually want to pick up on the sustainability theme, because that is something we're seeing as a trend really hitting FinOps this year: bringing carbon data in particular into the consideration.

The bullet point at the top right, whichever way you're looking at it, talks about engineering and finance collaborating to achieve business outcomes. That's another great example: it's not just about cost savings. A lot of the time you may want to measure and weigh your carbon outputs next to your cost, next to your time to delivery.

I was speaking at the Foundation meetup. It was just last night, by the way; it feels like days ago. Time flies. What day is it? I can't get home soon enough.

I was talking to Natalie Daly from HSBC, and she described how they've integrated carbon into their practice. When they deploy, they try to give engineers a sense of "this is going to be the cost impact, and this is also going to be the carbon impact," because in a lot of cases that's an extra level of motivation for engineering teams.

The other aspect that gets really interesting: often some teams have really advanced optimization practices that you may not be applying to other teams, even though the skill set and muscle are there in the organization, because you're really trying to get a new product feature out.

On that collaboration aspect as well: Yuri talked a lot about ways to get data into the hands of engineers and finance, and how things like FOCUS try to align those common lexicons.

One of the best things you can do is provide that data in a common format and get everybody educated on how to talk about it, so they can make decisions on their own.

Mike Fuller, co-author of the Cloud FinOps book, said that when he was running FinOps he felt most successful when he could put the right data in front of the finance and engineering teams, not be in the room, and they could have a conversation on their own about what was happening and move forward from there.

There is often a central FinOps team, but it's an enablement function.

The leadership point, I think, is really critical. As I mentioned, you can't necessarily engineer leadership support; you sort of have to wait for the right factors in the organization and the economy to hit. But you can set the organization up for success, building the muscle with the right reporting and the right tactics.

This slide talks about celebrating successes and wins, and a theme I picked up from your talk was how much you're gamifying between engineers and putting the data out to all the right people.

There is a big aspect of internal marketing here. You've got to not only evangelize and support the building of the practice and its best practices, but also get people excited: look at this team, they delivered cost savings, they were more efficient, they improved carbon. Not just the normal "we shipped a new feature" or "this team reduced latency"; those are important and critical, but you've got to celebrate cost wins too.

A big part of getting that engineering leadership support is getting those CTOs and CIOs to say, "Yes, this is an important part of your job now; we're going to pull it into your organization reviews and make sure the dashboards and metrics reflect cost, not just everything else."

Yeah, absolutely. From my side, I want to reflect on something I started with: treat FinOps metrics as just another operational metric that your engineers review and track, and give them the tools and the right visibility to be successful in looking for FinOps opportunities, putting those opportunities, along with the resources they can optimize, right in front of them.

Minimize the effort they spend reviewing all of this, and give them more time to act on optimization.

What's next? I'd like to invite you all to deploy Cloud Intelligence Dashboards, try them out in your environment, and get into a cost-aware culture. If you want to build something on top, you can learn how to customize the dashboards and how to build business unit cost views and visuals. Show us what you've built; we'd be happy to hear your feedback and learn how you managed to optimize your spend and usage and adopt the FinOps framework with Cloud Intelligence Dashboards.

I'd like to thank you, and we're happy to take questions near the stage.
