Innovate faster with generative AI

Kimberly: Hi, how are you doing?

Well, how about you?

Good, thank you. Can you believe it's Q4 season all over again? Time really flies, doesn't it?

Really? Um, so how are we doing on our slides? If you remember, you made a really big bet last year.

I did. Here's your money.

Really? I didn't mean the bet literally. Are you sure? I did win the bet.

I'm sure. Wow. So this is the one where we can use generative AI to make my slides - you just type in some text and out come the slides.

Yeah, that's exactly right. I'd like to show you what I did using generative AI in Amazon Bedrock to work on your deck. I think it's pretty cool.

Nice. So, first I go into Amazon Bedrock and I just click Get Started. I'm going to go into the text playground to generate some talking points for you. In this case, I'm going to select the Amazon Titan model - Titan Text Express. If you remember, we've been working hard on building some messaging. I'm going to use that messaging as context for the Titan model, going from the marketing messages to the slides.

That's amazing.

Yeah. That's right. So I put all this context in, which is the messaging we've been working on for Q4, and I'm going to ask Titan to come up with five key themes that you should highlight in the slides.

It does great. This is as good as what you and I would have done.

I agree these talking points are right on. I think we can use them for your talk. Your slide deck is ready to go. I think we can take it to Vegas.
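The console demo above maps to a single call against the Bedrock runtime API. Below is a minimal sketch of that call with boto3, assuming Amazon Titan Text Express as the model; the region, prompt wording, and generation settings are illustrative assumptions rather than the exact values used in the demo.

```python
import json
import boto3

# Bedrock runtime client (assumes credentials and a region where Bedrock is enabled)
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

marketing_context = "...paste the Q4 messaging document here..."

prompt = (
    f"{marketing_context}\n\n"
    "Based on the messaging above, list five key themes to highlight in a keynote slide deck."
)

# Amazon Titan Text Express request body (illustrative generation settings)
body = {
    "inputText": prompt,
    "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.5, "topP": 0.9},
}

response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=json.dumps(body),
)

result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])  # the five suggested themes
```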

Please welcome the Vice President of AI/ML at AWS, Dr. Bratin Saha.

Good afternoon, everyone. Welcome, and thank you for being here. A year back, using generative AI to create slides for my presentation might have seemed fanciful, but here we are. Not only can it generate slides but a whole lot more. And in today's talk, I'll discuss what it takes to build and scale generative AI for the enterprise, because when you are building for the enterprise, it's important to pay attention to some key considerations.

This slide generated by Amazon Bedrock lists those key considerations. And in my talk, I'll discuss why these considerations are important and how AWS helps you address these considerations.

We'll also have customer speakers come and talk to these key considerations. So for example, we will have Ryanair, one of the largest airlines in the world, talk about choice and flexibility of models. We'll have Fidelity, one of the largest financial companies in the world, talk about differentiating with your data. We'll have Glean, one of the most popular AI driven enterprise search assistants, talk about responsible AI in the enterprise.

We'll also have Hugging Face, home to many of the most popular open-source foundation models, talk about machine learning infrastructure. And finally, we'll have Netsmart, one of the largest community healthcare providers, talk about using generative AI applications.

But first, let me take a step back and discuss why generative AI is so transformational. Over the last five years, the pace of innovation in compute, in data, and in machine learning has accelerated and this acceleration was driven by the cloud. The cloud made massive amounts of compute and massive amounts of data available to everyone. And as a result, practitioners in industry and academia were able to innovate rapidly in machine learning. In fact, almost every frontier machine learning model was born in the cloud.

Now let me give you a little bit more context on the pace of innovation that has been driven by the cloud over the last six years. The amount of compute that we use for machine learning has grown by more than 100,000 times. The amount of data that we use for training machine learning models has grown by more than 100 times, and the size of models has grown by more than 1,000 times. This is a pace of innovation that we have never before seen in the history of information technology. And this pace of innovation has allowed us to create models that are trained on internet-scale data, the so-called foundation models.

Let me give you a little bit of a feel for what it takes to build one of these foundation models. A human being - you and me - over the course of an entire lifetime listens to about one to two billion words. Now, when we train these foundation models, we are training them with trillions of words. In other words, we train these models with thousands of times more information than a human being will listen to in their entire lifetime.

There's another way to look at this. When we train these foundation models, we train them with terabytes of data that is thousands of times more than the information contained in Wikipedia. And so when you pack so much information into these models, they start having very interesting properties.

But the question that most customers care about is how do I build applications out of these models? How do we put these models to work? And so I'm going to talk about how we at AWS have been building generative AI applications because I think the lessons we have learned and the considerations that we have to keep in mind will also apply when you want to build and scale your own generative AI applications.

Earlier this week, we launched Amazon Q. Amazon Q is a generative AI application that transforms how employees access their company's data. You can use Q to ask questions about your company's data, you can use Q to create content from your company's data, and you can also use Q to act on your behalf on your company's data. And so I'm going to use Q to illustrate the considerations that we had to keep in mind and how we went about building an enterprise-scale generative AI application.

So the first question that we have to ask ourselves and I suspect you will have to ask yourselves is where do I get started? How do I choose a foundation model to build an application with? And this is not an easy question to answer because every model has its own strengths and weaknesses. And so it's important to do a lot of experimentation to figure out which model to use.

Let me illustrate with some examples. Suppose I asked this question to two different foundation models and these questions and answers are actually real questions and real answers that we asked many different foundation models. So I asked, what is the Amazon return policy?

Model 1 gives me a quick concise answer: Free returns within 30 days.

Model 2 gives me a longer, more complete answer. Both of them are accurate answers. Which model do you want to start with? That actually depends on the application you have in mind. For example, if you want to build an application to generate ad copy, you want Model 1 because you want brief, concise statements. On the other hand, if you are looking to build a customer service chatbot where you actually want to have a verbose interaction with the customer, then you want Model 2.

Let me give you another example. Suppose I ask this question: what is your checked bag policy?

Model 1 gives me a quick accurate answer. Model 2 also gives me a correct answer. It's more complete, but it takes longer to generate and it takes longer to generate because it has to do a lot more compute. And because Model 2 has to do a lot more compute, it's an order of magnitude more expensive. And so now you have to ask the question, do I really want to pay an order of magnitude more for Model 2? Or am I better off paying a fraction of the cost and using Model 1 because it gives the answers that customers care about.

And so it's very important as you think about building and scaling generative AI that you run a whole set of tests and you work back from your use case. And so let me show you the results of running a whole set of tests for Amazon Q on many different models.

But first, let me talk about some of the parameters that we use for evaluating these models. So there's cost effectiveness - how expensive is it to use this model? There's completeness - how complete are the answers? There's hallucination - when a model has low hallucination, it's a lot more accurate. Then there's the conciseness that I talked about. And finally, there's latency - how quickly do I get an answer back?

Now, when we did our actual evaluations, we used a lot more parameters, but I put up these five because they get the point across.
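A comparison like this can be scripted. The sketch below loops a small prompt set over two Bedrock models and records latency, answer length as a crude conciseness proxy, and a rough cost estimate; the per-1K-token prices are placeholders, and real completeness or hallucination checks would plug in where the length proxy sits. It is a sketch of the evaluation idea, not the harness AWS used for Q.

```python
import json
import time
import boto3

bedrock = boto3.client("bedrock-runtime")

# Candidate models and hypothetical per-1K-output-token prices (placeholders, not a price list).
candidates = {
    "amazon.titan-text-express-v1": 0.0008,
    "anthropic.claude-v2:1": 0.024,
}

test_prompts = [
    "What is the Amazon return policy?",
    "What is your checked bag policy?",
]

def invoke(model_id: str, prompt: str) -> str:
    """Call a Bedrock model; request/response shapes differ per provider, so this is simplified."""
    if model_id.startswith("amazon.titan"):
        body = {"inputText": prompt}
        out = bedrock.invoke_model(modelId=model_id, body=json.dumps(body))
        return json.loads(out["body"].read())["results"][0]["outputText"]
    body = {"prompt": f"\n\nHuman: {prompt}\n\nAssistant:", "max_tokens_to_sample": 300}
    out = bedrock.invoke_model(modelId=model_id, body=json.dumps(body))
    return json.loads(out["body"].read())["completion"]

for model_id, price_per_1k in candidates.items():
    latencies, lengths = [], []
    for prompt in test_prompts:
        start = time.time()
        answer = invoke(model_id, prompt)
        latencies.append(time.time() - start)
        lengths.append(len(answer.split()))          # crude conciseness proxy
    est_cost = sum(lengths) / 1000 * price_per_1k    # rough output-cost proxy
    print(model_id,
          f"avg latency {sum(latencies)/len(latencies):.2f}s",
          f"avg length {sum(lengths)/len(lengths):.0f} words",
          f"est. cost ${est_cost:.4f}")
```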

And so now let's look at the results from the first two models. And what we will notice here is that Model 1 is not as cost effective, but Model 1 is a lot more complete. And now it's not clear which model to use. And so we said, you know what, let's go and try a few other models.

So we took another model and ran the whole set of tests and the results were again the same, the models are good in some dimensions, they're not so good in other dimensions. And we ran our tests against many different models and the results were always the same - models have strengths and models have weaknesses.

And I can bet you that as you build generative AI applications for the enterprise, you too will likely have to go through a similar process where you'll end up with some models that are good for some things and others that are good for other things.

So where did we end up? Here is where we ended up. We picked a model that's good on the cost axis and said let's go and optimize it on the other dimensions. And it's very likely that as you build applications, you too will probably have to make a similar choice where you pick something that's good on some dimensions and then go in and optimize it on other dimensions.

So what were the optimizations that we had to do when we started building Q? We thought we would use a single large model. We thought we would take the largest model and run with it. Turns out that's not where we ended up. We actually ended up using many different models, each one of them somewhat specialized to the task. Let me explain why this was the case.

When a user sends a query to Q, Q has to do a bunch of things. It has to first understand the intent of the query - what is the user trying to get done? It then needs to retrieve the right data for the query. It then needs to formulate the answer for the query. It then needs to do a bunch of other things. And so it turns out that using a single model wasn't the optimal experience - using multiple different heterogeneous models ended up giving a better experience.
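Q's internal models aren't disclosed, but the shape of such a heterogeneous pipeline is easy to sketch: a small, fast model for intent classification, a retrieval step, and a larger model only for formulating the final answer. The model IDs, helper names, and stubbed retrieval below are illustrative assumptions.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def call_titan(prompt: str, model_id: str) -> str:
    body = {"inputText": prompt}
    out = bedrock.invoke_model(modelId=model_id, body=json.dumps(body))
    return json.loads(out["body"].read())["results"][0]["outputText"]

def classify_intent(query: str) -> str:
    # Step 1: a small, cheap model decides what the user is trying to get done.
    prompt = f"Classify the intent of this request as one of [lookup, create, act]: {query}"
    return call_titan(prompt, model_id="amazon.titan-text-lite-v1").strip().lower()

def retrieve(query: str) -> list[str]:
    # Step 2: fetch relevant passages from enterprise sources (stubbed for the sketch).
    return ["<passage from calendar>", "<passage from CRM>", "<passage from wiki>"]

def answer(query: str, passages: list[str]) -> str:
    # Step 3: a larger model formulates the final answer from the retrieved context.
    context = "\n".join(passages)
    prompt = f"Context:\n{context}\n\nAnswer the question using only the context above: {query}"
    return call_titan(prompt, model_id="amazon.titan-text-express-v1")

query = "Tell me about my customer meeting tomorrow at 10am."
intent = classify_intent(query)
response = answer(query, retrieve(query)) if intent == "lookup" else "route to another handler"
print(intent, response)
```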

Now, we thought this was counterintuitive and you may also think this was counterintuitive until we realized there's really a very interesting analogy to how the human brain works. It turns out that the human brain is not one homogeneous thing, it actually has multiple different heterogeneous parts that are each specialized to different tasks.

So for example, the frontal cortex that deals with reasoning and logical thinking is constructed differently than the limbic system that deals with fast spontaneous responses. Even the neural structures are different. And so it's probably not surprising that when we considered all of the tasks that Amazon Q has to do, we ended up with a heterogeneous model architecture.

What were some of the other optimizations that we had to do for Q once we took care of the models? We actually had to spend a lot of time on the data engineering. Let me explain why.

Suppose I ask this question to you: "Tell me about my customer meeting tomorrow at 10am." Notice that Q now has to access multiple data sources. It needs to first go and look at my calendar to figure out what meeting I have. It then needs to look at my CRM system, my customer relationship management system to figure out details about the customer. It then needs to look at other company documents to understand how we are interacting with the customer. And so what Q has to do is it has to aggregate data from multiple data sources to be able to give me a helpful answer.

And so we spent a lot of time on building enterprise data connectors, on data processing, data preprocessing, data post processing, data quality checks to ensure that Q had the right data quickly and efficiently.
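As a rough illustration of why that data engineering matters, here is a hedged sketch of a connector layer that pulls records from two hypothetical sources, normalizes them into one schema, and applies a basic quality check before anything reaches the index; the payload fields and freshness rule are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Document:
    """Normalized record that every connector emits, regardless of source."""
    source: str
    doc_id: str
    text: str
    acl: list[str]              # principals allowed to see this record
    last_updated: datetime

def from_calendar(event: dict) -> Document:
    # Hypothetical calendar payload: {"id", "title", "start", "attendees", "updated"}
    return Document("calendar", event["id"], f'{event["title"]} at {event["start"]}',
                    event["attendees"], datetime.fromisoformat(event["updated"]))

def from_crm(account: dict) -> Document:
    # Hypothetical CRM payload: {"account_id", "notes", "owners", "modified"}
    return Document("crm", account["account_id"], account["notes"],
                    account["owners"], datetime.fromisoformat(account["modified"]))

def quality_check(doc: Document) -> bool:
    # Drop empty or stale records before they reach the index (timestamps assumed naive UTC).
    return bool(doc.text.strip()) and (datetime.utcnow() - doc.last_updated).days < 365

def ingest(raw_calendar: list[dict], raw_crm: list[dict]) -> list[Document]:
    docs = [from_calendar(e) for e in raw_calendar] + [from_crm(a) for a in raw_crm]
    return [d for d in docs if quality_check(d)]
```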

Now, once we got done with machine learning model design and once we got done with the data engineering, we thought we were done. Turns out that was not the case. Let me explain why.

Suppose I ask this question to Q: "What is the expected revenue of products this quarter?" This is company-confidential information, which means some people should have access to this answer, but not everyone. So if a software engineer is asking this question, Q should say, "Sorry, I can't give you the answer." But if the CEO is asking, Q should be able to give an answer.

In other words, Q - or any enterprise application - needs to respect the access control policies on the data. It should only give answers that a user is entitled to have. And so we had to spend a lot of time on building access management controls, on handling sensitive topics, and in general on building responsible AI capabilities.
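One common way to respect access control in this kind of application is to filter retrieved documents against the caller's entitlements before they ever reach the model, and to refuse when nothing survives the filter. A minimal sketch under that assumption (the user and group model is invented, and generate_answer stands in for the actual model call):

```python
def generate_answer(query: str, context: str) -> str:
    # Stand-in for the actual model call (e.g. a Bedrock invoke_model request).
    return f"[answer to '{query}' grounded in {len(context)} chars of permitted context]"

def allowed(doc_acl: set[str], user: str, groups: set[str]) -> bool:
    # A document is visible if the user, or any group they belong to, appears on its ACL.
    return user in doc_acl or bool(groups & doc_acl)

def answer_with_access_control(query: str, user: str, groups: set[str],
                               retrieved: list[dict]) -> str:
    visible = [d for d in retrieved if allowed(set(d["acl"]), user, groups)]
    if not visible:
        # Enforce entitlements before generation: the model never sees data the user can't.
        return "Sorry, I can't give you the answer."
    context = "\n".join(d["text"] for d in visible)
    return generate_answer(query, context)

docs = [{"text": "Q4 revenue forecast: ...", "acl": ["finance-leadership", "ceo@example.com"]}]
print(answer_with_access_control("Expected revenue this quarter?", "engineer@example.com", set(), docs))
print(answer_with_access_control("Expected revenue this quarter?", "ceo@example.com", set(), docs))
```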

Now to build all of these, we also needed a performant and low cost machine learning infrastructure. And this leads me to the key considerations for accelerating your generative AI journey:

First, you want to have choice and flexibility of models.

Second, you want to be able to use and differentiate with your data.

Third, you want to integrate responsible AI into your applications.

Next, you want to have access to a low cost and high performance machine learning infrastructure.

And finally, in many cases, you want to get started quickly by using generative AI-powered applications.

Let me now dig deeper into each one of these starting with choice and flexibility of models. In fact, this is why we launched Amazon Bedrock. Amazon Bedrock is the easiest way to build scalable applications using foundation models. It gives you a range of state of the art foundation models that you can use as-is or you can customize them with your data. You can also use Bedrock agents that can act on your behalf.

And so to talk more about how customers are innovating with Amazon Bedrock, please welcome John Hurley, the Chief Technology Officer at Ryanair.

JOHN: Hello everybody. My name is John Hurley. I'm the CTO of Ryanair, Europe's favorite and largest airline. We will fly 185 million passengers this year, and that will grow to 300 million over the coming years as new aircraft orders come in.

Two key stats I love about Ryanair: we fly 3,300 flights per day and carry 600,000 passengers, all on 737 aircraft - a very efficient, high-value, high-energy operation - and the IT department I work in has to go at the same speed as the business.

When COVID came, we actually had a chance to breathe, and a couple of people were so positive that we took out the bucket list and tackled projects we'd been trying to tackle for years. For example, we use SageMaker for low-cost-carrier pricing - we now price every single fare in every single city-pair product on our website. That's over a million different price points being calculated continuously, 24/7. We use SageMaker for aircraft maintenance. It's been interesting - some early proofs of concept have gone well, with lots more to do in that space. It will help our operational efficiency, and we look forward to taking that further.

We use SageMaker for packing our fresh food - it wasn't all about SageMaker, I'm afraid, but we have a project there as well. For example, we had to get rid of paper during COVID - we had 30-odd different European governments throwing different regulations at us, all about safety and procedures and information being shared, and we were constantly dealing with that. If we didn't have the likes of Lambda and AWS - we're fully in the cloud - we'd have been snookered in that world.

For example, the government of Italy gave us three days to build a COVID wallet, and we did - and we only did it because of the power of that technology. At the same time we were processing refunds for over 20 million passengers. So it was a very busy time. We did a lot. And we circled back to that fresh food project.

We called it the PIN predictor - that was its catchy title. The idea was how to give packing plans to the business so it could pack the right fresh food on every single flight. We did it, and it's a very interesting example: the theory was wonderful, the paper was brilliant, the data scientists were over the moon. Then we looked at one of these packing plans and it was a car crash. It was absolutely impossible to execute 550 different packing plans across 93 bases at 4 o'clock in the morning.

So we were stuck. We spoke to Amazon - sorry, AWS - and our partners put us in contact with Amazon, who gave us a tour of their fulfillment centers to show us what good looks like. It was brilliant. I loved it. The robots were everything you would imagine - they were my favorite part. On the way back, I was talking to our head of in-flight and I asked her what her favorite robot was. And she goes, "Robots? Did you not see the Amazon A to Z app? I want an A to Z app." And I was like, "You should have seen the robots," but... We got back to Dublin, rang our contacts, and they put us in touch with the Amazon team again to go through Amazon's A to Z app, and with a working-backwards session, got it going.

Six months later, we actually released - again with a very catchy title - the Ryanair employee app. This is for our cabin crew and pilots across the network: it gives you your roster, it gives you your schedules, it lets you book time off and request base transfers. Everything you need in one location. It has gone very well and been very positive, but it didn't fix all our problems.

We had cabin crew with questions over, you know, training, how to sell products, grooming guidelines, RA documentation - documentation that was spread right across our network in different places. It was in YouTube, it was in PowerPoint, it was everywhere.

We worked with AWS, we used Bedrock, and we actually built an employee bot so crew could ask questions like: from selling a coffee, how do you sell a bar of chocolate? Or, can I have a tattoo on my forehead? (You can't, but in case you wondered, you can go and check.) It allowed people to ask these questions without having to search through documentation - it was on the phone in your pocket while you were travelling, so the information was right at hand.

It has gone very well. We hope to roll out the Bedrock bot to the rest of the business early next year, once it has finished internal testing with our senior cabin crew. In other areas where we're using Bedrock, we have a great plan for employees, but after the Amazon Q announcement we might have a refresh and see what's the right thing to put in place. We're also using CodeWhisperer. It's been interesting, with a way still to go, and we're very excited about those projects.

The one that excites me most is definitely customer experience. About 10-15% of our daily calls are people ringing in with random questions that aren't actually covered on our site, or unusual questions about what they can bring on the plane. We have agents answering the phone, and queue times. All of these things can be done through AI, and that's where we see huge excitement and a huge area for improvement.

And thank you, Bratin - I saw the checked bag example in his presentation; we've started that project. So I'll be back to him with 5,500 other questions and model recommendations to make that go forward and go faster.

And with that, I'll hand you back to Bratin.

BRATIN: Thank you, John. We're delighted to be partnering with you on your generative AI journey. Let me now get to the next key consideration, and that is using and differentiating with your data. In fact, every machine learning system we have built - and this predates generative AI - uses data as a critical ingredient. And so it's really important for customers to be able to build a robust data platform to drive their machine learning.

To that end, AWS provides you with the most comprehensive services to store, query, and analyze your data, and then to act on it with business intelligence, machine learning, and generative AI. We also provide you services to implement data governance and to do data cataloging. And best of all, you can use these services in a modular way, so you adopt only the services that you need.

And I'm happy to announce yet another data capability - the Amazon S3 Connector for PyTorch. This connector makes foundation model training a lot more efficient by accelerating some of the key operations used in foundation model training, like saving and restoring checkpoints.
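In practice, the checkpointing pattern with the connector looks roughly like the sketch below, based on the package's documented S3Checkpoint interface; the bucket, key, and region are placeholders.

```python
import torch
import torch.nn as nn
from s3torchconnector import S3Checkpoint  # pip install s3torchconnector

CHECKPOINT_URI = "s3://my-training-bucket/checkpoints/epoch-3.pt"  # placeholder bucket/key

model = nn.Linear(1024, 1024)  # stand-in for a real foundation model
checkpoint = S3Checkpoint(region="us-east-1")

# Save the model state directly to S3, streaming without a local temp file.
with checkpoint.writer(CHECKPOINT_URI) as writer:
    torch.save(model.state_dict(), writer)

# Later (or on a replacement node), restore the state straight from S3.
with checkpoint.reader(CHECKPOINT_URI) as reader:
    model.load_state_dict(torch.load(reader))
```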

Now many customers use AWS to build the data platform that drives their machine learning. And so I'm pleased to welcome Vipin Mayer, the Head of AI Innovation at Fidelity, to talk more about how to build a data platform to drive machine learning.

VIPIN: All right, good afternoon. I am Vipin Mayer from Fidelity Investments. We are a large financial services company. Data and AI are really important to us, and I believe you can only be good at AI if you have a very good data strategy, data platforms, and data quality.

Now you are hearing all this from everyone and I thought we should unpack it a bit and I'll tell you a little bit about the journey and what's really important to us.

Ok. We started seven years ago in partnership with AWS...

We've done a lot, a lot still remains to be done and I could talk about many things, but I will talk about three things that I feel are really important.

Now, the first one is unstructured data. How well do you have it collected? How well do you have it organized? We started collecting it five or six years ago: we started digitizing calls, we started streaming all unstructured text, built features around it, and gave access to end users through query tools, so that over the years they became familiar with text - which, now with LLMs, is a critical capability.

The second thing that I believe is really important, especially for large companies, is to have an enterprise taxonomy. Very easy to say, very hard to do, because it requires consistency of KPIs and a semantic layer to instrument it. We have been working at it; we've got a lot of KPIs in one place, which enables dashboards to be spun up very easily.

The third piece is an investment in the democratization of data. We've enabled query into our data platforms, so people on the business side can discover data and even have a social interaction with other people regarding the data elements.

OK? Those are the three things I would single out. On pipelines, we worked with AWS, and the backend works pretty well for us.

So, with that data in place, let's quickly fast-forward to generative AI. There are four things we are doing in generative AI. First, conversational Q&A, especially for service reps. Second, on the coding and technical side, developer assist, plus looking at migration and translation of code - things I think many of you know. The third piece, perhaps the one that gets talked about the most at these conferences, is RAG: semantic search rendered through a conversational interface. A lot of work in that, and all the announcements around vector stores really relate to the work we're doing in that third lane. And fourth, content generation with a human in the loop - easy to say. But let's talk about the challenges we face for a minute or so.

The pace of LLM innovation is incredible. If you go to Hugging Face, they add a thousand new models every day. Claude 2.1 - excellent big models. Great. But we've got to balance the large models with smaller, fit-for-purpose, task-specific models, and we're doing that through rapid experimentation. As we do this, getting capacity and managing cost is a challenge, and guarding against hallucinations is another challenge for us.

OK. So with that, let me go through my last slide, which is our approach. With classic machine learning - we don't talk much about it, but you need your factory, and we use SageMaker. We are now excited about Bedrock, but also SageMaker, and being able to test and experiment with all of these things: RAG, tuning, prompting, and being able to look at evaluation metrics - really critical for us, and a lot of work in that space.

But let me end where I began: all of this can take a lot of time and can distract you from data. At the end of the day, there is an even greater premium now on data quality, and that's where we are still focused - with a lot more to be done in that space.

OK. Those are my quick few minutes. I'm going to hand it back to Bratin. Thank you.

BRATIN: Thank you, Vipin - incredibly important insights into how you build a robust data platform, because without that, it's very hard to innovate with machine learning.

Let me now get to the next key consideration, and that is integrating responsible AI. Any powerful capability needs to have the appropriate guardrails so it can be used in the right way. And if machine learning and generative AI are to scale, it's incredibly important that we integrate responsible AI into the way we work.

And to that end, I'm pleased to announce that you can now use SageMaker Clarify to evaluate foundation models, and you can also get the same functionality on Amazon Bedrock. Here is how it works. As a user, you come in and select a few models, and you choose the responsible AI and quality criteria that you want to evaluate them on. If you want human evaluation, you can also choose a specialized workforce. Clarify then goes off, does the evaluation for you, and generates a report.

So all of that work we had to do for Amazon Q - all of those evaluations and criteria, which took months of hard work - all of that gets a lot easier now. To talk more about responsible AI in the enterprise, please welcome Arvind Jain, the CEO of Glean.

ARVIND: Thanks, Bratin. It's great to be here. Glean is a modern work assistant that combines the power of enterprise search and generative AI and helps employees in your company find answers to their questions using your company's knowledge. It's like having an expert who's been at the company since day one, who has read every single document that has been written, who's been part of every conversation that has happened in the company, who knows about every employee's expertise, and who is ready to assist you 24/7 with all of that knowledge and information. That's what Glean does. And we're so excited to be here at re:Invent and to announce our partnership with AWS today.

I'm going to walk you through how we address the challenge of responsible AI with our customers. Customers are really excited about generative AI, but they want to know if they can trust the answers they get back from AI. They have three main questions on their mind. First, how do I know the answers I receive are accurate? Everybody knows LLMs can hallucinate. And actually, even more importantly, the input that you provide to the LLM is going to decide how accurate the output is going to be - and oftentimes in an enterprise, information can be out of date, and that can make the LLM's job hard. Those are real challenges.

Second, how do I know that I'm using the best model? The market is evolving quickly, and each customer has different needs, priorities, and constraints. Glean's job is to guide them through this complex ecosystem and make it easy for them to get the LLM that works best for them. And third, how do I make sure that my company's data is safe?

Glean indexes all of the company's data, so we take this problem very seriously. We need to make it easy for our customers to keep their information safe and not have them worry about data leaks.

Let's go a little bit deeper into each one of these. First, let's talk about how we address the concerns around accuracy. The output of an LLM is only going to be as good as the input you provide to it. To make LLMs provide good answers, you need to use retrieval-augmented generation to provide the model with the right knowledge to work on, as well as to constrain its output to that knowledge. A really good search engine is the key to LLM accuracy, and that's at the core of Glean.

Our search uses technologies like SageMaker to train our semantic language models, and LLMs from Bedrock, to provide accurate answers to our users. After the LLM generates an answer, we apply post-processing to provide inline citations for everything in the answer. If a piece of information doesn't have a citation, we exclude it from the response. All of this put together - a RAG system backed by a powerful enterprise search engine, plus post-processing of LLM responses - is how we address customer concerns around accuracy.
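Glean's implementation isn't public, but the citation-filtering idea described above is straightforward to sketch: ask the model to tag each sentence with the ID of a retrieved passage, then drop any sentence whose tag doesn't match a real source. The prompt format and helper names below are assumptions for illustration.

```python
import re

def build_prompt(question: str, passages: dict[str, str]) -> str:
    numbered = "\n".join(f"[{pid}] {text}" for pid, text in passages.items())
    return (f"Passages:\n{numbered}\n\n"
            "Answer the question using only these passages. "
            "End every sentence with the ID of the passage it came from, e.g. [p1].\n"
            f"Question: {question}")

def keep_only_cited(answer: str, passages: dict[str, str]) -> str:
    kept = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        cited = re.findall(r"\[(p\d+)\]", sentence)
        # Post-processing rule from the talk: no valid citation, no inclusion.
        if cited and all(pid in passages for pid in cited):
            kept.append(sentence)
    return " ".join(kept)

passages = {"p1": "Returns are free within 30 days.", "p2": "Refunds take 3-5 business days."}
raw_answer = ("You can return items for free within 30 days [p1]. "
              "Refunds usually arrive in 3-5 business days [p2]. "
              "Gift cards are never refundable.")
print(keep_only_cited(raw_answer, passages))  # the uncited gift-card claim is dropped
```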

Let's talk about model selection. Each customer has their own needs and unique constraints that may require using different LLMs. As long as a model is able to pass our internal tests for accuracy, we will enable customers to use it to power the Glean assistant for their employees. Bedrock is awesome for this because it's easy to select from its large repository of models and pick the one that works best for each customer.

And finally, on the topic of how you make sure, as an enterprise, that your data is safe and secure: Bedrock is great because of its compliance certifications and support for end-to-end encryption. It makes it easy for our customers to feel confident that their data is secure and not being used for other purposes. In addition, each Glean customer gets their own dedicated AWS project running within their own secure environment, and none of your company data leaves that environment - including the customized models that we train using SageMaker inside that project.

So as our customer, you get to use the latest search and AI technologies while making sure that all of your data resides within your own premises, within your own VPC. And finally, the way it works is that we connect with hundreds of different applications and make sure that, as users ask questions, the answers they get back are limited to the knowledge they have access to.

This is what you're seeing in action here. The user came in and asked the question, "How do I set up Glean on AWS?" The system does a search using our core search engine and assembles the right pieces of information and knowledge. It then takes all of that knowledge and gives it to an LLM powered by Bedrock to synthesize an answer and response for the end user. When the answer comes back, we show citations to the user for where the information came from. So this is how it works.

And we're really excited to be partnering with AWS. These are the first steps of our journey with AWS, and we're so excited to be here and bring the power of Glean to more companies worldwide. Our entire team is excited to explore more services in the future, like SageMaker Clarify, Trainium, and Inferentia. And if you want to learn more about Glean, you can visit our booth in the exhibit hall or see our website at glean.com.

BRATIN: Thank you so much, Arvind. We are really looking forward to our partnership with Glean, to take Glean to a lot more AWS customers.

Let me talk next about the fourth consideration for building and scaling your generative AI applications, and that is having access to a low-cost and highly performant machine learning infrastructure. Our hardware infrastructure starts with GPU instances, where we have the G5 instances that provide you the fastest inference and the P5 instances that provide you the fastest training. In addition, we also have custom accelerators for generative AI: AWS Inferentia for inference and AWS Trainium for training generative AI models. And in fact, these custom accelerators provide you up to 50% better cost performance.

Now, at AWS, hardware infrastructure is just part of the story. We complement our hardware instances with our software infrastructure, and that is where SageMaker provides you a fully managed, end-to-end machine learning service that you can use to build, train, tune, and deploy all kinds of models - generative models, classical models, and deep learning models. And now SageMaker has a number of purpose-built capabilities to help with generative AI.

So earlier today we launched SageMaker HyperPod. SageMaker HyperPod accelerates your generative AI model training by almost 40% due to its optimized distributed training libraries. It also provides you automatic self-healing clusters. It's obvious why better performance matters - customers get to train their models faster. But why do we need to provide self-healing clusters? Let me illustrate with an example. Before generative AI, customers would use small-scale clusters - maybe eight or sixteen nodes - and would train their models for a few days. At that small scale, the probability of faults is negligible.

Now, when you get to generative AI, customers use tens of thousands of nodes and they're training for months. At that scale, fault tolerance is critical because the probability of faults is very high. And in fact, if your software infrastructure is not resilient, it's going to be very hard to train your models because it will become a start-and-stop exercise. And therefore we are now providing self-healing clusters.

Let me illustrate how they work. As a user, when you use SageMaker HyperPod, the first thing that happens is that your model and data get distributed to all the instances in the cluster. This makes sure the training can happen in parallel so it can get done quickly. Once that happens, SageMaker also automatically checkpoints your application - it saves the state of your training job at regular intervals. At the same time, SageMaker monitors all of the instances in the cluster, and if it finds an unhealthy instance, it removes it from the cluster, replaces it with a healthy instance, and resumes the training job from the last safe checkpoint, running it to completion - all of this without the user having to worry about resiliency or fault tolerance.
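HyperPod automates this for you, but the underlying checkpoint-and-resume pattern looks roughly like the sketch below; the local file path, dummy objective, and fault handling are simplified stand-ins, not HyperPod's actual implementation.

```python
import os
import torch
import torch.nn as nn

CKPT_PATH = "/tmp/train_state.pt"   # HyperPod manages checkpoint storage; a local file stands in here

model = nn.Linear(128, 128)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
start_step = 0

# Resume from the last safe checkpoint if one exists (e.g. after a node was replaced).
if os.path.exists(CKPT_PATH):
    state = torch.load(CKPT_PATH)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_step = state["step"] + 1

for step in range(start_step, 1000):
    optimizer.zero_grad()
    loss = model(torch.randn(32, 128)).pow(2).mean()   # dummy objective
    loss.backward()
    optimizer.step()

    # Checkpoint at regular intervals so a failure only loses work since the last save.
    if step % 100 == 0:
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "step": step}, CKPT_PATH)
```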

I'm also pleased to announce that SageMaker is now launching a number of optimizations to make inference more efficient, reducing the cost of large language model inference by almost 50% and reducing latency by almost 20%.

Here is how it works. Today, when customers deploy foundation models for inference, they deploy models on a single instance, and what happens is that the instance is often underutilized, which increases the cost for the customer. What SageMaker allows now is that you can place multiple different foundation models onto the same instance, and you can control the resources that you allocate to each foundation model - you can even auto-scale on a per-model basis. Not just that, it also does intelligent routing: it looks at the load of the different instances and directs incoming requests to the instance that is the most lightly loaded. And as a result, it can reduce inference latency by 20%.
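This capability corresponds to what SageMaker exposes as inference components: several models packed onto one endpoint, each with its own resource reservation and copy count. A hedged boto3 sketch follows; the endpoint, model, and resource numbers are placeholders.

```python
import boto3

sm = boto3.client("sagemaker")

# Pack a second foundation model onto an existing endpoint as its own inference component,
# reserving a slice of the instance's accelerators and memory just for this model.
sm.create_inference_component(
    InferenceComponentName="summarizer-component",   # placeholder name
    EndpointName="shared-llm-endpoint",              # existing endpoint (placeholder)
    VariantName="AllTraffic",
    Specification={
        "ModelName": "summarizer-model",             # a model already registered in SageMaker
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 8192,
        },
    },
    RuntimeConfig={"CopyCount": 2},                  # scale this model independently of the others
)
```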

It's optimizations like these that make SageMaker the best place to build, train, tune, and deploy foundation models. To talk more about this, please welcome Dr. Ebtesam Almazrouei, Chief AI Researcher and Executive Director at TII.

EBTESAM: Good afternoon, everyone. Thank you for joining us today. Let's go to the previous slide, please.

One of the most important things in advanced technology is that, when you are developing a technology, you have to think about the sustainable development goals. Advanced technology has improved access to information and communication, facilitated sustainable energy solutions, and, not only this, also transformed agriculture and healthcare and promoted innovation and advanced technology infrastructure, as you can see here.

However, with all of this advanced technology, it's very important to address the digital divide, ethical considerations, and privacy concerns. It's crucial to ensure equitable distribution of technology's benefits for all of us. We believe that openness is the key to harnessing technology's potential while safeguarding human rights and achieving sustainable development for all of us. Open large language models are a step forward toward achieving this goal.

LLMs, or large language models, are forging a golden era of possibilities - from personalizing learning experiences to summarizing massive amounts of documents. Not only this, but these algorithms have proven that they can crack the code of NLP by harnessing language. LLMs are helping us not only to solve our daily-life tasks but also to contribute to the most pressing issues of our time.

That's why, at the Technology Innovation Institute, we invested in building our Falcon LLMs. We started in 2022 by building Noor, one of the largest Arabic NLP models in the world. The power of the cloud made it all possible for us. AWS accelerated compute infrastructure allowed us to process massive amounts of data, train models with billions of parameters and trillions of tokens, and, not only that, significantly reduce the operational overhead. Let me take you through our journey.

We leveraged SageMaker to preprocess petabyte-scale web data and generate approximately 12 terabytes of data, representing about five trillion tokens. To put it in context, five trillion tokens is about 3 million books, each with an average of 400 pages. Can you imagine the amount of data? Then we used this data set to train all our Falcon LLMs - 7B, 40B, and 180 billion parameters - on large-scale, high-performance compute clusters.

We managed to achieve up to 166 teraflops thanks to the optimized AWS infrastructure. To give you a sense of that scale: if a single person solves a math problem every five seconds, they would need about 22,000 years to solve what that cluster can solve in only one second.

Then, going from Falcon 7B to 40B, all the way to the Falcon 180-billion-parameter model, we also needed to scale our compute capacity, and SageMaker was able to seamlessly scale up to 4,000 GPUs. After that, we of course did our model evaluation using SageMaker real-time endpoints, and not only this, but we did our own human evaluations. This rigorous evaluation process is to ensure that Falcon is not just a technological advancement, but also practically effective and ethically sound.

So what we did as a team is we built a simple serverless architecture and leveraged a Slack channel to evaluate all the model answers. Finally, I am glad to let you know that all our Falcon models are now available as part of SageMaker JumpStart, and you can start deploying and fine-tuning them with a single click. Since its launch, Falcon 180B has been the largest and top-performing open-source model in the world on Hugging Face. It has been downloaded over 20 million times, and what that showcases is the strong desire and interest for open-source LLMs.
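For reference, deploying one of the Falcon models through the SageMaker Python SDK's JumpStart interface looks roughly like the sketch below; the model ID and the default deployment settings are assumptions to verify against the current JumpStart catalog.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Falcon 7B from the JumpStart catalog (model ID may differ; check the catalog for current IDs).
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-bf16")

# One call provisions the endpoint with JumpStart's default container and instance type.
predictor = model.deploy()

response = predictor.predict({
    "inputs": "Summarize why open-source foundation models matter for enterprises.",
    "parameters": {"max_new_tokens": 200},
})
print(response)

predictor.delete_endpoint()  # clean up when finished
```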

Now, I want to share some of the best practices that enabled our AI innovation. First, you want to foster visionary thinking at all levels, so we encourage all our researchers to continuously explore new ideas and challenge all assumptions. Second, we also wanted to ensure adequate capacity for our experimentation - it is very crucial to provide access to large-scale compute, not only for the core research but also to empower unconstrained exploration and experimentation. Third, you have to institute rigorous evaluation protocols: we thoroughly benchmark all new methods, testing and validating them. This prevents over-optimistic results and also ensures real-world viability. In summary, embracing visionary thinking, scaled experimentation, rigorous evaluation, and collaboration with vendors like AWS will ensure that you have world-class generative AI innovation. We are committed to continuing to apply these best practices, from a seed of an idea to a garden of opportunities, to deliver groundbreaking innovation. Let's all shape the future of AI.

BRATIN: Thank you, Dr. Almazrouei. It's really amazing work going on at TII on foundation models. Now, SageMaker is also focused on making machine learning accessible to people who may not be experts at machine learning or at coding. That is why, two years back, we launched SageMaker Canvas, a no-code interface for building and deploying your machine learning models. And now, with generative AI, I'm pleased to announce that the SageMaker Canvas no-code interface is also being extended to foundation models, so you can build, customize, train, and tune models all with the no-code interface. And so data analysts, business analysts, finance analysts, and citizen data scientists who may not be proficient at coding or with machine learning can still build with generative AI.

Let me now get to the final key consideration for accelerating your generative AI journey, and that is using generative AI-powered applications. Many customers tell us that they would like AWS to provide generative AI applications for important enterprise workflows: in the contact center, for personalization, for document processing, or even in healthcare.

Earlier this week, we launched the general availability of AWS HealthScribe, which uses generative AI to accelerate clinical productivity. Today, when a patient goes to a physician, that patient-physician interaction has to be documented manually, and doctors can spend almost 40% of their time - 40% of their time - on this manual work. That is time that's not being spent on patient care. AWS HealthScribe uses AI to automatically analyze the patient-physician conversation and then uses generative AI to create a clinical summary that can be uploaded to your electronic health records. And so healthcare software vendors can now use generative AI to enhance clinical productivity. To talk more about this, please welcome Tom Herzog, the Chief Operating Officer at Netsmart.
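As a brief aside before the Netsmart segment: HealthScribe is exposed through the Amazon Transcribe APIs, and kicking off a job on a recorded visit looks roughly like the sketch below. The bucket names, job name, and IAM role ARN are placeholders.

```python
import boto3

transcribe = boto3.client("transcribe")

# Start an AWS HealthScribe job on a recorded patient-physician conversation.
transcribe.start_medical_scribe_job(
    MedicalScribeJobName="visit-2024-01-15-smith",             # placeholder job name
    Media={"MediaFileUri": "s3://my-clinic-audio/visit.wav"},   # placeholder recording
    OutputBucketName="my-clinic-notes",                         # where transcript and summary land
    DataAccessRoleArn="arn:aws:iam::123456789012:role/HealthScribeAccess",  # placeholder role
    Settings={"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
)

# Poll for completion; the results include both a transcript and a clinical note summary.
job = transcribe.get_medical_scribe_job(MedicalScribeJobName="visit-2024-01-15-smith")
print(job["MedicalScribeJob"]["MedicalScribeJobStatus"])
```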

TOM: Thank you. I'm Tom Herzog, grateful for the opportunity to represent the cause and communities that we serve. Because at the end of the day, that's what healthcare is: healthcare is about people helping people. We've been digitizing healthcare for decades now. It's been about more and more data, and the question we're all asking now is: what are we going to do with that data? Whether we're a provider - and all of us are consumers - healthcare is absolutely a universal language. I want to introduce the notion that these tools we're talking about, that we've all now arrived at and are so excited about, are truly about addition through subtraction. See, I believe that less is more. And as we talk about HealthScribe, when we talk about Bedrock, what we're really talking about is how we can be more efficient - fewer tasks, less input - so that caregivers can see more people at the right time, when they need it.

We all know the challenge: demand far outpaces supply, so when we schedule our own appointments, we're limited in the number of options because of the need that's out there. I'm going to talk about that here in a second. Let me get to a very pragmatic idea and solution. Providers spend over 40% of their time - two days a week - in telehealth sessions just doing documentation. That's two days they're unable to see someone. And if you ask them, only 15 to 45% of the information they have and are using is really serving them. They need more contextual awareness - not just for when they're talking to you right then and there, but for things that may have happened weeks ago, months ago, years ago. That contextual awareness, if you will. Let's frame the challenge and how this is impacting us as a society and in our communities.

We know that over 50 million people will be challenged with a mental health illness or crisis in a given year. We know that over 60% of our youth do not receive treatment for things they may be suffering with, like depression or anxiety. And we know that for nearly 25% of adults, their needs go unmet for the treatment they're seeking - or they're not even aware that they need it. This creates an opportunity for us to do something different.

This is the team, this is the cause and the communities that we proudly serve. This is also the team of innovators and designers who are working together to change the healthcare landscape as we know it. We serve over 754,000 providers who are touching over 133 million lives, beyond what we know as traditional acute or primary care medicine. We're talking about community services, public health, intellectual and developmental disability needs, those who may have foster or family care services, long-term care, hospice care. This is a real opportunity for all of us, simply because of the breadth of things that we're doing.

Here's what we need to focus on as a solution: not just usability, not just fewer clicks - we need extreme usability to reduce the burden on providers so that they can accelerate, improve, and optimize the outcomes for the people they are seeing. We have a unique opportunity, using tools and solutions like HealthScribe and Bedrock, to do something simple. Let's give those two days back to caregivers so that they can see more people. Let's streamline discharge so that, as you need to connect with other people, the information is relevant to you right then and right there. And let's transform collaboration as we know it and take manual processes away, so the system can cohesively follow you anywhere, anyhow. Why did we choose these tools?

Quite simply, HealthScribe and Bedrock provide ready-built, purpose-built solutions that we can plug into our systems right now. They're able to scale with us from a performance standpoint, and they have the ability to integrate across the ecosystem very uniquely. And lastly, to bring you back to the notion that we started with: imagine a telehealth session, if you will, where you're not only capturing the information systematically and with a great degree of accuracy, but also using tools within Bedrock to pull that information forward, so that as I am interacting with you, I can look back a week, six months, a year and have the relevant information to suggest the right treatment plan going forward.

And while we often talk about the tools and the technology - and I love it, I'm a geek at heart - what this really takes is all of us working together. Our relationship with AWS isn't just about how we can use these in a systematic way. It's beyond partnership; it's about collaboration, because the things we're talking about in healthcare today aren't about tomorrow - they're happening right here, right now. And we're deeply grateful and appreciative for that partnership. Dr. Saha, I appreciate the time and the opportunity to share our story.

Thank you.
