Customize FMs for generative AI applications with Amazon Bedrock

Good afternoon, everyone. Thank you for joining us today for our presentation on customizing foundation models for generative AI applications with Amazon Bedrock. My name is Kishore. I am a Principal Product Manager on the Amazon Bedrock team. I have with me Anand, who is a Senior Director at NYSE, and Chris, who is a Principal Solutions Architect.

So what is the agenda for today? We are going to introduce you to foundation model customization. We'll discuss why customization is needed. We'll introduce you to fine-tuning and continued pre-training. Then I'll talk about Amazon Bedrock custom models. Then we'll go through customization of Amazon Bedrock models at NYSE. And then we'll have Chris demo fine-tuning and continued pre-training with Amazon Bedrock.

So why customize? I'm sure you all know that foundation models are built using large amounts of public data, and this is generic data. When you build applications on top of them, they generate generic results, but you want results that are specific to your use case and your data. So you customize for specific business cases: if you are in the healthcare industry and you are building a generative AI application, it is important that the foundation model understands the terminology of healthcare to generate its responses. Or you adapt to a domain-specific model: if you are in finance, like my friend Anand here, finance has its own terms and terminology, a completely different domain-specific language, and you want the foundation model to understand that. Or if you are in customer service, you want your responses to be friendly and very specific to whatever questions your customers are asking. And lastly, you may want to improve the context awareness of your responses: when a customer is interacting with a generative AI chatbot and asks a question about a specific thing, you want the gen AI application to generate a response in that specific context.

So now we know why customization is needed. Let's look at the common approaches for customizing foundation models, and we will compare them on time, cost, quality, and complexity.

So first is prompt engineering - this is the easiest approach: you use either zero-shot or few-shot examples, you send them to the foundation model, and it generates a response. It's very easy to build and test prompts, it is very cost effective because you just send the prompt and get the response back, and it has the least complexity.

The next would be retrieval augmented generation. This technique helps you bring in data from outside the model. You take your large amount of data, you create embeddings, and you store them in a vector DB. Whenever you need this information, you search the data, fetch the relevant pieces, and inject them into your prompt. This generates very specific results based on the context you provide from the documents. But it is, of course, a little more costly: now you have to create embeddings, and there is added complexity because you have multiple systems involved.
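As a rough illustration of that flow, here is a minimal sketch in Python with boto3. It assumes the Titan Embeddings model on Bedrock, and uses a tiny in-memory cosine-similarity search standing in for a real vector database; the `docs` corpus and `question` are placeholders.

```python
# Minimal RAG sketch: embed documents, retrieve the closest one, inject it into the prompt.
# Assumes access to Amazon Titan Embeddings in Bedrock; a real system would use a vector DB.
import json
import boto3
import numpy as np

bedrock_runtime = boto3.client("bedrock-runtime")

def embed(text):
    # Titan Embeddings expects {"inputText": ...} and returns {"embedding": [...]}
    resp = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
        accept="application/json",
        contentType="application/json",
    )
    return np.array(json.loads(resp["body"].read())["embedding"])

docs = ["Document about order types...", "Document about trading hours..."]  # placeholder corpus
doc_vectors = [embed(d) for d in docs]

question = "What order types are supported?"
q_vec = embed(question)

# Cosine-similarity search over the tiny in-memory "index".
scores = [q_vec @ v / (np.linalg.norm(q_vec) * np.linalg.norm(v)) for v in doc_vectors]
context = docs[int(np.argmax(scores))]

# Inject the retrieved context into the prompt sent to a text model.
augmented_prompt = f"Use the following context to answer.\nContext: {context}\nQuestion: {question}"
```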

Next would be customization. Now, with RAG the data lives outside the model, and if you want to bring this data inside the model every few weeks or months, you would want to customize, and we will look at the different customization techniques.

And then lastly, you can train a model from scratch. For this, of course, you need infrastructure, you need data scientists, and you need large amounts of data to train the model. So it is the most complex option: you will get the best quality, but it is not cost effective and requires a lot of time.

So now we have seen that the simplest was prompt engineering and the most complex was training from scratch. But let's look at the middle two options: customization versus augmentation.

So we start with the task: what is the task you are trying to achieve? If your task requires external data and up-to-date information, then next you should ask yourself, do I need real-time information? If you don't need real-time information from APIs or databases, you just need information from documents, then you would use augmentation with RAG. And for that, you can use Amazon Bedrock Knowledge Bases.

If you need real-time information, meaning you're going to a database or querying an API, then you would use augmentation with agents and tools. And for that, you can use Amazon Bedrock Agents.

If your task requires consolidated or historical information, then you have to ask whether it is a simple task or not. If the task is very simple and generic, as we discussed earlier, you would use prompt engineering and any model of your choice to get the results.

And lastly, if the task is complex and requires specific behavior, you would use customization and that is supported by Amazon Bedrock custom models.

So what are the types of customization? There are two types - one is fine-tuning and the other is continued pre-training. With fine-tuning, you're trying to achieve task-specific performance, like a style. For example, let's say you want to summarize a document and you want the summary in a specific format. For that, you would fine-tune a model by giving it examples.

And the next one is continued pre-training. If you want the model to learn specific terms and terminology, or large amounts of domain-specific data, you can bring that unlabeled data and train the model using continued pre-training. You can also do a combination of both: first train your model with large amounts of data specific to your organization, and then later fine-tune that same model for a specific task.

So what are the dataset requirements for fine-tuning and continued pre-training? For fine-tuning, you need prompts and completions. You have to show the model, for a given type of input, what kind of output it should generate, and the model generalizes and learns from that. For continued pre-training, you just need raw data: you bring the raw data, which could be all your documents or any other libraries, and train the model using continued pre-training.
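As a concrete illustration, here is a minimal Python sketch that writes both dataset shapes as JSON Lines. The example records are made up; the field names follow the prompt/completion and input-only conventions described above.

```python
import json

# Fine-tuning: labeled prompt/completion pairs, one JSON object per line.
fine_tuning_records = [
    {"prompt": "Summarize: The customer asked about their order status...",
     "completion": "The customer requested an order status update."},
]

# Continued pre-training: raw, unlabeled text, one JSON object per line.
pretraining_records = [
    {"input": "Rule 7.31 defines the order types available on the exchange..."},
]

with open("fine_tuning.jsonl", "w") as f:
    for rec in fine_tuning_records:
        f.write(json.dumps(rec) + "\n")

with open("continued_pretraining.jsonl", "w") as f:
    for rec in pretraining_records:
        f.write(json.dumps(rec) + "\n")
```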

So let me introduce you to Amazon Bedrock custom models. You can maximize the accuracy of foundation models by providing labeled or unlabeled data using the same API or console. You can deploy the customized model using provisioned throughput and use it either through the API or through the playground. And lastly, today we support both first-party and third-party models for fine-tuning, and first-party models for continued pre-training.

So let's look at components of a customization job. And these components are the same for both fine tuning and continued pre-training for Amazon Bedrock.

So what are the inputs? First, you select the base model which you want to customize. Then you provide the hyperparameters. Now, depending on the model provider, some may give you default hyperparameters and some may give you the option of hyperparameter optimization. It varies from model provider to model provider, and we support both on Amazon Bedrock.

And then the input data - you saw the formats for fine-tuning and continued pre-training. You just need to provide that input data in JSONL format.

Next is the output. You have to point to an S3 bucket, and a folder in that bucket, where we generate metrics like training and validation loss and store those metrics and any other logs as log files. You also have to provide the output model name, which will be used to name the model once the job is completed.
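Putting those inputs and outputs together, here is a hedged sketch of submitting a customization job with boto3. The bucket paths, role ARN, base model identifier, and hyperparameter keys are placeholders; the exact hyperparameter keys accepted vary by model provider.

```python
import boto3

bedrock = boto3.client("bedrock")  # Bedrock control plane

bedrock.create_model_customization_job(
    jobName="llama2-summarization-ft",                   # placeholder job name
    customModelName="llama2-13b-summarizer",             # output model name
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",  # placeholder role
    baseModelIdentifier="meta.llama2-13b-v1",             # example base model ID; check the console
    customizationType="FINE_TUNING",                      # or "CONTINUED_PRE_TRAINING"
    trainingDataConfig={"s3Uri": "s3://my-bucket/train/fine_tuning.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},  # metrics and logs land here
    hyperParameters={"epochCount": "2", "batchSize": "1", "learningRate": "0.00005"},  # assumed keys
)
```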

Next is the storage. Now, the custom model which you generated is stored securely within Amazon Bedrock and you can deploy it right within Amazon Bedrock using provisioned throughput.

And lastly, inference - as I said earlier, once you deploy the model using provisioned throughput, you can use it through the InvokeModel API like any other base model, or through the playground.
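A hedged sketch of those last two steps, deploying with provisioned throughput and then invoking the custom model. The request body shown here is the Meta Llama format and would differ for other model families; names and ARNs are placeholders.

```python
import json
import boto3

bedrock = boto3.client("bedrock")
bedrock_runtime = boto3.client("bedrock-runtime")

# Deploy the custom model behind provisioned throughput.
pt = bedrock.create_provisioned_model_throughput(
    provisionedModelName="llama2-13b-summarizer-pt",  # placeholder
    modelId="arn:aws:bedrock:us-east-1:123456789012:custom-model/...",  # custom model ARN
    modelUnits=1,
)
provisioned_model_arn = pt["provisionedModelArn"]

# Invoke it exactly like a base model, just with the provisioned model ARN.
response = bedrock_runtime.invoke_model(
    modelId=provisioned_model_arn,
    body=json.dumps({"prompt": "Summarize: ...", "max_gen_len": 256, "temperature": 0.5}),
    accept="application/json",
    contentType="application/json",
)
print(json.loads(response["body"].read()))
```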

And lastly, let's look at security and privacy. Data used to improve your models is not shared with any model providers, whether first party or third party. Customer data remains within the AWS Region. We support both PrivateLink and VPC configurations for security, and integration with IAM roles. API monitoring is available through CloudTrail and CloudWatch, and custom models are encrypted either using the service key or with your own customer managed key.

Now, I would like to invite Anand on the stage to talk about continued innovation and generative AI implementation at NYSE. Thank you.

And at the end, we have NYSE's unique visibility platform to share innovation with a broad global audience, and then gen AI.

So before gen AI, once we finished the migration of our regulatory surveillance platform to the AWS cloud and our day-to-day operational challenges were behind us, Susan and I were brainstorming, and I said, let's invest in machine learning. At that time, LLMs were not the talk of the town.

So Suresh, who is a machine learning expert, joined our team, and we started building a team around machine learning. The whole idea was: how do we find patterns in a time series using ML models?

So by the time gen AI became the flavor of the season, we were ready: our team was ready, our pipeline was ready. We are building many new innovative things and are actively investing time in gen AI, and because we are a highly regulated exchange, we try to understand each and every aspect of the pipeline. We understand what it takes to have a platform running using gen AI.

So, as I mentioned, we are doing quite a lot with gen AI, but let me take you through one of the use cases that we built using Bedrock.

So being a lawyer's son, I grew up around legal terminology like pro bono, de novo, sine die. Understanding all this legal jargon can be like deciphering a secret code. Back in 2007, when I joined NYSE, I couldn't understand any of the trading rules just by reading the legal documents.

So I used to look for team members who would simplify the explanation and give me something that I can understand. Depending on the workload and mood of the individual, I would get a varied response. At that time, I truly understood the meaning of subject matter experts.

So when we were brainstorming about what we could do, this use case came into the picture: why don't we take all these legal documents, the trading rule documents that we have for all US exchanges, not limited to NYSE, and build an intelligent trading rules chatbot?

So we took about 20,000 pages of documents and we created this chatbot. What was the idea? The idea is: can we increase the traceability from rules to requirements to test cases? Can we get a simple explanation of anything we ask? Can we mitigate the risk around our compliance requirements for trading rules? And can we optimize our tech talent utilization by not relying on domain expertise but relying strictly on technical talent?

And also, as I said, we always try to stay ahead of the curve. The rules are changing every now and then, so this way we can stay current. And this also becomes a reference framework for other use cases that we have at ICE.

So if you look at the example I have here, we are asking this chatbot, can you explain Q orders to me? It gives me an explanation that I'm able to understand. But the main strength of this chatbot is that it is able to understand the context, right?

So we ask a follow-up question: is there an equivalent of a Q order in other exchanges? It gives me an order type in other exchanges which has the same underlying criteria without having the same name. So the model is able to understand the behavior of that order by going through the documents and give us that insight. I find it quite fascinating to be able to build this with ease.

So talking about the architecture here, before I talk about Bedrock, let me tell you that when we started building this pipeline, Bedrock was not ready at that time. We spent a lot of time, starting from taking a VM and doing fine-tuning manually, to trying everything in SageMaker, having different kinds of vector databases, and things like that. But when we were working with the AWS team, Kishore came and said, you know, you can do all these things in Bedrock.

So why don't you try it? And in a matter of days, we were able to put together this pipeline, where there is a chatbot running, we have embeddings running, our documents are on Amazon S3, and we are using a vector database for storing our embeddings. The question and context go to the foundation model, and then we get the response, right?

I would like to call out Redis here because they have been a great partner in solving this problem, and this is probably not the only project where we will be working closely with them. They provide a great vector database solution.

So just to talk about the customization, we tried different kinds of models: Anthropic Claude, Titan with and without fine-tuning, Llama 2, Jurassic-2, to name a few. And we tried every possible combination: try fine-tuning with RAG, why don't we do some prompting on top of that and try it with RAG, can we take a foundation model as-is, like Claude, and see what it does? And we were able to do all these things very easily in Bedrock, so that was a game changer for me. I keep telling our team members that what we were able to put together here within a few days is quite amazing.

The other initiatives that we are actively working on at ICE: the first thing, actually, that we started for gen AI was news sentiment analysis and correlation. We have a news feed where we get unstructured data, we feed it into one gen AI model which summarizes it, then we feed the data into another model which predicts whether the stock price will go up, go down, or stay neutral.

Then we are also building network traffic anomaly detection, data quality checks, email parsing for bond pricing, converting complex OTC structures, fraud detection, and mortgage life cycle use cases.

Lastly, I would like to reiterate the program we launched, NYSE Launchpad. It is there to help today's entrepreneurs, and we have vast technical expertise that you can leverage. It is a secure cloud-based platform where products and applications can be evaluated to receive actionable insights.

And last but not least, we have NYSE's unique visibility platform. With that, thank you, and I'm here after the session for any other questions that you have. [Chris] OK, who here wants to see some code? Everyone? All right, thank you, that's my job. All of this code, by the way, is available at this GitHub repo.

And I'm gonna do two demos today. I've got the notebooks already run just for time's sake, but these are real notebooks that you can run in your own environments. This one is gonna fine-tune Llama 2. We're gonna use a chat dialogue, well, actually not chat, but more of a summarization dataset: conversations, and trying to summarize those conversations.

So we're gonna fine-tune with some of that data. We're gonna use Bedrock with Llama 2. This is the 13 billion parameter version, but you can use the 70 billion as well. And the second demo will be continued pre-training, where we're gonna feed in PDFs of, you know, large books and documents and see how it changes the model.

So in both cases, I'll show how to set up, and I'll show how to test the base model before the fine-tuning or the continued pre-training, also called customization. We'll then prepare the dataset for the customization and upload it to S3.

We'll actually run the job, provision that custom model as an endpoint that you can hit with the API, and then test the customized model. And then, for cost savings, we always recommend deleting these endpoints when you're not using them, OK?
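For that cleanup step, a one-call sketch, assuming the provisioned model name or ARN from the earlier provisioning step:

```python
import boto3

bedrock = boto3.client("bedrock")

# Delete the provisioned throughput when you are done to stop incurring charges.
bedrock.delete_provisioned_model_throughput(
    provisionedModelId="llama2-13b-summarizer-pt"  # name or ARN of the provisioned model (placeholder)
)
```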

This is how we'll switch to continued pre-training in the next demo, OK? We can specify hyperparameters. Just like Kishore said, start with some academic papers or other people's work to figure out where to start. Or, well, start with the defaults that the model provider has specified, and then you can tune from there based on research that you find from similar domains. OK?

And then this is what the model looks like in the console. I think by now everything's been released, because it's Thursday. And here's the actual training job name, you can click into it. Here's the source model, so that was the Llama 2 13B, and here are our hyperparameters, the input location, and the output data. OK.

And so now that the model's trained, yes, I did that offline before walking in here, you want to provision that custom model to sit behind the Bedrock API. From now on, whenever you invoke the API, you specify the custom model name, which I put into a variable up here somewhere.

And then this is what's called provisioned throughput. This is where you purchase model units; you have to look and see what a model unit is and get down to that level, but once this endpoint has been provisioned, you can scale it up as you need. And let's see what else is here: ways to tag it, and then here's the actual model that the endpoint is serving. OK.

Now, because it's a Llama 2 model, you have to know the proper format to pass in to Llama 2, right? There are some tricks, conventions that these models follow. Again, find the model card and figure out what needs to be passed in.

So here's that same example, and we're showing the output here. This is the base model: here I'm using Llama 2 13B Chat, which is essentially the chat variant of Llama 2. It's pretty chatty, like I said. And now let's use the fine-tuned version. Here I passed the provisioned model ARN, which came after I provisioned the endpoint, and now I get a nice, crisp, clean summary. So this is the example where we're trying to change the style, change the actual output. This was a good use for fine-tuning, OK?
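A rough sketch of that comparison, using the Llama 2 [INST] instruction convention for the chat variant and sending the same body to the provisioned fine-tuned model; the dialogue text and ARN are placeholders.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

provisioned_model_arn = "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/..."  # placeholder

dialogue = "Customer: My package never arrived.\nAgent: I'm sorry, let me check on that..."  # placeholder
# Llama 2 chat models expect the [INST] ... [/INST] instruction wrapper.
body = json.dumps({
    "prompt": f"[INST] Summarize the following conversation:\n{dialogue} [/INST]",
    "max_gen_len": 256,
    "temperature": 0.5,
})

# Base chat model: tends to be verbose ("chatty").
base = bedrock_runtime.invoke_model(
    modelId="meta.llama2-13b-chat-v1", body=body,
    accept="application/json", contentType="application/json",
)
print(json.loads(base["body"].read())["generation"])

# Fine-tuned model, addressed via the provisioned model ARN: shorter, in the trained style.
tuned = bedrock_runtime.invoke_model(
    modelId=provisioned_model_arn, body=body,
    accept="application/json", contentType="application/json",
)
print(json.loads(tuned["body"].read())["generation"])
```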

Yes, I should make it clear, just like Kishore said and like Anand said: start with RAG, which we didn't cover. Since you're all generative AI folks, you've probably seen a bunch of RAG talks already, and there are a lot of examples. This would really be stage two, where you want to change the actual style of the model responses, OK?

And then delete the endpoint. All right, so let's switch gears to continued pre-training. This is a fun one. So continued pre-training, just to back up a sec: this is where I can just pass in unstructured data. I could just give it a PDF, and it will continue what's called pre-training. These foundation models have been pre-trained by Meta, by Cohere, by Anthropic, by AI21. And we might want to introduce new terms and shift those distributions, right, the weight distributions, shift those probabilities so the model learns your set of documents, OK?

And the hardest part was trying to find something that Titan, sorry, I'm actually using Titan for the continued pre-training, hasn't seen already, right? Especially because most of the data we're getting is from public datasets. Fortunately, I found something. And then we're gonna follow the same structure: set up, test the base model before continued pre-training, upload the data, and then provision.

So I'm actually gonna use LangChain, which some of you might be familiar with. I think it's a pretty standard library by now to coordinate your generative AI applications. They also have good libraries to convert PDFs into text and chunk it up, and that's what we'll actually feed in as JSON Lines into the continued pre-training. I'll show you how that looks here in a sec: doing some pandas magic, getting the control plane ready for me to invoke the job, and then the data plane ready to do the model inference. Here we are going to use Titan Express; I think there's also Titan Lite.

So yeah, like I said, it was hard to find data, and then this book just came out. I know Titan hasn't seen it because it hasn't been released yet; it's gonna start shipping, I think, next week. It's a 300-page book, authored by me and some of my coworkers, and we actually convert it into JSON Lines. And just to back up a sec: I'm gonna ask the base Titan Express model, which has not been customized, to please describe this book.

Now, of course, since the model's never seen this data, it starts to kind of pretend like it has. So here we see some hallucinations. This is Titan trying to be friendly, trying to come up with something that could have the title Generative AI on AWS. I know, because I was one of the authors, that this isn't really what the book is about. In fact, I don't even see Bedrock being called out; I see SageMaker being called out, and some of the other services like Rekognition and Transcribe. So that's without customization.

And so now I want to introduce my own documents, which is this book, and I'm going to convert it. So here's the actual PDF. I'm gonna use some of the LangChain libraries that I use for my RAG code; I can reuse them to split up this document into chunks. It roughly does about one page per chunk: I figured out a way so that the chunk size and the overlap give me about one line item per page, which just happened to work out that way and seems to work well.
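A hedged sketch of that chunking step with LangChain; the loader and splitter classes are the commonly used ones, and the file path, chunk size, and overlap are illustrative values rather than the exact ones used in the demo.

```python
# Split the book PDF into roughly page-sized chunks of plain text.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader("generative-ai-on-aws.pdf")  # placeholder path
pages = loader.load()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,    # illustrative: tuned so each chunk is roughly one page
    chunk_overlap=100,
)
chunks = [doc.page_content for doc in splitter.split_documents(pages)]
print(len(chunks), "chunks")
```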

So this is just boilerplate code that converts this into the JSON Lines format. If you remember, on Kishore's slide he showed that for continued pre-training you pass just the input. So there's no completion here, right? There's no output, because I'm not fine-tuning; I'm just giving it raw data, OK?
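And a sketch of that boilerplate, turning the chunks from the previous sketch into the input-only JSON Lines file with pandas; the `input` field name follows the continued pre-training format shown earlier.

```python
import pandas as pd

# One {"input": ...} record per chunk; no completion, since this is raw pre-training data.
df = pd.DataFrame({"input": chunks})
df.to_json("book_continued_pretraining.jsonl", orient="records", lines=True)
```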

So about 305 pages. We're going to upload the data, pretty straightforward. Now, here's where we launch the job with continued pre-training, not fine-tuning: slightly different hyperparameters for continued pre-training. This is something to keep an eye on if you haven't done a lot of continued pre-training: you typically want to increase the epochs and also reduce the learning rate a bit. I actually had to work with some of the Amazon scientists to find a good balance here, so it's always good to have an Amazon scientist on the team.
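A hedged sketch of launching that job; the hyperparameter keys and values are assumptions meant to illustrate "more epochs, lower learning rate," not tuned numbers, and the ARNs and bucket paths are placeholders.

```python
import boto3

bedrock = boto3.client("bedrock")

bedrock.create_model_customization_job(
    jobName="titan-express-book-cpt",
    customModelName="titan-express-genai-book",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",  # placeholder
    baseModelIdentifier="amazon.titan-text-express-v1",                 # example base model ID
    customizationType="CONTINUED_PRE_TRAINING",
    trainingDataConfig={"s3Uri": "s3://my-bucket/cpt/book_continued_pretraining.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/cpt-output/"},
    # Assumed keys: more epochs and a smaller learning rate than the fine-tuning defaults.
    hyperParameters={"epochCount": "10", "batchSize": "1", "learningRate": "0.00001"},
)
```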

So let's take a look at this model. This is after it's been trained, based on Titan Express. And if I keep going, here's the provisioning: I call the Bedrock control plane and say create the provisioned model throughput. That's a long API way of saying create a specific endpoint for this customized model, and now I can invoke it.

And so let's see what it gives. Oh, and then here's what's called provisioned throughput; this is where you would select the model units, like we did with the fine-tuned model. So all of this is now the same, OK? It's the same input here, I say describe the book, and now let's see what it returns.
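A sketch of that invocation against the provisioned custom Titan model; the request and response fields follow the Titan text format, and the ARN is a placeholder.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

provisioned_model_arn = "arn:aws:bedrock:us-east-1:123456789012:provisioned-model/..."  # placeholder

response = bedrock_runtime.invoke_model(
    modelId=provisioned_model_arn,
    body=json.dumps({
        "inputText": "Describe the book Generative AI on AWS.",
        "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.5},
    }),
    accept="application/json",
    contentType="application/json",
)
print(json.loads(response["body"].read())["results"][0]["outputText"])
```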

So we see... oh wait, sorry, yes, I made this mistake while I was going through it in my head too: that was the same as before. Now this is the actual custom pre-trained model. Here I pass the ARN, which is the unique ID for that model, for the provisioned endpoint. And we see now there's info here on Bedrock. Also, the way we broke the book down shows up: data preparation sort of throughout the life cycle, right, starting with your data, getting to the generative model, optimizations, the evaluation. And so this is much closer to what we actually wrote, OK?

So what we've done is we've essentially shifted the token probabilities, right? When I'm trying to predict the next token, I've shifted those probabilities to now predict based on the additional information that we've given to the model, OK?

And there's all the code. That's the end of the demo. I think we're gonna take questions offline, is that right? OK.

