Use RAG to improve responses in generative AI applications

All right, I think it's time to get started. Thank you everyone for attending our talk on how to use retrieval augmented generation to improve your responses in generative AI applications.

So hopefully, that's what you're here for. And if not, then you're stuck with us.

So, let's get started. We've got an action-packed agenda today where we're going to cover a variety of topics, including customizing foundation models: why you should think about customization and common approaches for how you should customize.

Then we'll go specifically into retrieval augmented generation, or RAG, and we'll deep dive into how it works and cover all the different components, from the data ingestion workflows to how embeddings work, and demystify a lot of the concepts that you might have heard throughout this conference.

We'll introduce Knowledge Bases for Amazon Bedrock, which I hope you heard about in the keynotes from Adam and Swami.

And then we'll talk about how we're making the building of RAG applications really, really easy. Lastly, we'll cover how these capabilities of knowledge bases work with other parts of the Bedrock ecosystem, such as Agents for Amazon Bedrock.

And also how you can leverage open source generative AI frameworks such as LangChain for building retrieval augmented generation capabilities.

And I forgot to introduce myself. I'm Ruhaab Markas; I lead product management for knowledge bases within Amazon Lex. And I'm joined by Mani Khanuja, a technical lead for generative AI specialists in the worldwide specialist organization.

And today, I'll be co-presenting with Ruhaab and taking you on this journey of how you can build your retrieval augmented generation applications using Knowledge Bases for Amazon Bedrock.

So, a quick show of hands: who has heard of Knowledge Bases for Bedrock, either through the keynote or through our previews? OK, fantastic. Quite a few of you have, and it looks like a few of you haven't. So we'll definitely have a variety of content for everyone here today.

All right. So first, I want to talk about why you should think about customizing a foundation model. Foundation models have a vast amount of pre-trained knowledge directly encoded into the model, right? If you've heard the term GPT, that's really what the P stands for, right, that pre-trained knowledge.

But it's important to understand that, in many ways, these models don't really know a lot about your specific company, right? And so the first reason you might want to customize a foundation model is to adapt it to domain-specific language. Let's say you're in health care and you need the model to understand all the medical devices that you sell, right?

That doesn't come out of the box in most instances. Secondly, you might want these models to perform better at unique tasks suited to your company. Let's say you're a financial services company and you're looking to do more advanced accounting or analysis on earnings reports, and you want to teach these models those tasks and really specialize them on a company-specific data set or task.

And lastly, you may want to think about customization when you want to improve the context and awareness of these models with your external company data.

So how do you bring company repositories such as FAQs, policies, and other documents that exist in your company and pass that as context into a foundation model? Those are a few reasons why you may want to think about customization, and now we'll cover some common approaches for how you can customize a foundation model.

And we'll talk about a few common approaches. These aren't exhaustive and there are other approaches, but these grow incrementally in terms of the complexity, the cost, and the time it takes to implement them.

The simplest approach for customizing a foundation model is prompt engineering, and a prompt is simply the user input that you pass to a foundation model, right? These prompts can be crafted and iterated upon to really steer the model in the right direction and get the right output from the foundation model.

And there's a variety of different approaches that you can leverage for prompt engineering: prompt priming, prompt weighting, or even chaining different prompts.

So prompt priming is really the most basic form of prompt engineering, which is just taking an input or a set of instructions and passing that to a foundation model. Sometimes you can even pass specific examples or tasks to the foundation model through the prompt, and that's known as in-context learning.

Another approach, as I mentioned, is prompt weighting, which is giving more emphasis to certain elements of the prompt that you want the foundation model to really focus on, right?

So if you tell the model, you know, definitely don't respond to something that you don't know about, right? Capitalizing that and putting 5,000 exclamation marks, those things actually do bias and put emphasis on certain parts of your instructions.

And lastly, there's prompt chaining, which is taking more complex prompts and breaking them down into more discrete parts, where the outputs of one prompt are then passed as input into the next task.

So those are just a few examples of prompt engineering.
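To make those ideas concrete, here is a minimal sketch of in-context learning (few-shot prompting) and prompt chaining against the Bedrock InvokeModel API. The model ID, the `complete()` helper, and the example prompts are illustrative assumptions, not something from the talk:

```python
import json
import boto3

# Assumes AWS credentials and a region with Bedrock access are already configured.
bedrock_runtime = boto3.client("bedrock-runtime")

def complete(prompt: str) -> str:
    """Call an Anthropic Claude text-completion model on Bedrock (legacy prompt format)."""
    body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 300,
    })
    resp = bedrock_runtime.invoke_model(modelId="anthropic.claude-instant-v1", body=body)
    return json.loads(resp["body"].read())["completion"]

# In-context learning: a few labeled examples in the prompt steer the model toward the task.
few_shot = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: 'Battery lasts two days.' -> positive\n"
    "Review: 'Screen cracked in a week.' -> negative\n"
    "Review: 'Setup took five minutes and it just works.' ->"
)
label = complete(few_shot)

# Prompt chaining: the output of one prompt becomes the input to the next.
summary = complete("Summarize this support ticket in one sentence: ...")
reply = complete(f"Write a polite reply to the customer based on this summary: {summary}")
```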

Secondly, there is retrieval augmented generation, which is all about leveraging external knowledge sources to improve the quality and accuracy of responses. And when I use the term external knowledge sources, it's likely that these knowledge sources are actually internal to your company, but they're external in terms of the knowledge of the pre-trained model, right? You're really helping bring new knowledge to the foundation models, hence the term external; it's external to the pre-trained foundation model.

And we'll really deep dive into retrieval augmented generation throughout the presentation, but the basic steps are: you're retrieving some form of text from a corpus of documents, and you're using that as context to a foundation model to ultimately generate a response that's grounded in the knowledge of your company's data, right? Which is extremely powerful when you think about using the advanced reasoning capabilities of foundation models while really steering them towards the knowledge from your company data.

So these two forms of customization are really about augmenting a foundation model; we're not actually going in and changing anything in the foundation model itself. But there are approaches that allow you to do that, such as model fine-tuning.

Model fine-tuning allows you to really adapt a foundation model to a specialized, task-specific data set. And this is a supervised approach, meaning that you're training the foundation model on labeled examples of tasks where you've specified the expected output and outcome, allowing you to really train this model on a specialized task.

And through fine-tuning, you're actually updating the weights of the model, right? The parameters of the model are actually being adjusted based on this customization.

And lastly, and arguably the most costly and time-intensive approach, is training the foundation model from scratch, an approach for when you really want to be in complete control of the data that's used to train the model. You may want to remove an inherent bias that might exist in some of the other content the model would be trained on, and it gives you complete control in building a domain-specific model.

But obviously, this requires an extensive amount of task-specific data and a lot of compute resources, making it one of the more complex, time-intensive, and costly approaches.

And so, while we talked about a few approaches to customization, again, this is not exhaustive, but these are some of the more common approaches that you'll see in model customization. Today we'll focus specifically on RAG.

So now that we know why you should customize and common approaches for how to customize, let's look at a mental model for when you should use certain methodologies, so it's easier to make a decision about what approach to take.

It all starts by thinking about the task that you want these foundation models to execute. Does this task require context from external data? That's kind of the first decision point.

And if the answer is yes, you really then have to think about whether that data access is needed in real time. If the data is relatively static, where it's not changing on a real-time basis, such as frequently asked questions or policies, that's a classic use case for retrieval augmented generation.

But just because I use the term relatively static doesn't mean that this data isn't changing, right? I don't want that to be misleading, because you can have data changing in this construct as well, but it's not changing in real time.

However, if the data is changing in real time and you also need the ability to connect to different tools, meaning that I'm fetching or querying data from databases or I'm interacting with APIs or applications and tools, that's a use case for Agents for Amazon Bedrock, and we'll also cover how agents and knowledge bases can be brought together to build a really powerful capability.

So these aren't mutually exclusive by any means. Next, if you have a use case that's leveraging historical data, like a snapshot of information, and it's a relatively simple task that might already perform really well with a pre-trained foundation model, that's where prompt engineering can really help, right? I'm passing some specific context or task or instructions as part of my prompt engineering.

And that many times can be extremely effective. And lastly, if I have historical data that's maybe a bit more complex, as I mentioned, that is task-specific and needs a little bit more task training, that's where model fine-tuning serves a really important purpose.

And you might have heard today in the keynote that we've announced the ability to fine-tune all the Bedrock foundation models, with support for fine-tuning Anthropic Claude coming soon.

And so let's deep dive into RAG and really understand what retrieval augmented generation is.

So far, what Ruhaab has covered is why we need to customize, and he provided us some really good prescriptive guidance on when to customize and how you can work backwards from your particular use case. But now let's understand what retrieval augmented generation is.

As the name suggests, the first part is retrieval, where you have to retrieve the relevant information, then augment it with your original query and pass that to the foundation model to generate an accurate response.

Now, there are so many aspects to it. Just imagine if you have this large amount of information, right? And then you say, OK, let me just add everything to the model. Everything. What will happen? Multiple things might happen.

First of all, the input size that the model can take, which we call the context length, might not be enough and you might get errors.

Second, just imagine if somebody throws a lot of information at us as human beings; we will also be like, oh, let me pick out the relevant piece to answer the question. How do I do that? It takes us time, right? The same goes for the model.

So what we need to do is provide relevant information and that's where the retrieval part becomes super important.

So we retrieve the relevant information from our large knowledge data and then provide that relevant context to the model, right? And we augment that relevant context with our original query so that the model knows the question as well.

And then we feed that to the model, and that helps the model generate responses. Prompt engineering also plays a very important role over here, because we might want to add more instructions for the model based on our use case.

So let's take a look at the use cases for retrieval augmented generation. The first use case that comes to mind when we think about RAG is really improving content quality.

How are we improving this content quality using RAG? By reducing hallucinations. For example, as Ruhaab mentioned, when we are talking about pre-trained models, they're really big and they're trained on a really large amount of data.

But that data was from some point in time, right? It might not be very recent. So that's number one: the model doesn't have the recent data, and the model can act super intelligent and provide you with an incorrect answer if you ask about recent information it was not trained on.

So in order to improve the quality of the answers or the responses and reduce hallucinations, that's where we can use the retrieval augmented generation technique. Now, we have covered that part, but what if I want the model to answer only based on my knowledge, on my enterprise data? I don't want it to provide me answers from its own knowledge; I want to use the intelligence of this model and channel it, make it focus only on my data. That's where applications such as context-based chatbots and Q&A come into play, right?

So you can use the RAG technique to build those applications. The third one is personalized search. Why limit this to question answering? Why not use the same technique, since we are anyway retrieving the relevant content, to maybe augment our recommendation engine and create that type of application based on my profile, the persona I might have, my preferences, or my previous history?

For example, if I'm on the retail side and I bought certain products, there is a history which is already there. What if I want to use that, along with my preferences, to show recommendations to my users? You can do that using the RAG technique as well. And the last one is super close to me, just because of the way it works.

So I wrote a book on applied machine learning and high performance computing; it was published in December 2022. At that time, generative AI was also getting popular, so somebody posted a review using generative AI, trying to summarize the book, which was approximately 400 pages. Just imagine doing that. It was a really cool thing, and I really liked it by the way, but it was missing the key points.

So how about using RAG techniques to do text summarization as well? Or maybe I just want a summary of a particular chapter that I'm interested in, right? And make sure that it has all the key points. We can use RAG techniques to do that as well. Now that we have talked about the use cases, how do we deal with different types of data? You might be dealing with different types of data sets, and different types of retrieval can also happen, right?

So what retrieval technique should I use? The first one can be simply based on certain rules, keywords, or phrases: I fetch the documents, it works for me, yes, let's use it, right? We always have to work backwards from our use case and the data that we have in hand.

The second one is that I might have a lot of structured data. Imagine a use case, and this is actually something that we have already built with some of our customers, where there is a natural language question, but I have my data in, let's say, an analytical database or a data warehouse or transactional data; it can be anything, right? Based on that natural language question, we use the foundation model to generate a query, that query runs on your database and gives the results back, and then we use the foundation model to synthesize those results to provide you the answer to your original query, right?
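As a rough sketch of that pattern: the table name, schema, database file, and prompts below are hypothetical, and `complete()` stands in for the foundation model helper sketched earlier; this is one way the flow could be wired, not a definitive implementation:

```python
import sqlite3

# Hypothetical text-to-SQL flow: the FM generates a query, we run it, then the FM summarizes the rows.
question = "What were the top 3 products by revenue last month?"
schema = "orders(order_id, product, revenue, order_date)"  # assumed schema for illustration

sql = complete(
    f"Given the table {schema}, write a single SQLite query answering: {question}. "
    "Return only SQL."
)

conn = sqlite3.connect("sales.db")   # assumed local database
rows = conn.execute(sql).fetchall()  # in production, validate/sandbox generated SQL before running it

answer = complete(
    f"Question: {question}\nQuery results: {rows}\n"
    "Summarize these results as a short natural-language answer."
)
```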

So you as a user get the full experience: I asked a question, I got the result. But behind the scenes, so many things were going on. So that can be one approach. The third one is semantic search, and this I would like to explain with an example, because it really takes me back to high school, or even before that, actually, elementary school.

So when I was in school, there used to be reading comprehension, where I was given a passage and then there were certain questions that we had to answer based on the passage, right? As a kid, I was like, oh, I'm smart, I'm not going to read the whole passage. I'm going to use the keywords in the question and look up those keywords in the passage. There are just two, three, four paragraphs and I'll be fine.

So I used to get 10 out of 10 every time, up until elementary school and some part of middle school. By the time I reached high school, that 10 out of 10 actually dropped to 3 out of 10 or 4 out of 10, depending on how lucky I was. The reason was that as I was growing up, the passages that were provided to us were becoming complicated and the questions that were asked were tricky. They were no longer based on keywords. I literally had to understand the question, and I had to understand what the author was trying to say at a high level, before I could even attempt an answer, right?

And that's where semantic search for machines comes into play: understanding the meaning of the text and then providing you the answer, right? So that's the third kind of retrieval, and we'll be mostly focusing on this third kind of retrieval today.

So doing semantic search, searching on the meaning of the text, sounds lovely, right? But when I have to do it, well, I'm not doing it, the machine is doing it, right? So then what do I do? What do we really need to do in order to enable the machine to do it?

What we really need to do over here is convert our text into numerical representations. Now, why do we need to convert the text into numerical representations? Because we want to find the similar documents or text based on the question that is coming in, and I'll double-click on the numerical part in a moment. But we have to convert to numerical representations in such a manner that they are able to retain the relationships between the words, right?

If they're unable to retain the relationships between the words, then they won't be meaningful to me or to the machine, right? And the purpose is not served. So the selection of the embeddings model matters, because you're not going to do this yourself, right? You will use an embeddings model: feed in the text, and it will convert it into numerical representations that maintain the meaning, the features, and the relationships between these words.

So that's how it works: if you have to do semantic search, you need an embeddings model, you convert your data into numerical representations, your query comes in, and it fetches the relevant results based on that. So how is it helping me? Briefly, if we have to summarize: it is helping me fetch results based on meaning, and it is helping me because I'm getting accurate context.

If I have accurate context and I'm feeding accurate context to the model, I'm getting accurate results, right? So look at how we are connecting each and every dot over here. First you have your data, then you have to split the data into chunks so that you can create embeddings; the quality of the model that creates the embeddings will influence the retrieval, and the retrieval will influence your response, right?

So that's the reason why embeddings are important. Which model to select? For that, I'll hand it over to Ruhaab. Thanks, Mani. So it seems like a complicated process, and the thought of actually building an embeddings layer seems a little daunting, which is exactly why we launched the general availability of the Titan Text Embeddings model.

We actually launched this in September, and the Titan Embeddings model is optimized for text retrieval use cases and RAG use cases, and it's available in over 25 different languages, such as English, Spanish, and Chinese. And because Titan Text Embeddings is accessible via the Amazon Bedrock service experience, you can access this model with a single API without having to manage any infrastructure. It's so easy, right?

You think about just pointing towards a model, passing it the context, and getting the embeddings built, right? It's so incredibly easy to use through a single API. OK. So now that we know what embeddings are and have some foundational knowledge of RAG, let's really understand what's happening underneath the hood, right? Like what enables this from a technological perspective.
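Before moving on, here is a minimal sketch of what that single-API embeddings call can look like with the Titan Text Embeddings model on Bedrock. The sample sentences are made up, and the cosine-similarity comparison is just to illustrate that semantically related text ends up closer in the vector space:

```python
import json
import boto3
import numpy as np

bedrock_runtime = boto3.client("bedrock-runtime")

def embed(text: str) -> np.ndarray:
    """Return the Titan Text Embeddings vector for a piece of text."""
    resp = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(resp["body"].read())["embedding"])

def cosine(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# Related sentences should score noticeably higher than unrelated ones.
a = embed("How do I reset my router?")
b = embed("Steps to restart my wifi box")
c = embed("Best hiking trails near Seattle")
print(cosine(a, b), cosine(a, c))
```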

And before you can actually ask questions about your data, the data actually has to be optimized for a RAG use case. This is the data ingestion layer, and there's a workflow corresponding to that layer; we'll go right to left in this workflow. It starts with your external data sources, right? Your company data. This could live in S3, and it could be in different file formats, or it could even be PDF documents and unstructured data.

We take this data and then go through a process called chunking. Chunking is really just the splitting of that data into different segments, which is really useful for optimizing for things like relevancy. These chunks are then passed into an embeddings model such as Titan Text, and ultimately stored in a purpose-built vector database, which is really optimized for indexing and retrieval of embeddings and can maintain the relationships and semantic meaning that you get through an embeddings model.
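Here's an illustrative fixed-size chunker with overlap, the kind of step this ingestion workflow performs for you. This is not the service's actual implementation; it approximates tokens with whitespace-separated words purely for illustration:

```python
def chunk_text(text: str, max_tokens: int = 300, overlap_pct: float = 0.20) -> list[str]:
    """Split text into overlapping chunks. Tokens are approximated by whitespace words here."""
    words = text.split()
    # Advancing by less than the chunk size leaves roughly overlap_pct of shared context.
    step = max(1, int(max_tokens * (1 - overlap_pct)))
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + max_tokens]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + max_tokens >= len(words):
            break
    return chunks

# Each chunk would then be embedded (e.g., with embed() above) and stored in a vector database.
```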

And once you go through this data ingestion workflow, you're now ready to ask questions and really see the true power of RAG. This brings us to the text generation workflow, and it starts with a user asking a question. That question or query then goes to that same embeddings model to turn the question into a vector.

That vector is then searched against that same vector data store, which allows us to do things like vector similarity search, where you don't have to ask questions in that same rigid keyword context; we can actually extract meaning and look at similar aspects of that question and how it might relate to documents. And that's the real power of semantic search, right? It's really looking at those relationships and understanding meaning more deeply.

So once we get that search result, that's the retrieval part, right? We're retrieving that data from the vector database, and then we're passing that context into the prompt for a foundation model. We're augmenting the prompt with these returned passages, and that's the augmentation part, right? We're augmenting the prompt. And then ultimately this large language model, the foundation model, is generating that final response, and that's the G part. This workflow, as you might imagine, can be fairly cumbersome, right?
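Putting the generation workflow together in a conceptual sketch: in-memory lists stand in for a real vector database, "my_docs.txt" is a placeholder corpus, and `chunk_text()`, `embed()`, `cosine()`, and `complete()` are the helpers assumed from the earlier sketches:

```python
import numpy as np

# Ingestion (sketched earlier): chunk the corpus and embed each chunk.
chunks = chunk_text(open("my_docs.txt").read())
vectors = [embed(c) for c in chunks]

def retrieve(question: str, k: int = 3) -> list[str]:
    """Embed the question and return the k most similar chunks by cosine similarity."""
    q = embed(question)
    scores = [cosine(q, v) for v in vectors]
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

def answer(question: str) -> str:
    # Augment: inject the retrieved passages into the prompt before generation.
    context = "\n\n".join(retrieve(question))
    prompt = (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return complete(prompt)

print(answer("What is our parental leave policy?"))  # example query, made up
```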

And there's so much inherent complexity associated with building a complete RAG application: you have to manage multiple data sources, you have to think about which vector database to use and how to make incremental updates to that vector database, and it actually requires a lot of different disciplines, right? You need help from data scientists, data engineers, and infrastructure engineers.

Then if we think about scaling and DevOps, a lot of this can seem daunting. Open source frameworks such as LangChain have made this a little bit easier, but it still requires a considerable amount of development and coding. So how might we completely abstract away all this complexity? That's where we have Knowledge Bases for Amazon Bedrock, where we want to implement RAG, or build applications based on this RAG architecture that we just saw, but in a fully managed way, so that you can focus on solving the business use cases or problems, and we take away all the undifferentiated heavy lifting from you.

So how is Knowledge Bases for Amazon Bedrock going to help you? First of all, it provides you with the data ingestion part that we just saw, right? It will automate a lot of those things, and we'll see that in a moment. The second part is that it will securely connect the foundation models, and even Agents for Bedrock, with these knowledge bases or your data sources, right?

The third is retrieval, right? How we can easily retrieve the relevant data and then augment our prompts; it will help you do that. We have features and recent announcements, and we'll be doing a deeper dive on those. And the last one is source attribution. I don't trust anyone, to be honest. I'm just kidding, I trust a lot of people. But when it comes to machines, we need proof, and that's where source attribution comes into play.

How do I know that my foundation model is giving me the right response? Because the response is based on the data sources that I was providing, right? So let's take a look, let's dive deep into the data ingestion workflow first, because if you don't have the data in the vector DB, you cannot really do the retrieval, augmentation, and generation.

So the first part is the data ingestion workflow that we just saw, right? In this case, we are moving from left to right: you have new data in the data sources, then chunking, embeddings models, and storing into the vector DB, right? Imagine you have to implement each of these things on your own. First of all, you would need resources who can code really well. Second, once you've coded it, you have to maintain that code. You might want to use open source frameworks, which is great, but then you sometimes have to think about the versioning piece of it, right? So there is a lot that goes into it.

"And then you also have to learn specific APIs for the vector store that you are using. What if we say and change everything by providing and by providing an option to and reducing it to the choices. What if we say choose your data source. And in this case, we support Amazon S3 as a data source. So you select the date, your bucket in which you have your documents, right?

And we provide support for incremental updates as and when your new documents come in; all you have to do is start the ingestion job, the sync, right? And then multiple data formats: you don't really have to worry about the different data formats, because with Knowledge Bases for Amazon Bedrock we provide support for PDFs, comma-separated (CSV) files, Excel documents, Word documents, HTML files, and Markdown files.

And I think that was pretty much it, plus text files as well, right? And the list may grow as we move along. So we have support for a lot of these file formats; you can literally take your data, upload it to S3, and add it as a data source.

Then we provide you an option for chunking, that is, splitting your documents. You might say, you know what, I don't want to choose anything because I might not be aware of those things. That's fine too. We have a default chunking option, which defaults to 300 tokens with a 20% overlap, so you don't have to choose if you don't want to, right? But if you do want particular fixed chunks, you can provide those as well. So the second option that we have is fixed-size chunking: you provide the number of tokens for each chunk of text that you want and an overlap, which we recommend keeping between 10 and 20%. And then you choose your embedding model.

Right now we support the Amazon Titan Embeddings model, and Ruhaab has already covered that, so I will not repeat it. Just one thing that I want to double-click on over here: when we say it supports 25 languages, that's a very important aspect, because remember when I was talking about the embeddings, these numerical representations are maintaining the relationships. If the model doesn't understand the language, it won't be able to maintain the relationships between the words, right? So if your text is in a different language, it's important that your embeddings model knows about it and is able to maintain those relationships.

The next part is the vector store. We are providing you options over here: whether you want to use OpenSearch Serverless, Redis, or Pinecone, right? So we have options, and with all of this, you make the choices, you click the Create Knowledge Base button, or if you're using the SDK, that's the CreateKnowledgeBase API, and everything is taken care of for you. It's automated and fully managed data ingestion using Knowledge Bases for Amazon Bedrock.
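If you're doing this through the SDK rather than the console, a sketch of attaching a data source and starting the sync might look like the following. The knowledge base ID, bucket ARN, and names are placeholders, and the exact parameter shapes should be checked against the current boto3 bedrock-agent documentation:

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")  # control-plane client for Knowledge Bases

# Attach an S3 data source to an existing knowledge base with a fixed-size chunking strategy.
ds = bedrock_agent.create_data_source(
    knowledgeBaseId="KBID1234",  # placeholder
    name="tax-documents",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::my-tax-docs-bucket"},
    },
    vectorIngestionConfiguration={
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {"maxTokens": 512, "overlapPercentage": 20},
        }
    },
)

# "Sync" in the console corresponds to starting an ingestion job via the SDK.
bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KBID1234",
    dataSourceId=ds["dataSource"]["dataSourceId"],
)
```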

So now we have our data ingested and ready to use. The next step is: what does my architecture look like? It looks something like this, right? We have the Knowledge Base and the data is ready, but we still have to take the query, create the embedding, retrieve the context, augment the prompt, provide it to the foundation model, still do the prompt engineering, and then get the response. So we still have to do a lot of work.

What if we eliminate that and take away some of that heavy lifting as well? With that, we recently announced two more API features. One is RetrieveAndGenerate, which will literally retrieve the relevant documents, feed them to the model, and give you the response. The second one is the Retrieve API, if you need more customization options. So let's take a look.

This is how your whole architecture will look: the user asks a question, you call the RetrieveAndGenerate API, and you get the response. This RetrieveAndGenerate API does the work for you: it will take your query, create the embedding with the embeddings model, retrieve the relevant results, augment them with your prompt, and then feed that to the model that you select. Currently, we support two models, Claude Instant and Claude v2 by Anthropic. So you can select either of these two models and get the generated response, right? Pretty cool.
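Here is roughly what that single call looks like with boto3; the knowledge base ID and the model ARN region are placeholders:

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

resp = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "If I work remotely, which state do I owe taxes to?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KBID1234",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1",
        },
    },
)

print(resp["output"]["text"])  # generated answer
print(resp["citations"])       # source attribution for the retrieved passages
```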

And then if you say, this is good, but I need more control, right? I don't want to do the heavy lifting, but I still want control, I still want to customize a bit. That's where we have our second API, the Retrieve API, where we enable you and provide you that flexibility as well. Over here, we are still helping you: you have your query, and the Retrieve API will automatically use the embeddings model, create the embedding for your query, and provide you the relevant data or the relevant documents, right?

Now, what you have to do once you get the relevant documents is the prompt augmentation. You have flexibility in what instructions you want to provide in your prompt based on the model, and you can literally use any model provided by Amazon Bedrock. Or maybe you have a custom model or fine-tuned model that you've been working with in the Bedrock system that you want to use with the Retrieve API; you can do that, right?
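And a sketch of the Retrieve flow, where you do the augmentation and model call yourself. Again, the knowledge base ID and prompt wording are placeholders, `bedrock_agent_runtime` is the client from the previous sketch, and `complete()` is the Claude helper assumed earlier:

```python
question = "If I work remotely, which state do I owe taxes to?"

resp = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="KBID1234",  # placeholder
    retrievalQuery={"text": question},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)

# Each result carries the chunk text, its source location, and a relevance score.
passages = [r["content"]["text"] for r in resp["retrievalResults"]]

prompt = (
    "Use only the following passages to answer. If they don't contain the answer, say so.\n\n"
    + "\n\n".join(passages)
    + f"\n\nQuestion: {question}"
)
print(complete(prompt))  # call any Bedrock model you like with the augmented prompt
```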

So we have options, and we still want you to take full control of your application and your decision points, which really impact the answers that you are getting from your application. These are very important concepts. Enough of the talk, right? Let's see something in action. Let's see how it looks in the console.

So, the demo part now, and I'll share my screen. You can see my screen. Where I am is basically the console: I've searched for Amazon Bedrock, this is Bedrock, and then I go to Knowledge Base, which is under Orchestration. We click over there, and it shows all you can do from the console: you can create a Knowledge Base, test the Knowledge Base, and then maybe use it, right?

So we'll go through that. Because we have limited time, I've already created a Knowledge Base, but I'll still walk you through what you will need to do if you have to create one. So the first part you see over here is that you have to create a Knowledge Base. You click on the Create Knowledge Base button, and by the way, whatever I'm showing you over here, you can do via the SDK as well.

Then you provide a name. I would suggest that you provide a very meaningful name over here, because you might end up having a lot of Knowledge Bases and you don't want any confusion. Also add a meaningful description, and then you need permissions in the role, right? Because Knowledge Bases will be accessing your data in S3 and creating the embeddings, so they need access to the embeddings model as well. And they will also be storing the embeddings in the vector DB, so they need access to that too.

So make sure that your Amazon Bedrock execution role for Knowledge Bases has all those permissions. And if you are unsure how to do that, simply select the "Create and use a new service role" option so that it's automatically created for you. Then we go next: data source. Provide a meaningful data source name and an S3 location. I'm just going to type one; this is not an existing S3 bucket, to be honest, I just provided the name for demo purposes. And then additional settings: this is where you get to select your chunking strategy.

So you can select three options, as I mentioned earlier: default, fixed size, or no chunking, right? So you have options over here as well. Let's do fixed size, and then I can select, maybe I want to do 512. Typically your overlap should be around 10 to 20%; that's our recommendation. Since we only support the Titan Embeddings model right now, that's there. And then if you say, you know what, I don't want to create a vector DB, I want you to create a vector DB on my behalf, because we attended that talk and it said fully managed RAG, right?

That's where we have the option where you can select quick create, which will automatically create a vector DB for you as an OpenSearch Serverless vector store, so you can choose that. But again, we have to give you options: what if you have an existing vector database or an index that you want to populate? If you have an index in OpenSearch Serverless, Pinecone, or Redis Enterprise Cloud, you literally provide the details about those and then go next. That's it.

And you might have heard in the announcement today that we will be supporting new vector database types soon, including Aurora and MongoDB, with likely more vector database options coming. So stay tuned. OK. And then you review your setup and click on Create Knowledge Base, right?

Because we want to be cognizant of the time, we already have a Knowledge Base. This Knowledge Base is based on tax documents. When you have created the Knowledge Base, you will actually land over here, not over there; it's only when you go back to Knowledge Bases that you can see the list. So once you create it, you will be here.

And then the most important point: once you create a Knowledge Base, you have to click on the Sync button. This is very important, because when we say we created a Knowledge Base, that was good, but we have to sync; sync is the actual thing. When you do that, it will look up all the data that you have in S3, preprocess those documents, extract the text out of them, split it according to the chunking strategy that you provided, pass it through the embeddings model, and then store it in the vector DB. So that's the sync.

And when you have, let's say, new files in your S3, you press the Sync button again or call the start ingestion job API via the SDK, right? It will literally make sure that everything is in sync. So you need to do that. I've already done that, and then you want to test it, right?

So if "Generate responses" is on, that means we are using the RetrieveAndGenerate API; if you toggle it off, then it's retrieval only. Let's start with generating responses. I first need to select my model, and I can select either the Claude Instant model or Claude v2; we have 2.1 as well, which was also recently announced. Then you select the model and you can ask the question.

Now, since my documents are based on tax data, my Knowledge Base has all the tax-related data, so I can ask a question. What I'm asking is: if I work remotely, which state do I owe taxes to? I just selected that because a lot of us were working from home during the pandemic, so I was like, why not ask something like that? And I know a lot of us are now back to the office, which is also cool.

OK, now you click on "Show Result Details" and notice some important things.

First of all, it is giving me the response very quickly.

Second, I can literally see the source attribution right on my screen, right away.

A few important points. The response says that if I work remotely but my employer is located in a particular state, I may owe income taxes to that state; I'm not going to read the entire thing. And then, if you want to look at the source that the model used, it is basically right over here, right? So it provides that to you, and if there are multiple sources, you will see multiple tabs over here, and I'll show you in a moment that you can literally go to the location of that document as well.

So this was about how we do retrieve and generate.

What if I just want to retrieve? Let me ask the same question, because it will just make it easier for us to go through. By default, it will retrieve the top five most relevant documents, and then I can go to "Show Details" and look at it.

So I'm seeing this particular chunk from this p17 PDF, and then another chunk from another PDF, and another chunk from another PDF.

And the Retrieve API also gives you a score, based on which vector DB you are using and which similarity metric that vector DB uses. For example, if you are using cosine similarity, the score will be based on that; if you are using Euclidean distance, it will be based on that, right?

So the score option is also there. That was about how you can use it from the console.

We also have another demo where I will show you the APIs and how we can do the integration with LangChain. But the important point is, if I have to build retrieval augmented generation applications with Knowledge Bases for Amazon Bedrock and use those APIs, you can literally do that end to end using the features that we just talked about.

But what if I have some dynamic information that I need to fetch, in addition to what I have in my knowledge bases? Maybe I have a knowledge base which has a lot of order details, but I also want to call some order API which gives me the status of my existing order that's in transit, right? Or do multiple things around that, right?

So what do I need to do if I want to integrate the knowledge base with, let's say, agents or other parts of the Amazon Bedrock ecosystem?

So Ruhaab, over to you; please walk us through it.

Great. If we just go back to the slides, please...

[Ruhaab's response]

...And now we'd like to show you how you can also use open source generative AI frameworks like LangChain to build with knowledge bases. I'll have Mani walk us through this with another demo.

Yes. And for that, I'll be sharing my screen.

So in this particular demo, I'll be using the APIs, because a lot of us here might love the API and SDK experience in addition to the console experience. And LangChain provides you with a lot of prebuilt wrappers, and why do we need to reinvent the wheel when we have something out there? We want to reuse it, but we want to reuse it with the latest features that we just showed you, right?

So let me take you on this quick journey. First of all, make sure that you have the latest boto3 version and the latest LangChain version. It has to be equal to or greater than the versions that I'm showing over here: for LangChain, it's 0.0.342, and for boto3, it's 1.33.2, right? So make sure you have versions equal to or greater than these.

Now, the first thing that you need to do is basically the setup, right? As with any AWS service, when you want to use the API, you first have to create the client.

So for Bedrock, in this particular case, we need two clients:

  1. The bedrock-runtime client, which helps us call InvokeModel.

  2. The bedrock-agent-runtime client, with which we will call the Retrieve API of Knowledge Bases for Amazon Bedrock.

So this is what we are doing over here: providing some model parameters, because remember, this is the Retrieve API and you can connect to any model provided by Amazon Bedrock. That's what we are doing.

You provide the model parameters, you select your model, and now the actual retriever.

So if you are planning to use the Retrieve API with LangChain, you will first need to initialize a retriever object. You have to import the AmazonKnowledgeBasesRetriever from LangChain and then use it.

So what do I need to pass? I need to pass:

  • The number of results, meaning how many relevant documents I want; that's what I'm providing over here.

  • And the knowledge base ID, because how else will it know which knowledge base to get information from? Super important, right?

And let me show you how you get the knowledge base ID. If you are using the SDK, then you will automatically get it as a response from the API and you can leverage it. If you are using the console, then you click on the knowledge base, and that's where you get the knowledge base ID, so you can literally copy it. I'm using the same knowledge base over here as well; I just wanted to quickly point that out, right?
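A minimal version of that retriever setup is sketched below. The knowledge base ID is a placeholder, and the import path is the one used by LangChain releases around 0.0.342; check your installed version if it differs:

```python
from langchain.retrievers import AmazonKnowledgeBasesRetriever

retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id="KBID1234",  # copied from the console or the CreateKnowledgeBase response
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 4}},
)

docs = retriever.get_relevant_documents("If I work remotely, which state do I owe taxes to?")
for d in docs:
    print(d.page_content[:200], d.metadata)  # chunk text plus location/score metadata
```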

So now you have the relevant documents. Now, if I'm building a Q&A application, LangChain provides you with the RetrievalQA chain. And all I need to do: I've already declared the large language model, I've already declared my retriever. Let's use the RetrievalQA chain, pass everything together, and then keep asking questions, right?

Let's move to that part now. This is just showing that if you only want to retrieve the relevant documents, you can do that. But if you are integrating it with the RetrievalQA chain, you don't need to do that; to be honest, all you need is this retriever object.

So let's take a look at how we integrate. That's where we have the RetrievalQA chain. I provide my language model, which will give me the response; then I provide my retriever object, which will give the relevant documents; and I also provide the prompt.

So now I have the flexibility to provide my own prompt and my own instructions, and this RetrievalQA chain will automatically augment the relevant documents with my prompt.

And just so that you are aware, I wanted to show you the prompt template as well. You can provide specific instructions and model-specific prompting, which is very important; you can literally tell the model to only provide information based on the relevant documents, right?

So based on your use case, you can provide instructions. And then once you have integrated it with your RetrievalQA chain, you literally provide the query to this QA object that you have created, and it will keep giving you answers.

So you don't have to initialize it over and over again; you can literally ask multiple queries and get the answers, ask more queries and get the answers. And now you have a running Q&A application with just three things:

  1. Initializing a model
  2. Initializing the AmazonKnowledgeBasesRetriever with LangChain
  3. The RetrievalQA chain, passing everything together

And we have the application ready, right? You can use the same pattern if you want to build a context-based chatbot with the conversational chains that LangChain provides, right?
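Wiring those three pieces together might look like the following sketch. The prompt wording and model choice are illustrative rather than from the talk, and `retriever` is the AmazonKnowledgeBasesRetriever initialized in the previous sketch:

```python
from langchain.chains import RetrievalQA
from langchain.llms.bedrock import Bedrock
from langchain.prompts import PromptTemplate

# 1. The language model served through Bedrock (Claude here, but any Bedrock model works).
llm = Bedrock(model_id="anthropic.claude-instant-v1",
              model_kwargs={"max_tokens_to_sample": 512, "temperature": 0})

# 2. The retriever initialized above.
# 3. A prompt template that steers answers to come only from the retrieved context.
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Use only the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        "{context}\n\nQuestion: {question}\nAnswer:"
    ),
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",               # stuff all retrieved chunks into the prompt
    return_source_documents=True,     # keep the source attribution
    chain_type_kwargs={"prompt": prompt},
)

result = qa({"query": "If I work remotely, which state do I owe taxes to?"})
print(result["result"])
print([d.metadata for d in result["source_documents"]])
```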

So do explore, and if you are interested in looking through the same code, we have it on GitHub; we'll share the resources with you.

So Ruhaab, can you just do a recap for us?

Yeah, absolutely. Thanks, Mani.

And if you go back to the slides...

[Ruhaab's response]

So we just want to say thank you for attending. I hope this was useful. Our LinkedIn handles are here. We would love to hear from you and see how you're using Knowledge Bases and what feedback you have.

And don't forget to take the survey in your app so that Mani and I can get invited again next year to give a talk at re:Invent. We really appreciate you coming today. Thank you and have a great conference.
