Data patterns for generative AI applications

Hello and welcome.

By a show of hands, how many of you are building gen AI applications on AWS, or planning to? I see quite a few hands, and a few that aren't up. From the conversations we're having with customers, it's pretty evident that gen AI is going to be a part of every single application we build, so all of us should be thinking about it.

By the way, my name is Siva Raghupathy. I lead the worldwide data specialist solutions architecture team. I'll be co-presenting this session with Vlad Vlasceanu, who is a tech leader for databases.

Let's get started.

Our team works with customers around the world, helping them migrate and modernize database applications onto AWS, and over the last year we've been working with a lot of those customers on gen AI use cases.

What we're going to present today is the culmination of our experience helping customers build gen AI applications. What we really learned is that data patterns help you cut through the noise and focus on the things that are likely to stay constant, even as everything about AI changes every single day.

So with that, let's get started. In this session you will learn about design patterns for using AWS services to implement your gen AI use cases. We're going to talk about patterns and practices, and we'll show you a few small code segments and libraries along the way, but this is really a patterns-and-practices talk.

Thank you for being here.

In terms of the agenda, we'll first think about the role of your data in gen AI applications - what it is, how you can empower your gen AI applications with your own data and context, and how you supply that data to them. We'll go through three patterns for feeding your data into your gen AI system, or your large language model.

Then we'll dive deeper into data architecture for context engineering using what is called retrieval-augmented generation, or RAG.

Then we'll go through the end-to-end patterns - the architecture, if you will - for building gen AI applications. At the end, we'll discuss how your data strategy should change for gen AI: what new considerations you need to take into account as you build, or prepare to build, gen AI applications.

At this stage, it's safe to say that everyone recognizes the power of gen AI. But when most people think about building generative AI applications, they often think first about which large language model or foundation model to use - and that's just the tip of the iceberg. As you dive deeper, you realize that a lot of the differentiated experience is going to come from your data, both structured and unstructured.

At the end of the day, a gen AI application is just another application: it needs to store state, like conversational history, in a data store. So you're going to be using your SQL stores, NoSQL stores, and potentially document stores. In some cases, customers are using graph data stores, not only to store state but also to build microservices that materialize the data for the gen AI application. We'll show you in the patterns where these services come in.

You also need your data lakes, data warehouses, and analytics systems to take all the data, process it, and store it in your data lake. So you're going to use your stream processing, batch processing, and interactive analytics to materialize that data as well.

The data for your data lakes gets fed from multiple line-of-business applications and data stores, so your ETL and processing pipelines still play a huge role in building end-to-end data applications. Data integration - things like ETL - is going to be critical, and so is stream processing: how do you take up-to-the-minute information and feed it into your gen AI application? You're going to do some stream processing for that.

The other key piece of your data strategy is data cataloging. Large language models are getting so powerful that they can take your question, look in your catalogs to find the right source, and formulate a SQL query - or whatever the query language might be; if it's a graph database, you may use a different language to materialize the data. Since that is going to be possible, it's super important to think about cataloging your data.

When I think of a data catalog, I think of two things. One is the engineering or technical catalog, which gets materialized in the form of a Hive catalog or, if you're on AWS, the Glue Data Catalog. The other is the business catalog, which maps your business constructs to the data available in the technical catalogs. Both are going to play a critical role.

To feed data into your LLMs or fine-tune them, you're going to need to cleanse the data. So you need data pipelines for cleansing and for data quality - data quality is going to play a critical role in AI.

And then, obviously, privacy and access control are super important, because as your users interact with gen AI applications, they may feed in all kinds of data that you don't want going into these systems. You don't want to take your highly confidential data and hand it off to a gen AI application, especially if you don't have control over where it ends up. So your access control, data governance, and data protection are going to play a critical role.

That is what we see with customers. A year ago, everybody suddenly started thinking about foundation models - which foundation model should I use? Now most people are thinking about how to use their data to specialize their experiences: how do you build differentiated experiences with your own data?

Having a modern data architecture that allows you to materialize and prepare your data to power your gen AI applications is going to be super critical as we move forward. Everything else is changing rapidly with AI, but nobody is going to do the data piece for you - you have to do it yourself. So the focus is shifting pretty quickly to how you prepare your data, how you catalog it, how you govern it, and how you feed it into your gen AI applications.

Essentially, your data is the differentiator. What do we mean by that? If you look at large language models, the last time I checked there were literally hundreds, if not thousands, to pick from. Most of these models are trained on data that's available on the internet. So if you build an application with one of them as-is, it's going to provide generic experiences - it's not going to be customized with your data and knowledge.

So a gen AI application by itself is not necessarily going to provide a differentiated experience. What really matters is gen AI that knows your business and your customers. That leads to the question: how do you provide your data to gen AI applications? Let's dive deep into this.

For this, I'm going to go through the architecture of a classic generative AI application. Think about a chatbot, for example: you have an end user using the chatbot. The user presents input to your gen AI application, the application converts that input into a prompt, and provides it to the LLM, the large language model.

The LLM then runs inference, which comes back as a prompt response, and the application packages that response and returns it to the end user.

Now, imagine we're working for a company that sells car insurance, and as a developer at that company I'm building a chatbot for our customers.

If I were to use a stock LLM - let's say Llama 2, which is a large language model from Meta - and ask the question "I'd like to buy car insurance"... I don't know how many of you have tried this; I tried it a few days ago. It came back with some answers: "There are various types of car insurance - collision insurance, comprehensive insurance, personal injury protection insurance," and so on. "And by the way, the cost of this insurance is going to depend on the make and model of your car, how old the car is, potentially your credit rating, where you live," and so on.

Well, that's the extent of the experience a generic application can give. If two companies are both selling insurance using the same generic model, that's all either of them is going to get.

But then how do you differentiate this experience with custom data? Let's explore that piece - how do you differentiate your gen AI with your data?

Let's say you get the user input. Instead of presenting it directly to the large language model, let's ask a few questions. First: are there any specific instructions for the large language model? In other words, how do you want it to behave? You could tell the model, "Take the persona of an insurance agent answering the customer's question," and "Be careful with your responses, and confirm each one."

Basically, you're telling the large language model to assume a persona. Then: is there some history between this user and the company? Is this user input building on prior conversational history and state? Maybe you want to present all of that to the large language model before it materializes a response. We call this conversational history and state part of the situational context, because it's derived from the situation surrounding the question.

In addition, you can ask: who is the user, and what do we know about them from our customer records? Typically every company has a user database or a customer 360 system, so this data is persisted in your data stores or behind microservices. Data from your purpose-built data stores and APIs can be fed into the request. That's also part of the situational context.

And on top of all that, when the user says "I want to buy car insurance" - as an insurance company you likely have documents on types of insurance. You can't just do a string search for "car insurance" in those documents; you have to infer that car insurance may mean comprehensive insurance or collision insurance. That is what we call semantic context: any meaningfully relevant data that would help the large language model answer the question.

That is going to be super important for contextualizing your request, so packaging all of this up and providing it along with the request is going to be paramount.

So from the user request you can set a behavioral context, a situational context, and a semantic context - retrieve all of this data and feed it in.

Typically this is called prompt engineering: you take the user request, aggregate all of these other contexts, and provide the result to the large language model. When you do that, this is what it looks like - you take the engineered prompt and give that engineered prompt to the large language model.

The large language model is then able to respond while taking into account all of the data you provided it. This is called in-context learning, complemented with retrieval-augmented generation, or RAG. You may see RAG mentioned in many places; that's what it really means: in addition to the situational context, you also provide a semantic context, and that semantic context is usually built via a pipeline.
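
To make the idea concrete, here is a minimal sketch of how an application might assemble the three kinds of context into a single engineered prompt. The function and parameter names are illustrative, not from any specific framework; the profile, history, and documents are assumed to have already been fetched from your own data stores and vector store.

```python
def build_engineered_prompt(user_input, profile, chat_history, documents):
    # Behavioral context: instructions telling the model how to behave.
    behavioral = (
        "You are an insurance agent for an insurance marketplace. "
        "Answer helpfully and without bias. If you are unsure, ask the "
        "customer to confirm before proceeding."
    )

    # Situational context: what we already know about this user, pulled from
    # our own data stores (customer 360 record, conversation history).
    situational = (
        f"Customer profile: {profile}\n"
        f"Conversation so far: {chat_history}"
    )

    # Semantic context: meaningfully relevant documents retrieved from a
    # vector or search store based on the user's question.
    semantic = "\n".join(documents)

    # Package everything into one engineered prompt for the large language model.
    return (
        f"{behavioral}\n\n"
        f"{situational}\n\n"
        f"Relevant documents:\n{semantic}\n\n"
        f"Customer question: {user_input}"
    )
```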

In other words, you take all of these documents, use an embedding model to compute embeddings, and store them in a vector store. Vlad will go into the details of that later in the presentation.

There's also another way of providing context to your generative AI application, instead of supplying it as part of the prompt. You could take your question-and-answer pairs - the questions people asked and the answers you gave, if you've persisted all of that - clean them up, take a model such as Llama 2 or any other model you prefer, and fine-tune the model using that data. That's called fine-tuning.

This is a way of building the knowledge of your application into the large language model itself. When you ask a question, the model takes into consideration all the questions and answers it was trained on and contextualizes its answer accordingly. That's the second way.

The third way is building your own custom model. In other words, you don't start from any existing model - you take all the knowledge you have in your organization and build a custom model for yourself. That's the third method.

In addition to a fine-tuned or custom model, you can still take contextual data - the conversational history and your customer 360 data, for example - feed that into the prompt, engineer a prompt with it, and send that prompt to the model. So that's a hybrid scenario: your semantic context is built into the fine-tuned or custom model, and the situational context still comes in through the prompt.

Those are some of the ways of providing your data to your generative AI application or large language model. Just to frame it: this is really fine-tuning, or building your own large language model, combined with in-context learning.

So in essence, how do you provide your data to a generative AI application? We looked at three ways of doing that. The first is context engineering with RAG on top of a foundation model. The second is fine-tuning a foundation model with your own cleansed and labeled data - the question-and-answer pairs you feed into the model to build a fine-tuned model for yourself; we'll go through how that happens in a pipeline.

And the third way is building your own model by training a foundation model on your curated and specialized data. So which one of these should you use?

I use this mental model for thinking about the three patterns. To me, context engineering with RAG is the easiest one to do. You don't have to be a machine learning expert; you don't need to know how to fine-tune a model or build one yourself. You take a general-purpose foundation model, and all you need to know is how to access your data systems and build an engineered prompt.

You can access your search stores and vector stores to build the semantic context, get the situational context from your customer 360 and conversational history, and provide the behavioral context. That's all you need - a foundation model and your data - and that's why your data is the differentiator in this situation.

If that doesn't work well enough, you can fine-tune a model: take a general-purpose model plus the rich question-and-answer data you've materialized in your organization, and fine-tune your own model. And if that still doesn't work, you go to the third step, which is building a custom model with your own data. Obviously the one at the top - training your own purpose-built LLM - is going to be pretty complex. But those are the three patterns our customers are looking at for incorporating their data into their gen AI systems.

So with that, let's dive deeper into context engineering with RAG on foundation models, because chances are that's where you're going to start.

To illustrate this, we put together a small demo. I had a recorded demo ready to play, but I wanted to walk through it with slides, because sometimes videos don't play well here.

Imagine the scenario where you're building a chatbot for a hypothetical insurance company. Bob is a user who logs on to the website and into this application, and Bob asks, "I'm interested in getting car insurance." Your gen AI application responds, "Good morning, Bob."

"I'd be happy to help you with car insurance. I have your car details on record. You drive a 2021 BMW X3 and live in Florida at this address - is that correct?" Think about this: just because Bob said he wants to buy car insurance, the system is able to present a much more meaningful response back to the user, because of the contextual data behind it.

How does that happen under the covers? The conversation goes on - we're not going to show the whole thing - but eventually, like the baked cake at the end, the system presents: here are the quotes I found for you, here's how much it's going to cost with this company, and how much with that company. That's a very differentiated experience that leverages your data.

If you go under the covers, this is the user prompt, and this is the prompt engineering happening behind it, where you're telling the model: here is the behavioral context, here is how I'd like you to behave, LLM.

We're telling the LLM: you're a conversational agent for a hypothetical company called Horizon Guard Insurance Marketplace. Responses to questions should be helpful and informative, in an unbiased manner - so you're putting some guardrails there - and ask the human to confirm whether a response is correct rather than making up an answer.

We call these instructions for the model. Another way of thinking about this is as your behavioral context: how do you want the large language model to behave? And you're setting some guardrails here.

The second thing is that your application knows who's logging into the system - it's Bob. Bob authenticated earlier, so I know he's a valid user with valid credentials, and I can look him up in my databases or my customer 360 system and get his details. That's where this data is coming from.

The human is Bob. You're putting that into the prompt: he lives at a specific address in Florida and owns a 2021 BMW X3, and the household members include Sarah, his wife, and Jake, his son. The LLM doesn't know this data - you have to get it from your own systems.

Then you're presenting the semantic context. As an insurance company, you potentially have a lot of documents on insurance. If you do a semantic search, you're going to pull up documents on various types of insurance - personal injury protection, comprehensive, collision, and so on - from various companies.

Presenting that as semantic context allows the large language model to formulate the right answer. That data is going to come from your data systems - your search systems, your vector databases, and so on.

And then, in green, you have the user prompt and the question at the end. That's the whole idea of constructing the end-to-end engineered prompt - this is what really happens under the covers.

Now, as a developer you're not going to build everything yourself; you're going to use frameworks. LangChain is a pretty popular framework for building generative AI applications.

What we've done here is map the constructs we've gone through to LangChain primitives. For the instructions for the model, you can use the LangChain prompts module, instantiate a PromptTemplate class, and set your instructions for the model there.

The conversational state - part of your situational context - can be handled with the memory module of LangChain. If you're persisting this memory, the conversation you're having with your user, in a DynamoDB table, you can use the DynamoDB chat message history class; if you're putting it in Redis, there's another class for Redis you can use.

Similarly, the semantic context is implemented through retrievers and vector stores. LangChain has implemented various retrievers where you instantiate a class - Kendra, for example, is an ML-powered search service from Amazon, so you could use the Kendra retriever to fetch the semantic context.

And similarly, if you're going to use OpenSearch, you could use the OpenSearch vector search class.
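
As a rough sketch, the mapping just described might look like the following in LangChain. Exact import paths and constructor arguments vary across LangChain versions, and the table name, session ID, and Kendra index ID are placeholders, so treat this as an illustration rather than a copy-paste recipe.

```python
from langchain.prompts import PromptTemplate
from langchain_community.chat_message_histories import DynamoDBChatMessageHistory
from langchain_community.retrievers import AmazonKendraRetriever

# Instructions for the model (behavioral context) expressed as a prompt template.
template = PromptTemplate(
    input_variables=["history", "context", "question"],
    template=(
        "You are a conversational agent for an insurance marketplace.\n"
        "Conversation so far: {history}\n"
        "Relevant documents: {context}\n"
        "Customer question: {question}"
    ),
)

# Conversational state and history (situational context) persisted in DynamoDB.
history = DynamoDBChatMessageHistory(
    table_name="ChatSessions", session_id="bob-session-1"
)

# Semantic context retrieved with the Amazon Kendra retriever.
retriever = AmazonKendraRetriever(index_id="<your-kendra-index-id>", top_k=3)
docs = retriever.get_relevant_documents("I'd like to buy car insurance")

# Assemble the engineered prompt from the three pieces of context.
prompt = template.format(
    history=history.messages,
    context="\n".join(d.page_content for d in docs),
    question="I'd like to buy car insurance",
)
```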

Some of you may be using Bedrock - Bedrock is a service that, among other things, gives you access to a selection of large language models. With Bedrock Agents, the instructions for the model can be provided through the agent's instructions, the situational context can be provided through action groups, and the semantic context can be populated into a knowledge base.

So I'm going to call Vlad up now to dive deeper into the architectures.

Hi, everyone. Pleasure to be here.

If you're looking at this diagram, it might look a little daunting at first - there's a lot going on here - but don't worry, we're going to go piece by piece through this architecture. This is basically how the whole thing looks end to end.

You'll notice two things here. First, we've broken the diagram into two areas. One is the transactional area: the things that happen in the front end as the user interacts with the application, with its own usage patterns. The other is the batch and streaming processing area: the back-end part of the application, the things that happen in the background to bring all of this data together for you.

Now let's focus on RAG, because that's the main piece of this pattern; the purple area highlights which components are involved and where the RAG pipeline fits in.

So what happens in the end-user critical path, on that side of the diagram? Siva showed you a view of this. Essentially, the end user asks a question. We pull in some prompt templates, usually from some kind of data store. We pull the conversation state and history into that context, along with the rest of the situational context, and then we use an LLM to derive an embedding from the question.

We use that embedding to run a similarity search, find documents that are similar to the question that was asked, return those, package them into the prompt, and then finally ask the LLM the actual question with the engineered prompt.

It's interesting to note here that the LLM you use for the embeddings and the LLM you use to answer the question don't have to be the same one - typically they aren't - but you do have to use the same embedding model consistently, both when inserting data into your vector store and when querying it.
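
Here is a minimal sketch of that query path using boto3 against Bedrock, with one model producing the embedding and a different model answering the engineered prompt. The model IDs are examples, the similarity_search() helper is a stand-in for your actual vector store query, and request/response formats differ by model, so check the documentation for the models you actually use.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def similarity_search(embedding, top_k=3):
    # Placeholder: query your vector store (pgvector, OpenSearch, ...) here.
    return ["(retrieved document text)"] * top_k

question = "I'd like to buy car insurance"

# 1. Derive an embedding for the question with an embedding model.
emb_resp = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": question}),
)
embedding = json.loads(emb_resp["body"].read())["embedding"]

# 2. Run a similarity search in the vector store using that embedding.
documents = similarity_search(embedding, top_k=3)

# 3. Ask a *different* text-generation model the actual question,
#    with the retrieved documents packaged into the engineered prompt.
context = "\n".join(documents)
prompt = f"Context:\n{context}\n\nQuestion: {question}"
gen_resp = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps(
        {"prompt": f"\n\nHuman: {prompt}\n\nAssistant:", "max_tokens_to_sample": 500}
    ),
)
answer = json.loads(gen_resp["body"].read())["completion"]
```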

Once the response comes back, we update the state and give the response back to the user. There's a lot happening here, and potentially a lot of data stores involved.

First of all, we have the prompt templates. You can store them in S3 as objects, or in a database - especially if you're rule-based and use multiple templates for your application depending on different rules - and you can also keep them in your code, versioned in your version control system.

For conversation state and history, this is where NoSQL databases shine. You want a document or key-value model: for those we have DynamoDB, we have DocumentDB for document models, and you can also use Amazon MemoryDB if you need in-memory read latencies. These are perfect data stores for conversation state.
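
As a small illustration of the conversation-state piece, here is a sketch that appends turns to, and reads history from, a hypothetical DynamoDB table named ChatSessions keyed on session_id; the attribute names are made up for the example.

```python
import boto3

table = boto3.resource("dynamodb").Table("ChatSessions")

def append_turn(session_id, role, text):
    # Append one message to the session's running history (a list attribute).
    table.update_item(
        Key={"session_id": session_id},
        UpdateExpression="SET #h = list_append(if_not_exists(#h, :empty), :msg)",
        ExpressionAttributeNames={"#h": "history"},
        ExpressionAttributeValues={
            ":empty": [],
            ":msg": [{"role": role, "text": text}],
        },
    )

def load_history(session_id):
    # Read the whole conversation back when building the next engineered prompt.
    item = table.get_item(Key={"session_id": session_id}).get("Item", {})
    return item.get("history", [])
```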

Your situational context comes from a variety of databases. Typically it's the relational databases you use for your lines of business; you might use NoSQL databases; you might even use graph databases if you have complex customer 360 scenarios with a lot of relationships between entities. Those are some of the data stores you can use there.

And finally, for semantic context - say you're using Bedrock for the foundation models - you can use Amazon Aurora PostgreSQL with pgvector, or RDS with pgvector, and Amazon OpenSearch as your vector data stores. Or, if you don't want to manage the vectors yourself, or even touch them at all, you can use Amazon Kendra, our ML-powered search service: it indexes the documents for you, and you simply ask questions to retrieve the right documents - you don't have to deal with the vectors at all.

And then you have the generative AI model itself. If you're using Bedrock, you can also integrate directly with Aurora and OpenSearch, so you don't even have to fetch the embedding and run the similarity search yourself - Bedrock can do that for you if you use Knowledge Bases.
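
For reference, the Knowledge Bases integration mentioned above is exposed through the bedrock-agent-runtime client. The sketch below assumes a hypothetical knowledge base ID and model ARN, and the exact parameter shape may differ in your SDK version, so verify against the Bedrock documentation.

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

# Bedrock handles the embedding, the similarity search against the knowledge
# base, and the generation step in a single call.
resp = agent_runtime.retrieve_and_generate(
    input={"text": "I'd like to buy car insurance"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<your-knowledge-base-id>",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
        },
    },
)
print(resp["output"]["text"])
```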

That's what happens in the end-user critical path. You can also use Amazon ElastiCache for caching: a lot of these data sets are highly cacheable - the conversation state, the templates, even the situational context within the course of a user session - so there are plenty of opportunities to improve your user experience with caching as well.

Now, we've talked about vector embeddings - Siva mentioned them and I mentioned them as well - but what are they, actually, for the folks who aren't familiar? Essentially, you take your domain-specific data - documents about insurance policies and so on - and break it up into elements that have individual meaning, independent of each other; this process is called tokenization. You feed those elements through an LLM to produce vectors: arrays of numbers, potentially very long ones. The idea is that you're placing those elements in a multidimensional space, in proximity to each other, which reduces the question of retrieving similar data to a mathematical function: give me the elements in that vector space that are close to each other. In this example, "auto insurance" and "car insurance" are two vectors that sit very close together in a simplified two-dimensional vector space. When you use vector elements that carry semantic meaning in this way, in the context of generative AI, that's what we call embeddings.
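
A toy illustration of that idea: embeddings place text in a vector space where semantically similar phrases end up close together. The three-dimensional vectors below are invented for the example - real embeddings have hundreds or thousands of dimensions and come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: values near 1.0 mean the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

auto_insurance = [0.81, 0.52, 0.10]   # "auto insurance"
car_insurance  = [0.79, 0.55, 0.12]   # "car insurance"  (semantically close)
apple_pie      = [0.05, 0.20, 0.95]   # "apple pie"      (unrelated)

print(cosine_similarity(auto_insurance, car_insurance))  # close to 1.0
print(cosine_similarity(auto_insurance, apple_pie))      # much lower
```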

Now, where do you store those embeddings? In AWS we have a couple of options for vector data stores. One is Aurora or RDS PostgreSQL with pgvector. This is a natural fit for customers that already have PostgreSQL installations, or that are very familiar and comfortable with relational databases. pgvector is an open-source extension, so if you've used it elsewhere, you can bring data in from external sources really easily.

The cool thing here, of course, is that you can do joins: you can run your similarity search and then join the results with table data - metadata that lives elsewhere in your data sets.
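
A minimal sketch of that join capability, using psycopg2 against an Aurora or RDS PostgreSQL database with the pgvector extension. The connection string, table names, and column names (policy_docs, policy_metadata, embedding) are hypothetical, and the query embedding is assumed to come from the same embedding model used when the documents were loaded.

```python
import psycopg2

conn = psycopg2.connect("dbname=insurance host=<host> user=<user> password=<password>")

# Embedding of the user's question, produced by the same model used at load time.
query_embedding = [0.12, 0.98, 0.34]
vec_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"

with conn.cursor() as cur:
    cur.execute(
        """
        SELECT d.doc_id, d.content, m.insurance_type, m.provider
        FROM policy_docs d
        JOIN policy_metadata m ON m.doc_id = d.doc_id
        ORDER BY d.embedding <-> %s::vector   -- pgvector distance operator
        LIMIT 5;
        """,
        (vec_literal,),
    )
    for row in cur.fetchall():
        print(row)
```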

The other option is OpenSearch: Amazon OpenSearch Service and the vector engine for OpenSearch Serverless. This option gives you a choice of algorithms and indexing capabilities, and it suits folks who have some familiarity with NoSQL databases and those kinds of usage patterns. Also, with OpenSearch you get hybrid search capabilities: you can combine semantic search with keyword search, which is a powerful combination as well.

Now that we've talked about the options themselves, I want to highlight a use case. CS Disco is a customer of ours, a leader in technology solutions for the legal field. They've built a chatbot named Cecilia to help attorneys find e-discovery data and provide evidence-based answers, with citations to documents, based on whatever is in their e-discovery database.

Cecilia uses Amazon RDS for PostgreSQL with pgvector. They were one of the early adopters of that technology with us, and they picked it because it just worked with their existing PostgreSQL JDBC drivers. They didn't have to change much about their applications - they could pipe the similarity-search queries through their applications just like any other query they had.

Another cool thing about this use case is that the collaboration we had with them to build this vector data store solution led to two improvements to pgvector itself, which the Amazon RDS team submitted to the open-source community and got accepted. So not only did the customer manage to build their tool, the collaboration also helped improve the community overall.

Of course, you probably still have a question in your mind: which data store do I pick? I'm sorry to disappoint you, but we're not going to tell you exactly which database to use today. We are going to give you a framework to figure out which one is right for you.

From that perspective, one of the criteria you should consider is familiarity. In the fullness of time, we will add vector capabilities to all our databases, so pick the data solution that works best for you - the one that's most familiar to you - because speed to market matters. Right now, the majority of RAG solutions we see customers implement don't really operate at a scale or complexity where the fine technical differences between the data stores matter a lot; getting the product to market fast matters more, and that means reusing your skills as much as possible. Ease of implementation is another criterion, and I like to break it down into two pieces.

One is abstractions: how well does that database engine abstract away the complexities of dealing with vectors? The other is integrations: you'll need to integrate that piece with other software and services, so figure out which integrations matter for you and which ones exist in each data store.

Scalability, like with any other database, is also a critical decision factor, and there are two dimensions specific to vectors. One is vector dimensions - how many numbers can those vector arrays hold? There's an inflection point here: up to a certain number, the richer that multidimensional space gets, the better, but beyond it you run into the curse of dimensionality, where you're not gaining anything and it actually detracts from the effectiveness of those vectors. The other dimension is the number of embeddings the data store can support.

That, of course, depends on your workload and what sort of embeddings you're storing. The next criterion is performance. The same rules apply as with any other database - latency, throughput, and so on - but there are two specific metrics I want to call out.

One is queries per second. Most of you who work with databases are familiar with that: how many queries - how many of those similarity searches - can we realistically handle per unit of time? The other one might be a little new to us database folks, and that's the recall rate.

When you think about running a similarity search as a mathematical function, you can do an exact similarity search in certain data stores, pgvector for example. An exact search guarantees you get the most similar element to whatever you submitted as the reference vector, but it requires a scan of your entire data set, which isn't practical or performant at scale.

So the community has developed a lot of algorithms that are approximate in nature - approximate nearest neighbor algorithms. With these algorithms you get the right answer, the closest nearest neighbor, most of the time, but there will be cases where you don't get the precise closest nearest neighbor.

We use indexing and these algorithms to make those queries performant and efficient. The proportion of queries where you get the right answer, out of the total volume of queries, is your recall rate. We typically recommend customers use recall rates of over 90%, and the industry seems to be consolidating toward recall rates between 95% and 99%.

A 99% recall rate means that 1% of your queries can potentially come back without the true nearest neighbors in the result. And finally, of course, flexibility: can you choose different indexing algorithms? Can you choose different query algorithms? Think in these terms when deciding which database to pick.
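
For clarity, here is a small sketch of how recall rate can be measured in practice: compare the approximate nearest-neighbor results your vector store returns against an exact, brute-force search over a sample of queries. Both search functions are stand-ins for your own exact and ANN query paths.

```python
def recall_at_k(queries, exact_search, ann_search, k=10):
    """Fraction of true k-nearest neighbors that the ANN index actually returned."""
    hits, total = 0, 0
    for q in queries:
        truth = set(exact_search(q, k))   # ground truth: exact k nearest neighbors
        approx = set(ann_search(q, k))    # what the approximate index returned
        hits += len(truth & approx)
        total += k
    return hits / total                   # e.g. 0.95 means 95% recall

# Aim for recall above ~0.90; if the measured value falls below your target,
# tune the index parameters (accepting some extra latency or memory).
```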

Now, this really busy slide compares the two engines we've talked about - pgvector and OpenSearch - across those dimensions. I'm not going to go through every aspect individually, but you'll notice pretty quickly that familiarity matters more. If you look at scalability, they're pretty close to each other: both can support over a billion vectors, and both support up to 16,000 dimensions as a maximum.

From a performance perspective, both give you the ability to tune the recall rate to something that's acceptable to you. So again, familiarity will matter more; these nuances in technical capability end up being very small, and customers today may not even hit those kinds of considerations.

Now, glaringly missing from the criteria so far is cost, and I left it for the end because cost is essentially a tiebreaker. Once you've made an evaluation and figured out which of these data stores supports your vector needs best, you're going to pick the one that's cheapest - that's a no-brainer. But that's a little oversimplified; in reality, customers do make tradeoffs.

You have a budget, and you have to fit the solution within that confined budget for your data store. That means you're going to make tradeoffs - around performance, scalability, and features. Take recall rate, for example: you might be okay with a lower recall rate if you can get more throughput or better latency for your application. I'm picking on recall rate, but of course there are other tradeoffs that matter just as much.

We've talked about what happens on the front side of the application; now let's look at what happens behind the scenes, in the back end of this type of architecture. That domain-specific data has to come from somewhere - various sources, unstructured and structured - and you need to ingest it.

Services like AWS Glue, Amazon MSK (Managed Streaming for Apache Kafka), and Amazon Kinesis handle ingestion and streaming of data. That data then needs to land somewhere, usually in your data warehouse or your data lake. If you use a data warehouse, Amazon Redshift would be your solution there; if you use a data lake approach, your data lands in S3.

The cool thing about landing all that data in S3 is that all of these data and ML services are designed to integrate with S3, so S3 becomes the centralization point where you put the data and pull it into different services. Of course, you have to process your data - first to get it into your data lake, but also for vectorization purposes.

You might use Amazon EMR, Glue, or Amazon Managed Service for Apache Flink to process your data. From a vectorization perspective, that means running the data through an embedding model to get the embeddings and then storing those in your vector data store.

There are batch aspects to this - loading your data initially into the vector store - and there are streaming aspects, because you want to keep up with changes: as data changes in your source systems, you want those changes trickled over, vectorized, and updated in your vector stores.
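
A minimal sketch of the batch side of that pipeline: read a document from S3, chunk it, get embeddings from a Bedrock embedding model, and upsert each chunk into a vector store. The bucket and model ID, the naive fixed-size chunking, and the upsert_vector() helper are illustrative assumptions; the streaming path runs the same embed-and-upsert step per change event (for example from Kinesis or MSK) instead of per batch object.

```python
import json
import boto3

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")

def embed(text):
    # Get an embedding for one chunk of text from a Bedrock embedding model.
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def upsert_vector(doc_id, chunk_no, text, embedding):
    # Placeholder: write the chunk and its embedding to pgvector / OpenSearch here.
    pass

def vectorize_object(bucket, key, chunk_size=1000):
    # Read the raw document from the data lake, chunk it, embed and store each chunk.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    for i in range(0, len(body), chunk_size):
        chunk = body[i : i + chunk_size]
        upsert_vector(key, i // chunk_size, chunk, embed(chunk))
```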

This is also where governance fits in. As Siva mentioned, you need to catalog this data, and you need to think about permissions, privacy, and quality controls. Two services here are designed to complement each other: Amazon DataZone, which provides cataloging capabilities at the business level, and AWS Lake Formation, which provides access controls and cataloging capabilities at the storage and data layer.

So we've talked about context engineering with RAG - what happens with fine-tuning and training? The architectures for fine-tuning and training are actually very similar to each other; what differs is the type of data you're providing to the model and how you process it. Architecturally, you still need to ingest the data and store it somewhere, and the sources of your data are going to be more or less the same.

As we highlighted earlier, you might still need to do some context engineering here - you still want situational context even in this case. And because we're training and fine-tuning, there's another option available to us: Amazon SageMaker, a service that helps you build, train, and operate machine learning models. Specifically, there's a capability called Amazon SageMaker JumpStart that gives you access to prebuilt open-source large language models.

I also mentioned Bedrock in the context of fine-tuning. Bedrock supports fine-tuning as well, and it makes it really easy. You basically start with your data, whether it's in a data lake or a data warehouse, and you need to label it. In most cases you might need some human intervention for labeling; in other cases it might be very easy. If you're running a customer support system with tickets, the tickets are the problems and the resolutions are the answers - there's your labeled data set of Q&A pairs, without a lot of work. In other cases, of course, it can be much more complex, and you need humans to formulate those pairs.

Even so - and even though that sounds daunting - you can make a difference with even a couple of hundred of those question-and-answer pairs. Once you have the labeled data, you store it in JSON Lines format in an S3 bucket. Then you take Bedrock, choose whichever foundation model you want to customize, and tell Bedrock to grab that data - those Q&A pairs - out of that bucket. Bedrock does the fine-tuning for you, and the end result is a customized model that you can then use just like any of the other foundation models available in Bedrock.
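
As a rough sketch, kicking off that kind of Bedrock fine-tuning (model customization) job from code might look like the following. The bucket paths, IAM role ARN, base model identifier, and hyperparameter values are placeholders - check the Bedrock documentation for the models and parameters actually available to you.

```python
import boto3

# Each line of the JSONL training file is a prompt/completion pair, for example:
# {"prompt": "My windshield was cracked by road debris. Am I covered?",
#  "completion": "Comprehensive coverage typically includes glass damage..."}

bedrock = boto3.client("bedrock")

bedrock.create_model_customization_job(
    jobName="insurance-qa-finetune-1",
    customModelName="insurance-qa-model",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    trainingDataConfig={"s3Uri": "s3://my-bucket/finetune/qa-pairs.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/finetune/output/"},
    hyperParameters={"epochCount": "2", "batchSize": "1", "learningRate": "0.00001"},
)
```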

It makes things really easy: you can do A/B testing with different models, you can evolve your models over time relatively easily with Bedrock, and you don't need a lot of ML skills.

One of the interesting things about these three patterns is that they're actually very complementary. You can train your own models - of course, you don't want to do that very frequently, and most customers don't want to do it at all, because it's expensive and resource-intensive. You can then also choose to fine-tune models and integrate that piece into your architecture, which lets you pull in updated context up to a certain point in time. And on top of that, you can still do RAG to respond to changes in your data in real time. You can do all three of these, and in reality customers actually go the other way around.

You start with a PoC using RAG and in-context learning, and then you might decide: hey, maybe we should fine-tune these models, bring in some of that context, potentially improve performance - so you start doing some fine-tuning, maybe occasionally, once a quarter, depending on your needs. And in some cases you might go as far as training your own models.

So now that we understand these data patterns, how they relate to each other, how you can use them as building blocks, and what capabilities you need for your data, I want to go back to what Siva was saying at the beginning about your overall data strategy, and what happens to it.

The reality is that your data strategy has to evolve too, and it evolves along a number of dimensions. One of them is that you now have to think about storing both structured and unstructured data. What did you do with unstructured data before - before thinking about gen AI? If you're like most customers I work with, you ingested it, maybe extracted some metadata from it, and stuck it somewhere - stored it - but it wasn't really part of your ongoing operational or governance processes; you didn't use it on a regular basis. Now, with generative AI, that raw data - in as raw a format as possible - is actually valuable, because it gives you the content, the nuance, the meaning, the context.

So you want to extend your data strategy and processes around that. You also want to store data both in its native format and potentially in vector format, and AWS provides a lot of capabilities to help you store and manage all of that data. And then you need to think about unifying this data: at the end of the day, whether you do RAG, fine-tune, or train your own models, you need a unified set of data.

So you have to think in terms of how you can unify your disparate data sources to provide a single view, and you need capabilities to prepare data for model training, fine-tuning, and vectorization. Again, AWS has a lot of capabilities in this space, from ETL tools to the zero-ETL integrations we're providing nowadays - for example between Aurora and Redshift - to let you move data as easily as possible.

And finally, governance: expand your security controls and compliance to these new use cases. You need to cover prompts and completions, think about data quality for labeling, and take responsible AI considerations into account - all of that matters. Think about that interaction Siva showed earlier, and imagine it's an enterprise use case. Now you have more personas with access to that data. In the past, applications accessed your data, along with data scientists, data analysts, and so on; now you have pretty much every person in the organization asking questions of the LLM and trying to pull that data. So there's a lot more that goes into access control and security.

Also think about the flow Siva showed at the beginning. The user asks a question - well, is that user allowed to ask that question? What is the meaning of that question? As you pull in semantic context, how much of that semantic context is the user allowed to see? All of that matters. And as the LLM brings the response back, is the user allowed to see the data the LLM produced in that response? So there are many more places where you need to introduce access controls.

That is another area where your data strategy has to evolve to account for all those pieces - and again, AWS has a lot of capabilities in this space.

Now, I know we covered a lot of AWS services in the context of these architectures, but you can use third-party or partner solutions for some of those components as well; we showcased our own capabilities in the slides, but you're not limited to them. If you want to know more about these technologies and capabilities, and dive deeper into how to actually build data-driven gen AI applications, here are some suggested sessions we recommend you attend. And with that, on behalf of Siva and myself, thank you very much for attending our session today.
