Building generative AI–enriched applications with AWS & MongoDB Atlas

Alright, hello everybody. My name is Ben Flast, and this is Seth Payne. We're from the product team at MongoDB, and we're excited to be here to talk a bit about building generative AI–enriched applications with AWS and MongoDB Atlas.

This is really going to be focused on Atlas Vector Search and a new capability inside of the platform called Search Nodes.

So the agenda for today starts with vector search: what are vectors, and what is vector search as a core primitive. Then we'll go into use cases and integrations, and follow that up with some unique requirements of vector search workloads.

Then I'll transition over to my colleague, who will talk about the challenges of scaling vector search workloads and this new primitive inside of the platform called Search Nodes. We'll finish by going deep on how Search Nodes unleash the power and flexibility of the MongoDB document model to power vector search workloads in a way that is expressive, scalable, and easy to use.

So to start: what are vectors, and what is vector search? Vectors, for those of you not familiar, are numeric representations of data and related context. What do I mean by that? I mean that a sentence or a text snippet like "quick brown fox" can be represented by a high-dimensional vector. In this case, the data would be the words themselves (quick, brown, fox, maybe the spaces), but the related context would be the meaning, the semantics behind that text.

For instance: "quick" is more akin to "fast"; "brown" is a color, not yellow but close; a fox is an animal. This is the related context you're now able to embed inside these high-dimensional vectors. Obviously this is a super powerful concept: representing semantic meaning inside a more structured piece of data.

So the first question you probably have is: how do you go about getting these vectors? The way you do that is you take data, whether it be text or audio or video, any type of data that has an embedding model trained on it, and put it through said embedding model to turn it into a vector. That's pretty much it.

What's really exciting recently is that there are all these off-the-shelf, general-purpose embedding models that can be used to embed your private data and create vectors from it.
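
To make that concrete, here's a minimal sketch of generating an embedding in Python, assuming the open-source sentence-transformers package; any hosted embedding API (OpenAI, Amazon Bedrock, and so on) follows the same pattern of text in, vector out:

```python
# Minimal sketch: turn text into a vector with an off-the-shelf model.
# Assumes `pip install sentence-transformers`; any embedding API works similarly.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a general-purpose model

vector = model.encode("The quick brown fox jumps over the lazy dog")
print(vector.shape)  # (384,) -- this model produces 384-dimensional vectors
```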

What's unique about the vectors you create is that vectors produced on similar data will be near one another in the high-dimensional space they exist in. What do I mean by this? If you project down into two-dimensional space, which is generally not what we're talking about here (these are high-dimensional vectors), you might have a representation of the word "man" as one vector, and it would be not too far from the representation of the word "woman", because the two are semantically similar or have some relationship.

This is a really powerful concept, but to push on it a bit further: these relations between vectors change based on how you embed the data and on the distance function you use to calculate what is similar, what is near. With points like man, woman, king, and queen, one embedding model and distance function might find that man and king are more similar, and woman and queen are more similar. With a different embedding model or a different distance function, you might instead find that man and woman are more similar, and king and queen are more similar. This is how vectors end up representing their underlying data.

All of that brings us to vector search, which is how we make use of these vectors and the relationships between them. This is typically done using an algorithm called k-nearest neighbors. Often when we say k-nearest neighbors, we mean exact: find me the set of nearest neighbors, ordered by distance from my target, by scanning all of them. This is helpful for finding pieces of data similar to my target.

For exact nearest-neighbor search, that translates to scanning through all of your data and computing the distance from your target vector to each vector inside your collection. It's a powerful concept, and it can be really useful on small sets of data, but it starts to struggle at scale.
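
For intuition, here's what exact k-nearest neighbors amounts to, a brute-force sketch in Python with NumPy (Euclidean distance, illustrative only):

```python
import numpy as np

def exact_knn(query: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k vectors closest to `query` (Euclidean)."""
    # Compute the distance from the query to every stored vector: O(n * d).
    distances = np.linalg.norm(vectors - query, axis=1)
    # Sort all n distances and keep the k smallest. This full scan is
    # exactly why exact kNN struggles as the collection grows.
    return np.argsort(distances)[:k]
```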

That said, at query time you need to define how you're measuring this distance, so alongside k-nearest neighbors is the distance function you choose. Right now inside Atlas Vector Search, which we'll get into in a second, we support three different similarity functions, which are the primary ones in use today.

The first is Euclidean, which is the distance between the ends of the vectors. The second is cosine, which is based on the angle between vectors. The third is dot product, which is based on the angle between vectors but also includes vector magnitude.
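
Written out in Python, a sketch of the three functions for two NumPy vectors looks like this:

```python
import numpy as np

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line distance between the ends of the vectors (lower = closer).
    return float(np.linalg.norm(a - b))

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between the vectors; magnitude is normalized away.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dot_product(a: np.ndarray, b: np.ndarray) -> float:
    # Angle plus magnitude: like cosine, but vector length also contributes.
    return float(np.dot(a, b))
```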

Now, these are just geometric functions, and it's a lot of math. But what does it mean for an application? Euclidean we see our customers primarily using for what we consider dense data, where values matter. By that I mean things like image similarity, where the presence of a pixel in one image and that same pixel in another image can be a strong signal that the images are the same.

The other is cosine, which we see as being good for sparse data where orientation is more important. What do I mean by that? Things like text, concepts, and themes. The presence of one word within a sentence can be very important to the meaning of that entire sentence, and that's an example where cosine is very effective.

And finally we have dot product, which is good for sparse data where both orientation and intensity matter. We hear our customers talking about this with respect to recommendation systems and personalization.

So we've talked about k-nearest neighbors, generally speaking exact, and we've talked about similarity functions. The last core piece of vector search knowledge I want to share is the concept of approximate nearest neighbor.

With k-nearest neighbors we talked about exact, which is costly. With approximate nearest neighbor, we have a new approach for finding similar vectors. This is done through what's called an HNSW graph, a Hierarchical Navigable Small World graph, which allows you to do this in an approximate manner.

So you don't scan all of the vectors in your collection to find the nearest set. Instead, you have a representation in an index that can be traversed much more cheaply to find nearest neighbors, without needing to scan all of them.

What you're doing in this scenario, though, is trading accuracy for performance. You're not getting all of the true nearest neighbors; you're maybe getting 90 to 99% of them. But instead of scanning all of your data every time you execute a query, you can answer very quickly, at millisecond latency, and power real-time application experiences, whereas exact nearest neighbor just wouldn't be realistic on huge data sets.
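
Atlas builds and manages the HNSW index for you, but to make the trade-off concrete, here's a standalone sketch using the open-source hnswlib package, where the ef parameter is the accuracy-versus-speed knob:

```python
import hnswlib
import numpy as np

dim, n = 384, 100_000
data = np.random.rand(n, dim).astype(np.float32)  # stand-in for real embeddings

# Build the HNSW graph once, up front.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)
index.add_items(data)

# ef trades accuracy for speed: higher ef = better recall, slower queries.
index.set_ef(64)
labels, distances = index.knn_query(data[:1], k=10)  # approximate top 10
```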

So this all brings us to Atlas Vector Search. But there's one key bit I want to cover before we go there, which is: why are we talking about vector search now, and why is it related to gen AI? The reason is that vectors have been supported in MongoDB since the very beginning. You probably think of them in terms of geo data: two-dimensional vectors representing points and places on Earth.

What's changed is that these generative models can now embed a ton of semantic meaning inside these high-dimensional vectors, so you can do so much more with them.

In the early 2000s, we were doing things like manual feature engineering and bag-of-words approaches, or TF-IDF. Then in the 2010s we got word2vec and GloVe, coming out of Google and elsewhere, followed by contextual word embeddings, which continued to improve the performance of these embedding models.

But we really saw a sea change in 2018 with Transformer models, which I'm sure you're all familiar with: things like BERT and GPT, et cetera. That continued into 2020 with large transformers, and into 2023 with GPT-4 and Llama.

With this evolution in machine learning, a huge amount of context can now be embedded inside these high-dimensional vectors, which makes vector search so much more useful and so much more interesting.

With all of that, we've covered the background, and we can get into what Atlas Vector Search actually is.

For those of you familiar with MongoDB, you may be familiar with an architecture that looks like this: you have a client, and it's reading and writing from your database.

The way things change is that the client now needs access to an embedding model. When it writes data to the database, it creates a vector along with it, inserts that vector into the document, and then inserts the document into the database. Upon reading, it takes its query, creates a vector from that query, and uses that to read from the database. This is what the high-level architecture of using vector search with MongoDB now looks like.
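
In driver terms, the write path and read path look roughly like this, a Python sketch where the connection string, collection names, and the embed() helper are all placeholders:

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
coll = client["app"]["articles"]

def embed(text: str) -> list[float]:
    """Hypothetical helper wrapping whatever embedding model you use."""
    ...

# Write path: create the vector alongside the data, then insert the document.
doc = {"content": "The quick brown fox jumps over the lazy dog"}
doc["content_embedding"] = embed(doc["content"])
coll.insert_one(doc)

# Read path: embed the query text, then use it to search (shown later with
# the $vectorSearch stage).
query_vector = embed("fast animals")
```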

The way this is implemented is that you have documents inside of MongoDB just like this, with fields like symbol and quarter, and then, let's say, a content field that holds unstructured data. You want to be able to search on that data, which has historically been a bit of a challenge; you can use text search and other capabilities, but with vector search you now have a new way of doing it.

What you do is, alongside that content field, create a new field, content_embedding (you can name this field whatever you want), and inside of it is an array which is your high-dimensional vector. More dimensions means more floating-point numbers; that's the way to think about it.
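
So a document might look like the following sketch (field names follow the example; the values are hypothetical and the vector is truncated for space):

```python
doc = {
    "symbol": "MDB",
    "quarter": "Q3",
    "content": "Unstructured text you want to search semantically...",
    # The high-dimensional vector: one float per dimension. Real embeddings
    # run to hundreds or thousands of floats.
    "content_embedding": [0.0123, -0.0456, 0.0789],  # truncated
}
```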

Once you've done that, you'll go and create an index definition, and the type of that index definition will be a vector search index. You'll specify the fields you want to index: you'll say the type is vector, the path is the field inside your documents where those vectors live, and you'll indicate the number of dimensions in those vectors, which must be the same for all vectors in that collection for that specific index.

Then you'll decide on the similarity function you want to use at query time to find your nearest neighbors. Once that's done, behind the scenes we'll go and build a vector index that you'll then be able to query.
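
A sketch of such an index definition, matching the example document above (you can create it through the Atlas UI or programmatically; the dimension count assumes the 384-dimensional model from the earlier sketch):

```python
index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "content_embedding",  # where the vectors live
            "numDimensions": 384,          # must match every stored vector
            "similarity": "cosine",        # or "euclidean" / "dotProduct"
        }
    ]
}
```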

The way that works is you'll use our new $vectorSearch stage to execute a query across this data. The first thing you define is the index you're going to use. Then you input your query vector, the high-dimensional array representing what you're searching for, and the path, which is the field inside your documents where those vectors live.

The next fields are what's really interesting inside $vectorSearch, to me at least. The next two are numCandidates and limit. limit is how many results you want back from the $vectorSearch stage, either to go into the next stage of the aggregation pipeline or to come back to the client application. numCandidates is how many candidates we consider as we traverse that HNSW graph, which is how you tune your performance-to-accuracy trade-off: how fast you want the query to run versus how good you want the recall to be against this approximate nearest-neighbor algorithm.

This is a powerful capability that lets you tune vector search for your specific use case. I'll share that, anecdotally, we see a ratio of numCandidates to limit somewhere between 10:1 and 20:1 as the right trade-off to achieve high recall. That's something to keep in mind if you're looking for really high recall across all of your vectors when you're searching.
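
Putting it together, here's a minimal $vectorSearch query, continuing the hypothetical names from the sketches above (note numCandidates at roughly 15x the limit):

```python
pipeline = [
    {
        "$vectorSearch": {
            "index": "content_vector_index",       # hypothetical index name
            "queryVector": embed("fast animals"),  # the query, embedded
            "path": "content_embedding",
            "numCandidates": 150,  # ~10-20x the limit, per the guidance above
            "limit": 10,           # results returned from this stage
        }
    }
]
results = list(coll.aggregate(pipeline))
```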

Lastly, we have a filter field, and this filter field is particularly exciting to me for two reasons. One, it's a pre-filter: as we traverse the vector search index, any documents that don't match this filter are removed, and you still get back the 10 documents you're looking for. So it's very efficient.

In addition, we've allowed you to use the MongoDB syntax you're very familiar with to implement this filter. You can use $gt, $lt, et cetera, to filter your data as you traverse the index.

Now, that's basically the net of the $vectorSearch stage. But the one thing I'd be remiss not to mention is that all of this is available as a MongoDB aggregation stage, inside the aggregation framework.

So, as a simple example, if you wanted to do some post-filtering after the $vectorSearch stage, you could just add another stage on top of it, a $match, and filter the result set coming out of $vectorSearch. That's a very simple example, but there's a ton of power inside the aggregation framework, which you may or may not be familiar with, to do a variety of data transforms all in one pipeline.
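
Here's a sketch combining both: a pre-filter inside $vectorSearch and a post-filter $match on the search score. One caveat worth noting: fields referenced in the pre-filter must be declared as type "filter" in the index definition.

```python
pipeline = [
    {
        "$vectorSearch": {
            "index": "content_vector_index",
            "queryVector": embed("fast animals"),
            "path": "content_embedding",
            "numCandidates": 150,
            "limit": 10,
            # Pre-filter: applied while traversing the index, so you still
            # get 10 matching results back.
            "filter": {"quarter": {"$gte": "Q1"}},
        }
    },
    # Post-filter: an ordinary $match over the stage's output, here using
    # the similarity score exposed via $meta.
    {"$project": {"content": 1, "score": {"$meta": "vectorSearchScore"}}},
    {"$match": {"score": {"$gte": 0.75}}},
]
```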

All of this is accessible on the back of $vectorSearch, and this composability is one of the primary benefits of having this capability inside aggregation.

So, to sum it up: with Atlas Vector Search, your application communicates directly with MongoDB Atlas, data is automatically synced between the database and your vector search index, and as a developer you work with the database and vector search via a single unified query API, with one set of drivers and tools. And all of this is fully managed behind the scenes.

You don't need to worry about keeping anything up or available to ensure your application is there for your customers.

I'd also be remiss not to mention that all of this comes to you inside Atlas, a platform you can run anywhere, with comprehensive security and privacy controls, deployable in 100-plus regions across three different cloud providers (letting you get the best from whichever provider you choose), with continuous uptime through advanced automation, ensuring you have the performance you need under all conditions.

So that's the capability. To transition into use cases and integrations, I want to cover a bit of what we're seeing customers do with this service today.

With vector search, you can implement use cases like semantic search, question-answering systems, feature extraction, recommendation systems, synonym generation, and multimodal search. And maybe most excitingly, and what the title of this presentation is about, you can use it to implement large language model memory, or retrieval-augmented generation.

To step back and talk about where retrieval-augmented generation comes into play: it's really focused on letting you, as an application developer, make the most of using LLMs inside your applications.

These foundation models, generic AI and ML models, are amazing tools now available to everyone, and they provide a great amount of general information and reasoning. But they have a few faults: they have training cutoffs; they're missing your private data (which is a good thing, and we want that to remain the case); they're unfocused; they can often hallucinate; and they're not personalized.

These challenges make it hard, in many cases, to bring a plain vanilla LLM into your application. What you need to do is augment it with context. And how do you do that? You do it with vector search.

This allows you to bring in context and focus with proprietary business data to really enrich the experience of using these large language models. You can include company-specific data, product info, order history, even secure user data, without risking it getting into the model itself.

What you get by combining vector search with large language models via retrieval-augmented generation is a really transformative AI-powered application experience: it's refined, it's consistent, and it's accurate.

And we're really just at the beginning of what this all means. Today, retrieval-augmented generation often refers to chatbots and other small-scale implementations, but it's getting much larger and much broader, and it's going to impact every realm of the application development space.

So how does this work? We talked before about how the client interacts with MongoDB and vector search. The way you implement retrieval-augmented generation is typically through an architecture that looks something like this.

You have your client application, and it provides a question, a raw one, to what is often an LLM app framework, which I'll get into in a second. That app framework sends the question to vector search to perform semantic search and get contextual data back.

The app framework then includes that contextual data, along with any other prompt tuning you're doing, in a request to the large language model. The large language model provides its inference and sends the result back to the app framework, and that result, or response, goes back to the client.

This is what it means to do retrieval-augmented generation using MongoDB Atlas and one of the many LLM app frameworks out there.
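
Stripped of any framework, that loop is only a few lines. Here's a sketch reusing the hypothetical helpers from earlier, with call_llm() as a stand-in for whichever model provider's API you use:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whichever LLM API you use."""
    ...

def answer(question: str) -> str:
    # 1. Retrieve: embed the question, pull semantically similar documents.
    docs = coll.aggregate([
        {"$vectorSearch": {
            "index": "content_vector_index",
            "queryVector": embed(question),
            "path": "content_embedding",
            "numCandidates": 100,
            "limit": 5,
        }},
        {"$project": {"content": 1, "_id": 0}},
    ])
    context = "\n\n".join(d["content"] for d in docs)

    # 2. Augment: include the retrieved context alongside the question.
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generate: the LLM produces a response grounded in your data.
    return call_llm(prompt)
```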

And there's so much happening on that topic. This is a really fast-moving and exciting ecosystem, these app frameworks are extremely important, and that's why we're thrilled to be partnered with so many of them.

First off, we have MindsDB, who are looking to bring machine learning closer to the database, and Nomic, who are doing really interesting things with embedding generation and visualization. Then we have partners in the form of LlamaIndex, LangChain, and Microsoft Semantic Kernel.

Each of these is an LLM app framework that helps you orchestrate the transfer of data between vector search and large language models, letting you really easily build these retrieval experiences on top of your proprietary data.

We have support in each of these integration providers, and we're continuing to contribute to those projects to ensure the integrations are both stable and really useful. We're extremely excited about the partnerships we're making across the space when it comes to vector search.

With all that in mind, I want to cover one more topic before I hand it over to my colleague: challenges with vector workloads.

Now, everything I just showed is super exciting. It can unlock a ton of power from your data, which really is your durable differentiator when it comes to building businesses on top of generative AI. But that's not to say there aren't challenges.

Fundamentally, we've focused on giving you a very easy-to-use, expressive primitive for invoking vector search. But what can often happen with an approach like that is you end up with something that's, let's call it, a bit of a monolith, and it has to be scaled all together.

Why this is challenging for vector search is that when you include it in your database, you're tying the same resources used for search to your operational workloads. You need to traverse the index to find results, vector indexes can get large depending on the number of vectors and the number of dimensions, and the entire index needs to fit into memory to be fast.

These are the challenges that come up when scaling these workloads, and some of them can be hard to handle with a coupled architecture.

So I'm going to turn it over to Seth now to talk about what's unique in our approach that enables you to scale in a much more differentiated manner.

As Ben mentioned, you may have a great vector search application, but if you don't have the right system to back it up, you're not going to have a good time. You have to have the right system to support the applications you want to develop.

As developers and DBAs saw this vector search opportunity emerge, the response was to add a bolt-on vector database that interacts somehow with your transactional database, where you might store your vectors. This of course requires some level of ETL; it also requires multiple query languages and multiple drivers. There's a lot that goes into this.

You end up with a very complex system where you have to manage sync between these two database systems and manage all sorts of things that can go wrong in an ETL pipeline. How do you keep your index updated? What are the rules for doing that?

Again, it can be an overwhelming challenge to put this system together. But what if we could combine these and have both a transactional database and a vector database together?

In that case, you would have one language to work with, one API, one set of drivers. And we've seen, talking to our customers in the beta program and in this preview, that folks get to market significantly faster with a unified system than with a multi-technology system.

So let's talk briefly about application architecture. I've touched on this before, but the complexity of writing an application that relies on two systems, pulling that data together and making sense of it on the application side, and again having to sync that data between the systems, is a challenge. It increases the complexity of your application and how you develop it.

In Atlas Vector Search, we've combined MongoDB with vector database capabilities. We have an integrated platform where, if you're already familiar with MongoDB, you can start using vector search right away.

How do we do this on the back end? When you enable vector search in Atlas Search, we do an initial scan and sync of the data and create a Lucene index based on the contents of the collection. Then we use change streams to continually update that index and keep it in line and synced with the source collection.

The result is that all you have to worry about is getting data into MongoDB, into that collection. Once it's there, we take care of the rest: updating the index and keeping it synced with the database. You don't have to worry about that at all.
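
Atlas runs this loop for you, but for intuition, here's roughly what change-stream-driven index maintenance looks like. collection.watch() is the real change streams driver API, while search_index is a hypothetical stand-in for the managed Lucene index:

```python
# Illustration only: Atlas performs this kind of sync behind the scenes.
with coll.watch(full_document="updateLookup") as stream:
    for change in stream:
        if change["operationType"] in ("insert", "update", "replace"):
            # Upsert the new or changed document into the search index.
            search_index.upsert(change["fullDocument"])
        elif change["operationType"] == "delete":
            # Remove deleted documents so the index stays in line.
            search_index.delete(change["documentKey"]["_id"])
```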

Again, this simplifies the application development process, which results in faster time to market for your applications.

Now, there's another challenge that comes along with vector search workloads, and that's: how do I scale them effectively?

Well, in Atlas Search, we have an architecture that allows you to scale your search workloads and your database workloads completely independently of one another. So you have two different knobs you can turn here.

What we see in practice is that a lot of our vector search customers are able to run a relatively small MongoDB cluster alongside a very significant search footprint.

The way we handle that is through dedicated Search Nodes. For each MongoDB cluster you have, you can choose to add 2 to 32 Search Nodes. And as Ben mentioned, vector search requires significant amounts of memory, because you need to be able to load that entire index into memory for the best performance.

We offer memory-optimized nodes, so you have different choices in the size of your memory footprint based on the size and nature of your index.

Again, as I mentioned, we can scale this out to 32 nodes, which of course helps support high query volumes, and we round-robin queries across all of them.

So this is something we're very excited to see: vector search and dedicated Search Nodes coming together. They augment one another, and I think they're going to be essential to vector search workloads moving forward.
