Improve your search with vector capabilities in OpenSearch Service

Good afternoon, everybody. Thanks for coming out today, really appreciate it. I'm John Handler, and I'll be speaking today about Amazon OpenSearch Service. I'll be joined by Achal Kumar from Intuit and Aruna Govindaraju from the OpenSearch Service team.

We've got a great session for you today, and I'm really looking forward to it. So raise your hand if you have not heard the words "Gen AI" this week. Oh, somebody raised their hand in the back.

You know, honestly, Gen AI and especially large language models have really turned the world a little bit on its head. My background is actually in artificial intelligence, but I grew up in the nineties when we did artificial intelligence with Lisp and symbols. I was in an NLP lab, and it's a hard problem, and what we've got with these large language models is actually pretty jaw-dropping.

So I kind of understand why, as we come to 2023, almost 2024, we're hearing a lot about LLMs. Their natural language processing capabilities have brought a lot in terms of chatbots, AI assistants, and the various ways you can use this natural language capability to build on top of and bring really cool stuff.

But large language models are distinct from search. OpenSearch is not a large language model; it can integrate with large language models, and we'll hear how, by bringing in vectors of natural language text to provide better matching. This is what we call semantic search: the LLM captures the semantics and the concepts behind those chunks of text and enables matching, not in a strictly text-to-text way, but in a vector space that captures some of the semantics, some of the context, and some of the concepts behind the text itself.

AI/ML techniques have been used in search applications for a long, long time, and we support a lot of those too. They bring you things like segmentation, user preferences, and collaborative filtering. There are tons of different techniques from the AI/ML world that come to OpenSearch.

Today, we're really mostly going to talk about this vector component and how it improves the relevance of your results. So just a quick word, since maybe not everybody is familiar with Amazon OpenSearch Service. OpenSearch itself is an open source, Apache 2.0-licensed project. It's a suite comprising a search engine, OpenSearch, and a UI that enables visualizations and dashboarding, OpenSearch Dashboards. Those are the two main components, and there are a number of other things in the project as well.

Our two main workloads are, on the one hand, search: you flow your data into OpenSearch, OpenSearch creates indexes for that data and enables you to query it to bring back relevant results, supporting metadata search as well. On the analytics side, we have a very common workload, which is to bring in logs. Five or ten years ago, somebody had the bright idea: I want to find this error in my logs, but I don't want to grep across 10,000 web servers, right? So I'll throw it in a search engine. OpenSearch supports flowing in log data (it has to be formatted as JSON) and building visualizations and dashboards on top of it, enabling you to monitor your infrastructure, your applications, and your security, all in real time, in these dashboards.

We now have a number of integrations. I don't know if everybody heard the announcements: we have a zero-ETL connector between DynamoDB and Amazon OpenSearch Service, we have zero-ETL querying of Parquet data in S3 in preview, and a number of other integrations that make it a lot easier to query that data without actually storing it. That makes it a lot more cost-effective; we are constantly working on cost-optimization features and techniques.

Amazon OpenSearch Service comes in two flavors. We have our serverless offering, which lets you just use OpenSearch without worrying about servers, index strategy, or shard strategy (we'll get into all that as well). And we have a managed domains option, where you specify a set of infrastructure, we go and deploy it, and you use OpenSearch that way.

So, to summarize what OpenSearch provides in terms of vector capabilities: first of all, it is an Apache 2.0-licensed open source project, with more than 300 million downloads and more than 70 partners. We provide a number of capabilities through our k-nearest neighbors (k-NN) plugin: we support the Faiss and NMSLIB engines, the HNSW and IVF algorithms for approximate vector matching, and exact k-nearest neighbor search. So we have deep capability here; this has been in our service for three-plus years, and it's a very mature technology providing you with a vector engine that you can use to do this nearest-neighbor kind of search.

We also support ML Commons on the open source side; you can use ML Commons with other techniques, linear regression and what have you. We support the Learning to Rank plugin; Learning to Rank brings in user behavior to rank results based on past interactions with your search data. And finally, we've come out with the Neural plugin in OpenSearch 2.11. Neural, as we'll see, simplifies the process of bringing in data, vectorizing it, and running vector queries against that data.

So OpenSearch itself is the open source project, and Amazon OpenSearch Service is the managed service for that project. We have our fully managed domains as well as our serverless deployment option. We have connectors to third-party hosting services; at present, we support connectors to SageMaker, to Bedrock, to OpenAI, and to Cohere. So you can host the model externally, you can train and fine-tune the model, and OpenSearch is able to connect to those services; they'll provide the embeddings to the service.

We support the k-NN plugin, the Neural plugin, and the LTR plugin; all of those are ML capabilities to improve your results. On the serverless side, it's an auto-scaling serverless experience. Our vector engine for Amazon OpenSearch Serverless also went GA yesterday. With the vector engine, you can flow in your data without having to figure out servers or deployment or anything like that, and it supports the k-NN plugin.

So coming back to the point: when you hear "Gen AI," generative is the key piece of generative AI, right? If something is generating text, or generating an image, or generating a video, that's Gen AI. Semantic search is really more about getting better search results. Semantic search is about bringing in your data, getting those vectors from the LLM that capture the context, and then matching the query against that. It helps improve search relevance; actually, it's quite startling how much better it does on a lot of the standardized tests, and we'll see that.

Then we have generative AI. OpenSearch is also spoken of in tandem with generative AI. With generative AI, you're sending a prompt to an LLM for that LLM to generate some piece of text, whether it's a piece of code, a summary, some kind of poem, or what have you. To improve the results that come out of the LLM, you feed data into the prompt, which makes the LLM more accurate. LLMs suffer from what are called hallucinations, where they generate text confidently stating a false fact; to improve that, you modify the prompt with real information, and the LLM then gets more accurate. OpenSearch serves as one of the knowledge bases commonly associated with LLM systems, chatbot systems, AI applications, or what have you, to feed data in. In that context, you frequently use semantic search in OpenSearch to retrieve the best results to add to the prompt for the LLM, so it generates good textual output.

So here's an example, and we're going to go through a bit of the state of the art before LLMs, just so you can understand the change in what happened. Find me an 8 ft blue sofa. OK, how do I do that? I go to amazon.com, of course, and I type "8 ft blue sofa" into the search box. In the back end, the search engine takes those words and matches them against the product titles, the product names, and the product descriptions, and comes up with a set of results based on how many matches there were, ranked by a relevance function that we'll get into. So we are matching query terms to document terms directly; you could call this text-to-text matching.

So what is relevance? What makes something better or worse as a search result? It's a funny thing: there's no non-human-centric way to measure relevance. Relevance is always a human-centric concept. Something is relevant to me if it meets my goals in querying the engine in the first place; if I see an 8 ft blue sofa, that is more relevant for me than an 8 ft chartreuse sofa, right? So what we're looking at is how we bring back the things that match my goals and allow me to do the task I want to do.

So let's talk about the basics. OpenSearch Service works with JSON. This is pulled from a product question-and-answer public data set. This is just one document: there's a question in there and there's an answer in there.

High-level point: OpenSearch exposes a REST API and works with JSON documents. When you send your documents to OpenSearch for indexing, you put them in JSON and you use that REST API. OpenSearch then sends those documents to an index. The index is the core structure; you can think of it like a database table. You send your documents in, and OpenSearch indexes all of the fields in the JSON, making them retrievable.
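Just to make that concrete, here's a minimal sketch of what indexing a single document looks like over the REST API; the index name and fields are made up for illustration:

```
PUT /qa-products/_doc/1
{
  "question": "Will this sofa fit through a 30 inch doorway?",
  "answer": "Yes, the arms detach so it fits through standard doorways.",
  "product_id": "B00EXAMPLE"
}
```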

The index itself is composed of shards. Shards are partitions of the data. Simply put, you specify a primary shard count when you create an index, and OpenSearch distributes the data across those shards. You can also set a replica count; we recommend that you use one replica, which gives you redundancy. OpenSearch distributes the shards onto data nodes, and it does so in a way that a primary and its replica are never on the same instance, so if you lose an instance out of the cluster, OpenSearch will be able to recover that data.
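For reference, a sketch of creating an index with an explicit primary shard and replica count; the index name is illustrative:

```
PUT /qa-products
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}
```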

Below the shard level, each shard is an instance of a Lucene index. Lucene is a Java library that's been around for 20 years; it reads and writes search indexes. So at its core, OpenSearch is a distribution layer for Lucene, which actually does a lot of the work of processing requests.

Here we have a document, it's got some fields, and we're going to send all of that through analysis, especially for text. A lot of what OpenSearch is doing is breaking that text down in a language-aware way to make it good for matching, so that when I type "8 ft," I can still match "foot." We'll see how that works.

The core search engine structure is the inverted index. It's called "inverted" for some historical reason; it's not actually inverted. We take all of the words that are in a field and we collect the documents that contain those words. That's the inverted index: I can now look up a word and find all the documents it appears in, like an index in a book. I go look up Barack Obama in the encyclopedia (remember encyclopedias?) and it says go to page 54. It's exactly like that.

To do query processing, we start with the query text. And because we've analyzed the text that we're matching against, we analyze the query as well, in this case "how does albedo diffuse radiation." You can see that when we run the analyzer, it actually outputs some kind of funny stuff, something like "how doe albedo diffus radiat." What we're doing here is applying English stemming rules, which means I can match run, runs, running, rant (well, not rant) together, because the ending, the inflection, is not actually important for the match. So the English analyzer brings words to a common form and makes them searchable.
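If you want to see this for yourself, the _analyze API shows what the English analyzer does to a piece of text (the exact token output depends on your OpenSearch version):

```
GET /_analyze
{
  "analyzer": "english",
  "text": "How does albedo diffuse radiation?"
}
```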

I then go look up the query terms in the inverted index. This portion of query processing is very fast; it's log(n). OpenSearch loves words. You might think numbers are faster; actually, words are great. So it looks up the terms, gets the posting lists, and then applies the query logic, in this case how AND does AND albedo, et cetera.

So it's going to do an intersection between all of the posting lists. This is a linear process in the length of the posting lists. It then scores and sorts the results and pushes out the final answer; scoring and sorting is n log n.

Generally speaking, your query latency is very tied to how many matches you made, because of this scoring and sorting. So again, just an example: all of this came from Wikipedia data, in this case a piece of the albedo entry. There's the standard analyzer, which is not language-aware; it just splits on whitespace, lowercases, and removes punctuation. And there's the English analyzer, which again gives us these kind of weird-looking terms.

We run our query text through the analyzer as well, and you can see we have a set of matches here. For search engines, the way I define relevance is to say: well, "albedo" matched a lot more terms than "albert einstein" did, so probably that's a better match. Now, what exactly do we mean by better? The text-to-text ranking function in OpenSearch is called Okapi BM25. It's an instance of what we call TF/IDF. I know, formulas make me squint, they make my head hurt, so I'll go through it for you.

IDF is inverse document frequency: in how many documents does this term appear? We take one over that, so a very rare term gets a relatively higher score than one that appears a lot. OK, that's IDF. TF is term frequency: how often did the term occur in this document? So the score is essentially the number of occurrences of each term multiplied by the value of that term, summed up.

Now, we have a bunch of other math here, which I'll summarize by saying you don't actually want a term to count too much. If it occurs 500 times in this document, at some point you want to level off how much it's worth; the term's contribution should saturate rather than keep growing.
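For reference, the standard form of the Okapi BM25 score captures exactly those pieces: the IDF term for rarity, the term frequency in the numerator, the $k_1$ parameter controlling saturation, and $b$ controlling length normalization:

```
\mathrm{score}(d, q) = \sum_{t \in q} \mathrm{IDF}(t)\cdot
  \frac{\mathrm{tf}(t, d)\,(k_1 + 1)}
       {\mathrm{tf}(t, d) + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \ln\!\left(1 + \frac{N - n_t + 0.5}{n_t + 0.5}\right)
```

Here N is the number of documents, n_t is the number of documents containing term t, |d| is the document length, and avgdl is the average document length.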

So, basic idea: how rare is the term, how often does it occur in this document, a little normalization on the side, and that's how you get your score. When we rank, again, albedo versus albert einstein, albedo wins out based on that. Text-based relevance with BM25 has been around for years, and actually, as a ranking function, it does very, very well on standardized tests. It wasn't until recently that even vectors were approaching the accuracy of TF/IDF. There are lots of papers on the information-theoretic reasons why TF/IDF works well; it's a very interesting topic if you want to dive in. Fundamentally, you also have cases where an exact match is the right thing: if I type "8 ft blue couch" into amazon.com, really matching the terms "8 ft blue couch" is probably the right thing. So it's a good function, especially for exact matching.

But what if the query doesn't have those terms? Or what if the terms are really just an approximation of what I'm trying to search for? The intent of the query, which is essentially my goal in writing the query, can be hidden behind the words. And this is where we start to get into the vector space, because the vectors from the LLMs can capture a lot of that intent in the way they relate the semantics and the term neighborhoods of what they're bringing in.

So vectors generated by LLMs capture that intent. What if I want to find a cozy place to sit by the fire, and in my head I'm thinking "8 ft blue couch"? This is again where we get to vectors, and it can seem a little bit off-putting and maybe scary; there's a lot of terminology, and it's a big area of science that's been going for years and years. So, to simplify things a little bit: what is a vector? A vector is an array of floating-point values. That's all it is. I send off a chunk of text to an LLM, I get back a big array of values, and somehow the LLM is capturing, in those values, whatever is behind that text. It's doing something; actually, nobody really knows exactly what it's doing, but it's doing something, because it works. A model is a function: an LLM learns the relationship between the inputs and the outputs, that's all it does. The way LLMs are trained, you run text through them and they try to predict what's next, generating the next token based on the weights they're building internally. A large language model encodes and decodes text, and the embedding is a vector. That's all it is.

Then we have a number of different engines and algorithms, and that gets confusing too. The engines we have are Lucene, Faiss, and NMSLIB; those are the engines. The algorithms are HNSW and IVF, and we can also do exact matching. Then we also define a similarity metric. There are about ten of them, and they vary: it could be how close the vectors are in Euclidean distance, or cosine distance, dot product, Manhattan distance. There are all kinds of measures we use for similarity between two vectors.

So just to demystify a little bit: that's all it is. Actually, that's only a small part of it; this is a 1,536-dimension vector, truncated at about 200 values. But this is what the LLM generates, this is what we use for matching, this is how we determine distance. Looking at the algorithms, we have exact k-NN, which is: find me the exact nearest neighbors. That's useful in some contexts; we'll put it to the side. Then we have approximate k-NN, and again, k-NN stands for k-nearest neighbors: I'm looking for the k closest neighbors to a particular point in that big vector space. Hierarchical Navigable Small Worlds (HNSW) is an algorithm that builds graphs in order to traverse the vector space in an efficient way. With HNSW you have multiple levels of granularity: we start at a very coarse level, traverse as close as we can, drop down to a finer grain, traverse the neighborhood there, and drop down again, as a means of pulling out the closest neighbors. We also have IVF, the inverted file algorithm. With IVF, we use clustering and find a centroid for each of multiple clusters of points in this big vector space. At search time, we start by looking at the centroids, find the closest centroid, and then explore that local area. All of this is for efficiency and the ability to actually find these things in an enormous 1,536-dimension space.
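To make that concrete, here's a minimal sketch of a k-NN field mapping where you pick the engine, algorithm, and distance metric; the index name, field name, and HNSW parameters are illustrative:

```
PUT /vector-demo
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "my_embedding": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": { "m": 16, "ef_construction": 128 }
        }
      }
    }
  }
}
```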

Just to throw an eye chart at you: we support the NMSLIB HNSW implementation, Faiss HNSW (I can't talk), Faiss IVF, and Lucene HNSW. You can see we have various maximum dimensions, different filtering, different training requirements, and different distance formulas we can use. I'll let you come back to this later, but this is the sum of what we do. Our Neural plugin, again, simplifies the process of bringing in textual data, sending it off to a third-party connector, and bringing the embedding back to OpenSearch.

In this case, we have a pre-trained model that's in SageMaker, we have a connector in OpenSearch Service, and we have a set of business documents. We send those business documents to OpenSearch, OpenSearch automates the call to SageMaker, bringing back the vector, and from our front end we run a query. Let's look at it in a little bit more schematic way.

On indexing, we're sending in a source document. Within that document, we have to have a chunk that is representative of the document; that's the chunk we're going to embed. That chunk goes to Bedrock, Bedrock generates a vector, we augment the document with that vector, and then we index it in OpenSearch Service. On the search side, we have some user query; we send that query to Amazon Bedrock, Amazon Bedrock creates the embedding out of it, we send that to our k-nearest-neighbor index to find the neighbors for that vector, and we use those as the search results.

You can do all of this manually: you can build your own embeddings and stick them in your documents. But again, we built the Neural plugin to simplify this process. Looking at it in code: first we create a mapping; the mapping is the schema you set for your index in OpenSearch. We set the field type to knn_vector and index.knn to true, so we just tell OpenSearch we're going to do k-NN here. We have our plot_text and plot_embedding fields, and we build an ingest pipeline that sends the text off based on a model ID. So all you do is specify the model ID, link it in via the ingest pipeline, and this all goes through to the third-party host.
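A minimal sketch of that ingest pipeline and the index that uses it; the pipeline name, index name, field names, and model ID placeholder are illustrative, and the pipeline is wired up here via the index's default_pipeline setting:

```
PUT /_ingest/pipeline/nlp-pipeline
{
  "description": "Generate embeddings for plot_text at ingest time",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<model_id>",
        "field_map": { "plot_text": "plot_embedding" }
      }
    }
  ]
}

PUT /movies
{
  "settings": { "index.knn": true, "default_pipeline": "nlp-pipeline" },
  "mappings": {
    "properties": {
      "plot_text": { "type": "text" },
      "plot_embedding": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": { "name": "hnsw", "engine": "faiss" }
      }
    }
  }
}
```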

We query with the neural query: we tell it the vector field, the model, and the text we want to query with. Again, OpenSearch does the embedding generation and does the k-nearest-neighbor match. So it's fairly simple to use. Another eye chart for you.
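The query side, sketched with the same illustrative index, field, and model ID:

```
GET /movies/_search
{
  "query": {
    "neural": {
      "plot_embedding": {
        "query_text": "a cozy place to sit by the fire",
        "model_id": "<model_id>",
        "k": 10
      }
    }
  }
}
```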

NDCG@10 is a measure based on a golden set, one where you've pre-calculated the relevance of your answers: how much of that relevance is in the result set? It's normalized, it's discounted; there's a lot of math that goes into it, but fundamentally you're looking at how relevant the response was according to a human-judged data set. We tried this against a number of different corpora with a number of different blending mechanisms for BM25 and vector scores, and you can see here we get about a 14 to 15% improvement in relevance from a joint score between BM25 and the vector match. These are pretty shockingly good numbers; this is on the BEIR benchmark.

I encourage you to check out the paper and the blog post referenced at the bottom; they're really good sources for how we ran this, what we did, and what our results were. How many people have heard the term "sparse vector"? Some. I was mystified by sparse vectors; there's so much hype that sometimes it's hard to cut through it.

A sparse vector comes from pulling the vector out of a different layer of the LLM; the dense embeddings we've been talking about come from deeper in. So we have our vector space here: republic, federalism, chair, couch, ottoman, what have you. What a sparse vector is, is a set of things that are words (or may not quite be words, they could be partial words) with weights. That's all it is. It's called a sparse vector because the nonzero entries correlate with the input text, and you don't store all of the zero weights you would otherwise have; we just have terms and weights. When we rank with sparse vectors, the relevance is the dot product of the query's sparse vector (which we also generate) with each document's sparse vector. This computation is a lot cheaper than all of that graph traversal or clustering. So there are a lot of benefits: it improves the scoring, but it also preserves text-to-text relevance, because you have something that approximates the tokens matching. It uses much less RAM; the dense embedding methodology, when we're building those graphs or doing that clustering, needs RAM for that, whereas this fits into the Lucene data structures, so we don't use tons of RAM to store these. It's faster than dense vector ranking, close to BM25 speed, which is great. And we get improvements of 10 to about 17.5% over BM25 or dense vector ranking, depending; this is all on BEIR again. So sparse vectors are available in OpenSearch, and you engage with them exactly the same way you engage with dense vectors: you create a connector and you use that connector.
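A sketch of what that looks like with OpenSearch 2.11's sparse retrieval support, assuming you've already registered a sparse encoding model and indexed documents through a sparse_encoding ingest processor; the index, field, and model ID names are illustrative, and the exact syntax may vary by version:

```
GET /qa-products/_search
{
  "query": {
    "neural_sparse": {
      "answer_sparse": {
        "query_text": "8 ft blue couch",
        "model_id": "<sparse_model_id>"
      }
    }
  }
}
```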

I'm a little bit over, so I'm going to go quickly. I wanted to give you two use cases; the source for these is a blog post you can see down there. Amazon Music is using Amazon OpenSearch Service for vector search across 100 million music tracks, 1.05 billion vectors, at 7,000 queries per second. Fairly beefy. This is for collaborative filtering.

IP infringement, also at Amazon.com: 8 billion items scanned daily, looking for fraudulent products and the like. 99% of infringements were discovered with this OpenSearch Service-backed fraud detection: 68 billion vectors and 8 billion queries daily.

So that's my part. I'm going to ask Achal to come up and talk about vector search at Intuit.

Thank you, John. And hello, everyone. That was a really great talk; next time, can you dumb it down for me a little bit? It's amazing. Thank you. I learned a lot today.

So hi, everyone. I'm from Intuit, and I'll talk about how we leverage vector search at Intuit.

So first, for people who don't know about Intuit: we have four different products, TurboTax, Credit Karma, QuickBooks, and Mailchimp. And who do we serve? We serve consumers, small businesses, and the self-employed. Intuit is a global financial technology platform that powers prosperity for people and communities.

The reason we have been able to do that, and we'll get into more details, is that for the last 40 years the mindset has been to leverage the latest technology trends as they come. Five years ago, our CEO said we have to become an AI-driven expert platform, and we started investing a lot in AI and data infrastructure. We have so much data about our customers, and now it's paying dividends.

You can see that in the last seven to eight months we have built a GenAI platform, which is powering our developers to build cool generative AI experiences for our customers.

Intuit's mission is to power prosperity around the world. And this moment, where we have a huge amount of data plus GenAI, this combination makes me confident that we are on the right track to deliver on our mission.

So let me tell you who I am and what my team does. I am one of the leaders in the data organization at Intuit. Along with some cool things like stream processing, batch processing, real-time data, and the data lake, one of my teams also provides fully managed paved roads for different database technologies: NoSQL, search, relational, and graph.

Two of my team members are here. The cool thing about this fully managed offering is that it's not just a database cluster: we have an API layer in front of the databases, so Intuit developers don't have to worry about managing the cluster or thinking about what is behind the API. This is one of the reasons I think we have been able to move fast. And back to search: we have one managed OpenSearch service for all of Intuit.

So let's look at the vector database use cases we have at Intuit. The first one is generative Q&A. Basically, it's about building a knowledge base, understanding natural language queries, and providing relevant answers to those queries.

The second one, which I think John also mentioned around fraud detection, is entity resolution. Intuit is big on fraud prevention; we are trying to make sure we catch it before it happens. So there are things like entity resolution, to make sure that the person is who they say they are. Then we have product matching, and there are multiple other use cases for entity matching as well.

The other one is document discovery and similarity patterns. Intuit stores so many documents about our customers, petabytes and petabytes of documents, and it's becoming really important to figure out similarity between documents. They come in different formats, some JPEG, some PDF, so we are trying to leverage AI here to figure out similarity between documents.

But today I'm going to talk about generative Q&A, where we are leveraging vector search on Amazon OpenSearch Service with the RAG technique, and how we have been able to build on top of that in the last seven months.

In the last seven months, we came up with our GenAI platform, which includes GenStudio, the place where developers build stuff, and GenRuntime, where developers run stuff.

If you look at the two screenshots here, the first one is about creating a knowledge base and the second one is about querying a knowledge base. A knowledge base is a place where you store a lot of documents; all of your expert knowledge and information is stored there, and at runtime you leverage that knowledge to provide relevant search results back to our customers.

I'll go into the use case and data flow in the next slides, but just keep these two screenshots in mind, because the first one is about creating and the second one is about querying.

It's pretty simple; I'm not going into the complexities as John did. If you look at the use case, the idea is that the developers come and say, OK, I have a knowledge base to create and these are my documents, and they upload them to Amazon S3. Once they're done, if you remember the previous slide, they go to the UI, point to the path of their documents, and start the process.

We have built an AI service behind this, and this is the important part, where we break down the documents into chunks. A document could be megabytes, or 200 kilobytes; you don't want to send all of that at once to OpenSearch Service.

Right now, what we have done is that every paragraph in the document is a chunk, and if the paragraph is big enough, we even split the paragraph into smaller chunks. Those chunks are sent to a Kafka topic, and from there our managed vector search service ingests that data into Amazon OpenSearch Service.

Each knowledge base is mapped to an index, which John explained. This is important because we want data isolation, and we are very big on user authorization. So Achal should not be able to access John's data, and things like that.

Once the ingestion is done, users can start querying the knowledge base, and we pass that information back to the LLM as a prompt enhancement, so the LLM can also improve the relevance of the results.

I'll look at the component diagram next. This is the ingestion flow. As you see, the client uploads the documents, and there's an AI service in the middle which uses an LLM embedding model to create the embeddings. It then passes both the embeddings and the metadata to the OpenSearch ingester, and the results are stored in our OpenSearch Service.

If this is one knowledge base, all of this data will be stored in one index in our OpenSearch Service, and it's a multi-tenant system, so clients don't have to worry; we take care of data isolation for them.

Now look at the query pattern. The user comes and sends a query, say, "where is my blue sofa?" At that point, the query is encoded by the same model, the embedding is passed to the OpenSearch read API, which is managed by my team, and sent to the backend Amazon OpenSearch Service. The results are returned and passed back to the LLM.

So the LLM gets an enhanced prompt and can generate better, more relevant results based on context, search history, and things like that.

As of today, we have seven use cases for generative Q&A in production; both exact k-NN and approximate k-NN with pre-filtering are in use. And in the last seven to eight months, we have had no sev-2 incidents.

So I didn't get a call from anyone. Currently, 46 use cases are in pre-production, being tested right now. And the last point is important: there are two internal customers experimenting with product quantization compression.

This is very important because embeddings are currently stored in memory, and as the number of documents grows big, the cost really explodes. So compression is really important here.

Just calling out some learnings from adopting Amazon OpenSearch Service as the vector store. One thing that gave us a really good head start was that we had managed OpenSearch in production for the last four years, with several microservices leveraging it.

We had built all the infrastructure and all the automation; everything was built and ready to use. So we could leverage the same monitoring capabilities, our backups, index management, reindexing, and things like that. That really gave us a great head start in building our GenAI platform. Some things stand out that we should all be aware of as you leverage this. OpenSearch provides hybrid search.

So you can do text and semantic search together, and we have seen scoring disparity when you do hybrid search. The other thing, as John was mentioning about 1,536-dimension vectors: if you use some LLM models, they could be producing thousands of dimensions, while others do 1,000.

So we need support for higher dimensions on the Lucene engine, because otherwise we are limited in which models we can use. The other thing we have found is that as we grow our use cases and the number of documents scales to billions, we have seen higher latency.

That's another thing we are working on with John and the OpenSearch team. Just to call out some desired features I wanted to list: the first, again, as I said, is that all embeddings are in memory, so compression really becomes important, and right now it's not seamless to enable compression.

So we are working with AWS to figure out how we can make it really configurable. The second point is also very relevant: again, cost is very important as you start using this, because the cost can skyrocket.

So reducing the memory footprint by storing a subset of the vectors on disk is another thing we are working on with them. And we've also found some things which are available in open source OpenSearch but not yet in Amazon OpenSearch Service.

So I would ask Aruna from AWS to talk more about all the new features which are coming based on our asks. Thank you.

Thank you, Achal. All right, now for the demo, the juicy part. I'm Aruna Govindaraju, an OpenSearch specialist solutions architect. I primarily work on open source OpenSearch, the managed OpenSearch Service, and Serverless on the AWS cloud. And today we are going to have a feature demo.

Today, John covered so much about the basics: what a vector is, how vectors are internally stored, and how OpenSearch works as a vector database. And of course, Achal from Intuit showed how the core infrastructure, the underlying framework that enables GenAI, is the vector storage.

So the one thing I'm going to show you today is how these vectors are created in the first place. How did we create them in the past? The typical framework has been: set up a microservice, a Lambda application, or your custom application on EC2 instances, which takes all your data as input and generates the required vectors. Then you take those vectors from your application and send them to OpenSearch or any vector store of interest.

So there is certain heavy lifting that you need to do. Every time your data changes, you're again sending it to your application, updating your vectors with the new data that is flowing in, and sending them across.

So whether it's a create or an update, there's a constant update to your vectors. How do we keep track of this? With the latest version of OpenSearch, we introduced the AI/ML connectors as part of the Neural plugin that John just spoke about.

When you think about it, even in Swami's keynote yesterday, he was talking about how we are bringing together multiple services on the AWS cloud and making the integrations between services, like OpenSearch and SageMaker and Bedrock, more native, so that you don't have to build another application to do that integration.

With that, we will see how to use these connectors and which connectors we support today. We support OpenAI, Cohere, SageMaker, and Bedrock; these are the top four model hosts for which we provide AI/ML connectors.

And of course, this is an area of active development, and we are bringing more and more of these model hosts into the picture for AI connectors. So today, as part of this demo, you will see how to create a connector. And of course, these models are sitting somewhere on an external model host.

So we have to make sure only specific teams have access to it; security is the number one priority. So we have to make sure the model access is appropriately set, and I'll show how to do that. Once your connector is created, you register that connector within your OpenSearch cluster.

What does that give you? A model ID. A model ID is a direct link to the model, which is hosted somewhere, on OpenAI in this case. With that model ID, all you have to do is attach it to your ingest pipeline.

Think about how you are ingesting data into OpenSearch today, through the ingest pipeline.

The ingest pipeline can now accept a model ID as one of its parameters. So every time I push data, I don't have to worry about hydrating my data with vectors myself; rather, I use the ingest pipeline, which is internally linked to the model ID, and the model ID is connected to your external model host, which actively generates your vectors.

And of course, today we will also see how to do the ingestion and how to do a search. So, how am I creating the connector? Today I'm going to show you how to use the Create API. There are many ways to do it: it's a simple API you can call from your application, you can have CloudFormation create it, or you can deploy it with Terraform. But today I will just show, in Postman, how I fire a simple one-time creation of this AI connector, which connects my OpenSearch to a model hosted externally on OpenAI.

And again, I'm sorry about all the blur that you see on the screen, because this session is recorded. Of course, if you want to see the exact configuration, where these configurations go, and how they look, then definitely stop by our OpenSearch booth and I will be very happy to show you the demo without all the blur.

So let's move on to the demo. This is working. Yeah, there you go. OK, so this is the Create API that was recently launched. There are two configurations that you need to take care of. One is the authorization, where I configure the AWS Signature auth; of course, the signature requires that I have access to the resources on AWS, so I give the access key, secret key, and session token. The AWS Region is the region where you deploy your OpenSearch, and of course there is the service name. This is a typical configuration you would have seen many times.

The other important thing is the ML blueprint. What you're seeing here is called an ML blueprint, which I provide in JSON format. This is a blueprint for OpenAI, but really, on the OpenSearch platform we have a ton of blueprints that you can download and use with the Create command. There are three different configurations I need to make; let me cover them.

Today I'm using the text-embedding-ada-002 model from OpenAI, which is a text embedding model, so that's what I've configured. As part of my authentication, there are two things I need to create. One is the secret ARN and the other is the role ARN. Let's look at the secret ARN first: what is that? You are now connecting to OpenAI, which is outside of AWS, so OpenAI provides you with an API key, and this API key has to be stored somewhere for AWS to access. So you create a Secrets Manager secret and store that API key there. The second one is the role ARN: this is an IAM role that you create to make sure the role has access to create the connector on the OpenSearch Service platform and, at the same time, has access to the Secrets Manager secret.
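For reference, here's a sketch of what that blueprint looks like as a Create Connector request on Amazon OpenSearch Service; it is modeled on the published OpenAI embedding blueprint, and the ARNs, the secret key name, and the pre/post-process functions are placeholders you'd take from the blueprint for your own setup:

```
POST /_plugins/_ml/connectors/_create
{
  "name": "OpenAI embedding connector",
  "description": "Connector to the OpenAI text-embedding-ada-002 model",
  "version": 1,
  "protocol": "http",
  "parameters": { "model": "text-embedding-ada-002" },
  "credential": {
    "secretArn": "<arn-of-the-secrets-manager-secret>",
    "roleArn": "<arn-of-the-iam-role>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://api.openai.com/v1/embeddings",
      "headers": {
        "Authorization": "Bearer ${credential.secretArn.my_openai_key}"
      },
      "request_body": "{ \"input\": ${parameters.input}, \"model\": \"${parameters.model}\" }",
      "pre_process_function": "connector.pre_process.openai.embedding",
      "post_process_function": "connector.post_process.openai.embedding"
    }
  ]
}
```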

And of course, the third one: this is the actual API key, a name-value pair, and you will now see how I create it in Secrets Manager. So let's move on to Secrets Manager as soon as the video is ready.

Yeah, there you go. Now I'm in Secrets Manager. I create a secret where I define a key-value pair and provide my OpenAI key in this format. For those who have used it: if you go to OpenAI, there is a key which gets generated automatically, and you can create a new one for every service you want to call, which keeps an audit trail, so you know who is using which key and what access each key has.

Once you create it, make sure you give it a name, because that's what generates the ARN that we'll configure in our blueprint next. Now we have the Secrets Manager entry created here; click it, and this is the ARN in my account. Once you copy it, make sure the secret ARN in the blueprint is updated with this particular value.

Next we move on to the role. Like I said, the IAM role needs access to two things. One is Secrets Manager, of course, because you need to access the OpenAI key, so it has access to Secrets Manager; and it also needs access to my OpenSearch Service domain, to make sure I have the required permissions to create the connector within the OpenSearch Service.

One more thing that I will also do, which is coming up in a few minutes, is make sure this IAM role has full access to ML Commons, which is a backend plugin. ML Commons is what gives you access to the ML features on your OpenSearch Service domain.

So let me do the backend role mapping: make sure this particular role that I've added here is also mapped to the ml_full_access role via a backend role mapping. This is my OpenSearch Dashboards: go to Security, click on Roles, and then you can add a backend role. Give it the IAM role information you just created, click Next, and the backend role mapping is done.

So now you have defined the secret ARN and the role ARN; you run the command, and your connector is created. Now you've got the direct link to your external model, which again is ada-002, a 1,536-dimension model. If you want to see all the models you've deployed, there is of course a command for that which will give you a model list.

The second step is to register. First, the model group: you can group all your models together; that's a way of ensuring your models are organized and you can grant access to the model group as a unit. We have copied the connector ID, and now the model will be registered. Every command you fire for the registration will generate a task, an OpenSearch task, so make sure your task is completed and the models are created before you move on to the next step.
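Sketched as requests, under the assumption that the group and model names are illustrative, the connector ID comes from the previous step, and the task ID is returned by the register call:

```
POST /_plugins/_ml/model_groups/_register
{
  "name": "openai_embedding_models",
  "description": "Remote models reached through the OpenAI connector"
}

POST /_plugins/_ml/models/_register
{
  "name": "openai-text-embedding-ada-002",
  "function_name": "remote",
  "model_group_id": "<model_group_id>",
  "description": "text-embedding-ada-002 via the AI connector",
  "connector_id": "<connector_id>"
}

GET /_plugins/_ml/tasks/<task_id>
```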

Yes, it's created fine. So you capture the model ID, the most important thing we want out of the connector. The model ID is confirmation that connectivity between your OpenSearch Service domain and your external model is established.

Now deploy the model. Why do we need to deploy the model? As Achal was also saying, your vectors get loaded into memory, so we want to make sure the model is available in memory to generate your embeddings.

Predict is a new API we have launched, with which you can quickly check whether your model is working or not. Against your model, run a Predict with some sample text and see whether your vector embeddings are generated.
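A sketch of those two calls; the input text is arbitrary, and for a remote embedding model the Predict body passes the input under parameters:

```
POST /_plugins/_ml/models/<model_id>/_deploy

POST /_plugins/_ml/models/<model_id>/_predict
{
  "parameters": {
    "input": [ "Is this connector generating embeddings?" ]
  }
}
```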

The next step is an important one: you attach this model ID to your neural ingest pipeline. This is a quick create of a neural pipeline. Before that, a word about the demo store: what I'm going to show as part of my demo is that I have an existing lexical index, a text index, and as part of this demo we will convert it into a semantic index.

Here I'm creating the neural pipeline, defining the fields that need to be converted into vectors, and attaching the model ID; execute, and now the neural pipeline is created. With the neural pipeline, I create an index, a semantic index. So the previously lexical index, the demo store, will be converted into the semantic demo store in our demo today.

One of the key attributes we are configuring is index.knn: true, which indicates that this index will include vectors. And the neural pipeline is mapped as the default ingest pipeline, so that every time you ingest into this index, the semantic demo store index, the neural pipeline is called on the mapped fields.

We have identified two fields, description and name, which are mapped as vector fields. And as you can see, the dimensions are pretty critical: it's 1,536 dimensions, which is what the ada-002 OpenAI model produces. And this is an easy way out: I'm just converting my existing index into a vector index by doing a reindex, providing the source and the semantic demo store as the destination.

This will send all the information from the source to the destination, and since the destination index is associated with the neural pipeline, it is going to generate the vectors on the fly. So now both my indexes are created and we are all good to go.
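The reindex itself is one call; the index names here are stand-ins for the demo's lexical and semantic indexes:

```
POST /_reindex
{
  "source": { "index": "demostore" },
  "dest":   { "index": "semantic-demostore" }
}
```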

I'll do a sample test to make sure my indexes are fine. Let's do a quick search on my index, and you can see that for the two fields we identified, the required vectors have been generated. Once the search works fine, this is the Neural plugin that John was also mentioning: it integrates with your model and allows you to do a neural search.

As you can see, I'm searching for "red shirt"; it is doing a semantic search and bringing similar SKUs together. But I have not converted "red shirt" into any vector; that is the job of the model and the AI connector. And of course, hybrid search is now supported.

As you can see, I'm doing a vector search, but I'm also constraining on some metadata in my index, like price, and categories such as accessories. So I'm doing a combination of text information and vector search, a hybrid search, in my demo here; a sketch of that kind of combined query follows below. With that, we come to the end of the demo.
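One way to express that kind of combined query, as a sketch (the field names, category value, and price bound are illustrative; the neural clause does the semantic match while the filter clauses constrain on metadata):

```
GET /semantic-demostore/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "neural": {
            "description_embedding": {
              "query_text": "red shirt",
              "model_id": "<model_id>",
              "k": 20
            }
          }
        }
      ],
      "filter": [
        { "term":  { "category": "Accessories" } },
        { "range": { "price": { "lte": 50 } } }
      ]
    }
  }
}
```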

So what we learned as part of the demo is how you can leverage these AI connectors to make these connections more native, and to simplify and operationalize them. You don't have to keep your index or your vectors up to date with data changes yourself; you can use your AI connectors for that.

And of course, build native integrations, right? That's our core motto: to bring all these integrations with multiple other services, more native and more integrated with OpenSearch vector search. As you saw, vector search is the core underlying framework for GenAI platforms, and we looked at how you can leverage vector search to build your GenAI applications. With security being number one, we also saw how to secure your externally hosted models.

And of course, OpenSearch being a distributed platform, it is easy to scale and to build scalable, stable applications on the cloud. I would now like to invite John to come talk about the content we have today around OpenSearch vector capabilities. Thank you, John.

Thanks, Aruna. I'll give you a couple of minutes to quickly capture some of these QR codes. We have a lot of content out there, a lot of ways you can dive deep into this topic, including a blog post on our vector database capabilities, a blog post about the serverless vector engine, a couple of different workshops on semantic search, and some benchmarks. Multimodal, which is image and text together, a bunch of different stuff, building chatbots. Please feel free to go check those out.

So to close: you want to use OpenSearch's core capabilities for text-to-text matching, with good relevance and low-latency, high-throughput query processing. You want to add vector embeddings from LLMs for improved relevance, and OpenSearch brings connectors to multiple third-party hosting services, so you can bring vectors in and employ those model hosting services to generate them.

If you are following the analytics superhero thing, anybody following that? No? You can go through, I think there's a little quiz or something, and find out what kind of superhero you are. So that's that. Thank you so much for coming out today; really appreciate your participation and your coming. Please do remember to fill out your surveys; we care about the feedback. So thank you. Thank you, Aruna, and thank you, Achal as well. And have a great rest of re:Invent.
