Making semantic search & RAG real: How to build a prod-ready app (sponsored by Elastic)

All right. Thank you for joining us today. I'm Michael Hildebrandt, Distinguished Solutions Architect with Elastic. I'm Ayan, Senior Partner Solutions Architect with AWS. And I'm Fahad Siddiqui, Director of Engineering at Adobe Commerce.

We're going to be talking about making semantic search and RAG real, and how to build a production-ready app. First thing to get out of the way: I have to put this up because of the lawyers. I may say things about the future that may not happen at all, so don't make any buying decisions based on what you hear today. I'm going to wave my hands; the lawyers made me say that. With that, let's jump right in.

So why are we all here? Search is evolving, search is changing, and customers really want to stop interacting with a search box like they're talking to a robot. You'll notice the common theme for each of these prototypical use cases is that they're all conversational: they end in question marks, they have punctuation, they're actually complete sentences and thoughts. We're evolving beyond putting in a bunch of weird tokens and stripping out all of the language, because that's been giving you a weird experience in that e-commerce search box. If you've ever searched, you've trained yourself to talk to the robot. But we can actually fix that today, and this is going to be the future.

But, you know, as the industry races to do this, everyone can't wait. We've got to make this stuff real, we need to ship it, we've got to push it to prod. But let's pause for a second: what does it take to get these generative AI apps to production? As you think about it, in the upper left you've got your data, the private things that large language models have never seen, nor should they probably ever see in public. They should not be public; they're your private data. And you've got the LLMs there on the bottom right. What you have to do is fit these puzzle pieces together, and they don't directly fit next to each other. So you actually need to surround your data with things like search, and you need to surround it with things like natural language processing, which will actually keep the pieces from flying apart. That's how you put this puzzle together.

So what's the typical approach? You've got to choose a large language model or an appropriate model. You've got to build semantic search around your data so that you can bring your data to that model properly. And then you've got to make it ready for production: you've got to harden it, secure it, and scale it.

I'm going to hand it over to Ayan now to talk about item one, choosing that large language model or appropriate model.

Thank you, Michael. And as Michael mentioned, generative AI at its core is powered by foundation models. What are foundation models? These are very large neural networks trained on massive data sets and they generate content based on the input that you provide as human language instructions.

Our customers asked us to provide them an easy way to build generative AI applications that gives them the flexibility to choose the foundation model that best fits their use case. That's the reason we built Amazon Bedrock, a fully managed service that provides a set of high-performing foundation models from leading AI companies for you to build your generative AI use cases.

With Amazon Bedrock, you can easily experiment using the foundation model of your choice. You can customize it using fine-tuning or retrieval-augmented generation, popularly known as RAG. And you can create managed agents for performing complex tasks like managing an inventory or booking a ticket. And I'm not done there.

Amazon Bedrock is serverless; that means you can do all of that without worrying about maintaining and managing your infrastructure. We also strongly believe that customers deserve and should have the choice to pick the foundation model that best fits the use case. That's the reason we have a set of high-performing foundation models, including Jurassic-2 from AI21 Labs, Claude from Anthropic, Command from Cohere, Llama 2 from Meta, Stable Diffusion from Stability AI, and Titan from Amazon.

But if you think about it, all companies will have access to the same foundation models. The companies that will deliver business value with the help of generative AI are the ones that will do so using their own data. So data is the differentiator between building generic generative AI applications and applications that know your customers and business deeply.

But using data doesn't mean you have to boil the ocean. It doesn't mean you have to build your own models every time. Some companies will still do that; they will build and train their own models. But many companies will just use techniques like fine-tuning and retrieval-augmented generation to customize the base foundation models.

Retrieval-augmented generation is a technique through which you can retrieve relevant business data from outside of what a foundation model is trained on, and that might be from your enterprise data corpus. You can then use that relevant retrieved data as context to augment the prompt. You can also use retrieval-augmented generation in conjunction with your fine-tuned model or your custom model to get the most up-to-date information from your enterprise data corpus. And you can do that using Elastic, the company behind Elasticsearch, which is a strategic ISV partner of AWS and with whom we recently signed a strategic collaboration agreement to bring generative AI to our joint customers.

The architecture that you are seeing on screen is an example of how you can build retrieval-augmented generation using Elasticsearch, where you can store your private data with vector embeddings. You can use semantic search and hybrid search to retrieve the relevant context, and you can use that in conjunction with Amazon Bedrock, which provides a set of high-performing foundation models. With that, I'll hand it over to Michael to go deeper into what Elasticsearch has to offer.
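
For illustration, here is a minimal sketch of that retrieve-then-generate flow, assuming a hypothetical Elasticsearch index named "docs" with a dense_vector field "embedding", hypothetical cluster credentials, and the Claude text model on Bedrock. The endpoint, index, field, and model id are illustrative, not part of the reference architecture.

```python
# Minimal RAG sketch: retrieve relevant context from Elasticsearch with kNN,
# then augment the prompt and call a Bedrock foundation model.
import json
import boto3
import requests

ES_URL = "http://localhost:9200"        # hypothetical cluster endpoint
ES_AUTH = ("elastic", "changeme")       # hypothetical credentials

def embed(text: str) -> list[float]:
    """Stand-in for whatever embedding model produced the indexed vectors."""
    raise NotImplementedError

def retrieve(question: str, k: int = 3) -> list[str]:
    # Approximate kNN search over the stored document vectors.
    body = {
        "knn": {
            "field": "embedding",
            "query_vector": embed(question),
            "k": k,
            "num_candidates": 50,
        },
        "_source": ["body"],
    }
    resp = requests.post(f"{ES_URL}/docs/_search", json=body, auth=ES_AUTH)
    return [hit["_source"]["body"] for hit in resp.json()["hits"]["hits"]]

def answer(question: str) -> str:
    # Augment the prompt with the retrieved context, then call Bedrock.
    context = "\n\n".join(retrieve(question))
    prompt = (f"\n\nHuman: Answer using only this context:\n{context}\n\n"
              f"Question: {question}\n\nAssistant:")
    bedrock = boto3.client("bedrock-runtime")
    result = bedrock.invoke_model(
        modelId="anthropic.claude-v2",   # any Bedrock text model would do
        body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 500}),
    )
    return json.loads(result["body"].read())["completion"]
```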

Thank you, Ayan. So we don't have time to cover everything that we can do, but let's talk about one aspect that everyone has been very focused on and interested in recently, which is vector search and vector databases. The first thing to think through is that you may think all you need is a really fast vector search engine, just the nearest neighbors to that input vector. But I'm here to tell you, I travel the world, I talk to all sorts of industries and all sorts of use cases, and it's never just the vector. There's all sorts of stuff that goes around it. What does it really look like? You've actually got to do all of these things in concert to be able to deliver the data to your users or to a large language model. You can't just search it. You have to find a way to do your query embedding; people don't write large floating point numbers and giant arrays into the search box for you, they write in natural language, and that has to be converted. You have to go get the documents: if all you get back is a bunch of IDs, that's not what you can put in front of your users; you have to go get the original documents that you want to put in front of them.

Now, you may not want all of it. You may actually only want parts of the document; maybe you've got all this other data or metadata that's not relevant to put in front of the users. That left-hand navigation: you've got to do the aggregations, you've got to build that faceting, so that when people get 10,000 hits for something they know the breakdown and can filter deeper in. Access control is critical, because all of your data is not necessarily what people should be seeing, especially if it's sensitive corporate internal data; you may have different levels of security rules, so access control needs to go on it. Pagination, the simple thing of going to page three in the search results, is built in with us; you can just go to page three. Filtering: you've clicked on that facet, and now that refines your search. Personalization, especially if we're talking about e-commerce use cases: you might need to rank the different hits on the page based on the user that's actually interacting with you. You might need to blend scores, because if you put something like a SKU into a search box and try to use a semantic search model, text embeddings don't really know what to do with a bunch of letters and numbers; they're looking for human language. So you might need to be using BM25, and if you're using just pure vector search, you may get really bad results. You may need to actually modify fields, changing the data over the wire as it comes back to your users.

And so it sounds like it's a lot, and it is. But luckily with Elasticsearch, you can actually do all of those things that you saw in just one API call. So again, you don't have to visit this cornucopia of microservices; you can just do it all in one search.

So looking at it in different ways, we think about the superset of what you need to do for these use cases. Again, there's storing and searching vectors. There's the ability to create these embeddings from the human language that goes into them. And then there's all this other stuff that goes around it. All these features that you see are where most vector databases sit: they're going to store and search vectors. There are some vector databases that don't even search the vectors; they're actually used for data science and things like feature stores, where metadata gets you your vectors, for example. So not all of them actually care about searching them. Some of the vector databases allow you to turn natural language into those vectors, so you can do search, or create the embeddings for indexing. And again, Elasticsearch has the superset of everything that you need to run the use case.

So again, that's why we call it the Elasticsearch Relevance Engine. It's basically all the tools that you need, all the APIs that you need to call, in one cluster so that you can get it done. You get all of the search capabilities plus vector search. You get the ability to host your own vector models, whether you pull them off places like Hugging Face or you've got your own custom models, and you can also integrate with third-party models. You get things like reciprocal rank fusion so that you can easily blend your BM25-scored searches along with your vector searches, or multiple vectors at the same time. You can use our proprietary model, which is called ELSER, which is a different way to do semantic search that doesn't use dense vectors.

Again, it's another way to get semantic search, and we're well integrated into third-party tooling like LangChain. So let's get a little deep into the API for a second and see what it really looks like.

Again, if you're working with the raw JSON response, you're obviously going to format it for your users and whatnot. But like I said, when you're doing k-nearest-neighbor search, the first thing that you're going to need to do is get the document back.

So again, returning everything might be way too much, because if you put that dense vector in front of the user, that's obviously not what you want. You can see up there in the upper right something as simple as source filtering, or using our fields API to say "I just want the titles." That allows you to pick up just that particular part of the document, get it in your list of hits, and now you can format that and display it for your users.
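
Here is a minimal sketch of that pattern, assuming a hypothetical "products" index with a dense_vector field "title_vector"; the cluster endpoint and credentials are placeholders.

```python
# kNN search that returns only the title, not the stored dense vector.
import requests

body = {
    "knn": {
        "field": "title_vector",
        "query_vector": [0.12, -0.53, 0.08],   # truncated; must match the mapping's dims
        "k": 10,
        "num_candidates": 100,
    },
    "_source": False,       # drop the full document, including the dense vector
    "fields": ["title"],    # return just the field(s) you want to display
}
resp = requests.post("http://localhost:9200/products/_search",
                     json=body, auth=("elastic", "changeme"))
for hit in resp.json()["hits"]["hits"]:
    print(hit["fields"]["title"][0])
```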

So again, I mentioned you might need to change values, so let's go through a simple example of changing the price. Let's say you've got a list price in your indexed data, and let's say you've got a sale or some sort of discounting going on. You can actually change the price at query time, when you query the data, and you can pass in the parameters once you know your users and the discounting. Because it's based on a scripting language, it's not just simple math you can do; you could actually do category-based discounts if you wanted to, as shown over on the right side.

Again, this is a little bit of a contrived example, saying that we're looking for a port in the data and it needs to be 443 to be considered secure. But you can start to encode business logic about what you want to display. You can do conditionals, you can look at the data in the documents, and you can make decisions. You have that power with Elasticsearch to change the data on the fly.
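
As a sketch of the discounting example, here is a query-time script field applied to a hypothetical numeric "list_price" field; the index name, field names, and discount value are illustrative.

```python
# Rewrite a value at query time with a Painless script field: apply a
# per-query discount passed in as a parameter.
import requests

body = {
    "query": {"match": {"title": "running shoes"}},
    "script_fields": {
        "sale_price": {
            "script": {
                "lang": "painless",
                # Conditionals on other fields (e.g. category) work the same way.
                "source": "doc['list_price'].value * (1 - params.discount)",
                "params": {"discount": 0.15},
            }
        }
    },
    "_source": ["title", "list_price"],
}
resp = requests.post("http://localhost:9200/products/_search",
                     json=body, auth=("elastic", "changeme"))
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"]["title"], hit["fields"]["sale_price"][0])
```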

So what if you need to do personalization after finding your results? Again, if we're talking about kNN, that ability to find the nearest-neighbor vectors, in that same API call we can do the first search to find your candidates. You can see right below that, in the query API, that you can pass in another query vector, a personalization vector for example, and then do computations to reorder the results you found using your nearest neighbors, so that you can reorder results, personalize, or whatever else you need to do.

So again, analytics: the ability to run our aggregation framework on the data. If you get a large number of hits when you do a nearest-neighbor search, it is no problem for us to go ahead and just aggregate over the data. The way you found it may have been nearest neighbors, it may have been BM25, maybe it's blended together. But you still get that faceting, which can then lead to filtering, so your users know what they're looking at when the results go well past the first page they're ever going to look at.
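
A minimal sketch of faceting over nearest-neighbor results follows, assuming a hypothetical keyword field "category" on the same "products" index used above.

```python
# A terms aggregation runs over the kNN hits, giving facet counts for filtering.
import requests

body = {
    "knn": {
        "field": "title_vector",
        "query_vector": [0.12, -0.53, 0.08],   # truncated example vector
        "k": 50,
        "num_candidates": 500,
    },
    "aggs": {
        "by_category": {"terms": {"field": "category", "size": 10}}
    },
    "size": 10,
}
resp = requests.post("http://localhost:9200/products/_search",
                     json=body, auth=("elastic", "changeme"))
for bucket in resp.json()["aggregations"]["by_category"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```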

So I focused there very much on the developer experience. For anyone from operations in the room, anyone who actually has to run these production use cases, this is the slide for you, because vectors have been very deeply integrated into Elasticsearch, for those that know our capabilities.

You know, we have been horizontally scalable and highly available from the beginning. So if you're doing vectors with Elasticsearch, you get the ability to leverage that; there's nothing special about it. Vectors just work with the rest of your data. You get the ability to do updates, and that means document updates, because you can change your mind on things; you can update your vectors in near real time.

So again, as you change your mind about the vectors, or maybe the metadata about them, you can update your data just like you have before. You can also update your cluster, which means you can do things like a rolling upgrade on your vector search use case. You can take a zero-downtime upgrade with us, because that's the way the clusters are built.

Some of our other capabilities that have been added over the years are things like disaster recovery. We call this cross-cluster replication in Elastic. So if you have a requirement for a disaster-recoverable vector search use case, all you need to do is turn on cross-cluster replication between your two clusters, pour your data into your production cluster, and it will automatically show up in that DR cluster, ready for use whenever you need it.
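
As a sketch, turning this on for an index looks roughly like the following, assuming the production cluster has already been registered on the DR cluster under a remote-cluster alias ("production"); cluster names, index names, and credentials are illustrative.

```python
# Create a follower index on the DR cluster that replicates the "products"
# index (vectors and all) from the remote "production" cluster.
import requests

DR_URL = "http://dr-cluster:9200"       # hypothetical DR cluster endpoint

body = {
    "remote_cluster": "production",     # alias of the leader cluster
    "leader_index": "products",         # index to replicate
}
resp = requests.put(f"{DR_URL}/products-follower/_ccr/follow",
                    json=body, auth=("elastic", "changeme"))
print(resp.json())
```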

You also get the ability to do federation, and this is a technology we call cross-cluster search. This usually comes up in use cases with data sovereignty requirements or geo-federation requirements: sometimes you've got to store your vectors in the country where they live, and you may not be able to bring them back to a central place to try to get a global view.

With Elasticsearch, you can have vectors in as many places as you need to, hundreds if you actually need to, and search from a single federated search head and get the correct answer even if your vectors have to live in different countries, thanks to lawyers.
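
A minimal sketch of that federated search follows, assuming two hypothetical remote clusters registered as "eu" and "us" on the coordinating cluster; depending on your version, kNN over cross-cluster search may require disabling roundtrip minimization, as noted in the comment.

```python
# Federated kNN search across remote clusters using the remote:index syntax.
import requests

body = {
    "knn": {
        "field": "title_vector",
        "query_vector": [0.12, -0.53, 0.08],   # truncated example vector
        "k": 10,
        "num_candidates": 100,
    }
}
resp = requests.post(
    "http://localhost:9200/eu:products,us:products/_search",
    # Some versions require this setting for kNN over cross-cluster search.
    params={"ccs_minimize_roundtrips": "false"},
    json=body, auth=("elastic", "changeme"))
```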

And again, role-based access control: because this is all in one API call, it's protected by role-based access control. You don't have to try to stitch together a federated security architecture across microservices, because if you're responsible for making sure this use case is secure, securing all of those disparate microservices is a very stressful thing. With Elastic, you just define the role and it applies to everything I've been talking about today.

So it's all part of vector search, all part of Elasticsearch. So what is our plan? As I like to say, we want to make doing vector search with Elasticsearch just as easy as you've always been searching with us.

So just bring your data, just bring your queries to Elasticsearch. You can turn on vector search and search your data in a new way without having to change really anything about how you got the data there or how you got your queries there.

So let's dig a little bit into some of the things that have changed over the last year. Again, we haven't just built vector search, dusted it off, and said, "All right, it's done, we're not going to change it." We've been very, very busy improving the vector core that's inside of Elasticsearch.

Again, let's talk about what we did over the last year and what's coming down the road for vector search specifically. So, spoiler: we didn't stop working on vectors.

So the way that Elasticsearch is built is that we're actually built around Apache Lucene. We have many of the committers to Apache Lucene, and it's open source, so everybody can leverage a lot of the work that we're doing. And again, our goal is to make Lucene the world's best vector database.

Again, we want it to be there with all the other data types; we want all the power, all the capabilities. We've added vectors very, very deeply into Lucene, and Elasticsearch basically wraps it up and makes it into a cluster so that you can use it via APIs.

Again, with all of the stuff that you do with Elastic, we want to make Elasticsearch the most comprehensive and simple search platform for generative AI. So even just building on vector search, there's all sorts of things that we're working on to make developers' lives easier.

And we also want to be really the most open member of the generative AI ecosystem. Again, it's going to be all about sharing your data properly with large language models of your choice. So again, you may be using things like LangChain or you may be using different models from different places. There's no requirement that you choose one and only one, for example.

So we want to be part of this ecosystem that you're choosing to use. Again, we rely on Lucene as a vector database. For those that know us from the early days: if you've been doing anything with regular BM25 search, believe it or not, these are actually sparse vectors; they just happen to have names. They're the tokens, and we store things like the positions so that you can do regular BM25-style query scoring.

You know, if you put a SKU into a search box, this is what's doing the hard work of returning what that SKU lives in, for e-commerce, for example. Positions allow you to do things like phrase queries, the ability to match things next to each other, all the things that go into inverted-index-based searching.

But then we've added the ability to do dense vectors. And again, this is the concept of transforming human language into a vector representation. It's got a certain number of dimensions, whether it's 512, 1,024, or 1,536, and we're going to be supporting all the way up to 4,096-dimensional vectors in Elasticsearch, which is a lot, but we know some people need it.

So we've made sure that we can handle vectors from basically the biggest models out there. We want to make sure that we've got plenty of headroom for your dimensions, should you need it.
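
For reference, a basic dense_vector mapping looks roughly like the sketch below; the index name and the 384-dimension figure are illustrative and should match whatever embedding model you actually use.

```python
# Create an index with a dense_vector field sized for a hypothetical
# 384-dimensional embedding model.
import requests

mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "title_vector": {
                "type": "dense_vector",
                "dims": 384,            # must match your embedding model
                "index": True,          # build the HNSW structure for kNN search
                "similarity": "cosine",
            },
        }
    }
}
requests.put("http://localhost:9200/products", json=mapping,
             auth=("elastic", "changeme"))
```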

So the algorithm that we use to do approximate nearest neighbor vector search is called hierarchical navigable small worlds, which is a fun mouthful to get out in public if you try to say it on stage. The way that it works, at a high level, is like a map application on the web: if you're starting at your house and you're going to visit another country, you don't scroll at street level all the way across the globe, because that takes way too long. What do you do? You zoom out, you move over to a continent, you zoom in, you move to the state or country that you want to visit next, you zoom in again and find the city, and you zoom in again and find that street level.

This is effectively what happens: at the high level, we make these giant swings across the globe, so to speak, of where we are in your data, then we drop down to the next level, move over to get closer, and then drop down to the lowest levels and actually find all the nearest neighbors.

So again, we've pre-computed what lives next to what, so that when you ask for it, we can tell you these are the things closest to your input. Now, it's not absolutely perfect; there's going to be a little bit of accuracy lost, but you get a whole lot of speed and scalability to go with it instead.

So let's talk about additional things that we've been up to, ways that you can work with us. Again, you can do an approximate nearest neighbor search by passing in a text embedding vector; the query there is basically that large vector, this array of floating point numbers. What you also see there is filtering, something we've had from the beginning of k-nearest-neighbor search: the ability to do things that seem easy but are actually quite complicated.

Say you want to do a vector-based search with a geospatial filter and a date-based filter. If you're doing a restaurant recommendation app, there's only so far you're going to walk or travel, and you want temporal accuracy; maybe you don't want reviews from 10 years ago, maybe you want reviews from the last month, and you want to do a vector-based search with a filter on a particular type of cuisine that you like. Elasticsearch will easily allow you to put all of that together.

So you can take the search box and all of this other metadata and actually return just those vectors that make sense. So again, we want your life to be easy. You don't have to do any post-filtering; this is all pre-filtering, which means pagination works and aggregations work with it. We want your life to be easy.
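
Here is a minimal sketch of that restaurant example as a pre-filtered kNN query; the index name, field names, coordinates, and cuisine value are illustrative.

```python
# Approximate kNN constrained by geo, recency, and cuisine pre-filters.
import requests

body = {
    "knn": {
        "field": "review_vector",
        "query_vector": [0.12, -0.53, 0.08],   # truncated example vector
        "k": 10,
        "num_candidates": 200,
        "filter": {
            "bool": {
                "filter": [
                    {"geo_distance": {"distance": "2km",
                                      "location": {"lat": 47.61, "lon": -122.33}}},
                    {"range": {"review_date": {"gte": "now-1M/d"}}},
                    {"term": {"cuisine": "thai"}},
                ]
            }
        },
    }
}
resp = requests.post("http://localhost:9200/restaurants/_search",
                     json=body, auth=("elastic", "changeme"))
```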

Moving on to other features, what we've got today is native embedding support. So again, you can bring a vector and do the search with the vector. But what's easier is copying the search box, what your users have typed into your search box.

So again, by using our native embedding, you can run whatever models you want, multiple even: you could be doing one for the title, one for the body, one for other chunks. It doesn't really matter to us. You can take that search box and, using that same model, turn it into exactly what's stored in the index so that you can then search for it. Again: copy the search box and perform vector search without any additional stress on your use case.
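
A sketch of "copy the search box" follows: instead of supplying a query vector, a query_vector_builder lets a model deployed in the cluster embed the user's text. The model id shown is a hypothetical deployment name.

```python
# Let the cluster embed the search-box text with the same model used at index time.
import requests

body = {
    "knn": {
        "field": "title_vector",
        "query_vector_builder": {
            "text_embedding": {
                "model_id": "sentence-transformers__all-minilm-l6-v2",
                "model_text": "waterproof hiking boots for winter",
            }
        },
        "k": 10,
        "num_candidates": 100,
    },
    "fields": ["title"],
    "_source": False,
}
resp = requests.post("http://localhost:9200/products/_search",
                     json=body, auth=("elastic", "changeme"))
```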

Personalization: again, whether it's a rescore phase or a script-score phase, you can pass in this personalization factor and then use it to affect the score. So kNN will find your candidates, rescore will allow you to reorder the hits, and you've got a whole wealth of capabilities with script score to do things like blending in temporal information or other keywords.
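
Here is a minimal sketch of that flow: kNN finds candidates, then a rescore phase re-orders the top hits against a per-user profile vector stored in a hypothetical dense_vector field "style_vector". Depending on your version you may prefer to do the same thing with a script_score query instead of rescore.

```python
# kNN for candidates, rescore with a personalization vector for ordering.
import requests

body = {
    "knn": {
        "field": "title_vector",
        "query_vector": [0.12, -0.53, 0.08],     # truncated example vector
        "k": 50,
        "num_candidates": 500,
    },
    "rescore": {
        "window_size": 50,
        "query": {
            "rescore_query": {
                "script_score": {
                    "query": {"match_all": {}},
                    "script": {
                        "source": "cosineSimilarity(params.user_vector, 'style_vector') + 1.0",
                        "params": {"user_vector": [0.07, 0.21, -0.40]},  # truncated
                    },
                }
            }
        },
    },
    "size": 10,
}
resp = requests.post("http://localhost:9200/products/_search",
                     json=body, auth=("elastic", "changeme"))
```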

You basically have complete control over the scoring algorithm to get exactly what you want for your use case. So let's talk about some of the stuff that's really low and deep under the hood.

So again, vectors are tons of math, so when you think about it, you need to do math very, very quickly. One of the enhancements that we've had over the last year is what's known as SIMD operations, single instruction, multiple data: the ability to do more work per clock tick on the CPU.

So vector search is full of floating point operations; you've got to do math on vectors, things like adding two numbers together to get, say, 6.5. Now, floating point numbers are four bytes each, so we're only asking for eight bytes worth of data. But what can CPUs actually do? They can do 64 bits or higher per operation.

So what if instead we did a whole bunch of math in every clock tick? That's exactly what we did. We actually pushed to get some experimental things into the JVM, which is what we're written in, Java. So we basically had to work with that community to get this going as fast as possible.

So now we've got single instruction, multiple data, where we can do basically four times the work in every single clock tick. What does that mean for you? Depending on your CPU architecture, you get four times to 16 times faster vector performance. It makes your life easy: you don't have to do anything, it's just enabled for you in Elasticsearch.

Another thing, and this is going a little bit into the weeds of Elasticsearch: every index is actually really one or more shards. As I like to say when I'm training people on Elasticsearch, the index is only in our minds; everything is actually happening down at the shards.

So again, fun fact, one index with six shards is exactly the same as three indices with two shards, two indices with three shards or six indices with one shard. It's really the same amount of work for the cluster. So, you know, lay your data out as you want to. But the subatomic particles under the shards are actually segments.

Again, for those that really want to know the deep stuff: each segment is actually a complete Lucene index, and there's all sorts of fun stuff about merging them that we could talk about for hours, but we won't today. But do know that each segment has its own index for the vectors.

So again, this HNSW graph: there's one per segment. And what had happened in the past is that, well, we've got many, many segments; that's the ability to take live data and make it searchable as part of Elasticsearch, and we have many segments per shard to do that. In the past, we searched one, then the other, then the other, then the other. You can kind of guess where the optimization might be coming here.

What if we just searched them all at the same time? That's actually what we've enhanced in Elasticsearch: the ability to do parallel searching across all the different segments for your vectors. So if you've got fast-moving vectors, or vectors that change at all, and everyone's got data that's moving, this speeds up your use case for you.

Another thing that will be coming down the pipe is that, while we started with this for vectors, we're actually enabling it for all types of search in Elasticsearch. So you're going to see another jump in Elasticsearch performance in general search, because we're going to apply this same technique to search everything in parallel wherever we can, to speed you up. The next one is one I'm super excited about, because it's again the hard work of data scientists that you just get to harness for free.

So as I mentioned, floating point numbers take a lot of space. They have a lot of precision; you can express a very precise number with them, but many times we don't really need to. A float32 takes four bytes per dimension to store all that information at full fidelity. But we're doing approximate nearest neighbor; we do not need that level of precision to find you the nearest neighbors. We know it's approximate.

So what can we do about it? Well, what if we stored that data instead as an int8, which is only one byte per dimension? Or, in some cases, if we can pack it in even further, what if we could get away with an int4, which uses only four bits of that byte, so you can pack the data in even tighter?

What this allows you to do, for those that have been sizing and working with HNSW: for high speed, it basically requires living in RAM to be ultra-fast, so that you can have a public search box and those vectors work at the full speed you need. That's four bytes times the number of dimensions, and remember we talked about a 4k limit: if you had a 4,000-dimensional vector, that's a whole lot of RAM to run your use case at full speed. With quantization, you can fit four times or maybe eight times the number of vectors without actually changing anything.

So this is going to be something coming tomorrow. What it's going to do is take these float32 embeddings and work them down by turning them into int8s. And the way it's going to do it is that it knows what data lives in the segment. So, to walk you through a little bit of the magic behind it: it's going to look at the range of all your values and remap them into the int8 space, or maybe the int4 space, so that you get automatic quantization of your data.

So you've got a model, it gave you floating point numbers, but we're going to magically transform them into integers for you so that you get all that RAM back. And it's going to cost you nothing other than a little bit of work at index time. So again, this is something I'm very, very excited about: a 4x reduction in space means a lot, with almost zero accuracy loss. And it's going to speed things up to boot: you don't just get more RAM, it's actually going to speed up search, because the math is easier with integers. So it's a perfect fit for us.
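
Since this was still "coming tomorrow" at the time, the sketch below shows the shape this took in later releases (8.12+), where you can opt into a quantized HNSW index while still indexing float vectors; option names may differ by version, so treat this as an assumption to check against the docs.

```python
# A dense_vector mapping that stores the HNSW index over int8-quantized values.
import requests

mapping = {
    "mappings": {
        "properties": {
            "title_vector": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine",
                "index_options": {"type": "int8_hnsw"},   # quantized HNSW graph
            }
        }
    }
}
requests.put("http://localhost:9200/products-quantized", json=mapping,
             auth=("elastic", "changeme"))
```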

A few things about hybrid search. Again, as I covered, if you want to do geospatial, temporal, and vector search at the same time, yes, you have the full power of our query DSL to help filter those results. You can combine vector and sparse search, whether you're doing BM25 scoring or our text expansion query, which works with our ELSER model, and there's a way to combine those easily.

The fun part about vectors and BM25 and all these different algorithms is that nobody sat down and agreed on what the score ranges should be. So a BM25 score and a cosine similarity score are sometimes three, four, five, six orders of magnitude apart. If you just take the raw scores, it can be a mess. And so reciprocal rank fusion is there to make your life easy as a developer. If you want to do BM25 and three vector searches at the same time and blend the scores together so that the best candidates show up as your top hits, you can use what's called reciprocal rank fusion. You basically just turn it on in the query and it makes sure that your top candidates are always there for you. It works out of the box, but you still get the capability to do linear boosting if you want.

So again, if you know what your ranges are going to be, you've done your homework and you know your values, you can still maintain tight control by boosting the different clauses so that you can manually stack-rank them. We didn't take that away from you; we just gave you the easy button if you don't want to do it.
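
A minimal sketch of RRF-blended hybrid search follows: a BM25 match query plus a kNN clause, ranked together. The RRF syntax has evolved across 8.x releases (newer versions expose it as an "rrf" retriever), so treat the exact keys here as an assumption and check the docs for your version.

```python
# Hybrid search: BM25 and kNN results blended with reciprocal rank fusion.
import requests

body = {
    "query": {"match": {"description": "waterproof hiking boots"}},
    "knn": {
        "field": "title_vector",
        "query_vector": [0.12, -0.53, 0.08],   # truncated example vector
        "k": 50,
        "num_candidates": 200,
    },
    "rank": {"rrf": {"window_size": 50, "rank_constant": 60}},
    "size": 10,
}
resp = requests.post("http://localhost:9200/products/_search",
                     json=body, auth=("elastic", "changeme"))
```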

So sometimes you really do need to fine-tune your weights for the user experience, and this is also useful in rescoring, for example. Here's one that's coming: another similarity. We've got cosine similarity, dot product, and one more that always slips my mind on stage, and we're going to add another one so that you can get maximum inner product. This is important because the normal similarities are about how far things are apart in space.

Maximum inner product, from the reading I've been doing on it, actually takes magnitude into account as part of the similarity. So it's another way to encode more information, to get more precision in your results. It's considered to be non-Euclidean; I'm not a mathematician, so I'm not going to elaborate on that too much yet. I've still been looking into why a vector is not nearest to itself, which is almost like a Zen koan: how could you not be your own nearest neighbor? But apparently math is weird.

So I'm still looking for the answer to that one for us all. But HNSW and Lucene are actually fine with this; it's just math, even if it's weird and we don't fully understand it. Maybe a mathematician can explain it to me later. The goal for us is to give you yet another way to store that similarity data so that you can get what you need out of your use case.

So it's in Lucene today, which means, again, everything has to go in there first, and it's coming to Elasticsearch tomorrow. So it's halfway there. This next one is very, very exciting. It used to be "tomorrow"; it actually came out in our very latest version, 8.11: passage vectors.

So again, for those that have been doing vector search, there are token limits when creating vectors out of data. That can sometimes be as small as 128 tokens, sometimes 512. And unless you're indexing things like instant messages, almost all the data we have to deal with is going to be well beyond the token limits for any particular model. If you just throw the whole document in, the model only encodes that first bit and throws the rest of the data away, and that doesn't really solve your vector search use case.

So what we have added into Elasticsearch is the ability to take a larger passage and put it into the right data model. We have a capability called nested objects, and nested objects allow us to break the text into individual chunks. Fun fact: inside of Elasticsearch, if anyone's been doing logging or observability with us, it is totally legal to have a message field that is an array of log messages. Almost nobody does it, but it actually is something you can do.

So holding these things in an array of different passages allows us to break the document into chunks small enough that each and every one of them can have a text embedding model run on it. Storing each of these different text embeddings allows us to find the nearest neighbor based on the different values in there.

So basically we get all the vectors necessary to represent your large document. And what this saves you from doing is things like field collapsing, denormalizing, or copying the metadata. I mean, you can imagine one document that could otherwise become like a million documents if you took some giant omnibus bill or something from the government.

If you actually tried to text-embed the whole thing, it can be a lot of data. With this, we can store one document with just as many vectors as we need to represent it. What's nice is that it's going to find the document based on your inputs: the passage vector finds the original document for you. So it's very useful for capturing all the different concepts that are semantically in that document and finding them easily.
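
Here is a minimal sketch of that passage-vector layout: a nested "passages" field holds each chunk's text next to its embedding, and a kNN search on the nested vector field returns the parent document. The index name, dimensions, and chunking are illustrative.

```python
# Nested passage vectors: one parent document, many chunk embeddings.
import requests

ES = "http://localhost:9200"
AUTH = ("elastic", "changeme")

mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"},
            "passages": {
                "type": "nested",
                "properties": {
                    "text": {"type": "text"},
                    "vector": {"type": "dense_vector", "dims": 384,
                               "index": True, "similarity": "cosine"},
                },
            },
        }
    }
}
requests.put(f"{ES}/long_docs", json=mapping, auth=AUTH)

# Searching the nested vector field returns the parent documents.
search = {
    "knn": {
        "field": "passages.vector",
        "query_vector": [0.12, -0.53, 0.08],   # truncated example vector
        "k": 10,
        "num_candidates": 100,
    },
    "_source": ["title"],
}
resp = requests.post(f"{ES}/long_docs/_search", json=search, auth=AUTH)
```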

So we've got passage vectors in Lucene, and it's in Elasticsearch now. The other great thing about it, for those that have ever done highlighting with Elasticsearch: with BM25 it makes sense, because what was in the search box matched the document, so you can find which part matched and highlight it for the users. But highlighting doesn't work that way in the world of semantic search; you can't figure out which tokens in the search matched what. You have to find the vectors.

So because you've now got the passage vector and the text that came with it, you basically get the highlighting of where it was in the document. You can show the user what part of this giant, maybe 150-page, PDF that you indexed with vectors actually matched, and they can skip exactly to where they need to go.

Why did they find this document? Here's another fun one to think about. We're early in our vector search journey in a lot of cases. So what if you're going to change the text embedding model over time? That sounds hard, you'd think you would never do that, but I'm going to tell you: the text embedding models will change over time. So you're going to end up with a situation where you've got historic indices with older models and new indices with tomorrow's hot and exciting text embedding model, and they may not be in exactly the same space.

So what are you going to do about it? You're going to have to combine search results from multiple indices with different text embedding models. If you think about it, today you have to copy the search box, figure out which model you want to run, and all of that, just to get the vectors. Well, we thought about this, and we're going to make it easy on you. What we're going to be able to do is move the model into the index metadata, describing where the vectors came from, which will let you even more easily copy the search box, search multiple indices with your k-nearest-neighbor search, have each index run the right model to turn the text into the vectors to search with, and then we blend all those scores together for you.

So again, we're going to make it easy to search new vectors and old vectors at the same time; it's going to take all the stress out of the situation for you. We didn't have time to touch on everything, so there's lots of other stuff coming. There are optimizations we've done with ELSER, which is our other way to do semantic search; v2 is out and it's now GA. We're working on hardware acceleration, so for those that are excited about GPU acceleration and things like that, yes, we are starting to dabble in GPU acceleration to make models run even faster where we can.

And we want to make inference a first-class citizen in the stack. There's plenty to talk about on that, but know that in the future, you'll be able to bring your data to Elasticsearch and maybe talk to another service to get vectors, or other natural language processing, or any other ML on the data, through Elasticsearch, so that you can just again bring your data to the cluster, bring your queries to the cluster.

Again, we even want to make the search experience for approximate nearest neighbor search with vectors easier still. So get ready for some API changes to make it simpler, and to make the experience for both the developer and the end users even better.

So again, as we like to say, for the road to production, we want you to be able to just bring your data, just bring your queries, and get the answers in front of the users who need them. I've talked a lot about things that we've done, but let's change gears for a minute. I'm going to hand it over to Fahad to talk about an actual production use case that incorporates a lot of what you've been hearing today.

Thanks Michael. So before I begin, I'll actually invoke the same disclaimer that Michael said, which is I'll be making a lot of forward looking statements that shouldn't imply any Adobe commitments.

So with that, what are Adobe Commerce's intelligent commerce solutions? Sitting at the intersection of data and machine learning, Adobe Commerce has products such as Live Search, which is an AI-powered commerce search that provides fast and relevant results to shoppers but, equally important, provides intelligent merchandising for merchants. Another popular intelligent commerce solution is Product Recommendations. This, as the name suggests, provides product recommendations; visual recommendations are a part of it, and it has dedicated reporting, event collection, and so on. But let's get the basics out of the way.

So in the context of commerce, search is not just search. Search is not only the search bar that I'm sure a lot of you have used when shopping online; it is also your sidebar navigation, it's also the category pages, it's also intelligent facets, and it is getting the first row of products just right and personalized for your shopper. All of that, the whole product discovery journey, starts right at the beginning.

So let's do a quick background before we get into the future of AI-fueled shopper experiences: basically, driving GMV, or gross merchandise value, through AI-fueled experiences. These are some of the major key numbers that are well known in the world of commerce.

  • 86% of customers say that personalization plays an important role in making a purchase.
  • 62% of customers say that they would pay extra for brands that cater to personalized experiences.
  • 35%: this is a very well-known statistic coming directly from Amazon; 35% of Amazon's total sales are said to be attributable to personalized recommendations.

So how do merchants generally drive GMV from an e-commerce site? It basically depends on three levers, pretty simple: shopper traffic, conversion rate, and average order value. Of these, the last two, conversion rate and average order value, are where we focus; this is where our AI-fueled shopper experiences fit in. From our own numbers, Live Search merchants see a net positive 15% lift in conversion rate and a 20% increase in average order value after launching.

So let's go into the future of AI-fueled experiences in commerce. One thing that I believe is going to happen is that shoppers are going to rapidly transform the expectations they have of search. Currently, as we know, most commerce search engines use an inverted index. There are a lot of things that are missing today, but they will become table stakes pretty soon.

For example, it is difficult to capture semantics, which hurts your results when you query for genuinely relevant documents. Multimodal search is going to be table stakes: there's a lot of data locked in videos, and with the explosion of short videos, this is not going to be a luxury anymore but table stakes.

Furthermore, from the merchant side, and this is where we're going to dive deep shortly, it requires sanitized descriptions. This is a common scenario with most customers: they have old documentation for products, and it requires sanitization to make the content more searchable. And while we solve for all of this, we need to remember that we still need precise faceting, aggregation, sorting, and filtering. So this needs to be a holistic solution.

So let's look at our use case of enriching catalog content to improve recall. Imagine that you are an auto parts company and you're selling auto parts for makes and models of the last 50 years. Just imagine one car: how many auto parts does it have? Now multiply that across all these different cars for the last 50 years. This data is trapped in your ERP systems, and, I kid you not, there are companies out there that export this data into Excel files, and these Excel files get disseminated. There's somebody who knows the catalog very well and who knows SEO very well, and they look at these product descriptions, which are filled with technical jargon and abbreviations that consumers are not going to use; consumers are not going to use that language in their searches.

So they sanitize these product descriptions, and these are millions and millions of documents. This is a non-deterministic solution, plus you have diagrams and images and other things that you're basically leaving on the table with an inverted index.

So there are some options available to us today; this is definitely not an exhaustive list. One is manual sanitization, like I discussed. The other is one that Adobe Live Search gives you today, which is that it allows you to create one- and two-word synonyms. This is somewhat of an immediate relief, because once you know your popular products, and you know where your customers are and where your product descriptions are, you just create those synonyms. So basically, it is a poor man's version of lexical expansion where a human being is essentially the LLM.

And then there's another thing that we actually tried, but unsuccessfully, which is generating text using an LLM. The idea is that you take these massive catalogs, millions of product descriptions, put them into an LLM of your choice, and out comes the sanitized product description. But you still need to audit these things, you still need to guard against hallucinations, and the relevance and the quality of these descriptions are still iffy.

And that leads us to the one that we're going to talk about, which is lexical expansion using fine-tuned domain-specific models. So when I'm talking about domain-specific models, think about all the verticals that our customers are in: it could be auto parts, it could be healthcare, it could be retail, and so on.

So assume that these are embeddings of the raw product documentation. You have two documents here, D1 and D2, and we're just presenting them in a vector space, and then you get a query. Now, this will not produce very good recall, because the vectors D1 and D2 are equidistant from Q. Why are they equidistant? Because, again, consumers have a different vocabulary than the one your product descriptions are written in.

Now, if we overlay that on a fine-tuned model and then enrich our documents with lexical expansion, in other words, the LLM gives you related terms and their weights that you can then index into Elastic, that will give you D1-prime, whose vector is closer to Q. And now you'll get better recall, because when it's ranking, it will clearly rank D1 ahead of D2.

Now, I have to step back and say that I simplified a few things here. The first is that while you index these things using vector search and LLMs, you don't have to query using vector search; you can still use BM25, depending on your use case. So the D1-prime match could come from vector search, but you can also use BM25.
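
One possible way to index LLM-generated expansion terms with their weights, not necessarily what Adobe built, is a rank_features field alongside the raw description, queried with BM25 plus rank_feature clauses. Field names, terms, and weights below are illustrative.

```python
# Index expansion terms with weights and blend them into a BM25 query.
import requests

ES = "http://localhost:9200"
AUTH = ("elastic", "changeme")

mapping = {
    "mappings": {
        "properties": {
            "description": {"type": "text"},
            "expanded_terms": {"type": "rank_features"},
        }
    }
}
requests.put(f"{ES}/parts", json=mapping, auth=AUTH)

doc = {
    "description": "HVAC blower motor resistor, OEM 52079",
    # Consumer-language terms and weights produced by the expansion model.
    "expanded_terms": {"heater_fan": 1.8, "ac_blower": 1.5, "climate_control": 0.9},
}
requests.post(f"{ES}/parts/_doc", json=doc, auth=AUTH)

query = {
    "query": {
        "bool": {
            "should": [
                {"match": {"description": "heater fan not working"}},
                {"rank_feature": {"field": "expanded_terms.heater_fan", "boost": 2}},
            ]
        }
    }
}
resp = requests.post(f"{ES}/parts/_search", json=query, auth=AUTH)
```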

The other thing you could do is put the query itself through lexical expansion and produce a Q-prime, which may give better relevance if that's what you need. Which brings me to my conclusion: this is going to require a lot of experimentation. One size is not going to fit all. As Ayan previously alluded to, foundation models are increasingly becoming commodities, and you will need fine-tuned domain-specific models to get that relevance; that is going to be the differentiator. But that's not all. There are a lot of other knobs to tweak, and the trade-off between cost and performance needs to be made. Then you have to worry about hybrid approaches, the blended weights or re-ranking: how are you going to do that with the subset of relevant documents that you get? And multimodal search is going to be table stakes.

So all of that comes down to one big point: if you want to do this, you need a flexible platform that will allow you to do this experimentation. And with that, I'll conclude my portion of this talk. Thanks for listening.
