Power Amazon Bedrock applications with Neo4j knowledge graph

Today we're going to talk about how you can leverage the power of Neo4j graphs and Amazon Bedrock to create a generative AI application.

I'm joined by Ben Lackey from Neo4j, who's going to do a demo, and I'm Anthony Prasad, Senior Partner Solutions Architect at AWS.

I'll walk you through how we arrived at this architecture. With that said, let's get started.

Think about today's world. If you use your LLMs out of the box, you can connect to data sources, pull everything, extract the data, and build your applications. Absolutely.

But if you dive deeper, you'll find a lot of issues: hallucinations, errors. Why does that happen? Because an out-of-the-box LLM only generates text from the patterns and phrases it has already seen.

There's an overuse of templates, only known patterns get reused, and you might get text that doesn't make sense, because the model doesn't really understand the context.

For example, you could ask a question about soccer, but the response comes back about basketball. It doesn't really relate to what you want.

And sometimes, we've noticed, the output contains blanks or filler generated just to complete the text. These are some of the challenges we've seen.

So why use Neo4j in this ecosystem? LLMs can be made more accurate through fine-tuning, few-shot prompting, and grounding.

Grounding helps eliminate hallucinations, and fine-tuning ensures your LLM is focused on specific tasks: you provide an input and it zeros in on exactly what needs to be done.

The other important aspect is eliminating bias. Grounding also helps reduce bias, which was one of the problems we had before. And last but not least, I want to talk about vector embeddings.

Neo4j supports vector embeddings, which gives you both semantic search and vector search. So you have more visibility: not just extraction of data, but traversal of relationships. And that's the whole idea of graphs.

Think about using a query language like SQL: yes, it can get you outputs for a given input. But who is going to tell the story? How do you express the relationships?

For example, say you want to look at patient data spanning the last three years. What drugs were administered to this patient? What clinical trials are happening right now? What is the future state? All of this can be navigated through your graph representation.
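To make that concrete, here's a minimal sketch of what such a traversal could look like with the official Neo4j Python driver. The Patient/Drug/Trial schema, relationship types, and property names are hypothetical, invented for illustration; they are not part of the demo dataset.

```python
# Hypothetical sketch: traversing a patient graph with the official Neo4j Python driver.
# The Patient/Drug/Trial schema below is invented for illustration.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (p:Patient {id: $patient_id})-[a:ADMINISTERED]->(d:Drug)
WHERE a.date >= date() - duration('P3Y')                    // drugs given in the last three years
OPTIONAL MATCH (d)<-[:TESTS]-(t:Trial {status: 'ACTIVE'})   // trials currently testing those drugs
RETURN d.name AS drug, a.date AS administered, t.name AS active_trial
"""

with driver.session() as session:
    for record in session.run(query, patient_id="P-1234"):
        print(record["drug"], record["administered"], record["active_trial"])
```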

Now, this is a simple reference architecture we've come up with, and you don't have to use it exactly as-is. It's a template, a blueprint, that you can customize to build your own application.

In this application we're going to create a simple chatbot that gives you responses. There are two phases: a simple ingestion layer into Neo4j's knowledge graph, and then consumption of the data in that knowledge graph to produce an output for your user.

For example, you can use Amazon Bedrock's Claude v2 LLM both to extract data and to convert natural-language questions into Cypher queries whose results are retrieved for the user. That user could be a chatbot, a customer service representative, or even a recommendation engine: say you have a retail e-commerce website and you want personalized recommendations based on purchase history, those can be generated from the captured data.

In today's demo, which Ben will show you in a couple of minutes, we've built a really simple chatbot application. The input is unstructured data, predominantly XML embedded in text files, and we're using Amazon SageMaker Studio to invoke a job that runs Bedrock, with a Gradio application to build the interface.

One of the most important aspects to understand is that with LLMs you can use prompt engineering to target specific aspects of your data and extract very specific information.

In this use case, we're using a financial data set with asset managers, their holdings, and the addresses mapped to them. It's one way to showcase how we can traverse the relationships and extract the information.

With a simple chatbot, the steps are these: first, the user asks a question; LangChain passes it to the LLM, which converts it into Cypher; that Cypher then queries your Neo4j graph database; and the result is returned to your end user. It's that simple.
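As a rough sketch of those steps in code, here's what that loop could look like with LangChain's GraphCypherQAChain and a Bedrock-backed LLM. The connection details are placeholders, and the exact import paths and chain options vary across LangChain releases, so treat this as an outline rather than the demo's exact code.

```python
# A minimal sketch of the question -> Cypher -> Neo4j -> answer loop with LangChain.
# Connection details are placeholders; import paths follow the langchain_community
# packages, which may differ between LangChain releases.
from langchain.chains import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph
from langchain_community.llms import Bedrock

graph = Neo4jGraph(
    url="neo4j+s://<your-instance>.databases.neo4j.io",  # placeholder URI
    username="neo4j",
    password="<password>",
)

llm = Bedrock(model_id="anthropic.claude-v2")  # Claude v2 served through Amazon Bedrock

# The chain shows the LLM the graph schema, asks it for a Cypher query, runs that
# query against Neo4j, then has the LLM phrase the returned rows in plain English.
chain = GraphCypherQAChain.from_llm(llm, graph=graph, verbose=True)
print(chain.run("Which managers own the most shares of Apple?"))
```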

I also want to talk about the RAG workflow here. Neo4j is used as a vector database supporting both similarity and semantic matching, and that makes your LLM better than a vector-only search approach. Why? Because you have additional information: starting from the nodes on the left, you can traverse the relationships.
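Here's a hedged sketch of what combining vector similarity with a graph hop can look like in Neo4j 5.11+ using the db.index.vector.queryNodes procedure. The index name, labels, relationship types, and the embedding dimension are all assumptions for illustration.

```python
# Sketch: vector similarity search in Neo4j followed by a relationship hop.
# Assumes a vector index named 'company_embeddings' over (:Company {embedding})
# nodes; index name, labels, and the 1536-dim placeholder are all illustrative.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

question_embedding = [0.0] * 1536  # in practice: embed the user question upstream

query = """
CALL db.index.vector.queryNodes('company_embeddings', 5, $embedding)
YIELD node AS company, score
MATCH (manager:Manager)-[o:OWNS]->(company)      // traverse out from the similarity hits
RETURN company.name AS company, score, manager.name AS manager, o.shares AS shares
ORDER BY score DESC
"""

with driver.session() as session:
    for row in session.run(query, embedding=question_embedding):
        print(dict(row))
```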

For example, if you need to understand where your data is coming from and why you're using it, the graph can explain that. With that said, I'll hand it over to Ben to present the demo.

OK, thanks Anthony. Let me switch over between machines. I'm going to show you a demo of the architecture Anthony just described, and it's kind of a two-part data flow.

In part one, we start with some data and use Bedrock to process it and build a knowledge graph that we store in Neo4j. In part two, we basically slap a chatbot on top of that thing, so you can ask it questions and it uses the Neo4j database in a RAG sort of pattern.

A query goes into Bedrock, Bedrock spits out a Cypher statement, the Cypher statement is run against the database, we get the response back, and we present that to the user. It uses LangChain underneath.

Everything I'm going to show you is public on GitHub, so you're welcome to go tinker, look at what we did, offer suggestions, and so forth. Of course, I'm not going to go through all the painful steps of cloning that repo and walking through everything, since that would probably take us half an hour or so.

Instead, I've sort of done the Julia Child thing and pre-baked a little bit of this. So this is my AWS account; there are many like it, but this one's mine. Within it, I've got a SageMaker domain set up, and this is the single-user setup they introduced recently. It's pretty slick: you don't have to make VPCs and all that nastiness.

One click, well, one click and then you wait five minutes for it to deploy, and then you can connect to it. Good stuff. Within that domain I've got SageMaker Studio running, and in Studio I've already cloned a copy of the repo we were just looking at. You can see all that over here; the repo consists of a couple of different parts.

We've got a notebook that loads some data from S3 buckets. The data we're working with comes from the SEC's EDGAR system, the SEC being the Securities and Exchange Commission, the financial regulator in the US.

They publish all sorts of data they collect from the companies they regulate, and the data sets we're looking at here are something called Form 13: if I'm an asset manager who manages $100 million or more, once a quarter I'm required to file and disclose all the things I've bought.

For asset managers, think companies like Fidelity and Vanguard. In these filings, they're disclosing that they own shares of Ford or General Motors or whatever.

So we're parsing those documents using Bedrock, and that's what this notebook walks through. Within the notebook, we're taking a prompt-driven approach. It's kind of funny: if I were giving this presentation six months ago, I would have told you fine-tuning is the way to go, that we should fine-tune all of this. But here we are, and we're prompting.

I don't know what it's going to look like in six months, but I'm sure it will be something different. At any rate, I'm telling Bedrock a little bit about what I want from the data with these prompts.

So I'm saying that for a manager, an asset manager once again, think Vanguard and Fidelity sort of companies, I want to pull out the manager name, their street address, metadata like that. That's kind of interesting.

And then for each filing, I want to pull out, of course, the name of the company, so think Ford or whatever, and then some other things like the number of shares owned, the value of those shares, and a financial services industry thing called a CUSIP, which is basically an ID for financial instruments.

So we write those prompts, we tell the notebook a little bit about the Bedrock endpoint we want to connect to, and we have a few wrapper functions, not doing anything too exciting, just wrapping the Bedrock APIs we're going to call. Then we pull down one of those files and get our first look at what they contain, and they are exactly what you would expect from the federal government.
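For a sense of what such a wrapper and prompt might look like, here's a minimal sketch using the Bedrock runtime API via boto3 and Claude v2's text-completion request format. The prompt wording, file name, and field list are illustrative, not copied from the repo.

```python
# Illustrative wrapper around Bedrock's InvokeModel for Claude v2, plus the kind
# of extraction prompt described above. Prompt wording and file name are invented.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask_claude(prompt: str, max_tokens: int = 1024) -> str:
    """Send one text-completion request to Claude v2 on Bedrock."""
    body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",  # Claude v2 completion format
        "max_tokens_to_sample": max_tokens,
    })
    response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
    return json.loads(response["body"].read())["completion"]

extraction_prompt = """From the SEC Form 13 filing below, extract as JSON:
- managerName and managerAddress for the filing asset manager
- for each holding: companyName, cusip, value, and shares

Filing:
{filing_text}"""

raw_filing = open("filing.txt").read()  # one raw EDGAR file pulled down earlier
metadata = ask_claude(extraction_prompt.format(filing_text=raw_filing))
print(metadata)
```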

They're a mix of XML and tab-delimited data just jammed together, because, you know, why not mix data formats? And what's kind of cool about this: a long time ago, earlier in my career, I wrote a parser for this thing, and it took months to write and then years to harden.

Here we're skipping ahead and doing a lot of that work with gen AI very quickly. You're effectively saving a whole bunch of developer hours and making data processing a lot easier.

We're burning a lot of compute in the process, of course; there's a reason Nvidia is doing very well these days.

If we look down here, we see the semi-structured data, then we get into the XML. We're doing a little cheating with string processing here that we could probably do within the LLM; that's a future thing where we'd like to extend what it's doing a little more.

And then we take a look at the XML blob there. Finally, we get to what we actually pulled out of all that nastiness using the LLM, and what that is is some metadata.

So we've got here this CUSIP, which is apparently Apple, the value of the holding, and that it's 100 shares, and so on.

What we've done here is something you might typically have accomplished with an ETL or ESB tool, and instead we're throwing a whole bunch of LLM compute at it to yank this metadata out.

And then of course, because this is the Neo4j talk, we're going to feed it into the database and build a knowledge graph out of it.
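A sketch of what that load step could look like: idempotent MERGE statements via the Neo4j Python driver, turning the extracted metadata into Manager and Company nodes joined by OWNS relationships. The labels, relationship type, property names, and sample values are assumptions, not necessarily the repo's exact data model.

```python
# Sketch: loading the LLM-extracted metadata into Neo4j with idempotent MERGEs.
# Labels, relationship types, and sample values are illustrative assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

manager = {"name": "Example Capital LLC", "address": "1 Main St, Anytown"}
holdings = [  # in practice, the list parsed out of the LLM's JSON response
    {"cusip": "037833100", "companyName": "APPLE INC", "shares": 100, "value": 19000},
]

load_query = """
MERGE (m:Manager {name: $managerName})
SET m.address = $managerAddress
MERGE (c:Company {cusip: $cusip})
SET c.name = $companyName
MERGE (m)-[o:OWNS]->(c)
SET o.shares = $shares, o.value = $value
"""

with driver.session() as session:
    for h in holdings:
        session.run(load_query, managerName=manager["name"],
                    managerAddress=manager["address"], **h)
```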

And one of the questions you might ask is, well, there are many databases. Why would I build a knowledge graph? What's the point here? And there are myriad answers to that.

One of my favorites is that this whole LLM space is very natural-language focused. You may remember diagramming sentences back in grade school, noun, verb, all that stuff: you were building a graph there.

There are these very natural mappings between natural language and graphs, and that's part of the reason that backing LLMs with knowledge graphs works really well.

There's none of the domain impedance mismatch you might get turning rows and columns back into natural language. And we're seeing a lot of success from our customers doing this in domains like financial services, of course, but also in manufacturing and logistics.

If you think about something like a bill of materials, it's very clearly a graph. One example would be a car: at the top level there's the car itself, and it's made up of a number of components, things like the engine, which contains pistons, and so on down.
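As a toy illustration of that bill-of-materials idea, here's a hypothetical sketch: a three-level parts graph built in Cypher via the Python driver, where a variable-length CONTAINS traversal then answers "what, transitively, goes into the car?" The part names and relationship type are invented.

```python
# Toy bill-of-materials graph: Car contains Engine contains Piston.
# Part names and the CONTAINS relationship are hypothetical.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    session.run("""
        MERGE (car:Part {name: 'Car'})
        MERGE (engine:Part {name: 'Engine'})
        MERGE (piston:Part {name: 'Piston'})
        MERGE (car)-[:CONTAINS]->(engine)
        MERGE (engine)-[:CONTAINS]->(piston)
    """)
    # A variable-length traversal lists everything that, transitively, goes into the car.
    for row in session.run(
        "MATCH (:Part {name: 'Car'})-[:CONTAINS*]->(p:Part) RETURN p.name AS part"
    ):
        print(row["part"])
```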

So that's kind of part one of our data flow. We've gone through, we've processed a whole bunch of these files and we've built out the knowledge graph with them.

We could go take a look at what we did, but before that, I'm going to skip to part two of the data flow, which is the chatbot.

Here we're putting a chatbot on top of the knowledge graph we built and using it to query that graph. This notebook has a few different examples.

We start off with an ungrounded example: we're just asking raw Bedrock, in this case Anthropic's Claude v2, hey, can you tell me what the top 10 investments are for BlackRock? And it basically says, I have no idea what you're talking about. Which isn't surprising, because it doesn't have that information.

As we go down the notebook, we're going to present it with that information. So I'm going to do the fun thing you can do in any demo and run a bunch of this live. Wish me luck.

Let's see: run selected cell and all below. That's running through now; you can see it running its imports so it can grab some stuff. It runs through here, and it's making the calls into Bedrock now, so we're refreshing those answers. And sure enough, this is actually slightly reassuring:

LLMs, as you may know, are very stochastic processes, and we actually got the same answer, which is always nice. If I scroll down here, these are the prompts I was giving it.

So it sort of knows what questions are coming and something about how to answer, and I gave it database credentials and told it how to connect to the database and so on.

Now I'm finally asking it some questions. The first one is: what are the top 10 investments in Vanguard? You can see it comes back with Vanguard's holdings, and the result isn't terribly surprising; Vanguard pioneered index funds.

So Vanguard's holdings, of course, mirror the larger market, with things like Apple as the largest holding. That's kind of cool. Oh, and it just popped down a little further than I wanted to go.

Then we get to another question that I kind of like: which managers own FANG stocks? FANG, you know, Facebook, Apple, and so on. This is interesting in that you're combining the knowledge the LLM already has.

The LLM inherently knows, from its crawl of the internet, what a FANG stock is. You're combining that knowledge with what we've provided in the knowledge graph, the actual holdings, and the LLM is able to synthesize the two together.

There's, of course, a little LangChain magic going on here that pulls all these bits together. Basically, the question goes into the LLM, and the LLM converts it into a Cypher query, a Neo4j query.

We can see it knows Facebook, Apple, Amazon, and so forth. We then run that against the knowledge graph, against the database, and we get a JSON blob answer back, which we feed back to the LLM again.

The LLM then summarizes the answer in natural language that we can present back to the user. So it's a cool use case showing how the LLM is better together with Neo4j: we're providing information it wouldn't otherwise know, we're getting past context-window and token limitations we might otherwise hit, and we're doing things that wouldn't be possible with either technology used alone.
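For readers who want to see that two-pass loop without the LangChain abstraction, here's a hand-rolled sketch. It reuses the hypothetical ask_claude() wrapper from the earlier ingestion sketch, and the schema string and prompt wording are illustrative, not the demo's exact prompts.

```python
# Hand-rolled version of the two-pass loop: one Bedrock call to write Cypher,
# one Neo4j query, and a second Bedrock call to phrase the answer.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

question = "Which managers own FANG stocks?"
schema = "(:Manager {name})-[:OWNS {shares, value}]->(:Company {name, cusip})"

# Pass 1: question plus schema in, Cypher out.
cypher = ask_claude(
    f"Given this Neo4j schema: {schema}\n"
    f"Write a Cypher query answering: {question}\n"
    "Return only the query."
)

# Run the generated Cypher; collect the rows as a JSON-able list of dicts.
with driver.session() as session:
    rows = [dict(r) for r in session.run(cypher)]

# Pass 2: results back in, natural-language answer out.
print(ask_claude(f"Question: {question}\nQuery results: {rows}\nAnswer in plain English."))
```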

With that, I'm coming to the end of my time, so I'm going to switch out of the demo and tell you how you can learn more.

We have these slides available if you'd like a copy. I think we'll probably send them to everybody who attended, but feel free to email me at ben.lackey@neo4j and I will happily send you the slides.

One of my pet peeves is when I go to a presentation and nobody shares the slides, so those are there. There's a blog post describing this, and there's the GitHub repo with all the code we just looked at.

There's a video of a guy on my team walking through it. There's also a bunch of more white-paper-like collateral: a Neo4j and generative AI page describing how the technologies work together.

There's also a page describing how we work with Amazon and how we partner on the Marketplace. All of this is deployable through the Marketplace, of course.

And if you have any questions at all, I'd encourage you to come by booth 1304, which I think is right back there, 50 feet or so away. A bunch of my colleagues would be more than happy to show you this demo in more detail and answer any questions you might have.
