Amazon Neptune: New features for next-generation graph applications

Welcome to theater five. Is everyone still awake? Still have energy? Let me hear it. Yeah. My name's Anthony Pago, I'm a senior product manager here at AWS. I've been doing a poll, and I've got to do it for the last session: if this is your first re:Invent, raise your hand. All right, yes, that's great.

It is my first re:Invent too. I'm super excited to be here, and it's been a great couple of days. I'm excited to introduce my colleagues at AWS who are going to do the last session of the day: Denise Gosnell, who is a principal product manager, and Dave Bechberger, who is a principal graph architect. Come on out.

Hi, everyone. How's it going? Thank you for coming. I know it's been a long re:Invent. I don't know how many miles I've walked, but I'm sure you've probably walked a ton as well. So enjoy sitting down for twenty minutes here while we talk about Amazon Neptune Analytics and some of the new graph analytics and generative AI capabilities that we just launched yesterday.

I'm Dave Bechberger, and with me is Denise Gosnell. I'm a principal graph architect on the Neptune team, and she is a principal product manager on the Neptune team. We're here to talk to you a little bit about our new release.

So as many of you probably know, and as many of you have probably seen on this slide or a very similar one, we support data workloads for your applications with the most complete set of relational and non-relational, purpose-built databases, such as DynamoDB, DocumentDB, and ElastiCache. And in our case, we're going to be talking about Amazon Neptune.

What is Amazon Neptune? Amazon Neptune is a fully managed, purpose-built graph database built and managed for the cloud. It's optimized to store and map billions of relationships and enable real-time navigation of those connections with millisecond query response times. As I mentioned, it's really built as a way to scale the architecture of your graph data.

But what is graph data? We haven't talked about that too much yet. Graph data is data where you deal with the relationships and connections between data points, and where those relationships and connections are as important as, or more important than, the data itself.

Neptune is a great example of how you can use this data at scale with a pay-as-you-go model. But over the years we've worked with many customers, and they've told us they really want to make data discoveries faster by analyzing graph data that contains tens of billions of connections, and to do it in a way where they're looking at most or all of the graph.

With that, yesterday, and I don't know how many of you saw Swami's keynote announcement, we announced a brand new product called Amazon Neptune Analytics. And I'm really happy today to have Denise Gosnell, the principal product manager, up here to talk to you a little bit about what that product is and how you can use it to be successful in your applications.

Thank you. Thanks, everybody, and thank you so much for staying for one of the last talks of the day. We could not be more thrilled to be announcing Amazon Neptune Analytics at re:Invent this year, which happened yesterday, as you just heard from Dave.

Now, Amazon Neptune Analytics is a new analytics database engine for working with your graph data in the AWS cloud. It helps data scientists, data engineers, and application developers discover insights, within seconds, about that connected data you just heard Dave speaking about. There are three main ways to think about what you can do and what is available in Amazon Neptune Analytics.

The first one I want to talk about is what our early access customers have been most excited about: it's a single service for working with graphs at AWS. We're working on making it much more straightforward for you to work with your graph data and analyze your connected data.

So when you're using Amazon Neptune Analytics, you're going to be working with graphs provisioned to a certain capacity, instead of having to manage all of the infrastructure yourself in order to actually build and run your graph analytics.

When you're working with your graph data, you use a popular open-source query language called openCypher. And for those of you who are new to graph query languages, that query language is much like SQL.
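To make the SQL comparison concrete, here is a small, hypothetical side-by-side. None of these labels, tables, or property names come from the talk; they are made up purely to show that openCypher expresses a relationship as a pattern, while SQL expresses the same relationship as joins.

```python
# Hypothetical schema, just for illustration: Holder and Company nodes, owns edges.
opencypher_example = """
MATCH (h:Holder)-[o:owns]->(c:Company)
WHERE c.name = 'Example Corp'
RETURN h.name, o.value
ORDER BY o.value DESC
LIMIT 10
"""

# Roughly the same question in SQL: the relationship shows up as explicit joins.
sql_equivalent = """
SELECT h.name, o.value
FROM holders h
JOIN ownerships o ON o.holder_id = h.id
JOIN companies  c ON c.id = o.company_id
WHERE c.name = 'Example Corp'
ORDER BY o.value DESC
LIMIT 10
"""

print(opencypher_example)
print(sql_equivalent)
```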

And the second thing that our customers have been loving the most so far is how fast the engine is. It's 80 times faster for getting insights from your graph data within the AWS cloud, and we can load about 10 million pieces of graph data into the engine per second.

You're also going to be able to analyze billions of edges in seconds when you're using Amazon Neptune Analytics. It gives you much faster ways to do those joins, which is why we care about working with graph data.

And then lastly, and probably one of the reasons many of you are sitting here today, within Amazon Neptune Analytics there is a vector search engine, so you can combine vector similarity search with your graph data. What that does is make your similarity searches explainable.

It gives you the ability to ask questions like "what is similar to this?" and then go through the explicit relationships to tell you why, giving you the context to make the answer much more explainable and understandable, so you can have higher-level conversations and make decisions and take actions with your data.

I'm going to give you a quick overview of the three main types of use cases that our early customers have been building with Amazon Neptune Analytics, and then we're going to dive into a demo. I'm going to show you how to build a graph, then show you how to work through some graph algorithms, and we're going to do some vector search up here as well. Let's get started.

The most popular way that customers want to use Amazon Neptune Analytics is for ephemeral graph workloads. You can imagine a data science team that has a one-off process: building custom infrastructure, running custom code to run an algorithm, and then getting insights about their data.

Amazon Neptune Analytics is a really great way to spin up a graph and load your data very fast. Then you can run algorithms combined with low-latency queries and vector search in any combination. It does have durability, so you can write back the results of your algorithms, maybe add new edges that you've discovered during your analysis, and then take a snapshot of your graph to do additional analysis and other workflows.
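As a minimal sketch of that ephemeral workflow, the steps below use the boto3 `neptune-graph` client to write a discovered edge back, snapshot the graph, and tear it down. The operation names (`execute_query`, `create_graph_snapshot`, `delete_graph`), their parameters, the response handling, the graph identifier, and the labels in the query are all assumptions for illustration; check the current AWS SDK documentation before relying on any of them.

```python
import boto3

client = boto3.client("neptune-graph", region_name="us-east-2")  # assumed client name

# Write back an insight discovered during analysis as a new edge (hypothetical
# labels), so it survives in the durable graph and in any snapshot taken later.
write_back = """
MATCH (a:Holder {name: 'Fund A'}), (b:Holder {name: 'Fund B'})
CREATE (a)-[:similar_portfolio {score: 0.87}]->(b)
"""
client.execute_query(                 # assumed operation name and parameters
    graphIdentifier="g-xxxxxxxxxx",   # placeholder graph identifier
    queryString=write_back,
    language="OPEN_CYPHER",
)

# Snapshot the analysis state, then delete the ephemeral graph; restore the
# snapshot later when the next round of analysis is needed.
client.create_graph_snapshot(graphIdentifier="g-xxxxxxxxxx", snapshotName="run-q4")
client.delete_graph(graphIdentifier="g-xxxxxxxxxx", skipSnapshot=True)
```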

The second most popular way that our customers have been using the new graph engine is for low-latency analytical queries, most often as part of their existing machine learning pipelines.

Say you have a machine learning feature table and you want to add a new column to it, and in that column you want to put an insight that came from your graph. Neptune Analytics is a really great way to load up your data quickly, determine that insight, and then store it in that feature table for use in your established ML pipeline.
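A minimal sketch of that pipeline step: compute a graph-derived value (here just each investor's holding count, via plain openCypher) and merge it into a feature table. The `execute_query` call, the response shape, the graph identifier, the labels, and the feature table itself are all assumptions made up for illustration.

```python
import json
import boto3
import pandas as pd

client = boto3.client("neptune-graph", region_name="us-east-2")  # assumed client name

# Graph-derived feature: how many holdings each investor has this quarter.
# Labels and properties are placeholders for the demo's EDGAR-style schema.
query = """
MATCH (h:Holder)-[:owns]->(:Holding)
RETURN h.name AS name, count(*) AS holding_count
"""
resp = client.execute_query(           # assumed operation name and parameters
    graphIdentifier="g-xxxxxxxxxx",    # placeholder graph identifier
    queryString=query,
    language="OPEN_CYPHER",
)
rows = json.loads(resp["payload"].read())["results"]  # assumed response shape

# Existing (made-up) ML feature table; add the new graph-derived column to it.
features = pd.DataFrame({"name": ["Raymond James"], "assets_under_mgmt": [1.0e9]})
features = features.merge(pd.DataFrame(rows), on="name", how="left")
print(features.head())
```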

The third way that our customers are loving using Amazon Neptune Analytics is for doing vector search with graph data. We have an early customer in the pharmaceutical industry who has been experimenting with how they can reduce the cost of their wet lab experiments by determining candidate drugs that may be relevant for testing.

What they're able to do is take their proprietary knowledge graph and do a search for a vector that represents a protein. That search gives them candidate proteins that might be viable wet lab experiments. But by combining that similarity search with, say, a path discovery traversal, they can determine whether a candidate is even worthwhile, giving them a lower total cost for their research in their wet lab experiments.

So those are the three ways that our customers have been using Amazon Neptune Analytics so far, before we get to the demo.

Thank you. Let's do a demo.

Ok, thanks, Dave.

All right. So where we are right now, we're just in the AWS Management Console. We're over in Ohio. We've deployed Neptune Analytics to seven regions around the world, and to find it, we're going to start by going to Neptune. Actually, before we get into this: is there anyone here who has used Neptune before? Awesome. Thanks for coming.

You'll notice on the left-hand side that we now have new navigation that takes you down to an Analytics section. From here we're going to go to Graphs. You can see I built a graph earlier, and I need to grab the S3 location of the data I'm going to load for you. So pardon me one moment.

So, Denise, while you're doing that: where can I load my data from here? That's a great question, Dave, thank you. When you spin up a graph here, in a moment you're going to see that you can load your data from one of two sources: you can load it directly from an existing Neptune cluster, or you can load it directly from S3, which is what we're going to do in this demo. And I had to go grab the URI, so thanks for your patience.

So we're going to start by creating a graph, and we're going to give it a unique name. The date is the 30th, and this is the second time I've done this today. We're going to create a graph from an existing source. But to Dave's question just now: you can create an empty graph if you would like and then load the data through an openCypher CALL command, or you can create a graph from an existing source, which can be a Neptune cluster or directly from S3.

Now, when you're creating and working with graphs, I mentioned earlier that you're going to be working with an amount of capacity for the workload that you're doing. So we're going to select the maximum capacity that we need for this graph. It's a small one: 128 m-NCUs, which are memory-optimized Neptune capacity units. It's the same type of unit as when you're working with Neptune Serverless, except this is memory optimized. You provision that, and it's a one-to-one ratio for the number of GBs that we're going to provision to run your workload.

So 128 here is going to spin up the infrastructure for a capacity of 128 gigs. As a rough guideline, the data you have on disk should be about 7/11 the size of the capacity you select here.

When we load, we're going to pick the S3 read-only role, and down here we're going to load directly from S3. We're working in us-east-2. And then, lastly, when you're creating a graph, you do have the ability to set different availability requirements, like how many warm graphs you would like to have standing by in case of a failover if you're running live traffic.

Oh, thank you. We had someone catch a demo bug, I really appreciate that. Sounds like someone who's found that one before. Yeah, thank you. So for this graph, this is just a demo, so I don't need any warm graphs standing by on the side.

And then, lastly, here is one of our customers' favorite aspects of working with this. When you create a graph, if your security requirements allow it, you can make it publicly accessible right away. Who here has used Neptune and wished they could work with it from their local machine? I know I have. Yeah.

So what this is going to do is, when we create our graph, it's going to create an endpoint that will be publicly accessible. I'm going to copy that, put it right into my Jupyter notebook, wire up that notebook, and then we're going to be good to go.

And then, lastly, vectors. We're going to go ahead and make a vector search index for this graph. The dataset we're loading in is a knowledge graph that has embeddings, trained with a Hugging Face model, on the nodes that represent investors. Those embeddings have a dimension of 384, so we're making a vector search index here on my graph of dimension 384. And I'm going to turn off deletion protection so that I can tear this down when we're done. Let's go ahead and create our graph. Living dangerously.

I'm sorry, living dangerously? Living dangerously, of course, always. So I have this one up from earlier just to give you a glimpse of what it looks like when your graph is created. When you have your graph, this is where you can find the ARN, and you can get your endpoint, your publicly accessible endpoint, so you can work with it in notebooks.
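For reference, here is a rough boto3 equivalent of the console steps just described: create from S3, 128 m-NCUs, a public endpoint, no standby replicas, deletion protection off, and a 384-dimension vector index. The client name, the operation, and every parameter name here are assumptions to verify against the current SDK documentation, and the bucket and role ARN are placeholders.

```python
import boto3

client = boto3.client("neptune-graph", region_name="us-east-2")  # assumed client name

# Assumption: these parameter names mirror the console fields shown in the demo.
task = client.create_graph_using_import_task(
    graphName="investment-demo",
    source="s3://example-bucket/edgar-quarter/",        # placeholder S3 location
    roleArn="arn:aws:iam::123456789012:role/s3-read",   # placeholder read-only role
    provisionedMemory=128,            # 128 m-NCUs, roughly 128 GB of graph capacity
    publicConnectivity=True,          # expose a publicly accessible endpoint
    replicaCount=0,                   # demo only: no warm standby graphs
    deletionProtection=False,         # so the graph can be torn down afterwards
    vectorSearchConfiguration={"dimension": 384},  # matches the investor embeddings
)
print(task["graphId"])
```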

I think that one's going to take a moment to bake, so I have one up already and we're going to use that for the demo. What I'd also like to show you is all of the training material that team members like Dave have worked tirelessly to create, so that we can help teach the power of graph algorithms and working with graphs, if you're interested in learning about them.

So when you fire up your notebooks, you're going to have a canned set of Jupyter notebooks that teach you how to apply graph algorithms to specific business problems. That's where we are. We're going to go into Neptune, you're going to see Neptune Analytics, and in Neptune Analytics you're going to see a whole series of getting-started Jupyter notebooks, like graph algorithms, where you can go through and understand: what is a pathfinding algorithm? How do I apply it to a business problem? What are community detection and centrality algorithms?

We've set up notebooks you can walk through to learn how to use them. For this demo, we're going to go into Neptune Analytics, into the sample use cases, and we're going to do some investment analysis here.

Now, to give you an idea of what we're studying in this demo: this dataset came from EDGAR. EDGAR data collects investors and the investments they made during a quarter. This data comes from any investor that holds over 100 million dollars in investments in a quarter; they must file, and this is publicly available data.

So we've just loaded that into this graph. Let's take a look at the data model we're using, so you can get an idea of what types of relationships are important for this dataset.

On the far left, you see that we have a holder, which represents an investor like Raymond James, which you're going to see here shortly; Raymond James has many investments in a quarter. The data that we just loaded up was Q4 of 2023. And then we also have all of the holdings for the investments that were made.

So we went from one holder out to many investments. Using a graph is very valuable when you start to follow these relationships; when the number of edges, or the number of joins it would take to do this in a relational database, gets very large, that's when it's a great idea to put it in a graph.

The first thing we're going to do here is make sure that we have our notebook set up and that we have our graph endpoint. Whenever you're doing this on your own, you can see here in this cell that I dropped in the endpoint of my graph, and all I need to do is use the notebook magics here to change the graph notebook host to that public endpoint we just created. Then your notebook is ready to start practicing graph queries, graph algorithms, and vector search together.
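For those following along later, that wiring is typically a single line magic from the graph-notebook package. The magic name and the endpoint format below are assumptions about how that package is usually configured, not something shown verbatim on screen, so check the graph-notebook documentation.

```python
# Run inside a Jupyter notebook with the graph-notebook extension installed.
# Assumption: %graph_notebook_host is the line magic that points the notebook at
# an endpoint; the endpoint below is a placeholder.

%graph_notebook_host g-xxxxxxxxxx.us-east-2.neptune-graph.amazonaws.com

# After that, %%oc cells send openCypher to the configured endpoint, for example:
# %%oc
# MATCH (n) RETURN count(n)
```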

We've already loaded the data. So please pardon one more scroll here and we are gonna start with this first query.

For the first query we'd like to run on this data: we just loaded up an entire quarter's worth of SEC filings from anyone who had over 100 million dollars of investments in that quarter. And what do we want to know? We want to know who the top 10 are.

So in this query, we're going to do a graph match looking for an investor in that quarter and all of the investments they had, and we're going to sum them up and return them as a ranked list. A very analytical style of query that you would want to do with a graph.
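The exact node labels and property names in the demo dataset aren't spelled out in the talk, so the ones below are placeholders, but the shape of the query matches what's described: match each investor's holdings for the quarter, sum them, and rank.

```python
# Hypothetical labels and properties for the EDGAR-style demo graph.
top_10_investors = """
MATCH (h:Holder)-[:owns]->(inv:Holding)
RETURN h.name AS investor, sum(inv.value) AS total_invested
ORDER BY total_invested DESC
LIMIT 10
"""
# Run it from a %%oc notebook cell or via an execute_query API call.
print(top_10_investors)
```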

And you can see here, as I mentioned and previewed earlier, that Raymond James is our top investor for the quarter in this dataset.

One of the next things you might want to do when you're working with a graph in Amazon Neptune Analytics is to combine running queries with one of the graph algorithms that we have natively deployed in this engine.

And most often, one of the questions you'd be interested in asking is: well, how are the competitors doing compared to Raymond James, or the other top portfolios? We're going to use a graph algorithm called Jaccard similarity to do that.

We're going to start by finding the top 10 investors who had the most holdings in that quarter, and we're going to find all of their competitors. Then we're going to use a graph algorithm called Jaccard similarity that compares the top 10 against all of their competitors.

If you're curious about the details of Jaccard similarity, we've got some great information here, including how to interpret the analytical result. When you run something like Jaccard similarity, a higher score means they're more similar.

What this is doing right here is starting off with an openCypher query, where we're first looking for the top 10 and then finding all of their competitors. Now that we have that list of vertices, we're going to run the Jaccard similarity algorithm.
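To sketch that query-plus-algorithm pattern: the `CALL neptune.algo.jaccardSimilarity(...)` procedure name, its YIELD fields, and the labels below are all assumptions, so treat this as the shape of the idea rather than confirmed syntax, and check the Neptune Analytics algorithm reference for the real signature.

```python
# Hypothetical: start from a top investor, find peers who share any holding, then
# score how similar their portfolios are with the native Jaccard implementation.
# Assumption: procedure name and YIELD fields; verify against the algorithm docs.
jaccard_vs_competitors = """
MATCH (top:Holder {name: 'Raymond James'})-[:owns]->(:Holding)<-[:owns]-(peer:Holder)
WITH DISTINCT top, peer
CALL neptune.algo.jaccardSimilarity(top, peer)
YIELD score
RETURN peer.name AS competitor, score
ORDER BY score DESC
LIMIT 10
"""
print(jaccard_vs_competitors)
```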

So you can see here that when you're using the Neptune Analytics engine, you can call directly into the graph algorithms we've implemented natively in the engine. That gives you the ability to take the top investor from this quarter and directly compare them to their competitors.

The really interesting part of what we just saw is that this is an analytics engine that allows you to combine queries and algorithm workloads back to back and mix them together. Dave, I think you told me that's something our customers have been most interested in so far, being able to bring those together.

Absolutely. Being able to combine the power of openCypher with the power of these algorithms, in a way that lets you compose very complex questions into relatively simple and straightforward query syntax, is something many of our customers have really liked, and we think it's one of the most powerful areas of Neptune Analytics.

Yeah, absolutely. Before, customers were having to write many data pipelines and spin up custom jobs just to run an algorithm and then combine the result back with what was in the database. There is durability behind this, so you can take a snapshot to save wherever you are in your process and then load that snapshot back up whenever you're ready. The ability to work through these workflows is giving our customers an overall lower total cost for bringing graph analytics into their stack.

And the last piece I'd like to show you here is one other way to do similarity when you're using Amazon Neptune Analytics. We just used a graph algorithm to do similarity, the Jaccard algorithm, which is a very popular algorithm.

Another way we can do similarity with Neptune Analytics is by using our vector search index. We stored embeddings on those investors when we did the data load. What we can do now is find those top investors, but we're going to skip that process and instead call our vector search index here, and we're going to find the top N by using the vector similarity search engine in Amazon Neptune Analytics.

That's happening right here. Like you saw before with the algorithms, we're using that CALL step so that you can use vector search in the middle of a graph analytics query, combining a graph traversal with vector search to make your results much more explainable.
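A sketch of that combination: call the vector index from inside the query, then walk real edges from each hit to explain the similarity. The procedure name `neptune.algo.vectors.topKByNode`, its options map, and the labels are assumptions to confirm against the Neptune Analytics documentation.

```python
# Hypothetical: find the 10 investors whose embeddings are closest to a seed
# investor, then list the holdings they actually share, making the result explainable.
# Assumption: vector-search procedure name and arguments; verify in the docs.
similar_and_explainable = """
MATCH (seed:Holder {name: 'Raymond James'})
CALL neptune.algo.vectors.topKByNode(seed, {topK: 10})
YIELD node, score
MATCH (node)-[:owns]->(shared:Holding)<-[:owns]-(seed)
RETURN node.name AS similar_investor, score, collect(shared.name) AS shared_holdings
"""
print(similar_and_explainable)
```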

That's why our customers have been loving Amazon Neptune Analytics so far. So with that, I wanted to make sure we saw how to create a graph and how to load your data directly from S3. I wanted to make sure you knew how insanely fast this engine is, so hopefully you got to see that.

And then I also wanted to give you an idea of how to run algorithms and vector similarity search in Amazon Neptune Analytics. I hope you learned something and enjoyed the session today.

Yeah. Thank you all for coming, we really appreciate your time here. I'm sure you've been asked a million times, but please fill out the survey. Hopefully we earned high marks from you, and we're really glad you were able to spend time here at the end of the day with us.

Thank you, Denise. Thank you, Dave. If you have a question for them, give them a few moments; they'll be at the back of the theater to take any questions you have.

Thank you for the presentation. Thank you for joining. Thank you for coming to re:Invent. Have a good night.
