Dive deep into what’s new with Amazon Neptune

Good afternoon, everybody. Welcome to DAT 325. Raise your hand if you traveled from the other end of the Strip to come here. OK, cool. Well, thanks everybody for making the time at re:Invent to join us today. I'm Brad Bebee, the General Manager of Amazon Neptune and Amazon Timestream. And today I'm also joined by Dr. Ümit Çatalyürek, who's an Amazon Scholar with Amazon Neptune and also a professor at Georgia Tech.

A little over seven years ago, I was running a small open source graph database company. I joined AWS and five years ago we launched Amazon Neptune, one of the first fully managed graph databases available in the cloud. And yesterday in Swami's keynote, we launched the general availability of Amazon Neptune Analytics. So I'm super excited to be talking to you guys today about that.

Before we get started, I wanted to understand your own experience with graphs. So raise your hand if you're currently using graphs in your systems. OK. Raise your hand if you signed up because there was gen AI in the title. OK, that's good. And raise your hand if you're using Amazon Neptune. A few, super.

So we're gonna go through a quick overview of graph use cases, how Neptune works, and what Neptune Analytics is. We're gonna let you take a peek inside how it works, and then we're gonna show you some examples, and hopefully we'll have some time for questions. So let's go ahead and get started.

Graphs are awesome. And the reason they're awesome is they allow you to use the relationships in your data to solve problems, derive insights, and find things that you can't otherwise find. You can build applications that traverse and query these relationships. The challenge is that processing graphs can be very hard. Graph performance is data dependent; it depends on the shape of the graph, and graph performance is dominated by random access. Think about a social network, for example: how I'm connected to Ümit, who might be connected to someone else. If you traverse those connections, it's very difficult to predict how to lay out the data in the database and position it correctly for fast retrieval. So for generalized graph processing operations, you need a purpose-built graph solution to get good performance.

Amazon Neptune, of course, is AWS's managed graph service. Neptune databases are optimized to store and query billions of nodes with millisecond latency, and in particular they're focused on parameterized, interactive graph queries. So we're typically talking about graph queries that have a starting point and an ending point, where you're traversing out one, two, or three hops. We have many customers using Neptune databases for this; customers are processing graphs as big as 400 billion nodes, edges, and properties using Neptune databases. And Neptune databases support the broadest choice of graph models. As you may know, there's the labeled property graph, and there's also the Resource Description Framework, or RDF, graph. Neptune supports three different query languages: openCypher, Apache TinkerPop Gremlin, and SPARQL for RDF. Within the past year, we've launched a serverless deployment option as well as a multi-Region global database capability. Also within the past year, we've launched things like a graph explorer to help you visually navigate your graph, tools for you to understand your query performance, and expanded serverless availability. And while doing that, we've made over 15 releases to improve performance, reliability, security, and availability. Now, one of the things that most excites me is that every day, thousands of Neptune customers create tens of thousands of Neptune instances. And you can see from the customer logos here that the use cases are very, very broad, and they're all very interesting. For me, working in the graph space, getting to see some of these graph challenges is super cool.

When we look across use cases, there are really four major use cases that we see customers addressing with graphs. The first is knowledge graphs. Knowledge graphs use relationships to connect data, to help you with information retrieval, to help you do machine learning data preparation, and, increasingly, to add context for various kinds of generative AI applications. The second is identity graphs. Identity graphs use relationships between records or observations and some concept of identity. These might be customers (think about a customer 360 application), or they might be devices (think about an adtech application), or it might be a precursor to fraud detection; it's about connecting observations of entities with some view of them in the real world. The third major use case is fraud. Fraud, of course, is a classic graph use case. There are many, many ways to detect fraud; I often joke that the only limit to fraud detection is the creativity of those who perpetrate it. The kinds of fraud you can detect with graphs involve taking sets of transactions and forming relationships, or looking at the actors who participate in those transactions and the relationships they have. And the fourth and last use case we see is security graphs, also a classic graph use case. Security is often related to how things and resources are connected. We see customers doing cloud security posture management, various kinds of exfiltration detection, and many different identity and policy use cases.

One particular customer that I wanted to highlight is Wiz. Wiz is a fast-growing cloud security posture management software provider, and graphs are really core to how Wiz provides their platform. I have a blog with one of the Wiz co-founders where he says the world is a graph, so forgive the bias there. But one of the really interesting things about how Wiz uses it is that they use the graph for explainability. If you think about security use cases, there's always something wrong: you're scanning, you're detecting, there are thousands or millions of alerts. What they're able to do is take a graph of the security infrastructure and a graph of what's important to your business, and show you why fixing one particular thing is important, and explain it to you quickly. So remember that concept of graphs for explainability; we'll come back to it later.

I mentioned that Neptune database has a very traditional database architecture. Databases separate compute and storage. You create Neptune clusters; clusters have a primary writer and up to 15 low-latency read replicas. As you issue queries to one of the instances in your cluster, the database engine and the query optimizer evaluate that query, and to evaluate it they use pages from the database. As you issue queries, those pages are read from the storage service over the network and stored in RAM, in what we call a buffer pool cache, within the instance. So as you're issuing queries, if those queries can be answered from the pages in the buffer pool cache, you'll see very, very low latency and high throughput. If you start to execute queries where you need to retrieve information from the storage service, then you're going to start to see your latency increase.

I've been in the graph space for a long time. When I think about graphs, and about customers who understand how to use graph databases, it's sort of a space like this. But if you think about all of the use cases where you could benefit from a graph, it's a much, much larger domain. It includes things that are not just database operations, or things we traditionally associate with databases; it also includes graph analytics, linked data, vector search, graph processing, and a range of different ways that you can use the relationships in your data to solve problems.

If you think about the current landscape of the graph world: on one hand, you have graph frameworks, things like Pregel or vertex-centric graph APIs, which allow you a lot of flexibility to build and implement different graph operations. On another side, you have high performance computing. HPC graph applications tend to focus on a specific graph with a specific data layout, and they build something that's optimized for a particular algorithm or two and deliver very high performance for a very specific problem. On the other hand, you have graph databases. Graph databases give you good general-purpose graph processing, they're very usable because they support purpose-built graph query languages, and they provide database properties like high availability and durability, but that can come at a certain cost for different kinds of graph operations.

As you all know, at Amazon we work backwards from our customers, and customers said that they wanted to make data discoveries faster by analyzing large graphs, graphs with billions of edges, very quickly. This means that if you think about how we evaluate queries, and you need to evaluate a large graph and traverse it, maybe traverse it multiple times, before answering a question, you need a different kind of architecture. You need to be able to analyze a whole graph in memory very quickly.

So I'm very proud to introduce Amazon Neptune Analytics, which is now generally available. Amazon Neptune Analytics uses high performance computing techniques, as a managed service, to provide you high performance graph processing within a constant factor of what you can get from a purpose-built HPC solution, with the usability that you get from a managed service with high-level graph query languages and APIs.

Amazon Neptune Analytics is a new analytics database engine offering for Neptune. It provides you a single graph endpoint. So with Neptune Analytics, you don't have to manage database clusters, and you don't have to manage instance configurations; you create a graph using an endpoint, and then you use that same endpoint to operate on that graph. It's built using memory-optimized, high performance computing techniques: we use two-dimensional graph partitioning, and we use log-based replication to provide durability, so Neptune Analytics does provide you strong durability guarantees. The memory-optimized aspects allow it to deliver much faster performance for certain kinds of analytic operations. For example, Neptune Analytics can load graph data at a rate higher than 10 million edges per second, it can scan data at around 10 million edges per second, and overall it can be up to 80 times faster to load and analyze graph data. In addition, Amazon Neptune Analytics allows you to store vector data in your graph and search those vectors.

There are three major use cases that we see for Neptune Analytics. The first is what we think of as ephemeral analytics. This is where you have a graph in an existing Neptune database, or perhaps in Amazon S3, and you want to very quickly load that graph and then run graph algorithms, low-latency graph queries, or vector search on it. The second is where you have potentially large graphs, graphs of up to 50 billion edges, where you want to run low-latency queries that are parameterized by graph algorithms. And the third use case is where you want to augment your graph with vector search.

Neptune Analytics implements these algorithms as callable stored procedures within the openCypher query language. You can see here the set of algorithms that are now generally available with Neptune Analytics, and this is how you can use Neptune Analytics to issue algorithms within your Cypher queries. Starting from the top of this query, for folks that are familiar with openCypher: we're matching and finding all of the airports that are in the United States, and we're returning the airports and the regions. Then we're making a call, the part that's highlighted in orange, to a stored procedure that implements an optimized breadth-first search (BFS) algorithm within Neptune Analytics, and we're parameterizing it with the results of that match query. Then we're returning the set of nodes and the levels that are the result of the breadth-first search. So it's that easy to invoke these kinds of algorithms from within your Cypher queries.
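To make that concrete, here is a minimal sketch of what such an invocation could look like through the boto3 `neptune-graph` client. The graph ID, labels, properties, and the exact procedure signature (`neptune.algo.bfs.levels`) are illustrative assumptions, not the exact query from the slide.

```python
# Hypothetical sketch: calling a Neptune Analytics BFS stored procedure
# from openCypher. Graph ID, labels/properties, and the procedure
# signature are illustrative assumptions.
import boto3

client = boto3.client("neptune-graph", region_name="us-east-1")

query = """
MATCH (a:airport {country: 'US'})
CALL neptune.algo.bfs.levels(a)      // optimized BFS seeded by the match results
YIELD node, level
RETURN node, level
"""

resp = client.execute_query(
    graphIdentifier="g-0123456789",  # placeholder graph ID
    queryString=query,
    language="OPEN_CYPHER",
)
print(resp["payload"].read())        # results stream back as JSON (assumed shape)
```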

Now, we ran what we call a private beta of the service for four months prior to general availability, and we had a number of customers use the service. One of the problems a customer was trying to solve: they had a large marketplace where they sold books, and they were seeing people posting books that were generated through LLMs; they were basically fraudulent books. There were IP issues, there were correctness issues. Some of these books were for things like how to safely pick wild mushrooms. Would you trust a gen AI generated book to tell you how to pick a wild mushroom? Indeed, no. OK, good, you're still here. So that's validation.

So what they did is combine graph queries with vector search to help with this. This is an example of their query, or substantially similar. If you look at the top, again we're starting with some kind of match on the graph. We're taking the graph and finding a subset of it; in this case, we're finding the subset of books where 'Travel Portugal' is in the name. In the next line, the one in the comments there, we're doing another stored procedure invocation: in this case, we're calling a top-k algorithm across the vector search. So we found books that have this title.

Now we're finding similar books. We take those results and then execute a match query against the graph, and we go out between one and three hops to listers, sellers, or buyers that have been marked as suspicious. By doing this, this user was able to get a starting point for where they should investigate these kinds of fraudulent books, and they were able to look at the graph to explain why they targeted them. Think about doing a vector search without this context: it would be very difficult to understand why you got there. So when someone asks you why you should combine graphs and vector search, you should answer them and say: explainability for similarity.
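The general shape of such a query might look like the following sketch. The vector procedure name (`neptune.algo.vectors.topKByNode`), the labels, and the `suspicious` flag are assumptions for illustration, not the customer's actual schema or query.

```python
# Hypothetical sketch: seed from a title match, find similar books by vector
# top-k, then traverse 1-3 hops to actors already flagged as suspicious.
import boto3

client = boto3.client("neptune-graph")

query = """
MATCH (b:book) WHERE b.title CONTAINS 'Travel Portugal'
CALL neptune.algo.vectors.topKByNode(b)       // top-k similar books by embedding
YIELD node AS similar
MATCH (similar)-[*1..3]-(actor)
WHERE actor.suspicious = true                  // listers/sellers/buyers marked suspicious
RETURN DISTINCT similar.title, actor.id
"""

client.execute_query(
    graphIdentifier="g-0123456789",            # placeholder graph ID
    queryString=query,
    language="OPEN_CYPHER",
)
```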

So this is an architecture that you can use with Neptune Analytics, Neptune, and other AWS services like Amazon Bedrock to build generative AI applications. Obviously, this is a general architecture; yours may vary. If you start from the right-hand side, one part of the process is that you take your data and use things like Amazon Bedrock and LangChain to generate embeddings from your data, and then you can store them in your graph with Amazon Neptune Analytics.

On the left-hand side, if you want to build applications that interact with your graph or your databases using natural language, you can use LangChain and others to build prompts, query your graph database to improve the context for your RAG applications, and then issue vector searches within Neptune Analytics. These are both patterns that you can use as part of building gen AI applications.

In August of this year, we launched a LangChain integration. This makes it very easy for you to build applications that use natural language queries, or that use natural language to generate Cypher queries and return the results in natural language. For anyone who's familiar with Neptune, we consistently use air-routes examples, and that's for a good reason. In this case, you can see that we've set up the connection. If you look across the top, we're importing the Bedrock models; we've chosen here to use the Anthropic Claude LLM, and we've preconfigured this with the pieces we've released into open source with LangChain. We've asked the question: how many outgoing routes does the Austin airport have? This generates an openCypher query against the air-routes data set, returns the results, and says the Austin airport has 98 outgoing routes.
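A minimal sketch of that setup, assuming the `NeptuneGraph` and `NeptuneOpenCypherQAChain` classes from the open-source LangChain integration and a Bedrock-hosted Claude model; the host and model ID are placeholders, and class locations vary by LangChain version.

```python
# Sketch of the LangChain integration described above; hostnames and model ID
# are placeholders, and the imports reflect the open-source integration, so
# check your LangChain version.
from langchain_community.graphs import NeptuneGraph
from langchain_community.llms import Bedrock
from langchain.chains import NeptuneOpenCypherQAChain

graph = NeptuneGraph(
    host="my-cluster.cluster-abc.us-east-1.neptune.amazonaws.com",
    port=8182,
)
llm = Bedrock(model_id="anthropic.claude-v2")

chain = NeptuneOpenCypherQAChain.from_llm(llm=llm, graph=graph)
print(chain.run("How many outgoing routes does the Austin airport have?"))
# Per the talk: "The Austin airport has 98 outgoing routes."
```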

Now, this query will probably work for you out of the box if you use air-routes. Your other ones probably won't; they're going to need a lot of tuning, a lot of context, and other things. But it's an example of what you can do.

I mentioned we ran a private beta for ephemeral analytics. We had a financial services customer improve fraud detection at the point of sale from 17% to 58% by using ephemeral analytics: loading the graph very quickly with Neptune Analytics, issuing graph algorithms, finding detections, and then using that to improve their results. We also had a large media and technology company reduce their overall time and total cost of ownership for their data science pipelines using a similar workflow, taking advantage of the memory-optimized architecture of Neptune Analytics to load graph data very quickly and issue algorithms; then they may keep the graphs running, or they may shut them down.

If you watched Swami's keynote, you know that Snap is now using Neptune Analytics to store and process features in a graph with billions of nodes, and to serve results back as users navigate and interact with their app, increasing user engagement. They're seeing substantially improved user engagement from using this graph-based approach, and it wasn't possible to load a graph of this size at the speed they needed prior to Neptune Analytics.

In the second case, Amazon.com, another fraud use case, reduced the time to resolution by over 25% using a similar kind of approach. They loaded their fraud data, and they gave the investigators the tools to use graph algorithms and graph queries to prosecute it. And in the graphs-and-vector-search use case, we had a large healthcare company bring their knowledge graph of proprietary information about chemical compounds, disease-gene interactions, and protein pathways, combine that, and store vectors in that graph. They then used graph queries and vector search to help them identify potentially promising compounds, and ones that weren't promising. The result was that it saved them significant cost, because they were able to reduce the wet lab testing they needed to do, which is a very expensive part of the pharmaceutical process.

And we already talked about the other use case, the large online bookstore. Because we wanted to make Neptune Analytics really easy to use to build with graphs, we also wanted to price it in a way that was easy. With Neptune Analytics, when you create a graph, you choose and provision a capacity, and that capacity is expressed in what we call memory-optimized Neptune capacity units, or m-NCUs. Each m-NCU represents a certain amount of memory, CPU, and network resources. The m-NCUs are bundled into provisioned chunks starting at 128 m-NCUs and going up to 4,096 m-NCUs, and we can scale as high as 24,000, although that's not publicly available; if you're interested in that, please let us know. We also created a simpler customer experience. I talked about using the same graph endpoint: rather than having to manage and create databases, you're able to analyze your graphs much, much faster.

So at this point, I'm going to turn it over to Ümit, and he's going to give you a peek inside how we built this. Thank you very much.

Good afternoon, everyone. I'm really excited to be here and have the opportunity to explain what an amazing service we have been building. As Brad mentioned, we designed Neptune Analytics to be at the intersection of graph databases, graph frameworks, and HPC analytics. So how have we managed to do that?

At the core of it is really the graph partitioning. We use a graph partitioning scheme that enables two things at once. One, as you all know, graph algorithms have irregular memory accesses; this partitioning scheme transforms them into almost regular memory accesses and increases locality. The second thing it does is minimize the communication needed when we are doing the computation. With this, and of course a lot of good engineering, we were able to achieve the fantastic speeds you have been seeing: 80 times faster loads, around 100 million records per second, and 200 times faster scans than in the transactional world.

OK, so what are we doing? We are running everything on our transactional system. All of the algorithms run directly on the data structures designed to support full transactions. We are not creating any in-memory, read-only, memory-efficient secondary data structures; everything is running on the live data.

We use three different partitioning schemes for the three different types of data we have in our database. The first is the dictionary; the dictionary helps us encode lexical forms into identifiers that we can use internally. The second is the topology of the graph; that's basically the structure of the graph, the actual relations themselves. And the third is the property values of vertices and edges.

We have been using a different partitioning scheme for each of those, as appropriate: we use hash partitioning for the dictionaries, 1D partitioning for vertex properties, and 2D partitioning for edges and edge properties, which I'll explain in a little more detail on the next slides. So this is number one.
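As a rough intuition for those three schemes, here is a toy sketch; the block size, grid shape, and formulas are assumptions for illustration, not the actual Neptune Analytics layout.

```python
# Toy illustration of the three partitioning schemes (all formulas assumed).
P = 4                 # number of parts per dimension
BLOCK = 1_000_000     # vertex IDs per range; arbitrary for this sketch

def dictionary_partition(lexical_form: str) -> int:
    # Dictionary (lexical form -> internal ID): hash partitioned.
    return hash(lexical_form) % P

def vertex_property_partition(vertex_id: int) -> int:
    # Vertex properties: 1D partitioned by vertex ID range.
    return vertex_id // BLOCK

def edge_partition(src: int, dst: int) -> tuple[int, int]:
    # Edges and edge properties: 2D partitioned by the
    # (source range, destination range) pair.
    return (src // BLOCK, dst // BLOCK)
```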

That covers how we actually store the data in memory. The second piece is the computational model we use internally. We use something similar to what everyone knows as "think like a vertex," but instead of thinking like a vertex, we think like a subgraph, a block. And we developed a simple visitor model as well, so we can easily deploy algorithms. Externally, today, we support openCypher and graph APIs.

So what is this 2D partitioning about? Let's look at a simple graph with only six vertices; it's a directed graph, conveniently ordered with vertices numbered 0 to 5. We know that in the real world we don't get graphs like this. When we get a graph, we get the lexical forms for each of the vertices, and we need to convert them to internal identifiers. We do some interesting things there, but I'm not going to go into too much detail on that. Let's assume that I managed to do that and converted them to some form of integer numbers.

We don't actually use consecutive integer numbers, because we don't want to bias our system. But we can create a sparse matrix of the graph, right? This is very traditional computer science: if there is an edge between vertex 0 and vertex 1, there is a nonzero at position (0, 1) in that matrix. So what we do is 2D block partitioning: basically, we partition horizontally and vertically at the same time, such that the diagonal blocks are squares.

What does this allow us to do? First, we are dividing the edge set of the graph into multiple parts; for simplicity, I'm using only four parts here. The most important thing this provides is that, for edges going from a source subset of vertices to a destination subset of vertices, those edges are co-located. For example, on this slide, think of the rows as the sources and the columns as the destinations: the edges from the first subset of source vertices to the second subset of destination vertices will all be in block B1. So they are all co-located. That really increases locality and enables us to improve performance, and it also helps reduce communication, which I don't have too much time to explain.
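In matrix terms, here's a toy version of the idea (my own split into two vertex ranges, not the exact one on the slide): take $V_0=\{0,1,2\}$ and $V_1=\{3,4,5\}$, and the adjacency matrix falls into four blocks,

$$
A \;=\; \begin{pmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{pmatrix},
\qquad
A_{ij} \;=\; \{\,(u,v)\in E \;:\; u\in V_i,\ v\in V_j\,\},
$$

so every edge from a source in $V_0$ to a destination in $V_1$ lands in the single block $A_{01}$. That co-location is what produces the locality and reduced communication just described.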

Each of the algorithms is implemented as a set of kernels. A kernel is basically a function, a function that takes a list of blocks. Some kernels take only one block; PageRank, for example, can operate on a single block. So if you want to run PageRank on the whole graph, you basically need to go over all the blocks and hand each one to the PageRank kernel.

We designed the storage system such that it provides efficient access to the data both by index and by scan. Depending on the kernel's needs, it does index access or scan-based access, whichever is more efficient; we actually sometimes switch from one to the other as well. From that, we have a kernel and we have the data, and we define a task: a task is basically a kernel with a block list. In our case, PageRank was using a single block, so we have four tasks here. Then we put those tasks into an execution queue, and then we can do parallel execution.
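A minimal sketch of that kernel/task/queue model, using a toy single-block, PageRank-style kernel. The names, the four-block layout, and the per-worker merging are illustrative assumptions, not Neptune Analytics internals.

```python
# Toy kernel/task/queue execution: four 2D blocks, one single-block kernel.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from queue import Queue, Empty

blocks = [  # four topology blocks of a toy 6-vertex graph (edge lists)
    {"edges": [(0, 1), (1, 2)]},
    {"edges": [(2, 3), (0, 4)]},
    {"edges": [(4, 0)]},
    {"edges": [(3, 5), (5, 4)]},
]
rank = {v: 1.0 / 6 for v in range(6)}                        # uniform start
out_deg = Counter(u for b in blocks for u, _ in b["edges"])

def pagerank_kernel(block, out):
    # One PageRank pass over a single block: push rank along each edge.
    for u, v in block["edges"]:
        out[v] += rank[u] / out_deg[u]

tasks = Queue()
for b in blocks:                  # a task = (kernel, list of blocks it needs)
    tasks.put((pagerank_kernel, [b]))

def worker():
    local = Counter()             # per-worker partial; merged below, no locking
    while True:
        try:
            kernel, block_list = tasks.get_nowait()
        except Empty:
            return local
        for blk in block_list:
            kernel(blk, local)

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(worker) for _ in range(4)]
delta = Counter()
for f in futures:
    delta.update(f.result())      # combine partial rank updates
print(delta)
```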

The really nice thing about this 2D execution model, this 2D partitioning model, is that it's amenable to running on heterogeneous systems and to distributed execution as well. All right, so can we actually support all the algorithms? We classified the algorithms, with my PhD student, into basically three different categories.

The first is single-block, bulk-synchronous algorithms. Those are the ones that have to go over all the edges: take a block, process it, go to the next block. Some of those algorithms need to do this a number of iterations; degree, the simplest one, needs one pass, while PageRank does multiple passes, but you need to go over all of the blocks. Then there are other algorithms, like BFS or shortest path with delta-stepping, et cetera, that are activation-based: you start with a source or sources, and then, depending on the blocks, they activate further blocks and you continue the execution that way. And the third is basically multi-block patterns: in this case, you get multiple blocks, with the pattern predefined by the algorithm itself, and you execute those. Jaccard similarity and triangle count are some of the examples.

So basically, with this system, we can implement almost all of the algorithms. OK, so now what I'm going to do is something I've never done before: be brave enough to trust the Wi-Fi and my eyesight. I can't really see, there's too much light coming at me, but I'm going to try to show you an actual demo. All right, let's see.

Because we have limited time, I'm not going to be able to show you too many different things, but I want to walk you through a few examples. We start with the EC2 console, sorry, the Amazon console, and we go to Neptune. On the left side, you see Neptune database at the top; all of the functionality you have seen before is still there. At the bottom, you will now be able to see, since yesterday, Neptune Analytics. As we mentioned, Neptune Analytics works on graphs; there is no cluster. You create a graph and you work on a graph.

Let's see how we can create a graph.

We click on graphs and create a graph here. The first thing we need to do, of course, is give it a name: my-new-graph. That is, if I can type; I have fat fingers and bad eyes, so please excuse me. The next thing we need to select is the data source: do you want to create an empty graph, or do you want to start from an existing one?

Together with that, we need to specify the memory-optimized Neptune capacity units, the m-NCUs, as Brad mentioned, and the options are here. So let's go with the smallest one. You can also create a graph from an existing source.

In that case, you can specify what the minimum and maximum will be. Of course, we'll try to fit in the minimum, but we may need to go a little bit bigger; that's why we ask you to tell us the maximum budget you want to use.

Then you need to give an IAM role, something you created before; I had created this read-only role earlier. Then you can specify one of three options: do you want to create from an existing Neptune cluster, from a snapshot, or from S3? Honestly, I'll assume for most of you it will be S3. Then you type the S3 bucket here, which I'm not going to do right now, in the interest of time.

The next thing you need to select is the availability settings. By default, we will have one replica. If you just want to play with it, like me, you can go with zero replicas and have just the one instance. Then you specify the network security; for the demo, I'll say everything is public.

The next thing is, as Brad mentioned, we support vector embeddings. Currently we support a single vector embedding. If your data set has embeddings for the vertices, what you need to do is tell us the dimension of the embedding. It could be any number your LLM or GNN is producing. Then you just hit create.
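For reference, the same graph creation can be done programmatically; a sketch with the boto3 `neptune-graph` client follows. The parameter names follow that client, and the values simply mirror the demo's choices (the 768 embedding dimension is an arbitrary example).

```python
# Sketch: create a Neptune Analytics graph, mirroring the console flow above.
import boto3

client = boto3.client("neptune-graph", region_name="us-east-1")

resp = client.create_graph(
    graphName="my-new-graph",
    provisionedMemory=128,            # m-NCUs; the smallest bundle
    replicaCount=0,                   # default is 1; 0 is fine for experiments
    publicConnectivity=True,          # "everything is public" for the demo
    vectorSearchConfiguration={"dimension": 768},  # only if your data has embeddings
)
print(resp["id"], resp["status"])     # graph becomes available after a few minutes
```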

It will take a few minutes, five or six maybe, and I don't want to wait five or six minutes here; the reason is just that the instances need to be provisioned, et cetera. So instead, if you're OK with me going back, what I will show you is how you can use Neptune Analytics with notebooks. Notebooks have been available for Neptune database itself, but today I'm going to show you them with Neptune Analytics.

We provide these notebook examples; you should be able to see all of them and play with them yourself. So let's go to the Neptune Analytics ones; this is the new section that you should be seeing since yesterday. Let's skip the getting started and go directly to the graph algorithms.

Again, in the interest of time, I'll only try to show you two of them. Path finding, maybe; let's start with BFS. Well, BFS, we call it an algorithm, but it's really the building block of all of the algorithms, and it's simple to explain. That's why I'm showing it; I'm not trying to insult your intelligence.

As Brad mentioned, we really like to play with the air-routes data set. It's a really cute little data set that became public as part of the Practical Gremlin book, and it's very easy to explain. It has basically four different types of vertices, specified by the labels. The first type is the airport; airports are connected to the other airports with the edge label route. Then we have country vertices; country vertices are connected to the airports with contains. And we have continents, and they also connect to the airports.

And there's a version vertex, which records which version of the data you are using. As I mentioned before, you can load this from your Neptune graph database if you already had it there, or from S3, or, if you want, you can load it directly from the load notebook, which I had already done.

So instead of loading, what I will do is run a summary. If you load it and run this as well, you will see something like this. It's a small data set: about 3,700 vertices and 115,000 edges, and this shows all the other basic characteristics of the data set.

BFS, as I said, is a building block. It's very simple: you basically look at how many hops away you are from a given source vertex. You can use it to find out if the graph is connected, or just to count hop distance.

The really nice thing about our implementation is that you're running this on the live data, and you're running it as part of an openCypher query. We just match an airport; we like to start from Seattle, and I guess you can guess why. Then we call the BFS algorithm with that node, so we'll start from Seattle.

In this example, I'm giving a max-depth limit because I want to show the equivalent plain query as well. And I can run this, and thanks to the network, it comes back extremely quickly; thanks to the network and our implementation, obviously.

As you can see, from Seattle you can reach about 3,100 airports very quickly. We can run the same query as a plain openCypher query; here we are using a variable-length path query, limited to three hops. We can run this, and as you have seen, we get the exact same result, but it was a little bit slower. As the data set gets bigger, BFS, being a custom-built algorithm, will obviously run much faster, and it also enables parallelization.
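Here is a sketch of the two forms side by side; the BFS procedure signature and option key are assumptions based on the demo, and the graph ID is a placeholder.

```python
# Sketch: optimized BFS procedure vs. plain variable-length openCypher match.
import boto3

client = boto3.client("neptune-graph")

bfs_query = """
MATCH (n:airport {code: 'SEA'})
CALL neptune.algo.bfs(n, {maxDepth: 3})     // assumed signature
YIELD node
RETURN count(DISTINCT node)
"""

plain_query = """
MATCH (n:airport {code: 'SEA'})-[:route*1..3]->(m:airport)
RETURN count(DISTINCT m)
"""

for q in (bfs_query, plain_query):          # same answer, different speed
    client.execute_query(graphIdentifier="g-0123456789",  # placeholder
                         queryString=q, language="OPEN_CYPHER")
```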

BFS has variants. You can call BFS with parents: what was my parent, where did I get here from? Or you can call BFS with levels: how many levels away am I? I like to run this one. So again, starting from Seattle, I want to get the levels and order them by level in descending order.
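The levels variant might look like this sketch (procedure name assumed from the demo):

```python
# Sketch: BFS levels from Seattle, ordered by level, deepest first.
import boto3

client = boto3.client("neptune-graph")
client.execute_query(
    graphIdentifier="g-0123456789",   # placeholder
    queryString="""
        MATCH (n:airport {code: 'SEA'})
        CALL neptune.algo.bfs.levels(n)
        YIELD node, level
        RETURN node.code, level
        ORDER BY level DESC
    """,
    language="OPEN_CYPHER",
)
```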

As you can see, it's extremely fast, and you may notice that there are two airports where, if you want to go there from Seattle, you need to take seven flights. That's quite a bit.

In the interest of time, I'm skipping ahead a little bit. For shortest path, we have two implementations; one of them is Bellman-Ford, so I'm going to show just the Bellman-Ford one. In this one, we are using the actual distances listed on the edges.

So how do we specify it? We tell it that the algorithm is single-source shortest path, the Bellman-Ford algorithm; we'll use the edge property, which in this data set is the distance; the weight type is integer; and the edge label is route. So basically, we want to run shortest path using those, and this actually runs on the server and comes back.
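A sketch of that invocation; the procedure name and option keys are assumptions, and `dist` is the air-routes edge property holding the distance.

```python
# Sketch: single-source shortest path (Bellman-Ford) from Seattle over
# 'route' edges, weighted by the integer 'dist' property.
import boto3

client = boto3.client("neptune-graph")
client.execute_query(
    graphIdentifier="g-0123456789",   # placeholder
    queryString="""
        MATCH (n:airport {code: 'SEA'})
        CALL neptune.algo.sssp.bellmanFord(n, {
            edgeWeightProperty: 'dist',
            edgeWeightType: 'int',
            edgeLabels: ['route']
        })
        YIELD node, distance
        RETURN node.code, distance
        ORDER BY distance DESC
    """,
    language="OPEN_CYPHER",
)
```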

And as you can see, there's one airport, BZZ, where the shortest distance to that airport from Seattle is 16,000 miles. These have different variants, as I said, including delta-stepping, and we have another top-k shortest path algorithm, which I will skip in the interest of time.

I just want to show the centrality algorithms a little bit. If you want to find the influencer, the important node, the most important hub in your network, you can use centrality algorithms.

We're going to use the same data set again. The simplest question we can ask is which airport has the most connections; we can run the degree algorithm and get the top 10. Oh, I like this one, because Istanbul shows at the top; I'm Turkish, by the way. So, 1,200 flights from Istanbul, and also Frankfurt.
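The degree query is about as simple as it gets; a sketch (procedure name assumed):

```python
# Sketch: top-10 airports by degree (number of route connections).
import boto3

client = boto3.client("neptune-graph")
client.execute_query(
    graphIdentifier="g-0123456789",   # placeholder
    queryString="""
        CALL neptune.algo.degree()
        YIELD node, degree
        RETURN node.code, degree
        ORDER BY degree DESC
        LIMIT 10
    """,
    language="OPEN_CYPHER",
)
```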

And as you can see, Amsterdam is number four here. We can also look at this in different ways: we can look at PageRank and closeness centrality, the other two algorithms we implemented.

The famous PageRank algorithm is Larry Page's, and it essentially means Google's ranking of pages. The simple idea is that a vertex is important if it's connected to by important vertices and the connections are important; it's just an iterative algorithm that computes that.

We can run the PageRank algorithm on this data set, and it still shows Istanbul at the top. But as you can see, Frankfurt dropped, and Amsterdam is not in the list anymore. This is a different way of looking at the same problem.
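A sketch of the PageRank call (procedure and yield names assumed):

```python
# Sketch: top-10 airports by PageRank.
import boto3

client = boto3.client("neptune-graph")
client.execute_query(
    graphIdentifier="g-0123456789",   # placeholder
    queryString="""
        CALL neptune.algo.pageRank()
        YIELD node, rank
        RETURN node.code, rank
        ORDER BY rank DESC
        LIMIT 10
    """,
    language="OPEN_CYPHER",
)
```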

I'll show the third one, closeness centrality. This time we're going to look at the distances along those paths: how far am I from all of the vertices? You may want to use this if you want to find a location for a facility such that you can reach other places much faster. The complexity of this algorithm is a little bit higher than the others: vertices times edges for the full computation.

If you want to compute an approximation, you can do a limited number of starts. We put 8,000 starts here, meaning starting from 8,000 vertices, but this data set has only about 3,500 vertices, so this is an exact solution. In this case, what you will see is that the ordering changes again.
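A sketch of the approximate closeness call; the `numSources` option for limiting the number of start vertices is an assumption based on the demo.

```python
# Sketch: closeness centrality, approximated from up to 8,000 start vertices
# (exact here, since air-routes has fewer vertices than that).
import boto3

client = boto3.client("neptune-graph")
client.execute_query(
    graphIdentifier="g-0123456789",   # placeholder
    queryString="""
        CALL neptune.algo.closenessCentrality({numSources: 8000})
        YIELD node, score
        RETURN node.code, score
        ORDER BY score DESC
        LIMIT 10
    """,
    language="OPEN_CYPHER",
)
```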

Now we are seeing the geographical benefits of location: the airports that are in Europe are now showing at the top, because they are actually closer to both the US and Asia at the same time. So Istanbul dropped to number eight.

All right, we want to take some questions, so that's where I want to stop. Let me switch back to the slides; I want to end with one teaser slide.

As we mentioned, there are two camps in the graph world: one is RDF, the other is LPG, and Neptune database supports both, but not at the same time. What if we could actually use Gremlin or SPARQL or openCypher over the same data, RDF or LPG? If we can have a single graph model, then we are free to use any of the query languages. Of course, there will be some adjustments that need to be made, but that's an idea my colleagues at Amazon have been really pushing for the last couple of years: one graph model. And I had the audacity to say one graph model to rule them all; they didn't say that, I'm saying that. But that's our vision, and our current storage model is actually designed to be a one-graph model.

I think I'll stop there. Thank you, and we have some time to take questions.
