如何使用SQL Server R绘制SQL Server 2017图形数据库

A few years ago, one common business case I came across in my professional career that required modelling of data into a many-to-many entity relationship type was the representation of a consultants and their projects. Such a business case became a many-to-many entity relationship type because whilst each project can be undertaken by several consultants, consultants can in turn be involved in many different projects. When it came to storing such data in a relational database engine, it meant that we had to make use of bridging tables and also make use of several self-joins to successfully query the data.

几年前,我在职业生涯中遇到过一个常见的业务案例,该案例要求将数据建模为多对多实体关系类型,即代表顾问及其项目。 这种业务案例变成了多对多实体关系类型,因为尽管每个项目可以由几个顾问来承担,但顾问又可以参与许多不同的项目。 在关系数据库引擎中存储此类数据时,这意味着我们必须使用桥接表,并且还必须使用多个自联接来成功查询数据。

However, with recent focus on big data for many of my clients, we have experienced an increase in different business requests that requires for many-to-many data modelling. Consequently, as a Microsoft shop we’ve had to turn to other non-Microsoft products to ensure that we optimally respond to such business requests. Not surprisingly, ever since word got around that graph database will be part of SQL Server 2017, we’ve been looking forward to this latest release of SQL Server. Having played around with the graph database feature in SQL Server 2017, we have noticed that unlike other graph database vendors, plotting and visualising the data out of the graph database is not readily available in SQL Server 2017. Luckily, thanks to SQL Server R, you can easily plot and visualise SQL Server 2017 graph database data without turning to 3rd party plugins. In this article, I demonstrate how SQL Server Machine Learning Services (previously known as SQL Server 2016 R Services) can be used to plot a diagram according to the data defined in a SQL Server 2017 graph database.

但是,由于最近我的许多客户都专注于大数据,因此我们经历了需要进行多对多数据建模的不同业务需求的增加。 因此,作为Microsoft商店,我们不得不求助于其他非Microsoft产品,以确保我们以最佳方式响应此类业务请求。 毫不奇怪,自从围绕该图形数据库的消息成为SQL Server 2017的一部分以来,我们一直在期待最新版本SQL Server。 在试用了SQL Server 2017中的图形数据库功能之后,我们注意到与其他图形数据库供应商不同,在SQL Server 2017中不容易从图形数据库中绘制和可视化数据。幸运的是,由于有了SQL Server R,您可以轻松地绘制和可视化SQL Server 2017年的图形数据库数据,而无需求助于第三方插件。 在本文中,我将演示如何根据SQL Server 2017图形数据库中定义的数据使用SQL Server机器学习服务(以前称为SQL Server 2016 R Services)来绘制图表。

案例研究:300多个英超联赛出场 (Case study: 300+ Premier League appearances)

Avid readers of my SQLShack articles would probably have noticed that I’m a great fan of football, particularly, the English Premier League. Like in most sports, there are several traditions and milestones that are celebrated in the Premier League. One such milestone involves celebrating players who have made 300 (or more) appearances in Premier League games. This Wikipedia page provides a list of such players with their respective Premier League football clubs that they have played for. If you go through the list, you will notice that some players have played for multiple clubs in order to get to 300+ appearances whilst some of the clubs (i.e. Chelsea FC, Arsenal) have had several players play for them at some point. Thus, as shown in Figure 1, our case study indicates that a player can play for many Premier League clubs and a club can have many players.

我的SQLShack文章的热心读者可能会注意到,我是足球的忠实粉丝,尤其是英超联赛 。 像大多数体育运动一样, 英超联赛也庆祝着几种传统和里程碑。 其中一个里程碑是庆祝在英超联赛中出场300次(或更多)的球员Wikipedia页面提供了此类球员及其曾效力过的英超足球俱乐部的列表。 如果您查看列表,您会注意到一些球员为300多个出场席位而参加过多个俱乐部的比赛,而有些俱乐部(例如, 切尔西足球俱乐部阿森纳 )则在某个时候为他们效力。 因此, 如图1所示,我们的案例研究表明,一个球员可以为许多英超俱乐部效力,而一个俱乐部可以有许多球员。

数据建模和存储:SQL Server 2017图形数据库 (Data modelling and storage: SQL Server 2017 graph database)

With the type of entity relationship illustrated in Figure 1, we proceed to create these entities in our sample instance of SQL Server 2017 graph database. Assuming that you already have a general understanding of graph databases and their implementation in SQL Server 2017, we have created SocialGraph database that consists of dbo.Player and dbo.Club node tables that are used to store a list of player and club names, respectively. As per Figure 2, our graph database also contains a dbo.playedFor as our edge table.

使用图1所示的实体关系类型,我们继续在SQL Server 2017图形数据库的示例实例中创建这些实体。 假设您已经对图形数据库及其在SQL Server 2017中的实现有了一般的了解,我们创建了SocialGraph数据库,该数据库由dbo.Playerdbo.Club节点表组成,分别用于存储球员和俱乐部名称列表。 如图2所示 ,我们的图形数据库还包含一个dbo.playedFor作为我们的边表。

For the purposes of this demo, we will limit the number of players and clubs to store in our graph database to just 5 per entity as per the INSERT T-SQL script shown in Script 1.

就本演示而言,根据脚本1中显示的INSERT T-SQL脚本,我们将限制存储在我们的图形数据库中的球员和俱乐部的数量为每个实体只有5个。

INSERT INTO [Club]([name])
VALUES('Tottenham Hotspur'), ('Chelsea'), ('Manchester City'), ('Arsenal'), ('West Ham United');
GO
INSERT INTO [Player]([name])
VALUES('Frank Lampard'), ('Petr Cech'), ('Cesc Fabregas'), ('Gael Clichy'), ('William Gallas');
GO

The INSERT T-SQL script for populating the edge table is quite lengthy, as a result it has been provided for download under the Downloads section at the bottom of this article.

用于填充边缘表的INSERT T-SQL脚本相当冗长,因此可以在本文底部的“ 下载”部分中下载该脚本。

使用SQL Server R进行图形数据库可视化 (Graph database visualisation using SQL Server R)

Having setup and populated our SQL Server 2017 graph database, we proceed to plot and visualise our graph by using the R language which is part of the Machine Learning Services in SQL Server 2017. In order to successfully achieve our objective of visualising our graph database data in SQL Server R, we need to download and install several R packages according to the following steps:

设置并填充我们SQL Server 2017图形数据库后,我们将继续使用R语言绘制和可视化图形,R语言是SQL Server 2017中机器学习服务的一部分。为了成功实现可视化图形数据库数据的目标在SQL Server R中,我们需要根据以下步骤下载并安装多个R软件包:

Step 1: download and install igraph R package

步骤1:下载并安装igraph R包

The igraph is an efficient package in R that can help with network analysis and plotting of simple graphs. There are several ways of downloading and installing the igraph R package the one that worked for me was to manually download it to a local folder on my machine as a windows binary package from igraph.org, and later installed it using the R.EXE application shown in Figure 3.

igraph是R中的高效软件包,可以帮助进行网络分析和简单图形的绘制。 有几种下载和安装igraph R程序包的方法,一种对我有用的方法是将它作为Windows二进制程序包从igraph.org手动下载到我的计算机上的本地文件夹中,然后再使用R.EXE应用程序进行安装。如图3所示。

Again, assuming that you are already familiar with R package installation in SQL Server, the installation of igraph R package should be straight-forward as shown in Figure 4.

同样,假设您已经熟悉SQL Server中的R软件包安装,则igraph R软件包的安装应该很简单, 如图4所示。

To install the igraph package, we simply run the install.packages command with a path to where the downloaded file is stored, in my case I kept it in the E drive as shown in Figure 4.

要安装igraph软件包,我们只需要运行install.packages命令,并带有指向下载文件存储路径的路径,就我而言,我将其保存在E驱动器中, 如图4所示。

Step 2: install magrittr R packages

步骤2:安装magrittr R软件包

Having the igraph package is not enough as you may still run into the error message shown in Figure 5 when trying to execute your R script. Lastly, in addition to successfully installing igraph R package, you may also need to install the magrittr R package as the igraph R package requires it.

仅拥有igraph软件包是不够的,因为在尝试执行R脚本时,您仍然可能会遇到图5所示的错误消息。 最后,除了成功安装igraph R软件包外,您还可能需要安装magrittr R软件包,因为igraph R软件包需要它。

Similarly, to the installation of igraph, I got around this error by downloading a copy of the magrittr package, saved it into my local E drive and installed it using the R.EXE application.

同样,对于igraph的安装,我通过下载 magrittr软件包的副本,将其保存到本地E驱动器并使用R.EXE应用程序进行安装来解决了该错误。

Step 3: prepare and execute R script

步骤3:准备并执行R脚本

Finally, now that we have installed all required R packages, we proceed to plotting our graph using the script shown in Script 2. Obviously, you can choose different image file type but I have opted to plot our graph into a PNG file.

最后,既然我们已经安装了所有必需的R包,我们将继续使用脚本2中显示的脚本来绘制图形。 显然,您可以选择其他图像文件类型,但是我选择将图形绘制为PNG文件。

EXECUTE sp_execute_external_script @language = N'R',
@script = N'
	require(igraph)
	g <- graph.data.frame(graphdf)
	V(g)$label.cex <- 2
	png(filename = "c:\\temp\\PLVeterans.png", height = 800, width = 1500, res = 100);
	plot(g, vertex.label.family = "sans", vertex.size = 5)
	dev.off()',
@input_data_1 = N'
	SELECT a.name, b.name as Club    
	FROM player a, playedFor, Club b
	WHERE MATCH(a-(playedFor)->b);',
@input_data_1_name = N'graphdf'
GO

The generated PLVeterans.png image file is shown in Figure 5 wherein we can visualise the data from our sample SQL Server 2017 graph database. The graph indicates that whilst Frank Lampard and William Gallas played for 3 different Premier League clubs to get to their 300+ appearances, Chelsea and Arsenal had the most number of the so-called 300+ players appear in the Premier League through them.

生成的PLVeterans.png图像文件如图5所示,其中我们可以可视化示例SQL Server 2017图形数据库中的数据。 该图表明,虽然兰帕德Frank Lampard)威廉·加拉斯William Gallas)为3个不同的英超俱乐部效力而获得了300多个出场,但切尔西阿森纳通过所谓的300多个球员出现在英超联赛中最多。

摘要 (Summary)

The new graph database feature in SQL Server 2017 is a greater alternative to traditional relational databases particularly when it comes to modelling and implementation of many-to-many entity relationships. Furthermore, the integration of graph database with SQL Server 2017 Machine Learning Services such as the R language (along with the igraph R package) enables data scientist to successfully plot and visualise data from the graph database.

SQL Server 2017中的新图数据库功能是传统关系数据库的更好替代方案,尤其是在多对多实体关系的建模和实现方面。 此外,将图形数据库与SQL Server 2017机器学习服务(例如R语言)(以及igraph R软件包)集成在一起,可使数据科学家成功地绘制和可视化图形数据库中的数据。

资料下载 (Downloads)

翻译自: https://www.sqlshack.com/plot-sql-server-2017-graph-database-using-sql-server-r/

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值