增强的PolyBase SQL 2019-外部表SQL Server,目录视图和下推式

This article is part 4 of the series for SQL Server 2019 Enhanced PolyBase. Let quickly recap the previous articles.

本文是SQL Server 2019增强型PolyBase系列的第4部分。 让我们快速回顾一下以前的文章。

  • Part 1: We installed SQL Server 2019 PolyBase feature along with Azure Data Studio and SQL Server 2019 preview extension to explore its features 第1部分:我们安装了SQL Server 2019 PolyBase功能以及Azure Data Studio和SQL Server 2019预览扩展以探索其功能
  • Part 2: In this part, we learned to create an External table using Azure Data Studio ‘External table wizard’ for the Oracle data source 第2部分 :在这一部分中,我们学习了使用Azure Data Studio'外部表向导'为Oracle数据源创建外部表
  • Part 3: We learned the useful features of External tables like joins and created an external table using t-SQL instead of the GUI mode for Oracle database in this series article 第3部分 :在本系列文章中,我们学习了外部表的有用功能(例如联接),并使用t-SQL而不是Oracle数据库的GUI模式使用t-SQL创建了外部表。

We have learned earlier that PolyBase in SQL Server 2019 Preview allows access to various data sources such as SQL Server, Oracle, MongoDB, Teradata, and ODBC based sources etc. Azure Data Studio SQL Server 2019 preview extension currently supports for SQL Server and Oracle data sources only from the External table wizard.

我们早先了解到,SQL Server 2019 Preview中的PolyBase允许访问各种数据源,例如SQL Server,Oracle,MongoDB,Teradata和基于ODBC的源等.Azure Data Studio SQL Server 2019 Preview扩展当前支持SQL Server和Oracle数据仅从“外部表”向导获取源。

In this series, we will create an external table for SQL Server and explore some more features around it.

在本系列中,我们将为SQL Server创建一个外部表,并探索围绕它的更多功能。

Launch Azure Data Studio and connect to the SQL Server 2019 preview instance. Right click on the database and launch ‘Create External Table’.

启动Azure Data Studio并连接到SQL Server 2019预览实例。 右键单击数据库,然后启动“创建外部表”。

PolyBase - Create External Table SQL 2019

This opens up the wizard to create the external tables. Recently, I faced an issue where the wizard stuck in the ‘step 1’. Progress bar icon keeps rotating and does not show any error message or any progress.

这将打开向导以创建外部表。 最近,我遇到了一个问题,向导陷入了“步骤1”。 进度栏图标保持旋转,并且不显示任何错误消息或进度。

PolyBase Create External table Error SQL 2019

After some time, we get the error message ‘Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.’ This is the general error message and does not point to clear error message.

一段时间后,我们收到错误消息“超时已过期。 在操作完成之前超时时间已过,或者服务器没有响应 。 这是一般错误消息,并不指向清除错误消息。

PolyBase Create External table Error SQL 2019

I tried to close the wizard multiple times and launch it again but it remains the same. It does not show any error message to troubleshoot it further.

我尝试多次关闭向导,然后再次启动它,但它保持不变。 它不会显示任何错误消息来进一步对其进行故障排除。

Later, during the investigation, I figured out that PolyBase services were in stopped condition.

后来,在调查过程中,我发现PolyBase服务处于停止状态。

  • SQL Server PolyBase Data Management Service

    SQL Server PolyBase数据管理服务
  • SQL Server PolyBase Engine

    SQL Server PolyBase引擎

PolyBase service  SQL 2019

Let us start these services.

PolyBase service  SQL 2019

让我们开始这些服务。

Now again launch the ‘Create External Table’ wizard in Azure Data Studio. This starts the wizard successfully. Therefore, monitor the service status before launching the wizard to avoid any issues.

现在,再次在Azure Data Studio中启动“创建外部表”向导。 这将成功启动向导。 因此,在启动向导之前监视服务状态,以避免出现任何问题。

PolyBase External data source in Azure data studio SQL Server 2019

SQL Server的外部表 (External table for SQL Server )

In this section, we will use the below source and destination instances.

在本节中,我们将使用以下源实例和目标实例。

  • Source Instance (here we will create external table): SQL Server 2019 (Named instance – SQL2019) 源实例 (此处将创建外部表):SQL Server 2019(命名实例– SQL2019)
  • Destination Instance (External table will point here): SQL Server 2019 (Default instance – MSSQLSERVER) 目标实例 (外部表将指向此处):SQL Server 2019(默认实例– MSSQLSERVER)

Click on the ‘SQL Server’ in the data source type of wizard and proceed to the next step. In the next step, create the Database Master Key to secure the credentials used by the external data source. We should use a complex password with a combination of lower case, upper case, alphanumeric and special characters.

单击向导的数据源类型中的“ SQL Server”,然后继续下一步。 在下一步中,创建数据库主密钥以保护外部数据源使用的凭据。 我们应该使用包含小写,大写,字母数字和特殊字符的复杂密码。

PolyBase Create Master key SQL 2019

Go to the next step and create data source connection.

转到下一步并创建数据源连接。

Server Name should in the format of [Instance Name IP Address].[Port]

服务器名称应采用[实例名称IP地址]。[端口]的格式。

PolyBase Create Data Source SQL 2019

This credential should have permission onto the SQL Server where we will point out external tables.

该凭据应具有访问SQL Server的权限,我们将在其中指出外部表。

PolyBase data source error SQL 2019

  • Connect to the instance, create a login, and provide read permission to the user on WideWorldImporters database.

    连接到实例,创建登录名,并在WideWorldImporters数据库上向用户提供读取权限。

    CREATE LOGIN [DemoSQL2019] WITH PASSWORD=N'f3EzbtBSXu7iNaKdtRXd+soU0ab6Pwu6BSfMOI7jqms=', 
    DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF
    GO
    CREATE USER [DemoSQL2019] FOR LOGIN [DemoSQL2019]
    GO
    ALTER ROLE [db_datareader] ADD MEMBER [DemoSQL2019]
    GO
    
  • Now we can go further and see that no login failure issue occurred. Select the table from the database.

    现在,我们可以继续进行下去,看看没有发生登录失败问题。 从数据库中选择表。

    PolyBase - Choose object for external table SQL 2019

We see here that the mapping as below:

我们在这里看到的映射如下:

  • Source Table: Sales.Invoices

    来源表:销售发票
  • Destination table: dbo.invoices

    目标表:dbo.invoices

It automatically selects the destination schema as dbo since sales schema does not exist in our database. Therefore, let us create the schema and refresh the schema list to show it here:

由于销售模式在我们的数据库中不存在,因此它将自动选择目标模式为dbo。 因此,让我们创建模式并刷新模式列表以在此处显示:

Create schema

Select the sales schema from the drop-down. Click Next to view summary:

从下拉列表中选择销售模式。 单击下一步以查看摘要:

PolyBase Select the schema in drop down  SQL 2019

summary of create external table Azure data studio

Click on Create to configure an external table. Once the external table is created, we can access the data from it.

单击创建以配置外部表。 创建外部表后,我们可以从中访问数据。

status message

PolyBase- View the records from external table

PolyBase的目录视图 (Catalog views for PolyBase)

We can view the external tables in using the catalog view sys.external_tables. It shows all the external tables in the current database:

我们可以使用目录视图sys.external_tables查看外部表 它显示了当前数据库中的所有外部表:

PolyBase catalog view sys.external_tables

SELECT execution_id, status,st.text, dr.total_elapsed_time  
FROM sys.dm_exec_distributed_requests  dr  
cross apply sys.dm_exec_sql_text(sql_handle) st  
ORDER BY total_elapsed_time DESC;

We can also get the information about the data source using the catalog view sys.external_data_sources.

我们还可以使用目录视图sys.external_data_sources获得有关数据源的信息。

Using below query, we can see the name of the data source, location (location contains database and instance IP address along with the instance port address):

使用下面的查询,我们可以看到数据源的名称,位置(位置包含数据库和实例IP地址以及实例端口地址):

SELECT name, location, type FROM sys.external_data_sources

PolyBase catalog view the sys.external_data_sources.

了解向导生成的脚本 (Understand the script generated by the wizard)

The Create an External table wizard can also be used to create scripts for the whole process. Let us understand the script generated by the Azure Data Studio for creating an external table in the above example by breaking the query into multiple parts.

创建外部表向导也可以用于创建整个过程的脚本。 让我们了解以上示例中由Azure Data Studio生成的用于创建外部表的脚本,该脚本将查询分为多个部分。

  1. Below query creates the Master Key Encryption

    下面的查询创建主密钥加密

    Create Master Key Encryption

  2. In this step, it created the database scoped credential.

    在此步骤中,它创建了数据库范围的凭证。

    Create database scoped credential

  3. Create external data source pointing to SQL Server. In this external data source query, we need to specify the location in the format of <vendor>://<server>[:<port>].

    Since we are creating the external table for SQL Server, we need to specify the vendor as ‘sqlserver’.

    We also need to specify the port address with the colon. For example, in below query, we specified port number as 5290.

    Create external data source
  4. 创建指向SQL Server的外部数据源。 在此外部数据源查询中,我们需要以<vendor>:// <server> [:<port>]的格式指定位置。

    由于我们正在为SQL Server创建外部表,因此需要将供应商指定为“ sqlserver”。

    我们还需要用冒号指定端口地址。 例如,在下面的查询中,我们将端口号指定为5290。

  5. In the below section, we will create an external table. We need to create an external table similar to the relational database table with the column properties. We also need to specify a location for the object along with the data source.

    在下面的部分中,我们将创建一个外部表。 我们需要使用列属性创建一个类似于关系数据库表的外部表。 我们还需要指定对象的位置以及数据源。

    For example, in below query, we specify the location as [WideWorldImporters].[Sales].[Invoices] and DATA_SOURCE as [SQLServer]

    例如,在下面的查询中,我们将位置指定为[WideWorldImporters]。[Sales]。[Invoices] ,将DATA_SOURCE指定为[SQLServer]

    Create external table script

在PolyBase中按下 (PUSHDOWN in PolyBase)

We normally use predicates in the query in order to get a subset of the rows from the table. This subset allows pulling the records based on the conditions defined with where clause. These predicates can be as following as per the docs.

我们通常在查询中使用谓词,以便从表中获取行的子集。 该子集允许根据where子句定义的条件提取记录。 这些谓词可以按照docs进行如下操作。

  • Binary comparison operators ( <, >, =, !=, <>, >=, <= ) for numeric, date, and time values.

    数值,日期和时间值的二进制比较运算符(<,>,=,!=,<>,> =,<=)。
  • Arithmetic operators ( +, -, *, /, % ).

    算术运算符(+,-,*,/,%)。
  • Logical operators (AND, OR).

    逻辑运算符(AND,OR)。
  • Unary operators (NOT, IS NULL, IS NOT NULL).

    一元运算符(NOT,IS NULL,IS NOT NULL)。

In PolyBase, we can use pushdown to improve the performance of the query for the external table. Mostly, we use this feature for the scale-out cluster cases where we can see significant improvement of the query performance. In this example, we will be using the standalone PolyBase configuration.

在PolyBase中,我们可以使用下推提高外部表查询的性能。 通常,我们在横向扩展集群的情况下使用此功能,在这种情况下,我们可以看到查询性能的显着改善。 在此示例中,我们将使用独立的PolyBase配置。

When we create an external data source for external table, we have the option to specify the value for PUSHDOWN as ON or OFF. The default value for pushdown is ON. Therefore, we do not need to specify a pushdown value if we want to enable it. Using PUSHDOWN, we can choose to move the computation to source system or not.

当我们为外部表创建外部数据源时,我们可以选择将PUSHDOWN的值指定为ON或OFF。 下推的默认值为ON。 因此,如果要启用下推值,则无需指定。 使用PUSHDOWN,我们可以选择是否将计算移至源系统。

The syntax for an external data source with pushdown is as below:

具有下推功能的外部数据源的语法如下:

CREATE EXTERNAL DATA SOURCE [DataSourceName]
WITH ( 
LOCATION = sqlserver://SqlServer,
-- PUSHDOWN = ON | OFF,
 CREDENTIAL = Credentials
);

We have already created a data source with a default value (Pushdown=ON) for the external table pointing to another SQL Server instance. Therefore, we will run the query with predicate with and without pushdown. To disable, pushdown we can use the predicate OPTION (DISABLE EXTERNAL PUSHDOWN) in the query.

我们已经为指向另一个SQL Server实例的外部表创建了具有默认值(Pushdown = ON)的数据源。 因此,我们将使用带下推和不带下推的谓词运行查询。 要禁用,下推我们可以在查询中使用谓词OPTION(DISABLE EXTERNAL PUSHDOWN)。

Similarly, while creating the external data source if we disabled the pushdown, we can enable it while running the query as OPTION (FORCE EXTERNALPUSHDOWN);

同样,如果禁用了下推功能,则在创建外部数据源时,可以在以OPTION(FORCE EXTERNALPUSHDOWN)运行查​​询的同时启用它。

Let us run the query and see the difference in performance.

让我们运行查询并查看性能差异。

  • Execute query with predicate and enabling Pushdown: In this query, we do not specify OPTION (FORCE EXTERNALPUSHDOWN)since it is by default enabled in the data source.

    使用谓词执行查询并启用Pushdown :在此查询中,我们没有指定OPTION(FORCE EXTERNALPUSHDOWN),因为默认情况下它已在数据源中启用。

    SELECT  [InvoiceID],[CustomerID],[BillToCustomerID],[OrderID]
          ,[DeliveryMethodID],[ContactPersonID],[AccountsPersonID],[SalespersonPersonID]
          ,[PackedByPersonID] ,[InvoiceDate],[CustomerPurchaseOrderNumber],[IsCreditNote]
          ,[CreditNoteReason],[Comments],[DeliveryInstructions],[InternalComments]
          ,[TotalDryItems] ,[TotalChillerItems] ,[DeliveryRun]
          ,[RunPosition],[LastEditedBy]
          ,[LastEditedWhen]
      FROM [SQLShackDemo].[sales].[Invoices]
      where InvoiceID>250 
      order by CustomerID
    

    Polybase -Execute query with predicate and enabling Pushdown SQL Server 2019

  • Query with predicate without pushdown: In this query, we disabled the pushdown with predicates OPTION (DISABLE EXTERNALPUSHDOWN):

    使用谓词不带下推的查询在此查询中,我们使用谓词OPTION(DISABLE EXTERNALPUSHDOWN)禁用了下推:

    SELECT  [InvoiceID],[CustomerID],[BillToCustomerID],[OrderID]
          ,[DeliveryMethodID],[ContactPersonID],[AccountsPersonID],[SalespersonPersonID]
          ,[PackedByPersonID] ,[InvoiceDate],[CustomerPurchaseOrderNumber],[IsCreditNote]
          ,[CreditNoteReason],[Comments],[DeliveryInstructions],[InternalComments]
          ,[TotalDryItems] ,[TotalChillerItems] ,[DeliveryRun]
          ,[RunPosition],[LastEditedBy]
          ,[LastEditedWhen]
      FROM [SQLShackDemo].[sales].[Invoices]
      where InvoiceID>250 
      order by CustomerID
      OPTION(DISABLE EXTERNALPUSHDOWN)
    

    Polybase - Query with predicate without pushdown

We can see here the query without pushdown took 30.524 seconds while query with pushdown took 19.754 seconds so there is a significant performance improvement with this approach. PUSHDOWN allows moving computation source, which we can see improvement in performance.

我们可以在此处看到不带下推的查询花费了30.524秒,而带下推的查询花费了19.754秒,因此这种方法的性能有了显着提高。 PUSHDOWN允许移动计算源,我们可以看到其性能有所提高。

结论 (Conclusion)

In this latest article in our series, we have learned to create an external table for SQL Server data source with the Azure Data Studio Create external table wizard along with T-SQL as well. We also learned about the PushDown approach for computation queries. In the next series of the article, we will explore more on PolyBase for different data sources.

在本系列的最新文章中,我们还学习了如何使用Azure Data Studio 创建外部表向导以及T-SQL为SQL Server数据源创建外部表 。 我们还了解了用于计算查询的PushDown方法。 在本文的下一个系列中,我们将在PolyBase上探索更多用于不同数据源的内容。

目录 (Table of contents)

Enhanced PolyBase SQL 2019 – Installation and basic overview
Enhanced PolyBase SQL 2019 – External tables for Oracle DB
Enhanced PolyBase SQL 2019 – External tables using t-SQL
Enhanced PolyBase SQL 2019 – External tables SQL Server, Catalog view and PushDown
Enhanced PolyBase SQL 2019 – MongoDB and external table
增强的PolyBase SQL 2019-安装和基本概述
增强的PolyBase SQL 2019-Oracle DB的外部表
增强的PolyBase SQL 2019-使用t-SQL的外部表
增强的PolyBase SQL 2019-外部表SQL Server,目录视图和下推式
增强的PolyBase SQL 2019 – MongoDB和外部表

翻译自: https://www.sqlshack.com/enhanced-polybase-sql-2019-external-tables-sql-server-catalog-view-and-pushdown/

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值