如何在SQL Server 2017中使用Python获得高级数据分析

本文探讨了Microsoft SQL Server 2017如何成为数据科学的解决方案,支持Python进行高级数据分析。从CTP 2.0开始,SQL Server引入Python集成,允许在数据库中直接运行Python代码,支持数据处理和机器学习。安装SQL Server 2017 CTP 2.0并启用外部脚本执行后,可以通过sp_execute_external_script执行Python脚本,实现与SQL Server的无缝集成。
摘要由CSDN通过智能技术生成

On the 19th of April 2017, Microsoft held an online conference called Microsoft Data Amp to showcase how Microsoft’s latest innovations put data, analytics and artificial intelligence at the heart of business transformation. Microsoft has, over the last few years, made great strides in accelerating the pace of innovation to enable businesses to meet the demands of a dynamic marketplace and harness the incredible power of data—more securely and faster than ever before.

在19 2017年4月,微软举办了名为Microsoft数据安培在线会议,以展示微软最新的创新技术是如何把数据,分析和人工智能在业务转型的心脏。 在过去的几年中,Microsoft在加快创新步伐方面取得了长足进步,以使企业能够满足动态市场的需求并以前所未有的安全性和更快的速度利用数据的强大功能。

After the conference, there were a few questions some of us had, though. Is Microsoft SQL Server 2017 emerging as an enterprise solution for data science? Does it provide the required capabilities—is the engine capable of handling huge data? It seems the answer is “Yes”, as starting with the CTP 2.0 release of SQL Server 2017, Microsoft has brought Python-based intelligence to data in SQL Server.

会议之后,我们中的一些人还是有一些疑问。 Microsoft SQL Server 2017是否正在成为数据科学的企业解决方案? 它是否提供了所需的功能-引擎能够处理大量数据吗? 似乎答案是“是”,因为从SQL Server 2017的CTP 2.0版本开始,Microsoft已将基于Python的情报引入SQL Server中的数据。

Python has gathered a lot of interest recently as a language of choice for data analysis. This language has the right set of libraries for data analysis and predictive modeling, not to mention a simpler learning curve.

作为数据分析的一种选择,Python最近引起了很多兴趣。 这种语言具有用于数据分析预测建模的正确库集,更不用说更简单的学习曲线了。

The growing trends of data science and modeling predict a massive growth in data in the upcoming years. The propulsion towards innovation and adaptation to leading trends in the data technology might intrigue us enough to make us take a look at the current release of SQL Server 2017.

数据科学和建模的增长趋势预示了未来几年数据的巨大增长。 推动创新和适应数据技术领先趋势的推动力可能会吸引我们,足以让我们看一下SQL Server 2017的当前版本。

Data science is a combination of Data Mining, Machine Learning, Analytics and Big Data. The integration of SQL 2016 with data science language, R, into database the engine provides an interface that can efficiently run models and generate predictions using SQL R services. Python builds on the foundation laid for R Services in SQL Server 2016, and extends that mechanism to include Python support for in-database analytics and machine learning.

数据科学是数据挖掘,机器学习,分析和大数据的结合。 该引擎将SQL 2016与数据科学语言R集成到数据库中,提供了一个界面,该界面可以使用SQL R服务有效地运行模型并生成预测。 Python建立在SQL Server 2016中为R Services奠定的基础上,并将该机制扩展为包括对数据库内分析和机器学习的Python支持。

Moreover, the R–Python integration in SQL Server, apart from emphasizing productivity and code readability, also can take advantages of parallel query processing, security and better resource governance.

此外,SQL Server中的R–Python集成除了强调生产力和代码可读性之外,还可以利用并行查询处理,安全性和更好的资源治理优势。

Now, Microsoft has renamed R Services as Machine Learning Services, bringing R and Python under the umbrella. The renamed Microsoft Machine Learning Services component enables Python to run directly on the database server, or along with embedded T-SQL scripts.

现在,Microsoft将R Services重命名为Machine Learning Services ,从而将R和Python纳入了伞下。 重命名的Microsoft Machine Learning Services组件使Python可以直接在数据库服务器上运行,也可以与嵌入式T-SQL脚本一起运行。

Developers have used stored procedure called sp_execute_external_script to run R code, whose first parameter is @language. Microsoft had designed this stored procedure to be open-ended.

开发人员已使用名为sp_execute_external_script的存储过程来运行R代码,其第一个参数为@language 。 Microsoft已将此存储过程设计为开放式的。

The engine inherits the features of R, in order to adopt Python. To run Python code in SQL Server, we have to install SQL Server 2017 CTP 2.0, with the Machine Learning Services with Python feature. It’s noteworthy that other versions of SQL Server do not support Python integration.

该引擎继承了R的功能,以采用Python。 若要在SQL Server中运行Python代码,我们必须安装具有Python机器学习服务功能SQL Server 2017 CTP 2.0。 值得注意的是,其他版本SQL Server不支持Python集成。

文章探索 (The article explores)

  • Machine Learning Services  机器学习服务的配置
  • Configuration of Python

    Python的配置
  • Configuration of the instance to allow execution of scripts that use an external executable

    配置实例以允许执行使用外部可执行文件的脚本
  • Data-type support

    数据类型支持
  • Samples to understand the use of Python in SQL

    样本以了解Python在SQL中的用法

安装 (Installation)

  • Download SQL Server 2017 CTP 2.0. During Feature Selection at the time of installation, Python is listed as a part of Machine Learning Services.

    下载SQL Server 2017 CTP 2.0。 在安装时的功能选择期间,Python被列为机器学习服务的一部分。
  • Run the setup wizard for SQL Server 2017.

    运行SQL Server 2017的安装向导。
  • On the Installation tab, click New SQL Server stand-alone installation or add features to an existing installation.

    在“安装”选项卡上,单击“新建SQL Server独立安装”或向现有安装添加功能。
    • Database Engine Services: To use Python with SQL Server, you must install an instance of the database engine. 数据库引擎服务 :要将Python与SQL Server一起使用,必须安装数据库引擎的实例。
    • Machine Services (In-Database): This option installs the database services that support Python script execution. 机器服务(数据库内) :此选项安装支持Python脚本执行的数据库服务。
  • Python: Check this option to get the Python 3.5 executable and select libraries from the Anaconda distribution Python :选中此选项以获取Python 3.5可执行文件并从Anaconda发行版中选择库

On the below page, Consent to Install Python, click Accept.

在接下来的页面, 同意安装Python ,点击接受

On the Ready to Install page, verify that selected components are included

在“ 准备安装”页面上,验证是否包含选定的组件

After a successful installation, the instance is ready for enabling an external script execution parameter.

成功安装后,该实例已准备好启用外部脚本执行参数。

To enable SQL Instance to run Python scripts:

要使SQL实例能够运行Python脚本,请执行以下操作:

  • Open the SQL Server Management Studio.

    打开SQL Server Management Studio。
  • Connect to the instance where Machine Learning Services is installed, and run the following command:

    连接到安装了Machine Learning Services的实例,然后运行以下命令:

    sp_configure

    sp_configure

  • To enable the external scripting feature that supports Python, run the following statement.

    要启用支持Python的外部脚本功能,请运行以下语句。

    EXEC sp_configure ‘external scripts enabled’, 1
    RECONFIGURE WITH OVERRIDE

    EXEC sp_configure'已启用外部脚本',1
    重新配置

  • Restart the SQL Instance.

    重新启动SQL实例。

sp_execute _external_script (sp_execute_external_script)

The sp_execute_external_script is a system stored procedure that executes with a provided R/Python script as an argument. To enable normal functioning of this external stored procedure, you must have administrator access to your SQL Server instance, so that you can run the sp_configure command. The procedure will invoke the launchpad service to the respective library for its execution.

sp_execute_external_script是一个系统存储过程,使用提供的R / Python脚本作为参数执行。 若要使此外部存储过程正常运行,必须具有对SQL Server实例的管理员访问权限,以便您可以运行sp_configure命令。 该过程将对相应的库调用启动板服务以执行它。

sp_execute_external_script   
    @language = N'language' ,   
    @script = N'script',  
    @input_data_1 = ] 'input_data_1'   
    [ , @input_data_1_name = ] N'input_data_1_name' ]   
    [ , @output_data_1_name = 'output_data_1_name' ]  
    [ , @parallel = 0 | 1 ]  
    [ , @params = ] N'@parameter_name data_type [ OUT | OUTPUT ] [ ,...n ]'  
    [ , @parameter1 = ] 'value1' [ OUT | OUTPUT ] [ ,...n ]  
    [ WITH <execute_option> ]  
[;]  
 
<execute_option>::=  
{  
      { RESULT SETS UNDEFINED }   
    | { RESULT SETS NONE }   
    | { RESULT SETS ( <result_sets_definition> ) }  
}
Parameter Purpose
@language = N’Python’ Scripting language parameter, In this case it’s Python
@script = N’ ‘ Python script body
@input_data_1 = N’ T-SQL Statement’ The T-SQL statement reads data from SQL Table
@output_data_1_name = N’ Data Frame Name’ Holds the data frame generated inside the Python Script.
WITH RESULT SETS ((Col1 DataType,Col2 DataType )) Specifies the Output column and Datatype of the dataframe columns. This is optional.
参数 目的
@language = N'Python' 脚本语言参数,在本例中为Python
@script = N'' Python脚本主体
@ input_data_1 = N'T-SQL语句' T-SQL语句从SQL表读取数据
@ output_data_1_name = N'数据框名称' 保存在Python脚本内部生成的数据框。
有结果集((Col1 DataType,Col2 DataType)) 指定输出列和数据框列的数据类型。 这是可选的。

在SQL Server中执行Python代码 (Execute Python Code in SQL Server)

Microsoft has made it possible to embed Python code directly in SQL Server databases by including the code as a T-SQL stored procedure.

Microsoft通过将代码作为T-SQL存储过程包括在内,可以将Python代码直接嵌入SQL Server数据库中。

数据类型支持 (Datatype Support)

Python supports a limited number of data types in comparison to SQL Server. As a result, whenever you use data from SQL Server in Python scripts, the data might be implicitly converted to a type compatible with Python. However, often an exact conversion cannot be performed automatically, and an error is returned. This table lists the implicit conversions that are provided. Other data types are not supported.

与SQL Server相比,Python支持有限数量的数据类型。 因此,每当您在Python脚本中使用来自SQL Server的数据时,数据可能会隐式转换为与Python兼容的类型。 但是,通常无法自动执行精确的转换,并返回错误。 下表列出了所提供的隐式转换。 不支持其他数据类型。

SQLTYPE PYTHON TYPE
bigint numeric
binary raw
bit bool
char str
float float64
int int32
nchar str
nvarchar str
nvarchar(max) str
real float32
smallint int16
tinyint uint8
varbinary bytes
varbinary(max) bytes
varchar(n) str
varchar(max) str
SQL类型 python类型
比金特 数字
二元 生的
一点 布尔
烧焦 力量
浮动 float64
整型 int32
nchar 力量
nvarchar 力量
nvarchar(最大) 力量
真实 float32
Smallint int16
tinyint uint8
varbinary 个字节
varbinary(最大) 个字节
varchar(n) 力量
varchar(最大) 力量

例子 (Examples)

Print the input value

打印输入值

exec sp_execute_external_script 
@language =N'Python',
@script=N'OutputDataSet = InputDataSet
print("Input data is {0}".format(InputDataSet))
', 
@input_data_1 = N'SELECT 1 as col'

Find Mean of a given list

查找给定列表的均值

execute sp_execute_external_script 
@language = N'Python',
@script = N'
l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
print(sum(l) / float(len(l)))
'

Format operator

格式运算符

exec sp_execute_external_script
@language = N'Python'
,@script=
N'
print("""jan:{2} feb:{0} mar:{2} Apr:{1} May:{2} Jun:{1} Jul:{2} Aug:{2} Sep:{1} Oct:{2} Nov:{1} Dec:{2}""".format(InputDataSet.A, InputDataSet.B, InputDataSet.C))
',
@input_data_1 = N'select 28 as A ,30 as B,31 as C'

Using loops and branches

使用循环和分支

execute sp_execute_external_script 
@language = N'Python',
@script = N'
for i in range(5):
 if i<3 :
     print("i is now:", i*2)'

Passing a table as input and generating a computed column called bonus

传递表作为输入并生成称为奖金的计算列

  • Create the EMP table

    创建EMP表
  • Inserting dummy values

    插入哑数值
  • Execute the Python script to generate the compute column

    执行Python脚本以生成计算列
DROP TABLE IF EXISTS dbo.EMP
GO
CREATE TABLE [dbo].[EMP](
	[empno] [int] NOT NULL,
	[ename] [varchar](10) NULL,
	[job] [varchar](9) NULL,
	[mgr] [int] NULL,
	[hiredate] [datetime] NULL,
	[sal] float NULL,
	[comm] [numeric](7, 2) NULL,
	[dept] [int] NULL,
PRIMARY KEY CLUSTERED 
(
	[empno] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
 
GO
 
INSERT INTO EMP VALUES
    (1,'Prashanth','ADMIN',6,'12-17-1990',18000,NULL,4)
INSERT INTO EMP VALUES
    (2,'Jayaram','MANAGER',9,'02-02-1998',52000,300,3)
INSERT INTO EMP VALUES
    (3,'thanVitha','SALES I',2,'01-02-1996',25000,500,3)
 
SELECT EMPNO,ENAME,SAL from EMP
 
EXECUTE sp_execute_external_script 
@language = N'Python',
@script=N'OutputDataSet = InputDataSet
for i in OutputDataSet["sal"]:
	OutputDataSet["Bonus"]=OutputDataSet["sal"]*0.05',
@input_data_1 = N'SELECT [empno],[ename],sal,0 as Bonus from EMP'
WITH RESULT SETS ((EMPNO int, ENAME varchar(10), SAL float, Bonus float))

Read content from a text file using the DictReader function.

使用DictReader函数从文本文件中读取内容。

EXECUTE sp_execute_external_script  
@language = N'Python'  
, @script = N'
import csv
with open(''e:\python.txt'') as f:
    reader = csv.DictReader(f, delimiter = ''\t'')
    for row in reader:
        print (row)
'

The following sample shows how to import the Pandas library to get access to dataframes and do the following tasks

以下示例显示了如何导入Pandas库以访问数据框并执行以下任务

  • Read CSV file using Pandas library

    使用Pandas库读取CSV文件
  • Find the number of Rows using Pandas object pyo

    使用Pandas对象pyo查找行数
  • Fetch the first row of the csv file

    提取csv文件的第一行
  • Calculate the mean of every statistical column

    计算每个统计列的平均值
EXECUTE sp_execute_external_script  
@language = N'Python'  
, @script = N'
import pandas
pyo = pandas.read_csv("e:\InputServer_1.csv")
print (pyo)
#Finding the number of rows
print (pyo.shape)
#Looking at the first row of the data
print (pyo.head(1))
#Find the average of each statistic
print (pyo.mean())
'

Find the mean over the statistical column

在统计列上找到平均值

The data is fed from the table and the computation happens using the Pandas library

数据从表中馈送,并且使用Pandas库进行计算

 
DROP TABLE IF EXISTS MyData;
CREATE TABLE MyData([Col1] INT NOT NULL) ON [PRIMARY];
INSERT INTO MyData VALUES(1);
INSERT INTO MyData VALUES(10);
INSERT INTO MyData VALUES(100);
GO
 
-- print all rows of MyData table
SELECT * FROM MyData;
 
--Find mean of Col1
 
 EXECUTE sp_execute_external_script  
@language = N'Python'  
, @script = N'
import pandas
print("*******************************")
OutputDataSet = InputDataSet
print (OutputDataSet)
print (OutputDataSet.mean())
print("*******************************")
'
, @input_data_1 = N'SELECT * from Mydata'

摘要 (Summary)

This article covered how SQL Server 2017 introduces support for data analytics, and the use of Python in addition to R scripts. I’ve detailed the installation procedures, configuration details and execution of a sample Python SQL script.

本文介绍了SQL Server 2017如何引入对数据分析的支持以及R脚本之外的Python使用。 我已经详细说明了安装过程,配置细节和示例Python SQL脚本的执行。

The close integration of R/Python to SQL Server machine eliminates unnecessary movement of data across machines; think of moving millions/billions of rows to the client for modeling or scoring over the network—it is cumbersome, and a tedious job. This is one of the reasons why data scientists rely on sampling (Test set v/s Train Set). It’s a useful approach, especially where there are issues of data sovereignty and compliance. Your code runs inside the SQL Server security boundaries, triggered by a single call from the T-SQL stored procedures.

R / Python与SQL Server计算机的紧密集成消除了跨计算机不必要的数据移动; 考虑将数以百万计的行移动到客户端以通过网络进行建模或评分-这既麻烦又繁琐。 这是数据科学家依赖采样(测试集与火车集)的原因之一。 这是一种有用的方法,尤其是在存在数据主权和合规性问题的情况下。 您的代码在SQL Server安全边界内运行,由T-SQL存储过程的一次调用触发。

目录 (Table of contents)

A Quick start Guide to Managing SQL Server 2017 on CentOS/RHEL Using the SSH Protocol  
How to use Python in SQL Server 2017 to obtain advanced data analytics
Data Interpolation and Transformation using Python in SQL Server 2017    
An introduction to a SQL Server 2017 graph database 
Top string functions in SQL Server 2017 
Top 8 new (or enhanced) SQL Server 2017 DMVs and DMFs for DBAs
Overview of Resumable Indexes in SQL Server 2017 
Understanding automatic tuning in SQL Server 2017
使用SSH协议在CentOS / RHEL上管理SQL Server 2017的快速入门指南
如何在SQL Server 2017中使用Python获得高级数据分析
在SQL Server 2017中使用Python进行数据插值和转换
SQL Server 2017图形数据库简介
SQL Server 2017中的顶级字符串函数
适用于DBA的前8大新(或增强)SQL Server 2017 DMV和DMF
SQL Server 2017中的可恢复索引概述
了解SQL Server 2017中的自动调整

翻译自: https://www.sqlshack.com/how-to-use-python-in-sql-server-2017-to-obtain-advanced-data-analytics/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值