如何在Microsoft SQL Server中使用R从Google Analytics(分析)接收数据

In this article I want to show in detail how you can use R in Microsoft SQL Server to get data from Google Analytics (and generally from any API).

在本文中,我想详细展示如何在Microsoft SQL Server中使用R从Google Analytics(通常从任何API)获取数据。

任务-我们有MS SQL Server,我们希望通过API在DWH中接收数据 (The task — we have MS SQL server and we want to receive data in DWH by API)

We will use googleAnalyticsR package to connect to Google Analytics (GA).

我们将使用googleAnalyticsR软件包连接到Google Analytics(分析)(GA)。

This package is chosen as an example due to its popularity. You can use another package, for example: RGoogleAnalytic. Approaches to problem solving will be the same.

由于其受欢迎程度,选择了此软件包作为示例。 您可以使用另一个包,例如: RGoogleAnalytic 。 解决问题的方法是相同的。

在MS SQL Server上安装R (Install R on MS SQL Server)

this is done via the standard interface for installing MS SQL components.

这是通过用于安装MS SQL组件的标准接口完成的。

  1. This is R that SQL Server will interact with directly (called in SQL queries).

    SQL Server将直接与之交互的R(在SQL查询中称为)。
  2. You can work with the R client copy from R Studio without fear of breaking something on the database server.

    您可以使用R Studio中的R客户端副本,而不必担心会破坏数据库服务器上的某些内容。

Accept the license agreement and pay attention that not оrdinary R will be installed but Microsoft R Open

接受许可协议,并注意将不会安装Rinaryary R,而是会安装Microsoft R Open

Briefly, what it is: Microsoft takes R Open, improves it with its packages and distributes for free. Accordingly, packages of this R version are available for download not in CRAN, but in MRAN.

简而言之,这是什么:Microsoft接受R Open,通过其软件包对其进行改进并免费分发。 因此,此R版本的软件包不是在CRAN中提供 ,而是在MRAN中提供

There is more to come. In fact, when installing MS SQL, we get not a clean MRAN, but something more — Microsoft ML Server.

还有更多。 实际上,在安装MS SQL时,我们得到的不是干净的MRAN,而是更多的Microsoft ML Server

This means to us that there will be additional packages in the set of R libraries – RevoScaleR.

对我们来说,这意味着R库集– RevoScaleR中将包含其他软件包。

RevoScaleR is designed for processing big data and building machine learning models on large datasets.

RevoScaleR设计用于处理大数据并在大型数据集上构建机器学习模型。

This information should be kept in mind since there is a high probability of questions related to different R versions.

请记住这些信息,因为与不同的R版本有关的问题的可能性很高。

After installing the components, we get the Microsoft R interaction.

安装组件后,我们将获得Microsoft R交互。

This console is not very convenient to use, so immediately download and install the free version RStudio.

该控制台使用起来不太方便,因此请立即下载并安装免费版本RStudio

配置SQL Server与R交互 (Configure SQL server to interact with R)

Execute the following scripts in SSMS:

在SSMS中执行以下脚本:

Allow scripts to run on SQL server

允许脚本在SQL Server上运行

sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

Server SQL restart

服务器SQL重新启动

Make sure R scripts are executed

确保执行R脚本

EXECUTE sp_execute_external_script
@language =N'R',
@script=N'print(version)';

Find the location of R packages that are used by SQL server

查找SQL Server使用的R包的位置

declare @Rscript nvarchar(max)

set @Rscript = N'
    InstaledLibrary <- library()
    InstaledLibrary <- as.data.frame(InstaledLibrary$results )
    OutputDataSet <- InstaledLibrary
'

EXECUTE sp_execute_external_script
    @language = N'R'
    , @script = @Rscript
WITH RESULT SETS (([Package] varchar(255) NOT NULL,
[LibPath] varchar(255) NOT NULL,
[Title] varchar(255) NOT NULL));

In my case, the path to R MS SQL packages: C:/Program Files/Microsoft SQL Server/MSSQL14.MSSQLSERVER/R_SERVICES/library

就我而言,R MS SQL包的路径为:C:/ Program Files / Microsoft SQL Server / MSSQL14.MSSQLSERVER / R_SERVICES / library

Run RStudio.

运行RStudio。

There is a good chance that there will be several R versions installed on the computer, so you need to make sure that we are working with the version of SQL server.

很可能在计算机上安装了多个R版本,因此您需要确保我们正在使用SQL Server版本。

Settings will be applied after RStudio restart.

RStudio重新启动后将应用设置。

安装googleAnalyticsR软件包 (Install the googleAnalyticsR package)

To RStudio using the command

使用命令到RStudio

library()

Find out the path to the package library of the R client version (with which RStudio works)

找出R客户端版本的软件包库的路径(RStudio可以与之一起使用)

In my case, this path: C:/Program Files/Microsoft SQL Server/140/R_SERVER/library

就我而言,该路径:C:/ Program Files / Microsoft SQL Server / 140 / R_SERVER / library

Install the googleAnalyticsR package via RStudio

通过RStudio安装googleAnalyticsR软件包

Here is some shaded nuance: You can’t just add anything you want to the MS SQL system folders. Packages will be saved in a temporary directory as ZIP archives.

这是一些细微差别:您不能只将所需的任何内容添加到MS SQL系统文件夹中。 软件包将作为ZIP存档保存在临时目录中。

Go to the temporary folder and unzip all the packages in Explorer.

转到临时文件夹,然后在资源管理器中解压缩所有软件包。

Unzipped packages must be copied to the R Services library directory (which MS SQL server works with).

必须将解压缩的程序包复制到R Services库目录(MS SQL Server与之配合使用)。

In my example this is the folder C:/Program Files/Microsoft SQL Server/MSSQL14.MSSQLSERVER/R_SERVICES/library Unzipped packages must also be copied to the R client version (which RStudio works with)

在我的示例中,这是文件夹C:/ Program Files / Microsoft SQL Server / MSSQL14.MSSQLSERVER / R_SERVICES / library还必须将未压缩的程序包复制到R客户端版本(RStudio可以使用)

In my example this is the folder C:/Program Files/Microsoft SQL Server/140/R_SERVER/library (we learned these paths from previously executed scripts)

在我的示例中,这是文件夹C:/ Program Files / Microsoft SQL Server / 140 / R_SERVER / library (我们从以前执行的脚本中学到了这些路径)

Before copying to the R Services folder, it is better to save a copy of the library folder. Experience has proven that there are different situations and it is better to be able to return to existing packages.

复制到R Services文件夹之前,最好先保存库文件夹的副本。 经验证明存在不同的情况,最好能够返回到现有的程序包。

When copying, replace all existing packages.

复制时,请替换所有现有的软件包。

To consolidate the skill, repeat the exercise. Only now we do not install packages, but update all existing ones. (this is not necessary for connecting to GA, but it is better to have the latest versions of all packages)

要巩固技能,请重复练习。 直到现在,我们才不安装软件包,而是更新所有现有软件包。 (对于连接到GA,这不是必需的,但是最好拥有所有软件包的最新版本)

Check for new packages in RStudio

在RStudio中检查新软件包

Packages will be downloaded to a temporary folder. Perform the same actions as when installing new packages.

软件包将下载到一个临时文件夹。 执行与安装新软件包时相同的操作。

检查MS SQL Internet访问 (Checking MS SQL Internet access)
declare @Rscript nvarchar(max)

set @Rscript = N'
    library(httr)
    HEAD("https://www.yandex.ru", verbose())
    '

EXECUTE sp_execute_external_script
    @language = N'R'
    , @script = @Rscript

Since SQL Server does not have Internet access by default, it is likely that the previous script will cause the following error.

由于默认情况下SQL Server无法访问Internet,因此以前的脚本很可能会导致以下错误。

Provide Internet access to R scripts from SQL.

提供从SQL对R脚本的Internet访问。

SQL 2017

SQL 2017

SQL 2019

SQL 2019

In SSMS

在SSMS中

-- Создаем базу данных для примера
create database Demo
go

use Demo
go

-- Создаем схему, для объектов базы данных связанных с Google Analytics  
create schema GA
go

-- Создаем таблицу для сохранения токена доступа к GA
drop table if exists [GA].[token]

create table [GA].[token](
[id] varchar(200) not null,
[value] varbinary(max)
constraint unique_id unique (id))
获取Google Analytics(分析)令牌 (Get Google Analytics token)

Execute the following code in RStudio: This will open the Google services authentication window in your browser. You will need to log in and give permission to access Google Analytics.

在RStudio中执行以下代码:这将在浏览器中打开Goog​​le服务身份验证窗口。 您需要登录并获得访问Google Analytics(分析)的权限。

# На всякий случай укажем тайм зону
Sys.setenv(TZ="Europe/Berlin")

library(googleAnalyticsR)

# Получаем токен
ga_auth()

PathTokenFile <- paste (  getwd(),"/", ".httr-oauth" , sep="")
TokenFile <- readBin(PathTokenFile, "raw", file.info(PathTokenFile)$size)

# Создали подключение к базе
conStr <- 'Driver={SQL Server};Server=EC2AMAZ-68OQ7JV;Database=Demo;Uid=Sa;Pwd=SaSql123'
ds <- RxOdbcData(table="ga.token", connectionString=conStr)

# Записываем токен в базу
rxWriteObject(ds, "ga_TokenFile", TokenFile)

In SSMS, make sure that the token from Google is received and recorded in the database

在SSMS中,确保已接收到来自Google的令牌并将其记录在数据库中

Select * from [GA].[token]
通过RStudio检查与GA的连接 (Check connection to GA via RStudio)
# Проверяем подключение В RStudio
Sys.setenv(TZ="Europe/Berlin")
library(googleAnalyticsR)

# Получаем токен из базы
conStr <- 'Driver={SQL Server};Server=EC2AMAZ-68OQ7JV;Database=Demo;Uid=Sa;Pwd=SaSql123' # Аутентификация в базе по пользователю
ds <- RxOdbcData(table="ga.token", connectionString=conStr)

PathTokenFile <- paste (  getwd(),"/", ".httr-oauth" , sep="")

TokenFile <- rxReadObject(ds, "ga_TokenFile")

write.filename = file(PathTokenFile, "wb")
writeBin(TokenFile, write.filename)

close(write.filename)

Sys.setenv("GA_AUTH_FILE" = PathTokenFile)

# Прошли аутентификацию
ga_auth()

# определили ga_id
account_list <- ga_account_list()
ga_id <- account_list$viewId

# Сохранили результат запроса
OutputDataSet <-google_analytics(ga_id,
                                 start="2019-01-01",
                                 end="2019-08-01",
                                 metrics = "sessions",
                                 dimensions = "date")

OutputDataSet

If everything went well, add R script to SQL and execute the query.

如果一切顺利,请将R脚本添加到SQL并执行查询。

drop table if exists #GA_session

create table #GA_session
(
[date] date,
[sessions] int
)

    declare @Rscript nvarchar(max)

    set @Rscript = N'
Sys.setenv(TZ="Europe/Berlin")
library(googleAnalyticsR)

# Получаем токен из базы
conStr <- ''Driver={SQL Server};Server=EC2AMAZ-68OQ7JV;Database=Demo;Uid=Sa;Pwd=SaSql123'' # Аутентификация в базе по пользователю
ds <- RxOdbcData(table="ga.token", connectionString=conStr)

PathTokenFile <- paste (  getwd(),"/", ".httr-oauth" , sep="")

TokenFile <- rxReadObject(ds, "ga_TokenFile")

write.filename = file(PathTokenFile, "wb")
writeBin(TokenFile, write.filename)

close(write.filename)

Sys.setenv("GA_AUTH_FILE" = PathTokenFile)

# Прошли аутентификацию
ga_auth()

# определили ga_id
account_list <- ga_account_list()
ga_id <- account_list$viewId

# Сохранили результат запроса
OutputDataSet <-google_analytics(ga_id,
                                 start="2019-01-01",
                                 end="2019-08-01",
                                 metrics = "sessions",
                                 dimensions = "date")

OutputDataSet$date  <- as.character(OutputDataSet$date)
'

-- print @Rscript

insert into #GA_session ([date],[sessions])
 EXECUTE sp_execute_external_script 
    @language = N'R',  
    @script = @Rscript

Select * 
from #GA_session
order by [date] asc

Pay attention that the script uses a Username and Password, which is a good thing. Therefore, we change the connection string to Windows authentication.

请注意,脚本使用用户名和密码,这是一件好事。 因此,我们将连接字符串更改为Windows身份验证。

conStr <- ''Driver={SQL Server};Server=EC2AMAZ-68OQ7JV;Database=Demo;Trusted_Connection=true'' # Виндовс аутентификация <<<=== Лучше выбирать этот вариант – хранить пароли в скриптах неправильно

After changing the authentication method, you will need to add database access rights to the service calling R.

更改身份验证方法后,您将需要向调用R的服务添加数据库访问权限。

(Of course, it is better to work with user groups. I simplified the solution as part of the demonstration)

(当然,最好与用户组合作。在演示中,我简化了解决方案)

We execute the SQL query as a procedure.

我们将SQL查询作为过程执行。

Create procedure Ga.Get_session 
     @Date_start date ='2019-01-01',
     @Date_End date ='2019-08-01'
as

drop table if exists #GA_session

create table #GA_session
(
[date] date,
[sessions] int
)

declare @Rscript nvarchar(max)

set @Rscript =CONCAT( N'
# Проверяем подключение В RStudio
Sys.setenv(TZ="Europe/Berlin")
library(googleAnalyticsR)

# Получаем токен из базы
conStr <- ''Driver={SQL Server};Server=EC2AMAZ-68OQ7JV;Database=Demo;Trusted_Connection=true'' # Виндовс аутентификация <<<=== Лучше выбирать этот вариант - никто пароля неувидит
ds <- RxOdbcData(table="ga.token", connectionString=conStr)

PathTokenFile <- paste (  getwd(),"/", ".httr-oauth" , sep="")

TokenFile <- rxReadObject(ds, "ga_TokenFile")

write.filename = file(PathTokenFile, "wb")
writeBin(TokenFile, write.filename)

close(write.filename)

Sys.setenv("GA_AUTH_FILE" = PathTokenFile)

# Прошли аутентификацию
ga_auth()

# определили ga_id
account_list <- ga_account_list()
ga_id <- account_list$viewId

# Сохранили результат запроса
OutputDataSet <-google_analytics(ga_id,
                                 start="' , @Date_start ,N'",
                                 end="' , @Date_End ,N'",
                                 metrics = "sessions",
                                 dimensions = "date")

OutputDataSet$date  <- as.character(OutputDataSet$date)
'
)
-- print @Rscript

insert into #GA_session ([date],[sessions])
 EXECUTE sp_execute_external_script 
    @language = N'R',  
    @script = @Rscript

Select * 
from #GA_session
order by [date] asc

Check the procedure operation

检查程序操作

-- Default options
exec  Ga.Get_session 

-- Get sessions for a given period
exec  Ga.Get_session  
    @Date_start  ='2019-08-01',
    @Date_End  ='2019-09-01'

R script is not complicated, it can always be copied to R Studio. Modify and save in SQL procedure. For example, I only changed the dimensions parameter and now can load Landing Page by dates.

R脚本并不复杂,可以始终将其复制到R Studio。 修改并保存在SQL过程中。 例如,我只更改了维度参数,现在可以按日期加载“着陆页”。

Create procedure [GA].[Get_landingPage_session] 
 @Date_start date ='2019-01-01',
 @Date_End date ='2019-08-01'
as

drop table if exists #GA_session

create table #GA_session
(
[date] date,
landingPagePath nvarchar(max),
[sessions] int
)

    declare @Rscript nvarchar(max)

    set @Rscript =CONCAT( N'
# Проверяем подключение В RStudio
Sys.setenv(TZ="Europe/Berlin")
library(googleAnalyticsR)

# Получаем токен из базы
conStr <- ''Driver={SQL Server};Server=EC2AMAZ-68OQ7JV;Database=Demo;Trusted_Connection=true'' # Виндовс аутентификация <<<=== Лучше выбирать этот вариант - никто пароля неувидит
ds <- RxOdbcData(table="ga.token", connectionString=conStr)

PathTokenFile <- paste (  getwd(),"/", ".httr-oauth" , sep="")

TokenFile <- rxReadObject(ds, "ga_TokenFile")

write.filename = file(PathTokenFile, "wb")
writeBin(TokenFile, write.filename)

close(write.filename)

Sys.setenv("GA_AUTH_FILE" = PathTokenFile)

# Прошли аутентификацию
ga_auth()

# определили ga_id
account_list <- ga_account_list()
ga_id <- account_list$viewId

# Сохранили результат запроса
OutputDataSet <-google_analytics(ga_id,
                                 start="' , @Date_start ,N'",
                                 end="' , @Date_End ,N'",
                                 metrics = "sessions",
                                 dimensions = c("date" ,"landingPagePath"))

OutputDataSet$date  <- as.character(OutputDataSet$date)

'
)
-- print @Rscript

insert into #GA_session ([date],landingPagePath,[sessions])
 EXECUTE sp_execute_external_script 
    @language = N'R',  
    @script = @Rscript

Select * 
from #GA_session
order by [date] asc

checking

检查

exec [GA].[Get_landingPage_session]

Basically, that’s it.

基本上就是这样。

I would like to note that using R via SQL, you can get data from any API. For example: receiving exchange rates.

我想指出,通过SQL使用R,您可以从任何API获取数据。 例如:接收汇率。

-- https://www.cbr-xml-daily.ru

Declare @script nvarchar(max) 

 set @script = N'           
    encoding = "utf-8"
    Sys.setlocale("LC_CTYPE", "russian")
    Sys.setenv(TZ="Europe/Berlin")

    library(httr)
    url <- "https://www.cbr-xml-daily.ru/daily_json.js"
    resp <- GET(url)
    library(jsonlite)
    Response <- fromJSON(content(resp, as = "text"))
    OutputDataSet <- data.frame(matrix(unlist(Response$Valute$USD), nrow=1, byrow=T),stringsAsFactors=FALSE)
    OutputDataSet <- rbind(OutputDataSet,data.frame(matrix(unlist(Response$Valute$EUR), nrow=1, byrow=T),stringsAsFactors=FALSE))
    '

EXEC sp_execute_external_script 
@language = N'R' 
, @script = @script
 with result SETS UNDEFINED

or obtaining data from the first available API, some farms in Australia…

或从第一个可用的API获取数据,澳大利亚的一些农场…

-- https://dev.socrata.com/

Declare @script nvarchar(max) 

 set @script = N'
    library(httr)
    url <- "https://data.ct.gov/resource/y6p2-px98.json?category=Fruit&item=Peaches"
    resp <- GET(url)
    library(jsonlite)
    Response <- fromJSON(content(resp, as = "text"))
    OutputDataSet <- as.data.frame(Response)

    OutputDataSet <-  OutputDataSet [,
                                 c("category" ,
                                   "item" , 
                                   "farmer_id"  , 
                                   "zipcode" ,  
                                   "business" , 
                                   "l" ,     
                                   "location_1_location",
                                   "location_1_city"  ,
                                   "location_1_state" ,
                                   "farm_name",        
                                   "phone1" ,            
                                   "website",    
                                   "suite")]
'

EXEC sp_execute_external_script 
@language = N'R' 
, @script = @script
 with result SETS UNDEFINED
总共: (In total:)

● connection passwords are not stored anywhere ● rights are distributed centrally through active directory accounts ● no additional configuration files ● no Python fiddles containing passwords to the database ● all code is stored in the procedures and saved when the database is backed up

●连接密码不会存储在任何地方●权限是通过活动目录帐户集中分配的●没有其他配置文件●没有包含数据库密码的Python小提琴●所有代码都存储在过程中,并在备份数据库时保存

MS SQL 2017 database backup with full code is available here (for playback, you need to install packages, distribute the rights, specify the name of your server)

此处提供具有完整代码的MS SQL 2017数据库备份(要播放,您需要安装软件包,分发权限,指定服务器的名称)

翻译自: https://habr.com/en/post/466589/

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值