Getting Started with Data Mining in SQL Server

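This article describes how to use SQL Server for data mining, including creating a data source, a data source view and a mining structure. It walks through the mining process with examples such as creating and validating a decision tree model. It stresses that data mining is an iterative process in which several parameters and models must be combined and tuned to improve predictive accuracy.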

Introduction

In past chats, we have looked at a myriad of Business Intelligence techniques that one can utilize to turn data into information. In today’s get-together we are going to look at a technique that is dear to my heart and often overlooked: data mining with SQL Server, from soup to nuts.

Microsoft has come up with a fantastic set of data mining tools which are often underutilized by Business Intelligence folks, not because they are of poor quality but rather because not many folks know of their existence, or because people have never had the opportunity to utilize them.

Rest assured that you are NOW going to get a bird’s eye view of the power of the mining algorithms in our ‘fire-side’ chat today.

As I wish to describe the “getting started” process in detail, this article has been split into two parts. The first describes exactly this (getting started), whilst the second part will discuss turning the data into real information.

So ‘grab a pick and shovel’ and let us get to it!

Getting started

For today’s exercise, we start by having a quick look at our source data. It is a simple relational table within the SQLShackFinancial database that we have utilized in past exercises.

As a disclosure, I have changed the names and addresses of the true customers in the “production data” that we shall be utilizing. The names and addresses that we shall utilize come from the Microsoft Contoso database. Further, I have split the client data into two distinct tables: one containing customer numbers under 25000 and the other containing customer numbers greater than 25000. The reason for doing so will become clear as we progress.
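To make the split concrete, here is a minimal Python sketch of the same partitioning rule. The field name `customer_number`, the helper `split_customers` and the sample rows are all hypothetical; the article performs the split in the SQLShackFinancial database itself, and its actual column names are not shown.

```python
from typing import Dict, List, Tuple

SPLIT_POINT = 25000  # threshold quoted in the text


def split_customers(rows: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition rows into (numbers under the split point, numbers above it)."""
    low = [r for r in rows if r["customer_number"] < SPLIT_POINT]
    high = [r for r in rows if r["customer_number"] > SPLIT_POINT]
    return low, high


# Hypothetical sample rows, invented purely for illustration.
sample = [
    {"customer_number": 11000, "name": "A"},
    {"customer_number": 24999, "name": "B"},
    {"customer_number": 25001, "name": "C"},
    {"customer_number": 31000, "name": "D"},
]

low, high = split_customers(sample)
print(len(low), len(high))
```

Read literally, the rule leaves a customer numbered exactly 25000 in neither table, so the sketch does the same rather than guessing which side it belongs to.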

Having a quick look at the customer table (containing customer numbers less than 25000), we find the following data.

The screenshot above shows the residential addresses of people who have applied for financial loans from SQLShack Finance.

Moreover, the data shows criteria such as the number of cars that the applicant owns, his or her marital status and whether or not he or she owns a house. NOTE that I have not mentioned the person’s income or net worth. This will come into play going forward.

Creating our mining project

Now that we have had a quick look at our raw data, we open SQL Server Data Tools (henceforward referred to as SSDT) to begin our adventure into the “wonderful world of data mining”.

Opening SSDT, we select “New” from the “File” tab on the activity ribbon and select “Project” (see above).

We select the “Analysis Services Multidimensional and Data Mining” option. We give our new project a name and click OK to continue.

Having clicked “OK”, we find ourselves on our working surface.

Our first task is to establish a connection to our relational data. We do this by creating a new “Data Source” (see below).

We right-click on the “Data Sources” folder (see above and to the right) and select the “New Data Source” option.

The “New Data Source” Wizard is brought up. We click “Next”.

We now find ourselves looking at connections that we have used in the past, and SSDT wishes to know which (if any) of these connections we wish to utilize. We choose our “SQLShackFinancial” connection.

We select “Next”.

We are asked for our credentials (see above) and click “Next”.

We are now asked to give a name to our connection (see above).

We click “Finish”.
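Behind the wizard, a data source boils down to a connection string that points at the relational database. The article does not show the string that SSDT stores, so the following is only a sketch of what one might look like, written in Python for illustration. The server name `localhost`, the provider `MSOLEDBSQL`, the use of Windows authentication and the helper `build_connection_string` are all assumptions; only the database name SQLShackFinancial comes from the text.

```python
def build_connection_string(server: str, database: str,
                            provider: str = "MSOLEDBSQL") -> str:
    """Assemble an OLE DB style connection string from its parts."""
    parts = {
        "Provider": provider,            # assumed OLE DB provider
        "Data Source": server,           # SQL Server instance name
        "Initial Catalog": database,     # database to connect to
        "Integrated Security": "SSPI",   # assumed Windows authentication
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# "localhost" is a placeholder; substitute your own instance name.
conn = build_connection_string("localhost", "SQLShackFinancial")
print(conn)
```

The credentials screen in the wizard decides how Analysis Services itself connects when it processes the data, which is a separate setting from the authentication keyword shown here.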

Creating our Data Source View

Our next task is to create a Data Source View. This is different to what we have done in past exercises.

The data source view permits us to create relationships (from our relational data) which we wish to carry forward into the ‘analytic world’. One may think of a “Data Source View” as a staging area for our relational data prior to its importation into our cubes and mining models.

We right-click on the “Data Source Views” folder and select “New Data Source View”.

The “Data Source View” wizard is brought up (see below).

We click “Next” (see above).

We select the “Data Source” that we defined earlier (see above).

The “Name Matching” dialogue box is brought into view. As we shall be working with one table for this exercise, this screen has little impact. However, if we were creating a relationship between two or more tables, we would indicate here that we want the system to create the necessary logical relationships between those tables, to ensure that they are correctly joined.
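The idea of inferring logical relationships from column names can be sketched in a few lines of Python. This is a simplified illustration, not the wizard's actual logic, which the article does not describe: it links two tables whenever one table contains a column with the same name as the other table's primary key. The `LoanApplication` table and all column names are hypothetical.

```python
from typing import List, NamedTuple


class Table(NamedTuple):
    name: str
    primary_key: str
    columns: List[str]


class Relationship(NamedTuple):
    source_table: str   # table holding the foreign-key column
    source_column: str  # column whose name matches the target's primary key
    target_table: str   # table whose primary key is referenced


def match_by_primary_key_name(tables: List[Table]) -> List[Relationship]:
    """Link tables when a column shares its name with another table's primary key."""
    found = []
    for target in tables:
        for source in tables:
            if source.name == target.name:
                continue  # a table is never matched against itself
            if target.primary_key in source.columns:
                found.append(
                    Relationship(source.name, target.primary_key, target.name)
                )
    return found


# Hypothetical tables, invented for illustration only.
customer = Table("Customer", "CustomerKey",
                 ["CustomerKey", "Name", "MaritalStatus"])
loan = Table("LoanApplication", "LoanKey",
             ["LoanKey", "CustomerKey", "Amount"])

relationships = match_by_primary_key_name([customer, loan])
print(relationships)
```

With a single table, as in this exercise, the function returns an empty list, which mirrors why the screen has little impact here.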

In our case we merely select “Next” (see above).

We are now asked to select the table or tables that we wish to utilize.

For our current exercise, I select the “Customer” table (see above) and move the table to the “Included Objects” (see below).
