Implementing Decision Trees in SQL
Decision trees, one of the most popular data mining algorithms, are the next topic in our Data Mining series. In an earlier article, Introduction to SQL Server Data Mining, we discussed what data mining is and how to set up the data mining environment in SQL Server. The article after that covered the Microsoft Naïve Bayes algorithm. This article discusses Microsoft Decision Trees with examples. The Microsoft Decision Trees algorithm is a classification and regression algorithm that works well for predictive modeling, and it supports the prediction of both discrete and continuous attributes.
What Are Decision Trees
Decision trees are one of the most common data mining algorithms. When you make a decision, you tend to break the problem into smaller questions. Suppose you want to travel from one place to another. To decide what time to leave, you weigh several parameters: the day (weekend or weekday), the mode of transport, the time of travel, any special events, and the weather. Each combination of these parameters can lead to a different answer. For example, the travel time on a rainy weekday at peak time differs from the travel time on a dry weekend. All these combinations can be visualized as a tree.
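To make the idea concrete, here is a minimal Python sketch of the travel-time decision written as a tree. The attribute names (day_type, peak_time, raining) and the minute values are hypothetical and chosen only for illustration; they do not come from any real data set.

```python
# A hand-written decision tree for the travel-time example.
# All attributes and minute values are made up for illustration.
travel_tree = {
    "attribute": "day_type",
    "branches": {
        "weekend": 30,
        "weekday": {
            "attribute": "peak_time",
            "branches": {
                False: 40,
                True: {
                    "attribute": "raining",
                    "branches": {False: 60, True: 75},
                },
            },
        },
    },
}


def estimate_minutes(tree, trip):
    """Walk from the root to a leaf, following the branch that matches the trip."""
    node = tree
    while isinstance(node, dict):
        value = trip[node["attribute"]]
        node = node["branches"][value]
    return node  # a leaf holds the estimated travel time in minutes


rainy_peak_weekday = {"day_type": "weekday", "peak_time": True, "raining": True}
print(estimate_minutes(travel_tree, rainy_peak_weekday))
```

Each inner node asks one question and each leaf holds an answer, which is why a tree like this is easy to read from top to bottom.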
The following is an example of a decision tree that chooses a mode of transport based on other conditions.
Source: https://www.displayr.com/what-is-a-decision-tree/
As the figure above shows, decision trees are easy to understand, which is the main reason they are popular with users.
Microsoft Decision Trees in SSAS
In SQL Server, a model can be built from a data set with the Decision Trees algorithm, and predictions can then be made from the resulting tree.
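As a rough illustration of what building a tree from a data set involves, the Python sketch below scores candidate attributes by entropy-based information gain and picks the best one for the first split. This is a generic textbook technique shown under stated assumptions, not a description of how SSAS scores splits internally. The rows and the column names commute, owns_car, and bike_buyer are hypothetical and are not taken from vTargetMail.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy data: commute separates buyers perfectly, owns_car does not.
rows = [
    {"commute": "short", "owns_car": "no", "bike_buyer": 1},
    {"commute": "short", "owns_car": "yes", "bike_buyer": 1},
    {"commute": "long", "owns_car": "no", "bike_buyer": 0},
    {"commute": "long", "owns_car": "yes", "bike_buyer": 0},
]

best = max(["commute", "owns_car"], key=lambda a: information_gain(rows, a, "bike_buyer"))
print(best)
```

A tree builder would repeat this choice on each resulting subset until the leaves are pure enough, and a prediction then follows the matching branches from the root to a leaf.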
We will use the same data set, the vTargetMail view in the AdventureWorksDW database. As discussed in the previous article, create an SSAS project in Visual Studio. Then create a data source that points to the AdventureWorksDW database and a data source view in which vTargetMail is selected.
Next, create a mining structure and select Microsoft Decision Trees, as shown in the figure below.
In the wizard, vTargetMail is selected as the case table. The next step is to select the input and predicted columns. Since we want to predict the bike buyer, the BikeBuyer attribute is the predicted attribute.