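The ID3 decision-tree algorithm was a homework assignment from when I studied machine learning. I went over it again today and decided to post it on my blog to share.
First, here is the ID3 algorithm itself: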
ID3(Examples, Target_attribute, Attributes)
Examples are the training examples. Target_attribute is the attribute whose
value is to be predicted by the tree. Attributes is a list of other attributes that
may be tested by the learned decision tree. Returns a decision tree that correctly
classifies the given Examples.
Create a Root node for the tree
If all Examples are positive, Return the single-node tree Root, with label = +
If all Examples are negative, Return the single-node tree Root, with label = -
If Attributes is empty, Return the single-node tree Root, with label = most common value of Target_attribute in Examples
Otherwise Begin
    A ← the attribute from Attributes that best classifies Examples
    The decision attribute for Root ← A
    For each possible value, vi, of A
        Add a new tree branch below Root, corresponding to the test A = vi
        Let Examples_vi be the subset of Examples that have value vi for A
        If Examples_vi is empty
            Then below this new branch add a leaf node with label = most common value of Target_attribute in Examples
            Else below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A})
End
Return Root
Note: The best attribute is the one with the highest information gain. (How to compute information gain: look it up in a textbook.)
The assignment's data set is shown in this figure:
Below is the source code, written in C++:
#include <iostream>
#include <fstream>
#include <math.h>
#include <string>
using namespace std;
#define ROW 14
#define COL 5
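#define log2 0.69314718055 // ln(2); note that this macro shadows the standard log2() function
typedef struct TNode // decision-tree node, stored in first-child / next-sibling form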
{
char data[15];
char weight[15];
TNode * firstchild,*nextsibling;
}*tree;
typedef struct LNode // one training example, stored as a linked-list node
{
char OutLook[15];
char Temperature[15];
char Humidity[15];
char Wind[15];
char PlayTennis[5];
LNode *next;
}*link;
typedef struct AttrNode
{
char attributes[15];// attribute
int attr_Num;// attribute count
AttrNode *next;
}*Attributes;
char * Examples[ROW][COL] = {//"OverCast","Cool","High","Strong","No",
// "Rain","Hot","Normal","Strong","Yes",
"Sunny","Hot","High","Weak","No",
"Sunny","Hot","High","Strong","No",
"OverCast","Hot","High","Weak","Yes",
"Rain","Mild","High","Weak","Yes",
"Rain","Cool","Normal","Weak","Yes",
"Rain","Cool","Normal","Strong","No",
"OverCast","Cool","Normal","Strong","Yes",
"Sunny","Mild","High","Weak","No",
"Sunny","Cool","Normal","Weak","Yes",
"Rain","Mild","Normal","Weak","Yes",
"Sunny","Mild","Normal","Strong","Yes",
"OverCast","Mild","Normal","Strong","Yes",
"OverCast","Hot","Normal","Weak","Yes",
"Rain","Mild","High","Strong","No"
};
char * Attributes_kind[4] = {"OutLook","Temperature","Humidity","Wind"};
int Attr_kind[4] = {3,3,2,2}; // number of possible values of each attribute
char * OutLook_kind[3] = {"Sunny","OverCast","Rain"};
char * Temperature_kind[3] = {"Hot","