Decision Tree Code Implementation

EmmaYuer

Column: Decision Tree; tags: machine learning

Article link: https://blog.csdn.net/xiaoyuer3677/article/details/41981297

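I'm about to start graduate school, and my future research direction is machine learning, so it's time to step up my studying!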

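I started with the most basic model, the decision tree. There are already plenty of tools that implement decision trees, scikit-learn for example, but I wanted to write the code myself first, since that is how I really understand the material.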

First, some background:

Information theory basics: http://blog.csdn.net/dark_scope/article/details/8459576
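For reference: the linked background covers entropy, while the code in this post measures node impurity with the Gini index instead. For a node whose sample set D contains K classes in proportions p_k, the two measures, plus the weighted Gini of a binary split of D into D_1 and D_2, are:

```latex
\begin{aligned}
H(D) &= -\sum_{k=1}^{K} p_k \log_2 p_k \\
\mathrm{Gini}(D) &= 1 - \sum_{k=1}^{K} p_k^2 \\
\mathrm{Gini}(D, A, a) &= \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)
\end{aligned}
```

Here D_1 holds the samples whose feature A equals a (discrete feature) or is at least a (continuous feature), and D_2 holds the rest. `pGini` computes Gini(D), and `makeTree` keeps the split with the smallest Gini(D, A, a).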

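Decision tree algorithm: http://blog.csdn.net/dark_scope/article/details/13168827 My code is essentially the same as the code at this link, so many thanks to its author. His code had a few small bugs, though, which I fixed.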
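As a quick check of the split criterion, here is a minimal standalone sketch of the Gini computation that `pGini` and `Gini` perform, applied to two columns of the training data in `test()`. The names `gini` and `split_gini` are hypothetical helpers, not taken from the linked article. The split rule (equal goes left for a discrete feature, `>=` goes left for a continuous one) follows the code below.

```python
import numpy as np


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a label array (0.0 for an empty array)."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)  # samples per class
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))


def split_gini(column, labels, value, continuous=False):
    """Weighted Gini after splitting one feature column at `value`.

    Discrete: samples equal to `value` go left; continuous: samples >= `value` go left.
    """
    column, labels = np.asarray(column), np.asarray(labels)
    left = column >= value if continuous else column == value
    n = labels.size
    return float(left.sum() / n * gini(labels[left])
                 + (~left).sum() / n * gini(labels[~left]))


# Labels and two feature columns of the ten samples in test()
y = np.array([1, 0, 1, 1, 0, 0, 0, 1, 1, 1])
col2 = np.array([0, 0, 0, 1, 0, 0, 0, 1, 1, 1])                  # third feature, discrete
col4 = np.array([8, 3.5, 3.5, 3.5, 3.5, 3.5, 3.5, 2, 3.5, 3.5])  # fifth feature, continuous

print(f"{gini(y):.4f}")                                    # 0.4800  (1 - 0.6**2 - 0.4**2)
print(f"{split_gini(col2, y, 0):.4f}")                     # 0.2667  (0.6 * 4/9 + 0.4 * 0)
print(f"{split_gini(col4, y, 3.5, continuous=True):.4f}")  # 0.4444  (0.9 * 40/81 + 0.1 * 0)
```

Splitting on the third feature lowers the impurity from 0.48 to about 0.267, because every sample whose third feature is 1 belongs to class 1, so that branch is already pure.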

Without further ado, here is the code (limiting the depth of the tree is not implemented yet; I'll update this post when it is):

# -*- coding: utf-8 -*-
import numpy as np


def pGini(y):
    # Gini impurity of a set of labels: 1 - sum_k p_k^2
    y = np.ravel(y)
    if y.size == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)  # number of samples in each class
    p = counts / y.size
    return 1.0 - np.sum(p ** 2)


class DTC:  # decision tree classifier
    def __init__(self, x, y, property=None):
        # x: training samples, an M*N array (M samples, N features)
        # y: the class of each sample
        # property: a binary vector of length N; entry i says whether feature i
        #           is discrete (0) or continuous (1)
        x = np.asarray(x)
        y = np.ravel(y)
        if x.shape[0] != y.shape[0]:
            raise ValueError('x and y have different lengths!')
        self.x = x
        # labels: all distinct classes; self.y: position of each original y value in labels
        # see: http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html
        self.labels, self.y = np.unique(y, return_inverse=True)
        self.DT = list()
        if property is None:
            self.property = np.zeros(x.shape[1])  # by default every feature is discrete
        else:
            self.property = np.ravel(property)
        # all values each feature takes in the training set (candidate split points)
        self.feature_dict = {i: np.unique(x[:, i]) for i in range(x.shape[1])}

    def _left(self, k, values, k_v):
        # True where a value of feature k goes to the left branch of the split at k_v
        if self.property[k] == 0:
            return values == k_v  # discrete: equal goes left, not equal goes right
        return values >= k_v      # continuous: >= goes left, < goes right

    def Gini(self, x, y, k, k_v):
        # weighted Gini impurity after splitting on feature k at value k_v
        mask = self._left(k, x[:, k], k_v)
        D = y.shape[0]
        c1 = np.count_nonzero(mask)
        c2 = D - c1
        return c1 * pGini(y[mask]) / D + c2 * pGini(y[~mask]) / D

    def makeTree(self, x, y):
        if np.unique(y).size <= 1:  # only one class left: return that class
            return int(y[0])        # leaves store the class index
        best, f_index, f_value = np.inf, None, None
        for i in range(x.shape[1]):
            for j in self.feature_dict[i]:
                mask = self._left(i, x[:, i], j)
                if mask.all() or not mask.any():
                    continue  # this split leaves one side empty, so it separates nothing
                p = self.Gini(x, y, i, j)
                if p < best:
                    best, f_index, f_value = p, i, j.item()
        if f_index is None:
            # no feature can separate these samples (same features, different
            # classes): fall back to the majority class
            return int(np.bincount(y).argmax())
        mask = self._left(f_index, x[:, f_index], f_value)
        left = self.makeTree(x[mask], y[mask])
        right = self.makeTree(x[~mask], y[~mask])
        return [(f_index, f_value), left, right]

    def train(self):
        self.DT = self.makeTree(self.x, self.y)
        print(self.DT)

    def pred(self, x):
        x = np.asarray(x)
        result = np.empty(x.shape[0], dtype=self.labels.dtype)
        for i in range(x.shape[0]):
            tp = self.DT
            while isinstance(tp, list):  # not yet at a leaf node
                a, b = tp[0]  # split feature and split value
                tp = tp[1] if self._left(a, x[i, a], b) else tp[2]
            result[i] = self.labels[tp]  # map the class index back to its label
        return result


def test():
    x = np.array([[0, 0, 0, 0, 8], [0, 0, 0, 1, 3.5], [0, 1, 0, 1, 3.5],
                  [0, 1, 1, 0, 3.5], [0, 0, 0, 0, 3.5], [1, 0, 0, 0, 3.5],
                  [1, 0, 0, 1, 3.5], [1, 1, 1, 1, 2], [1, 0, 1, 2, 3.5],
                  [1, 0, 1, 2, 3.5]])
    y = np.array([[1], [0], [1], [1], [0], [0], [0], [1], [1], [1]])
    prop = np.zeros((5, 1))
    prop[4] = 1  # only the 5th feature is continuous
    a = DTC(x, y, prop)
    a.train()
    print(a.pred([[0, 0, 0, 0, 3.0], [2, 1, 0, 1, 2]]))


if __name__ == '__main__':
    test()
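Depth control is not implemented above yet. One common way to add it, sketched here as a standalone version rather than a change to the `DTC` class, is to carry the current depth through the recursion and turn a node into a leaf holding its majority class once a `max_depth` limit is reached. The names `build_tree`, `predict_one`, `depth_of`, and the `max_depth` parameter are hypothetical, and using the majority class at depth-limited leaves is an assumption. The tree keeps the same nested-list layout `[(feature, value), left, right]` that `makeTree` produces.

```python
import numpy as np


def gini(y):
    """Gini impurity of a 1-D array of class indices 0..K-1."""
    if y.size == 0:
        return 0.0
    p = np.bincount(y) / y.size
    return 1.0 - np.sum(p ** 2)


def goes_left(values, value, continuous):
    # Same rule as DTC: discrete features send equal values left,
    # continuous features send values >= the split value left
    return values >= value if continuous else values == value


def build_tree(x, y, continuous, depth=0, max_depth=None):
    """Grow a Gini tree; returns a class index (leaf) or [(feature, value), left, right]."""
    majority = int(np.bincount(y).argmax())
    if np.unique(y).size <= 1 or (max_depth is not None and depth >= max_depth):
        return majority  # pure node, or depth limit reached: stop and make a leaf
    best, best_split = np.inf, None
    n = y.size
    for k in range(x.shape[1]):
        for v in np.unique(x[:, k]):
            left = goes_left(x[:, k], v, continuous[k])
            if left.all() or not left.any():
                continue  # skip splits that leave one side empty
            score = left.sum() / n * gini(y[left]) + (~left).sum() / n * gini(y[~left])
            if score < best:
                best, best_split = score, (k, v.item())
    if best_split is None:
        return majority  # identical feature rows with mixed classes
    k, v = best_split
    left = goes_left(x[:, k], v, continuous[k])
    return [best_split,
            build_tree(x[left], y[left], continuous, depth + 1, max_depth),
            build_tree(x[~left], y[~left], continuous, depth + 1, max_depth)]


def predict_one(tree, row, continuous):
    # Walk down the nested lists until reaching a leaf (an int class index)
    while isinstance(tree, list):
        (k, v), left, right = tree
        tree = left if goes_left(row[k], v, continuous[k]) else right
    return tree


def depth_of(tree):
    return 1 + max(depth_of(tree[1]), depth_of(tree[2])) if isinstance(tree, list) else 0


# Training data from test(); the labels are already class indices 0/1
X = np.array([[0, 0, 0, 0, 8], [0, 0, 0, 1, 3.5], [0, 1, 0, 1, 3.5],
              [0, 1, 1, 0, 3.5], [0, 0, 0, 0, 3.5], [1, 0, 0, 0, 3.5],
              [1, 0, 0, 1, 3.5], [1, 1, 1, 1, 2], [1, 0, 1, 2, 3.5],
              [1, 0, 1, 2, 3.5]])
Y = np.array([1, 0, 1, 1, 0, 0, 0, 1, 1, 1])
CONTINUOUS = [False, False, False, False, True]

stump = build_tree(X, Y, CONTINUOUS, max_depth=1)
print(stump)  # one split whose two children are leaves
```

With `max_depth=None` the tree grows until every leaf is pure, like `makeTree`; a small `max_depth` gives a shallower tree that may no longer fit every training sample.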
