local variable 'classLabel' referenced before assignment: fixing an error in a DecisionTree for Python machine learning

wait021

Last modified 2024-09-07 10:41:29


Tags: python

First published 2023-02-10 17:31:23

Article link: https://blog.csdn.net/wait021/article/details/128974027


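When classifying with a decision tree, the error "local variable 'classLabel' referenced before assignment" came up because the feature order of the test data did not match the order of the tree's best-feature labels. The fixes are: assign classLabel a default value of None at the start of the function, build the test data by hand in the right order, or pull the test data from the data set according to the best-feature label order.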



import operator

import numpy as np


def readDataSet(filePath):
    """
    Read the data set.
    Parameters:
        filePath - path to the data set file
    Returns:
        dataSet - the data set (one list of strings per sample)
        labels - feature labels
    """
    # Initialize the data set
    dataSet = []
    # Read the data
    with open(filePath, 'r') as f:
        # Iterate over every line
        for line in f:
            # Strip surrounding whitespace and split the line into a list
            line = line.strip().split()
            # Append the row to the data set
            dataSet.append(line)

    # The class "no lenses" is split into two tokens above; join them with an
    # underscore so that the class label is a single string again.
    for i in range(len(dataSet)):
        if dataSet[i][-2] == 'no':
            dataSet[i] = dataSet[i][:-2] + [dataSet[i][-2] + '_' + dataSet[i][-1]]

    # Feature labels of the data set
    labels = ['age', 'prescript', 'astigmatic', 'tearRate']
    return dataSet, labels


def majorityCnt(classList):
    """
    Return the element (class label) that occurs most often in classList.
    Parameters:
        classList - list of class labels
    Returns:
        sortedClassCount[0][0] - the most frequent class label
    """
    classCount = {}
    # Count the occurrences of each element in classList
    for vote in classList:
        if vote not in classCount:
            classCount[vote] = 0
        classCount[vote] += 1
    # Sort by count in descending order
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

def createTree(dataSet, labels, featLabels):
    """
    Build the decision tree.
    Parameters:
        dataSet - training data set
        labels - feature labels (modified in place: the chosen feature is deleted)
        featLabels - list that collects the selected best-feature labels
    Returns:
        myTree - the decision tree
    """
    classList = [example[-1] for example in dataSet]
    # Stop when every sample has the same class
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    # Stop when no features are left and return the majority class
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)
    # chooseBestFeatureToSplit and splitDataSet are not listed in this post
    bestFeat = chooseBestFeatureToSplit(dataSet)
    bestFeatLabel = labels[bestFeat]
    featLabels.append(bestFeatLabel)  # store the selected best-feature label
    myTree = {bestFeatLabel: {}}
    del labels[bestFeat]
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)
    for value in uniqueVals:
        subLabels = labels[:]
        myTree[bestFeatLabel][value] = createTree(
            splitDataSet(dataSet, bestFeat, value), subLabels, featLabels)
    return myTree


def classify(inputTree, featLabels, testVec):
    """
    Classify a sample with the decision tree.
    Parameters:
        inputTree - the decision tree that has already been built
        featLabels - the stored best-feature labels
        testVec - test data list, ordered to match the best-feature labels
    Returns:
        classLabel - the classification result (None if no branch matches)
    """
    classLabel = None  # default, so the return below never hits an unbound name
    firstStr = next(iter(inputTree))
    secondDict = inputTree[firstStr]
    featIndex = featLabels.index(firstStr)
    for key in secondDict.keys():
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):
                classLabel = classify(secondDict[key], featLabels, testVec)
            else:
                classLabel = secondDict[key]
    return classLabel

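# When classifying with the decision tree, I initially hit "local variable 'classLabel' referenced before assignment", an error many people run into. Without a default value, classLabel is never assigned when no branch of the tree matches the test value, so the final return fails.

# The data passed as the third argument of classify must be ordered to match the best-feature labels, for example ['tearRate', 'astigmatic', 'prescript', 'age']; only then does the error not occur.

# It must not simply follow the order in which the features appear in the data set.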

An example

Suppose one sample is taken directly from the data set and used as testVec (the test data list) in classify(inputTree, featLabels, testVec):


testVec1_Glass = dataSet_Glass[int(np.random.choice(len(dataSet_Glass)))]

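The feature order of the data set may then differ from the order expected for the test data list (the order of the best-feature labels).

For instance

In the contact-lens prediction task, the columns of the data are, in order, age, prescript, astigmatic, tearRate and class, of which the first four are sample features. When the decision tree makes its decisions, however, ID3 orders the feature splits by information gain, and the resulting best-feature label sequence is ['tearRate', 'astigmatic', 'prescript', 'age']. The two orders therefore disagree.

The docstring of classify(inputTree, featLabels, testVec) says the same thing: testVec is the test data list, ordered to match the best-feature labels.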
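The failure can be reproduced in isolation. The sketch below assumes a hand-written toy tree (`toy_tree` and `toy_feat_labels` are hypothetical, not the tree built from the lenses data) and a copy of the traversal without any default assignment, named `classify_no_default` here purely for illustration:

```python
def classify_no_default(input_tree, feat_labels, test_vec):
    """Same traversal as classify, but with no default value for class_label."""
    first_str = next(iter(input_tree))
    second_dict = input_tree[first_str]
    feat_index = feat_labels.index(first_str)
    for key in second_dict:
        if test_vec[feat_index] == key:
            if isinstance(second_dict[key], dict):
                class_label = classify_no_default(second_dict[key], feat_labels, test_vec)
            else:
                class_label = second_dict[key]
    return class_label  # unbound if no key matched in the loop above


# Hypothetical two-level tree, written by hand for illustration only.
toy_tree = {'tearRate': {'reduced': 'no_lenses',
                         'normal': {'astigmatic': {'yes': 'hard', 'no': 'soft'}}}}
toy_feat_labels = ['tearRate', 'astigmatic']

ordered_vec = ['normal', 'yes']     # follows toy_feat_labels
misordered_vec = ['yes', 'normal']  # follows a different column order

print(classify_no_default(toy_tree, toy_feat_labels, ordered_vec))
try:
    classify_no_default(toy_tree, toy_feat_labels, misordered_vec)
except UnboundLocalError as exc:
    print("mismatched order:", exc)
```

With the misordered vector, 'yes' is compared against the tearRate branches 'reduced' and 'normal', nothing matches, and the return statement reads a name that was never bound. The exact wording of the message varies between Python versions, but the exception type is UnboundLocalError in each case.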

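Solutions:

Option 1: assign classLabel = None at the very start of the function. Whenever classify then returns None, you know that the order of the test sample does not match the decision order of the tree.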
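One way to act on that None is to turn it into an explicit error at the call site. This is only a sketch: `classify_checked` is a hypothetical wrapper that is not part of the original code, and the toy tree is again hand-written rather than built from the lenses data.

```python
def classify(input_tree, feat_labels, test_vec):
    """Tree traversal with a None default, as in Option 1."""
    class_label = None
    first_str = next(iter(input_tree))
    second_dict = input_tree[first_str]
    feat_index = feat_labels.index(first_str)
    for key in second_dict:
        if test_vec[feat_index] == key:
            if isinstance(second_dict[key], dict):
                class_label = classify(second_dict[key], feat_labels, test_vec)
            else:
                class_label = second_dict[key]
    return class_label


def classify_checked(input_tree, feat_labels, test_vec):
    """Hypothetical wrapper: raise a descriptive error instead of returning None."""
    result = classify(input_tree, feat_labels, test_vec)
    if result is None:
        raise ValueError(
            f"no branch matched {test_vec!r}; check that the values follow "
            f"the order {feat_labels!r} and were seen during training")
    return result


# Hypothetical two-level tree, written by hand for illustration only.
toy_tree = {'tearRate': {'reduced': 'no_lenses',
                         'normal': {'astigmatic': {'yes': 'hard', 'no': 'soft'}}}}
toy_feat_labels = ['tearRate', 'astigmatic']

print(classify_checked(toy_tree, toy_feat_labels, ['normal', 'no']))
try:
    classify_checked(toy_tree, toy_feat_labels, ['no', 'normal'])
except ValueError as exc:
    print(exc)
```

Note that None is also returned when a feature value never occurred in the training data, which is why the message in this sketch mentions both possibilities.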

Option 2 (simple): write the test data by hand, in the order of the best-feature labels


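# Order: tearRate, astigmatic, prescript, age, followed by the class label
testVec1_Glass = ['normal', 'yes', 'hyper', 'young', 'hard']
# Drop the trailing class label before classifying
classLabel1_Glass = classify(myTree_Glass, featLabels_Glass, testVec1_Glass[:-1])
print("testVec1_Glass => classLabel1_Glass:", classLabel1_Glass)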

Option 3: take the test data from the data set, arranged in the order of the best-feature labels


# Pick a random row; its columns are age, prescript, astigmatic, tearRate, class
Rand_Glass = int(np.random.choice(len(dataSet_Glass)))
# Reversing the row gives class, tearRate, astigmatic, prescript, age
testVec2_Glass = dataSet_Glass[Rand_Glass][::-1]
# Drop the leading class label before classifying
classLabel2_Glass = classify(myTree_Glass, featLabels_Glass, testVec2_Glass[1:])
print("testVec2_Glass => classLabel2_Glass:", classLabel2_Glass)
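Reversing the row works here only because the best-feature order happens to be the exact reverse of the column order. A more general variant looks each feature up by name. The helper below, `reorder_by_feat_labels`, is a hypothetical name introduced for this sketch, and the label lists are typed in by hand to match the ones quoted in this post:

```python
def reorder_by_feat_labels(sample, data_labels, feat_labels):
    """Return the feature values of `sample` arranged in `feat_labels` order.

    sample      - one row of the data set (features, optionally followed by the class)
    data_labels - feature names in data set column order
    feat_labels - best-feature labels collected while building the tree
    """
    return [sample[data_labels.index(name)] for name in feat_labels]


# Column order of the data set, kept as a separate copy (see the note below).
data_labels = ['age', 'prescript', 'astigmatic', 'tearRate']
# Best-feature order quoted in this post.
feat_labels = ['tearRate', 'astigmatic', 'prescript', 'age']
# One row in data set order, with the class label last.
sample = ['young', 'hyper', 'yes', 'normal', 'hard']

test_vec = reorder_by_feat_labels(sample, data_labels, feat_labels)
print(test_vec)  # ['normal', 'yes', 'hyper', 'young']
```

Because createTree deletes entries from the labels list it is given, `data_labels` should be a copy taken before the tree is built, for example `labels[:]`.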
