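Select two UCI datasets and compare the error rates of logistic regression as estimated by 10-fold cross-validation and by leave-one-out.
Solution:
The Iris dataset was selected from the UCI repository. It contains 3 classes with 50 samples each, and each instance has four attributes. The data is stored in the file bezdekIris.txt; a single sample looks like this: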
5.1,3.5,1.4,0.2,Iris-setosa
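To make the difference between the two evaluation schemes concrete, the following is a minimal sketch of how they partition sample indices. It is not the author's code: the helper name `k_fold_indices` and the `seed` parameter are hypothetical, and it assumes 100 samples (two Iris classes of 50 each). Leave-one-out is treated here as k-fold with k equal to the number of samples.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)   # shuffle sample indices once
    folds = np.array_split(indices, k)     # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        # every fold except the i-th is used for training
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

n = 100  # assumed: two Iris classes with 50 samples each
ten_fold = list(k_fold_indices(n, 10))
leave_one_out = list(k_fold_indices(n, n))  # leave-one-out: k = n

print(len(ten_fold), len(ten_fold[0][1]))            # 10 folds, 10 test samples each
print(len(leave_one_out), len(leave_one_out[0][1]))  # 100 folds, 1 test sample each
```

In both schemes every sample is used for testing exactly once; leave-one-out simply trains many more models, each on all but one sample.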
Below, 10-fold cross-validation and leave-one-out are applied in turn, using the class pairs (X1, X2), (X1, X3), and (X2, X3) as the training data:
"""
Author: Victoria
Created on: 2017.9.15 11:00
"""
import numpy as np
import matplotlib.pyplot as plt
def readData():
    """
    Read data from txt file.
    Return:
        X1, y1, X2, y2, X3, y3: X is list with shape [50, 4],
        y is list with shape [50,]
    """
    X1 = []
    y1 = []
    X2 = []
    y2 = []
    X3 = []
    y3 = []
    # read data from txt file
    with open("bezdekIris.txt", "r") as f:
        for line in f:
            x = []
            iris = line.strip().split(",")
            # the first four fields are the numeric attributes
            for attr in iris[0:4]:
                x.append(float(attr))
            # the fifth field is the class label
            if iris[4] == "Iris-setosa":
                X1.append(x)
                y1.append(1)
            elif iris[4] == "Iris-versicolor":
                X2.append(x)
                y2.append(2)
            else: