I. Understanding Logistic Regression and Where It Applies
Logistic regression is a probabilistic, nonlinear regression model: a multivariate analysis method for studying the relationship between a binary outcome and a set of explanatory factors. The typical question is whether a certain result occurs under given conditions, for example judging in medicine whether a patient has a particular disease from his or her symptoms.
II. The LR Classifier
LR classifier is short for Logistic Regression Classifier.
In the classification setting, the trained LR classifier is a set of weights $w_0, w_1, \dots, w_n$. When a test sample comes in, these weights are combined with the test data into the linear sum
$$z = w_0 + w_1 x_1 + \dots + w_n x_n,$$
where $x_1, \dots, x_n$ are the $n$ features of the sample.
The sigmoid function then turns this sum into a probability $p = \frac{1}{1+e^{-z}}$; since the sigmoid is defined on $(-\infty, +\infty)$ and takes values in $(0, 1)$, the basic LR classifier is suited to separating two classes.
The key question in logistic regression is therefore how to obtain this set of weights, and it is answered with maximum likelihood estimation.
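For concreteness, here is a minimal sketch of this scoring step in Python (the weights and feature values are made up purely for illustration):

import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical trained weights: intercept w0 plus one weight per feature
w0 = -1.5
w = np.array([0.8, -0.3, 1.2])
x = np.array([2.0, 0.5, 1.0])      # one test sample with n = 3 features

z = w0 + np.dot(w, x)              # linear sum z = w0 + w1*x1 + ... + wn*xn
p = sigmoid(z)                     # probability of the positive class
print(p, p > 0.5)                  # classify as positive when p exceeds 0.5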
III. The Logistic Regression Model
Consider a vector $x = (x_1, x_2, \dots, x_p)$ of $p$ independent variables and let the conditional probability $P(y=1 \mid x) = p$ be the probability of the event occurring given the observations $x$. The logistic regression model can then be written as
$$P(y=1 \mid x) = \pi(x) = \frac{1}{1+e^{-g(x)}},$$
where $\frac{1}{1+e^{-z}}$ is called the logistic function and $g(x) = w_0 + w_1 x_1 + \dots + w_p x_p$.
Under the same conditions $x$, the probability that the event does not occur is
$$P(y=0 \mid x) = 1-\pi(x) = \frac{1}{1+e^{g(x)}}.$$
The ratio of the probability that the event occurs to the probability that it does not occur is therefore
$$\frac{P(y=1 \mid x)}{P(y=0 \mid x)} = \frac{\pi(x)}{1-\pi(x)} = e^{g(x)},$$
and this ratio is called the odds of experiencing an event, abbreviated as odds.
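For example, if $\pi(x) = 0.8$, the odds are $0.8/0.2 = 4$: the event is four times as likely to occur as not. Taking logarithms gives the logit,
$$\ln\frac{\pi(x)}{1-\pi(x)} = g(x) = w_0 + w_1 x_1 + \dots + w_p x_p,$$
so logistic regression models the log-odds as a linear function of the inputs.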
Summary:
In general, regression is not used for classification, because regression is a continuous-valued model and is fairly sensitive to noise. If regression is to be applied to classification anyway, logistic regression is the tool to use.
Logistic regression is essentially linear regression with an extra layer of function mapping inserted between the features and the outcome: the features are first combined in a linear sum $z = \theta^T x$, and then $g(z)$ is used as the hypothesis function to make the prediction, where $g(z)$ maps the continuous value into the interval $(0, 1)$.
The hypothesis function of logistic regression is $h_\theta(x) = g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}$, whereas the hypothesis of linear regression is simply $\theta^T x$.
Logistic regression handles 0/1 classification, i.e. binary problems in which the predicted label is either 0 or 1. The binary label is assumed to follow a Bernoulli (0/1, or two-point) distribution, that is,
$$P(y=1 \mid x;\theta) = h_\theta(x), \qquad P(y=0 \mid x;\theta) = 1-h_\theta(x).$$
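Combining the two Bernoulli cases gives the likelihood that the maximum likelihood estimation mentioned above actually maximizes (the standard derivation, written out for completeness):
$$P(y \mid x;\theta) = h_\theta(x)^{y}\,\bigl(1-h_\theta(x)\bigr)^{1-y},$$
$$L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}}\,\bigl(1-h_\theta(x^{(i)})\bigr)^{1-y^{(i)}},$$
$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m}\Bigl[y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\bigl(1-h_\theta(x^{(i)})\bigr)\Bigr].$$
Maximizing $\ell(\theta)$, for example by gradient ascent or a quasi-Newton method, yields the weights $\theta$.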
IV. A Logistic Regression Application Case
(1) Notes on sklearn's LogisticRegressionCV
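As a quick reference for the parameters that appear in the code below, LogisticRegressionCV behaves like LogisticRegression but selects the regularization strength by cross-validation. A minimal usage sketch (the values simply mirror the breast cancer case that follows):

import numpy as np
from sklearn.linear_model import LogisticRegressionCV

lr = LogisticRegressionCV(
    Cs=np.logspace(-2, 2, 20),   # candidate inverse regularization strengths to search over
    cv=2,                        # number of cross-validation folds
    penalty="l2",                # regularization term: "l1" or "l2"
    solver="lbfgs",              # optimization algorithm
    tol=0.01,                    # stopping tolerance
    multi_class="ovr",           # "ovr" or "multinomial"
    fit_intercept=True)          # also learn an intercept term
# lr.fit(X_train, Y_train) then picks the best C by cross-validation before refitting.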
(2) The code is as follows:
Data file download link: https://pan.baidu.com/s/1dEWUEhb (password: bm1p)
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author:ZhengzhengLiu

# Breast cancer classification example

import sklearn
from sklearn.linear_model import LogisticRegressionCV, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model.coordinate_descent import ConvergenceWarning   # on newer sklearn: from sklearn.exceptions import ConvergenceWarning
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import warnings

# Allow Chinese characters in the matplotlib figures below
mpl.rcParams["font.sans-serif"] = [u"SimHei"]
mpl.rcParams["axes.unicode_minus"] = False

# Suppress convergence warnings
warnings.filterwarnings(action='ignore', category=ConvergenceWarning)

# Load the data and remove abnormal records
path = "datas/breast-cancer-wisconsin.data"
names = ["id", "Clump Thickness", "Uniformity of Cell Size", "Uniformity of Cell Shape",
         "Marginal Adhesion", "Single Epithelial Cell Size", "Bare Nuclei", "Bland Chromatin",
         "Normal Nucleoli", "Mitoses", "Class"]

df = pd.read_csv(path, header=None, names=names)

datas = df.replace("?", np.nan).dropna(how="any")   # drop every row that contains a NaN value
# print(datas.head())   # shows the first five rows by default

# Feature extraction and data splitting
X = datas[names[1:10]]
Y = datas[names[10]]    # Class label: 2 = benign, 4 = malignant

# Split into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=0)

# Standardize the training set
ss = StandardScaler()
X_train = ss.fit_transform(X_train)     # fit the scaler on the training data, then transform it

# Build and train the model
## multi_class: classification strategy, "ovr" (default) or "multinomial"; the two are equivalent for binary logistic regression
## cv: number of cross-validation folds
## solver: optimization algorithm; when penalty is "l1" the only choice is "liblinear" (coordinate descent)
##   "lbfgs" and "newton-cg" both rely on a second-order Taylor expansion of the objective
## when penalty is "l2" the solver can be "lbfgs" (quasi-Newton), "newton-cg" (a Newton variant) or "sag" (stochastic average gradient)
##   "lbfgs" is a good default for smaller problems; "sag" tends to be faster on very large ones
## penalty: regularization term used against overfitting, "l1" or "l2"
## tol: stopping tolerance - optimization ends once the objective improvement falls below this value
lr = LogisticRegressionCV(multi_class="ovr", fit_intercept=True, Cs=np.logspace(-2, 2, 20),
                          cv=2, penalty="l2", solver="lbfgs", tol=0.01)
re = lr.fit(X_train, Y_train)

# Model evaluation
r = re.score(X_train, Y_train)
print("R值(准确率):", r)
print("参数:", re.coef_)
print("截距:", re.intercept_)
print("稀疏化特征比率:%.2f%%" % (np.mean(lr.coef_.ravel() == 0) * 100))
print("=========sigmoid函数转化的值,即:概率p=========")
print(re.predict_proba(X_test))     # class probabilities from the sigmoid; note that X_test is only standardized further below

# Save and persist the models
from sklearn.externals import joblib    # on newer sklearn: import joblib
joblib.dump(ss, "logistic_ss.model")    # save the fitted scaler
joblib.dump(lr, "logistic_lr.model")    # save the trained model
joblib.load("logistic_ss.model")        # load a previously saved model file
joblib.load("logistic_lr.model")

# Prediction
X_test = ss.transform(X_test)       # standardize the test data with the scaler fitted on the training set
Y_predict = lr.predict(X_test)      # predict

# Plot predicted values against actual values
x = range(len(X_test))
plt.figure(figsize=(14, 7), facecolor="w")
plt.ylim(0, 6)
plt.plot(x, Y_test, "ro", markersize=8, zorder=3, label=u"真实值")
plt.plot(x, Y_predict, "go", markersize=14, zorder=2, label=u"预测值,$R^2$=%.3f" % lr.score(X_test, Y_test))
plt.legend(loc="upper left")
plt.xlabel(u"数据编号", fontsize=18)
plt.ylabel(u"乳癌类型", fontsize=18)
plt.title(u"Logistic算法对数据进行分类", fontsize=20)
plt.savefig("Logistic算法对数据进行分类.png")
plt.show()

print("=============Y_test==============")
print(Y_test.ravel())
print("============Y_predict============")
print(Y_predict)
#运行结果:
-
R值(准确率):
0.970684039088
-
参数: [[
1.3926311
0.17397478
0.65749877
0.8929026
0.36507062
1.36092964
-
0.91444624
0.63198866
0.75459326]]
-
截距: [
-1.02717163]
-
稀疏化特征比率:
0.00%
-
=========sigmoid函数转化的值,即:概率p=========
-
[[
6.61838068e-06
9.99993382e-01]
-
[
3.78575185e-05
9.99962142e-01]
-
[
2.44249065e-15
1.00000000e+00]
-
[
0.00000000e+00
1.00000000e+00]
-
[
1.52850624e-03
9.98471494e-01]
-
[
6.67061684e-05
9.99933294e-01]
-
[
6.75536843e-07
9.99999324e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
2.43117004e-05
9.99975688e-01]
-
[
6.13092842e-04
9.99386907e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
2.00330728e-06
9.99997997e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
3.78575185e-05
9.99962142e-01]
-
[
4.65824155e-08
9.99999953e-01]
-
[
5.47788703e-10
9.99999999e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
0.00000000e+00
1.00000000e+00]
-
[
0.00000000e+00
1.00000000e+00]
-
[
6.27260778e-07
9.99999373e-01]
-
[
3.78575185e-05
9.99962142e-01]
-
[
3.85098865e-06
9.99996149e-01]
-
[
1.80189197e-12
1.00000000e+00]
-
[
9.44640398e-05
9.99905536e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
0.00000000e+00
1.00000000e+00]
-
[
4.11688915e-06
9.99995883e-01]
-
[
1.85886872e-05
9.99981411e-01]
-
[
5.83016713e-06
9.99994170e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
1.52850624e-03
9.98471494e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
0.00000000e+00
1.00000000e+00]
-
[
1.51713085e-05
9.99984829e-01]
-
[
2.34685008e-05
9.99976531e-01]
-
[
1.51713085e-05
9.99984829e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
0.00000000e+00
1.00000000e+00]
-
[
2.34685008e-05
9.99976531e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
9.97563915e-07
9.99999002e-01]
-
[
1.70686321e-07
9.99999829e-01]
-
[
1.38382134e-04
9.99861618e-01]
-
[
1.36080718e-04
9.99863919e-01]
-
[
1.52850624e-03
9.98471494e-01]
-
[
1.68154251e-05
9.99983185e-01]
-
[
6.66097483e-04
9.99333903e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
9.77502258e-07
9.99999022e-01]
-
[
5.83016713e-06
9.99994170e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
4.09496721e-06
9.99995905e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
1.37819117e-06
9.99998622e-01]
-
[
6.27260778e-07
9.99999373e-01]
-
[
4.52734741e-07
9.99999547e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
8.88178420e-16
1.00000000e+00]
-
[
1.06976766e-08
9.99999989e-01]
-
[
0.00000000e+00
1.00000000e+00]
-
[
2.45780192e-04
9.99754220e-01]
-
[
3.92389040e-04
9.99607611e-01]
-
[
6.10681985e-05
9.99938932e-01]
-
[
9.44640398e-05
9.99905536e-01]
-
[
1.51713085e-05
9.99984829e-01]
-
[
2.45780192e-04
9.99754220e-01]
-
[
2.45780192e-04
9.99754220e-01]
-
[
1.51713085e-05
9.99984829e-01]
-
[
0.00000000e+00
1.00000000e+00]]
-
=============Y_test==============
-
[
2
2
4
4
2
2
2
4
2
2
4
2
4
2
2
2
4
4
4
2
2
2
4
2
4
4
2
2
2
4
2
4
4
2
2
2
4
-
4
2
4
2
2
2
2
2
2
2
4
2
2
4
2
4
2
2
2
4
2
2
4
2
2
2
2
2
2
2
2
4]
-
============Y_predict============
-
[
2
2
4
4
2
2
2
4
2
2
4
2
4
2
2
2
4
4
4
2
2
2
4
2
4
4
2
2
2
4
2
4
4
2
2
2
4
-
4
2
4
2
2
2
2
2
2
2
4
2
2
4
2
4
2
2
2
4
4
2
4
2
2
2
2
2
2
2
2
4]
V. Softmax Regression for Multiclass Problems
(1) Definition and overview of softmax regression
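In brief (a standard summary, stated in the notation of Section III): softmax, or multinomial logistic, regression generalizes the two-class sigmoid to $K$ classes by giving each class $k$ its own weight vector $\theta_k$ and normalizing over all classes:
$$P(y = k \mid x;\theta) = \frac{e^{\theta_k^{T} x}}{\sum_{j=1}^{K} e^{\theta_j^{T} x}}, \qquad k = 1, \dots, K.$$
The predicted class is the one with the largest probability, and for $K = 2$ the model reduces to ordinary logistic regression. In sklearn this corresponds to multi_class="multinomial", which is what the wine quality example below uses.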
(2) Softmax regression case study: a wine quality prediction model
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author:ZhengzhengLiu

# Wine quality prediction model

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import warnings
import sklearn
from sklearn.linear_model import LogisticRegressionCV
from sklearn.linear_model.coordinate_descent import ConvergenceWarning   # on newer sklearn: from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import label_binarize
from sklearn import metrics

# Allow Chinese characters in the matplotlib figures below
mpl.rcParams['font.sans-serif'] = [u'simHei']
mpl.rcParams['axes.unicode_minus'] = False

# Suppress convergence warnings
warnings.filterwarnings(action='ignore', category=ConvergenceWarning)

# Load the data: red wines are labelled type 1, white wines type 2
path1 = "datas/winequality-red.csv"
df1 = pd.read_csv(path1, sep=";")
df1['type'] = 1

path2 = "datas/winequality-white.csv"
df2 = pd.read_csv(path2, sep=";")
df2['type'] = 2

df = pd.concat([df1, df2], axis=0)

names = ["fixed acidity", "volatile acidity", "citric acid",
         "residual sugar", "chlorides", "free sulfur dioxide",
         "total sulfur dioxide", "density", "pH", "sulphates",
         "alcohol", "type"]
quality = "quality"
# print(df.head(5))

# Remove abnormal records
new_df = df.replace('?', np.nan)
datas = new_df.dropna(how='any')
print("原始数据条数:%d;异常数据处理后数据条数:%d;异常数据条数:%d" % (len(df), len(datas), len(df) - len(datas)))

# Feature extraction and data splitting
X = datas[names]
Y = datas[quality]

# Split into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
print("训练数据条数:%d;数据特征个数:%d;测试数据条数:%d" % (X_train.shape[0], X_train.shape[1], X_test.shape[0]))

# Scale the training set to [0, 1] with MinMaxScaler
mms = MinMaxScaler()
X_train = mms.fit_transform(X_train)

# Build and train the model (multinomial = softmax regression)
lr = LogisticRegressionCV(fit_intercept=True, Cs=np.logspace(-5, 1, 100),
                          multi_class='multinomial', penalty='l2', solver='lbfgs')
lr.fit(X_train, Y_train)

# Model evaluation
r = lr.score(X_train, Y_train)
print("R值:", r)
print("特征稀疏化比率:%.2f%%" % (np.mean(lr.coef_.ravel() == 0) * 100))
print("参数:", lr.coef_)
print("截距:", lr.intercept_)

# Prediction
X_test = mms.transform(X_test)     # apply the scaler fitted on the training set
Y_predict = lr.predict(X_test)

# Plot predicted values against actual values
x_len = range(len(X_test))
plt.figure(figsize=(14, 7), facecolor='w')
plt.ylim(-1, 11)
plt.plot(x_len, Y_test, 'ro', markersize=8, zorder=3, label=u'真实值')
plt.plot(x_len, Y_predict, 'go', markersize=12, zorder=2, label=u'预测值,$R^2$=%.3f' % lr.score(X_train, Y_train))   # note: the score shown here is the training score
plt.legend(loc='upper left')
plt.xlabel(u'数据编号', fontsize=18)
plt.ylabel(u'葡萄酒质量', fontsize=18)
plt.title(u'葡萄酒质量预测统计', fontsize=20)
plt.savefig("葡萄酒质量预测统计.png")
plt.show()
# Run results:
原始数据条数:6497;异常数据处理后数据条数:6497;异常数据条数:0
训练数据条数:4872;数据特征个数:12;测试数据条数:1625
R值: 0.549466338259
特征稀疏化比率:0.00%
参数: [[ 0.97934119  2.16608604 -0.41710039 -0.49330657  0.90621136  1.44813439
   0.75463562  0.2311527   0.01015772 -0.69598672 -0.71473577 -0.2907567 ]
 [ 0.62487587  5.11612885 -0.38168837 -2.16145905  1.21149753 -3.71928146
  -1.45623362  1.34125165  0.33725355 -0.86655787 -2.7469681   2.02850838]
 [-1.73828753  1.96024965  0.48775556 -1.91223567  0.64365084 -1.67821019
   2.20322661  1.49086179 -1.36192671 -2.2337436  -5.01452059 -0.75501299]
 [-1.19975858 -2.60860814 -0.34557812  0.17579494 -0.04388969  0.81453743
  -0.28250319  0.51716692 -0.67756552  0.18480087  0.01838834 -0.71392084]
 [ 1.15641271 -4.6636028  -0.30902483  2.21225522 -2.00298042  1.66691445
  -1.02831849 -2.15017982  0.80529532  2.68270545  3.36326129 -0.73635195]
 [-0.07892353 -1.82724304  0.69405191  2.07681409 -0.6247279   1.49244742
  -0.16115782 -1.3671237   0.72694885  1.06878382  4.68718155  0.04669067]
 [ 0.25633987 -0.14301056  0.27158425  0.10213705 -0.08976172 -0.02454203
  -0.02964911 -0.06312954  0.15983679 -0.14000195  0.40739327  0.42084343]]
截距: [-2.34176729 -1.1649153   4.91027564  4.3206539   1.30164164 -2.25841567
 -4.76747291]
VI. A Comprehensive Classification Case: Iris Classification and ROC/AUC
(1) Knowledge point: Python's built-in zip() function
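A tiny illustration (the values are made up, mirroring how zip() is used inside parseRecord below):

names = ['sepal length', 'sepal width', 'cla']
record = [5.1, 3.5, 'Iris-setosa']

# zip() pairs up the corresponding elements of the two sequences
for name, value in zip(names, record):
    print(name, value)
# sepal length 5.1
# sepal width 3.5
# cla Iris-setosa

# In Python 3, zip() returns an iterator; wrap it in list() to materialize the pairs
print(list(zip(names, record)))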
(2) Knowledge point: sklearn.model_selection.train_test_split, which randomly splits data into training and test sets
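A minimal usage sketch (illustrative arrays, not taken from the case below):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples with 2 features each
y = np.arange(10)

# Hold out 40% of the samples as a test set; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
print(X_train.shape, X_test.shape)   # (6, 2) (4, 2)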
(3) Knowledge point: the ROC curve
Detailed reference: http://blog.csdn.net/abcjennifer/article/details/7359370
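Briefly, an ROC curve plots the true positive rate $TPR = TP/(TP+FN)$ against the false positive rate $FPR = FP/(FP+TN)$ as the decision threshold is swept over all values, and AUC is the area under that curve (1.0 for a perfect classifier, 0.5 for random guessing). A toy sketch with sklearn (the labels and scores are illustrative):

from sklearn import metrics

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]    # e.g. decision_function scores or predicted probabilities

fpr, tpr, thresholds = metrics.roc_curve(y_true, y_score)
auc = metrics.auc(fpr, tpr)
print(auc)    # 0.75 for these toy scores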
(4) Knowledge point: the sklearn binarization/encoding functions involved:
OneHotEncoder(), LabelEncoder(), LabelBinarizer(), MultiLabelBinarizer()
Detailed reference: http://blog.csdn.net/gao1440156051/article/details/55096630
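A quick sketch of two of these (illustrative labels; only label_binarize is actually used in the code below):

from sklearn.preprocessing import LabelEncoder, label_binarize

y = [2, 1, 3, 2]

# LabelEncoder: map each class to an integer index
print(LabelEncoder().fit_transform(y))          # [1 0 2 1]

# label_binarize (the functional form of LabelBinarizer): one-hot encode each label
print(label_binarize(y, classes=(1, 2, 3)))
# [[0 1 0]
#  [1 0 0]
#  [0 0 1]
#  [0 1 0]]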
Case code:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author:ZhengzhengLiu

# Comprehensive classification case: iris classification (ROC/AUC)

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import warnings
import sklearn
from sklearn.linear_model import LogisticRegressionCV
from sklearn.linear_model.coordinate_descent import ConvergenceWarning   # on newer sklearn: from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import label_binarize
from sklearn import metrics

# Allow Chinese characters in the matplotlib figures below
mpl.rcParams['font.sans-serif'] = [u'simHei']
mpl.rcParams['axes.unicode_minus'] = False

# Suppress convergence warnings
warnings.filterwarnings(action='ignore', category=ConvergenceWarning)

# Load the data
path = "datas/iris.data"
names = ['sepal length', 'sepal width', 'petal length', 'petal width', 'cla']
df = pd.read_csv(path, header=None, names=names)
print(df['cla'].value_counts())
print(df.head())

# Encoding function: map the class names onto the numbers 1/2/3
def parseRecord(record):    # record is a single row of the dataset
    result = []
    # zip() takes several iterables and combines their corresponding elements into tuples;
    # Python 2 returns a list of these tuples, Python 3 returns an iterator over them.
    r = zip(names, record)
    for name, v in r:
        if name == "cla":
            if v == "Iris-setosa":
                result.append(1)
            elif v == "Iris-versicolor":
                result.append(2)
            elif v == "Iris-virginica":
                result.append(3)
            else:
                result.append(np.nan)
        else:
            result.append(float(v))
    return result

# Convert the data to numbers and split it
# conversion (on newer pandas you may need result_type="expand" to keep a DataFrame)
datas = df.apply(lambda r: parseRecord(r), axis=1)
print(datas.head())
# drop abnormal rows
datas = datas.dropna(how="any")
# split into features and labels
X = datas[names[0:-1]]
Y = datas[names[-1]]

# Split into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.4, random_state=0)
print("原始数据条数:%d;训练数据条数:%d;特征个数:%d;测试样本条数:%d" % (len(X), len(X_train), X_train.shape[1], len(X_test)))

# Standardize the data
ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

# Build and train the model
lr = LogisticRegressionCV(Cs=np.logspace(-4, 1, 50), cv=3, fit_intercept=True, penalty="l2",
                          solver="lbfgs", tol=0.01, multi_class="multinomial")
lr.fit(X_train, Y_train)

# Model evaluation
# binarize the test labels into a one-hot matrix
y_test_hot = label_binarize(Y_test, classes=(1, 2, 3))
# confidence scores from the decision function, used as y_score for the ROC curve
lr_y_score = lr.decision_function(X_test)
# compute the ROC curve; lr_threasholds holds the decision thresholds
lr_fpr, lr_tpr, lr_threasholds = metrics.roc_curve(y_test_hot.ravel(), lr_y_score.ravel())
# compute the AUC value
lr_auc = metrics.auc(lr_fpr, lr_tpr)
print("Logistic算法R值:", lr.score(X_train, Y_train))
print("Logistic算法AUC值:", lr_auc)

# Model prediction
lr_y_predict = lr.predict(X_test)

# Plot the ROC curve
plt.figure(figsize=(8, 6), facecolor="w")
plt.plot(lr_fpr, lr_tpr, c="r", lw=2, label=u"Logistic算法,AUC=%.3f" % lr_auc)
plt.plot((0, 1), (0, 1), c='#a0a0a0', lw=2, ls='--')
plt.xlim(-0.01, 1.02)
plt.ylim(-0.01, 1.02)
plt.xticks(np.arange(0, 1.1, 0.1))
plt.yticks(np.arange(0, 1.1, 0.1))
plt.xlabel('False Positive Rate(FPR)', fontsize=16)
plt.ylabel('True Positive Rate(TPR)', fontsize=16)
plt.grid(b=True, ls=':')
plt.legend(loc='lower right', fancybox=True, framealpha=0.8, fontsize=12)
plt.title(u'鸢尾花数据Logistic算法的ROC/AUC', fontsize=18)
plt.savefig("鸢尾花数据Logistic算法的ROC和AUC.png")
plt.show()

# Plot predicted values against actual values
len_x_test = range(len(X_test))
plt.figure(figsize=(12, 9), facecolor="w")
plt.ylim(0.5, 3.5)
plt.plot(len_x_test, Y_test, "ro", markersize=6, zorder=3, label=u"真实值")
plt.plot(len_x_test, lr_y_predict, "go", markersize=10, zorder=2, label=u"Logis算法预测值,$R^2=%.3f$" % lr.score(X_test, Y_test))
plt.legend(loc='lower right')
plt.xlabel(u'数据编号', fontsize=18)
plt.ylabel(u'种类', fontsize=18)
plt.title(u'鸢尾花数据分类', fontsize=20)
plt.savefig("鸢尾花数据分类.png")
plt.show()
# Run results:
Iris-versicolor    50
Iris-setosa        50
Iris-virginica     50
Name: cla, dtype: int64
   sepal length  sepal width  petal length  petal width          cla
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
   sepal length  sepal width  petal length  petal width  cla
0           5.1          3.5           1.4          0.2  1.0
1           4.9          3.0           1.4          0.2  1.0
2           4.7          3.2           1.3          0.2  1.0
3           4.6          3.1           1.5          0.2  1.0
4           5.0          3.6           1.4          0.2  1.0
原始数据条数:150;训练数据条数:90;特征个数:4;测试样本条数:60
Logistic算法R值: 0.977777777778
Logistic算法AUC值: 0.926944444444