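How to write an iris classifier in Python. The road to machine learning: a k-nearest-neighbors classifier in Python for iris classification

This post uses Python to learn the k-nearest-neighbors classifier API.

The source code is available in my Git repository: https://github.com/linyi0604/kaggle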

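from sklearn.datasets import load_iris
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

'''
k-nearest-neighbors classifier
Predicts the class of a new sample from the labeled training samples closest to it
A nonparametric method
Very high computational and memory cost: the whole training set is stored and searched at prediction time
'''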
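
The note above can be made concrete with a minimal from-scratch sketch. The function name `knn_predict` and the toy data are hypothetical and only illustrate the idea (Euclidean distance plus a majority vote among the k closest training samples); this is not how `KNeighborsClassifier` is implemented internally.

```python
from collections import Counter

import numpy as np


def knn_predict(x_train, y_train, x_query, k=5):
    """Classify each query row by majority vote among its k nearest training rows."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for q in np.asarray(x_query, dtype=float):
        # Euclidean distance from the query point to every training sample
        distances = np.sqrt(((x_train - q) ** 2).sum(axis=1))
        # Indices of the k closest training samples
        nearest = np.argsort(distances)[:k]
        # Majority vote over the labels of those neighbors
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# Hypothetical toy data: two well-separated clusters labeled 0 and 1
toy_x = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
toy_y = [0, 0, 0, 1, 1, 1]
toy_pred = knn_predict(toy_x, toy_y, [[0.05, 0.05], [5.0, 5.1]], k=3)
print(toy_pred)
```

Every prediction computes a distance to every stored training sample, which is where the high computational and memory cost mentioned in the note comes from.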

'''
1 Prepare the data
'''
# Load the iris dataset
iris = load_iris()
# Check the size of the data
# print(iris.data.shape) # (150, 4)
# View the dataset description
# print(iris.DESCR)
'''
Iris Plants Database
====================

Notes
-----
Data Set Characteristics:
    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
    :Summary Statistics:

    ============== ==== ==== ======= ===== ====================
                    Min  Max   Mean    SD   Class Correlation
    ============== ==== ==== ======= ===== ====================
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
    ============== ==== ==== ======= ===== ====================

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

This is a copy of UCI ML iris datasets.
http://archive.ics.uci.edu/ml/datasets/Iris

The famous Iris database, first used by Sir R.A Fisher

This is perhaps the best known database to be found in the
pattern recognition literature. Fisher's paper is a classic in the field and
is referenced frequently to this day. (See Duda & Hart, for example.) The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant. One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

References
----------
   - Fisher,R.A. "The use of multiple measurements in taxonomic problems"
     Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
     Mathematical Statistics" (John Wiley, NY, 1950).
   - Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.
     (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
   - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
     Structure and Classification Rule for Recognition in Partially Exposed
     Environments". IEEE Transactions on Pattern Analysis and Machine
     Intelligence, Vol. PAMI-2, No. 1, 67-71.
   - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions
     on Information Theory, May 1972, 431-433.
   - See also: 1988 MLC Proceedings, 54-64. Cheeseman et al's AUTOCLASS II
     conceptual clustering system finds 3 classes in the data.
   - Many, many more ...

150 samples in total
Evenly distributed across 3 iris species (50 each)
Each sample has 4 measurements describing the shape of its petals and sepals
'''

'''
2 Split into training and test sets
'''
x_train, x_test, y_train, y_test = train_test_split(iris.data,
                                                    iris.target,
                                                    test_size=0.25,
                                                    random_state=33)
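
The split above is purely random, which is why the report at the end of this post shows an uneven 8/11/19 class breakdown in the 38-sample test set. As a sketch of one possible alternative (not what the original script does), `train_test_split` accepts a `stratify` argument that keeps the class proportions equal in both parts. The variable names `x_tr`, `x_te`, `y_tr`, `y_te` are illustrative and chosen so they do not overwrite the variables used in the script.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Same test size and seed as the script, but stratified by class label
x_tr, x_te, y_tr, y_te = train_test_split(iris.data,
                                          iris.target,
                                          test_size=0.25,
                                          random_state=33,
                                          stratify=iris.target)

# Number of test samples per class; with stratification these are near-equal
test_counts = np.bincount(y_te)
print(test_counts)
```

Scores obtained with a stratified split will differ from the numbers recorded in this post, because a different set of samples lands in the test set.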

'''
3 k-nearest-neighbors classifier: fit the model and predict
'''
# Standardize the training and test data (the scaler is fit on the training set only)
ss = StandardScaler()
x_train = ss.fit_transform(x_train)
x_test = ss.transform(x_test)

# Create a k-nearest-neighbors model object (n_neighbors defaults to 5)
knc = KNeighborsClassifier()
# Fit the model on the training data
knc.fit(x_train, y_train)
# Predict on the test data
y_predict = knc.predict(x_test)

'''
4 Evaluate the model
'''
print("Accuracy:", knc.score(x_test, y_test))
print("Other metrics:\n", classification_report(y_test, y_predict, target_names=iris.target_names))
'''
Accuracy: 0.8947368421052632
Other metrics:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00         8
  versicolor       0.73      1.00      0.85        11
   virginica       1.00      0.79      0.88        19

 avg / total       0.92      0.89      0.90        38
'''
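
The evaluation above relies on a single train/test split and the default `n_neighbors=5`. As a hedged sketch of how k could be compared more robustly, the block below runs 5-fold cross-validation for a few candidate values. The candidate list `(1, 3, 5, 7, 9)` and the names `mean_scores` and `best_k` are assumptions made for illustration; the scaler is placed inside a pipeline so that it is re-fit on the training portion of each fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()

mean_scores = {}
for k in (1, 3, 5, 7, 9):  # hypothetical candidate values for n_neighbors
    # Standardization and the classifier are fit together within each fold
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, iris.data, iris.target, cv=5)
    mean_scores[k] = scores.mean()
    print(k, round(mean_scores[k], 3))

# The k with the highest mean cross-validated accuracy
best_k = max(mean_scores, key=mean_scores.get)
print("best k:", best_k)
```

On a dataset this small the differences between values of k are often within the fold-to-fold noise, so the result is a rough guide, not a definitive choice.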
