Using the sklearn iris Dataset

Dataset Background

The dataset was compiled by Fisher in 1936. It contains 4 features (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), all positive floating-point values measured in centimeters. The target is the iris species: Iris Setosa, Iris Versicolour, or Iris Virginica.
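
As a quick sanity check (a minimal sketch, not from the original post, assuming scikit-learn and NumPy are installed), the snippet below loads the dataset and prints the feature names, species names, and the number of samples per class; it should report 50 samples for each of the three species.

```python
# -*- coding:utf-8 -*-
# Sketch: confirm the dataset description above.
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
print(iris.feature_names)        # sepal/petal length and width, in cm
print(iris.target_names)         # ['setosa' 'versicolor' 'virginica']
print(np.bincount(iris.target))  # samples per class, expected [50 50 50]
```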

Test Code

Create a new file lr_riis.py and add the following code:

# -*- coding:utf-8 -*-
import numpy as np

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

def load_data():
    # Load the iris dataset; it is returned as a Bunch (dict-like) object
    iris = datasets.load_iris()
    print(iris.keys())
    # 150 samples, 4 features per sample
    n_samples, n_features = iris.data.shape
    print((n_samples, n_features))
    print(iris.data[0])          # first sample, e.g. [5.1 3.5 1.4 0.2]
    print(iris.target.shape)     # (150,)
    print(iris.target)           # integer class labels 0/1/2
    print(iris.target_names)     # ['setosa' 'versicolor' 'virginica']
    print("feature_names:", iris.feature_names)


def main():
    load_data()

if __name__ == '__main__':
    main()

The output (shown as a screenshot in the original post) tells us that:

iris contains 5 keys.
iris.data holds the four feature values of each sample, e.g. [5.1, 3.5, 1.4, 0.2].
iris.target holds the target labels.
iris.feature_names holds the feature names.
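
To make the relationship between these keys concrete, here is a small sketch (not from the original post): the returned Bunch object supports attribute access as well as dict-style access, and indexing iris.target_names with iris.target maps the integer labels back to species names.

```python
# -*- coding:utf-8 -*-
# Sketch: relate iris.target (integer labels) to iris.target_names.
from sklearn import datasets

iris = datasets.load_iris()
# Bunch supports both dict-style and attribute-style access
assert (iris['data'] == iris.data).all()
# Map each integer label to its species name
labels = iris.target_names[iris.target]
print(labels[:5])  # e.g. ['setosa' 'setosa' 'setosa' 'setosa' 'setosa']
```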

Model Prediction in Practice

Update the code as follows:

# -*- coding:utf-8 -*-
import numpy as np

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

def load_data():
    # 150 samples in total; an 80/20 split gives 120 for training and 30 for testing
    # each sample is an ndarray such as [6.4, 3.1, 5.5, 1.8]
    inputdata = datasets.load_iris()
    # split into training and test sets (80/20)
    x_train, x_test, y_train, y_test = \
        train_test_split(inputdata.data, inputdata.target, test_size=0.2, random_state=0)
    return x_train, x_test, y_train, y_test

def main():
    # training features, test features, training labels, test labels
    x_train, x_test, y_train, y_test = load_data()
    # L2 regularization
    model = LogisticRegression(penalty='l2')
    model.fit(x_train, y_train)

    print("w: ", model.coef_)
    print("b: ", model.intercept_)
    # mean accuracy on the test set
    print("accuracy: ", model.score(x_test, y_test))
    # MSE here treats the integer class labels as numeric values
    print("MSE: ", np.mean((model.predict(x_test) - y_test) ** 2))

if __name__ == '__main__':
    main()

The output (shown as a screenshot in the original post) lists the learned coefficients, the intercept, the accuracy on the test set, and the MSE.
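
For a per-class view of the same model (a sketch added here, not part of the original post, reusing the same 80/20 split), one can print a confusion matrix and classification report from sklearn.metrics:

```python
# -*- coding:utf-8 -*-
# Sketch: per-class evaluation of the logistic regression model.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

model = LogisticRegression(penalty='l2')
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```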

KNN Implementation

In Python, the K-nearest neighbors (KNN) algorithm can be implemented with the scikit-learn library (sklearn). The basic steps are:

1. **Import the required libraries**:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
```

2. **Load the iris dataset**:

```python
iris = datasets.load_iris()
```

3. **Preprocess the data**: the feature values are usually standardized, because KNN is sensitive to feature scale.

```python
X = iris.data
y = iris.target
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```

4. **Split the data into training and test sets**:

```python
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
```

5. **Create and train the KNN classifier** (see the sketch after these steps for one way to choose the number of neighbors):

```python
knn = KNeighborsClassifier(n_neighbors=5)  # choose a suitable number of neighbors
knn.fit(X_train, y_train)
```

6. **Make predictions**:

```python
y_pred = knn.predict(X_test)
```

7. **Evaluate model performance**:

```python
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```

8. **Predict the class of unseen data**:

```python
# Suppose we have a new, unlabeled iris data point
new_data = [[...]]  # fill in the new observation here; it must be standardized as well
new_data_scaled = scaler.transform(new_data)
predicted_class = knn.predict(new_data_scaled)
print(f"The predicted class for the new data is: {predicted_class}")
```
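
Step 5 above fixes n_neighbors=5. As a possible refinement (a sketch, not part of the original steps), the value can instead be chosen with cross-validated grid search:

```python
# -*- coding:utf-8 -*-
# Sketch: choose n_neighbors via cross-validated grid search.
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data)
X_train, X_test, y_train, y_test = train_test_split(
    X, iris.target, test_size=0.2, random_state=42)

# Try odd neighbor counts from 1 to 19 with 5-fold cross-validation
param_grid = {'n_neighbors': list(range(1, 20, 2))}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)

print("best n_neighbors:", search.best_params_['n_neighbors'])
print("test accuracy:", search.score(X_test, y_test))
```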