Iris dataset code example
1. Get the dataset: load the dataset bundled with sklearn
from sklearn.datasets import load_iris
iris = load_iris()
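Before moving on, it can help to look at what `load_iris()` returns. The following is a minimal sketch that only prints attributes of the returned object; the variable names are the same as above.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Feature matrix: 150 samples, 4 features each
print(iris.data.shape)      # (150, 4)
# Target vector: one class label (0, 1 or 2) per sample
print(iris.target.shape)    # (150,)
# Human-readable names for the feature columns and the classes
print(iris.feature_names)
print(iris.target_names)
```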
2. Basic data processing: split the data (the dataset is clean, with no missing values or outliers)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target)
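When `test_size` is omitted, `train_test_split` holds out 25% of the samples and shuffles them differently on every run. As a sketch of a more reproducible split, the call below fixes the split size and seed and keeps the class proportions equal in both parts. The values `test_size=0.2`, `random_state=22` and the use of `stratify` are illustrative assumptions, not part of the original call.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# test_size, random_state and stratify are illustrative choices (assumptions)
x_train, x_test, y_train, y_test = train_test_split(
    iris.data,
    iris.target,
    test_size=0.2,         # hold out 20% of the 150 samples
    random_state=22,       # fixed seed so the split is repeatable
    stratify=iris.target,  # keep the three classes balanced in both parts
)

print(x_train.shape, x_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_test))          # [10 10 10]
```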
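3. Feature engineering: preprocess the data by standardizing it (training set + test set); no feature selection or dimensionality reduction is needed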
from sklearn.preprocessing import StandardScaler
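transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)  # fit the mean and standard deviation on the training set only
x_test = transfer.transform(x_test)  # reuse the training-set statistics; do not refit on the test set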
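The test set should be scaled with the statistics learned from the training set, so that both sets go through the same transformation and no information from the test set leaks into preprocessing. A small sketch of what that means in practice; `random_state=22` and the `_scaled` variable names are assumptions added for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
# random_state is an illustrative assumption to make the run repeatable
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=22
)

transfer = StandardScaler()
x_train_scaled = transfer.fit_transform(x_train)  # learns mean_ and scale_ from x_train
x_test_scaled = transfer.transform(x_test)        # applies those same statistics

# transform() is just (x - training mean) / training standard deviation
manual_test = (x_test - transfer.mean_) / transfer.scale_
print(np.allclose(x_test_scaled, manual_test))  # True

# The training columns are centered at zero; the test columns are only approximately so
print(x_train_scaled.mean(axis=0).round(6))
print(x_test_scaled.mean(axis=0).round(6))
```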
4. Model training: KNN
from sklearn.neighbors import KNeighborsClassifier
estimator = KNeighborsClassifier(n_neighbors=9)
estimator.fit(x_train, y_train)
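To see what the fitted classifier does with `n_neighbors=9`, the sketch below looks up the 9 nearest training samples for each test sample and takes a majority vote over their labels by hand. It assumes the default uniform weights; `random_state=22` and the names `neighbor_labels` and `manual_votes` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
# random_state is an illustrative assumption to make the run repeatable
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=22
)
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

estimator = KNeighborsClassifier(n_neighbors=9)
estimator.fit(x_train, y_train)

# For every test sample: distances to, and training-set indices of, its 9 nearest neighbors
distances, indices = estimator.kneighbors(x_test)
neighbor_labels = y_train[indices]  # one row of 9 labels per test sample

# Majority vote per row; argmax picks the smallest label when counts tie
manual_votes = np.array(
    [np.bincount(row, minlength=3).argmax() for row in neighbor_labels]
)
print(np.array_equal(manual_votes, estimator.predict(x_test)))
```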
5. Model evaluation
Method 1: compare the predicted values with the true values
y_predict=estimator.predict(x_test)
print("y_pred:", y_predict)
print("\n")
print("Comparison of predicted and true results:\n", y_predict == y_test)
Method 2: compute the accuracy directly
score = estimator.score(x_test, y_test)  # pass in the feature values and target values of the test set
print("accuracy:\n", score)
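The two evaluation methods are closely related: for a classifier, accuracy is the fraction of positions where the prediction equals the true label. The sketch below computes that fraction by hand and compares it with `estimator.score` and with `sklearn.metrics.accuracy_score`. `random_state=22` and the name `manual_accuracy` are assumptions added for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
# random_state is an illustrative assumption to make the run repeatable
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=22
)
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

estimator = KNeighborsClassifier(n_neighbors=9)
estimator.fit(x_train, y_train)
y_predict = estimator.predict(x_test)

# Method 1 turned into a number: the mean of the element-wise comparison
manual_accuracy = np.mean(y_predict == y_test)

# All three values should agree
print(manual_accuracy)
print(estimator.score(x_test, y_test))
print(accuracy_score(y_test, y_predict))
```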