SS00001.Machinelearning--|Arithmetic&Machine.v01|--|TensorFlow: Supervised Learning Algorithms.v01|

yanqi_vip

Last modified: 2022-05-24 14:49:28

Category: bigdatav030 (Machine Learning). Tags: tensorflow, algorithms, learning, machine learning, python

First published: 2022-05-23 14:07:00

Reprinting not permitted

Article link: https://blog.csdn.net/yanqi_vip/article/details/124938750

I. Course Outline

### --- TensorFlow supervised learning algorithms
~~~     # KNN (K-nearest neighbors)
~~~     Import the required packages
~~~     Process the data
~~~     Split the dataset
~~~     Build the model

### --- Linear regression
~~~     Implement linear regression with TensorFlow
~~~     Predict house prices with TensorFlow

### --- Logistic regression (classification)
~~~     Classify images with TensorFlow

II. KNN (K-Nearest Neighbors)

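### --- TensorFlow supervised learning algorithms

~~~     # KNN (K-nearest neighbors)
~~~     KNN assigns a label to a new point based on the labels of its K nearest neighbors,
~~~     where K is a parameter of the algorithm.
~~~     To find those neighbors, it computes the distance between the new point
~~~     and every other data point in the dataset.
~~~     # We will use the well-known iris dataset, which covers three species:
~~~     iris setosa, iris versicolor, and iris virginica.
~~~     Every sample has the same four features: petal length, petal width, sepal length, and sepal width.
~~~     # There are 150 data points (each holding the four measurements above) and 150 matching labels.
~~~     We split them into 120 training points and 30 test points.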
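
To make the voting idea concrete before turning to TensorFlow, here is a minimal NumPy sketch of KNN on toy data. The function name `knn_predict` and the six 2-D points are made up for illustration and are not part of the original post; it uses Manhattan distance, which is also what the TensorFlow model later in this post uses.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Manhattan (L1) distance from the query to every training point
    distances = np.abs(train_x - query).sum(axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Count how often each integer label occurs among the neighbors
    votes = np.bincount(train_y[nearest])
    return int(np.argmax(votes))

# Hypothetical toy data: two small clusters in 2-D
toy_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
toy_y = np.array([0, 0, 0, 1, 1, 1])

near_origin = knn_predict(toy_x, toy_y, np.array([0.05, 0.05]), k=3)
near_ones = knn_predict(toy_x, toy_y, np.array([1.0, 0.95]), k=3)
print(near_origin, near_ones)  # prints: 0 1
```

Each query point lands in the class of the cluster it sits next to, because all three of its nearest neighbors carry that label.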

### --- Import the required packages

import numpy as np
from sklearn import datasets
import tensorflow as tf
iris = datasets.load_iris()
x = np.array(iris.data)    # features, shape (150, 4)
y = np.array(iris.target)  # integer labels, shape (150,)
x.shape, y.shape

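### --- Data processing
~~~     # Keep the flower names in a list for later use.
~~~     # The order must match the integer labels in iris.target: 0 = setosa, 1 = versicolor, 2 = virginica.

flower_labels = ["iris setosa", "iris versicolor", "iris virginica"]

~~~     # np.eye returns a 2-D array with ones on a diagonal (the main diagonal by default).
~~~     # Indexing that array with y picks one row per sample, which gives the one-hot encoding of y.
~~~     # In other words, np.eye is used here to turn the label array into one-hot vectors.

y = np.eye(len(set(y)))[y]
y[0:10]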
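
The `np.eye(...)[y]` trick is easier to follow on a tiny input. The sketch below uses a made-up four-element label array (`labels` is illustrative, not the iris data) and shows that `argmax` reverses the encoding.

```python
import numpy as np

labels = np.array([0, 2, 1, 0])       # hypothetical integer class labels
num_classes = len(set(labels))        # 3 distinct classes
identity = np.eye(num_classes)        # 3x3 identity matrix
one_hot = identity[labels]            # row i of the result is identity[labels[i]]
print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]]

# argmax along each row recovers the original integer labels
recovered = one_hot.argmax(axis=1)
```

Note that `len(set(labels))` only gives the right width when the labels are exactly 0 to n-1 with no gaps, which holds for the iris targets.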

~~~     # Min-max normalization (rescale each feature column to the range 0-1)

x = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
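
As a worked example of the normalization formula, the sketch below applies it to a small made-up array (`features` is illustrative only). With `axis=0`, the minimum and maximum are taken per column, so each feature is rescaled independently.

```python
import numpy as np

features = np.array([[1.0, 10.0],
                     [2.0, 20.0],
                     [4.0, 40.0]])
col_min = features.min(axis=0)   # per-column minimum: [1., 10.]
col_max = features.max(axis=0)   # per-column maximum: [4., 40.]
# Broadcasting applies the (2,) row vectors to every row of the (3, 2) array
scaled = (features - col_min) / (col_max - col_min)
print(scaled)  # each column now runs from 0 to 1
```

One caveat: a column whose values are all identical would make `col_max - col_min` zero and cause a division by zero. The iris features do not have that problem.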

### --- Split the dataset

~~~     # Set the random seed so that every run produces the same split
np.random.seed(420)
split = 0.8 # 80% training, 20% test
~~~     # Of the 150 rows, 120 samples go to the training set and 30 to the test set
train_indices = np.random.choice(len(x), round(len(x) * split), replace=False)
test_indices = np.array(list(set(range(len(x))) - set(train_indices)))

~~~     # The resulting subsets
Xtrain = x[train_indices]
Xtest = x[test_indices]
Ytrain = y[train_indices]
Ytest = y[test_indices]
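
For comparison, the same 80/20 split can be sketched by shuffling all indices once and cutting the result in two. This variant is an assumption of mine, not the post's method: it uses NumPy's `default_rng` generator, so it will not reproduce the exact split obtained with `np.random.seed(420)`.

```python
import numpy as np

rng = np.random.default_rng(420)         # assumption: Generator API instead of np.random.seed
n_samples, split = 150, 0.8
shuffled = rng.permutation(n_samples)    # every index 0..149 exactly once, in random order
n_train = round(n_samples * split)       # 120
train_idx = shuffled[:n_train]
test_idx = shuffled[n_train:]

# Sanity checks: the two index sets are disjoint and together cover all samples
overlap = set(train_idx) & set(test_idx)
print(len(train_idx), len(test_idx), len(overlap))  # prints: 120 30 0
```

Either way, the property that matters is that no sample ends up in both the training and the test set.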

~~~     # Set the value of K
K = 5

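### --- Build the model

~~~     # TF 2.x APIs used:
~~~     distances holds all the (Manhattan) distances between our 30 test points and 120 training points;
~~~     that is, an array of 30 rows by 120 columns.

~~~     Note: the Manhattan distance between two data points x[1] and x[2] is the sum of the
~~~     absolute differences of their components, i.e. |x[1] - x[2]| summed over all features.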
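
A short worked example of the Manhattan distance, using two four-feature vectors chosen for illustration:

```python
import numpy as np

a = np.array([5.1, 3.5, 1.4, 0.2])
b = np.array([4.9, 3.0, 1.4, 0.2])

abs_diff = np.abs(a - b)      # per-feature differences: about [0.2, 0.5, 0.0, 0.0]
manhattan = abs_diff.sum()    # 0.2 + 0.5 + 0 + 0, i.e. about 0.7
print(manhattan)
```

The printed value may differ from 0.7 in the last decimal places because of floating-point rounding.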

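~~~     tf.expand_dims(input, axis): adds an extra dimension to Xtest, giving it shape (30, 1, 4),
~~~     so that broadcasting can expand both arrays to a common shape (30, 120, 4) before the subtraction.
~~~     Since x has four features and reduce_sum sums over axis=2,
~~~     the result has 30 rows, each holding the distances from one test point to the 120 training points.

~~~     tf.reduce_sum: sums the elements of a multi-dimensional tensor
~~~     along the given axis (or over all axes if none is given).

~~~     tf.subtract: element-wise subtraction of two tensors (with broadcasting).
~~~     tf.nn.top_k(input, k=1, sorted=True, name=None):
~~~     finds the k largest values along the last dimension of the input tensor, together with their indices.

~~~     tf.gather: collects slices from a tensor by index. Here Ytrain has shape (120, 3) and the
~~~     indices have shape (30, K), so the output has shape (30, K, 3).
~~~     tf.argmax(input, axis): returns the index of the largest value along the given axis.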
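
The shape bookkeeping above can be checked on a small example. The sketch below assumes TensorFlow 2.x in eager mode and uses made-up random data (6 training points, 2 test points, 3 classes, k = 3) in place of the iris arrays; the variable names are illustrative.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
train = rng.random((6, 4))                               # 6 hypothetical training points
test = rng.random((2, 4))                                # 2 hypothetical test points
train_labels = np.eye(3)[np.array([0, 0, 1, 1, 2, 2])]   # one-hot labels, shape (6, 3)

expanded = tf.expand_dims(test, axis=1)                  # (2, 1, 4)
diff = tf.abs(tf.subtract(train, expanded))              # broadcast to (2, 6, 4)
distances = tf.reduce_sum(diff, axis=2)                  # (2, 6)
# Negate so that the k largest values are the k smallest distances
_, idx = tf.nn.top_k(tf.negative(distances), k=3)        # (2, 3)
neighbor_labels = tf.gather(train_labels, idx)           # (2, 3, 3)
votes = tf.reduce_sum(neighbor_labels, axis=1)           # (2, 3): votes per class
pred = tf.argmax(votes, axis=1)                          # (2,): predicted class per test point

for name, t in [("expanded", expanded), ("diff", diff), ("distances", distances),
                ("idx", idx), ("neighbor_labels", neighbor_labels),
                ("votes", votes), ("pred", pred)]:
    print(name, tuple(t.shape))
```

Replacing 6 with 120 and 2 with 30 gives the shapes described for the iris data.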

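def prediction(Xtrain, Xtest, Ytrain, k):
    # Manhattan distance: sum of the absolute differences between each test point and each training point
    distances = tf.reduce_sum(
        tf.abs(tf.subtract(Xtrain, tf.expand_dims(Xtest, axis=1))), axis=2)
    # tf.nn.top_k returns the indices of the k nearest neighbors as its second return value.
    # Negating the distances makes the k largest values correspond to the k smallest distances.
    # The first return value holds the (negated) distances themselves; we do not need it,
    # so we discard it by assigning it to an underscore.
    _, top_k_indices = tf.nn.top_k(tf.negative(distances), k=k)
    # tf.gather uses the indices as a slice to fetch the training labels of those nearest neighbors
    top_k_labels = tf.gather(Ytrain, top_k_indices)
    # Sum the neighbors' one-hot labels to count the votes per class
    predictions_sum = tf.reduce_sum(top_k_labels, axis=1)
    # The index of the largest vote count is the predicted label
    pred = tf.argmax(predictions_sum, axis=1)
    # Return the predictions
    return pred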

~~~     # Compare the predictions with the actual labels

i, total = 0, 0
results = zip(prediction(Xtrain, Xtest, Ytrain, K), Ytest)
print("Predicted Actual")
print("--------- ------")
for pred, actual in results:
    print(i, flower_labels[pred.numpy()], "\t", flower_labels[np.argmax(actual)])
    if pred.numpy() == np.argmax(actual):
        total += 1
    i += 1
~~~     # Accuracy
accuracy = round(total / len(Xtest) * 100, 2)
print("Accuracy = ", accuracy, "%")
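
The counting loop can also be written as a single vectorized comparison. The sketch below is a hedged alternative on made-up values (`pred_classes` and `actual_one_hot` are illustrative stand-ins for the model output and `Ytest`), not a replacement for the code above.

```python
import numpy as np

pred_classes = np.array([0, 1, 2, 2, 1])                  # hypothetical predicted class indices
actual_one_hot = np.eye(3)[np.array([0, 1, 2, 1, 1])]     # hypothetical one-hot ground truth

actual_classes = actual_one_hot.argmax(axis=1)            # back to integer labels
correct = pred_classes == actual_classes                  # boolean array, True where they match
accuracy = correct.mean() * 100                           # share of matches, as a percentage
print("Accuracy = ", round(accuracy, 2), "%")             # 4 of 5 correct
```

With the real data, `pred_classes` would be `prediction(Xtrain, Xtest, Ytrain, K).numpy()` and `actual_one_hot` would be `Ytest`.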

III. The results show two misclassifications among the 30 test points, for an accuracy of about 93%.