TensorFlow basic algorithms (2): nearest neighbor

Article link: https://blog.csdn.net/uncle_ll/article/details/82192101

Based on Wikipedia:

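In pattern recognition, the k-nearest neighbors method (k-NN, also written KNN) is a non-parametric statistical method used for classification and regression. In both cases, the input consists of the k closest training samples in the feature space.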

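In k-NN classification, the output is a class membership. An object is classified by a "majority vote" of its neighbors: it is assigned the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned the class of its single nearest neighbor.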

In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
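As a minimal sketch of the two kinds of output, assume the k nearest neighbors have already been found and only their labels or values are at hand. The function names here are illustrative and not part of any library.

```python
from collections import Counter
from statistics import mean

def knn_classify_output(neighbor_labels):
    """Classification: the most common label among the k nearest neighbors."""
    return Counter(neighbor_labels).most_common(1)[0][0]

def knn_regress_output(neighbor_values):
    """Regression: the plain average of the k nearest neighbors' values."""
    return mean(neighbor_values)

print(knn_classify_output(["cat", "dog", "cat"]))  # cat
print(knn_regress_output([2.0, 4.0, 6.0]))         # 4.0
```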

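The nearest neighbor method classifies using a vector space model. The idea is that cases of the same class are highly similar to one another, so the likely class of an unlabeled case can be estimated by computing its similarity to cases whose class is known.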

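k-NN is a type of instance-based learning, also called lazy learning: the function is only approximated locally, and all computation is deferred until classification time. It is among the simplest of all machine learning algorithms.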

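For both classification and regression, it can be useful to weight the contributions of the neighbors so that nearer neighbors contribute more than distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to that neighbor.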
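The 1/d scheme can be sketched for regression as follows. The handling of a zero distance (returning the exact matches directly) is an assumption of this sketch, since 1/d is undefined there; the function name is illustrative.

```python
def weighted_knn_regress(neighbors, eps=1e-12):
    """neighbors: list of (distance, value) pairs for the k nearest neighbors."""
    # Assumption: a neighbor at distance ~0 is an exact match and decides the result.
    exact = [value for d, value in neighbors if d <= eps]
    if exact:
        return sum(exact) / len(exact)
    # Otherwise weight each value by 1/d and normalize by the total weight.
    weights = [1.0 / d for d, _ in neighbors]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, neighbors))
    return weighted_sum / sum(weights)

neighbors = [(1.0, 10.0), (2.0, 20.0), (4.0, 40.0)]
print(round(weighted_knn_regress(neighbors), 2))  # 17.14
```

The unweighted average of the same three values is about 23.33, so the nearest neighbor pulls the estimate toward its own value.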

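The neighbors are taken from a set of objects whose class (or, for regression, property value) is known. This set can be regarded as the training set for the algorithm, although no explicit training step is required.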

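A drawback of the k-NN algorithm is that it is very sensitive to the local structure of the data. The algorithm is unrelated to k-means, another popular machine learning technique, and should not be confused with it.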

Algorithm:

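The training samples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.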

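In the classification phase, k is a user-defined constant. An unlabeled vector (a query or test point) is assigned the label that occurs most frequently among the k training samples nearest to it.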

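Euclidean distance is the usual distance metric, but it applies only to continuous variables. For discrete variables, as in text classification, another metric such as the overlap metric (Hamming distance) can be used. For gene expression microarray data, for example, k-NN has also been used with the Pearson and Spearman correlation coefficients. The classification accuracy of k-NN can often be improved significantly when the distance metric is learned with a specialized algorithm such as Large Margin Nearest Neighbor or Neighbourhood Components Analysis.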
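To make the two basic metrics concrete, here is a small sketch of Euclidean distance for continuous features and the overlap (Hamming) metric for discrete ones. The helper names and the sample feature values are illustrative assumptions.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared differences (continuous features)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))   # 5.0
print(hamming_distance(["red", "small", "round"],
                       ["red", "large", "round"]))  # 1
```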

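"Majority voting" classification has a drawback when the class distribution is skewed. Samples of a more frequent class tend to dominate the prediction for a test point, because their sheer number makes them more likely to appear among its k nearest neighbors, and the test point's label is computed from exactly those neighbors. One way to overcome this is to take into account the distance from the test point to each of its k nearest neighbors: the class (or, for regression, the value) of each neighbor is multiplied by a weight inversely proportional to its distance from the test point. Another way to overcome skew is abstraction in the data representation. For example, in a self-organizing map (SOM), each node is a representative (a center) of a cluster of similar points, regardless of their density in the original training data. k-NN can then be applied to the SOM.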
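The effect of skew, and of the distance-weighted fix, can be seen on a made-up neighborhood in which the frequent class outnumbers the rare one but sits farther away. The data and function names below are hypothetical; the small `eps` term is an assumption added only to avoid division by zero.

```python
from collections import Counter, defaultdict

def majority_vote(neighbors):
    """neighbors: list of (distance, label); every neighbor counts equally."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def weighted_vote(neighbors, eps=1e-12):
    """Each neighbor votes with weight 1/distance, so close neighbors count more."""
    scores = defaultdict(float)
    for distance, label in neighbors:
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)

# Hypothetical 5 nearest neighbors of one test point.
neighbors = [(0.1, "rare"), (0.2, "rare"),
             (1.0, "common"), (1.1, "common"), (1.2, "common")]

print(majority_vote(neighbors))  # common
print(weighted_vote(neighbors))  # rare
```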

Parameter selection:

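The best choice of k depends on the data. In general, larger values of k reduce the effect of noise on classification but make the boundaries between classes less distinct. A good k can be selected by various heuristic techniques (see hyperparameter optimization).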
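One simple heuristic, shown here only as a sketch, is to try several values of k on a held-out validation set and keep the one with the best accuracy. The one-dimensional data below are invented for illustration: a single noisy label hurts k = 1, while k = 7 covers the whole training set and always predicts the larger class.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def validation_accuracy(train, validation, k):
    """Fraction of validation points whose predicted label is correct."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)

# Hypothetical data: class "a" near 1, class "b" near 3, one noisy "b" at 1.4.
train = [(1.0, "a"), (1.2, "a"), (1.4, "b"), (1.6, "a"),
         (3.0, "b"), (3.2, "b"), (3.4, "b")]
validation = [(1.35, "a"), (1.45, "a"), (3.1, "b")]

scores = {k: validation_accuracy(train, validation, k) for k in (1, 3, 7)}
best_k = max(scores, key=scores.get)
print(best_k)  # 3
```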

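The accuracy of k-NN can be severely degraded by noisy or irrelevant features, or by feature scales that are inconsistent with the features' importance. Much research has gone into selecting and scaling features to improve classification. One popular approach uses evolutionary algorithms to optimize feature scaling; another uses the mutual information between the training data and the training classes to select features.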
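A small sketch of why scale matters: below, the second feature is measured in much larger units than the first, so it dominates the raw Euclidean distance. Min-max scaling is used here as one simple rescaling choice (it is not one of the methods named above), and the data are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_label(train, query):
    """Label of the single training sample closest to the query (1-NN)."""
    return min(train, key=lambda item: euclidean(item[0], query))[1]

def min_max_scaler(points):
    """Build a function mapping each feature to [0, 1] over the given points."""
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    def scale(p):
        return tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                     for v, lo, hi in zip(p, lows, highs))
    return scale

# Hypothetical data: feature 1 spans 1..5, feature 2 spans 1000..1010.
train = [((1.0, 1000.0), "a"), ((5.0, 1010.0), "b")]
query = (1.2, 1008.0)

scale = min_max_scaler([x for x, _ in train])
scaled_train = [(scale(x), label) for x, label in train]

print(nearest_label(train, query))                # b (feature 2 dominates)
print(nearest_label(scaled_train, scale(query)))  # a (both features comparable)
```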

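In binary (two-class) classification problems, choosing k to be an odd number helps avoid tied votes. A popular way of choosing the empirically optimal k in this setting is the bootstrap method.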

Summary:

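Once the parameter k has been chosen, the k-NN algorithm computes the distance from the sample under test to the reference (training) points, finds the k nearest ones, checks which class holds the majority among them, and assigns the sample to that class.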

Steps:

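1) Compute the distance between the test sample and every training sample;

2) Sort the training samples by increasing distance;

3) Select the k samples with the smallest distances;

4) Count how often each class occurs among these k samples;

5) Return the most frequent class among the k samples as the predicted class of the test sample.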
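The five steps can be sketched in plain Python as follows. This is an illustrative version using Euclidean distance, separate from the TensorFlow code in this post; the function name and sample data are assumptions.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Classify `query` from `train`, a list of (feature_vector, label) pairs."""
    # 1) Distance from the query to every training sample.
    distances = [(math.sqrt(sum((a - b) ** 2 for a, b in zip(x, query))), label)
                 for x, label in train]
    # 2) Sort by increasing distance.
    distances.sort(key=lambda pair: pair[0])
    # 3) Keep the k closest samples.
    nearest = distances[:k]
    # 4) Count how often each class occurs among them.
    counts = Counter(label for _, label in nearest)
    # 5) Return the most frequent class.
    return counts.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]
print(knn_classify(train, (2, 2), k=3))  # A
print(knn_classify(train, (6, 5), k=3))  # B
```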

TensorFlow implementation:

Using the MNIST dataset as an example:

import numpy as np
import tensorflow as tf

# Import MNIST data (this tutorials module ships only with TensorFlow 1.x)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Xtr, Ytr = mnist.train.next_batch(5000)  # 5000 training samples (nearest-neighbor candidates)
Xte, Yte = mnist.test.next_batch(200)  # 200 test samples

# tf Graph Input
xtr = tf.placeholder("float", [None, 784])
xte = tf.placeholder("float", [784])

# Nearest Neighbor calculation using L1 Distance
# Calculate L1 Distance
# xte (shape [784]) is broadcast against every row of xtr (shape [None, 784])
distance = tf.reduce_sum(tf.abs(tf.subtract(xtr, xte)), axis=1)
# Prediction: Get min distance index (Nearest neighbor)
pred = tf.argmin(distance, 0)

accuracy = 0.

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Run the evaluation (nearest neighbor has no training step)
with tf.Session() as sess:
    sess.run(init)

    # loop over test data
    for i in range(len(Xte)):
        # Get nearest neighbor
        nn_index = sess.run(pred, feed_dict={xtr: Xtr, xte: Xte[i, :]})
        # Get nearest neighbor class label and compare it to its true label
        print("Test", i, "Prediction:", np.argmax(Ytr[nn_index]),
              "True Class:", np.argmax(Yte[i]))
        # Calculate accuracy
        if np.argmax(Ytr[nn_index]) == np.argmax(Yte[i]):
            accuracy += 1./len(Xte)
    print("Done!")
    print("Accuracy:", accuracy)

Output:

Test 0 Prediction: 3 True Class: 3
Test 1 Prediction: 9 True Class: 4
Test 2 Prediction: 1 True Class: 1
Test 3 Prediction: 9 True Class: 9
Test 4 Prediction: 6 True Class: 6
Test 5 Prediction: 9 True Class: 9
Test 6 Prediction: 6 True Class: 6
Test 7 Prediction: 9 True Class: 9
Test 8 Prediction: 3 True Class: 3
Test 9 Prediction: 6 True Class: 6
Test 10 Prediction: 2 True Class: 2
Test 11 Prediction: 3 True Class: 3
Test 12 Prediction: 0 True Class: 0
Test 13 Prediction: 9 True Class: 9
Test 14 Prediction: 9 True Class: 9
Test 15 Prediction: 1 True Class: 1
Test 16 Prediction: 1 True Class: 1
Test 17 Prediction: 8 True Class: 8
Test 18 Prediction: 2 True Class: 2
Test 19 Prediction: 9 True Class: 9
Test 20 Prediction: 6 True Class: 6
Test 21 Prediction: 3 True Class: 3
Test 22 Prediction: 1 True Class: 1
Test 23 Prediction: 7 True Class: 7
Test 24 Prediction: 1 True Class: 1
Test 25 Prediction: 3 True Class: 3
Test 26 Prediction: 4 True Class: 4
Test 27 Prediction: 0 True Class: 0
Test 28 Prediction: 3 True Class: 3
Test 29 Prediction: 1 True Class: 1
Test 30 Prediction: 1 True Class: 1
Test 31 Prediction: 4 True Class: 4
Test 32 Prediction: 8 True Class: 8
Test 33 Prediction: 6 True Class: 6
Test 34 Prediction: 1 True Class: 1
Test 35 Prediction: 1 True Class: 1
Test 36 Prediction: 0 True Class: 0
Test 37 Prediction: 8 True Class: 8
Test 38 Prediction: 8 True Class: 4
Test 39 Prediction: 9 True Class: 9
Test 40 Prediction: 7 True Class: 7
Test 41 Prediction: 9 True Class: 7
Test 42 Prediction: 8 True Class: 8
Test 43 Prediction: 7 True Class: 7
Test 44 Prediction: 0 True Class: 0
Test 45 Prediction: 1 True Class: 1
Test 46 Prediction: 7 True Class: 7
Test 47 Prediction: 7 True Class: 2
Test 48 Prediction: 3 True Class: 3
Test 49 Prediction: 2 True Class: 2
Test 50 Prediction: 1 True Class: 1
Test 51 Prediction: 4 True Class: 4
Test 52 Prediction: 1 True Class: 1
Test 53 Prediction: 8 True Class: 8
Test 54 Prediction: 6 True Class: 6
Test 55 Prediction: 2 True Class: 2
Test 56 Prediction: 1 True Class: 1
Test 57 Prediction: 6 True Class: 6
Test 58 Prediction: 6 True Class: 6
Test 59 Prediction: 5 True Class: 5
Test 60 Prediction: 6 True Class: 6
Test 61 Prediction: 7 True Class: 2
Test 62 Prediction: 8 True Class: 8
Test 63 Prediction: 2 True Class: 2
Test 64 Prediction: 7 True Class: 7
Test 65 Prediction: 9 True Class: 9
Test 66 Prediction: 9 True Class: 9
Test 67 Prediction: 7 True Class: 7
Test 68 Prediction: 7 True Class: 0
Test 69 Prediction: 2 True Class: 2
Test 70 Prediction: 5 True Class: 5
Test 71 Prediction: 8 True Class: 8
Test 72 Prediction: 1 True Class: 1
Test 73 Prediction: 3 True Class: 8
Test 74 Prediction: 6 True Class: 6
Test 75 Prediction: 8 True Class: 8
Test 76 Prediction: 4 True Class: 4
Test 77 Prediction: 0 True Class: 0
Test 78 Prediction: 5 True Class: 5
Test 79 Prediction: 7 True Class: 7
Test 80 Prediction: 0 True Class: 0
Test 81 Prediction: 6 True Class: 6
Test 82 Prediction: 9 True Class: 9
Test 83 Prediction: 1 True Class: 1
Test 84 Prediction: 0 True Class: 0
Test 85 Prediction: 3 True Class: 3
Test 86 Prediction: 7 True Class: 7
Test 87 Prediction: 7 True Class: 7
Test 88 Prediction: 6 True Class: 6
Test 89 Prediction: 1 True Class: 1
Test 90 Prediction: 8 True Class: 8
Test 91 Prediction: 7 True Class: 7
Test 92 Prediction: 6 True Class: 6
Test 93 Prediction: 8 True Class: 8
Test 94 Prediction: 9 True Class: 9
Test 95 Prediction: 5 True Class: 5
Test 96 Prediction: 1 True Class: 1
Test 97 Prediction: 6 True Class: 6
Test 98 Prediction: 3 True Class: 3
Test 99 Prediction: 7 True Class: 7
Test 100 Prediction: 7 True Class: 7
Test 101 Prediction: 0 True Class: 0
Test 102 Prediction: 2 True Class: 2
Test 103 Prediction: 7 True Class: 7
Test 104 Prediction: 0 True Class: 0
Test 105 Prediction: 7 True Class: 7
Test 106 Prediction: 0 True Class: 0
Test 107 Prediction: 5 True Class: 3
Test 108 Prediction: 6 True Class: 6
Test 109 Prediction: 8 True Class: 8
Test 110 Prediction: 3 True Class: 3
Test 111 Prediction: 3 True Class: 3
Test 112 Prediction: 7 True Class: 7
Test 113 Prediction: 2 True Class: 2
Test 114 Prediction: 4 True Class: 4
Test 115 Prediction: 9 True Class: 9
Test 116 Prediction: 5 True Class: 5
Test 117 Prediction: 2 True Class: 2
Test 118 Prediction: 7 True Class: 7
Test 119 Prediction: 7 True Class: 7
Test 120 Prediction: 6 True Class: 6
Test 121 Prediction: 1 True Class: 1
Test 122 Prediction: 1 True Class: 1
Test 123 Prediction: 9 True Class: 9
Test 124 Prediction: 5 True Class: 5
Test 125 Prediction: 1 True Class: 1
Test 126 Prediction: 6 True Class: 6
Test 127 Prediction: 9 True Class: 9
Test 128 Prediction: 3 True Class: 3
Test 129 Prediction: 0 True Class: 0
Test 130 Prediction: 0 True Class: 0
Test 131 Prediction: 4 True Class: 4
Test 132 Prediction: 1 True Class: 1
Test 133 Prediction: 3 True Class: 3
Test 134 Prediction: 9 True Class: 9
Test 135 Prediction: 0 True Class: 0
Test 136 Prediction: 4 True Class: 4
Test 137 Prediction: 8 True Class: 8
Test 138 Prediction: 5 True Class: 5
Test 139 Prediction: 0 True Class: 0
Test 140 Prediction: 2 True Class: 2
Test 141 Prediction: 8 True Class: 8
Test 142 Prediction: 6 True Class: 6
Test 143 Prediction: 9 True Class: 9
Test 144 Prediction: 3 True Class: 3
Test 145 Prediction: 8 True Class: 8
Test 146 Prediction: 7 True Class: 7
Test 147 Prediction: 9 True Class: 9
Test 148 Prediction: 0 True Class: 0
Test 149 Prediction: 6 True Class: 6
Test 150 Prediction: 6 True Class: 6
Test 151 Prediction: 3 True Class: 3
Test 152 Prediction: 6 True Class: 6
Test 153 Prediction: 1 True Class: 1
Test 154 Prediction: 1 True Class: 1
Test 155 Prediction: 5 True Class: 5
Test 156 Prediction: 6 True Class: 6
Test 157 Prediction: 1 True Class: 1
Test 158 Prediction: 1 True Class: 1
Test 159 Prediction: 7 True Class: 7
Test 160 Prediction: 1 True Class: 1
Test 161 Prediction: 1 True Class: 1
Test 162 Prediction: 3 True Class: 2
Test 163 Prediction: 1 True Class: 1
Test 164 Prediction: 5 True Class: 5
Test 165 Prediction: 9 True Class: 9
Test 166 Prediction: 2 True Class: 2
Test 167 Prediction: 9 True Class: 9
Test 168 Prediction: 9 True Class: 4
Test 169 Prediction: 7 True Class: 7
Test 170 Prediction: 3 True Class: 3
Test 171 Prediction: 7 True Class: 7
Test 172 Prediction: 1 True Class: 1
Test 173 Prediction: 6 True Class: 6
Test 174 Prediction: 7 True Class: 7
Test 175 Prediction: 4 True Class: 4
Test 176 Prediction: 8 True Class: 8
Test 177 Prediction: 9 True Class: 9
Test 178 Prediction: 9 True Class: 4
Test 179 Prediction: 3 True Class: 3
Test 180 Prediction: 5 True Class: 8
Test 181 Prediction: 8 True Class: 8
Test 182 Prediction: 7 True Class: 7
Test 183 Prediction: 6 True Class: 6
Test 184 Prediction: 3 True Class: 3
Test 185 Prediction: 5 True Class: 5
Test 186 Prediction: 1 True Class: 3
Test 187 Prediction: 9 True Class: 9
Test 188 Prediction: 0 True Class: 0
Test 189 Prediction: 1 True Class: 1
Test 190 Prediction: 7 True Class: 7
Test 191 Prediction: 7 True Class: 7
Test 192 Prediction: 5 True Class: 5
Test 193 Prediction: 0 True Class: 0
Test 194 Prediction: 1 True Class: 1
Test 195 Prediction: 5 True Class: 5
Test 196 Prediction: 3 True Class: 3
Test 197 Prediction: 1 True Class: 1
Test 198 Prediction: 8 True Class: 8
Test 199 Prediction: 1 True Class: 6
Done!
Accuracy: 0.9300000000000007
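To see what the distance and prediction ops in the TensorFlow listing compute for a single test sample, here is a plain-Python sketch of the same idea on toy rows: sum the absolute differences along each training row (L1 distance), then take the index of the smallest sum. The helper name and the tiny data are illustrative; when distances tie, this sketch returns the first index, which the TensorFlow op is not guaranteed to do.

```python
def l1_nearest_index(train_rows, query):
    """Index of the training row with the smallest L1 distance to the query."""
    # One L1 distance per training row, like the row-wise sum in the graph.
    distances = [sum(abs(a - b) for a, b in zip(row, query)) for row in train_rows]
    # Index of the minimum distance, like the argmin over the distance vector.
    return min(range(len(distances)), key=distances.__getitem__)

train_rows = [[0, 0, 0], [1, 1, 1], [5, 5, 5]]
train_labels = ["zero", "one", "five"]

index = l1_nearest_index(train_rows, [1, 2, 1])
print(index, train_labels[index])  # 1 one
```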