TensorFlow from Scratch (1): Support Vector Machines (Part 2)

千殇_不哭

Article link: https://blog.csdn.net/breakMiracle/article/details/84963456

Everything in this article is based on the book 《Tensorflow机器学习实战指南》 (TensorFlow Machine Learning Cookbook).

Nonlinear support vector machine models

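This article continues from the previous one and covers nonlinear support vector machines. When the points in a dataset cannot be separated by a single hyperplane, we need to construct a nonlinear function to solve the problem, which brings us to the concept of the kernel function.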

Why introduce kernel functions

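If the data points are not linearly separable, the hyperplane $A^Tx + b = 0$ cannot split them. We therefore define a function $\phi(x)$ that maps the original $x$ into a higher-dimensional space. A simple example of such a mapping:
$\phi([x_1,x_2])=[x_1^2, \quad \sqrt{2}x_1x_2,\quad x_2^2]$
Why use a mapping like this? To enrich the features. The process is comparable to feature crossing in feature engineering: the new feature $x_1x_2$ is a cross of the two original features. (The factor $\sqrt{2}$ is chosen so that the inner product of two mapped vectors equals the degree-2 polynomial kernel listed below.) If mapping enriches the features, we could keep mapping into ever higher dimensions to improve the model. However, as the feature dimension grows, the dimension of $A$ has to grow with it. So we face a trade-off: adding dimensions enriches the features but increases the computational cost, while not adding dimensions makes it hard to improve the model.
It would be ideal to have a tool that obtains the result of the high-dimensional computation while working only in the low-dimensional space. A kernel function is such a tool:
$k(x_1,x_2)=\phi(x_1)^T\phi(x_2)$
Here $\phi(x)$ is a mapping that takes $x$ from the low-dimensional space to the high-dimensional one, and $x_1,x_2$ now denote two input vectors rather than the components of one vector. The kernel $k(x_1,x_2)$ says that, given only the two vectors $x_1,x_2$, we can obtain the inner product of their images in the high-dimensional space. Which functions satisfy this condition?
Linear kernel ( $\phi(x)=x$ ): $k(x_1,x_2)=x_1^Tx_2$
Polynomial kernel ( $\phi(x)$ consists of the degree-$d$ monomials of $x$; for $d=2$ in two dimensions it is the mapping shown above): $k(x_1,x_2)=(x_1^Tx_2)^d$
Gaussian kernel ( $\phi(x)$ is infinite-dimensional and has no finite explicit form): $k(x_1,x_2)=\exp(-\frac {||x_1-x_2||^2}{2\sigma^2})$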
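As a quick sanity check of the kernel idea, the following minimal NumPy sketch compares the inner product computed explicitly in the mapped space with the value of the degree-2 polynomial kernel computed in the original 2-D space. The helper names `phi`, `poly_kernel`, and `gaussian_kernel` and the sample vectors are illustrative assumptions, not part of the original article.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative helper)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly_kernel(x, z, d=2):
    """Homogeneous polynomial kernel (x^T z)^d."""
    return float(np.dot(x, z)) ** d

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel exp(-||x - z||^2 / (2 * sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

# Two arbitrary sample vectors (assumed values, for illustration only).
x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(z)))  # inner product in the 3-D feature space
implicit = poly_kernel(x, z, d=2)         # same quantity, computed in the 2-D input space
print(explicit, implicit)
print(gaussian_kernel(x, z))
```

The two printed polynomial values agree up to floating-point rounding, which is the point of the kernel: the 3-D feature vectors never need to be formed. For the Gaussian kernel no finite `phi` exists, so only the kernel form can be evaluated.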

Introducing the nonlinear SVM model

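We define a hyperplane in the feature space (that is, the space obtained after applying the mapping $\phi$):
$f(x)=A^T\phi(x)+b$
Note: the dimension of $A$ here equals the dimension of the high-dimensional feature space, which depends on the kernel we choose (even though the kernel does not appear in this formula).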

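Following the linear SVM model, the margin is $\frac {2}{||A||}$. Maximizing the margin is equivalent to minimizing $\frac{1}{2}||A||^2$, so the objective is:
$\min_{A,b}\frac{1}{2}||A||^2\quad\quad s.t.\ y_i(A^T\phi(x_i)+ b)\geq1,\ i=1,2,3,...,n$
To solve this problem we introduce its dual. For the details please refer to other material; here we state the dual problem directly:
$\max_{\alpha}\sum_{i=1}^n\alpha_i-\frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n\alpha_i\alpha_jy_iy_j\phi(x_i)^T\phi(x_j)\quad\quad s.t.\sum_{i=1}^n\alpha_iy_i=0, \alpha_i\geq0,i=1,2,...,n$
Replacing the high-dimensional inner product in the expression above with the kernel function gives
$\max_{\alpha}\sum_{i=1}^n\alpha_i-\frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n\alpha_i\alpha_jy_iy_jk(x_i,x_j)\quad\quad s.t.\sum_{i=1}^n\alpha_iy_i=0, \alpha_i\geq0,i=1,2,...,n$
At the optimum the following holds:
$A=\sum_{i=1}^n\alpha_iy_i\phi(x_i)$
Here $\alpha$ is the vector of Lagrange multipliers introduced by the dual, and it is the parameter to be optimized. It has one entry per training sample, so its dimension is $n$. $k(x_i,x_j)$ is the kernel function we choose. Maximizing the expression above with respect to $\alpha$ gives the solution of the model.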
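To make the dual objective concrete, here is a minimal NumPy sketch that evaluates it for a given $\alpha$ and checks the two constraints. It is only an evaluation routine under stated assumptions, not a solver: the function names, the four sample points, and the hand-picked `alpha` are all hypothetical, and `alpha` is not an optimal solution.

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) via broadcasting."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def dual_objective(alpha, y, K):
    """sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j K[i, j]."""
    ay = alpha * y
    return float(alpha.sum() - 0.5 * ay @ K @ ay)

def is_feasible(alpha, y, tol=1e-9):
    """Check alpha_i >= 0 and sum_i alpha_i y_i = 0."""
    return bool(np.all(alpha >= -tol) and abs(np.dot(alpha, y)) <= tol)

# Hypothetical toy data: four points with an XOR-like labelling.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, -1.0, -1.0, 1.0])
alpha = np.array([0.5, 0.5, 0.5, 0.5])  # hand-picked, feasible but not optimized

K = gaussian_kernel_matrix(X, sigma=1.0)
print(is_feasible(alpha, y), dual_objective(alpha, y, K))
```

Note that only the $n \times n$ matrix `K` enters the objective; the mapped features $\phi(x_i)$ never appear.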
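Substituting the solution back into the original equation:
$f(x)=A^T\phi(x)+b=\sum_{i=1}^n\alpha_iy_i\phi(x_i)^T\phi(x)+b=\sum_{i=1}^n\alpha_iy_ik(x_i,x) + b$
So throughout the whole process we never compute $A$ explicitly, because we do not need to know what the features look like in the high-dimensional space.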
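The kernelized decision function above can be sketched directly in NumPy. This is an illustrative example only: `rbf` and `decision_function` are hypothetical helper names, and the training points, `alpha`, and `b` are hand-picked values rather than the result of solving the dual problem.

```python
import numpy as np

def rbf(x, z, sigma=1.0):
    """Gaussian kernel exp(-||x - z||^2 / (2 * sigma^2))."""
    d = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))

def decision_function(x, X_train, y_train, alpha, b, kernel):
    """f(x) = sum_i alpha_i * y_i * k(x_i, x) + b, without ever forming A."""
    total = 0.0
    for a_i, y_i, x_i in zip(alpha, y_train, X_train):
        total += a_i * y_i * kernel(x_i, x)
    return total + b

# Hypothetical two-point training set with hand-picked multipliers.
X_train = np.array([[0.0, 0.0], [2.0, 0.0]])
y_train = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])
b = 0.0

for point in ([0.0, 0.0], [2.0, 0.0], [1.0, 0.0]):
    print(point, decision_function(point, X_train, y_train, alpha, b, rbf))
```

The sign of the returned value gives the predicted class. A point close to the positive training sample gets a positive score, a point close to the negative one gets a negative score, and the midpoint scores zero by symmetry.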

The code follows.

# -*- coding: utf-8 -*-
# @Time    : 2018/12/8 17:27
# @Author  : chaucerhou


import matplotlib.pyplot as plt
import numpy as np
# Note: this script uses the TensorFlow 1.x API (tf.Session, tf.placeholder).
import tensorflow as tf
from sklearn import datasets

sess = tf.Session()

(x_vals, y_vals) = datasets.make_circles(n_samples=500, factor=0.5, noise=.1)
y_vals = np.array([1 if y == 1 else -1 for y in y_vals])
class1_x = [x[0] for i, x in enumerate(x_vals) if y_vals[i] == 1]
class1_y = [x[1] for i, x in enumerate(x_vals) if y_vals[i] == 1]

class2_x = [x[0] for i, x in enumerate(x_vals) if y_vals[i] == -1]
class2_y = [x[1] for i, x in enumerate(x_vals) if y_vals[i] == -1]

batch_size = 200
x_data = tf.placeholder(shape=[None, 2], dtype=tf.float32)
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32)

prediction_grid = tf.placeholder(shape=[None, 2], dtype=tf.float32)
# b plays the role of alpha in the formulas above: one dual variable per sample in the batch
b = tf.Variable(tf.random_normal(shape=[1, batch_size]))

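# Gaussian kernel: exp(gamma * ||x_i - x_j||^2), where gamma = -1 / (2 * sigma^2)
gamma = tf.constant([-50.])
dist = tf.reduce_sum(tf.square(x_data), 1)
dist = tf.reshape(dist, [-1, 1])
# ||x_i - x_j||^2 = ||x_i||^2 - 2 * x_i . x_j + ||x_j||^2
sq_dists = tf.add(tf.subtract(dist, tf.multiply(2., tf.matmul(x_data, tf.transpose(x_data)))), tf.transpose(dist))
my_kernel = tf.exp(tf.multiply(gamma, tf.abs(sq_dists)))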

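# Note: model_output is defined here but not used later in this script.
model_output = tf.matmul(b, my_kernel)
# Dual objective: sum(alpha) - sum_ij(alpha_i * alpha_j * y_i * y_j * k(x_i, x_j)).
# This code omits the 1/2 factor that appears in the formula in the text.
first_term = tf.reduce_sum(b)
b_vec_cross = tf.matmul(tf.transpose(b), b)
y_target_cross = tf.matmul(y_target, tf.transpose(y_target))
second_term = tf.reduce_sum(tf.multiply(my_kernel, tf.multiply(b_vec_cross, y_target_cross)))
# Negate so that minimizing the loss maximizes the dual objective.
loss = tf.negative(tf.subtract(first_term, second_term))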


# Create the prediction and accuracy functions

rA = tf.reshape(tf.reduce_sum(tf.square(x_data), 1), [-1, 1])
rB = tf.reshape(tf.reduce_sum(tf.square(prediction_grid), 1), [-1, 1])

pred_sq_dist = tf.add(tf.subtract(rA, tf.multiply(2., tf.matmul(x_data, tf.transpose(prediction_grid)))), tf.transpose(rB))

pred_kernel = tf.exp(tf.multiply(gamma, tf.abs(pred_sq_dist)))

prediction_output = tf.matmul(tf.multiply(tf.transpose(y_target), b), pred_kernel)


prediction = tf.sign(prediction_output - tf.reduce_mean(prediction_output))

accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.squeeze(prediction), tf.squeeze(y_target)), tf.float32))

# Create the optimizer

my_opt = tf.train.GradientDescentOptimizer(0.002)
train_step = my_opt.minimize(loss)

init = tf.global_variables_initializer()
sess.run(init)


loss_vec = []

batch_accuracy = []
for i in range(1000):
    rand_index = np.random.choice(len(x_vals), batch_size)
    rand_x = x_vals[rand_index]
    rand_y = np.transpose([y_vals[rand_index]])

    sess.run(train_step, feed_dict={x_data:rand_x, y_target:rand_y})

    temp_loss = sess.run(loss, feed_dict={x_data:rand_x, y_target:rand_y})
    loss_vec.append(temp_loss)
    #print("prediction_output:",  sess.run(prediction_output, feed_dict={x_data:rand_x, y_target:rand_y, prediction_grid:rand_x}))
    #print("prediction_output:",  sess.run(prediction, feed_dict={x_data: rand_x, y_target: rand_y, prediction_grid: rand_x}))
    acc_temp = sess.run(accuracy, feed_dict={x_data:rand_x, y_target:rand_y, prediction_grid:rand_x})

    batch_accuracy.append(acc_temp)

    if (i + 1) % 100 == 0:
        print("Step # " + str(i + 1))
        print("Loss = " + str(temp_loss))


x_min, x_max = x_vals[:, 0].min() - 1, x_vals[:, 0].max() + 1
y_min, y_max = x_vals[:, 1].min() - 1, x_vals[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

grid_points = np.c_[xx.ravel(), yy.ravel()]
[grid_prediction] = sess.run(prediction, feed_dict={x_data:rand_x, y_target:rand_y, prediction_grid:grid_points})
grid_predictions = grid_prediction.reshape(xx.shape)


plt.contourf(xx, yy, grid_predictions, cmap=plt.cm.Paired, alpha=0.8)
plt.plot(class1_x, class1_y, "ro", label="Class 1")
plt.plot(class2_x, class2_y, "kx", label="Class -1")
plt.legend(loc="lower right")
plt.ylim([-1.5, 1.5])
plt.xlim([-1.5, 1.5])
plt.show()


plt.plot(batch_accuracy, "k-", label="Accuracy")
plt.title("Batch Accuracy")
plt.xlabel("Generation")
plt.ylabel("Accuracy")
plt.legend(loc="lower right")
plt.show()

plt.plot(loss_vec, "k-")
plt.title("Loss Per Generation")
plt.xlabel("Generation")
plt.ylabel("Loss")
plt.show()
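
Two details of the listing above are easy to miss: the pairwise squared distances are built from the expansion $||a-b||^2=||a||^2-2a^Tb+||b||^2$ rather than from explicit differences, and `gamma = -50` corresponds to $\sigma = 0.1$ in the Gaussian kernel formula, since $\gamma=-\frac{1}{2\sigma^2}$. The following NumPy sketch checks both claims on random data. The function name `pairwise_sq_dists` and the random inputs are assumptions made for illustration, and NumPy stands in for the TensorFlow ops.

```python
import numpy as np

def pairwise_sq_dists(A, B):
    """Squared Euclidean distances between rows of A and rows of B,
    using ||a - b||^2 = ||a||^2 - 2 * a.b + ||b||^2 (same layout as rA / rB above)."""
    rA = np.sum(A ** 2, axis=1).reshape(-1, 1)
    rB = np.sum(B ** 2, axis=1).reshape(-1, 1)
    return rA - 2.0 * A @ B.T + rB.T

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))  # stand-in for a training batch
B = rng.normal(size=(3, 2))  # stand-in for prediction points

fast = pairwise_sq_dists(A, B)
slow = np.array([[np.sum((a - b) ** 2) for b in B] for a in A])  # direct definition

gamma = -50.0
sigma = np.sqrt(-1.0 / (2.0 * gamma))  # sigma implied by gamma
kernel_gamma = np.exp(gamma * fast)
kernel_sigma = np.exp(-fast / (2.0 * sigma ** 2))

print(np.allclose(fast, slow), sigma, np.allclose(kernel_gamma, kernel_sigma))
```

Such a small $\sigma$ makes the kernel very local, so each point mainly influences predictions in its immediate neighborhood.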

The resulting plots are shown below:
