Preface
Honestly, after all this time studying SVM I have never once used it as the final method in a real experiment, because... its results are just really, really poor!
But intuition tells me that one day the right paper will come along for me, riding in on an SVM, so I'll keep studying it properly. This revisit comes after a semester of Pattern Recognition and a semester of Matrix Analysis, so the derivations should sink in deeper... right?
This post continues the earlier intuitive introduction to SVM; interested readers can go back and take a look:
白歌: Code to help understand SVM+RBF (Python) — zhuanlan.zhihu.com. Just run it briefly to get a feel for it, then come back. Below is a flow chart of how SVM accumulated its parts as requirements grew, to give a rough idea of what each component is for:
![c79a7240e730fb40d56ab59a355dea8d.png](https://i-blog.csdnimg.cn/blog_migrate/c0d02d1f9436b2e1eb395752a603bafa.jpeg)
Support Vectors
First, SVM is short for Support Vector Machine — so what is a support vector? Look:
![7ee63e77ff6eac420e0762eefeeeca21.png](https://i-blog.csdnimg.cn/blog_migrate/b8b8615f6c0f440b3ab4f248305af663.jpeg)
The support vectors are the sample points closest to the classification boundary S in the figure. The line they determine (more generally, a hyperplane — look the term up if it's unfamiliar) is the boundary between the two classes. As for why a normal vector determines a hyperplane, see below:
The Normal Vector Determines the Hyperplane
Let the normal vector SV have coordinates

$$SV = (SV_1, \dots, SV_n)$$

and let a point on the hyperplane have coordinates

$$X' = (x_1', \dots, x_n')$$

Since the normal vector is perpendicular to the hyperplane, $SV \perp S$, every such point satisfies

$$SV_1 x_1' + \dots + SV_n x_n' = 0$$

that is:

$$SV^T X' = 0$$

The normal vector is exactly perpendicular to the hyperplane, and the equation above shows that the components of the normal vector are precisely the coefficients of the hyperplane. So once the SVM has found the support vectors (and through them the normal vector), it has found the classification boundary.
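As a quick numeric sanity check (a minimal sketch with made-up numbers, not from the post): pick a normal vector SV in 3-D, build two independent vectors orthogonal to it, and confirm that every combination of them — that is, every sampled point of the hyperplane through the origin — satisfies $SV^T X' = 0$:

```python
import numpy as np

# A hypothetical normal vector SV in R^3 (illustrative values only).
SV = np.array([2.0, -1.0, 1.0])

# Two independent vectors orthogonal to SV; they span the hyperplane
# through the origin whose normal is SV.
u = np.array([1.0, 2.0, 0.0])   # SV . u = 2 - 2 + 0 = 0
v = np.array([0.0, 1.0, 1.0])   # SV . v = 0 - 1 + 1 = 0

# Every point on the hyperplane is a combination a*u + b*v,
# and each one satisfies SV^T X' = 0.
for a, b in [(1, 0), (0, 1), (3, -2), (0.5, 4)]:
    x_prime = a * u + b * v
    assert abs(SV @ x_prime) < 1e-12
print("all sampled hyperplane points satisfy SV^T X' = 0")
```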
The Linearly Separable Case
Same figure again, look:
![9c79ecdfac32ba2c40002aa4943efc61.png](https://i-blog.csdnimg.cn/blog_migrate/758610810e858c8ea1f710c3a59d9371.jpeg)
Suppose a class-1 sample point is

$$X_{\in 1} = (x_1, \dots, x_n)$$

Its projection onto the direction of the normal vector SV has length

$$\|\vec{OP_{SV}}\| = \|\vec{OX_{\in 1}}\| \cos\angle P_{SV}OX_{\in 1} = \frac{SV^T X_{\in 1}}{\|SV\|}$$

This projection length is the signed distance from the point $X_{\in 1}$ to the origin measured along the normal direction. Now fix the scale of SV so that for the class-1 support vectors

$$SV^T X_{\in 1} = SV^T \vec{OP_{SV}} = \|SV\| \cdot \|\vec{OP_{SV}}\| = 1$$

and likewise, projecting the class-0 points $X_{\in 0}$ onto the opposite side of $\vec{OP_{SV}}$,

$$SV^T X_{\in 0} = -1$$

Under this convention, the smaller $\|SV\|$ is, the larger the projection length $\|\vec{OP_{SV}}\| = 1/\|SV\|$ — that is, the wider the margin. So training becomes the constrained optimization

$$\min \ \frac{1}{n}\left(SV^T SV\right)$$

$$\text{std1:} \ SV^T X_{\in 1} = 1$$

$$\text{std2:} \ SV^T X_{\in 0} = -1$$

where

$$SV = (SV_1, \dots, SV_n), \qquad X = (X_{\in 1}, X_{\in 0})$$

Introducing Lagrange multipliers turns this into the Lagrangian

$$l = \frac{1}{n}\sum \left(SV^T SV\right) + \sum \sigma_{\in 1}\left(SV^T X_{\in 1} - 1\right) + \sum \sigma_{\in 0}\left(SV^T X_{\in 0} + 1\right)$$

where $\sigma_{\in 1/0}$ are the Lagrange multipliers for the two sets of constraints.
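The scaling convention can be checked numerically. Below is a minimal sketch with a hypothetical 2-D toy set and a hand-scaled SV (my own illustrative numbers): the class-1 points project to +1, the class-0 points to -1, and the margin width between the two supporting hyperplanes comes out as $2/\|SV\|$:

```python
import numpy as np

# Toy linearly separable data in 2-D (hypothetical points).
X_in1 = np.array([[2.0, 2.0], [3.0, 1.0]])      # class 1
X_in0 = np.array([[-2.0, -2.0], [-1.0, -3.0]])  # class 0

# A candidate normal vector scaled so the boundary points of each class
# satisfy SV^T x = +1 / -1 (the scaling convention from the text).
SV = np.array([0.25, 0.25])

# Check the support-vector constraints.
print(X_in1 @ SV)   # projections of class-1 points: all 1.0
print(X_in0 @ SV)   # projections of class-0 points: all -1.0

# Margin width between the two supporting hyperplanes is 2 / ||SV||.
margin = 2.0 / np.linalg.norm(SV)
print(round(margin, 3))
```

Shrinking $\|SV\|$ (while keeping the constraints feasible) would widen this margin, which is exactly what the minimization objective rewards.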
The Linearly Non-separable Case
Look again at the Lagrangian above. The penalty terms

$$\sum \sigma_{\in 1}\left(SV^T X_{\in 1} - 1\right), \qquad \sum \sigma_{\in 0}\left(SV^T X_{\in 0} + 1\right)$$

vanish only when every sample satisfies its margin constraint, leaving just the regularizing term

$$\frac{1}{n}\sum \left(SV^T SV\right)$$

When the data are not linearly separable, however, no choice of SV can satisfy all the constraints at once:
![d62bdc57aef9b73e398f1962b85916ca.png](https://i-blog.csdnimg.cn/blog_migrate/2aab7b731bcfeb0b5590d9494f38c949.jpeg)
So in this situation the Lagrangian is used more as a loss function for the SVM, and it gets rewritten as:

$$loss = \frac{1}{n}\sum \left(SV^T SV\right) + \sum \left( y_1 \cdot cost(\hat{y}, y_1) + (1 - y_1) \cdot cost(\hat{y}, y_0) \right)$$

where $y_{1/0} \in \{0, 1\}$ are the true labels, $\hat{y}$ is the model's prediction, and $cost(\hat{y}, y)$ is a cost function measuring how far the prediction is from the label. Taking the cost to be a cross entropy gives

$$loss = \frac{1}{n}\sum \left(SV^T SV\right) - \sum \left( P(y_1|X)\ln(P(\hat{y}_1|X)) + (1 - P(y_0|X))\ln(P(\hat{y}_0|X)) \right)$$
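A minimal NumPy sketch of this loss, assuming a logistic link $\sigma(SV^T x)$ as the predicted probability $P(\hat{y}=1|X)$ — that choice is my assumption; the post does not fix how $\hat{y}$ is produced:

```python
import numpy as np

def svm_ce_loss(SV, X, y):
    """Regularizer (1/n) SV^T SV plus a cross-entropy data term.

    Assumes p_hat = sigmoid(SV^T x) as the predicted P(y=1|X);
    this link function is an illustrative choice, not from the post.
    """
    n = len(y)
    p_hat = 1.0 / (1.0 + np.exp(-(X @ SV)))   # predicted P(y=1|X)
    eps = 1e-12                               # avoid log(0)
    ce = -(y * np.log(p_hat + eps) + (1 - y) * np.log(1 - p_hat + eps))
    return (SV @ SV) / n + ce.sum()

# Tiny toy example (hypothetical numbers).
X = np.array([[2.0, 2.0], [-2.0, -2.0]])
y = np.array([1, 0])
SV = np.array([0.5, 0.5])
print(round(svm_ce_loss(SV, X, y), 4))  # regularizer + data misfit
```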
Cross entropy is defined as follows:

Cross Entropy is an important concept in Shannon's information theory, mainly used to measure the difference between two probability distributions. — Baidu Baike

The formula is:

$$En(y_1, y_2) = -y_1 \cdot \ln(y_2)$$

As for exactly why this function captures the difference between two probability distributions, I haven't studied it carefully yet, so let's just pretend to understand and take it as given. Here $y_1$ plays the role of the true distribution (the label) and $y_2$ the predicted one. Plotting its negation

$$-En(y_1, y_2) = y_1 \cdot \ln(y_2)$$
![047ee5dcb5d77002c14578751a22afa4.png](https://i-blog.csdnimg.cn/blog_migrate/2c95825a93af1ca415879ec25973a860.jpeg)
From the figure: along each horizontal line (that is, for a fixed $y_1$), the closer the prediction $y_2$ gets to 1, the larger $y_1 \cdot \ln(y_2)$ becomes and hence the smaller the cross entropy $En(y_1, y_2)$ — a confident, correct prediction yields the lowest loss.
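A quick numeric illustration of that claim (toy values of my own choosing): with the target fixed at $y_1 = 1$, $En(y_1, y_2) = -y_1 \ln(y_2)$ shrinks monotonically as the prediction $y_2$ approaches 1:

```python
import numpy as np

# Cross entropy for a single target/prediction pair, as in the formula above.
def En(y1, y2):
    return -y1 * np.log(y2)

# Fix the target y1 = 1 and let the prediction y2 climb toward it.
y2_values = np.array([0.1, 0.5, 0.9, 0.99])
losses = En(1.0, y2_values)
print(losses)  # each entry smaller than the last
```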
Next post: the SVM dual problem.