SVM Classification of the Matchmaker Dataset in Chapter 9 of "Programming Collective Intelligence"

Latest recommended article published 2021-06-25 12:30:39

beifeng600


Reposted from http://muilpin.blog.163.com/blog/static/165382936201131875249123/

SVM classification of the matchmaker dataset in "Programming Collective Intelligence"

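The book was written quite a while ago, so the LIBSVM interface it uses differs considerably from the current one:

1. The official LIBSVM download mentioned in the book has since been updated to version 3.x, which does not match the book's code. (After a lot of digging I found the interface is very different. I recommend that readers not use the latest version; if you really need it, see item 4 below.)

2. Others online have reported success with libsvm 2.89 on Python 2.6, so I followed their approach. Two steps: copy svmc.pyd from libsvm-2.89\windows\python to C:\Python26\DLLs; copy svm.py from libsvm-2.89\python to C:\Python26\Lib. After that, from svm import * succeeds. (This tip is not my own, but here is a download link for libsvm 2.89: http://ishare.iask.sina.com.cn/f/6344231.html)

3. For a very simple introduction to SVMs, see the guide file available from the LIBSVM site that the book points to; it is concise and well written.

4. I spent an afternoon getting version 3.x to work. If you insist on using 3.x, the following steps may help:

(1) Download the libsvm package, unpack it, and copy libsvm.dll from the windows folder to C:\WINDOWS\System32;

(2) Copy svmutil.py from the package's python folder into your working directory. (The same applies if you prefer svm.py; the examples below use svmutil.py. See the README in the python folder for the differences.)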

(3) One difference from the book: svm_parameter() is now used differently, as follows:

Usage: param = svm_parameter('training options')

Example: param = svm_parameter('-s 3 -c 5 -h 0')

The available options are:

options:

-s svm_type : set type of SVM (default 0)

0 -- C-SVC

1 -- nu-SVC

2 -- one-class SVM

3 -- epsilon-SVR

4 -- nu-SVR

-t kernel_type : set type of kernel function (default 2)

0 -- linear: u'*v

1 -- polynomial: (gamma*u'*v + coef0)^degree

2 -- radial basis function: exp(-gamma*|u-v|^2)

3 -- sigmoid: tanh(gamma*u'*v + coef0)

4 -- precomputed kernel (kernel values in training_set_file)

-d degree : set degree in kernel function (default 3)

-g gamma : set gamma in kernel function (default 1/num_features)

-r coef0 : set coef0 in kernel function (default 0)

-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)

-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)

-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)

-m cachesize : set cache memory size in MB (default 100)

-e epsilon : set tolerance of termination criterion (default 0.001)

-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)

-b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)

-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)

-v n: n-fold cross validation mode

-q : quiet mode (no outputs)
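The option string can also be assembled programmatically rather than typed by hand. The sketch below uses a hypothetical helper, build_options, which is not part of LIBSVM; it only accepts the value-taking flag letters from the list above and joins them into the string that svm_parameter expects.

```python
def build_options(**flags):
    """Build a LIBSVM option string such as '-s 3 -c 5 -h 0'.

    build_options is a hypothetical helper, not part of LIBSVM.
    Keys are flag letters from the option list (s, t, d, g, r, c, ...).
    """
    allowed = ['s', 't', 'd', 'g', 'r', 'c', 'n', 'p', 'm', 'e', 'h', 'b', 'v']
    unknown = set(flags) - set(allowed)
    if unknown:
        raise ValueError('unknown option(s): %s' % ', '.join(sorted(unknown)))
    # Emit flags in the order of the option list so the output is deterministic.
    return ' '.join('-%s %s' % (k, flags[k]) for k in allowed if k in flags)


options = build_options(s=3, c=5, h=0)
print(options)  # -s 3 -c 5 -h 0
# With LIBSVM 3.x available: param = svm_parameter(options)
```

Rejecting unknown letters up front catches typos before they reach LIBSVM; the -wi and -q flags are left out of this sketch because they do not follow the simple "letter value" pattern.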

(4) svm_model is now a class and cannot be called directly, but a model can be obtained in either of these two ways:

>>> model = svm_train(prob, param)

>>> model = svm_load_model('model_file_name') # load a previously saved model

(5) One more point needs special attention: this usage from the book no longer works:

>>> newrow=[28.0,-1,-1,26.0,-1,1,2,0.8] # Man doesn't want children, woman does

>>> m.predict(scalef(newrow))

It can be updated to:

>>> newrow=[(28.0,-1,-1,26.0,-1,1,2,0.8)] # note the extra tuple parentheses '()'

>>> svm_predict([0]*len(newrow),newrow,m) # m is the model returned by svm_train; the README describes the first argument as follows:

a list/tuple of l true labels (type must be int/double). It is used for calculating the accuracy. Use [0]*len(x) if true labels are unavailable.
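Putting points (3) to (5) together, a 3.x training and prediction run might look like the sketch below. It assumes svmutil.py is importable as described in step (2); split_rows, dummy_labels and train_and_predict are hypothetical helper names, and rows are assumed to be plain lists of the form [feature, ..., label] rather than the book's matchrow objects.

```python
try:
    # Assumes svmutil.py from the LIBSVM 3.x python folder is on the path.
    from svmutil import svm_problem, svm_parameter, svm_train, svm_predict
except ImportError:
    svm_train = None  # LIBSVM not available; only the pure helpers work


def split_rows(rows):
    """Split rows of [features..., label] into (labels, feature lists)."""
    labels = [row[-1] for row in rows]
    features = [list(row[:-1]) for row in rows]
    return labels, features


def dummy_labels(samples):
    """Placeholder labels for svm_predict when true labels are unknown."""
    return [0] * len(samples)


def train_and_predict(rows, newrows, options='-t 2 -q'):
    """Train on rows and return the predicted labels for newrows."""
    if svm_train is None:
        raise RuntimeError('LIBSVM python interface not found')
    labels, features = split_rows(rows)
    model = svm_train(svm_problem(labels, features), svm_parameter(options))
    # svm_predict returns (predicted labels, accuracy info, decision values).
    p_labels, p_acc, p_vals = svm_predict(dummy_labels(newrows), newrows, model)
    return p_labels
```

Because the labels passed to svm_predict are placeholders, the accuracy it reports is meaningless here; only the returned labels are of interest.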

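5. If you downloaded version 3.x, read the README in the package carefully; it describes how each function is used, although the explanations feel incomplete.

6. Chapter 9 of the book also contains some errors, as in the following code: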

def scaledata(rows):
    low=[999999999.0]*len(rows[0].data)
    high=[-999999999.0]*len(rows[0].data)
    # Find the lowest and highest values
    for row in rows:
        d=row.data
        for i in range(len(d)):
            if d[i]<low[i]: low[i]=d[i]
            if d[i]>high[i]: high[i]=d[i]

    # Create a function that scales data
    def scaleinput(d):
        return [(d.data[i]-low[i])/(high[i]-low[i]) for i in range(len(low))] # possible errors (1)(2)

    # Scale all the data
    newrows=[matchrow(scaleinput(row.data)+[row.match]) for row in rows]

    # Return the new data and the function
    return newrows,scaleinput

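Possible error (1): if you use the author's earlier placeholder distance function milesdistance():

def milesdistance(a1,a2):
    return 0

then the distance column is 0 in every row, so high[i]-low[i] is 0 and the division fails. My workarounds: 1. generate a random number in [0,1]; 2. add 0.000000001 to the denominator. Obtaining real distances from Yahoo works fine, though.

Error (2): d.data[i] fails because scaleinput is passed row.data, which is already a list; it should be changed to d[i].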
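A corrected version might look like the sketch below. It applies fix (2) directly and, for error (1), swaps in a different guard from the two workarounds above: a column whose values are all equal is mapped to 0.0 instead of being divided. The matchrow class here is a minimal stand-in written for this example, not the book's full class.

```python
class matchrow:
    """Minimal stand-in for the book's matchrow: features plus a label."""
    def __init__(self, row):
        self.data = [float(v) for v in row[:-1]]
        self.match = int(row[-1])


def scaledata(rows):
    low = [999999999.0] * len(rows[0].data)
    high = [-999999999.0] * len(rows[0].data)
    # Find the lowest and highest values in each column
    for row in rows:
        d = row.data
        for i in range(len(d)):
            if d[i] < low[i]: low[i] = d[i]
            if d[i] > high[i]: high[i] = d[i]

    def scaleinput(d):
        # d is a plain list, so index it directly (fix 2).
        # A constant column has high == low; map it to 0.0 (guard for error 1).
        return [(d[i] - low[i]) / (high[i] - low[i]) if high[i] != low[i] else 0.0
                for i in range(len(low))]

    # Scale all the data, keeping each row's label
    newrows = [matchrow(scaleinput(row.data) + [row.match]) for row in rows]
    return newrows, scaleinput


rows = [matchrow([1, 5, 0]), matchrow([3, 5, 1])]
scaled, scalef = scaledata(rows)
print(scaled[1].data)        # [1.0, 0.0]
print(scalef([2.0, 5.0]))    # [0.5, 0.0]
```

Mapping a constant column to 0.0 keeps the scaling deterministic, whereas adding a tiny epsilon to the denominator produces the same zeros only because the numerator is also zero for those columns.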

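The code that accompanies the dot-product formula in Appendix B also contains an error:

def veclength(a):
    return sum([a[i] for i in range(len(a))])**.5

The length (norm) of a multidimensional vector should sum a[i]**2, not a[i].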
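With that correction applied, the function computes the Euclidean norm, the square root of the sum of squared components. A minimal corrected sketch:

```python
def veclength(a):
    """Euclidean length: square root of the sum of squared components."""
    return sum([a[i] ** 2 for i in range(len(a))]) ** .5


print(veclength([3, 4]))  # 5.0
```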
