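libsvm installation reference: http://www.cnblogs.com/liangxw1987/archive/2012/11/26/2788850.html
Reference on using libsvm-format data with xgboost: http://blog.csdn.net/john159151/article/details/45549143
There is also a tool called phraug that converts between CSV and libsvm formats.
Step 1: Download the libsvm package from the official site
Official site: http://www.csie.ntu.edu.tw/~cjlin/libsvm/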
$ wget -r -O libsvm+tar.gz "http://www.csie.ntu.edu.tw/~cjlin/cgi-bin/libsvm.cgi?+http://www.csie.ntu.edu.tw/~cjlin/libsvm+tar.gz"
$ tar -zxvf libsvm+tar.gz
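Step 2: Build libsvm in the libsvm-3.22 directory
In the ***/libsvm-3.22 directory, run
$ make lib # builds libsvm.so.2
In the ***/libsvm-3.22/python directory, run
$ make
Copy the *.py files from ***/libsvm-3.22/python and the libsvm.so.2 file from ***/libsvm-3.22 into the Python installation. The library goes one directory above the .py files, because svm.py in this release appears to look for ../libsvm.so.2 relative to its own location.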
$ sudo cp *.py /usr/lib/python2.7/dist-packages/
$ cd ..
$ sudo cp libsvm.so.2 /usr/lib/python2.7/
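The path /usr/lib/python2.7/dist-packages/ used above is specific to that machine. As a rough sketch, the snippet below lists the entries on the interpreter's import path that look like package directories, so the copy target can be checked first. The helper name find_package_dirs is made up for illustration.

```python
import sys


def find_package_dirs(paths=None):
    """Return import-path entries that end in site-packages or dist-packages."""
    if paths is None:
        paths = sys.path
    suffixes = ("site-packages", "dist-packages")
    # Strip a trailing slash so "/x/site-packages/" is matched as well.
    return [p for p in paths if p.rstrip("/\\").endswith(suffixes)]


if __name__ == "__main__":
    for directory in find_package_dirs():
        print(directory)
```

Run it with the same interpreter that will import svm and svmutil, since each Python installation has its own import path.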
Step 3: Check that the installation succeeded
Try importing the packages in a Python file
import svm
import svmutil
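If the bare imports fail, it helps to see which module failed and why. A minimal sketch, assuming the module names svm and svmutil from the step above; check_modules is a hypothetical helper, not part of libsvm.

```python
import importlib


def check_modules(names):
    """Try to import each name; map it to None on success or to an error string."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:
            # Covers a missing module as well as a failure to load libsvm.so.2.
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    for name, error in sorted(check_modules(["svm", "svmutil"]).items()):
        print("%s: %s" % (name, "OK" if error is None else error))
```

An error that mentions the LIBSVM library rather than the module itself usually points at the location of libsvm.so.2, not at the copied .py files.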
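About libsvm
Its data format is shown below.
Note: the libsvm file has one line per row of the training DataFrame, and index_i identifies which feature column a value belongs to.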
<label> <index1>:<value1> <index2>:<value2> .......
For example
1 1:2.927699e+01 2:1.072510e+02 3:1.149632e-01 4:1.077885e+02
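To make the format concrete, here is a minimal sketch that builds and parses one such line by hand. The function names are made up for illustration; in practice svm_read_problem or sklearn's dump_svmlight_file does this work. The format is sparse, so a feature whose value is zero can simply be left out of the line.

```python
def format_libsvm_line(label, features):
    """Build one line from a label and a {index: value} dict, indices ascending."""
    pairs = ["%d:%s" % (index, float(features[index])) for index in sorted(features)]
    return " ".join([str(label)] + pairs)


def parse_libsvm_line(line):
    """Split one line into (label, {index: value})."""
    fields = line.split()
    label = float(fields[0])
    features = {}
    for field in fields[1:]:
        index, value = field.split(":")
        features[int(index)] = float(value)
    return label, features


if __name__ == "__main__":
    line = "1 1:2.927699e+01 2:1.072510e+02 3:1.149632e-01 4:1.077885e+02"
    label, features = parse_libsvm_line(line)
    print(label)
    print(features)
    print(format_libsvm_line(1, {1: 0.5, 3: 2.0}))
```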
A simple example of converting a DataFrame to a libsvm file
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
from sklearn.datasets import dump_svmlight_file
from svmutil import svm_read_problem

# Build a small DataFrame and write it out in libsvm format
df = pd.DataFrame()
df['feature1'] = np.random.rand(10,)
df['feature2'] = np.random.rand(10,)
df['feature3'] = np.random.rand(10,)
# list(...) keeps this working on Python 3, where map() returns an iterator
df['label'] = list(map(lambda x: -1 if x < 0.5 else 1, np.random.rand(10,)))
train_x = df[np.setdiff1d(df.columns, ['label'])]  # every column except the label
train_y = df.label
print(df)
# The ./data directory must already exist
dump_svmlight_file(train_x, train_y, './data/smvlight.libsvm', zero_based=True, multilabel=False)
train_y, train_x = svm_read_problem('./data/smvlight.libsvm')  # note: the first returned list holds the labels
print(train_y)  # type = list
print(train_x)  # type = list
Output
tensorflow@NoNo:~/py_workspace/code_test$ python libsvm_test.py
   feature1  feature2  feature3  label
0  0.658051  0.231439  0.959484     -1
1  0.946773  0.162317  0.019349      1
2  0.492598  0.538605  0.487779      1
3  0.878717  0.180026  0.317419      1
4  0.274376  0.757067  0.763130     -1
5  0.028619  0.756345  0.384797      1
6  0.297084  0.037591  0.170282     -1
7  0.690053  0.772461  0.771781     -1
8  0.552999  0.006163  0.194889     -1
9  0.121132  0.784318  0.213316      1
[-1.0, 1.0, 1.0, 1.0, -1.0, 1.0, -1.0, -1.0, -1.0, 1.0]
[{0: 0.6580514571148101, 1: 0.2314389067431816, 2: 0.9594842762467438},
{0: 0.9467733677685749, 1: 0.1623167175795766, 2: 0.01934886005436343},
{0: 0.4925978580764161, 1: 0.5386046263984983, 2: 0.4877788468915214},
{0: 0.8787172500390132, 1: 0.1800264331531124, 2: 0.3174192448003894},
{0: 0.2743762486602979, 1: 0.7570674932547728, 2: 0.763130398639558},
{0: 0.02861937544524562, 1: 0.7563446388879407, 2: 0.3847974464441019},
{0: 0.2970842179706645, 1: 0.03759119176879699, 2: 0.1702815123822912},
{0: 0.6900527245965673, 1: 0.7724610030039325, 2: 0.7717805678493267},
{0: 0.552998651635158, 1: 0.006163155707424983, 2: 0.1948893427641173},
{0: 0.1211315661039534, 1: 0.7843176563346573, 2: 0.2133161959586511}]
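The feature rows come back as a list of {index: value} dicts, as the output above shows. When a dense matrix is needed again, they can be expanded roughly as follows. This is a sketch with a hypothetical helper name; the zero_based flag mirrors the zero_based=True passed to dump_svmlight_file, and files written for libsvm itself conventionally start their indices at 1.

```python
def sparse_rows_to_dense(rows, n_features, zero_based=True):
    """Turn a list of {index: value} dicts into a list of equal-length lists."""
    offset = 0 if zero_based else 1
    dense = []
    for row in rows:
        values = [0.0] * n_features  # indices absent from the dict are zeros
        for index, value in row.items():
            position = index - offset
            if not 0 <= position < n_features:
                raise ValueError("feature index %d is out of range" % index)
            values[position] = value
        dense.append(values)
    return dense


if __name__ == "__main__":
    rows = [{0: 0.5, 2: 1.5}, {1: 2.0}]
    for values in sparse_rows_to_dense(rows, n_features=3):
        print(values)
```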