1. dataframe to libsvm
First, let's look at the target data:
2.000000	1.000000	38.500000	54.000000	20.000000	0.000000	1.000000	2.000000	2.000000	3.000000	4.000000	1.000000	2.000000	2.000000	5.900000	0.000000	2.000000	42.000000	6.300000	0.000000	0.000000	1.000000
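Each row has 22 tab-separated columns in total, and the last column is the label.
We first read the data into a DataFrame (of course, you could also convert it to libsvm directly), then write each row out in libsvm format: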
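As a quick illustration of this layout, the sketch below splits one tab-separated line into its 21 feature values and the trailing label. The function name `split_features_label` is hypothetical (not from the original code), and it assumes the fields are tab-separated, as the reading code in this section expects.

```python
def split_features_label(line):
    """Split one tab-separated line into (features, label).

    Hypothetical helper: assumes the last field is the label
    and every other field is a numeric feature.
    """
    values = [float(v) for v in line.strip().split('\t')]
    return values[:-1], values[-1]


# The sample row shown above, rebuilt as a tab-separated string
sample = "\t".join([
    "2.000000", "1.000000", "38.500000", "54.000000", "20.000000",
    "0.000000", "1.000000", "2.000000", "2.000000", "3.000000",
    "4.000000", "1.000000", "2.000000", "2.000000", "5.900000",
    "0.000000", "2.000000", "42.000000", "6.300000", "0.000000",
    "0.000000", "1.000000",
])
features, label = split_features_label(sample)
print(len(features), label)  # 21 1.0
```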
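import os
import pandas as pd
# Read the tab-separated TXT file
file_name = "***Test.txt"
data = []
with open(file_name, 'r') as file_data:
    for line in file_data:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        features = line.strip().split('\t')
        data.append(features)  # store each row as a list of strings
df = pd.DataFrame(data)
cwd = os.getcwd()  # current working directory
libsvmtxt = os.path.join(cwd, 'libsvm.txt')  # output TXT file
num = df.shape[0]      # number of rows
columns = df.shape[1]  # number of columns; the last one is the label
label = df[columns - 1]
with open(libsvmtxt, 'w') as f:
    # iterate over every row, including the last one
    for j in range(num):
        libsvm = ''
        for i in range(columns - 1):
            # libsvm feature indices conventionally start at 1
            libsvm += " %d:%s" % (i + 1, df.iloc[j, i])
        svm_format = "%s%s\n" % (label.iloc[j], libsvm)
        f.write(svm_format)  # write one line in libsvm format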
This gives us the data in the required libsvm format.
2. libsvm to dataframe
We can use scikit-learn's load_svmlight_file directly:
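For reference, a libsvm line has the form `<label> <index>:<value> <index>:<value> ...`, where feature indices conventionally start at 1 and zero-valued features may be left out because the format is sparse. The sketch below formats a single row that way; the helper `to_libsvm_line` and its `skip_zeros` flag are hypothetical, and unlike the loop above it drops zeros by default.

```python
def to_libsvm_line(features, label, skip_zeros=True):
    """Format one sample as a libsvm line with 1-based feature indices."""
    parts = [str(label)]
    for idx, value in enumerate(features, start=1):
        if skip_zeros and value == 0:
            continue  # sparse format: zero-valued features can be omitted
        parts.append("%d:%s" % (idx, value))
    return " ".join(parts)


print(to_libsvm_line([2.0, 0.0, 38.5], 1.0))  # 1.0 1:2.0 3:38.5
print(to_libsvm_line([2.0, 0.0, 38.5], 1.0, skip_zeros=False))  # 1.0 1:2.0 2:0.0 3:38.5
```

Omitting zeros keeps files small for sparse data; writing every column, as the loop above does, is still valid libsvm input.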
import os
import pandas as pd
from sklearn.datasets import load_svmlight_file
cwd = os.getcwd()
file_name = os.path.join(cwd, 'libsvm.txt')
X_train, y_train = load_svmlight_file(file_name)
The X_train returned this way is a SciPy sparse matrix (CSR format), so it needs to be converted to a dense array first:
mat = X_train.toarray()  # dense NumPy array of features
# X: feature columns
df1 = pd.DataFrame(mat)
# y: label column
df2 = pd.DataFrame(y_train, columns=['target'])
# combine them; the first column is target
df = pd.concat([df2, df1], axis=1)
df.to_csv("df_data.txt", index=False)
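To see what `load_svmlight_file` returns without a file on disk, the sketch below feeds it a small in-memory libsvm text (made-up values, for illustration only) through `io.BytesIO`; scikit-learn expects file-like objects to be opened in binary mode. `zero_based=False` is passed explicitly because the indices in this sample start at 1.

```python
import io

from sklearn.datasets import load_svmlight_file

# Two made-up samples in libsvm format (1-based indices, zeros omitted)
text = b"1 1:2.0 3:38.5\n0 2:4.0\n"
X, y = load_svmlight_file(io.BytesIO(text), zero_based=False)

print(X.shape)      # (2, 3)
print(X.toarray())  # dense array; omitted features come back as 0.0
print(y)            # [1. 0.]
```

The number of columns is inferred from the largest index in the file, so if trailing features are all zero you may want to pass `n_features` explicitly.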
Python sklearn.datasets.dump_svmlight_file() Examples:
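As a sketch of converting a DataFrame to libsvm directly (the shortcut mentioned in section 1), `sklearn.datasets.dump_svmlight_file(X, y, f)` writes features and labels in one call. The DataFrame and its column names below are hypothetical; the buffer could equally be a path such as "libsvm.txt". The file is then read back with `load_svmlight_file` to check the round trip.

```python
import io

import pandas as pd
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Hypothetical DataFrame: feature columns plus a 'target' label column
df = pd.DataFrame({
    "f0": [2.0, 0.0],
    "f1": [0.0, 4.0],
    "f2": [38.5, 0.0],
    "target": [1.0, 0.0],
})
X = df.drop(columns="target").to_numpy()
y = df["target"].to_numpy()

buf = io.BytesIO()
# zero_based=False writes 1-based indices, as the libsvm tools expect
dump_svmlight_file(X, y, buf, zero_based=False)
print(buf.getvalue().decode())

# Round trip: read the data back and compare with the original arrays
buf.seek(0)
X_back, y_back = load_svmlight_file(buf, zero_based=False, n_features=X.shape[1])
print(bool((X_back.toarray() == X).all()), bool((y_back == y).all()))  # True True
```

Passing `n_features` keeps the column count stable even when the last feature is zero in every row.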