Converting an ARFF file to a CSV file with Python

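1. Method overview

Datasets are sometimes stored in ARFF format (the format used by Weka). Most machine learning work relies on numpy, pandas, and sklearn, which cannot read these files directly, so scipy.io.arff.loadarff serves as a bridge: it loads the file into a NumPy structured array that pandas can turn into a DataFrame.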
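To make the loading step concrete, here is a minimal sketch that does not depend on any file on disk. The ARFF text below is a made-up two-row sample (not the CM1 dataset), and it relies on loadarff accepting a file-like object as well as a file name.

```python
from io import StringIO

import pandas as pd
from scipy.io import arff

# Hypothetical miniature ARFF content, for illustration only.
ARFF_TEXT = """@relation demo
@attribute LOC_TOTAL numeric
@attribute Defective {Y,N}
@data
25,N
32,Y
"""

# loadarff takes a file name or a file-like object opened in text mode.
data, meta = arff.loadarff(StringIO(ARFF_TEXT))

# data is a NumPy structured array, which pandas can consume directly.
df = pd.DataFrame(data)
print(df.dtypes)
```

Numeric attributes arrive as floats, while nominal attributes arrive as bytes objects, which matters later when writing the CSV.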

2. Code example

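from scipy.io import arff
import pandas as pd

file_name = '/Users/schillerxu/Documents/sourcecode/python/pandas/CM1.arff'

# loadarff returns the records (a NumPy structured array) and the metadata
data, meta = arff.loadarff(file_name)
# print(data)
print(meta)

df = pd.DataFrame(data)
print(df.head())
# print(df)

# Nominal values are loaded as bytes (b'N'); decode them so the CSV holds plain text
df['Defective'] = df['Defective'].str.decode('utf-8')

# Save as a CSV file
out_file = '/Users/schillerxu/Documents/sourcecode/python/pandas/CM1.csv'
df.to_csv(out_file, index=False)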

Running the program produces the following output:

[Running] python -u "/Users/schillerxu/Documents/sourcecode/python/pandas/arff_to_csv.py"
Dataset: CM1
	LOC_BLANK's type is numeric
	BRANCH_COUNT's type is numeric
	CALL_PAIRS's type is numeric
	LOC_CODE_AND_COMMENT's type is numeric
	LOC_COMMENTS's type is numeric
	CONDITION_COUNT's type is numeric
	CYCLOMATIC_COMPLEXITY's type is numeric
	CYCLOMATIC_DENSITY's type is numeric
	DECISION_COUNT's type is numeric
	DECISION_DENSITY's type is numeric
	DESIGN_COMPLEXITY's type is numeric
	DESIGN_DENSITY's type is numeric
	EDGE_COUNT's type is numeric
	ESSENTIAL_COMPLEXITY's type is numeric
	ESSENTIAL_DENSITY's type is numeric
	LOC_EXECUTABLE's type is numeric
	PARAMETER_COUNT's type is numeric
	HALSTEAD_CONTENT's type is numeric
	HALSTEAD_DIFFICULTY's type is numeric
	HALSTEAD_EFFORT's type is numeric
	HALSTEAD_ERROR_EST's type is numeric
	HALSTEAD_LENGTH's type is numeric
	HALSTEAD_LEVEL's type is numeric
	HALSTEAD_PROG_TIME's type is numeric
	HALSTEAD_VOLUME's type is numeric
	MAINTENANCE_SEVERITY's type is numeric
	MODIFIED_CONDITION_COUNT's type is numeric
	MULTIPLE_CONDITION_COUNT's type is numeric
	NODE_COUNT's type is numeric
	NORMALIZED_CYLOMATIC_COMPLEXITY's type is numeric
	NUM_OPERANDS's type is numeric
	NUM_OPERATORS's type is numeric
	NUM_UNIQUE_OPERANDS's type is numeric
	NUM_UNIQUE_OPERATORS's type is numeric
	NUMBER_OF_LINES's type is numeric
	PERCENT_COMMENTS's type is numeric
	LOC_TOTAL's type is numeric
	Defective's type is nominal, range is ('Y', 'N')

   LOC_BLANK  BRANCH_COUNT  CALL_PAIRS  ...  PERCENT_COMMENTS  LOC_TOTAL  Defective
0        6.0           9.0         2.0  ...              4.00       25.0       b'N'
1       15.0           7.0         3.0  ...             39.22       32.0       b'Y'
2       27.0           9.0         1.0  ...             47.27       33.0       b'Y'
3        7.0           3.0         2.0  ...              0.00       12.0       b'N'
4       51.0          25.0        13.0  ...             11.67      106.0       b'N'

[5 rows x 38 columns]

[Done] exited with code=0 in 0.664 seconds

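As the output shows, meta holds the dataset's basic information: the dataset name, plus the name and type of every attribute.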
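The metadata can also be queried programmatically. The sketch below is one possible way to use it, not the only one: it reads the attribute names and types from meta, decodes every nominal column from bytes to text, and then produces CSV output. The ARFF text is a hypothetical two-row sample, and the CSV is returned as a string instead of being written to a file so the example stays self-contained.

```python
from io import StringIO

import pandas as pd
from scipy.io import arff

# Hypothetical miniature ARFF content, for illustration only.
ARFF_TEXT = """@relation demo
@attribute LOC_TOTAL numeric
@attribute Defective {Y,N}
@data
25,N
32,Y
"""

data, meta = arff.loadarff(StringIO(ARFF_TEXT))
df = pd.DataFrame(data)

print(meta.name)     # relation (dataset) name
print(meta.names())  # attribute names in file order
print(meta.types())  # attribute types, e.g. numeric or nominal

# Nominal columns are loaded as bytes; decode them before exporting.
nominal_cols = [
    name for name, kind in zip(meta.names(), meta.types()) if kind == "nominal"
]
for col in nominal_cols:
    df[col] = df[col].str.decode("utf-8")

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a path to to_csv instead writes the same content to disk, as in the script above.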

3. References

Loading ARFF files in Python
