How to read a txt file with numpy: reading values from a text file using numpy's loadtxt function

weixin_39722692

Article link: https://blog.csdn.net/weixin_39722692/article/details/111532925

I have a file with this form:

label1, value1, value2, value3,

label2, value1, value2, value3,

...

I want to read it with numpy's loadtxt function so that each label is paired with its values. The final result should be an array of arrays, each containing the label and an array of features, like this:

array([[label1, [value1, value2, value3]],

[label2, [value1, value2, value3]]])

I tried the following, but it did not work:

c = StringIO(u"text.txt")

np.loadtxt(c,

dtype={'samples': ('label', 'features'), 'formats': ('s9',np.float)},

delimiter=',', skiprows=0)

Any ideas?

Solution

You are on the right track with defining the dtype. You are just missing the field shape.

I'll demonstrate:

A 'text' file - a list of lines (bytes in Py3):

In [95]: txt=b"""label1, 12, 23.2, 232

....: label2, 23, 2324, 324

....: label3, 34, 123, 2141

....: label4, 0, 2, 3

....: """

In [96]: txt=txt.splitlines()

A dtype with 2 fields, one with strings, the other with floats (3 for 'field shape'):

In [98]: dt=np.dtype([('label','U10'),('values', 'float',(3))])

In [99]: data=np.genfromtxt(txt,delimiter=',',dtype=dt)

In [100]: data

Out[100]:

array([('label1', [12.0, 23.2, 232.0]), ('label2', [23.0, 2324.0, 324.0]),

('label3', [34.0, 123.0, 2141.0]), ('label4', [0.0, 2.0, 3.0])],

dtype=[('label', '<U10'), ('values', '<f8', (3,))])

In [101]: data['label']

Out[101]:

array(['label1', 'label2', 'label3', 'label4'],

dtype='<U10')

In [103]: data['values']

Out[103]:

array([[ 1.20000000e+01, 2.32000000e+01, 2.32000000e+02],

[ 2.30000000e+01, 2.32400000e+03, 3.24000000e+02],

[ 3.40000000e+01, 1.23000000e+02, 2.14100000e+03],

[ 0.00000000e+00, 2.00000000e+00, 3.00000000e+00]])

With this definition the numeric values can be accessed as a 2d array. Sub-arrays like this are underappreciated.
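Since the question asked about loadtxt, here is a minimal sketch of the same sub-array dtype used with np.loadtxt instead of genfromtxt. It rests on a few assumptions: the sample lines are made up in the question's layout (including the trailing comma, which is stripped first because it would otherwise add an empty fifth column), and the names raw_lines and cleaned are only illustrative. Note also that StringIO(u"text.txt") in the question wraps the literal string "text.txt", not the file's contents; pass the file name itself, or a list of lines as done here.

```python
import numpy as np

# Made-up sample lines in the question's layout, trailing comma included.
raw_lines = [
    "label1, 12, 23.2, 232,",
    "label2, 23, 2324, 324,",
    "label3, 34, 123, 2141,",
    "label4, 0, 2, 3,",
]

# One string field plus one float field that holds 3 values per record.
dt = np.dtype([("label", "U10"), ("values", float, (3,))])

# Drop surrounding whitespace and the trailing comma from every line,
# so each line splits into exactly 4 columns.
cleaned = [line.strip().rstrip(",") for line in raw_lines]

# loadtxt accepts a list of strings as well as a file name or file object.
data = np.loadtxt(cleaned, delimiter=",", dtype=dt)

labels = data["label"]    # 1-D array of strings
values = data["values"]   # 2-D float array, one row per record

print(labels)
print(values.shape)
```

For a file on disk, the same cleaning step can be applied to the lines of the opened file before handing them to loadtxt.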

The dtype could have been specified with the dictionary syntax, but I'm more familiar with the list of tuples form.

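Similar dtype specs (not byte-for-byte identical to dt: 'f' is a 4-byte float, 'S10' is a byte string rather than unicode, and the comma-string form names the fields 'f0' and 'f1'):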

np.dtype("U10, (3,)f")

np.dtype({'names':['label','values'], 'formats':['S10','(3,)f']})

np.genfromtxt(txt,delimiter=',',dtype='S10,(3,)f')
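To see how these spellings relate to the list-of-tuples dtype used earlier, the following sketch prints the field names and record size of each. The variable names short and as_dict are only illustrative.

```python
import numpy as np

# The list-of-tuples form: unicode label, three 8-byte floats.
dt = np.dtype([("label", "U10"), ("values", float, (3,))])

# Comma-string form: fields get default names, 'f' means 4-byte float.
short = np.dtype("U10, (3,)f")

# Dictionary form: explicit names, byte-string label, 4-byte floats.
as_dict = np.dtype({"names": ["label", "values"],
                    "formats": ["S10", "(3,)f"]})

for name, d in [("dt", dt), ("short", short), ("as_dict", as_dict)]:
    # names: field names; itemsize: bytes per record
    print(name, d.names, d.itemsize)
```

All three describe a string field followed by a 3-element float sub-array, but they differ in field names, float width, and string type, so only records with the same byte layout can be reinterpreted from one to another.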

===============================

I think that this txt, if parsed with dtype=None would produce

In [30]: y

Out[30]:

array([('label1', 12.0, 23.2, 232.0), ('label2', 23.0, 2324.0, 324.0),

('label3', 34.0, 123.0, 2141.0), ('label4', 0.0, 2.0, 3.0)],

dtype=[('f0', '<U10'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])

This could be converted to the subfield form with

y.view(dt)

This works as long as the underlying data representation (seen as a flat list of bytes) is compatible (here 10 unicode characters (40 bytes), and 3 floats, per record).
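A minimal sketch of that conversion, assuming the flat array has one 10-character unicode field followed by three 8-byte floats (the name flat_dt and the two sample records are only illustrative):

```python
import numpy as np

# Flat layout: one string field and three separate float fields.
flat_dt = np.dtype([("f0", "U10"), ("f1", "f8"), ("f2", "f8"), ("f3", "f8")])

# Sub-array layout: one string field and one field of 3 floats.
dt = np.dtype([("label", "U10"), ("values", "f8", (3,))])

y = np.array(
    [("label1", 12.0, 23.2, 232.0), ("label2", 23.0, 2324.0, 324.0)],
    dtype=flat_dt,
)

# Both dtypes describe 40 + 3 * 8 = 64 bytes per record,
# which is what makes the reinterpretation valid.
print(flat_dt.itemsize, dt.itemsize)

# view() reinterprets the same memory; no data is copied.
data = y.view(dt)

print(data["label"])
print(data["values"])
```

Because view() shares memory with the original, writing to data["values"] also changes y.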
