I have a file with this form:
label1, value1, value2, value3,
label2, value1, value2, value3,
...
I want to read it with numpy's loadtxt function so that each label is paired with its values. The final result should be an array of records, each holding a label and an array of features, like this:
array([[label1, [value1, value2, value3]],
[label2, [value1, value2, value3]]])
I have tried the following, but it did not work:
c = StringIO(u"text.txt")
np.loadtxt(c,
dtype={'samples': ('label', 'features'), 'formats': ('s9',np.float)},
delimiter=',', skiprows=0)
Any ideas?
Solution
You are on the right track with defining the dtype. You are just missing the field shape.
I'll demonstrate:
A 'text' file - a list of lines (bytes in Py3):
In [95]: txt=b"""label1, 12, 23.2, 232
....: label2, 23, 2324, 324
....: label3, 34, 123, 2141
....: label4, 0, 2, 3
....: """
In [96]: txt=txt.splitlines()
A dtype with 2 fields, one with strings, the other with floats (3 for 'field shape'):
In [98]: dt=np.dtype([('label','U10'),('values', 'float',(3))])
In [99]: data=np.genfromtxt(txt,delimiter=',',dtype=dt)
In [100]: data
Out[100]:
array([('label1', [12.0, 23.2, 232.0]), ('label2', [23.0, 2324.0, 324.0]),
('label3', [34.0, 123.0, 2141.0]), ('label4', [0.0, 2.0, 3.0])],
dtype=[('label', '<U10'), ('values', '<f8', (3,))])
In [101]: data['label']
Out[101]:
array(['label1', 'label2', 'label3', 'label4'],
dtype='<U10')
In [103]: data['values']
Out[103]:
array([[ 1.20000000e+01, 2.32000000e+01, 2.32000000e+02],
[ 2.30000000e+01, 2.32400000e+03, 3.24000000e+02],
[ 3.40000000e+01, 1.23000000e+02, 2.14100000e+03],
[ 0.00000000e+00, 2.00000000e+00, 3.00000000e+00]])
With this definition, the numeric values can be accessed as a 2d array. Sub-arrays like this are underappreciated.
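To make the 2d access concrete, here is a minimal sketch that repeats the demonstration as a standalone script. It assumes the sample rows are given as text lines rather than bytes, and the names `lines`, `col_means`, `row_sums`, and `big` are only illustrative.

```python
import numpy as np

# Sample rows in the same layout as the demonstration (text lines, not bytes).
lines = [
    "label1, 12, 23.2, 232",
    "label2, 23, 2324, 324",
    "label3, 34, 123, 2141",
    "label4, 0, 2, 3",
]

# One string field plus one float field that holds a length-3 sub-array.
dt = np.dtype([("label", "U10"), ("values", "f8", (3,))])
data = np.genfromtxt(lines, delimiter=",", dtype=dt)

values = data["values"]          # ordinary 2-D float array, shape (4, 3)
col_means = values.mean(axis=0)  # statistics down each column
row_sums = values.sum(axis=1)    # statistics across each record

# A boolean mask built from the numeric block selects whole records.
big = data[values[:, 0] > 20]
print(values.shape)
print(big["label"])
```

Because `data['values']` is a regular float array, slicing, reductions, and masking work on it without any per-record unpacking.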
The dtype could have been specified with the dictionary syntax, but I'm more familiar with the list-of-tuples form.
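Equivalent dtype specs (the comma-separated string forms fall back to the default field names 'f0' and 'f1'; 'f8' is used because a bare 'f' means 32-bit floats):
np.dtype("U10, (3,)f8")
np.dtype({'names':['label','values'], 'formats':['U10','(3,)f8']})
np.genfromtxt(txt,delimiter=',',dtype='U10,(3,)f8')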
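As a rough check on how these notations relate, the sketch below builds the dtype three ways and compares them. The variable names are illustrative only. It also shows that a bare 'f' in a format string is single precision, which is worth keeping in mind when the goal is to match a 'float' (float64) field.

```python
import numpy as np

# List-of-tuples form, as used in the demonstration.
dt_list = np.dtype([("label", "U10"), ("values", "f8", (3,))])

# Dictionary form with explicit field names.
dt_dict = np.dtype({"names": ["label", "values"],
                    "formats": ["U10", "(3,)f8"]})

# Comma-separated string form; fields get the default names f0, f1.
dt_str = np.dtype("U10, (3,)f8")

same_layout = dt_list == dt_dict   # same names, formats, and offsets
default_names = dt_str.names       # ('f0', 'f1')

# A bare 'f' is float32, so this layout is smaller than the ones above.
dt_f4 = np.dtype("U10, (3,)f")

print(same_layout, default_names, dt_str.itemsize, dt_f4.itemsize)
```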
===============================
I think that this txt, if parsed with dtype=None, would produce something like:
In [30]: y
Out[30]:
array([('label1', 12.0, 23.2, 232.0), ('label2', 23.0, 2324.0, 324.0),
('label3', 34.0, 123.0, 2141.0), ('label4', 0.0, 2.0, 3.0)],
dtype=[('f0', '<U10'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
This could be converted to the subfield form with
y.view(dt)
This works as long as the underlying data representation (seen as a flat sequence of bytes) is compatible: here, 10 unicode characters (40 bytes) followed by 3 floats per record.
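A minimal sketch of that conversion, assuming the flat array really does have a 10-character unicode field followed by three 8-byte floats. The array `y` is built by hand here rather than parsed, and `flat_dt` and `nested` are illustrative names.

```python
import numpy as np

# Flat layout: one string field and three separate float fields.
flat_dt = np.dtype([("f0", "U10"), ("f1", "f8"), ("f2", "f8"), ("f3", "f8")])

# Target layout: one string field and one length-3 float sub-array.
dt = np.dtype([("label", "U10"), ("values", "f8", (3,))])

y = np.array(
    [("label1", 12.0, 23.2, 232.0), ("label2", 23.0, 2324.0, 324.0)],
    dtype=flat_dt,
)

# Both layouts must describe the same bytes per record for view() to work.
same_size = flat_dt.itemsize == dt.itemsize

# view() reinterprets the existing buffer; no data is copied.
nested = y.view(dt)
print(nested["label"])
print(nested["values"])
```

Since `view` only relabels the same memory, writing into `nested['values']` also changes `y`.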