I have been using Python 2.7 for some time and recently switched to Python 3. I have already updated parts of my code, but the problem I have now eludes me. I am trying to load a dataset with np.loadtxt. Because the data also contains strings, I import the full array as strings and want to convert some entries to float afterwards. The conversion fails and I do not understand why. All I can see is that in Python 3 every string gets the prefix 'b', and I suspect that is related, but I cannot find a concise answer. Code and error below.
filename = 'train.csv'
raw_data = open(filename, 'rb')
data = np.loadtxt(raw_data, delimiter=",", dtype = 'str')
dataset = data[1:,1:]
print(dataset)
original_data = dataset
test = float(dataset[0,0])
print(test)
Result
[["b'60'" "b'RL'" "b'65'" ..., "b'WD'" "b'Normal'" "b'208500'"]
["b'20'" "b'RL'" "b'80'" ..., "b'WD'" "b'Normal'" "b'181500'"]
["b'60'" "b'RL'" "b'68'" ..., "b'WD'" "b'Normal'" "b'223500'"]
...,
["b'70'" "b'RL'" "b'66'" ..., "b'WD'" "b'Normal'" "b'266500'"]
["b'20'" "b'RL'" "b'68'" ..., "b'WD'" "b'Normal'" "b'142125'"]
["b'20'" "b'RL'" "b'75'" ..., "b'WD'" "b'Normal'" "b'147500'"]]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
5 print(dataset)
6 original_data = dataset
----> 7 test = float(dataset[0,0])
8 print(test)
ValueError: could not convert string to float: "b'60'"
Solution
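As suggested by dnalow, the type conversion goes wrong because I first open the file in binary mode ('rb') and then read from that handle, so the values apparently arrive as bytes and their string form keeps the b'...' prefix. The solution is not to combine open(filename, 'rb') with np.loadtxt, but to pass the filename straight to np.genfromtxt. Code below.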
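import numpy as np

filename = 'train.csv'
# Pass the filename directly instead of a handle opened with 'rb'
data = np.genfromtxt(filename, delimiter=",", dtype='str')
dataset = data[1:, 1:]  # skip the first row and the first column
print(dataset)
original_data = dataset  # refers to the same array, this is not a copy
test = float(dataset[0, 0])
print(test)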
Result
[['60' 'RL' '65' ..., 'WD' 'Normal' '208500']
['20' 'RL' '80' ..., 'WD' 'Normal' '181500']
['60' 'RL' '68' ..., 'WD' 'Normal' '223500']
...,
['70' 'RL' '66' ..., 'WD' 'Normal' '266500']
['20' 'RL' '68' ..., 'WD' 'Normal' '142125']
['20' 'RL' '75' ..., 'WD' 'Normal' '147500']]
60.0
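For anyone wondering where the 'b' prefix comes from, here is a minimal sketch in plain Python, without NumPy, that reproduces the same ValueError. It assumes a single value read as bytes; the variable names and the UTF-8 encoding are my own assumptions for illustration and are not taken from the original code.

```python
# A value as it arrives when the data is read as bytes
raw = b'60'

# str() on a bytes object returns its repr, including the b prefix and the quotes
as_repr = str(raw)
print(as_repr)

# float() cannot parse that text, which is the error from the question
try:
    float(as_repr)
except ValueError as exc:
    print("conversion failed:", exc)

# Decoding yields the actual text (UTF-8 is an assumption about the file)
as_text = raw.decode('utf-8')
value = float(as_text)
print(value)
```

So the array in the question holds the text "b'60'" rather than '60', and float() has nothing it can parse. Reading the data as text in the first place, as np.genfromtxt does above, avoids the problem.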