In my data, some feature values are `?`. How can I replace them with NA?
Edit:
The code and output are as follows:
import numpy as np
import pandas as pd
df = pd.read_csv("cca-census-income.csv", header=None)
df.replace('?', np.nan, inplace=True)
df.iloc[0]  # first row; .ix has been removed from current pandas
23 Other relative of householder
24 1700.09
25 ?
26 ?
27 ?
28 Not in universe under 1 year old
29 ?
30 0
Solution:
Pass the parameter na_values='?' to read_csv.
Sample:
import pandas as pd
import io
temp=u"""Date Time,a
2010-01-27 16:00:00,?
2010-01-27 16:10:00,2.2
2010-01-27 16:30:00,1.7"""
df = pd.read_csv(io.StringIO(temp),na_values='?')
print (df)
Date Time a
0 2010-01-27 16:00:00 NaN
1 2010-01-27 16:10:00 2.2
2 2010-01-27 16:30:00 1.7
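As a small extension of the sample above, `na_values` also accepts a list or a per-column dict, which helps when different columns use different missing-value markers. The sketch below is illustrative only: the extra column `b` and the marker `unknown` are made up for the example and do not come from the census file.

```python
import io

import pandas as pd

# Hypothetical data: column "a" marks missing values with "?",
# column "b" marks them with "unknown" (both are assumptions for this sketch)
raw = u"""Date Time,a,b
2010-01-27 16:00:00,?,x
2010-01-27 16:10:00,2.2,unknown
2010-01-27 16:30:00,1.7,?"""

# A dict maps each column name to its own list of missing-value markers
df_cols = pd.read_csv(io.StringIO(raw), na_values={"a": ["?"], "b": ["unknown"]})
print(df_cols)
```

Because `?` is only listed for column `a`, the `?` in column `b` is kept as an ordinary string.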
Edit:
Thanks to shivsn for the suggestion to add skipinitialspace=True:
temp=u"""Date Time,a
? , ?
? ,?
2010-01-27 16:30:00,1.7"""
df = pd.read_csv(io.StringIO(temp),na_values=['?', '? '], skipinitialspace =True)
print (df)
Date Time a
0 NaN NaN
1 NaN NaN
2 2010-01-27 16:30:00 1.7
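The same whitespace issue is a likely reason why `df.replace('?', np.nan)` in the question appeared to do nothing: a plain replace matches whole cell values exactly, so a cell holding `' ?'` is left alone. For a DataFrame that is already loaded, a regex replace is one possible workaround. The frame below is a made-up stand-in, not the census data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for data read with stray spaces around "?"
loaded = pd.DataFrame(
    {"a": [" ?", "1.7", "?"], "b": ["x", " ? ", "what?"]},
    dtype=object,
)

# Exact-match replace only touches the cell that is exactly "?"
exact = loaded.replace("?", np.nan)

# Regex replace treats a cell as missing when it is "?" with optional
# surrounding whitespace; "what?" is untouched because of the ^...$ anchors
cleaned = loaded.replace(r"^\s*\?\s*$", np.nan, regex=True)

print(exact)
print(cleaned)
```

Handling the markers at read time with `na_values` and `skipinitialspace` is usually simpler, since numeric columns are then parsed as numbers directly.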
EDIT1, based on the file:
It seems there is no space before `?`:
df = pd.read_csv('census-income.data',
                 header=None,
                 na_values=['?'],
                 skipinitialspace=True)
print (df)
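To check that the markers were actually converted, one option is to count missing values per column with `isna().sum()`. The two rows below are an invented excerpt, assuming the file separates fields with a comma followed by a space; they are not real records from census-income.data.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt in the ", "-separated layout assumed for the file
raw = u"""73, Not in universe, ?, 1700.09
58, Self-employed, Arkansas, ?"""

df_check = pd.read_csv(io.StringIO(raw),
                       header=None,
                       na_values=['?'],
                       skipinitialspace=True)

# Count missing values per column; columns are numbered 0..3 because header=None
missing_per_column = df_check.isna().sum()
print(missing_per_column)
```

A non-zero count in the columns that used to show `?` suggests the markers were parsed as NaN.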
Tags: pandas, python-2-7, python, numpy
Source: https://codeday.me/bug/20191026/1940023.html