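Background:
I was reading data from a file to train a small model and found that the data contained abnormal values, as described below.
I read the data with pandas and then applied min-max normalization to the numeric features, which raised an error: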
import pandas as pd


def minmax_norm(df):
    # Scale every column to the [0, 1] range.
    return (df - df.min()) / (df.max() - df.min())


if __name__ == '__main__':
    train_data_path = 'train_1205_shanghai.txt'
    test_data_path = 'test_1206_shanghai.txt'
    # load_data_to_df(path)
    col_name = ['a', 'b', 'c']
    train_data = pd.read_table(train_data_path, header=None)
    train_data.columns = col_name
    test_data = pd.read_table(test_data_path, header=None)
    test_data.columns = col_name
    # print(data.head(3)) 'avg_rider_done_ord_cnt'
    number_feat = ['a', 'b']
    # Normalize each numeric feature column; this is where the error is raised.
    for feat in number_feat:
        train_data[[feat]] = minmax_norm(train_data[[feat]])
        test_data[[feat]] = minmax_norm(test_data[[feat]])
Error:
Traceback (most recent call last):
  File "/Users/alsc/.conda/envs/algoTest/lib/python3.6/site-packages/pandas/core/ops/array_ops.py", line 143, in na_arithmetic_op
    result = expressions.evaluate(op, left, right)
  File "/Users/alsc/.conda/envs/algoTest/lib/python3.6/site-packages/pandas/core/computation/expressions.py", line 233, in evaluate
    return _evaluate(op, op_str, a, b) # type: ignore
  File "/Users/alsc/.conda/envs/algoTest/lib/python3.6/site-packages/pandas/core/computation/expressions.py", line 68, in _evaluate_standard
    return op(a, b)
TypeError: unsupported operand type(s) for -: 'str' and 'float'
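Investigation:
The error means there is a type problem: the subtraction operator '-' cannot be applied to a str and a float.
Key point: I finally understood what "for -:" means. The type mismatch happened at the minus sign, so to fix the problem, look at the code around the subtraction for an operation that mixes different types. Here the only subtractions are in minmax_norm, so a column passed to it must contain string values.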
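The reasoning above can be checked with a small sketch. The first part reproduces the same message in plain Python. The second part shows one possible way to locate the offending values in a column, using pd.to_numeric with errors="coerce". The helper name find_non_numeric and the sample values are made up for illustration and are not from the original data.

```python
import pandas as pd

# Subtracting a float from a str produces the same message as in the traceback.
try:
    "3" - 1.0
except TypeError as exc:
    message = str(exc)
print(message)


def find_non_numeric(series):
    """Return the entries of `series` that cannot be parsed as numbers (hypothetical helper)."""
    parsed = pd.to_numeric(series, errors="coerce")
    # Entries that were present but turned into NaN after parsing are the bad ones.
    return series[parsed.isna() & series.notna()]


# Hypothetical sample column with one stray string value.
sample = pd.Series(["1.5", "2.0", "abc", "4.0"])
bad = find_non_numeric(sample)
print(bad)
```

Running find_non_numeric on each column in number_feat should show which rows need attention before normalization.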
Solution:
Remove the abnormal values from the data.
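One way to do the removal in code, rather than editing the file by hand, is sketched below. It assumes the abnormal values are entries that cannot be parsed as numbers, and that dropping the whole row is acceptable. The helper drop_non_numeric_rows and the sample DataFrame are hypothetical.

```python
import pandas as pd


def minmax_norm(df):
    # Scale values to the [0, 1] range.
    return (df - df.min()) / (df.max() - df.min())


def drop_non_numeric_rows(df, columns):
    """Convert `columns` to numeric and drop rows where the conversion fails (hypothetical helper)."""
    cleaned = df.copy()
    for col in columns:
        # Unparseable entries become NaN instead of raising an error.
        cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")
    return cleaned.dropna(subset=columns).reset_index(drop=True)


# Hypothetical data: column 'a' contains one non-numeric entry.
raw = pd.DataFrame({
    "a": ["1", "2", "oops", "3"],
    "b": ["10", "20", "30", "40"],
    "c": ["x", "y", "z", "w"],
})
number_feat = ["a", "b"]

clean = drop_non_numeric_rows(raw, number_feat)
for feat in number_feat:
    clean[feat] = minmax_norm(clean[feat])
print(clean)
```

Dropping rows is the simplest choice; if losing rows is a concern, the NaN entries could be filled instead, but that goes beyond what was done here.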