Translated from: How can I replace all the NaN values with zeros in a column of a pandas DataFrame?
I have a DataFrame as below:
    itm                 Date  Amount
67  420  2012-09-30 00:00:00   65211
68  421  2012-09-09 00:00:00   29424
69  421  2012-09-16 00:00:00   29877
70  421  2012-09-23 00:00:00   30990
71  421  2012-09-30 00:00:00   61303
72  485  2012-09-09 00:00:00   71781
73  485  2012-09-16 00:00:00     NaN
74  485  2012-09-23 00:00:00   11072
75  485  2012-09-30 00:00:00  113702
76  489  2012-09-09 00:00:00   64731
77  489  2012-09-16 00:00:00     NaN
When I try to .apply a function to the Amount column, I get the following error:
ValueError: cannot convert float NaN to integer
I have tried applying a function using .isnan from the math module, the pandas .replace attribute, the .sparse data attribute from pandas 0.9, and an if NaN == NaN statement in a function.
I have also looked at the article "How do I replace NA values with zeros in an R dataframe?" along with some other articles.
None of the methods I have tried worked, or they did not recognise NaN.
Any hints or solutions would be appreciated.
Reply #1
Reference: https://stackoom.com/question/tmpL/如何在熊猫数据框的列中将所有NaN值替换为零
Reply #2
I believe DataFrame.fillna() will do this for you.
See the pandas docs for DataFrame.fillna and for Series.fillna.
Example:
In [7]: df
Out[7]:
          0         1
0       NaN       NaN
1 -0.494375  0.570994
2       NaN       NaN
3  1.876360 -0.229738
4       NaN       NaN
In [8]: df.fillna(0)
Out[8]:
          0         1
0  0.000000  0.000000
1 -0.494375  0.570994
2  0.000000  0.000000
3  1.876360 -0.229738
4  0.000000  0.000000
To fill the NaNs in only one column, select just that column. In this case I'm using inplace=True to actually change the contents of df.
In [12]: df[1].fillna(0, inplace=True)
Out[12]:
0    0.000000
1    0.570994
2    0.000000
3   -0.229738
4    0.000000
Name: 1
In [13]: df
Out[13]:
          0         1
0       NaN  0.000000
1 -0.494375  0.570994
2       NaN  0.000000
3  1.876360 -0.229738
4       NaN  0.000000
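For reference, here is a minimal sketch of two ways to fill a single column without calling inplace=True on a selected column, since newer pandas versions may not propagate that change back to the parent frame. The frame is made up to mirror the example above, and the integer column labels 0 and 1 are assumed from that output.

```python
import numpy as np
import pandas as pd

# Made-up frame shaped like the example above (integer column labels 0 and 1).
df = pd.DataFrame({
    0: [np.nan, -0.494375, np.nan, 1.876360, np.nan],
    1: [np.nan, 0.570994, np.nan, -0.229738, np.nan],
})

# Option 1: fill the selected column and assign the result back.
by_assignment = df.copy()
by_assignment[1] = by_assignment[1].fillna(0)

# Option 2: pass a dict mapping column label -> fill value.
by_dict = df.fillna({1: 0})

print(by_dict)
```

Both options leave column 0 untouched and do not depend on whether the column selection is a view or a copy.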
Reply #3
I just wanted to provide a bit of an update/special case, since it looks like people still come here. If you're using a MultiIndex or otherwise using an index slicer, the inplace=True option may not be enough to update the slice you've chosen. For example, with a 2x2-level MultiIndex this will not change any values (as of pandas 0.15):
idx = pd.IndexSlice
df.loc[idx[:,mask_1],idx[mask_2,:]].fillna(value=0,inplace=True)
The "problem" is that the chaining breaks fillna's ability to update the original DataFrame. I put "problem" in quotes because there are good reasons for the design decisions that led to not interpreting through these chains in certain situations. Also, this is a complex example (though I really did run into it), but the same may apply to fewer index levels depending on how you slice.
The solution is DataFrame.update:
df.update(df.loc[idx[:,mask_1],idx[[mask_2],:]].fillna(value=0))
It's one line, reads reasonably well (sort of), and avoids any unnecessary messing with intermediate variables or loops, while allowing you to apply fillna to any multi-level slice you like!
If anybody can find places where this doesn't work, please post in the comments. I've been messing with it and looking at the source, and it seems to solve at least my MultiIndex slice problems.
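A small self-contained sketch of the update approach follows. The index labels ("r1", "x", "a", "p" and so on) are hypothetical stand-ins, since mask_1 and mask_2 are not defined in this reply; the sketch only assumes a frame with two levels on each axis.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2-level MultiIndex on both axes; every cell starts as NaN.
rows = pd.MultiIndex.from_product([["r1", "r2"], ["x", "y"]])
cols = pd.MultiIndex.from_product([["a", "b"], ["p", "q"]])
df = pd.DataFrame(np.nan, index=rows, columns=cols)

idx = pd.IndexSlice

# fillna without inplace returns a new, filled frame for the selected slice.
filled = df.loc[idx[:, "x"], idx["a", :]].fillna(value=0)

# update() writes the non-NaN values of `filled` back into df, aligned on labels.
df.update(filled)

print(df)
```

Only the four selected cells (rows ending in "x", columns under "a") become 0; the rest of the frame stays NaN.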
Reply #4
The code below worked for me.
import pandas
df = pandas.read_csv('somefile.txt')
df = df.fillna(0)
Reply #5
Slicing is not guaranteed to return a view rather than a copy, so filling a slice in place may leave the original unchanged. You can assign the result back instead:
df['column'] = df['column'].fillna(value)
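Applied to the question's data, this pattern might look like the sketch below. It copies three rows from the question and assumes the goal is an integer Amount column, which is what the "cannot convert float NaN to integer" error suggests.

```python
import numpy as np
import pandas as pd

# A few rows copied from the question; the NaN forces Amount to float64.
df = pd.DataFrame(
    {
        "itm": [485, 485, 485],
        "Date": pd.to_datetime(["2012-09-09", "2012-09-16", "2012-09-23"]),
        "Amount": [71781, np.nan, 11072],
    },
    index=[72, 73, 74],
)

# Fill first, so the integer conversion no longer encounters a NaN.
df["Amount"] = df["Amount"].fillna(0).astype(int)

print(df["Amount"].tolist())  # [71781, 0, 11072]
```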
Reply #6
You could use replace to change NaN to 0:
import pandas as pd
import numpy as np
# for column
df['column'] = df['column'].replace(np.nan, 0)
# for whole dataframe
df = df.replace(np.nan, 0)
# inplace
df.replace(np.nan, 0, inplace=True)