How to replace all NaN values with zeros in a column of a pandas DataFrame

Translated from: How can I replace all the NaN values with zeros in a column of a pandas DataFrame

I have a DataFrame as shown below:

      itm Date                  Amount 
67    420 2012-09-30 00:00:00   65211
68    421 2012-09-09 00:00:00   29424
69    421 2012-09-16 00:00:00   29877
70    421 2012-09-23 00:00:00   30990
71    421 2012-09-30 00:00:00   61303
72    485 2012-09-09 00:00:00   71781
73    485 2012-09-16 00:00:00     NaN
74    485 2012-09-23 00:00:00   11072
75    485 2012-09-30 00:00:00  113702
76    489 2012-09-09 00:00:00   64731
77    489 2012-09-16 00:00:00     NaN

When I try to .apply a function to the Amount column, I get the following error:

ValueError: cannot convert float NaN to integer

I have tried applying a function using isnan from the math module, the pandas .replace method, and the .sparse data attribute from pandas 0.9. I have also tried an if NaN == NaN statement inside a function, and I have looked at the article "How do I replace NA values with zeros in an R dataframe?" along with some other articles. None of the methods I have tried worked, or they did not recognise NaN. Any hints or solutions would be appreciated.


Reply #1

Reference: https://stackoom.com/question/tmpL/如何在熊猫数据框的列中将所有NaN值替换为零


Reply #2

I believe DataFrame.fillna() will do this for you.

See the documentation for fillna on a DataFrame and on a Series.

Example:

In [7]: df
Out[7]: 
          0         1
0       NaN       NaN
1 -0.494375  0.570994
2       NaN       NaN
3  1.876360 -0.229738
4       NaN       NaN

In [8]: df.fillna(0)
Out[8]: 
          0         1
0  0.000000  0.000000
1 -0.494375  0.570994
2  0.000000  0.000000
3  1.876360 -0.229738
4  0.000000  0.000000

To fill the NaNs in only one column, select just that column. In this case I'm using inplace=True to actually change the contents of df.

In [12]: df[1].fillna(0, inplace=True)
Out[12]: 
0    0.000000
1    0.570994
2    0.000000
3   -0.229738
4    0.000000
Name: 1

In [13]: df
Out[13]: 
          0         1
0       NaN  0.000000
1 -0.494375  0.570994
2       NaN  0.000000
3  1.876360 -0.229738
4       NaN  0.000000
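
On recent pandas versions, calling fillna with inplace=True on a selected column such as df[1] can emit a chained-assignment warning and may leave df unchanged. The following is a minimal sketch of two alternatives that avoid the issue, assuming a made-up frame shaped like the one above (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

# Made-up data with the same shape as the example above.
df = pd.DataFrame({0: [np.nan, -0.494375, np.nan, 1.876360, np.nan],
                   1: [np.nan, 0.570994, np.nan, -0.229738, np.nan]})

# Option 1: fill the selected column and assign the result back.
filled_by_assignment = df.copy()
filled_by_assignment[1] = filled_by_assignment[1].fillna(0)

# Option 2: pass a dict mapping column label -> fill value.
# Columns not listed in the dict are left untouched.
filled_by_dict = df.fillna({1: 0})
```

Both options leave column 0 with its NaN values and return the same result; the dict form is handy when several columns need different fill values.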

Reply #3

I just wanted to provide an update for a special case, since it looks like people still come here. If you're using a MultiIndex or otherwise using an index slicer, the inplace=True option may not be enough to update the slice you've chosen. For example, in a 2x2 level MultiIndex this will not change any values (as of pandas 0.15):

idx = pd.IndexSlice
df.loc[idx[:,mask_1],idx[mask_2,:]].fillna(value=0,inplace=True)

The "problem" is that the chaining breaks fillna's ability to update the original DataFrame. I put "problem" in quotes because there are good reasons for the design decisions that led to not interpreting through these chains in certain situations. Also, this is a complex example (though I really ran into it), but the same may apply to fewer index levels depending on how you slice.

The solution is DataFrame.update:

df.update(df.loc[idx[:,mask_1],idx[[mask_2],:]].fillna(value=0))

It's one line, reads reasonably well (sort of), and avoids unnecessary intermediate variables or loops while letting you apply fillna to any multi-level slice you like!

If anybody can find places where this doesn't work, please post in the comments. I've been experimenting with it and looking at the source, and it seems to solve at least my multi-index slice problems.
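
For anyone who wants to try this, here is a small self-contained sketch of the update approach. The frame, the level labels ("a", "b", "x", "y", "p", "q", "m", "n") and the plain labels used in place of mask_1 and mask_2 are all hypothetical stand-ins, not the data from my original case:

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2-level frame: two index levels and two column levels.
index = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]])
columns = pd.MultiIndex.from_product([["p", "q"], ["m", "n"]])
df = pd.DataFrame(np.nan, index=index, columns=columns)
df.loc[("a", "x"), ("p", "m")] = 1.0

idx = pd.IndexSlice

# Fill NaN only where the second index level is "y"
# and the first column level is "p".
filled_slice = df.loc[idx[:, "y"], idx["p", :]].fillna(value=0)

# update() aligns on index and columns and writes the non-NaN
# values of filled_slice back into df, modifying df in place.
df.update(filled_slice)
```

Only the four cells inside the slice become 0; everything outside it keeps its original value, NaN included.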


Reply #4

The code below worked for me.

import pandas

df = pandas.read_csv('somefile.txt')

df = df.fillna(0)

Reply #5

Slicing is not guaranteed to return a view rather than a copy, so an in-place fillna on a slice may not change the original DataFrame. Instead, assign the result back to the column:

df['column'] = df['column'].fillna(value)
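
Applied to the question's Amount column, that could look like the sketch below. It uses a few rows copied from the question; the integer cast at the end is my assumption about what the applied function needed, since the question does not show the function itself:

```python
import numpy as np
import pandas as pd

# A few rows taken from the question's data; NaN marks missing amounts.
df = pd.DataFrame({"itm": [485, 485, 485, 489],
                   "Amount": [71781, np.nan, 11072, np.nan]})

# int() cannot convert NaN, which reproduces the error from the question.
try:
    int(df["Amount"].iloc[1])
except ValueError as exc:
    error_message = str(exc)

# Fill the missing values and assign the result back to the column;
# after that, casting to an integer type is safe.
df["Amount"] = df["Amount"].fillna(0).astype(int)
```

Once the column has no NaN left, a function that converts each value to an integer can be applied without raising the ValueError.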

Reply #6

You could use replace to change NaN to 0:

import pandas as pd
import numpy as np

# for column
df['column'] = df['column'].replace(np.nan, 0)

# for whole dataframe
df = df.replace(np.nan, 0)

# inplace
df.replace(np.nan, 0, inplace=True)