Python Pandas fillna doesn't work in for loop?
Given a setup such as the one below:
import pandas as pd
import numpy as np
# Create random number dataframes
df1 = pd.DataFrame(np.random.rand(10, 4))
df2 = pd.DataFrame(np.random.rand(10, 4))
df3 = pd.DataFrame(np.random.rand(10, 4))
# Create list of dataframes
data_frame_list = [df1, df2, df3]
# Introduce some NaN values
df1.iloc[4, 3] = np.nan
df2.iloc[1:4, 2] = np.nan
# Create loop to ffill any NaN values
for df in data_frame_list:
    df = df.fillna(method='ffill')
This still leaves df2 (for example) as:
          0         1          2         3
0  0.946601  0.492957   0.688421  0.582571
1  0.365173  0.507617        NaN  0.997909
2  0.185005  0.496989        NaN  0.962120
3  0.278633  0.515227        NaN  0.868952
4  0.346495  0.779571   0.376018  0.750900
5  0.384307  0.594381   0.741655  0.510144
6  0.499180  0.885632    0.13413  0.196010
7  0.245445  0.771402   0.371148  0.222618
8  0.564510  0.487644   0.121945  0.095932
9  0.401214  0.282698  0.0181196  0.689916
However, the individual line of code:
df2 = df2.fillna(method='ffill')
does work. I thought the issue might be due to the way I was naming variables, so I tried globals()[df], but that didn't seem to work either.
Is it possible to ffill an entire dataframe in a for loop, or am I going wrong somewhere in my approach?
Source: https://stackoverflow.com/questions/47035399
Last updated: 2020-02-17 10:40
Best answer
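The loop in the question only rebinds the name df to the new DataFrame returned by fillna; the objects in the list, and therefore df1 - df3, are not changed. Fill each DataFrame in place instead, using ffill with the parameter inplace=True:
data_frame_list = [df1, df2, df3]
for df in data_frame_list:
    df.ffill(inplace=True)
print(data_frame_list)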
[ 0 1 2 3
0 0.506726 0.057531 0.627580 0.132553
1 0.131085 0.788544 0.506686 0.412826
2 0.578009 0.488174 0.335964 0.140816
3 0.891442 0.086312 0.847512 0.529616
4 0.550261 0.848461 0.158998 0.529616
5 0.817808 0.977898 0.933133 0.310414
6 0.481331 0.382784 0.874249 0.363505
7 0.384864 0.035155 0.634643 0.009076
8 0.197091 0.880822 0.002330 0.109501
9 0.623105 0.999237 0.567151 0.487938, 0 1 2 3
0 0.104856 0.525416 0.284066 0.658453
1 0.989523 0.644251 0.284066 0.141395
2 0.488099 0.167418 0.284066 0.097982
3 0.930415 0.486878 0.284066 0.192273
4 0.210032 0.244598 0.175200 0.367130
5 0.981763 0.285865 0.979590 0.924292
6 0.631067 0.119238 0.855842 0.782623
7 0.815908 0.575624 0.037598 0.532883
8 0.346577 0.329280 0.606794 0.825932
9 0.273021 0.503340 0.828568 0.429792, 0 1 2 3
0 0.491665 0.752531 0.780970 0.524148
1 0.635208 0.283928 0.821345 0.874243
2 0.454211 0.622611 0.267682 0.726456
3 0.379144 0.345580 0.694614 0.585782
4 0.844209 0.662073 0.590640 0.612480
5 0.258679 0.413567 0.797383 0.431819
6 0.034473 0.581294 0.282111 0.856725
7 0.352072 0.801542 0.862749 0.000285
8 0.793939 0.297286 0.441013 0.294635
9 0.841181 0.804839 0.311352 0.171094]
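To make the difference concrete, here is a minimal sketch that is not part of the original answer. It assumes a seeded generator (rng) and only two frames for brevity; the names nan_after_rebinding, filled_list and nan_after_inplace are illustrative. It uses ffill() rather than fillna(method='ffill'), which recent pandas releases (2.1 and later, as far as I know) deprecate.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is repeatable (an assumption; the question uses unseeded data).
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.random((10, 4)))
df2 = pd.DataFrame(rng.random((10, 4)))
df1.iloc[4, 3] = np.nan
df2.iloc[1:4, 2] = np.nan
data_frame_list = [df1, df2]

# 1) Rebinding: `df = ...` only points the loop variable at a new object.
#    The list entries (and df1, df2) still refer to the unfilled frames.
for df in data_frame_list:
    df = df.ffill()
nan_after_rebinding = int(df2.isna().sum().sum())

# 2) Building a new list: the filled frames live in filled_list,
#    while df1 and df2 themselves are still unfilled.
filled_list = [df.ffill() for df in data_frame_list]
nan_in_filled_list = int(filled_list[1].isna().sum().sum())

# 3) Filling in place: the list holds references to the same objects
#    as df1 and df2, so mutating them is visible through both names.
for df in data_frame_list:
    df.ffill(inplace=True)
nan_after_inplace = int(df2.isna().sum().sum())

print(nan_after_rebinding, nan_in_filled_list, nan_after_inplace)
```

Option 2 is the one to pick if the original frames should stay untouched; option 3 matches the approach in the answer above.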
2017-10-31