Python Pandas fillna doesn't work in for loop?

Given a setup such as the one below:

import pandas as pd
import numpy as np

# Create random-number DataFrames
df1 = pd.DataFrame(np.random.rand(10, 4))
df2 = pd.DataFrame(np.random.rand(10, 4))
df3 = pd.DataFrame(np.random.rand(10, 4))

# Create a list of DataFrames
data_frame_list = [df1, df2, df3]

# Introduce some NaN values
df1.iloc[4, 3] = np.nan
df2.iloc[1:4, 2] = np.nan

# Loop to forward-fill any NaN values
for df in data_frame_list:
    df = df.fillna(method='ffill')

This still leaves df2 (for example) as:

          0         1          2         3
0  0.946601  0.492957   0.688421  0.582571
1  0.365173  0.507617        NaN  0.997909
2  0.185005  0.496989        NaN  0.962120
3  0.278633  0.515227        NaN  0.868952
4  0.346495  0.779571   0.376018  0.750900
5  0.384307  0.594381   0.741655  0.510144
6  0.499180  0.885632    0.13413  0.196010
7  0.245445  0.771402   0.371148  0.222618
8  0.564510  0.487644   0.121945  0.095932
9  0.401214  0.282698  0.0181196  0.689916

The individual line of code, however, does work:

df2 = df2.fillna(method='ffill')

I thought the issue might be due to the way I was naming variables, so I tried globals()[df], but that didn't seem to work either.

I'm wondering whether it is possible to forward-fill an entire DataFrame in a for loop, or whether I'm going wrong somewhere in my approach.
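Both failed attempts can be explained with a short check. Without inplace, fillna (and its shorthand ffill) returns a new DataFrame. The statement df = ... only rebinds the loop variable to that new object, so the object stored in the list, which is the same object as df2, is left unchanged. globals() is a dict keyed by variable names (strings), so indexing it with a DataFrame fails: a DataFrame cannot be hashed. The sketch below is a minimal reproduction on a small made-up frame, not the asker's exact data.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({"a": [1.0, np.nan, np.nan, 4.0]})
data_frame_list = [df2]

# Attempt 1: rebinding the loop variable.
for df in data_frame_list:
    original = df
    df = df.ffill()  # a new DataFrame, bound only to the name `df`
    rebound_to_new = df is not original

# The list entry is still the very same object as df2, NaNs included.
still_same_object = data_frame_list[0] is df2
nan_count = int(df2["a"].isna().sum())
print(rebound_to_new, still_same_object, nan_count)  # True True 2

# Attempt 2: globals() maps variable *names* (strings) to objects. Looking it
# up with a DataFrame first requires hashing the DataFrame, which pandas refuses.
try:
    globals()[df2]
    lookup_failed = False
except TypeError:
    lookup_failed = True
print(lookup_failed)  # True
```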

Original: https://stackoverflow.com/questions/47035399

Updated: 2020-02-17 10:40

Best answer

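Reassigning df inside the loop only rebinds the loop variable to the new DataFrame returned by fillna, so neither the list entries nor df1 to df3 change. Use ffill with the parameter inplace=True instead. It modifies each DataFrame in the list in place, and because the list holds the same objects as df1 to df3, those are filled too: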

data_frame_list = [df1, df2, df3]

for df in data_frame_list:
    df.ffill(inplace=True)

print(data_frame_list)

[          0         1         2         3
0  0.506726  0.057531  0.627580  0.132553
1  0.131085  0.788544  0.506686  0.412826
2  0.578009  0.488174  0.335964  0.140816
3  0.891442  0.086312  0.847512  0.529616
4  0.550261  0.848461  0.158998  0.529616
5  0.817808  0.977898  0.933133  0.310414
6  0.481331  0.382784  0.874249  0.363505
7  0.384864  0.035155  0.634643  0.009076
8  0.197091  0.880822  0.002330  0.109501
9  0.623105  0.999237  0.567151  0.487938,           0         1         2         3
0  0.104856  0.525416  0.284066  0.658453
1  0.989523  0.644251  0.284066  0.141395
2  0.488099  0.167418  0.284066  0.097982
3  0.930415  0.486878  0.284066  0.192273
4  0.210032  0.244598  0.175200  0.367130
5  0.981763  0.285865  0.979590  0.924292
6  0.631067  0.119238  0.855842  0.782623
7  0.815908  0.575624  0.037598  0.532883
8  0.346577  0.329280  0.606794  0.825932
9  0.273021  0.503340  0.828568  0.429792,           0         1         2         3
0  0.491665  0.752531  0.780970  0.524148
1  0.635208  0.283928  0.821345  0.874243
2  0.454211  0.622611  0.267682  0.726456
3  0.379144  0.345580  0.694614  0.585782
4  0.844209  0.662073  0.590640  0.612480
5  0.258679  0.413567  0.797383  0.431819
6  0.034473  0.581294  0.282111  0.856725
7  0.352072  0.801542  0.862749  0.000285
8  0.793939  0.297286  0.441013  0.294635
9  0.841181  0.804839  0.311352  0.171094]
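Two consequences can be checked directly. First, the list and the names df1 to df3 refer to the same objects, so the inplace=True loop also fills df1 to df3. Second, you may prefer to avoid inplace. In that case, store the returned copies back into the list, for example with a list comprehension. That alternative is not part of the answer above, and it updates only the list, not the original names. This is a minimal sketch using small made-up frames.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
df2 = pd.DataFrame({"a": [np.nan, 2.0, np.nan]})

# In-place version from the answer: mutates the objects df1 and df2 themselves.
data_frame_list = [df1, df2]
for df in data_frame_list:
    df.ffill(inplace=True)

print(df1["a"].tolist())  # [1.0, 1.0, 3.0]
# A leading NaN has nothing before it to copy, so df2's first value stays NaN.

# Copy-returning alternative: rebuild the list from the ffill() results.
df3 = pd.DataFrame({"a": [5.0, np.nan]})
other_list = [df3]
other_list = [df.ffill() for df in other_list]

print(other_list[0]["a"].tolist())  # [5.0, 5.0]
print(df3["a"].isna().sum())        # 1 (df3 itself is unchanged)
```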

2017-10-31

Related Q&A

dataframe()['code'].fillna('code'), dataframe()['date'].fillna('date'). See the book Python for Data Analysis.

For upsampling, you can first use resample + transform and aggregate:

# for testing 10Min
df = df.resample('10Min').transform('first')
print(df)

                     value_a  value_b  diff
index
2016-01-01 00:01:00      2.8      5.4  -2.6
2016-0

...

As @thesilkworm suggested, first convert your series to numeric. Here is a simple example:

import pandas as pd, numpy as np

df = pd.DataFrame([[np.nan, np.nan, np.nan],
                   [5, 1, 2, 'hello'],
                   [1, 4, 3, 4],
                   [9, 8, 7, 6]], dtype=object)

df =

...
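You can convert to numeric with pd.to_numeric and errors='coerce'. This turns entries that cannot be parsed, such as 'hello', into NaN, after which the numeric fill methods apply. Here is a small sketch on a made-up object-dtype series; filling with 0 is just an example choice.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 5, 'hello', 9], dtype=object)

# errors='coerce' maps anything that can't be parsed as a number to NaN.
numeric = pd.to_numeric(s, errors="coerce")
filled = numeric.fillna(0)

print(filled.tolist())  # [0.0, 5.0, 0.0, 9.0]
```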

This is pretty ugly, but iterating over the index of the nulls works:

In [11]: nulls = dfcolors[pd.isnull(dfcolors['Colors'])]

In [12]: for i, ni in enumerate(nulls.index[:len(dfalt)]):
    ...:     dfcolors.loc[ni, 'Colors'] = dfalt['Alt'].iloc[i]

In [13]: dfcolors
Out[13]:
  Colors
0   Blue
1

...
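The same idea also works without a loop. Select the labels of the null rows, keep as many as there are replacement values, and assign them in one .loc call. The frames dfcolors and dfalt below are small hypothetical stand-ins for the ones in the snippet.

```python
import numpy as np
import pandas as pd

dfcolors = pd.DataFrame({"Colors": ["Blue", np.nan, "Red", np.nan, np.nan]})
dfalt = pd.DataFrame({"Alt": ["Green", "Pink"]})

# Index labels of the missing entries, truncated to the number of replacements.
null_idx = dfcolors.index[dfcolors["Colors"].isna().to_numpy()][: len(dfalt)]

# Assign positionally: .to_numpy() drops dfalt's own index, so no alignment happens.
dfcolors.loc[null_idx, "Colors"] = dfalt["Alt"].to_numpy()[: len(null_idx)]

print(dfcolors["Colors"].tolist())  # ['Blue', 'Green', 'Red', 'Pink', nan]
```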

Try data.fillna(value=0, inplace=True).
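The in-place call and the assignment form produce the same result. One pitfall: in pandas 2.x and earlier, inplace=True makes fillna return None. Writing data = data.fillna(value=0, inplace=True) would therefore replace data with None. Here is a minimal sketch on a made-up frame.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"x": [1.0, np.nan, 3.0]})
data_copy = data.copy()

# Form 1: mutate the existing object; don't assign the return value.
data.fillna(value=0, inplace=True)

# Form 2: keep the returned DataFrame by rebinding the name.
data_copy = data_copy.fillna(value=0)

print(data.equals(data_copy))  # True
```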

You can count the NaNs in df['att1'], subtract 1, and then pass the result as the limit parameter of fillna:

import pandas as pd
import numpy as np

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3], columns=['att1'])
print(df)

   att1
0     1
1     2
2   NaN
3   NaN
4   NaN
5   NaN
6     3

s = df['

...
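Here is a sketch of that idea. With n NaNs in the column, ffill(limit=n - 1) fills all but the last one. It assumes a single run of at least two NaNs, because limit counts consecutive NaNs per gap and must be greater than 0. ffill is used as the current spelling of fillna(method='ffill'). The data mirrors the snippet's att1 column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, np.nan, np.nan, np.nan, np.nan, 3], columns=["att1"])

# Number of missing values in the column (4 here).
n_missing = int(df["att1"].isna().sum())

# Forward-fill at most n_missing - 1 consecutive NaNs, leaving the last one.
filled = df["att1"].ffill(limit=n_missing - 1)

print(filled.tolist())  # [1.0, 2.0, 2.0, 2.0, 2.0, nan, 3.0]
```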

If you're already addressing the rows where Intensity is null, I don't think you even need fillna:

df.loc[(df.Available) & (df.Intensity.isnull()), 'Intensity'] = 0

Or you can do:

df.loc[df.Available, 'Intensity'] = df.loc[df.Available, 'Intensity'].fillna(0)
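Both variants can be checked on a tiny hypothetical frame that uses the snippet's column names, with Available as a boolean column. Rows where Available is False keep their NaN in both cases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Available": [True, True, False],
    "Intensity": [np.nan, 5.0, np.nan],
})
df_b = df.copy()

# Variant 1: set Intensity to 0 only where Available is True and Intensity is null.
df.loc[df.Available & df.Intensity.isnull(), "Intensity"] = 0

# Variant 2: fillna on the Available rows only, then write them back.
df_b.loc[df_b.Available, "Intensity"] = df_b.loc[df_b.Available, "Intensity"].fillna(0)

print(df.equals(df_b))  # True
```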

This works for me:

df['observations'] = df['observations'].fillna(0)
print(df)

values observations
time x1 x2 x3 x4 x1 x2 x3
t1 v1_1 NaN v3_1 v4_1 o1_1 0 o3_1 o4_1
t2 v1

...
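Here is a minimal reproduction that assumes two-level columns like the snippet's values/observations layout; the data is made up. Selecting the top-level label gives a sub-frame. Assigning the filled sub-frame back fills only that block.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([["values", "observations"], ["x1", "x2"]])
df = pd.DataFrame(
    [[1.0, np.nan, np.nan, 2.0],
     [np.nan, 3.0, 4.0, np.nan]],
    index=["t1", "t2"],
    columns=cols,
)

# Fill only the 'observations' block; the 'values' block keeps its NaNs.
df["observations"] = df["observations"].fillna(0)
print(df)
```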

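There was a recent discussion about this issue, and it has been fixed in pandas master: https://github.com/pydata/pandas/issues/5703 (the fix came after the 0.13rc1 release, so it will be in the final 0.13). Note: the behavior changed! As @behzad.nouri pointed out, using a Series as the fillna input was unsupported behavior in pandas <= 0.12. It did work, but apparently based on position, which is wrong. But as long as both series (in your case df['sales'] and df['net_pft

...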
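In current pandas, passing a Series to fillna aligns on index labels rather than positions. This small check uses made-up data whose two indexes are deliberately in different orders; the names sales and net_pft just echo the snippet.

```python
import numpy as np
import pandas as pd

sales = pd.Series([np.nan, 2.0, np.nan], index=[0, 1, 2])
net_pft = pd.Series([30.0, 10.0, 20.0], index=[2, 0, 1])

# Missing sales values are taken from net_pft *by label*:
# label 0 gets 10.0 and label 2 gets 30.0, regardless of row order.
filled = sales.fillna(net_pft)
print(filled.tolist())  # [10.0, 2.0, 30.0]
```

A purely positional fill would have produced 30.0 and 20.0 instead.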
