I need to fill the NaN values in a DataFrame, df, with a calculation that depends on the previous values. What I have so far is this:
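import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [None] * 6, 'b': [2, 3, 10, 3, 5, 8]})
df["c"] = np.nan
df.loc[0, "c"] = 1
df.loc[2, "c"] = 3
i = 1
while i < 10:
    # fill remaining gaps with the value i rows up, times the current b
    df["c"] = df["c"].fillna(df["c"].shift(i) * df["b"])
    i += 1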
Unfortunately, this while loop does not work, and it is certainly a poor way to do this in pandas anyway. What I am looking for is something like:
df.c.fillna(method='ffill'*df.b,inplace=True)
I know that doesn't work either; I just think it makes clearer what I am looking for.
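Spelled out as ordinary Python, "forward fill, but multiplied by b" is a single pass over the rows in which each gap uses the value just filled above it. A minimal sketch, assuming the gaps are NaN; fill_with_running_product is a hypothetical helper, not a pandas function, and a gap in the very first row is simply left as NaN.

```python
import math

import numpy as np
import pandas as pd


def fill_with_running_product(c, b):
    """Hypothetical helper: fill each NaN in c with the previous
    (possibly already filled) c times the current b."""
    out = list(c)
    for i in range(1, len(out)):
        if math.isnan(out[i]):
            # out[i - 1] was already filled on the previous iteration
            out[i] = out[i - 1] * b[i]
    return out


df = pd.DataFrame({'a': [None] * 6, 'b': [2, 3, 10, 3, 5, 8]})
df["c"] = np.nan
df.loc[0, "c"] = 1
df.loc[2, "c"] = 3
df["c"] = fill_with_running_product(df["c"].tolist(), df["b"].tolist())
print(df["c"].tolist())  # [1.0, 3.0, 3.0, 9.0, 45.0, 360.0]
```

This is still a Python-level loop, so it does not vectorize, but it pins down the intended result unambiguously.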
Before filling, the DataFrame looks like this:
    b    c
0   2    1
1   3  NaN
2  10    3
3   3  NaN
4   5  NaN
5   8  NaN
The desired outcome should look like this:
    b    c
0   2    1   # nothing filled in: value was set explicitly (1)
1   3    3   # fill in previous c * b = 1 * 3 = 3
2  10    3   # nothing filled in: value was set explicitly (3)
3   3    9   # fill in previous c * b = 3 * 3 = 9
4   5   45   # fill in previous c * b = 9 * 5 = 45
5   8  360   # fill in previous c * b = 45 * 8 = 360
So basically: where no data is available, c should be filled with the previous value of c (which may itself have been filled) multiplied by the current b.
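Written as a recurrence, with c_i and b_i the values in row i, a missing c_i equals c_{i-1} times b_i. Unrolling that back to the last row k < i whose value was set explicitly gives a closed form, checked here against row 5 of the example (last known row k = 2):

```latex
c_i = c_{i-1}\, b_i \quad \text{(if } c_i \text{ is missing)}
\;\Longrightarrow\;
c_i = c_k \prod_{j=k+1}^{i} b_j,
\qquad
c_5 = c_2\, b_3\, b_4\, b_5 = 3 \cdot 3 \cdot 5 \cdot 8 = 360
```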
Solution
I can't figure out a way to do this in a single pass. What you want is a kind of rolling apply that can look at the previous row's result, but func below reads df['c'] as it was before the apply started, so a value filled in one row is not visible to the next row until the whole apply has finished and been assigned back. The following works only because we run the apply 3 times, which matches the longest run of consecutive NaNs (rows 3 to 5). This isn't great IMO:
In [103]:
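def func(x):
    # keep values that are already known
    if pd.notnull(x['c']):
        return x['c']
    else:
        # previous row's c (as of the last completed pass) times this row's b
        return df.iloc[x.name - 1]['c'] * x['b']

# each pass can only fill gaps whose previous row is already filled
df['c'] = df.apply(func, axis=1)
df['c'] = df.apply(func, axis=1)
df['c'] = df.apply(func, axis=1)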
df
Out[103]:
      a   b    c
0  None   2    1
1  None   3    3
2  None  10    3
3  None   3    9
4  None   5   45
5  None   8  360
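The repeated apply can be avoided by unrolling the recurrence: each filled value is the last known c times the product of the b values after that row, up to and including the current one. A vectorized sketch of that idea, using groupby(...).cumprod() instead of apply; this is a different technique from the answer above, and it assumes the same df layout (columns b and c, with NaN marking the gaps). Rows before the first known c stay NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [None] * 6, 'b': [2, 3, 10, 3, 5, 8]})
df["c"] = np.nan
df.loc[0, "c"] = 1
df.loc[2, "c"] = 3

known = df["c"].notna()
# every explicitly set c starts a new group: [1, 1, 2, 2, 2, 2]
groups = known.cumsum()
# use 1 on the known rows so the anchor value is not multiplied by its own b
factor = df["b"].where(~known, 1)
# running product of b within each group: [1, 3, 1, 3, 15, 120]
growth = factor.groupby(groups).cumprod()
# last known c carried forward, times the running product
df["c"] = df["c"].ffill() * growth
print(df)
```

Because every step works on whole columns, the number of passes no longer depends on the longest gap. For very long gaps the integer cumulative product could overflow int64, so casting b to float first may be safer.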