I have a 16x10 pandas DataFrame with a 1x35000 array (or NaN) in each cell. I want to take the element-wise mean over the rows for each column.
    1        2        3        ...  10
1   1x35000  1x35000  1x35000  ...  1x35000
2   1x35000  NaN      1x35000  ...  1x35000
3   1x35000  NaN      1x35000  ...  NaN
...
16  1x35000  1x35000  NaN      ...  1x35000
To avoid misunderstandings: take the first element of each array in the first column and average them. Then take the second element of each array in the first column and average those, and so on. In the end I want a 1x10 DataFrame with one 1x35000 array per column, where each array is the element-wise mean of the original arrays in that column.
   1        2        3        ...  10
1  1x35000  1x35000  1x35000  ...  1x35000
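To make the requested operation concrete, here is a minimal sketch on three short arrays standing in for the cells of one column. The names `a`, `b`, `c` and their values are made up for illustration and are not part of the real data.

```python
import numpy as np

# Three hypothetical cells from a single column (length 3 instead of 35000)
a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 4.0, 5.0])
c = np.array([5.0, 6.0, 10.0])

# axis=0 averages across the arrays, position by position
elementwise_mean = np.mean([a, b, c], axis=0)
print(elementwise_mean)  # [3. 4. 6.]
```

The result has the same length as each input array, which is the 1x35000 shape wanted per column.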
Is there an elegant way to do this, preferably without for-loops?
Answer
Setup
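import numpy as np
import pandas as pd

# Seed the generator so the example is reproducible
np.random.seed([3,14159])
# Build a 3x3 frame of length-5 integer arrays (applymap is named DataFrame.map from pandas 2.1 on)
df = pd.DataFrame(
    np.random.randint(10, size=(3, 3, 5)).tolist(),
    list('XYZ'), list('ABC')
).applymap(np.array)
# Blank out two cells to mimic the missing data in the question
df.loc['X', 'B'] = np.nan
df.loc['Z', 'A'] = np.nan
df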
                 A                B                C
X  [4, 8, 1, 1, 9]              NaN  [8, 2, 8, 4, 9]
Y  [4, 3, 4, 1, 5]  [1, 2, 6, 2, 7]  [7, 1, 1, 7, 8]
Z              NaN  [9, 3, 8, 7, 7]  [2, 6, 3, 1, 9]
Solution
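# Stack into one long Series (the NaN cells drop out), then group by column label
g = df.stack().groupby(level=1)
# Summing the arrays in a group adds them element-wise; dividing by the group size gives the mean
g.apply(np.sum, axis=0) / g.size()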
A                        [4.0, 5.5, 2.5, 1.0, 7.0]
B                        [5.0, 2.5, 7.0, 4.5, 7.0]
C    [5.66666666667, 3.0, 4.0, 4.0, 8.66666666667]
dtype: object
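As a cross-check, the same means can be computed column by column with plain NumPy. This is only a sketch: it hard-codes the values from the example frame printed above rather than regenerating them from the seed, the name `means` is introduced here for illustration, and it uses a dict comprehension, so it is not loop-free.

```python
import numpy as np
import pandas as pd

# Same values as the example frame, written out explicitly
df = pd.DataFrame(
    {
        "A": [np.array([4, 8, 1, 1, 9]), np.array([4, 3, 4, 1, 5]), np.nan],
        "B": [np.nan, np.array([1, 2, 6, 2, 7]), np.array([9, 3, 8, 7, 7])],
        "C": [np.array([8, 2, 8, 4, 9]), np.array([7, 1, 1, 7, 8]),
              np.array([2, 6, 3, 1, 9])],
    },
    index=list("XYZ"),
)

# For each column: drop NaN cells, stack the remaining arrays into a
# 2-D array (one row per cell), and average down the rows
means = {
    col: np.stack(df[col].dropna().to_list()).mean(axis=0)
    for col in df.columns
}

for col, arr in means.items():
    print(col, arr)
```

Each entry of `means` should agree with the corresponding row of the groupby result, since both divide the element-wise sum by the number of non-missing cells in the column.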
If you need the one-row shape shown in the question:
g = df.stack().groupby(level=1)
(g.apply(np.sum, axis=0) / g.size()).to_frame().T
                           A                          B                                              C
0  [4.0, 5.5, 2.5, 1.0, 7.0]  [5.0, 2.5, 7.0, 4.5, 7.0]  [5.66666666667, 3.0, 4.0, 4.0, 8.66666666667]
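If the data does not have to live in DataFrame cells, a fully vectorized alternative is to hold it in one 3-D float array and let `np.nanmean` skip the missing entries. This is a different technique from the answer above, sketched under the assumption that a missing cell can be represented as an all-NaN array; `cube` and its values are hypothetical.

```python
import numpy as np

# Hypothetical stand-in: 3 rows, 2 columns, arrays of length 4.
# A missing cell is stored as a row of NaNs.
missing = [np.nan] * 4
cube = np.array([
    [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]],
    [[3.0, 4.0, 5.0, 6.0], missing],
    [missing, [30.0, 40.0, 50.0, 60.0]],
])

# Average over the row axis, ignoring NaNs; result has shape (columns, length)
col_means = np.nanmean(cube, axis=0)
print(col_means.shape)  # (2, 4)
```

For the real data the array would have shape (16, 10, 35000). A column whose cells are all missing would come out as NaN, with a runtime warning from NumPy.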
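This post shows how to work in Python with a pandas DataFrame whose cells hold 1x35000 arrays, computing the element-wise mean of the arrays in each column to produce a new 1x10 DataFrame. The example uses stack, groupby, and apply to do this without for-loops.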