1) Perform a groupby w.r.t. the columns by passing axis=1. Per @Boud's comment, you can get your desired result with a small tweak to the grouping array:
df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')
The grouping is based on this array:
np.arange(len(df.columns)) // 2
# array([0, 0, 1, 1, 2, 2], dtype=int32)
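To make the pairwise grouping concrete, here is a minimal sketch on a hypothetical 4-column frame (the frame and its values are illustrative, not from the question). Since `groupby(..., axis=1)` is deprecated in recent pandas, the sketch transposes and groups the rows, which produces the identical result:

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame: columns a,b form pair 1, c,d form pair 2
df = pd.DataFrame({'a': [1, 2], 'b': [10, 20], 'c': [3, 4], 'd': [30, 40]})

# Grouping key: consecutive column pairs -> [1, 1, 2, 2]
keys = (np.arange(len(df.columns)) // 2) + 1

# Equivalent to df.groupby(keys, axis=1).sum() on older pandas:
# transpose, group the (former) columns, sum, transpose back
out = df.T.groupby(keys).sum().T.add_prefix('s')
print(out)
#    s1  s2
# 0  11  33
# 1  22  44
```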
2) Using np.add.reduceat:
df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
df.columns = df.columns + 1
df.add_prefix('s')
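`np.add.reduceat` sums each slice between consecutive offsets, so passing every second column index as the offsets collapses each column pair into one sum. A small standalone sketch (the array values are illustrative):

```python
import numpy as np

a = np.array([[1, 10, 3, 30],
              [2, 20, 4, 40]])

# Offsets at every second column: sums a[:, 0:2] and a[:, 2:4]
offsets = np.arange(a.shape[1])[::2]   # array([0, 2])
res = np.add.reduceat(a, offsets, axis=1)
print(res)
# [[11 33]
#  [22 44]]
```

This operates directly on the underlying ndarray, with none of the per-group overhead of a pandas groupby, which is where the speedup below comes from.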
Timings:
For a 1-million-row DF spanning 20 columns:
from string import ascii_lowercase
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 10, (10**6,20)), columns=list(ascii_lowercase[:20]))
df.shape
(1000000, 20)
def with_groupby(df):
    return df.groupby((np.arange(len(df.columns)) // 2) + 1, axis=1).sum().add_prefix('s')

def with_reduceat(df):
    df = pd.DataFrame(np.add.reduceat(df.values, np.arange(len(df.columns))[::2], axis=1))
    df.columns = df.columns + 1
    return df.add_prefix('s')
# test whether they give the same o/p
with_groupby(df).equals(with_reduceat(df))
True
%timeit with_groupby(df.copy())
1 loop, best of 3: 1.11 s per loop
%timeit with_reduceat(df.copy())  # 3X faster
1 loop, best of 3: 345 ms per loop