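Here is one way to get this result. The function uses shift, concat and apply to place each row next to the previous one and pass the pair to a helper, which performs either the product or the sum operation depending on whether the matching _index values are equal.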
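The row-pairing step can be looked at on its own. The sketch below uses a made-up two-column frame (the values are only illustrative) to show what concat and shift(1) produce before apply is involved:

```python
import pandas as pd

# Toy frame; the values are only for illustration.
frame = pd.DataFrame({
    'alfa': ['a,b', 'a,c', 'g,h'],
    'alfa_index': [23, 23, 28],
})

# shift(1) moves every row down by one, so after a column-wise concat
# each row holds its own values followed by the previous row's values.
paired = pd.concat([frame, frame.shift(1)], axis=1)

# The first row has no predecessor (its shifted half is NaN), so drop it.
paired = paired.iloc[1:]

print(paired)
```

Each remaining row now carries row n in its first half and row n-1 in its second half, which is why the helper can split the row down the middle.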
Code:

    import itertools as it

    import pandas as pd


    def crazy_prod_sum_thing(frame):
        # pair each data label with its companion '<label>_index' column
        labels = [(l, l + '_index')
                  for l in frame.columns.values if not l.endswith('_index')]

        def func(row):
            # first half of the row is row n, second half is row n-1
            half = len(row) >> 1
            front = row.iloc[:half]
            back = row.iloc[half:]

            # loop through the labels
            results = []
            for l, i in labels:
                x = front[l].split(',')
                y = back[l].split(',')
                if front[i] == back[i]:
                    # _index values match: two terms, x[0]+y[0] and x[1]+x[1]
                    results.append(x[0] + y[0] + ',' + x[1] + x[1])
                else:
                    # _index values differ: full cartesian product of x and y
                    results.append(
                        ','.join([x1 + y1 for x1, y1 in it.product(x, y)]))
            return pd.Series(results)

        # put row n and row n-1 side by side, drop the first row
        # (it has no predecessor), then apply func to every remaining row
        df = pd.concat([frame, frame.shift(1)], axis=1)[1:].apply(
            func, axis=1)
        df.rename(columns={i: x[0] + '_cpst' for i, x in enumerate(labels)},
                  inplace=True)
        return pd.concat([frame, df], axis=1)
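The two branches inside func can be tried on a single pair of cells. This is a minimal sketch with hand-picked cell values; the second branch simply mirrors the expression used in the function as written, including its use of x[1] for both letters of the second term.

```python
import itertools as it

x = 'b,e'.split(',')  # cell from row n
y = 'c,d'.split(',')  # cell from row n-1

# _index values differ: every element of x is paired with every element of y
product_branch = ','.join(x1 + y1 for x1, y1 in it.product(x, y))
print(product_branch)  # bc,bd,ec,ed

# _index values match: two terms are built directly from the split cells
match_branch = x[0] + y[0] + ',' + x[1] + x[1]
print(match_branch)  # bc,ee
```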
Test Code:
^{pr2}$
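An input frame matching the first six columns of the results table can be built by hand. The sketch below is a reconstruction from those printed values and may differ from the data originally used:

```python
import pandas as pd

# Values copied from the first six columns of the results table;
# this is a reconstruction, not necessarily the original test data.
frame = pd.DataFrame({
    'alfa': ['a,b', 'a,c', 'g,h', 'a,b', 'c,e'],
    'alfa_index': [23, 23, 28, 28, 28],
    'beta': ['c,d', 'b,e', 'd,f', 'c,d', 'b,g'],
    'beta_index': [36, 37, 37, 39, 39],
    'delta': ['a,c', 'c,d', 'e,g', 'a,c', 'd,k'],
    'delta_index': [32, 32, 32, 34, 34],
})

print(frame)
```

Passing this frame to crazy_prod_sum_thing from the code above would then exercise both branches, since some adjacent rows share an _index value and others do not.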
Results:

      alfa  alfa_index beta  beta_index delta  delta_index
    0  a,b          23  c,d          36   a,c           32
    1  a,c          23  b,e          37   c,d           32
    2  g,h          28  d,f          37   e,g           32
    3  a,b          28  c,d          39   a,c           34
    4  c,e          28  b,g          39   d,k           34

    1    [aa,cc, bc,bd,ec,ed, ca,dd]
    2    [ga,gc,ha,hc, db,ff, ec,gg]
    3    [ag,bb, cd,cf,dd,df, ae,ag,ce,cg]
    4    [ca,ee, bc,gg, da,kk]
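Notes:

This does not marshal the results back into the dataframe the way the question indicates, because I was not sure how they should be obtained when the index values do not match.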