python - pandas: How to compare columns of lists row by row in a DataFrame with pandas (not a for loop)?

The DataFrame

import pandas as pd

df = pd.DataFrame({'A': [['gener'], ['gener'], ['system'], ['system'], ['gutter'], ['gutter'], ['gutter'], ['gutter'], ['gutter'], ['gutter'], ['aluminum'], ['aluminum'], ['aluminum'], ['aluminum'], ['aluminum'], ['aluminum'], ['aluminum'], ['aluminum'], ['aluminum'], ['aluminum', 'toledo']], 'B': [['gutter'], ['gutter'], ['gutter', 'system'], ['gutter', 'guard', 'system'], ['ohio', 'gutter'], ['gutter', 'toledo'], ['toledo', 'gutter'], ['gutter'], ['gutter'], ['gutter'], ['how', 'to', 'instal', 'aluminum', 'gutter'], ['aluminum', 'gutter'], ['aluminum', 'gutter', 'color'], ['aluminum', 'gutter'], ['aluminum', 'gutter', 'adrian', 'ohio'], ['aluminum', 'gutter', 'bowl', 'green', 'ohio'], ['aluminum', 'gutter', 'maume', 'ohio'], ['aluminum', 'gutter', 'perrysburg', 'ohio'], ['aluminum', 'gutter', 'tecumseh', 'ohio'], ['aluminum', 'gutter', 'toledo', 'ohio']]}, columns=['A', 'B'])

What it looks like

I have a DataFrame with two columns of lists.

    A                   B
0   [gener]             [gutter]
1   [gener]             [gutter]
2   [system]            [gutter, system]
3   [system]            [gutter, guard, system]
4   [gutter]            [ohio, gutter]
5   [gutter]            [gutter, toledo]
6   [gutter]            [toledo, gutter]
7   [gutter]            [gutter]
8   [gutter]            [gutter]
9   [gutter]            [gutter]
10  [aluminum]          [how, to, instal, aluminum, gutter]
11  [aluminum]          [aluminum, gutter]
12  [aluminum]          [aluminum, gutter, color]
13  [aluminum]          [aluminum, gutter]
14  [aluminum]          [aluminum, gutter, adrian, ohio]
15  [aluminum]          [aluminum, gutter, bowl, green, ohio]
16  [aluminum]          [aluminum, gutter, maume, ohio]
17  [aluminum]          [aluminum, gutter, perrysburg, ohio]
18  [aluminum]          [aluminum, gutter, tecumseh, ohio]
19  [aluminum, toledo]  [aluminum, gutter, toledo, ohio]

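If I have columns of lists, is there a pandas function that lets me operate on the whole array of lists to check for an intersection and return either a boolean or the intersecting values as a new Series?

For example, I would like pandas to have something equivalent to this: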

def intersection(df, col1, col2, return_type='boolean'):
    if return_type == 'boolean':
        df = df[[col1, col2]]
        s = []
        for _, row in df.iterrows():
            # True if any item of the col2 list also appears in the col1 list
            s.append(any(phrase in row[col1] for phrase in row[col2]))
        S = pd.Series(s)
        return S
    elif return_type == 'word':
        df = df[[col1, col2]]
        s = []
        for _, row in df.iterrows():
            # Items common to both lists, comma-separated (set order is arbitrary)
            s.append(', '.join(set(row[col1]).intersection(row[col2])))
        S = pd.Series(s)
        return S

# Create column C in df
df['C'] = intersection(df, 'A', 'B', 'word')

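...without having to write my own function or resort to loops. I feel there must be a simpler way to compare the lists in two columns of the same row and see whether they intersect.

I can do it with a for loop, but that looks ugly to me.

A for loop that produces the boolean values: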
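For the boolean case, one shorter form is sketched below. It is only an illustration: I am not aware of a dedicated pandas function for intersecting list-valued columns, so the comparison still runs in Python, just without iterrows. The three-row DataFrame and the name has_common are assumptions made for this example and are not part of the question's data.

```python
import pandas as pd

# Small stand-in for the question's DataFrame (same column layout, fewer rows).
df = pd.DataFrame({
    'A': [['gener'], ['system'], ['aluminum', 'toledo']],
    'B': [['gutter'], ['gutter', 'system'], ['aluminum', 'gutter', 'toledo', 'ohio']],
})

# Pair the two columns row by row; set.isdisjoint is True when the
# lists share no item, so its negation means "they intersect".
has_common = pd.Series(
    [not set(a).isdisjoint(b) for a, b in zip(df['A'], df['B'])],
    index=df.index,  # keep the original index so assignment back to df aligns
)

print(has_common.tolist())
```

Passing index=df.index matters when the result is assigned back with df['C'] = has_common, because pandas aligns a Series on its index rather than on position.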

# Typed in an interactive session, so each result is echoed
for _, row in df.iterrows():
    any(phrase in row['A'] for phrase in row['B'])

Output:

False
False
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True

Or, using sets to find the intersecting words:

for _, row in df.iterrows():
    ', '.join(set(row['A']).intersection(row['B']))

Output:

''
''
'system'
'system'
'gutter'
'gutter'
'gutter'
'gutter'
'gutter'
'gutter'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'aluminum'
'toledo, aluminum'
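The last value above comes out as 'toledo, aluminum' because a set has no defined order, so the joined string can differ between runs. A possible variant that gives a stable result is sketched here using DataFrame.apply with axis=1 and sorted. The helper name shared_words and the three-row DataFrame are hypothetical, chosen only for this illustration; apply still calls the function once per row in Python.

```python
import pandas as pd

# Small stand-in for the question's DataFrame (same column layout, fewer rows).
df = pd.DataFrame({
    'A': [['gener'], ['gutter'], ['aluminum', 'toledo']],
    'B': [['gutter'], ['ohio', 'gutter'], ['aluminum', 'gutter', 'toledo', 'ohio']],
})

def shared_words(row):
    # Hypothetical helper: intersect the two lists as sets, then sort
    # alphabetically so the joined string is the same on every run.
    return ', '.join(sorted(set(row['A']) & set(row['B'])))

# axis=1 passes each row to the helper as a Series indexed by column name.
words = df.apply(shared_words, axis=1)

print(words.tolist())
```

Sorting alphabetically is a design choice for reproducibility; it does not preserve the order in which the words appear in either column.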
