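I have a dataframe containing sentences and a dictionary of terms grouped by topic, and I want to count the number of term matches for each topic.

import pandas as pd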
terms = {'animals':["fox","deer","eagle"],
'people':['John', 'Rob','Steve'],
'games':['basketball', 'football', 'hockey']
}
df=pd.DataFrame({
'Score': [4,6,2,7,8],
'Foo': ['The quick brown fox was playing basketball today','John and Rob visited the eagles nest, the foxes ran away','Bill smells like a wet dog','Steve threw the football at a deer. But the football missed','Sheriff John does not like hockey']
})
So far, I have created a column for each topic and marked it with 1 wherever a term is present, by iterating over the dictionary.
^{pr2}$
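The loop itself did not survive in this copy of the post. As a rough sketch only, and not necessarily the code originally posted, a marking loop along these lines would produce the 1/NaN flags shown in the output, assuming case-sensitive substring matching with `Series.str.contains`:

```python
import pandas as pd

terms = {'animals': ["fox", "deer", "eagle"],
         'people': ['John', 'Rob', 'Steve'],
         'games': ['basketball', 'football', 'hockey']}

df = pd.DataFrame({
    'Score': [4, 6, 2, 7, 8],
    'Foo': ['The quick brown fox was playing basketball today',
            'John and Rob visited the eagles nest, the foxes ran away',
            'Bill smells like a wet dog',
            'Steve threw the football at a deer. But the football missed',
            'Sheriff John does not like hockey']
})

# Assumed reconstruction: set the topic column to 1 for every row whose
# sentence contains the term as a plain substring; other rows stay NaN.
for topic, words in terms.items():
    for word in words:
        df.loc[df['Foo'].str.contains(word, regex=False), topic] = 1
```

Because this is substring matching, "foxes" and "eagles" count as hits for "fox" and "eagle".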
I get:
                                                 Foo  Score  animals  games  people
0   The quick brown fox was playing basketball today      4        1      1     NaN
1  John and Rob visited the eagles nest, the foxe...      6        1    NaN       1
2                         Bill smells like a wet dog      2      NaN    NaN     NaN
3  Steve threw the football at a deer. But the fo...      7        1      1       1
4                  Sheriff John does not like hockey      8      NaN      1       1
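What is the best way to count the number of words from each topic that appear in a sentence? And is there a more efficient way to loop over the dictionary without using Cython?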
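One possible direction, offered as a sketch rather than a definitive answer: join each topic's terms into a single regex alternation and let `Series.str.count` count the matches per sentence. This still loops over the topics, but makes only one vectorized call per topic instead of one per term, and needs no Cython. The `<topic>_count` column names are made up for illustration.

```python
import re
import pandas as pd

terms = {'animals': ["fox", "deer", "eagle"],
         'people': ['John', 'Rob', 'Steve'],
         'games': ['basketball', 'football', 'hockey']}

df = pd.DataFrame({
    'Score': [4, 6, 2, 7, 8],
    'Foo': ['The quick brown fox was playing basketball today',
            'John and Rob visited the eagles nest, the foxes ran away',
            'Bill smells like a wet dog',
            'Steve threw the football at a deer. But the football missed',
            'Sheriff John does not like hockey']
})

for topic, words in terms.items():
    # Build one pattern per topic, e.g. "fox|deer|eagle"; re.escape guards
    # against terms that contain regex metacharacters.
    pattern = "|".join(re.escape(word) for word in words)
    # Count non-overlapping matches in each sentence (case-sensitive).
    df[topic + "_count"] = df['Foo'].str.count(pattern)
```

With the sample data, the fourth sentence gets a games count of 2 because "football" appears twice. Matching is by substring, so "foxes" counts toward "fox"; wrapping the pattern in `\b(?:...)\b` would restrict it to whole words if that is what is wanted.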