Extracting the data that differs between two columns with Python: extracting the words that differ between two sentences

Latest recommended article published on 2023-06-21 02:59:24

肥魔


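I have a very large data frame with two columns named sentence1 and sentence2.

I am trying to create a new column containing the words that differ between the two sentences, for example: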

sentence1=c("This is sentence one", "This is sentence two", "This is sentence three")

sentence2=c("This is the sentence four", "This is the sentence five", "This is the sentence six")

df = as.data.frame(cbind(sentence1,sentence2))

My data frame has the following structure:

ID  sentence1               sentence2
1   This is sentence one    This is the sentence four
2   This is sentence two    This is the sentence five
3   This is sentence three  This is the sentence six

My expected result is:

ID  sentence1    sentence2    Expected_Result
1   This is ...  This is ...  one the four
2   This is ...  This is ...  two the five
3   This is ...  This is ...  three the six

In R I tried splitting the sentences and then taking the elements that differ between the resulting lists, for example:

df$split_Sentence1

df$split_Sentence2

df$Dif

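But this approach does not work when I apply setdiff...

In Python I tried using NLTK, first getting the tokens and then extracting the difference between the two lists, like this: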

from nltk.tokenize import word_tokenize

df['tokensS1'] = df.sentence1.apply(lambda x: word_tokenize(x))

df['tokensS2'] = df.sentence2.apply(lambda x: word_tokenize(x))

At this point I have not found a function that gives me the result I need.

I hope you can help me. Thanks.
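One possible way to get the expected column, offered only as a minimal sketch: keep the words that appear in one token list but not the other, preserving their original order, and join them. The helper name `differing_words` is hypothetical, and `str.split` is used here as a stand-in for `word_tokenize` so the example runs without NLTK; it assumes whitespace-separated words and case-sensitive comparison.

```python
def differing_words(tokens1, tokens2):
    """Return the words found in only one of the two token lists.

    Words unique to tokens1 come first, then words unique to tokens2,
    each group in its original order.
    """
    set1, set2 = set(tokens1), set(tokens2)
    only_in_1 = [w for w in tokens1 if w not in set2]
    only_in_2 = [w for w in tokens2 if w not in set1]
    return " ".join(only_in_1 + only_in_2)


# Sample data from the question; str.split stands in for word_tokenize.
sentence1 = ["This is sentence one", "This is sentence two", "This is sentence three"]
sentence2 = ["This is the sentence four", "This is the sentence five", "This is the sentence six"]

results = [
    differing_words(s1.split(), s2.split())
    for s1, s2 in zip(sentence1, sentence2)
]
print(results)  # ['one the four', 'two the five', 'three the six']

# With the pandas columns from the question, the same helper could be applied as
# (assumption: df['tokensS1'] and df['tokensS2'] hold lists of tokens):
# df['Expected_Result'] = [
#     differing_words(a, b) for a, b in zip(df['tokensS1'], df['tokensS2'])
# ]
```

Using list comprehensions against sets rather than a plain set difference keeps the word order from each sentence, which a bare `set(a) ^ set(b)` would not guarantee.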
