python – Extracting the words that differ between two sentences

Latest recommended article published 2023-04-02 08:25:30

weixin_39714191

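I have a very large data frame with two columns named sentence1 and sentence2.

I am trying to create a new column containing the words that differ between the two sentences, for example: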

sentence1=c("This is sentence one", "This is sentence two", "This is sentence three")

sentence2=c("This is the sentence four", "This is the sentence five", "This is the sentence six")

df = as.data.frame(cbind(sentence1,sentence2))

My data frame has the following structure:

ID  sentence1               sentence2
1   This is sentence one    This is the sentence four
2   This is sentence two    This is the sentence five
3   This is sentence three  This is the sentence six

My expected result is:

ID  sentence1    sentence2    Expected_Result
1   This is ...  This is ...  one the four
2   This is ...  This is ...  two the five
3   This is ...  This is ...  three the six

In R, I tried splitting the sentences and then taking the elements that differ between the resulting lists, for example:

df$split_Sentence1<-strsplit(df$sentence1, split=" ")

df$split_Sentence2<-strsplit(df$sentence2, split=" ")

df$Dif<-setdiff(df$split_Sentence1, df$split_Sentence2)

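But this approach does not work when setdiff is applied.

In Python, I tried NLTK, first obtaining the tokens and then extracting the difference between the two lists, like this: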

from nltk.tokenize import word_tokenize

df['tokensS1'] = df.sentence1.apply(lambda x: word_tokenize(x))

df['tokensS2'] = df.sentence2.apply(lambda x: word_tokenize(x))

At this point, I have not found a function that gives me the result I need.

I hope you can help me. Thanks.

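Best answer: this is an R solution.

I created an exclusiveWords function that finds the words unique to each of the two sets and returns a "sentence" made up of those words. I wrapped it in Vectorize() so that it can process all rows of the data.frame at once.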

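# Keep the columns as character vectors rather than factors
df = as.data.frame(cbind(sentence1, sentence2), stringsAsFactors = FALSE)

exclusiveWords <- function(x, y){
  # Split each sentence into words on single spaces
  x <- strsplit(x, " ")[[1]]
  y <- strsplit(y, " ")[[1]]
  u <- union(x, y)
  # Words missing from x (only in y), followed by words missing from y (only in x)
  u <- union(setdiff(u, x), setdiff(u, y))
  return(paste0(u, collapse = " "))
}

# Vectorize() lets the function run over whole columns at once
exclusiveWords <- Vectorize(exclusiveWords)

df$result <- exclusiveWords(df$sentence1, df$sentence2)

df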

#                sentence1                 sentence2        result
# 1   This is sentence one This is the sentence four  the four one
# 2   This is sentence two This is the sentence five  the five two
# 3 This is sentence three  This is the sentence six the six three
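
For the Python side of the question, here is a minimal sketch along the same lines, using only the standard library. The name exclusive_words is hypothetical and does not come from the original answer. Words are split on whitespace, as in the R code, rather than tokenized with NLTK. Unlike the R version, it lists the words found only in sentence1 first, which matches the order of the expected result in the question:

```python
def exclusive_words(s1, s2):
    """Return the words that occur in exactly one of the two sentences."""
    words1, words2 = s1.split(), s2.split()
    set1, set2 = set(words1), set(words2)
    # Words only in the first sentence, then words only in the second,
    # each kept in their original order of appearance
    only1 = [w for w in words1 if w not in set2]
    only2 = [w for w in words2 if w not in set1]
    # dict.fromkeys drops repeated words while preserving first-seen order
    return " ".join(dict.fromkeys(only1 + only2))


sentence1 = ["This is sentence one", "This is sentence two", "This is sentence three"]
sentence2 = ["This is the sentence four", "This is the sentence five", "This is the sentence six"]

# Apply the function pairwise, one result per row
results = [exclusive_words(a, b) for a, b in zip(sentence1, sentence2)]
print(results)
```

With a pandas DataFrame, the same function could be applied row by row in the same way, for example df["result"] = [exclusive_words(a, b) for a, b in zip(df["sentence1"], df["sentence2"])]. Note that the comparison is case-sensitive and that punctuation stays attached to the words, so real text may need a tokenizer such as the word_tokenize call from the question.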
