Python generate similar sentences: Is there an easy way to generate a possible list of words from a sentence without spaces in Python?...

weixin_39958559

Published 2020-12-05 08:39:55


I have some text:

```python
s = "Imageclassificationmethodscan beroughlydividedinto two broad families of approaches:"
```

I'd like to parse this into its individual words. I quickly looked into enchant and NLTK but didn't see anything that looked immediately useful. If I had time to invest in this, I'd write a dynamic program that uses enchant's ability to check whether a word is English. I would have thought there'd be something online that already does this. Am I wrong?
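The dynamic-programming idea described in the question can be sketched in a few lines. This is a minimal sketch that assumes a word-check predicate is passed in. With pyenchant that predicate could be `enchant.Dict("en_US").check`, but the demo below uses a plain Python set so it runs without extra packages. The function name `segment`, the `max_len` cap on word length, and the tie-break of preferring the fewest words are all illustrative choices and are not part of the question.

```python
from typing import Callable, List, Optional


def segment(s: str, is_word: Callable[[str], bool], max_len: int = 20) -> Optional[List[str]]:
    """Split s into dictionary words via dynamic programming.

    best[i] holds the fewest-word segmentation of s[:i], or None if s[:i]
    cannot be split. max_len is an assumed cap on word length that keeps the
    number of is_word calls at roughly len(s) * max_len.
    """
    n = len(s)
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []  # the empty prefix splits into zero words
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prefix = best[start]
            if prefix is None:
                continue  # s[:start] cannot be split, so nothing can extend it
            word = s[start:end]
            if is_word(word):
                candidate = prefix + [word]
                current = best[end]
                # Tie-break (an assumption): prefer the split with fewer words
                if current is None or len(candidate) < len(current):
                    best[end] = candidate
    return best[n]


if __name__ == "__main__":
    # Tiny stand-in vocabulary; with pyenchant one might pass enchant.Dict("en_US").check
    vocab = {"expert", "experts", "exchange", "sex", "change"}
    print(segment("expertsexchange", vocab.__contains__))  # prints ['experts', 'exchange']
```

Unlike a greedy pass, the table considers every split point, so a bad early choice cannot strand the rest of the string.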

Solution

Greedy approach using a trie

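Try this using Biopython (pip install biopython). The code was written for Python 2 and relies on the Bio.trie module, which recent Biopython releases no longer include, so running it as-is may require an older release: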

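```python
# Needs a Biopython release that still ships Bio.trie
from Bio import trie
import string

def get_trie(dictfile='/usr/share/dict/american-english'):
    """Load every ASCII word from a system word list into a Bio.trie trie."""
    tr = trie.trie()
    with open(dictfile) as f:
        for line in f:
            word = line.rstrip()
            try:
                # Under Python 2, non-ASCII words raise UnicodeDecodeError here and are skipped
                word = word.encode(encoding='ascii', errors='ignore')
                tr[word] = len(word)
                assert tr.has_key(word), "Missing %s" % word
            except UnicodeDecodeError:
                pass
    return tr

def get_trie_word(tr, s):
    """Return the longest dictionary word that prefixes s, plus the rest of s."""
    # Try the longest prefix first and shrink it until a dictionary word is found
    for end in reversed(range(len(s))):
        word = s[:end + 1]
        if tr.has_key(word):
            return word, s[end + 1:]
    return None, s

def main(s):
    tr = get_trie()
    while s:
        word, s = get_trie_word(tr, s)
        if word is None:
            # No dictionary word starts here; print the remainder instead of looping forever
            print(s)
            break
        print(word)

if __name__ == '__main__':
    s = "Imageclassificationmethodscan beroughlydividedinto two broad families of approaches:"
    s = s.strip(string.punctuation)  # drop the trailing ':'
    s = s.replace(" ", '')           # remove the few real spaces
    s = s.lower()                    # match the lowercase dictionary entries
    main(s)
```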
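For a version that runs on Python 3 without Biopython, the same greedy longest-match idea can be sketched with a hand-rolled dict-of-dicts trie. This swaps in a different data structure and does not use Biopython's API. The names `build_trie`, `longest_prefix_len`, `greedy_split` and `normalize` are hypothetical, and the small word list stands in for `/usr/share/dict/american-english`. The code walks the trie character by character, so it finds the longest word starting at each position in a single pass instead of testing every prefix the way `get_trie_word` does.

```python
import string
from typing import Dict, Iterable, List

_END = object()  # sentinel key meaning "a dictionary word ends at this node"


def build_trie(words: Iterable[str]) -> Dict:
    """Build a nested-dict trie from an iterable of words (lowercased)."""
    root: Dict = {}
    for word in words:
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def longest_prefix_len(trie: Dict, s: str, start: int) -> int:
    """Length of the longest dictionary word beginning at s[start], or 0 if none."""
    node = trie
    best = 0
    for i in range(start, len(s)):
        node = node.get(s[i])
        if node is None:  # no dictionary word continues with this character
            break
        if _END in node:
            best = i - start + 1
    return best


def greedy_split(trie: Dict, s: str) -> List[str]:
    """Greedily split s by repeatedly taking the longest dictionary word."""
    words = []
    i = 0
    while i < len(s):
        n = longest_prefix_len(trie, s, i)
        if n == 0:
            n = 1  # no word starts here: emit one character so the loop always advances
        words.append(s[i:i + n])
        i += n
    return words


def normalize(text: str) -> str:
    """Same preprocessing as the answer: trim punctuation, drop spaces, lowercase."""
    return text.strip(string.punctuation).replace(" ", "").lower()


if __name__ == "__main__":
    vocab = "image classification methods can be roughly divided into two broad families of approaches".split()
    text = "Imageclassificationmethodscan beroughlydividedinto two broad families of approaches:"
    print(greedy_split(build_trie(vocab), normalize(text)))
```

Emitting a single character when nothing matches is a design choice made for this sketch. It keeps the loop finite and leaves the unknown fragment visible in the output.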

Results

```
>>> if __name__ == '__main__':
...     s = "Imageclassificationmethodscan beroughlydividedinto two broad families of approaches:"
...     s = s.strip(string.punctuation)
...     s = s.replace(" ", '')
...     s = s.lower()
...     main(s)
...
image
classification
methods
can
be
roughly
divided
into
two
broad
families
of
approaches
```

Caveats

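There are degenerate cases in English that this will not handle. Because the greedy approach always commits to the longest matching prefix, it can take a word that leaves an unsplittable remainder (for example, "them" + "essage" instead of "the" + "message" in "themessage"). You need backtracking to deal with those, but this should get you started.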
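One way to add that backtracking is to try every dictionary word that could start at each position and keep only the splits that consume the whole string. The sketch below uses a hypothetical function name, `all_segmentations`. It memoizes on the start index so each suffix is solved only once, and it returns every complete segmentation, which also gives a literal "list of possible words" as the title asks. With a full system dictionary full of short words, the number of segmentations can grow very quickly, so in practice you would rank or cap them.

```python
from functools import lru_cache
from typing import AbstractSet, List


def all_segmentations(s: str, words: AbstractSet[str]) -> List[List[str]]:
    """Return every way to split s completely into words from `words`."""
    max_len = max((len(w) for w in words), default=0)

    @lru_cache(maxsize=None)
    def split_from(i: int) -> List[List[str]]:
        # All segmentations of the suffix s[i:], memoized on the start index
        if i == len(s):
            return [[]]  # exactly one way to split the empty suffix: no words
        results = []
        for j in range(i + 1, min(len(s), i + max_len) + 1):
            word = s[i:j]
            if word in words:
                # Keep this word, then backtrack over every split of the rest
                for rest in split_from(j):
                    results.append([word] + rest)
        return results

    # Copy so callers never mutate lists held in the cache
    return [list(seg) for seg in split_from(0)]


if __name__ == "__main__":
    vocab = frozenset({"the", "them", "message", "mess", "age", "sage"})
    for seg in all_segmentations("themessage", vocab):
        print(" ".join(seg))
```

On "themessage" with this toy vocabulary, the function yields "the mess age" and "the message". A longest-match greedy pass would start with "them" and get stuck on "essage".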

Obligatory test

```
>>> main("expertsexchange")
experts
exchange
```
