Python models.Phrases method code example

# Required import: from gensim import models [as alias]

# Or: from gensim.models import Phrases [as alias]

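def tokenize(self, docs):
    # Method of a tokenizer class. RAKETokenizer, pre_tokenize and parallel are
    # helpers from the surrounding project; WordNetLemmatizer comes from
    # nltk.stem and partial from functools.

    # Bind lem on both branches so the calls below never hit an undefined name
    # (assumes pre_tokenize accepts lem=None when lemmatization is off).
    lem = WordNetLemmatizer() if self.lemmatize else None

    # print('RAKE tokenizing...')
    pre_tdocs = RAKETokenizer(n_jobs=self.n_jobs).tokenize(docs)

    # Debug output: print tokens starting with 'one' and their document index.
    for i, tdoc in enumerate(pre_tdocs):
        for t in tdoc:
            if t.startswith('one'):
                print(t)
                print(i)

    # print('Additional Tokenizing docs...')
    if self.n_jobs == 1:
        tdocs = [pre_tokenize(doc, tdoc, lem=lem) for doc, tdoc in zip(docs, pre_tdocs)]
    else:
        tdocs = parallel(partial(pre_tokenize, lem=lem), zip(docs, pre_tdocs), self.n_jobs, expand_args=True)

    # print('Training bigram...')
    # Train a new bigram model on the first call, otherwise update the existing one.
    # Note: a bytes delimiter (b' ') matches older gensim releases; gensim 4.x
    # expects a str delimiter such as ' '.
    if self.bigram is None:
        self.bigram = Phrases(tdocs,
                              min_count=self.min_count,
                              threshold=self.threshold,
                              delimiter=b' ')
    else:
        self.bigram.add_vocab(tdocs)

    # print('Training trigram...')
    # The trigram model is trained on the bigram-merged documents.
    if self.trigram is None:
        self.trigram = Phrases(self.bigram[tdocs],
                               min_count=self.min_count,
                               threshold=self.threshold,
                               delimiter=b' ')
    else:
        self.trigram.add_vocab(self.bigram[tdocs])

    # Apply both models in sequence and return the merged token lists.
    return list(self.trigram[self.bigram[tdocs]])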
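
The method above stacks two Phrases models: the first merges frequent word pairs, and the second runs on the already-merged output so that a pair plus one more word can become a three-word phrase. The sketch below illustrates that two-pass idea in plain Python. It is not gensim's implementation: `learn_bigrams` and `merge_bigrams` are hypothetical helper names, and the score is a simplified version of the count-based formula that `min_count` and `threshold` are assumed to control.

```python
from collections import Counter


def learn_bigrams(docs, min_count=2, threshold=0.5):
    """Return adjacent token pairs whose co-occurrence score exceeds threshold.

    Simplified score (an assumption, modeled on count-based phrase scoring):
        (count(a, b) - min_count) * vocab_size / (count(a) * count(b))
    """
    unigrams = Counter(tok for doc in docs for tok in doc)
    pairs = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    vocab_size = len(unigrams)

    phrases = set()
    for (a, b), n_ab in pairs.items():
        if n_ab < min_count:
            continue  # too rare to be considered a phrase
        score = (n_ab - min_count) * vocab_size / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases.add((a, b))
    return phrases


def merge_bigrams(doc, phrases, delimiter=" "):
    """Join each detected pair into a single token, scanning left to right."""
    out = []
    i = 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) in phrases:
            out.append(doc[i] + delimiter + doc[i + 1])
            i += 2  # skip both tokens of the merged pair
        else:
            out.append(doc[i])
            i += 1
    return out


docs = [
    ["new", "york", "city", "is", "big"],
    ["i", "love", "new", "york", "city"],
    ["new", "york", "city", "has", "parks"],
    ["parks", "are", "big"],
]

# First pass: learn word pairs, then merge them in every document.
bigrams = learn_bigrams(docs)
bigram_docs = [merge_bigrams(doc, bigrams) for doc in docs]

# Second pass: learn pairs over the merged tokens, which yields three-word phrases.
trigrams = learn_bigrams(bigram_docs)
merged = [merge_bigrams(doc, trigrams) for doc in bigram_docs]

print(merged[0])
```

Because the second pass sees "new york" as a single token, the pair ("new york", "city") can be scored like any other bigram, which is the same reason the original method trains the trigram model on `self.bigram[tdocs]` rather than on the raw token lists.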
