English sentence segmentation

spacy

import spacy

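# spaCy 3 removed the 'en' shortcut, so load the model by its full name
# (on spaCy 2, spacy.load('en') still works)
nlp = spacy.load('en_core_web_sm')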
sentences = nlp(u"That view is widespread in the custodial industry which has reacted with some alarm to the suggestion in the SIB paper.Instead, Mr Whitehill argues, the answer is to improve the drafting of sub-custodial agreements. Morgan Stanley asks its sub-contractors to guarantee them against wilful mismanagement, non-performance or negligence.In the event of such activities, Morgan Stanley will make its own clients whole and seek to recover from the sub-custodian. 'Increasingly, clients do ask for some protection,' he said. Morgan Stanley carries out six-monthly reviews of all its sub-custodial arrangements to reassure itself of the safety of its clients' money.")
for i, x in enumerate(sentences.sents):
    print(i, x)

>>>
0 That view is widespread in the custodial industry which has reacted with some alarm to the suggestion in the SIB paper.
1 Instead, Mr Whitehill argues, the answer is to improve the drafting of sub-custodial agreements.
2 Morgan Stanley asks its sub-contractors to guarantee them against wilful mismanagement, non-performance or negligence.
3 In the event of such activities, Morgan Stanley will make its own clients whole and seek to recover from the sub-custodian. '
4 Increasingly, clients do ask for some protection,' he said.
5 Morgan Stanley carries out six-monthly reviews of all its sub-custodial arrangements to reassure itself of the safety of its clients' money.

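However, spaCy runs slowly, and a very long document exceeds its length limit and raises an error (by default spaCy rejects any text longer than nlp.max_length, which is 1,000,000 characters). So nltk is the better choice here.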
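If you still want to stay with spaCy, the sketch below shows one way to ease both problems. It assumes spaCy 3.x and swaps the full statistical pipeline for the rule-based sentencizer component, which splits on punctuation only; the sample text is made up for illustration, and the results will not always match what the parser produces.

```python
import spacy

# Assumption: spaCy 3.x. A blank pipeline needs no downloaded model.
nlp = spacy.blank("en")
# Rule-based splitter: much faster than the parser, but punctuation-driven only.
nlp.add_pipe("sentencizer")

# Made-up sample text, standing in for a long document.
text = "This is one sentence. Here is another! Is this the third?"

# Without the parser or NER in the pipeline, raising the length limit is
# usually safe; the default is 1,000,000 characters.
nlp.max_length = max(nlp.max_length, len(text) + 1)

doc = nlp(text)
spacy_sentences = [sent.text for sent in doc.sents]
for i, x in enumerate(spacy_sentences):
    print(i, x)
```

The trade-off is accuracy: the sentencizer has no notion of syntax, so it is best suited to clean text where punctuation reliably marks sentence ends.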

nltk

import nltk
content = "That view is widespread in the custodial industry which has reacted with some alarm to the suggestion in the SIB paper. Instead, Mr Whitehill argues, the answer is to improve the drafting of sub-custodial agreements. Morgan Stanley asks its sub-contractors to guarantee them against wilful mismanagement, non-performance or negligence. In the event of such activities, Morgan Stanley will make its own clients whole and seek to recover from the sub-custodian. 'Increasingly, clients do ask for some protection,' he said. Morgan Stanley carries out six-monthly reviews of all its sub-custodial arrangements to reassure itself of the safety of its clients' money."
sen_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
sentences = sen_tokenizer.tokenize(content)
for i, x in enumerate(sentences):  # tokenize() returns a plain list of strings
    print(i, x)
	
>>>
0 That view is widespread in the custodial industry which has reacted with some alarm to the suggestion in the SIB paper.
1 Instead, Mr Whitehill argues, the answer is to improve the drafting of sub-custodial agreements.
2 Morgan Stanley asks its sub-contractors to guarantee them against wilful mismanagement, non-performance or negligence.
3 In the event of such activities, Morgan Stanley will make its own clients whole and seek to recover from the sub-custodian.
4 'Increasingly, clients do ask for some protection,' he said.
5 Morgan Stanley carries out six-monthly reviews of all its sub-custodial arrangements to reassure itself of the safety of its clients' money.
One thing to watch out for with nltk

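If a period in a sentence is followed directly by a letter, with no newline or other symbol after it, nltk treats it as part of an abbreviation and does not count it as the end of a sentence.
For instance, e.g. (meaning "for example") is not used to split the sentence: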

import nltk
content = 'The evaluation noted that the employee had frequently exhibited irresponsible behavior (e.g., coming to work late, failing to complete projects). '
sen_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
sentences = sen_tokenizer.tokenize(content)
for i, x in enumerate(sentences):  # tokenize() returns a plain list of strings
    print(i, x)

>>>
0 The evaluation noted that the employee had frequently exhibited irresponsible behavior (e.g., coming to work late, failing to complete projects).
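The same behavior can be checked without the pretrained English model. The sketch below is only an illustration: it builds an untrained PunktSentenceTokenizer (so no punkt data has to be downloaded), and the sample strings and the hand-registered abbreviation "dr" are assumptions of mine, not part of the original examples. The pretrained model already knows many abbreviations, so its results can differ.

```python
from nltk.tokenize.punkt import PunktParameters, PunktSentenceTokenizer

# Untrained tokenizer: only Punkt's built-in rules, no learned abbreviations.
plain = PunktSentenceTokenizer()

# A period glued to the next letter is not treated as a sentence boundary,
# so "ended.Then" stays inside the first sentence.
glued = plain.tokenize("It ended.Then it began. Done.")

# Assumption: register "dr" by hand (lowercase, without the final period)
# so that "Dr." is treated as an abbreviation rather than a sentence end.
params = PunktParameters()
params.abbrev_types = {"dr"}
custom = PunktSentenceTokenizer(params)
with_abbrev = custom.tokenize("Dr. Smith arrived late. He apologized.")

for i, x in enumerate(glued + with_abbrev):
    print(i, x)
```

Registering abbreviations this way is useful when a domain-specific short form keeps being split; each entry goes into abbrev_types in lowercase and without its trailing period.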