Kiss和Strunk(2006)Punkt算法的可怕之处在于它是无监督的。所以给一个新的文本,你应该重新训练这个模型并将它应用到你的文本中,例如>>> from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktParameters
>>> text = "An ambitious campus expansion plan was proposed by Fr. Vernon F. Gallagher in 1952. Assumption Hall, the first student dormitory, was opened in 1954, and Rockwell Hall was dedicated in November 1958, housing the schools of business and law. It was during the tenure of F. Henry J. McAnulty that Fr. Gallagher's ambitious plans were put to action."
# Training a new model with the text.
>>> tokenizer = PunktSentenceTokenizer()
>>> tokenizer.train(text)
# It automatically learns the abbreviations.
>>> tokenizer._params.abbrev_types
{'f', 'fr', 'j'}
# Use the customized tokenizer.
>>> tokenizer.tokenize(text)
['An ambitious campus expansion plan was proposed by Fr. Vernon F. Gallagher in 1952.', 'Assumption Hall, the first student dormitory, was opened in 1954, and Rockwell Hall was dedicated in November 1958, housing the schools of business and law.', "It was during the tenure of F. Henry J. McAnulty that Fr. Gallagher's ambitious plans were put to action."]