Original link: https://github.com/nltk/nltk/wiki/Porting-your-code-to-NLTK-3.0
NLTK 3.0 contains a number of interface changes. These are being incorporated into a new version of the NLTK book, updated for Python 3 and NLTK 3.
The way NLTK handles unicode has changed: NLTK 3 requires all text input to be unicode and always returns text as unicode. Previously, some functions and classes worked on unicode while others required encoded bytestrings. Make sure you pass unicode to NLTK and expect unicode output from it; existing code that assumes bytestrings may start to fail.
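One way to meet the unicode requirement, sketched here, is to decode bytes once at the program boundary so that only `str` values ever reach NLTK. The helper name `to_text` and the UTF-8 default are assumptions made for illustration; they are not part of NLTK.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as unicode text, decoding it first if it is a bytestring.

    Hypothetical helper for illustration only; not part of NLTK.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Bytes such as those read from a file opened in binary mode.
raw = "na\u00efve caf\u00e9".encode("utf-8")

text = to_text(raw)    # unicode str, safe to hand to NLTK
tokens = text.split()  # stand-in for an NLTK tokenizer call
```

On Python 3, `str` is already unicode, so the main risk is data that arrives as `bytes` from files, sockets or subprocesses.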
Here are some changes you may need to make:
- `grammar`: `ContextFreeGrammar` → `CFG`, `WeightedGrammar` → `PCFG`, `StatisticalDependencyGrammar` → `ProbabilisticDependencyGrammar`, `WeightedProduction` → `ProbabilisticProduction`
- `draw.tree`: `TreeSegmentWidget.node()` → `TreeSegmentWidget.label()`, `TreeSegmentWidget.set_node()` → `TreeSegmentWidget.set_label()`
- parsers: `nbest_parse()` → `parse()`
- `ccg.parse.chart`: `EdgeI.next()` → `EdgeI.nextsym()`
- Chunk parser: `top_node` → `root_label`; `chunk_node` → `chunk_label`
- WordNet properties are now access methods, e.g. `Synset.definition` → `Synset.definition()`
- `sem.relextract`: `mk_pairs()` → `_tree2semi_rel()`, `mk_reldicts()` → `semi_rel2reldict()`, `show_clause()` → `clause()`, `show_raw_rtuple()` → `rtuple()`
- `corpusname.tagged_words(simplify_tags=True)` → `corpusname.tagged_words(tagset='universal')`
- `util.clean_html()` → `BeautifulSoup.get_text()`. `clean_html()` has been dropped; install and use BeautifulSoup or some other HTML parser instead.
- `util.ibigrams()` → `util.bigrams()`
- `util.ingrams()` → `util.ngrams()`
- `util.itrigrams()` → `util.trigrams()`
- `metrics.windowdiff` → `metrics.segmentation.windowdiff()`; `metrics.windowdiff.demo()` was removed.
- `parse.generate2` was rewritten and merged into `parse.generate`.
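A few of the renames above can be seen together in a short sketch. The toy grammar and word list are made up for this example, and it assumes NLTK 3 is installed; no corpus downloads are needed.

```python
from nltk import CFG
from nltk.parse import ChartParser
from nltk.util import bigrams, ngrams

# CFG replaces ContextFreeGrammar. The toy grammar is invented for this example.
grammar = CFG.fromstring("""
    S -> NP VP
    NP -> 'I' | 'fish'
    VP -> V NP
    V -> 'eat'
""")
parser = ChartParser(grammar)

# parse() replaces nbest_parse(). It returns an iterator, so wrap it in
# list() when all parses are needed at once.
trees = list(parser.parse(["I", "eat", "fish"]))
root = trees[0].label()  # trees expose label() rather than the old node

# bigrams() and ngrams() replace ibigrams() and ingrams(). Both are lazy,
# so materialise them with list() if indexing or len() is required.
words = ["a", "b", "c", "d"]
pairs = list(bigrams(words))
triples = list(ngrams(words, 3))
```

Code that previously indexed into the result of `nbest_parse()` or `bigrams()` directly will need the explicit `list()` call shown here.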
Creating objects from strings:
- Many objects now support a `fromstring()` method
- `tree.Tree.parse()` → `tree.Tree.fromstring()`
- `tree.Tree()` → `tree.Tree.fromstring()`
- `chunk.RegexpChunkRule.parse()` → `chunk.RegexpChunkRule.fromstring()`
- `grammar.parse_cfg()` → `CFG.fromstring()` (same for other types of grammar)
- `sem.LogicParser.parse()` → `sem.Expression.fromstring()`
- `sem.DrtParser.parse()` → `sem.DrtExpression.fromstring()`
- `sem.parse_valuation()` → `sem.Valuation.fromstring()`
- `sem.parse_type()` → `sem.Type.fromstring()`
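As a rough illustration of the `fromstring()` pattern, the sketch below builds a tree, a probabilistic grammar and a logic expression from strings. The example strings are invented, and it assumes NLTK 3 is installed.

```python
from nltk.grammar import PCFG
from nltk.sem.logic import Expression
from nltk.tree import Tree

# Replaces Tree.parse(...) and Tree("...") called with a bracketed string.
tree = Tree.fromstring("(S (NP Mary) (VP walks))")

# Other grammar types follow the same pattern as CFG.fromstring().
pcfg = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> 'Mary' [1.0]
    VP -> 'walks' [1.0]
""")

# Replaces LogicParser().parse("walk(mary)").
expr = Expression.fromstring("walk(mary)")
```

In each case the string-parsing entry point moves onto the class being constructed, so there is no separate parser object to create first.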
Operations on lists of sentences or other items:
- `tokenize.batch_tokenize()` → `tokenize.tokenize_sents()`
- `tag.batch_tag()` → `tag.tag_sents()`
- `parse.batch_parse()` → `parse.parse_sents()`
- `classify.batch_classify()` → `classify.classify_many()`
- `sem.batch_interpret()` → `sem.interpret_sents()`
- `sem.batch_evaluate()` → `sem.evaluate_sents()`
- `chunk.batch_ne_chunk()` → `chunk.ne_chunk_sents()`
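The renamed batch methods are called the same way as before. Here is a minimal sketch using a whitespace tokenizer and a regular-expression tagger, chosen because neither needs downloaded models; the sentences and tagging patterns are made up for illustration.

```python
from nltk.tag import RegexpTagger
from nltk.tokenize import WhitespaceTokenizer

tokenizer = WhitespaceTokenizer()

# tokenize_sents() replaces batch_tokenize(): one token list per input string.
sents = tokenizer.tokenize_sents(["the cat sat", "dogs bark"])

# Toy patterns, tried in order; the final one is a catch-all.
tagger = RegexpTagger([
    (r"^the$", "DT"),
    (r".*s$", "NNS"),
    (r".*", "NN"),
])

# tag_sents() replaces batch_tag(): one tagged sentence per token list.
tagged = tagger.tag_sents(sents)
```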
Changes in `probability.FreqDist`:
- `fdist.keys()` → `sorted(fdist)`
- `fdist.inc(x)` → `fdist[x] += 1`
- `fdist.samples()` → `sorted(fdist.keys())`
- `fdist.Nr(r)` → `fdist.Nr()[r]`
- `fdist.Nr_nonzero()` → `fdist.Nr().items()`
- `cfdist.conditions()` → `sorted(cfdist.conditions())`
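A short sketch of the counting idioms above, assuming NLTK 3, where `FreqDist` behaves like `collections.Counter`. The sample sentence is made up. Note that `sorted(fdist)` orders samples alphabetically; `most_common()`, inherited from `Counter`, is shown as one way to get frequency order.

```python
from nltk.probability import ConditionalFreqDist, FreqDist

words = "the cat and the hat".split()

fdist = FreqDist()
for word in words:
    fdist[word] += 1               # replaces fdist.inc(word)

samples = sorted(fdist)            # alphabetical order, not frequency order
top = fdist.most_common(1)         # frequency order, inherited from Counter

# Condition each word on its first letter (an arbitrary choice for the demo).
cfdist = ConditionalFreqDist((word[0], word) for word in words)
conditions = sorted(cfdist.conditions())
```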
Porter stemmer changes:
- `adjust_case()`, `cons()`, `cvc()`, `doublec()`, `m()`, `step1ab()`, `step1c()`, `step2()`, `step3()`, `step4()`, `step5()`, `vowelinstem()` made private
- `ends()`, `r()`, `setto()` removed
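Code that only calls the public `stem()` method should be unaffected by these changes. A minimal sketch, assuming NLTK 3 is installed (the example words are arbitrary):

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

# stem() is the supported entry point; the individual step helpers listed
# above are now private and should not be called directly.
stems = [stemmer.stem(word) for word in ["running", "caresses"]]
```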
Removed modules, classes and functions:
- `classify.svm` was removed. For classification based on support vector machines (SVMs) use `classify.scikitlearn` or scikit-learn directly. See https://github.com/nltk/nltk/issues/450.
- `probability.GoodTuringProbDist` class was removed. See https://github.com/nltk/nltk/issues/381.
- `HiddenMarkovModelTaggerTransformI` and its subclasses are removed. See https://github.com/nltk/nltk/issues/374.
- `classify.maxent` no longer supports algorithms backed by `scipy.maxentropy`. See https://github.com/nltk/nltk/issues/321.
- `misc.babelfish` was removed. See https://github.com/nltk/nltk/issues/265.
- `sourcedstring` was removed. See https://github.com/nltk/nltk/issues/322.
- `yamltags` was removed. JSON is now preferred instead. See https://github.com/nltk/nltk/issues/540
- `mallet` was removed, including the `tag.crf` module. See https://github.com/nltk/nltk/issues/104
- `tag.simplify` was removed. See https://github.com/nltk/nltk/issues/483
- `model` was removed. See https://github.com/nltk/nltk/issues?labels=model
- `corpus.reader.wordnet._lcs_by_depth` was removed. See https://github.com/nltk/nltk/issues/422.
Miscellaneous changes:
- `probability.ConditionalProbDist.default_factory` now inherits from `dict` instead of `defaultdict`
- `probability.ConditionalProbDistI.default_factory` now inherits from `dict` instead of `defaultdict`
- `probability.DictionaryConditionalProbDist.default_factory` now inherits from `dict` instead of `defaultdict`
Environment variables for third-party software:
- These have been normalised; please see Installing Third Party Software
More background on Python 3 and NLTK 3: