python: nltk search path does not include $NLTK_DA...

If you do not want to set $NLTK_DATA before running your script, you can do the following inside the Python script:

import nltk
nltk.data.path.append('/home/alvas/some_path/nltk_data/')
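
A slightly more defensive variant of the same idea, as a minimal sketch: add the directory only if it exists and is not already listed. The helper name `add_data_dir` is hypothetical and not part of NLTK; it works on any list of paths, so it can be tried without NLTK installed.

```python
import os


def add_data_dir(search_path, directory):
    """Append `directory` to `search_path` if it exists and is not listed yet.

    `add_data_dir` is a hypothetical helper, not an NLTK function.
    Returns True when the directory was appended, False otherwise.
    """
    normalized = os.path.normpath(directory)
    if not os.path.isdir(normalized):
        return False
    # Compare normalized forms so a trailing slash does not cause duplicates.
    already_listed = any(os.path.normpath(p) == normalized for p in search_path)
    if already_listed:
        return False
    search_path.append(directory)
    return True
```

With NLTK installed, the call would presumably look like `add_data_dir(nltk.data.path, '/home/alvas/some_path/nltk_data/')`.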

For example, let's move nltk_data to a non-standard path where NLTK cannot find it automatically:

alvas@ubi:~$ ls nltk_data/
chunkers corpora grammars help misc models stemmers taggers tokenizers
alvas@ubi:~$ mkdir some_path
alvas@ubi:~$ mv nltk_data/ some_path/
alvas@ubi:~$ ls nltk_data/
ls: cannot access nltk_data/: No such file or directory
alvas@ubi:~$ ls some_path/nltk_data/
chunkers corpora grammars help misc models stemmers taggers tokenizers

Now we use the nltk.data.path.append() hack:

alvas@ubi:~$ python
>>> import os
>>> import nltk
>>> nltk.data.path.append('/home/alvas/some_path/nltk_data/')
>>> nltk.pos_tag('this is a foo bar'.split())
[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('foo', 'JJ'), ('bar', 'NN')]
>>> nltk.data
>>> nltk.data.path
['/home/alvas/some_path/nltk_data/', '/home/alvas/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
>>> exit()
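
To see at a glance which entries of such a search path actually exist on disk, a small check along these lines may help. This is only a sketch: `existing_dirs` is a hypothetical helper name, not an NLTK function, and it accepts any list of paths.

```python
import os


def existing_dirs(search_path):
    """Return the entries of `search_path` that are directories on disk."""
    return [p for p in search_path if os.path.isdir(p)]
```

With NLTK installed, `existing_dirs(nltk.data.path)` would list only the locations that NLTK could really load data from.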

Let's move it back and see whether it still works:

alvas@ubi:~$ ls nltk_data
ls: cannot access nltk_data: No such file or directory
alvas@ubi:~$ mv some_path/nltk_data/ .
alvas@ubi:~$ python
>>> import nltk
>>> nltk.data.path
['/home/alvas/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
>>> nltk.pos_tag('this is a foo bar'.split())
[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('foo', 'JJ'), ('bar', 'NN')]
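
The list printed above holds the default locations. As far as I know, NLTK reads $NLTK_DATA once, when the package is imported, splits it on the platform's path separator, and places those entries ahead of the defaults, so the variable has to be set before `import nltk` runs. The sketch below only imitates that splitting step; `paths_from_env` is a hypothetical name, and the real logic lives inside NLTK and may differ between versions.

```python
import os


def paths_from_env(value):
    """Split an NLTK_DATA-style string into a list of non-empty directories.

    Assumption: this imitates how a search path could be derived from the
    environment variable; it is not NLTK's own code.
    """
    return [d for d in value.split(os.pathsep) if d]


# Read the variable from the current environment (empty string if unset).
print(paths_from_env(os.environ.get('NLTK_DATA', '')))
```

If this prints an empty list in the same shell that launches the script, the variable is probably not exported to the Python process.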

If you really want nltk_data to be found automatically, use something like the following:

import os
import sys
import time

import nltk


def find(name, path):
    # Return the first directory under `path` whose path ends with `name`,
    # or None if there is no such directory.
    # On Python 3.5+, os.walk is built on os.scandir, so the separate
    # scandir package is no longer needed.
    for root, dirs, files in os.walk(path):
        if root.endswith(name):
            return root


def find_nltk_data():
    # Search the whole filesystem and cache the result in a text file.
    start = time.time()
    path_to_nltk_data = find('nltk_data', '/')
    print('Finding nltk_data took', time.time() - start, file=sys.stderr)
    print('nltk_data at', path_to_nltk_data, file=sys.stderr)
    if path_to_nltk_data is None:
        sys.exit('nltk_data not found')
    with open('where_is_nltk_data.txt', 'w') as fout:
        fout.write(path_to_nltk_data)
    return path_to_nltk_data


def magically_find_nltk_data():
    # Reuse the cached location if it is still valid; otherwise search again.
    if os.path.exists('where_is_nltk_data.txt'):
        with open('where_is_nltk_data.txt') as fin:
            path_to_nltk_data = fin.read().strip()
        if os.path.exists(path_to_nltk_data):
            nltk.data.path.append(path_to_nltk_data)
        else:
            nltk.data.path.append(find_nltk_data())
    else:
        path_to_nltk_data = find_nltk_data()
        nltk.data.path.append(path_to_nltk_data)


magically_find_nltk_data()
print(nltk.pos_tag('this is a foo bar'.split()))

Let's call this Python script test.py:

alvas@ubi:~$ ls nltk_data/
chunkers corpora grammars help misc models stemmers taggers tokenizers
alvas@ubi:~$ python test.py
Finding nltk_data took 4.27330780029
nltk_data at /home/alvas/nltk_data
[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('foo', 'JJ'), ('bar', 'NN')]
alvas@ubi:~$ mv nltk_data/ some_path/
alvas@ubi:~$ python test.py
Finding nltk_data took 4.75850391388
nltk_data at /home/alvas/some_path/nltk_data
[('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('foo', 'JJ'), ('bar', 'NN')]
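
Walking the whole filesystem from '/' took more than four seconds in both runs above. A narrower search can be sketched as follows, assuming the data lives somewhere under a few known directories such as the home directory. The names `find_dir` and `roots` are illustrative and belong neither to the script above nor to NLTK.

```python
import os


def find_dir(name, roots):
    """Return the first directory called `name` found under any of `roots`.

    Returns None when nothing is found. `find_dir` and `roots` are
    illustrative names, not part of the original script or of NLTK.
    """
    for root in roots:
        for current, dirs, _files in os.walk(root):
            # Match on the directory name itself rather than on a path suffix.
            if name in dirs:
                return os.path.join(current, name)
    return None
```

A call such as `find_dir('nltk_data', [os.path.expanduser('~')])` would restrict the search to the home directory; the result can then be appended to `nltk.data.path` as before.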
