When processing Chinese text, stanfordcorenlp returns empty strings for every token. word_tokenize(), pos_tag(), and ner() respectively return:
['', '', '', '', '']
[('', 'NR'), ('', 'NN'), ('', 'VV'), ('', 'NR'), ('', 'PU')]
[('', 'ORGANIZATION'), ('', 'ORGANIZATION'), ('', 'O'), ('', 'GPE'), ('', 'O')]
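The empty strings come from the wrapper reading a token field that the Chinese annotators leave blank. A minimal sketch of the problem with a mocked server response (the sample token values are assumptions; the dict shape mirrors the CoreNLP server's JSON output):

```python
# Mocked CoreNLP JSON response for a two-token Chinese sentence.
# Sample values are illustrative; the structure follows the server output.
mock_response = {
    'sentences': [{
        'tokens': [
            {'word': '清华', 'originalText': '', 'pos': 'NR', 'ner': 'ORGANIZATION'},
            {'word': '大学', 'originalText': '', 'pos': 'NN', 'ner': 'ORGANIZATION'},
        ]
    }]
}

# The wrapper reads 'originalText', which is empty here -- hence the
# list of empty strings seen above:
tokens = [t['originalText'] for s in mock_response['sentences'] for t in s['tokens']]
print(tokens)  # ['', '']

# The 'word' field is populated, so reading it recovers the tokens:
words = [t['word'] for s in mock_response['sentences'] for t in s['tokens']]
print(words)  # ['清华', '大学']
```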
Fix:
Open corenlp.py and locate the methods ner(), word_tokenize(), and pos_tag(). In each of them, change token['originalText'] to token['word']. For example, word_tokenize() becomes:
def word_tokenize(self, sentence, span=False):
    r_dict = self._request('ssplit,tokenize', sentence)
    # 'word' is populated for Chinese; 'originalText' comes back empty
    tokens = [token['word'] for s in r_dict['sentences'] for token in s['tokens']]
    # Whether to also return each token's character span
    if span:
        spans = [(token['characterOffsetBegin'], token['characterOffsetEnd'])
                 for s in r_dict['sentences'] for token in s['tokens']]
        return list(zip(tokens, spans))
    return tokens
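The edits to pos_tag() and ner() are analogous: keep each method's body but read the 'word' field instead of 'originalText'. A sketch of the extraction logic, exercised against a mocked response dict (inside corenlp.py the methods would first call self._request('pos', sentence) or self._request('ner', sentence); the mock values here are assumptions):

```python
# Extraction logic after the fix: pair each token's 'word' with its tag.
def pos_tag_fixed(r_dict):
    return [(t['word'], t['pos'])
            for s in r_dict['sentences'] for t in s['tokens']]

def ner_fixed(r_dict):
    return [(t['word'], t['ner'])
            for s in r_dict['sentences'] for t in s['tokens']]

# Mocked server response (illustrative values, real response shape).
mock = {'sentences': [{'tokens': [
    {'word': '清华', 'pos': 'NR', 'ner': 'ORGANIZATION'},
    {'word': '大学', 'pos': 'NN', 'ner': 'ORGANIZATION'},
]}]}

print(pos_tag_fixed(mock))  # [('清华', 'NR'), ('大学', 'NN')]
print(ner_fixed(mock))      # [('清华', 'ORGANIZATION'), ('大学', 'ORGANIZATION')]
```

With this change, word_tokenize, pos_tag, and ner all return the actual Chinese tokens instead of empty strings.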