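To run a CRF on Windows I needed to install sklearn-crfsuite. My first thought was to set up the environment through PyCharm, which installed sklearn-crfsuite without complaint. Then I ran the code and, well, it failed (the first line of output below is my script printing "Training and evaluating the CRF model..."):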
正在训练评估CRF模型...
Traceback (most recent call last):
File "C:/Users/cc/Documents/xxx/yyy/window_version/main.py", line 73, in <module>
main()
File "C:/Users/cc/Documents/xxx/yyy/window_version/main.py", line 31, in main
(test_word_lists, test_tag_lists)
File "C:\Users\cc\Documents\xxx\yyy\window_version\evaluate.py", line 43, in crf_train_eval
crf_model.train(train_word_lists, train_tag_lists)
File "C:\Users\cc\Documents\xxx\yyy\window_version\models\crf.py", line 23, in train
self.model.fit(features, tag_lists)
File "C:\Users\cc\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn_crfsuite\estimator.py", line 331, in fit
trainer.train(self.modelfile.name, holdout=-1 if X_dev is None else 1)
File "pycrfsuite\_pycrfsuite.pyx", line 359, in pycrfsuite._pycrfsuite.BaseTrainer.train
File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_std__in_string
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-10: ordinal not in range(128)
Process finished with exit code 1
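Fine, it's just debugging, nothing to be afraid of! I searched online for similar reports and dug into the source. I found the BaseTrainer.train function in pycrfsuite/_pycrfsuite.pyx, but I could not find any file called stringsource. I suspected that part was implemented in C/C++, which was a bit discouraging. (As far as I can tell, "stringsource" is the label Cython gives to its auto-generated helper code, here the conversion from a Python string to a C++ std::string, which would explain why no such file exists on disk.)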
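The last line of the traceback can be reproduced in plain Python, which helps show what kind of failure this is. The sketch below is only an illustration: the helper name `try_ascii` and the sample strings are made up, and the traceback alone does not tell me which string (a feature, a tag, or the model file path) actually triggered the error inside pycrfsuite.

```python
# The ASCII codec only covers code points 0-127, so any Chinese
# character makes str.encode("ascii") raise UnicodeEncodeError.
def try_ascii(text):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)


ok = try_ascii("B-NAME")          # pure ASCII: encodes fine
failed = try_ascii("feature:中文")  # contains Chinese: cannot be encoded

print(ok)
print(failed)
```

The second call produces the same "ordinal not in range(128)" message as the traceback, so somewhere a string containing non-ASCII characters was being converted with the ASCII codec.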
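From what I found, these were the fixes worth considering:
- Install sklearn-crfsuite with pip from cmd
- Switch to a different computer, since the environment on my own machine had become a bit of a mess
- Encode the Chinese text as indices and retrain the model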
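The third option, encoding Chinese as indices, could look roughly like the sketch below. I did not end up trying it, so whether it actually avoids the error is untested. The helper names `build_vocab` and `encode` and the `UNK` placeholder are hypothetical; only `train_word_lists` mirrors a name from the traceback.

```python
def build_vocab(word_lists):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for words in word_lists:
        for word in words:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(word_lists, vocab, unk="UNK"):
    """Replace every token with its id as an ASCII string; unseen tokens become unk."""
    return [
        [str(vocab[word]) if word in vocab else unk for word in words]
        for words in word_lists
    ]


# Toy data standing in for the real training sentences.
train_word_lists = [["我", "在", "北", "京"], ["北", "京", "人"]]
vocab = build_vocab(train_word_lists)
encoded = encode(train_word_lists, vocab)
print(encoded)
```

The encoded lists contain only ASCII strings, so features built from them would no longer carry Chinese characters. The same vocabulary has to be reused at prediction time, and tokens not seen during training fall back to the placeholder.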
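A colleague got it working with pip. Searching his machine turned up no _pycrfsuite.pyx at all, only a _pycrfsuite.pyd file. I did not know the difference at the time: a .pyx file is Cython source, while a .pyd file is a compiled Python extension module (a DLL) on Windows.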
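To check which file a module is really loaded from, rather than searching the disk by hand, something like the following sketch may help. The helper name `module_origin` is made up; `importlib.util.find_spec` and `importlib.machinery.EXTENSION_SUFFIXES` are standard library. Note that looking up a dotted name imports its parent package first, so the last call only reports a path if pycrfsuite is installed.

```python
import importlib.machinery
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is missing.
        return None
    return spec.origin if spec is not None else None


# File suffixes this interpreter accepts for compiled extension modules
# (on Windows the list includes ".pyd").
print(importlib.machinery.EXTENSION_SUFFIXES)

# A pure-Python standard library package, for comparison.
print(module_origin("json"))

# The compiled module from the traceback; None if pycrfsuite is not installed.
print(module_origin("pycrfsuite._pycrfsuite"))
```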
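I switched to another computer, installed sklearn-crfsuite there with pip, and that solved it.
Note: installing with pip on Linux had worked fine all along, which should have been the hint to simply use pip here too.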
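When several Python environments coexist on one machine, one way to make sure pip installs into the interpreter that actually runs the script is to invoke pip through that interpreter. This is a general habit rather than something I verified as the cause of my problem, and the helper name `pip_install_command` is made up for illustration.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


cmd = pip_install_command("sklearn-crfsuite")
print(" ".join(cmd))

# To actually perform the install, pass the list to subprocess:
#   import subprocess
#   subprocess.check_call(cmd)
```

Running this from the same interpreter that runs main.py prints the exact command to use, which avoids installing into a different environment by accident.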