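- Background
Today I was looking for a Japanese tokenizer and came across MeCab. I found some sample code online, but running it produced one error after another.
The packages I installed along the way, in order: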
pip install mecab-python-windows
pip install mecab-python3
pip install mecab
pip install whoosh
Microsoft Visual C++ Build Tools: https://visualstudio.microsoft.com/downloads/
pip install tiny_tokenizer[all]
pip install SudachiPy
pip install https://object-storage.tyo2.conoha.io/v1/nc_2520839e1f9641b08211a5c85243124a/sudachi/SudachiDict_core-20191030.tar.gz
sudachipy link -t core
pip install -U pytest
- Error message
return self.__parse_tostr(text, **kwargs)
File "C:\Users\lixianwei\venv\lib\site-packages\natto\mecab.py", line 318, in __parse_tostr
return self.__bytes2str(raw).strip()
File "C:\Users\lixianwei\venv\lib\site-packages\natto\support.py", line 26, in bytes2str
return b.decode(py3enc)
UnicodeDecodeError: 'shift_jis' codec can't decode byte 0x93 i
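
The traceback shows the raw bytes being decoded as shift_jis and failing on one of them. One plausible reading, which I have not confirmed, is an encoding mismatch: the bytes were produced in one encoding and decoded in another. The sketch below is not natto's API. `decode_with_fallback` and its list of candidate encodings are hypothetical names I made up for illustration. It only shows how such a mismatch raises `UnicodeDecodeError` in plain Python, and how trying several encodings in turn can work around it.

```python
# A minimal sketch of defensive decoding; decode_with_fallback is a
# hypothetical helper, not part of natto or MeCab.

def decode_with_fallback(raw, encodings=("utf-8", "cp932", "euc_jp")):
    """Try each candidate encoding in order and return (text, encoding).

    UTF-8 goes first because it is strict: bytes written in another
    Japanese encoding rarely happen to be valid UTF-8.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            # This encoding does not match the bytes; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode the bytes")


if __name__ == "__main__":
    sjis_bytes = "日本語".encode("shift_jis")

    # Decoding with the wrong codec raises the same kind of error
    # as in the traceback above.
    try:
        sjis_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("mismatch:", exc)

    # The fallback helper recovers the text and reports which codec worked.
    print(decode_with_fallback(sjis_bytes))
```

The order of the candidates matters, since a permissive codec placed first could "succeed" and return garbage. If the dictionary's real charset is known, decoding with that single encoding is the cleaner fix.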