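A problem I ran into myself, shared for reference only.
Error: Could not read config.cfg from D:\soft\Anaconda\envs\py38\lib\site-packages\de_core_news_sm\de_core_news_sm-2.2.5\config.cfg
The cause is a version mismatch between spaCy and de_core_news_sm-2.2.5: a 2.x model package contains no config.cfg, which spaCy 3.x expects to find.
My spaCy is 3.0, so I need to download the matching 3.0 model package.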
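To confirm a mismatch like this before downloading anything, the installed versions can be compared. The snippet below is only a rough sketch: the helper names (installed_version, major_minor, looks_compatible) are my own, and the rule "same major.minor means compatible" is a simplification based on the model naming convention, not spaCy's own compatibility check.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.0.5'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def looks_compatible(spacy_version: str, model_version: str) -> bool:
    """Heuristic (assumption): model and spaCy should share major.minor."""
    return major_minor(spacy_version) == major_minor(model_version)


if __name__ == "__main__":
    spacy_v = installed_version("spacy")
    model_v = installed_version("de_core_news_sm")
    print("spacy:", spacy_v, "model:", model_v)
    if spacy_v and model_v:
        print("looks compatible:", looks_compatible(spacy_v, model_v))
```

With spaCy 3.0.x and de_core_news_sm 2.2.5, the heuristic reports a mismatch, which is consistent with the error above.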
The official site says to download the model with python -m spacy download en_core_web_sm
That failed with the following connection error: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /explosion/spacy-models/master/compatibility.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000001638F6CEC40>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed'))
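The workaround is to download the matching .tar.gz archive directly from GitHub.
URL: https://github.com/explosion/spacy-models/releases/tag/en_core_web_sm-3.0.0
For other languages or versions, just change the package name and version number in the URL.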
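The "change the package name and version" step can be sketched as a small URL builder. The release page pattern comes from the URL above; the direct archive path in archive_url is my assumption about how the release assets are laid out, so check it against the release page before relying on it. Both function names are hypothetical.

```python
RELEASES = "https://github.com/explosion/spacy-models/releases"


def release_page_url(name: str, version: str) -> str:
    """Release page for a model, following the URL pattern shown in this note."""
    return f"{RELEASES}/tag/{name}-{version}"


def archive_url(name: str, version: str) -> str:
    """Direct .tar.gz link (assumed asset layout, not confirmed by this note)."""
    tag = f"{name}-{version}"
    return f"{RELEASES}/download/{tag}/{tag}.tar.gz"


if __name__ == "__main__":
    for model in ("en_core_web_sm", "de_core_news_sm"):
        print(release_page_url(model, "3.0.0"))
        print(archive_url(model, "3.0.0"))
```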
The next step is to install the downloaded archives:
pip install de_core_news_sm-3.0.0.tar.gz
pip install en_core_web_sm-3.0.0.tar.gz
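With several conda environments on one machine, a bare pip may install into a different environment than the one running the code. A possible way around that, as a sketch only, is to call pip through the current interpreter. The helper names are hypothetical, and the archive is assumed to sit in the current working directory.

```python
import subprocess
import sys
from typing import List


def pip_install_command(archive_path: str) -> List[str]:
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", archive_path]


def install_archive(archive_path: str) -> None:
    """Run pip and raise CalledProcessError if the install fails."""
    subprocess.run(pip_install_command(archive_path), check=True)


if __name__ == "__main__":
    # Only prints the commands; call install_archive(...) to actually install.
    for archive in ("de_core_news_sm-3.0.0.tar.gz", "en_core_web_sm-3.0.0.tar.gz"):
        print(" ".join(pip_install_command(archive)))
```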
Then load the models in code:
import spacy
spacy_de = spacy.load('de_core_news_sm')
spacy_en = spacy.load('en_core_web_sm')