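Installing spaCy on Windows:
Installing directly with
pip install spacy fails with an error.
Workaround:
Go to
http://www.lfd.uci.edu/~gohlke/pythonlibs/ and download the prebuilt packages for spacy and its dependencies. Install the dependencies one by one, then install spacy itself last.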
![985935-20170911170704703-679520830.png](https://i-blog.csdnimg.cn/blog_migrate/4f0b0e45979ccdbecd088ea18918127d.png)
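After installing the wheels one by one, it can help to confirm that each package is actually visible to Python before moving on. The snippet below is a minimal sketch using only the standard library. The package list (cymem, preshed, murmurhash, thinc) is my assumption about which dependencies are involved, so adjust it to match the wheels you actually downloaded.

```python
import importlib.util

# Assumed dependency list for illustration only; edit it to match your downloads.
PACKAGES = ["cymem", "preshed", "murmurhash", "thinc", "spacy"]


def check_installed(names):
    """Return {name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_installed(PACKAGES).items():
        print(f"{name:12s} {'installed' if ok else 'MISSING'}")
```

Any package reported as MISSING should be installed before retrying spacy.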
Like nltk, spaCy also needs some commonly used data packages or models to be downloaded:
python -m spacy download en
python -m spacy download de
python -m spacy download fr
python -m spacy download en_core_web_md
Testing the basic features:
1. Tokenization and sentence segmentation
![985935-20170911170705297-425784800.png](https://i-blog.csdnimg.cn/blog_migrate/6d5241187e2ff99f44f0ef126e2866c8.png)
![985935-20170911170705578-298999899.png](https://i-blog.csdnimg.cn/blog_migrate/7890e47a83fcd7b08ea2e60338b36de1.png)
![985935-20170911170705828-627641571.png](https://i-blog.csdnimg.cn/blog_migrate/7626ca431ffa21ee2e62af8b3cdc4e48.png)
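For readers who want something to copy and paste, here is a minimal sketch of tokenization and sentence splitting. It assumes spaCy v3, where a blank English pipeline plus the rule-based sentencizer component is enough and no model download is needed. The sample sentence is my own, not taken from the screenshots.

```python
import spacy

# Blank English pipeline: tokenizer only, no statistical model required.
nlp = spacy.blank("en")
# Rule-based sentence splitter (string name form assumes spaCy v3).
nlp.add_pipe("sentencizer")

text = "spaCy splits text into tokens. It also finds sentence boundaries."
doc = nlp(text)

tokens = [token.text for token in doc]
sentences = [sent.text for sent in doc.sents]

print(tokens)
print(sentences)
```

With a full model such as en_core_web_md loaded instead, doc.sents is normally driven by the model's own pipeline rather than the sentencizer.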
2. Lemmatization (Lemmatize)
![985935-20170911170705985-1449975920.png](https://i-blog.csdnimg.cn/blog_migrate/c3a872d60d4c4e9d1fe9a6e32d4cc541.png)
![985935-20170911170706125-1867506918.png](https://i-blog.csdnimg.cn/blog_migrate/51ede56f4dce0ec8612da607bf739a7d.png)
3. Part-of-speech tagging (POS Tagging)
![985935-20170911170706282-1044123504.png](https://i-blog.csdnimg.cn/blog_migrate/9e8e69134d2430cc81ad1b65b7bf8136.png)
![985935-20170911170706422-1997896021.png](https://i-blog.csdnimg.cn/blog_migrate/490b72d1416733c19087754e98a6c28b.png)
4. Named entity recognition (NER)
![985935-20170911170706547-162179563.png](https://i-blog.csdnimg.cn/blog_migrate/4bedcbf44a3177c8ca77d2f9552a0b3b.png)
![985935-20170911170706672-1986999631.png](https://i-blog.csdnimg.cn/blog_migrate/b147f24cdbcf43d4948965fa8670e0c4.png)
5. Noun phrase extraction
![985935-20170911170706907-547209335.png](https://i-blog.csdnimg.cn/blog_migrate/661abf637675fdb9e0d1433519d4a317.png)
![985935-20170911170707078-185448167.png](https://i-blog.csdnimg.cn/blog_migrate/ad1757a8717e396e1362ee10cb90bcf7.png)
6. Computing word similarity from word vectors
![985935-20170911170707203-1134922800.png](https://i-blog.csdnimg.cn/blog_migrate/504510a28856497268064aaf72a18f0a.png)
![985935-20170911170707766-2123262148.png](https://i-blog.csdnimg.cn/blog_migrate/850fa17346b87bc84b496a731bfba8a4.png)
![985935-20170911170707907-1327948780.png](https://i-blog.csdnimg.cn/blog_migrate/1db6f5c7a86902de6f2ca29949cc30da.png)
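To make the similarity numbers less of a black box: spaCy's similarity() is, by default, a cosine similarity over word vectors. Here is a minimal pure-Python sketch of that measure. The three-dimensional vectors are made up for illustration. Real vectors from en_core_web_md have hundreds of dimensions and would come from token.vector.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, not real spaCy vectors.
vectors = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

for other in ("cat", "banana"):
    score = cosine_similarity(vectors["dog"], vectors[other])
    print(f"dog vs {other}: {score:.3f}")
```

With these toy values, dog and cat point in nearly the same direction and score close to 1, while dog and banana score close to 0.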
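spaCy and Chinese:
spaCy's Chinese support calls the jieba interface under the hood, so jieba must be installed beforehand. To use it, load the pipeline with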
nlp=spacy.load('zh')
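The subsequent steps are similar to those for English.
However, only tokenization actually works. The other features require additional dependency packages, so it is more straightforward to just use jieba directly.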