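Introduction to NLP
NLP stands for natural language processing, a branch of computer science and artificial intelligence. It involves using computers to process, analyze, and generate natural languages such as English, Chinese, and Spanish.
The goal of NLP is to enable computers to understand the meaning and intent of human language so that they can interact with people effectively. That interaction can be spoken, as in speech recognition and speech synthesis, or written, as in text classification, text summarization, and sentiment analysis.
Put simply, NLP is a way of using software to manipulate and understand natural language, whether spoken or written.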
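Natural language processing (NLP) in ES
When NLP models are integrated into the Elastic platform, the aim is to provide a great user experience for uploading and managing models.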
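NLP demo
Download the opennlp plugin that matches your ES version
Download link: https://github.com/spinscale/elasticsearch-ingest-opennlp
Place the opennlp plugin in the ES plugins directory
Download the NER models
NER (named entity recognition): builds structure from unstructured text by trying to extract details such as names, locations, or organizations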
bin/ingest-opennlp/download-models
Configure opennlp
Edit the configuration file: config/elasticsearch.yml
ingest.opennlp.model.file.persons: en-ner-persons.bin
ingest.opennlp.model.file.dates: en-ner-dates.bin
ingest.opennlp.model.file.locations: en-ner-locations.bin
Restart ES and verify
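One way to verify the restart is to ask the node which plugins it has loaded. The following is a minimal Python sketch, not part of the original walkthrough. It assumes an unsecured node at http://localhost:9200 and assumes the plugin registers under the component name ingest-opennlp (inferred from the bin/ingest-opennlp directory); adjust both if your setup differs. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: a local node without TLS or authentication.
ES_URL = "http://localhost:9200"


def has_plugin(cat_plugins_rows, component="ingest-opennlp"):
    """Return True if any row of a _cat/plugins JSON response names the plugin."""
    return any(row.get("component") == component for row in cat_plugins_rows)


def fetch_plugins(es_url=ES_URL):
    """Call GET _cat/plugins?format=json and return the parsed list of rows."""
    with urllib.request.urlopen(f"{es_url}/_cat/plugins?format=json") as resp:
        return json.load(resp)
```

With the node running, has_plugin(fetch_plugins()) should return True once the plugin is installed on at least one node.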
Create a pipeline that supports NLP
PUT _ingest/pipeline/opennlp-pipeline
{
  "description": "A pipeline to do named entity extraction",
  "processors": [
    {
      "opennlp": {
        "field": "message"
      }
    }
  ]
}
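The same pipeline can also be created from a script instead of the console. The sketch below uses only the Python standard library. It assumes an unsecured node at http://localhost:9200, and the function names are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request

# Assumption: a local node without TLS or authentication.
ES_URL = "http://localhost:9200"


def build_pipeline_body(field="message"):
    """Build the pipeline definition; `field` is the text field to analyze."""
    return {
        "description": "A pipeline to do named entity extraction",
        "processors": [{"opennlp": {"field": field}}],
    }


def put_pipeline_request(name, body, es_url=ES_URL):
    """Prepare (but do not send) the PUT _ingest/pipeline/<name> request."""
    return urllib.request.Request(
        f"{es_url}/_ingest/pipeline/{name}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

To send it, pass the prepared request to urllib.request.urlopen, for example urllib.request.urlopen(put_pipeline_request("opennlp-pipeline", build_pipeline_body())). Keeping the build step separate from the send step makes the request body easy to inspect before it reaches the cluster.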
Add data
PUT my-nlp-index

PUT my-nlp-index/_doc/1?pipeline=opennlp-pipeline
{
  "message": "Shay Banon announced the release of Elasticsearch 6.0 in November 2017"
}

PUT my-nlp-index/_doc/2?pipeline=opennlp-pipeline
{
  "message": "Kobe Bryant was one of the best basketball players of all times. Not even Michael Jordan has ever scored 81 points in one game. Munich is really an awesome city, but New York is as well. Yesterday has been the hottest day of the year."
}
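The important detail in these requests is the pipeline query parameter, which routes each document through the ingest pipeline before it is stored. A minimal Python sketch of the same indexing call follows. It assumes an unsecured node at http://localhost:9200, and index_request is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local node without TLS or authentication.
ES_URL = "http://localhost:9200"


def index_request(index, doc_id, message,
                  pipeline="opennlp-pipeline", es_url=ES_URL):
    """Prepare (but do not send) a PUT <index>/_doc/<id>?pipeline=<name> request."""
    query = urllib.parse.urlencode({"pipeline": pipeline})
    return urllib.request.Request(
        f"{es_url}/{index}/_doc/{doc_id}?{query}",
        data=json.dumps({"message": message}).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Sending is a matter of calling urllib.request.urlopen on the returned request. Leaving out the pipeline parameter would index the raw message without running the opennlp processor, unless the index has a default pipeline configured.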
View the data
GET my-nlp-index/_doc/1
GET my-nlp-index/_doc/2
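Once a document has been fetched, the extracted entities can be pulled out of its _source. The sketch below rests on an assumption not stated in this walkthrough: that the opennlp processor writes its results into a field named entities, keyed by the configured model names (persons, dates, locations). Check the actual GET response and adjust target_field if your output differs. extract_entities is a hypothetical helper.

```python
def extract_entities(get_response, target_field="entities"):
    """Return {entity_type: [values]} from a parsed GET <index>/_doc/<id> response.

    Assumption: the ingest processor stored its output under `target_field`
    inside _source. Returns an empty dict when that field is absent.
    """
    source = get_response.get("_source", {})
    entities = source.get(target_field, {})
    return {kind: list(values) for kind, values in entities.items()}
```

An empty result for a document that clearly contains names or places usually means the document was indexed without the pipeline parameter, or that the model files were not picked up from the configuration.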