Elasticsearch 7.x: An Essential Getting-Started Tutorial

国宾山城乐堡

Tags: elasticsearch, java

Article link: https://blog.csdn.net/qq_39231284/article/details/111600380

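Installing Elasticsearch is not covered here. Leave a comment if you need installation instructions.

Contents

Elasticsearch visualization tools
The IK Chinese analysis plugin
- ik_max_word and ik_smart
Indices
Data management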

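Elasticsearch visualization tools

There are many visualization tools for Elasticsearch, such as elasticsearch-head, Dejavu, and ElasticHD. I usually use the Elasticsearch Head extension for Google Chrome, which is convenient because it is available as soon as the browser is open.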

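The IK Chinese analysis plugin

Elasticsearch's built-in analyzers handle Chinese poorly, so a Chinese word-segmentation plugin is needed. IK is currently the mainstream choice.

IK Analyzer is an open-source, lightweight Chinese word-segmentation toolkit written in Java. Since version 1.0 was released in December 2006, IK Analyzer has gone through four major versions. It began as a Chinese segmentation component built around the open-source Lucene project, combining dictionary-based segmentation with grammar-analysis algorithms. Starting with version 3.0, IK became a general-purpose segmentation component for Java, independent of the Lucene project, while still providing a default implementation optimized for Lucene. The 2012 version added a simple algorithm for resolving segmentation ambiguity, marking IK's move from pure dictionary segmentation toward simulated semantic segmentation.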

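ik_max_word and ik_smart

ik_max_word: splits text at the finest granularity and emits the possible word combinations. For example, "中华五千年华夏" is split into terms such as "五千年、五千、五千年华、华夏、千年华夏" (the exact tokens depend on the IK version and dictionary).
ik_smart: splits text at the coarsest granularity. For example, "五千年华夏" is split into "五千年、华夏".

When no analyzer is specified, Elasticsearch falls back to the standard analyzer, which simply splits Chinese text into individual characters. IK is considerably smarter, as the ik_smart and ik_max_word examples in this section show.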
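To make the contrast concrete, here is a minimal Python sketch that imitates the per-character behavior described above. It is only an approximation for illustration: the function name standard_like_tokens is made up here, and the code neither calls Elasticsearch nor reproduces the standard analyzer's full rules.

```python
def standard_like_tokens(text: str) -> list:
    """Approximate how the standard analyzer treats Chinese text:
    every Han character becomes its own token (illustrative only)."""
    return [ch for ch in text if not ch.isspace()]


tokens = standard_like_tokens("中华五千年华夏")
print(tokens)
```

Compare this list of single characters with the multi-character terms that ik_smart and ik_max_word are described as producing for the same text.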

ik_smart analysis:
Set the analyzer field in the JSON body to ik_smart.

[elastic@localhost elastic]$ curl -X GET -H "Content-Type: application/json"  "http://localhost:9200/_analyze?pretty=true" -d'{"text":"中华五千年华夏","analyzer": "ik_smart"}';

ik_max_word analysis:
Set the analyzer field in the JSON body to ik_max_word.

[elastic@localhost elastic]$ curl -X GET -H "Content-Type: application/json"  "http://localhost:9200/_analyze?pretty=true" -d'{"text":"中华五千年华夏","analyzer": "ik_max_word"}';

Custom dictionaries:
IK conveniently supports hot-updating its dictionaries. This is configured in {ES_HOME}/plugins/ik/config/IKAnalyzer.cfg.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Configure your own extension dictionaries here -->
    <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
    <!-- Configure your own extension stopword dictionaries here -->
    <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
    <!-- Configure a remote extension dictionary here -->
    <entry key="remote_ext_dict">location</entry>
    <!-- Configure a remote extension stopword dictionary here -->
    <entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>

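We usually put the hot words that need automatic updates in a UTF-8 encoded .txt file and serve it with nginx. When the .txt file is modified, the HTTP server automatically returns the updated Last-Modified and ETag headers the next time a client requests the file. A separate tool can extract relevant terms from the business system and update this .txt file.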
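The check a polling client can make with those two headers is sketched below in Python. This is an illustration under assumptions, not the IK plugin's actual code: the function names and the sample header values are invented for the example, and no HTTP request is made.

```python
from typing import Mapping, Optional, Tuple

# (Last-Modified, ETag) as returned by the HTTP server; either may be absent.
Validators = Tuple[Optional[str], Optional[str]]


def extract_validators(headers: Mapping[str, str]) -> Validators:
    """Pull the two cache validators out of a response's headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("last-modified"), lowered.get("etag")


def dictionary_changed(previous: Optional[Validators],
                       headers: Mapping[str, str]) -> bool:
    """Return True when the remote word list should be fetched again."""
    if previous is None:  # first poll: nothing has been cached yet
        return True
    return extract_validators(headers) != previous


# Made-up header values standing in for two polls of the .txt file.
first = {"Last-Modified": "Tue, 22 Dec 2020 08:00:00 GMT", "ETag": '"abc"'}
seen = extract_validators(first)
edited = {"Last-Modified": "Tue, 22 Dec 2020 09:30:00 GMT", "ETag": '"def"'}
print(dictionary_changed(seen, first))   # file untouched since last poll
print(dictionary_changed(seen, edited))  # file was edited
```

Comparing both validators means a change in either one triggers a reload, which is why it is enough for nginx to update these headers when the file is saved.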

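Indices

Now for indices. This section covers the basic concepts and what an index actually is.

Elasticsearch is a document-oriented store. An index defines the logical storage and the field types of its documents. In older versions an index could contain several document types, each a collection of documents. In 7.x mapping types are deprecated, so each index effectively holds the single type _doc. Index-level settings control things such as the number of shards and replicas, the refresh interval, and the analyzers, and the large volume of documents in an index is distributed across the Elasticsearch cluster.

Elasticsearch is a full-text search engine built on Lucene. It writes the information from all documents into a data structure called an inverted index, which maps the terms in the index to the documents that contain them. An inverted index is organized by term rather than by document.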
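A toy inverted index makes the "term-oriented" idea easier to see. The Python sketch below is a deliberately simplified model, not how Lucene stores data: it assumes the documents are already split into terms, and the function name and the sample documents are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[int, List[str]]) -> Dict[str, Set[int]]:
    """Map each term to the set of ids of the documents that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return dict(index)


# Two already-tokenized sample documents (hypothetical data).
docs = {
    1: ["五千年", "华夏"],
    2: ["华夏", "文明"],
}
inverted = build_inverted_index(docs)
print(sorted(inverted["华夏"]))
```

Looking up a term is a single dictionary access that returns the matching document ids, which is why search by term is fast even when there are many documents.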

Creating an index

Because Elasticsearch 7.x no longer supports specifying a mapping type by default, create the index on 7.x without one:

curl -X PUT "localhost:9200/book" -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 0
        }
    }
}
'

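-d supplies the request body. Here the parameters are passed as inline JSON.
The settings and related keys have the following meanings:

Name	Meaning
number_of_shards	Number of primary shards
number_of_replicas	Number of replicas
mappings	Structure of the documents (before 7.x, the first-level keys beneath it were custom type names)
properties	Node that holds the field definitions; everything beneath it is a field
epoch_millis	Date format for a timestamp in milliseconds since the epoch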
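The table mentions mappings, properties, and epoch_millis, which the create-index command above does not use. As a rough sketch of how they fit together, the Python below assembles a request body that could be sent in place of the settings-only one. The function name and the created_at field are hypothetical, and using ik_max_word assumes the IK plugin is installed.

```python
import json


def build_book_index_body(shards: int = 3, replicas: int = 0) -> dict:
    """Assemble a create-index request body with settings and mappings."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        },
        "mappings": {
            # In 7.x, properties sits directly under mappings (no type name).
            "properties": {
                "productid": {"type": "long"},
                "name": {"type": "text", "analyzer": "ik_max_word"},
                # Hypothetical field, added only to show epoch_millis.
                "created_at": {"type": "date", "format": "epoch_millis"},
            }
        },
    }


body = build_book_index_body()
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted after -d in the curl command when the index is first created.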

Listing all indices

Use the _cat API:

[elastic@localhost elastic]$ curl -H "Content-Type: application/json" -X GET "http://localhost:9200/_cat/indices?v"

Deleting an index

[elastic@localhost elastic]$ curl -H "Content-Type: application/json" -X DELETE "http://localhost:9200/book?pretty=true"

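Data management

Adding data

This example uses PUT to add a document to the book index with an explicit id. Note that the default type here is _doc. Alternatively, POST adds a document and generates its id automatically.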

[elastic@localhost elastic]$ curl -H "Content-Type: application/json" -X PUT "http://localhost:9200/book/_doc/1?pretty=true" -d'
{
    "productid" : 1,
    "name" : "测试添加索引产品名称",
    "short_name" : "测试添加索引产品短标题",
    "desc" : "测试添加索引产品描述"
}
'

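The id here is 1. You can also add the parameter op_type=create, in which case a request for an id that already exists fails with an error. Without it, the request replaces the existing document that has that id.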

[elastic@localhost elastic]$ curl -H "Content-Type: application/json" -X PUT "http://localhost:9200/book/_doc/1?op_type=create&pretty=true" -d'
{
    "productid" : 1,
    "name" : "测试添加索引产品名称",
    "short_name" : "测试添加索引产品短标题",
    "desc" : "测试添加索引产品描述"
}
'
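The difference between the two requests can be modeled in a few lines of Python. This is an in-memory toy written for illustration, not Elasticsearch client code: the class and exception names are invented, and the real server reports the conflict as an HTTP error response rather than a Python exception.

```python
class DocumentExistsError(Exception):
    """Raised when op_type=create hits an id that is already present."""


class ToyIndex:
    """In-memory stand-in for an index, keyed by document id."""

    def __init__(self) -> None:
        self.docs = {}

    def put(self, doc_id: str, source: dict, op_type: str = "index") -> str:
        exists = doc_id in self.docs
        if op_type == "create" and exists:
            raise DocumentExistsError(doc_id)
        # The stored document is replaced as a whole, not merged.
        self.docs[doc_id] = dict(source)
        return "updated" if exists else "created"


index = ToyIndex()
first_result = index.put("1", {"productid": 1, "name": "测试添加索引产品名称"})
second_result = index.put("1", {"productid": 1})
print(first_result, second_result, index.docs["1"])
```

After the second call the name field is gone, which shows why a plain PUT on an existing id is a full replacement rather than a field-level update.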

Querying data

Querying all documents

[elastic@localhost elastic]$ curl -H "Content-Type: application/json" -X GET "http://localhost:9200/book/_search?pretty=true"

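Conditional queries
Conditional queries cover exact-term queries, match queries, and multi-field queries, which use "term", "match", and "multi_match" respectively. Aggregations are requested separately through the top-level "aggs" key.

Match on the document's name field: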
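The three query types differ only in the shape of the JSON body. The Python sketch below builds each body as a plain dictionary so the shapes can be compared side by side. The helper names are made up for this example, and nothing is sent to Elasticsearch.

```python
import json
from typing import List


def term_query(field: str, value) -> dict:
    """Exact-term query: the value is compared as-is, without analysis."""
    return {"query": {"term": {field: value}}}


def match_query(field: str, text) -> dict:
    """Match query: the text is analyzed before it is compared."""
    return {"query": {"match": {field: text}}}


def multi_match_query(text: str, fields: List[str]) -> dict:
    """Multi-field query: one search text matched against several fields."""
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


bodies = [
    term_query("productid", 1),
    match_query("name", "产品"),
    multi_match_query("产品", ["desc", "short_name"]),
]
for body in bodies:
    print(json.dumps(body, ensure_ascii=False))
```

Each printed line is a body that can be passed to the _search endpoint with -d, in the same way as the curl commands in this section.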

[elastic@localhost elastic]$ curl -H "Content-Type: application/json" -X GET "http://localhost:9200/book/_search?pretty=true" -d'
{
    "query" : {
        "match" : { 
            "name" : "产品" 
        }
    }
}
'

Match on the document's identifier (productid):

[elastic@localhost elastic]$ curl -H "Content-Type: application/json" -X GET "http://localhost:9200/book/_search?pretty=true" -d'
{
    "query" : {
        "match" : { 
            "productid" : 100
        }
    }
}
'

Multi-field match
Match against the desc and short_name fields:

[elastic@localhost elastic]$ curl -H "Content-Type: application/json" -X GET "http://localhost:9200/book/_search?pretty=true" -d'
{
    "query" : {
        "multi_match" : { 
            "query":"产品",
            "fields" : ["desc","short_name"]
        }
    }
}
'

When no documents match, the response looks like this:

[elastic@localhost elastic]$ curl -H "Content-Type: application/json" -X GET "http://localhost:9200/book/_search?pretty=true" -d'
 {
     "query" : {
         "match" : { 
             "productid" : 100
        }
     }
 }
 '
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

Updating data

curl -X POST "localhost:9200/book/_doc/HFn_2XABkofzJYzpQIy4" -H 'Content-Type: application/json' -d '{
    "productid" : 1,
    "name" : "测试更新"
}'

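HFn_2XABkofzJYzpQIy4 is the _id of the document to update. This request reindexes the document, so the body sent replaces its entire content.

Deleting data

Delete by id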

curl -X DELETE "localhost:9200/book/_doc/GFn_2XABkofzJYzpQIy4"

GFn_2XABkofzJYzpQIy4 is the document's _id.

Delete by query

curl -X POST "localhost:9200/book/_delete_by_query?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "short_name": "添加"
    }
  }
}
'

Reference: king-angmar/weathertop. If anything is missing from this article, leave a comment and I will improve it in a later revision.
