Series Overview
[ElasticSearch Part 1: https://blog.csdn.net/weixin_45404884/article/details/137402463]
[ElasticSearch Part 2: https://blog.csdn.net/weixin_45404884/article/details/137505489]
[ElasticSearch Part 3: https://blog.csdn.net/weixin_45404884/article/details/137548120]
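I. Getting to Know Elasticsearch
1. What is Elasticsearch
Elasticsearch is a powerful open-source search engine that helps you quickly find what you need in massive amounts of data. Together with Kibana, Logstash, and Beats it forms the Elastic Stack (ELK), which is widely used for log analysis, real-time monitoring, and similar scenarios.
2. History
In 2004, Shay Banon developed Compass on top of Lucene.
In 2010, Shay Banon rewrote Compass and named it Elasticsearch.
Official site: https://www.elastic.co/cn/
Compared with Lucene, Elasticsearch has the following advantages:
- It is distributed and can scale horizontally
- It exposes a RESTful interface that can be called from any language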
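2. Forward Index and Inverted Index
Elasticsearch uses an inverted index:
- Document: each record is a document; documents are serialized to JSON before being stored in Elasticsearch
- Term: a word obtained by splitting a document's text according to its meaning
- What is an inverted index?
Tokenize the document content, build an index over the terms, and record which documents each term appears in. A query first looks up the term to get the document ids, then fetches the documents.
- What is a forward index?
Build an index on the document id. To search for a term, each document has to be fetched first and then checked for the term.
Traditional databases (such as MySQL) use a forward index, for example an index on the id column of a table such as tb_goods.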
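The difference between the two lookups can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: the class name `InvertedIndexSketch`, the sample documents, and the whitespace tokenization are all made up for the example, and a real engine such as Lucene stores far more per term (positions, frequencies, and so on).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexSketch {
    // Inverted side: term -> sorted ids of the documents containing that term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // Forward side: document id -> original text
    private final Map<Integer, String> documents = new TreeMap<>();

    // Store the document and register each of its terms in the postings map
    void add(int id, String text) {
        documents.put(id, text);
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // Inverted lookup: one map access on the term yields the matching ids
    List<Integer> searchInverted(String term) {
        return new ArrayList<>(postings.getOrDefault(term.toLowerCase(), new TreeSet<>()));
    }

    // Forward lookup: every document has to be read and checked for the term
    List<Integer> searchByScan(String term) {
        String needle = term.toLowerCase();
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : documents.entrySet()) {
            String[] terms = entry.getValue().toLowerCase().split("\\s+");
            if (Arrays.asList(terms).contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Xiaomi phone");
        index.add(2, "Huawei phone");
        index.add(3, "Xiaomi charger");
        System.out.println(index.searchInverted("phone"));
        System.out.println(index.searchByScan("phone"));
    }
}
```

Both methods return the same ids; the inverted lookup simply avoids touching documents that do not contain the term.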
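3. Index
- Index: a collection of documents of the same type
- Mapping: the constraints on the fields of the documents in an index, similar to a table's schema
4. Concept Comparison
5. Analyzers
(1) Standard analyzer
ES tokenizes documents when it builds the inverted index, and it tokenizes the user's input at search time.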
POST /_analyze
{
"text": "你好分词器",
"analyzer": "standard"
}
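Syntax notes:
- POST: the request method
- /_analyze: the request path; the prefix http://192.168.150.101:9200 is omitted because Kibana fills it in
- Request parameters, in JSON:
  - analyzer: the analyzer type; here the default standard analyzer
  - text: the content to tokenize
Tokenization result: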
{
"tokens": [
{
"token": "你",
"start_offset": 0,
"end_offset": 1,
"type": "<IDEOGRAPHIC>",
"position": 0
},
{
"token": "好",
"start_offset": 1,
"end_offset": 2,
"type": "<IDEOGRAPHIC>",
"position": 1
},
{
"token": "分",
"start_offset": 2,
"end_offset": 3,
"type": "<IDEOGRAPHIC>",
"position": 2
},
{
"token": "词",
"start_offset": 3,
"end_offset": 4,
"type": "<IDEOGRAPHIC>",
"position": 3
},
{
"token": "器",
"start_offset": 4,
"end_offset": 5,
"type": "<IDEOGRAPHIC>",
"position": 4
}
]
}
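(2) IK analyzer
The IK analyzer is the usual choice for Chinese text. It is installed separately as an Elasticsearch plugin.
The IK analyzer has two modes:
- ik_smart: fewest splits, coarse-grained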
POST /_analyze
{
"text": "你好分词器",
"analyzer": "ik_smart"
}
Tokenization result:
{
"tokens": [
{
"token": "你好",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 0
},
{
"token": "分词器",
"start_offset": 2,
"end_offset": 5,
"type": "CN_WORD",
"position": 1
}
]
}
- ik_max_word: finest splits, fine-grained
POST /_analyze
{
"text": "你好分词器",
"analyzer": "ik_max_word"
}
Tokenization result:
{
"tokens": [
{
"token": "你好",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 0
},
{
"token": "分词器",
"start_offset": 2,
"end_offset": 5,
"type": "CN_WORD",
"position": 1
},
{
"token": "分词",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 2
},
{
"token": "器",
"start_offset": 4,
"end_offset": 5,
"type": "CN_CHAR",
"position": 3
}
]
}
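- Custom dictionary support
In the config directory under the IK analyzer installation directory, open IKAnalyzer.cfg.xml and set the paths of your own extension dictionary and stop-word dictionary.
For example, open ext_dic, add the word 你好分词器, and restart ES.
Example: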
POST /_analyze
{
"text": "你好分词器",
"analyzer": "ik_smart"
}
Tokenization result:
{
"tokens": [
{
"token": "你好分词器",
"start_offset": 0,
"end_offset": 5,
"type": "CN_WORD",
"position": 0
}
]
}
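To see why adding one entry to the dictionary changes the result, here is a toy dictionary-based segmenter in Java. It uses plain forward longest-match and is not IK's actual algorithm; the class name `DictionarySegmenterSketch` and the three starting words are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DictionarySegmenterSketch {
    private final Set<String> dictionary = new HashSet<>();
    private int maxWordLength = 1;

    // Register a word and remember the longest word length seen so far
    void addWord(String word) {
        dictionary.add(word);
        maxWordLength = Math.max(maxWordLength, word.length());
    }

    // Forward longest-match: at each position take the longest dictionary word,
    // falling back to a single character when nothing matches
    List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxWordLength);
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        DictionarySegmenterSketch segmenter = new DictionarySegmenterSketch();
        segmenter.addWord("你好");
        segmenter.addWord("分词");
        segmenter.addWord("分词器");
        System.out.println(segmenter.segment("你好分词器"));
        // Extending the dictionary changes how the same text is split
        segmenter.addWord("你好分词器");
        System.out.println(segmenter.segment("你好分词器"));
    }
}
```

With the starting dictionary the text is split into 你好 and 分词器; once 你好分词器 is added, the whole string matches as one word, which mirrors the ik_smart results shown above.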
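6. Index Operations
(1) Mapping attributes
A mapping constrains the documents in an index. Common mapping attributes include:
- type: the field's data type. Common simple types are:
  - String: text (analyzed text), keyword (exact value, e.g. brand, country, IP address)
  - Numeric: long, integer, short, byte, double, float
  - Boolean: boolean
  - Date: date
  - Object: object
- index: whether to index the field; defaults to true
- analyzer: which analyzer to use
- properties: the sub-fields of this field
(2) Creating an index
In ES, indexes and documents are manipulated through RESTful requests whose bodies are written in DSL.
The DSL syntax for creating an index together with its mapping is: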
PUT /<index_name>
{
  "mappings": {
    "properties": {
      "field_name": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "field_name2": {
        "type": "keyword",
        "index": false
      },
      "field_name3": {
        "properties": {
          "sub_field": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Example:
PUT /iam
{
"mappings": {
"properties": {
"info":{
"type": "text",
"analyzer": "ik_smart"
},
"email":{
"type": "keyword",
"index": "false"
},
"name":{
"properties": {
"firstName": {
"type": "keyword"
}
}
}
}
}
}
Response:
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "iam"
}
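(3) Viewing an index
GET /iam
(4) Deleting an index
DELETE /iam
(5) Modifying an index
Once an index is created, the existing fields of its mapping cannot be modified, but new fields can be added with the following syntax:
Add a field: PUT /<index_name>/_mapping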
PUT /iam/_mapping
{
"properties": {
"age": {
"type": "integer"
}
}
}
7. Document Operations
(1) Creating a document
POST /<index_name>/_doc/<document_id>
{
  "field1": "value1",
  "field2": "value2",
  "field3": {
    "sub_field1": "value3",
    "sub_field2": "value4"
  }
}
Example:
POST /iam/_doc/2
{
"info": "Java工程师",
"email": "zy@itcast.cn",
"age": 20,
"name": {
"firstName": "张",
"fullName": "张三"
}
}
Response:
{
"_index": "iam",
"_id": "2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 3,
"_primary_term": 1
}
(1) Retrieving a document
GET /<index_name>/_doc/<document_id>
GET /iam/_doc/2
Response:
{
"_index": "iam",
"_id": "2",
"_version": 1,
"_seq_no": 3,
"_primary_term": 1,
"found": true,
"_source": {
"info": "Java工程师",
"email": "zy@itcast.cn",
"age": 20,
"name": {
"firstName": "张",
"fullName": "张三"
}
}
}
(2) Deleting a document
DELETE /<index_name>/_doc/<document_id>
DELETE /iam/_doc/2
Response:
{
"_index": "iam",
"_id": "2",
"_version": 2,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 4,
"_primary_term": 1
}
(3) Updating a document
Option 1: full update, which deletes the old document and adds the new one
PUT /iam/_doc/1
{
"info": "Java攻城狮",
"email": "zy@itcast.cn",
"name": {
"firstName": "云",
"fullName": "赵云"
}
}
Response:
{
"_index": "iam",
"_id": "1",
"_version": 7,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 8,
"_primary_term": 1
}
Option 2: partial update, which changes only the specified fields
POST /iam/_update/1
{
"doc": {
"email": "ZhaoYun@itcast.cn"
}
}
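The difference between the two options can be modelled with plain Java maps. This is a rough sketch rather than how ES stores documents: the class `UpdateSemanticsSketch` and its method names are hypothetical, and it only merges top-level fields, whereas the real update API also merges nested objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UpdateSemanticsSketch {
    // Full update (PUT /index/_doc/id): the stored document is replaced wholesale,
    // so any field missing from the new body is gone afterwards
    static Map<String, Object> fullUpdate(Map<String, Object> stored, Map<String, Object> incoming) {
        return new LinkedHashMap<>(incoming);
    }

    // Partial update (POST /index/_update/id): only the fields listed under "doc"
    // are overwritten, everything else is kept
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("info", "Java engineer");
        stored.put("email", "zy@itcast.cn");
        stored.put("age", 20);

        Map<String, Object> newBody = new LinkedHashMap<>();
        newBody.put("info", "Java engineer");
        newBody.put("email", "ZhaoYun@itcast.cn");

        // The full update loses "age" because the new body does not contain it
        System.out.println(fullUpdate(stored, newBody));
        // The partial update keeps "age" and changes only "email"
        System.out.println(partialUpdate(stored, Map.of("email", "ZhaoYun@itcast.cn")));
    }
}
```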
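8. Index Operations with RestClient
(1) What is RestClient
ES officially provides clients in various languages for working with ES. At their core, these clients assemble DSL statements and send them to ES over HTTP. Official documentation: https://www.elastic.co/guide/en/elasticsearch/client/index.html
This section uses the Java REST client to create an index, delete it, and check whether it exists.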
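To make "assemble DSL and send it over HTTP" concrete, the sketch below builds the three raw requests with the JDK's own java.net.http API (Java 11+) instead of the ES client. It only constructs the requests and does not send them; the class name `RawRequestSketch`, its method names, and the shortened mapping body are assumptions made for the example.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class RawRequestSketch {
    // PUT /<index> with the mapping DSL as a JSON body creates the index
    static HttpRequest createIndex(String baseUrl, String index, String mappingJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(mappingJson))
                .build();
    }

    // DELETE /<index> removes the index
    static HttpRequest deleteIndex(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .DELETE()
                .build();
    }

    // HEAD /<index> checks existence: the status code tells whether the index exists
    static HttpRequest indexExists(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        String baseUrl = "http://localhost:9200";
        String mapping = "{\"mappings\":{\"properties\":{\"name\":{\"type\":\"text\"}}}}";
        // Print the method and URI of each request; nothing is sent over the network
        for (HttpRequest request : new HttpRequest[] {
                createIndex(baseUrl, "hotel", mapping),
                deleteIndex(baseUrl, "hotel"),
                indexExists(baseUrl, "hotel")}) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

The Java REST client wraps the same kind of request behind typed request and response objects, which is why its API closely follows the DSL shown earlier.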
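(2) Analyzing the data structure
Questions to consider when designing a mapping:
Field name, data type, whether the field is searchable, whether it is analyzed, and if so, which analyzer to use.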
create table tb_hotel (
  id bigint(20) NOT NULL COMMENT 'hotel id',
  name varchar(255) NOT NULL COMMENT 'hotel name; e.g. 7 Days Inn',
  address varchar(255) NOT NULL COMMENT 'hotel address; e.g. Hangtou Road',
  price int(10) NOT NULL COMMENT 'hotel price; e.g. 329',
  score int(2) NOT NULL COMMENT 'hotel rating; e.g. 45 means 4.5 points',
  brand varchar(32) NOT NULL COMMENT 'hotel brand; e.g. Home Inn',
  city varchar(32) NOT NULL COMMENT 'city; e.g. Shanghai',
  star_name varchar(16) DEFAULT NULL COMMENT 'hotel star level, from low to high: 1 to 5 stars, then 1 to 5 diamonds',
  business varchar(255) DEFAULT NULL COMMENT 'business district; e.g. Hongqiao',
  latitude varchar(32) NOT NULL COMMENT 'latitude; e.g. 31.2497',
  longitude varchar(32) NOT NULL COMMENT 'longitude; e.g. 120.3925',
  pic varchar(255) DEFAULT NULL COMMENT 'hotel image; e.g. /img/1.jpg',
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Hotel index:
PUT /hotel
{
"mappings": {
"properties": {
"id":{
"type": "keyword"
},
"name":{
"type": "text",
"analyzer": "ik_max_word"
},
"address":{
"type": "keyword",
"index": false
},
"price":{
"type": "integer"
},
"score":{
"type": "integer"
},
"brand":{
"type": "keyword"
},
"city":{
"type": "keyword"
},
"star_name":{
"type": "keyword"
},
"business":{
"type": "keyword"
},
"location":{
"type": "geo_point"
},
"pic":{
"type": "keyword",
"index": false
}
}
}
}
tips1:
ES supports two geographic data types:
- geo_point: a single point defined by latitude and longitude, e.g. "32.32132,110.323213"
- geo_shape: a complex geometry made up of multiple geo_points, e.g. a line LINESTRING(-77.3434 38.34324,-77.23112 38.3232)
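The tb_hotel table keeps latitude and longitude in two separate columns, while the hotel index has a single location field of type geo_point. One way to bridge the two, sketched below, is to join the columns into the "latitude,longitude" string form. The class `GeoPointSketch` and its method are hypothetical helpers, and the range check is an extra safeguard added for the example.

```java
class GeoPointSketch {
    // Join separate latitude/longitude column values into the "lat,lon" string
    // form accepted by a geo_point field (latitude comes first in this form)
    static String toGeoPoint(String latitude, String longitude) {
        String lat = latitude.trim();
        String lon = longitude.trim();
        double latValue = Double.parseDouble(lat);
        double lonValue = Double.parseDouble(lon);
        if (latValue < -90 || latValue > 90 || lonValue < -180 || lonValue > 180) {
            throw new IllegalArgumentException("coordinates out of range: " + lat + "," + lon);
        }
        return lat + "," + lon;
    }

    public static void main(String[] args) {
        // Values taken from the column comments of tb_hotel
        System.out.println(toGeoPoint("31.2497", "120.3925"));
    }
}
```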
tips2:
The copy_to attribute copies the value of the current field into a specified field. Example:
"all": {
  "type": "text",
  "analyzer": "ik_max_word"
},
"brand": {
  "type": "keyword",
  "copy_to": "all"
}
(3) Initializing the Java REST client
Add the dependency
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>
Initialize RestHighLevelClient
RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
HttpHost.create("http://192.168.150.101:9200")
));
(4) Creating an index
private static final String MAPPING_TEMPLATE = "{\n" +
" \"mappings\": {\n" +
" \"properties\": {\n" +
" \"id\":{\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"name\":{\n" +
" \"type\": \"text\",\n" +
" \"analyzer\": \"ik_max_word\"\n" +
" },\n" +
" \"address\":{\n" +
" \"type\": \"keyword\",\n" +
" \"index\": false\n" +
" },\n" +
" \"price\":{\n" +
" \"type\": \"integer\"\n" +
" },\n" +
" \"score\":{\n" +
" \"type\": \"integer\"\n" +
" },\n" +
" \"brand\":{\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"city\":{\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"star_name\":{\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"business\":{\n" +
" \"type\": \"keyword\"\n" +
" },\n" +
" \"location\":{\n" +
" \"type\": \"geo_point\"\n" +
" },\n" +
" \"pic\":{\n" +
" \"type\": \"keyword\",\n" +
" \"index\": false\n" +
" }\n" +
" }\n" +
" }\n" +
"}";
@Test
public void testCreateHotelIndex() throws IOException {
    RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
            HttpHost.create("http://localhost:9200")
    ));
    // 1. Create the request object
    CreateIndexRequest request = new CreateIndexRequest("hotel");
    // 2. Set the request body; MAPPING_TEMPLATE is a static string constant holding the DSL that creates the index
    request.source(MAPPING_TEMPLATE, XContentType.JSON);
    // 3. Send the request
    CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(response);
    // Release the client's resources
    client.close();
}
(5) Deleting an index
@Test
public void testDeleteHotelIndex() throws IOException {
    RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
            HttpHost.create("http://localhost:9200")
    ));
    // 1. Create the request object
    DeleteIndexRequest request = new DeleteIndexRequest("hotel");
    // 2. Send the request
    AcknowledgedResponse delete = client.indices().delete(request, RequestOptions.DEFAULT);
    System.out.println(delete);
    // Release the client's resources
    client.close();
}
(6) Checking whether an index exists
@Test
public void testExistsHotelIndex() throws IOException {
    RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
            HttpHost.create("http://localhost:9200")
    ));
    // 1. Create the request object
    GetIndexRequest request = new GetIndexRequest("hotel");
    // 2. Send the request
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    // 3. Print the result
    System.out.println(exists);
    // Release the client's resources
    client.close();
}
9. Document Operations with RestClient
(1) Adding hotel data to the index
@Test
public void testIndexDocument() throws IOException {
    RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
            HttpHost.create("http://localhost:9200")
    ));
    String template = "{\n" +
            " \"name\": \"张三\",\n" +
            " \"email\": \"zy@itcast.cn\"\n" +
            "}";
    // 1. Create the request object
    IndexRequest request = new IndexRequest("hotel").id("1");
    // 2. Set the JSON document
    request.source(template, XContentType.JSON);
    // 3. Send the request
    IndexResponse response = client.index(request, RequestOptions.DEFAULT);
    System.out.println(response);
    client.close();
}
(2) Retrieving hotel data by id
@Test
public void testGetDocumentById() throws IOException {
    RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
            HttpHost.create("http://localhost:9200")
    ));
    // 1. Create the request object
    GetRequest request = new GetRequest("hotel", "1");
    // 2. Send the request and get the response
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    // 3. Parse the response
    String json = response.getSourceAsString();
    System.out.println(json);
    client.close();
}
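(3) Updating hotel data by id
There are two ways to update a document:
Option 1: full update. Writing a document with the same id again deletes the old document and adds the new one.
Option 2: partial update. Only some fields are updated; this is the option demonstrated here.
@Test
public void testUpdateDocumentById() throws IOException {
    RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
            HttpHost.create("http://localhost:9200")
    ));
    // 1. Create the request object
    UpdateRequest request = new UpdateRequest("hotel", "1");
    // 2. Set the fields to update; every two arguments form one key/value pair
    request.doc(
            "email", "2312123@163.com"
    );
    // 3. Update the document
    client.update(request, RequestOptions.DEFAULT);
    client.close();
}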
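The doc(...) call above takes its arguments as alternating keys and values. The sketch below shows one way such an argument list can be read into a field map; it is an illustration of the convention, not the client's actual implementation, and the class `KeyValuePairsSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValuePairsSketch {
    // Read alternating key/value arguments (k1, v1, k2, v2, ...) into a field map
    static Map<String, Object> toMap(Object... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of arguments");
        }
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            fields.put(String.valueOf(keyValues[i]), keyValues[i + 1]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Two pairs: email -> address, age -> 21
        System.out.println(toMap("email", "2312123@163.com", "age", 21));
    }
}
```

An odd number of arguments leaves a key without a value, so the sketch rejects it up front.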
(4) Deleting a document by id
@Test
public void testDeleteDocumentById() throws IOException {
    RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
            HttpHost.create("http://localhost:9200")
    ));
    // 1. Create the request object
    DeleteRequest request = new DeleteRequest("hotel", "1");
    // 2. Delete the document
    client.delete(request, RequestOptions.DEFAULT);
    client.close();
}
Next article
https://blog.csdn.net/weixin_45404884/article/details/137505489