Official site: https://www.elastic.co/cn/elasticsearch/
Chinese docs: https://www.elastic.co/guide/cn/elasticsearch/guide/current/getting-started.html
I. Basic Concepts
1. Index
As a verb, "to index" means to insert, like MySQL's INSERT.
As a noun, an index is analogous to a MySQL database.
2. Type
Analogous to a table in a MySQL database; documents of the same type are stored together.
3. Document
Analogous to a single row in MySQL; a document is JSON, and its properties are analogous to MySQL columns.
Elasticsearch builds an inverted index and uses relevance scoring for fast queries.
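The inverted index above maps each term to the set of documents that contain it. A toy sketch in Java (an illustration of the idea only, not how Lucene actually stores it):

```java
import java.util.*;

// A minimal inverted index: term -> set of document ids that contain it.
public class InvertedIndexDemo {
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Tokenize on whitespace and record which document each term appears in.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look up the documents containing a term; empty set if none.
    public Set<Integer> search(String term) {
        return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexDemo idx = new InvertedIndexDemo();
        idx.add(1, "red apple");
        idx.add(2, "red car");
        System.out.println(idx.search("red"));   // [1, 2]
        System.out.println(idx.search("apple")); // [1]
    }
}
```

Searching a term is then a single map lookup instead of a scan over every document, which is why term queries are fast.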
4. Install Elasticsearch with Docker
sudo docker pull elasticsearch:7.4.2
Pull Kibana, the visual query tool for Elasticsearch:
sudo docker pull kibana:7.4.2
Check memory usage: free -m
Configure Elasticsearch for Docker:
sudo mkdir -p /mydata/elasticsearch/config
sudo mkdir -p /mydata/elasticsearch/data
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml
docker run -p 9200:9200 -p 9300:9300 --name elasticsearch \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2
Fix the folder permissions:
chmod -R 777 /mydata/elasticsearch/
docker ps -a lists all containers, including stopped ones.
If startup fails because the elasticsearch user's memory-map limit is too low (at least 262144 is required):
Fix:
switch to the root user and run:
sysctl -w vm.max_map_count=262144
Check the result:
sysctl -a | grep vm.max_map_count
which should print:
vm.max_map_count = 262144
This change is lost when the virtual machine reboots, so to make it permanent,
add the following line at the end of /etc/sysctl.conf:
vm.max_map_count=262144
5. Install Kibana
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.56.10:9200 -p 5601:5601 \
-d kibana:7.4.2
Set the container to restart automatically:
sudo docker update 0a824ddf52d0 --restart=always
Once running, Kibana is available at http://192.168.56.10:5601
6. First Queries
GET /_cat/nodes     list all nodes
GET /_cat/health    check cluster health
GET /_cat/indices   list all indices (like SHOW DATABASES;)
7. Indexing a Document
To save a document, specify which index and type to save it under, and a unique id:
PUT customer/external/1 saves the document { "name": "JD" } with id 1 under the external type of the customer index.
POST: without an id, a new document is created with an auto-generated id; with an existing id, the document is updated.
PUT: an id is mandatory (omitting it is an error); a new id creates the document, an existing id updates it.
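To see the difference in Kibana Dev Tools, watch the `result` field of the response (a sketch; the "created" vs. "updated" values assume the index starts empty):

```
POST customer/external          # no id: always creates, id is auto-generated
{ "name": "JD" }                # response: "result": "created"

PUT customer/external/1         # PUT requires an id
{ "name": "JD" }                # first call: "result": "created"

PUT customer/external/1
{ "name": "JD2" }               # same id again: "result": "updated"
```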
8. Retrieving a Document
GET customer/external/39CzmHYBZcKtI7qwXzJa
{
  "_index": "customer",          // which index
  "_type": "external",           // which type
  "_id": "39CzmHYBZcKtI7qwXzJa", // document id
  "_version": 3,                 // version number
  "_seq_no": 2,                  // concurrency-control field, incremented on every update; used for optimistic locking
  "_primary_term": 1,            // same purpose; changes when the primary shard is reassigned, e.g. after a restart
  "found": true,
  "_source": {                   // the actual document content
    "name": "JD",
    "ren": "乔丹"
  }
}
9. Optimistic Locking
Append the following parameters to the request:
?if_seq_no=0&if_primary_term=1
For example: http://192.168.56.10:9200/customer/external/39CzmHYBZcKtI7qwXzJa?if_seq_no=3&if_primary_term=1
If _seq_no is not the latest value, the request fails with a 409:
{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[39CzmHYBZcKtI7qwXzJa]: version conflict, required seqNo [4], primary term [1]. current document has seqNo [5] and primary term [1]",
"index_uuid": "QyKTy5LkRLm_2-zhNM7RSg",
"shard": "0",
"index": "customer"
}
],
"type": "version_conflict_engine_exception",
"reason": "[39CzmHYBZcKtI7qwXzJa]: version conflict, required seqNo [4], primary term [1]. current document has seqNo [5] and primary term [1]",
"index_uuid": "QyKTy5LkRLm_2-zhNM7RSg",
"shard": "0",
"index": "customer"
},
"status": 409
}
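Conceptually, `if_seq_no`/`if_primary_term` is a compare-and-set: the write succeeds only if nobody updated the document since you read it. A plain-Java sketch of the pattern (hypothetical `VersionedDoc` class for illustration, not the Elasticsearch API):

```java
// Sketch of the idea behind if_seq_no: an update is applied only when the
// caller's expected sequence number matches the current one (compare-and-set).
public class VersionedDoc {
    private String source;
    private long seqNo = 0;

    public VersionedDoc(String source) { this.source = source; }

    public long getSeqNo() { return seqNo; }
    public String getSource() { return source; }

    // Returns true and bumps seqNo on success; false plays the role of the
    // 409 conflict: someone else updated the document since we read it.
    public synchronized boolean update(long ifSeqNo, String newSource) {
        if (ifSeqNo != seqNo) {
            return false; // stale seq_no -> reject; caller must re-read and retry
        }
        source = newSource;
        seqNo++;
        return true;
    }

    public static void main(String[] args) {
        VersionedDoc doc = new VersionedDoc("{\"name\":\"JD\"}");
        long read = doc.getSeqNo();                        // both writers read seq_no = 0
        doc.update(read, "{\"name\":\"A\"}");              // first writer wins, seq_no -> 1
        boolean ok = doc.update(read, "{\"name\":\"B\"}"); // second writer conflicts
        System.out.println(ok); // false
    }
}
```

On a conflict, the correct reaction is the same as with Elasticsearch: fetch the document again to get the fresh seq_no, then retry the write.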
10. Updating a Document
POST customer/external/1/_update
{
"doc":{
"name":"JohnDoew"
}
}
With _update, if the content is unchanged, the version and sequence number do not change.
Alternatively:
POST customer/external/1
{
"name":"JohnDoew2"
}
With this form, the version and sequence number change even when the content is unchanged.
Alternatively:
PUT customer/external/1
{
"name":"JohnDoew2"
}
Likewise, the version and sequence number change even when the content is unchanged.
11. Deleting Documents
DELETE customer/external/1
DELETE customer
You can delete a document or an index, but not a type.
12. The _bulk Batch API
POST customer/external/_bulk
{"index":{"_id":"1"}}   // index: save the document with id 1
{"name":"JD"}
{"index":{"_id":"2"}}   // save the document with id 2
{"name":"JD"}
POST /_bulk
{"delete":{"_index":"website","_type": "blog","_id":"123"}}
{"create":{"_index":"website","_type": "blog","_id":"123"}}
{"title": "My First blog"}
{"index": {"_index":"website","_type": "blog"}}
{"title": "My Second blog"}
{"update": {"_index":"website","_type": "blog","_id":"123"}}
{"doc": {"title": "My update blog"}}
13. match_all (query everything), sort (ordering), and _source (return only the listed fields)
GET bank/_search
{
"from": 0,
"size": 10,
"query": {
"match_all": {}
},
"sort": [
{
"account_number": "asc"
},
{
"balance": "desc"
}
],
"_source": ["balance","firstname"]
}
14. match (analyzed full-text match), match_phrase (the document must contain the whole phrase), multi_match (match against multiple fields)
GET bank/_search
{
"query": {
"match": {
"account_number": 20
}
}
}
GET bank/_search
{
"query": {
"match": {
"address": "282 Kings Place"
}
}
}
GET bank/_search
{
"query": {
"match_phrase" : {
"address": "Kings Hwy"
}
}
}
GET bank/_search
{
"query": {
"multi_match":{
"query": "Roberts",
"fields": [ "firstname", "lastname" ]
}
}
}
15. Compound queries with bool: must (every clause must match), must_not (must not match), should (optional; matching raises the score), filter (restricts the results without affecting the score)
GET bank/_search
{
"query": {
"bool":{
"must": [
{
"match": {
"address": "mill"
}
},
{"match":{
"gender": "M"
}}
],
"must_not": [
{"match": {
"age": "18"
}}
],
"should": [
{"match": {
"lastname": "Hines"
}}
],
"filter": {
"range": {
"age": {
"gte": 10,
"lte": 28
}
}
}
}
}
}
16. term queries: only suited to exact, non-analyzed values such as prices, numbers, ids, etc.
GET bank/_search
{
"query":{
"term": {
"age": {
"value": 28
}
}
}
}
17. field.keyword for exact matches: returns only documents whose value matches the whole string exactly
GET bank/_search
{
"query":{
"match": {
"address.keyword":"694 Jefferson"
}
}
}
18. Aggregations ## the age distribution and average age of everyone whose address contains "mill" (aggs: aggregations; terms: bucket by distinct value; avg, sum, max, min, etc.; "size": 0 suppresses the hits and returns only the aggregations)
GET bank/_search
{
"query": {
"match": {
"address": "mill"
}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 10
}
},
"ageAvg":{
"avg": {
"field": "age"
}
},
"ageSum":{
"sum": {
"field": "age"
}
},
"ageMax":{
"max": {
"field": "age"
}
},
"ageMin":{
"min": {
"field": "age"
}
}
},
"size":0
}
19. Nested aggregations ## bucket by age, then compute the average balance within each age bucket
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageagg": {
"terms": {
"field": "age",
"size": 10
},
"aggs": {
"ageAvg": {
"avg": {
"field": "balance"
}
}
}
}
}
}
20. Nested aggregations ## the full age distribution, and for each age bucket the average balance per gender plus the overall average balance
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageagg": {
"terms": {
"field": "age",
"size": 10
},
"aggs": {
"genderAgg": {
"terms": {
"field": "gender.keyword",
"size": 10
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
},
"ageBalanceAvg":{
"avg": {
"field": "balance"
}
}
}
}
}
}
21. Mappings # view a mapping
GET bank/_mapping
22. Mappings ### create an index with a mapping
PUT /my_index
{
"mappings": {
"properties": {
"age":{
"type":"integer"
},
"email":{
"type":"keyword"
}
}
}
}
## Add a new field to a mapping (existing fields cannot be modified); "index": false means the field is not indexed and so cannot be searched
PUT /my_index/_mapping
{
"properties": {
"employ_id":{
"type":"keyword",
"index":false
}
}
}
23. ## To change an existing mapping, the only option is to migrate the data
Create a new index shaped the way you want, then check it:
PUT /newbank
{
"mappings" : {
"properties" : {
"account_number" : {
"type" : "long"
},
"address" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"age" : {
"type" : "integer"
},
"balance" : {
"type" : "long"
},
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"email" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"employer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"firstname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"gender" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lastname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"state" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
GET /newbank/_mapping
### Data migration
POST _reindex
{
  "source": {
    "index": "bank",    ## the original index whose mapping you want to change
    "type": "account"   ## types are deprecated; 8.x no longer has them
  },
  "dest": {
    "index": "newbank"  ## the newly created index
  }
}
24. The ik Analyzer: Basics
# The default analyzer is intended for English text:
POST _analyze
{
"analyzer": "standard",
"text": "我是中国人"
}
# For Chinese, download the ik analyzer: find the release matching your ES version on GitHub, unzip it into the ik directory under the Elasticsearch plugins directory, then restart Elasticsearch. Download: https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v7.4.2
#### Smart segmentation (coarse-grained)
POST _analyze
{
"analyzer": "ik_smart",
"text": "我是中国人"
}
#### Maximum-word segmentation (fine-grained)
POST _analyze
{
"analyzer": "ik_max_word",
"text": "我是中国人"
}
25. The ik Analyzer: Custom Dictionaries
Install nginx (it will serve the custom dictionary file):
mkdir /mydata/nginx
# Start a temporary nginx just to copy out its default config
docker run -p 80:80 --name nginx -d nginx:1.10
# Copy the config into the current directory
docker container cp nginx:/etc/nginx .
# Stop the temporary nginx
docker stop nginx
# Remove the container
docker rm nginx
# Rename the copied nginx directory under /mydata to conf
mv nginx conf
# Recreate the mydata/nginx folder and move conf into it
mv conf nginx/
# Run nginx again, this time with the config mounted
docker run -p 80:80 --name nginx \
-v /mydata/nginx/html:/usr/share/nginx/html \
-v /mydata/nginx/logs:/var/log/nginx \
-v /mydata/nginx/conf:/etc/nginx \
-d nginx:1.10
Under the nginx/html directory, create an es folder containing a fenci.txt file and write the phrases you want segmented as words into it.
Edit the IKAnalyzer.cfg.xml configuration file under /mydata/elasticsearch/plugins/ik/config
and point the custom (remote) dictionary setting at that file's URL.
Restart Elasticsearch;
analysis will now segment text according to your custom phrases.
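A sketch of the resulting IKAnalyzer.cfg.xml (the entry keys are the ik plugin's standard ones; the URL assumes the nginx setup above on 192.168.56.10):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionaries -->
    <entry key="ext_dict"></entry>
    <!-- local extension stop-word dictionaries -->
    <entry key="ext_stopwords"></entry>
    <!-- remote extension dictionary: the fenci.txt served by nginx -->
    <entry key="remote_ext_dict">http://192.168.56.10/es/fenci.txt</entry>
    <!-- remote extension stop-word dictionary (unused here) -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
```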
26. Operating Elasticsearch from Java
Port 9300 (the TCP transport protocol) is not recommended; it will be removed after 8.x.
Use the HTTP port 9200 instead.
Ways to call Elasticsearch:
1) JestClient: unofficial, updated slowly
2) RestTemplate: raw HTTP requests; most ES operations must be wrapped by hand
3) HttpClient: same drawback
4) Elasticsearch-Rest-Client: the official RestClient; wraps ES operations in a well-layered, easy-to-use API
Reference: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.4/index.html
a. Add the dependency (note that Spring Boot manages its own Elasticsearch version; override it by adding
<elasticsearch.version>7.4.2</elasticsearch.version>
under the <properties> tag)
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.4.2</version>
</dependency>
b. Write the configuration
Register a RestHighLevelClient bean in the Spring container:
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class GulimailElasticSearchConfig {
    @Bean
    public RestHighLevelClient esRestClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("192.168.56.10", 9200, "http")));
        return client;
    }
}
c. Smoke test: inject the client and make sure the context loads
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
@RunWith(SpringRunner.class)
@SpringBootTest
public class GulimailSearchApplicationTests {
@Autowired
private RestHighLevelClient client;
@Test
public void contextLoads() {
System.out.println(client);
}
}
d. Calling Elasticsearch
Put shared request options (e.g. auth headers) in the configuration class:
import org.apache.http.HttpHost;
import org.elasticsearch.client.HttpAsyncResponseConsumerFactory;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class GulimailElasticSearchConfig {
public static final RequestOptions COMMON_OPTIONS;
static {
RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
// builder.addHeader("Authorization", "Bearer " + TOKEN);
// builder.setHttpAsyncResponseConsumerFactory(
// new HttpAsyncResponseConsumerFactory
// .HeapBufferedResponseConsumerFactory(30 * 1024 * 1024 * 1024));
COMMON_OPTIONS = builder.build();
}
@Bean
public RestHighLevelClient esRestClient(){
RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(
new HttpHost("192.168.56.10", 9200, "http")));
return client;
}
}
e. Test: index (save) a document
import com.alibaba.fastjson.JSON;
import com.zzw.gulimail.search.config.GulimailElasticSearchConfig;
import lombok.Data;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import java.io.IOException;
@RunWith(SpringRunner.class)
@SpringBootTest
public class GulimailSearchApplicationTests {
@Autowired
private RestHighLevelClient client;
@Test
public void contextLoads() {
System.out.println(client);
}
@Test
/* save and update in one call */
public void indexData() throws IOException {
    IndexRequest request = new IndexRequest("users");
    User user = new User();
    user.setUsername("张山");
    user.setAge(11);
    user.setGender("男");
    // serialize to JSON and send; an existing document with the same id would be updated
    request.source(JSON.toJSONString(user), XContentType.JSON);
    IndexResponse index = client.index(request, GulimailElasticSearchConfig.COMMON_OPTIONS);
    System.out.println(index);
}
@Data
class User{
private String username;
private String gender;
private Integer age;
}
}
f. Complex search
@Test
/* complex search: match query plus aggregations */
public void searchData() throws IOException {
    // create the search request
    SearchRequest sr = new SearchRequest();
    // specify the index to search
    sr.indices("bank");
    // build the DSL search conditions
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
    // terms aggregation: distribution by age
    TermsAggregationBuilder ageAgg = AggregationBuilders.terms("ageAgg").field("age").size(10);
    sourceBuilder.aggregation(ageAgg);
    // average balance
    AvgAggregationBuilder balanceAvg = AggregationBuilders.avg("balanceAvg").field("balance");
    sourceBuilder.aggregation(balanceAvg);
    //sourceBuilder.from();
    //sourceBuilder.size();
    sr.source(sourceBuilder);
    System.out.println(sourceBuilder);
    // execute the search
    SearchResponse searchResponse = client.search(sr, GulimailElasticSearchConfig.COMMON_OPTIONS);
    // inspect the raw response
    System.out.println(searchResponse.toString());
    // the matching documents
    SearchHits hits = searchResponse.getHits();
    for (SearchHit hit : hits) {
        String sourceAsString = hit.getSourceAsString();
        // Account o = JSON.parseObject(sourceAsString, Account.class);
        // System.out.println("o=" + o);
    }
    // the aggregation results
    Aggregations aggregations = searchResponse.getAggregations();
    Terms ageAgg1 = aggregations.get("ageAgg");
    for (Terms.Bucket bucket : ageAgg1.getBuckets()) {
        String keyAsString = bucket.getKeyAsString();
        System.out.println("age: " + keyAsString + " count: " + bucket.getDocCount());
    }
    Avg balanceAvg1 = aggregations.get("balanceAvg");
    System.out.println("average balance: " + balanceAvg1.getValue());
}