Official site: https://www.elastic.co/cn/elasticsearch/
Chinese docs: https://www.elastic.co/guide/cn/elasticsearch/guide/current/getting-started.html
I. Basic Concepts
1. Index
As a verb, "to index" means to insert, like MySQL's INSERT.
As a noun, an index is analogous to a MySQL database.
2. Type
Analogous to a table in a MySQL database; documents of the same type are stored together.
3. Document
Analogous to a single row in MySQL; a document is JSON, and its properties are analogous to MySQL columns.
Elasticsearch builds an inverted index and uses relevance scoring for fast queries.
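The inverted index above maps each term to the set of documents that contain it. A toy sketch in Java (an illustration of the idea only, not how Lucene actually stores it):

```java
import java.util.*;

// A minimal inverted index: term -> set of document ids that contain it.
public class InvertedIndexDemo {
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Tokenize on whitespace and record which document each term appears in.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look up the documents containing a term; empty set if none.
    public Set<Integer> search(String term) {
        return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexDemo idx = new InvertedIndexDemo();
        idx.add(1, "red apple");
        idx.add(2, "red car");
        System.out.println(idx.search("red"));   // [1, 2]
        System.out.println(idx.search("apple")); // [1]
    }
}
```

Searching a term is then a single map lookup instead of a scan over every document, which is why term queries are fast.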
4. Install Elasticsearch with Docker
sudo docker pull elasticsearch:7.4.2
Pull Kibana, the visual query tool for Elasticsearch:
sudo docker pull kibana:7.4.2
Check memory usage: free -m
Configure Elasticsearch for Docker:
sudo mkdir -p /mydata/elasticsearch/config
sudo mkdir -p /mydata/elasticsearch/data
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml
docker run -p 9200:9200 -p 9300:9300 --name elasticsearch \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2
Fix the folder permissions:
chmod -R 777 /mydata/elasticsearch/
docker ps -a lists all containers, including stopped ones.
If startup fails because the elasticsearch user's memory-map limit is too low (at least 262144 is required):
Fix:
switch to the root user and run:
sysctl -w vm.max_map_count=262144
Check the result:
sysctl -a | grep vm.max_map_count
which should print:
vm.max_map_count = 262144
This change is lost when the virtual machine reboots, so to make it permanent,
add the following line at the end of /etc/sysctl.conf:
vm.max_map_count=262144
5. Install Kibana
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.56.10:9200 -p 5601:5601 \
-d kibana:7.4.2
Set the container to restart automatically:
sudo docker update 0a824ddf52d0 --restart=always
Once running, Kibana is available at http://192.168.56.10:5601
6. First Queries
GET /_cat/nodes     list all nodes
GET /_cat/health    check cluster health
GET /_cat/indices   list all indices (like SHOW DATABASES;)
7. Indexing a Document
To save a document, specify which index and type to save it under, and a unique id:
PUT customer/external/1 saves the document { "name": "JD" } with id 1 under the external type of the customer index.
POST: without an id, a new document is created with an auto-generated id; with an existing id, the document is updated.
PUT: an id is mandatory (omitting it is an error); a new id creates the document, an existing id updates it.
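To see the difference in Kibana Dev Tools, watch the `result` field of the response (a sketch; the "created" vs. "updated" values assume the index starts empty):

```
POST customer/external          # no id: always creates, id is auto-generated
{ "name": "JD" }                # response: "result": "created"

PUT customer/external/1         # PUT requires an id
{ "name": "JD" }                # first call: "result": "created"

PUT customer/external/1
{ "name": "JD2" }               # same id again: "result": "updated"
```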
8. Retrieving a Document
GET customer/external/39CzmHYBZcKtI7qwXzJa
{
  "_index": "customer",          // which index
  "_type": "external",           // which type
  "_id": "39CzmHYBZcKtI7qwXzJa", // document id
  "_version": 3,                 // version number
  "_seq_no": 2,                  // concurrency-control field, incremented on every update; used for optimistic locking
  "_primary_term": 1,            // same purpose; changes when the primary shard is reassigned, e.g. after a restart
  "found": true,
  "_source": {                   // the actual document content
    "name": "JD",
    "ren": "乔丹"
  }
}
9. Optimistic Locking
Append the following parameters to the request:
?if_seq_no=0&if_primary_term=1
For example: http://192.168.56.10:9200/customer/external/39CzmHYBZcKtI7qwXzJa?if_seq_no=3&if_primary_term=1
If _seq_no is not the latest value, the request fails with a 409:
{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[39CzmHYBZcKtI7qwXzJa]: version conflict, required seqNo [4], primary term [1]. current document has seqNo [5] and primary term [1]",
"index_uuid": "QyKTy5LkRLm_2-zhNM7RSg",
"shard": "0",
"index": "customer"
}
],
"type": "version_conflict_engine_exception",
"reason": "[39CzmHYBZcKtI7qwXzJa]: version conflict, required seqNo [4], primary term [1]. current document has seqNo [5] and primary term [1]",
"index_uuid": "QyKTy5LkRLm_2-zhNM7RSg",
"shard": "0",
"index": "customer"
},
"status": 409
}
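Conceptually, `if_seq_no`/`if_primary_term` is a compare-and-set: the write succeeds only if nobody updated the document since you read it. A plain-Java sketch of the pattern (hypothetical `VersionedDoc` class for illustration, not the Elasticsearch API):

```java
// Sketch of the idea behind if_seq_no: an update is applied only when the
// caller's expected sequence number matches the current one (compare-and-set).
public class VersionedDoc {
    private String source;
    private long seqNo = 0;

    public VersionedDoc(String source) { this.source = source; }

    public long getSeqNo() { return seqNo; }
    public String getSource() { return source; }

    // Returns true and bumps seqNo on success; false plays the role of the
    // 409 conflict: someone else updated the document since we read it.
    public synchronized boolean update(long ifSeqNo, String newSource) {
        if (ifSeqNo != seqNo) {
            return false; // stale seq_no -> reject; caller must re-read and retry
        }
        source = newSource;
        seqNo++;
        return true;
    }

    public static void main(String[] args) {
        VersionedDoc doc = new VersionedDoc("{\"name\":\"JD\"}");
        long read = doc.getSeqNo();                        // both writers read seq_no = 0
        doc.update(read, "{\"name\":\"A\"}");              // first writer wins, seq_no -> 1
        boolean ok = doc.update(read, "{\"name\":\"B\"}"); // second writer conflicts
        System.out.println(ok); // false
    }
}
```

On a conflict, the correct reaction is the same as with Elasticsearch: fetch the document again to get the fresh seq_no, then retry the write.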
10. Updating a Document
POST customer/external/1/_update
{
"doc":{
"name":"JohnDoew"
}
}
With _update, if the content is unchanged, the version and sequence number do not change.
Alternatively:
POST customer/external/1
{
"name":"JohnDoew2"
}
With this form, the version and sequence number change even when the content is unchanged.
Alternatively:
PUT customer/external/1
{
"name":"JohnDoew2"
}
Likewise, the version and sequence number change even when the content is unchanged.
11. Deleting Documents
DELETE customer/external/1
DELETE customer
You can delete a document or an index, but not a type.
12. The _bulk Batch API
POST customer/external/_bulk
{"index":{"_id":"1"}}   // index: save the document with id 1
{"name":"JD"}
{"index":{"_id":"2"}}   // save the document with id 2
{"name":"JD"}
POST /_bulk
{"delete":{"_index":"website","_type": "blog","_id":"123"}}
{"create":{"_index":"website","_type": "blog","_id":"123"}}
{"title": "My First blog"}
{"index": {"_index":"website","_type": "blog"}}
{"title": "My Second blog"}
{"update": {"_index":"website","_type": "blog","_id":"123"}}
{"doc": {"title": "My update blog"}}
13. match_all (query everything), sort (ordering), and _source (return only the listed fields)
GET bank/_search
{
"from": 0,
"size": 10,
"query": {
"match_all": {}
},
"sort": [
{
"account_number": "asc"
},
{
"balance": "desc"
}
],
"_source": ["balance","firstname"]
}
14. match (analyzed full-text match), match_phrase (the document must contain the whole phrase), multi_match (match against multiple fields)
GET bank/_search
{
"query": {
"match": {
"account_number": 20
}
}
}
GET bank/_search
{
"query": {
"match": {
"address": "282 Kings Place"
}
}
}
GET bank/_search
{
"query": {
"match_phrase" : {
"address": "Kings Hwy"
}
}
}
GET bank/_search
{
"query": {
"multi_match":{
"query": "Roberts",
"fields": [ "firstname", "lastname" ]
}
}
}
15. Compound queries with bool: must (every clause must match), must_not (must not match), should (optional; matching raises the score), filter (restricts the results without affecting the score)
GET bank/_search
{
"query": {
"bool":{
"must": [
{
"match": {
"address": "mill"
}
},
{"match":{
"gender": "M"
}}
],
"must_not": [
{"match": {
"age": "18"
}}
],
"should": [
{"match": {
"lastname": "Hines"
}}
],
"filter": {
"range": {
"age": {
"gte": 10,
"lte": 28
}
}
}
}
}
}
16. term queries: only suited to exact, non-analyzed values such as prices, numbers, ids, etc.
GET bank/_search
{
"query":{
"term": {
"age": {
"value": 28
}
}
}
}
17. field.keyword for exact matches: returns only documents whose value matches the whole string exactly
GET bank/_search
{
"query":{
"match": {
"address.keyword":"694 Jefferson"
}
}
}
18. Aggregations ## the age distribution and average age of everyone whose address contains "mill" (aggs: aggregations; terms: bucket by distinct value; avg, sum, max, min, etc.; "size": 0 suppresses the hits and returns only the aggregations)
GET bank/_search
{
"query": {
"match": {
"address": "mill"
}
},
"aggs": {
"ageAgg": {
"terms": {
"field": "age",
"size": 10
}
},
"ageAvg":{
"avg": {
"field": "age"
}
},
"ageSum":{
"sum": {
"field": "age"
}
},
"ageMax":{
"max": {
"field": "age"
}
},
"ageMin":{
"min": {
"field": "age"
}
}
},
"size":0
}
19. Nested aggregations ## bucket by age, then compute the average balance within each age bucket
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageagg": {
"terms": {
"field": "age",
"size": 10
},
"aggs": {
"ageAvg": {
"avg": {
"field": "balance"
}
}
}
}
}
}
20. Nested aggregations ## the full age distribution, and for each age bucket the average balance per gender plus the overall average balance
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageagg": {
"terms": {
"field": "age",
"size": 10
},
"aggs": {
"genderAgg": {
"terms": {
"field": "gender.keyword",
"size": 10
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
},
"ageBalanceAvg":{
"avg": {
"field": "balance"
}
}
}
}
}
}
21. Mappings # view a mapping
GET bank/_mapping
22. Mappings ### create an index with a mapping
PUT /my_index
{
"mappings": {
"properties": {
"age":{
"type":"integer"
},
"email":{
"type":"keyword"
}
}
}
}
## Add a new field to a mapping (existing fields cannot be modified); "index": false means the field is not indexed and so cannot be searched
PUT /my_index/_mapping
{
"properties": {
"employ_id":{
"type":"keyword",
"index":false
}
}
}
23. ## To change an existing mapping, the only option is to migrate the data
Create a new index shaped the way you want, then check it:
PUT /newbank
{
"mappings" : {
"properties" : {
"account_number" : {
"type" : "long"
},
"address" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"age" : {
"type" : "integer"
},
"balance" : {
"type" : "long"
},
"city" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"email" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"employer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"firstname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"gender" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"lastname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"state" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
GET /newbank/_mapping
### Data migration
POST _reindex
{
  "source": {
    "index": "bank",    ## the original index whose mapping you want to change
    "type": "account"   ## types are deprecated; 8.x no longer has them
  },
  "dest": {
    "index": "newbank"  ## the newly created index
  }
}
24. The ik Analyzer: Basics
# The default analyzer is intended for English text:
POST _analyze
{
"analyzer": "standard",
"text": "我是中国人"
}
# For Chinese, download the ik analyzer: find the release matching your ES version on GitHub, unzip it into the ik directory under the Elasticsearch plugins directory, then restart Elasticsearch. Download: https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v7.4.2
#### Smart segmentation (coarse-grained)
POST _analyze
{
"analyzer": "ik_smart",
"text": "我是中国人"
}
#### Maximum-word segmentation (fine-grained)
POST _analyze
{
"analyzer": "ik_max_word",
"text": "我是中国人"
}
25. The ik Analyzer: Custom Dictionaries
Install nginx (it will serve the custom dictionary file):
mkdir /mydata/nginx
# Start a temporary nginx just to copy out its default config
docker run -p 80:80 --name nginx -d nginx:1.10
# Copy the config into the current directory
docker container cp nginx:/etc/nginx .
# Stop the temporary nginx
docker stop nginx
# Remove the container
docker rm nginx
# Rename the copied nginx directory under /mydata to conf
mv nginx conf
# Recreate the mydata/nginx folder and move conf into it
mv conf nginx/
# Run nginx again, this time with the config mounted
docker run -p 80:80 --name nginx \
-v /mydata/nginx/html:/usr/share/nginx/html \
-v /mydata/nginx/logs:/var/log/nginx \
-v /mydata/nginx/conf:/etc/nginx \
-d nginx:1.10
Under the nginx/html directory, create an es folder containing a fenci.txt file and write the phrases you want segmented as words into it.
Edit the IKAnalyzer.cfg.xml configuration file under /mydata/elasticsearch/plugins/ik/config
and point the custom (remote) dictionary setting at that file's URL.
Restart Elasticsearch;
analysis will now segment text according to your custom phrases.
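A sketch of the resulting IKAnalyzer.cfg.xml (the entry keys are the ik plugin's standard ones; the URL assumes the nginx setup above on 192.168.56.10):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionaries -->
    <entry key="ext_dict"></entry>
    <!-- local extension stop-word dictionaries -->
    <entry key="ext_stopwords"></entry>
    <!-- remote extension dictionary: the fenci.txt served by nginx -->
    <entry key="remote_ext_dict">http://192.168.56.10/es/fenci.txt</entry>
    <!-- remote extension stop-word dictionary (unused here) -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
```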
26. Operating Elasticsearch from Java
Port 9300 (the TCP transport protocol) is not recommended; it will be removed after 8.x.
Use the HTTP port 9200 instead.
Ways to call Elasticsearch:
1) JestClient: unofficial, updated slowly
2) RestTemplate: raw HTTP requests; most ES operations must be wrapped by hand
3) HttpClient: same drawback
4) Elasticsearch-Rest-Client: the official RestClient; wraps ES operations in a well-layered, easy-to-use API
Reference: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.4/index.html
a. Add the dependency (note that Spring Boot manages its own Elasticsearch version; override it by adding
<elasticsearch.version>7.4.2</elasticsearch.version>
under the <properties> tag)
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.4.2</version>
</dependency>
b. Write the configuration
Register a RestHighLevelClient bean in the Spring container:
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class GulimailElasticSearchConfig {
    @Bean
    public RestHighLevelClient esRestClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("192.168.56.10", 9200, "http")));
        return client;
    }
}
c. Smoke test: inject the client and make sure the context loads
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
@RunWith(SpringRunner.class)
@SpringBootTest
public class GulimailSearchApplicationTests {
@Autowired
private RestHighLevelClient client;
@Test
public void contextLoads() {
System.out.println(client);
}
}
d. Calling Elasticsearch
Put shared request options (e.g. auth headers) in the configuration class:
import org.apache.http.HttpHost;
import org.elasticsearch.client.HttpAsyncResponseConsumerFactory;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class GulimailElasticSearchConfig {
public static final RequestOptions COMMON_OPTIONS;
static {
RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
// builder.addHeader("Authorization", "Bearer " + TOKEN);
// builder.setHttpAsyncResponseConsumerFactory(
// new HttpAsyncResponseConsumerFactory
// .HeapBufferedResponseConsumerFactory(30 * 1024 * 1024 * 1024));
COMMON_OPTIONS = builder.build();
}
@Bean
public RestHighLevelClient esRestClient(){
RestHighLevelClient client = new RestHighLevelClient(
RestClient.builder(
new HttpHost("192.168.56.10", 9200, "http")));
return client;
}
}
e. Test: index (save) a document
import com.alibaba.fastjson.JSON;
import com.zzw.gulimail.search.config.GulimailElasticSearchConfig;
import lombok.Data;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import java.io.IOException;
@RunWith(SpringRunner.class)
@SpringBootTest
public class GulimailSearchApplicationTests {
@Autowired
private RestHighLevelClient client;
@Test
public void contextLoads() {
System.out.println(client);
}
@Test
/* save and update in one call */
public void indexData() throws IOException {
    IndexRequest request = new IndexRequest("users");
    User user = new User();
    user.setUsername("张山");
    user.setAge(11);
    user.setGender("男");
    // serialize to JSON and send; an existing document with the same id would be updated
    request.source(JSON.toJSONString(user), XContentType.JSON);
    IndexResponse index = client.index(request, GulimailElasticSearchConfig.COMMON_OPTIONS);
    System.out.println(index);
}
@Data
class User{
private String username;
private String gender;
private Integer age;
}
}
f. Complex search
@Test
/* complex search: match query plus aggregations */
public void searchData() throws IOException {
    // create the search request
    SearchRequest sr = new SearchRequest();
    // specify the index to search
    sr.indices("bank");
    // build the DSL search conditions
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
    // terms aggregation: distribution by age
    TermsAggregationBuilder ageAgg = AggregationBuilders.terms("ageAgg").field("age").size(10);
    sourceBuilder.aggregation(ageAgg);
    // average balance
    AvgAggregationBuilder balanceAvg = AggregationBuilders.avg("balanceAvg").field("balance");
    sourceBuilder.aggregation(balanceAvg);
    //sourceBuilder.from();
    //sourceBuilder.size();
    sr.source(sourceBuilder);
    System.out.println(sourceBuilder);
    // execute the search
    SearchResponse searchResponse = client.search(sr, GulimailElasticSearchConfig.COMMON_OPTIONS);
    // inspect the raw response
    System.out.println(searchResponse.toString());
    // the matching documents
    SearchHits hits = searchResponse.getHits();
    for (SearchHit hit : hits) {
        String sourceAsString = hit.getSourceAsString();
        // Account o = JSON.parseObject(sourceAsString, Account.class);
        // System.out.println("o=" + o);
    }
    // the aggregation results
    Aggregations aggregations = searchResponse.getAggregations();
    Terms ageAgg1 = aggregations.get("ageAgg");
    for (Terms.Bucket bucket : ageAgg1.getBuckets()) {
        String keyAsString = bucket.getKeyAsString();
        System.out.println("age: " + keyAsString + " count: " + bucket.getDocCount());
    }
    Avg balanceAvg1 = aggregations.get("balanceAvg");
    System.out.println("average balance: " + balanceAvg1.getValue());
}