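Elasticsearch: Full-Text Search
Elastic is built on the open-source library Lucene. Lucene cannot be used on its own, though: you have to write your own code to call its interfaces. Elastic wraps Lucene and exposes a REST API that works out of the box. Because the API is REST over HTTP, it is naturally cross-platform.
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html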
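1. Index
As a verb: to index a document, comparable to an insert in MySQL.
As a noun: comparable to a Database in MySQL.
2. Type
Within an Index you can define one or more types.
A Type is comparable to a Table in MySQL; documents of the same type are stored together. (Mapping types are deprecated in Elasticsearch 7.x and removed in 8.0; the typed URLs in these notes target 7.4.2.)
3. Document
A Document is a single piece of data stored under a given Index and Type. Documents are in JSON format; a Document is comparable to a row in a MySQL Table.
4. Inverted index mechanism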
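The notes name the inverted index without describing it. As a rough illustration only: an inverted index maps every term to the ids of the documents that contain it, so looking up a term does not require scanning every document. The sketch below is a toy in plain Java, not how Lucene stores its postings; the class name InvertedIndexSketch, the lowercase-and-split analyzer, and the sample addresses are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index: maps each term to the sorted set of document ids containing it.
class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new LinkedHashMap<>();

    // Naive analyzer (an assumption): lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Indexing: record the document id under every term of the text.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search: a single map lookup returns the ids of all documents containing the term.
    List<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "990 Mill Road");
        index.add(2, "198 Mill Lane");
        index.add(3, "789 Madison Street");
        System.out.println(index.search("mill"));    // [1, 2]
        System.out.println(index.search("madison")); // [3]
    }
}
```

A real analyzer also handles stemming, stop words and term positions; the positions are what make phrase queries possible.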
Installing Elasticsearch with Docker
1. Pull the images
docker pull elasticsearch:7.4.2    # stores and searches data
docker pull kibana:7.4.2           # visualizes the data
2. Create the instance
mkdir -p /mydata/elasticsearch/config
mkdir -p /mydata/elasticsearch/data
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml
chmod -R 777 /mydata/elasticsearch/    # make sure the container can read and write these directories
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2
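Important:
-e ES_JAVA_OPTS="-Xms64m -Xmx512m" sets the initial and maximum JVM heap size for ES in a test environment. Without this limit, ES may try to claim more memory than the VM has and fail to start.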
3. Kibana
docker run --name kibana -e ELASTICSEARCH_HOSTS=http://192.168.***.***:9200 -p 5601:5601 \
-d kibana:7.4.2
Be sure to replace http://192.168.***.***:9200 with the address of your own virtual machine.
Basic queries
1. _cat
GET /_cat/nodes: list all nodes
GET /_cat/health: show the health of the ES cluster
GET /_cat/master: show the master node
GET /_cat/indices: list all indices (comparable to show databases; in MySQL)
Indexing a document (save)
PUT
Request:
http://192.168.217.128:9200/person/woman/1
JSON body:
{
  "name": "迪丽热巴"
}
Response:
{
  "_index": "person",
  "_type": "woman",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
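POST
Request:
http://192.168.217.128:9200/person/woman/{id}
{id} is optional.
POST creates a document. If no id is given, ES generates one automatically. If an id is given and the document already exists, the document is overwritten and its version number is incremented.
PUT can create or modify a document; sending the same request repeatedly increments the version number each time. PUT requires an id and returns an error without one, so it is generally used for modifications.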
Retrieving a document
GET
http://192.168.217.128:9200/person/woman/1
Response:
{
  "_index": "person",
  "_type": "woman",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "name": "迪丽热巴"
  }
}
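Modifying a document
To guard against concurrent writes, append ?if_seq_no=...&if_primary_term=... to the request, using the _seq_no and _primary_term values returned by the most recent read or write of the document. If the stored values no longer match, ES rejects the request with a 409 version conflict instead of overwriting the document.
PUT
http://192.168.217.128:9200/person/woman/1?if_seq_no=1&if_primary_term=1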
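The if_seq_no / if_primary_term check is a compare-then-write rule. A minimal sketch of that rule, assuming a single in-memory store: the class SeqNoStoreSketch, its single sequence counter, and the fixed primary term are hypothetical simplifications and do not reflect how ES tracks these values per shard.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of conditional writes; it only mimics the compare-then-write rule.
class SeqNoStoreSketch {
    // A stored document together with the metadata used for the check.
    static final class Versioned {
        final String source;
        final long seqNo;
        final long primaryTerm;

        Versioned(String source, long seqNo, long primaryTerm) {
            this.source = source;
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
        }
    }

    private final Map<String, Versioned> docs = new HashMap<>();
    private long nextSeqNo = 0;          // assumption: one counter for the whole store
    private final long primaryTerm = 1;  // assumption: the primary never changes

    // Unconditional write, like a plain PUT: always succeeds and takes the next seq_no.
    synchronized Versioned put(String id, String source) {
        Versioned v = new Versioned(source, nextSeqNo++, primaryTerm);
        docs.put(id, v);
        return v;
    }

    // Conditional write: applied only if the caller's values match the stored ones.
    synchronized boolean putIfMatch(String id, String source, long ifSeqNo, long ifPrimaryTerm) {
        Versioned current = docs.get(id);
        if (current == null || current.seqNo != ifSeqNo || current.primaryTerm != ifPrimaryTerm) {
            return false; // this is where ES would answer with a version conflict
        }
        put(id, source);
        return true;
    }

    synchronized Versioned get(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        SeqNoStoreSketch store = new SeqNoStoreSketch();
        store.put("1", "{\"name\":\"first\"}");                                   // stored with seq_no 0
        System.out.println(store.putIfMatch("1", "{\"name\":\"second\"}", 0, 1)); // true
        System.out.println(store.putIfMatch("1", "{\"name\":\"third\"}", 0, 1));  // false, seq_no is now 1
        System.out.println(store.get("1").source);                                // {"name":"second"}
    }
}
```

The second conditional write fails because the first one already advanced the sequence number, which is the situation two concurrent clients would run into.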
Updating a document
POST
http://192.168.217.128:9200/person/woman/1/_update
Specify the id and append /_update.
Request body:
{
  "doc": {
    "name": "古力娜扎"
  }
}
Response:
{
  "_index": "person",
  "_type": "woman",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 1,
  "_primary_term": 1
}
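With /_update in the path, POST first compares the request with the stored document. If the same request is sent repeatedly, nothing changes: _version, _seq_no and the other metadata stay the same, and the response contains "result": "noop".
Without /_update in the path, POST rewrites the document on every request, even an identical one, exactly like PUT.
Either way, new attributes can be added directly in the request.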
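The noop behaviour of /_update can be pictured as a merge that reports whether anything changed. This is a sketch under stated assumptions, not the ES implementation: PartialUpdateSketch and update are hypothetical names, the merge is shallow, and the sample values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

class PartialUpdateSketch {
    // Merges the fields of `doc` into `source` (shallow merge, an assumption of this toy).
    // Returns "noop" when no field value changes, otherwise "updated".
    static String update(Map<String, Object> source, Map<String, Object> doc) {
        boolean changed = false;
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            boolean missing = !source.containsKey(e.getKey());
            if (missing || !Objects.equals(source.get(e.getKey()), e.getValue())) {
                source.put(e.getKey(), e.getValue());
                changed = true;
            }
        }
        return changed ? "updated" : "noop";
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("name", "Alice");

        Map<String, Object> rename = new LinkedHashMap<>();
        rename.put("name", "Bob");
        System.out.println(update(source, rename)); // updated
        System.out.println(update(source, rename)); // noop

        Map<String, Object> addAge = new LinkedHashMap<>();
        addAge.put("age", 18);
        System.out.println(update(source, addAge)); // updated (a new attribute is added)
        System.out.println(source);                 // {name=Bob, age=18}
    }
}
```

In this picture, a noop result is the case where the version number and sequence number would be left untouched.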
Deleting a document
DELETE
http://192.168.217.128:9200/person/woman/1
Deleting an index
DELETE
http://192.168.217.128:9200/person
Bulk-importing documents with Kibana
POST /person/woman/_bulk
{"index":{"_id":"1"}}
{"name": "迪丽热巴" }
{"index":{"_id":"2"}}
{"name": "古力娜扎" }
Bulk API
POST /_bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }
{ "index": { "_index": "website", "_type": "blog" }}
{ "title": "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123"} }
{ "doc" : {"title" : "My updated blog post"} }
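The _bulk body is newline-delimited JSON: each action is one line, index/create/update actions are followed by one more line carrying the document (or the partial doc), delete has no second line, and every line, including the last, must end with a newline. A minimal sketch of assembling such a body in plain Java; BulkBodySketch is a hypothetical helper, and the JSON lines are passed in already serialized on a single line.

```java
import java.util.ArrayList;
import java.util.List;

class BulkBodySketch {
    private final List<String> lines = new ArrayList<>();

    // index, create and update: an action line followed by a source (or doc) line.
    BulkBodySketch add(String actionJson, String bodyJson) {
        lines.add(actionJson);
        lines.add(bodyJson);
        return this;
    }

    // delete: an action line only.
    BulkBodySketch delete(String actionJson) {
        lines.add(actionJson);
        return this;
    }

    // Joins the lines as NDJSON; every line, including the last, ends with '\n'.
    String build() {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = new BulkBodySketch()
                .delete("{\"delete\":{\"_index\":\"website\",\"_id\":\"123\"}}")
                .add("{\"index\":{\"_index\":\"website\"}}", "{\"title\":\"My second blog post\"}")
                .build();
        System.out.print(body);
    }
}
```

When sending such a body over HTTP, the Content-Type is application/x-ndjson; pretty-printed JSON spread over several lines would break the format.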
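Advanced testing
Importing data
POST bank/account/_bulk
Use the contents of the following file as the request body:
https://github.com/elastic/elasticsearch/blob/v7.4.2/docs/src/test/resources/accounts.json
========================
Full-text search:
Search API
GET request, sorted by account_number in ascending order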
GET bank/_search?q=*&sort=account_number:asc
==
Query DSL
GET request, sorted by account_number in descending order
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "account_number": {
        "order": "desc"
      }
    }
  ]
}
match (match query)
GET bank/_search
{
  "query": {
    "match": {
      "address": "River Street" // matches documents whose address contains River or Street
    }
  }
}
match_phrase (phrase match)
GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "River Street" // matches documents whose address contains the phrase River Street
    }
  }
}
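The difference between match and match_phrase comes down to what is done with the analyzed terms: match is satisfied by any one of them, match_phrase needs all of them next to each other and in order. A toy illustration in plain Java, assuming a whitespace-and-lowercase analyzer; MatchVsPhraseSketch and the sample addresses are made up, and relevance scoring is ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class MatchVsPhraseSketch {
    // Naive analyzer (an assumption): lowercase and split on whitespace.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // match: true if at least one query term occurs in the field.
    static boolean match(String fieldValue, String query) {
        List<String> tokens = analyze(fieldValue);
        for (String term : analyze(query)) {
            if (tokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // match_phrase: true only if all query terms occur consecutively and in order.
    static boolean matchPhrase(String fieldValue, String query) {
        List<String> phrase = analyze(query);
        return !phrase.isEmpty() && Collections.indexOfSubList(analyze(fieldValue), phrase) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(match("702 Quentin Street", "River Street"));       // true
        System.out.println(matchPhrase("702 Quentin Street", "River Street")); // false
        System.out.println(matchPhrase("263 River Street", "River Street"));   // true
    }
}
```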
multi_match (multi-field match)
GET bank/_search
{
  "query": {
    "multi_match": {
      "query": "Street Dante", // the query string is analyzed into terms
      "fields": ["address", "city"] // matches if address or city contains Street or Dante
    }
  }
}
bool (compound query)
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": 31 } },
        { "match": { "gender": "F" } }
      ],
      "must_not": [
        { "match": { "balance": 14097 } }
      ],
      "should": [
        { "match": { "firstname": "Josephine" } } // optional; a match raises the relevance score
      ]
    }
  }
}
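How the three clause types combine can be sketched with plain predicates: every must clause has to hold, no must_not clause may hold, and should clauses only add to the score. This is a deliberately simplified model: BoolQuerySketch and its helpers are hypothetical, the "score" merely counts satisfied should clauses (in ES the must clauses contribute to the score as well), and the sample accounts are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

class BoolQuerySketch {
    // A clause that holds when the document's field equals the given value.
    static Predicate<Map<String, Object>> fieldEquals(String field, Object value) {
        return doc -> Objects.equals(doc.get(field), value);
    }

    // Returns -1 when the document is rejected, otherwise the number of satisfied should clauses.
    static int evaluate(Map<String, Object> doc,
                        List<Predicate<Map<String, Object>>> must,
                        List<Predicate<Map<String, Object>>> mustNot,
                        List<Predicate<Map<String, Object>>> should) {
        for (Predicate<Map<String, Object>> clause : must) {
            if (!clause.test(doc)) {
                return -1;
            }
        }
        for (Predicate<Map<String, Object>> clause : mustNot) {
            if (clause.test(doc)) {
                return -1;
            }
        }
        int score = 0;
        for (Predicate<Map<String, Object>> clause : should) {
            if (clause.test(doc)) {
                score++;
            }
        }
        return score;
    }

    // The bool query from the notes, expressed with the helpers above.
    static int sampleQuery(Map<String, Object> doc) {
        return evaluate(doc,
                Arrays.asList(fieldEquals("age", 31), fieldEquals("gender", "F")),
                Collections.singletonList(fieldEquals("balance", 14097)),
                Collections.singletonList(fieldEquals("firstname", "Josephine")));
    }

    static Map<String, Object> account(int age, String gender, int balance, String firstname) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("age", age);
        doc.put("gender", gender);
        doc.put("balance", balance);
        doc.put("firstname", firstname);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(sampleQuery(account(31, "F", 1000, "Josephine")));  // 1
        System.out.println(sampleQuery(account(31, "F", 1000, "Anna")));       // 0
        System.out.println(sampleQuery(account(31, "F", 14097, "Josephine"))); // -1
        System.out.println(sampleQuery(account(40, "F", 1000, "Josephine")));  // -1
    }
}
```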
filter (result filtering)
A filter only narrows down the results; it does not contribute to the relevance score.
GET bank/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "gte": 30,
            "lte": 35
          }
        }
      }
    }
  }
}
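term
Use match for full-text (text) fields and term for exact matches on non-text fields.
GET bank/_search
{
  "query": {
    "term": { // use term for exact values
      "age": {
        "value": 28
      }
    }
  }
}
GET bank/_search
{
  "query": {
    "match_phrase": { // phrase match: '789 Madison' is analyzed, and its terms must appear together in order
      "address": "789 Madison"
    }
  }
}
GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "789 Madison" // exact match: the whole field value must equal '789 Madison' (not analyzed)
    }
  }
}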
aggregations (running aggregations)
Group by age and compute the average balance of the people in each age group.
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": {
        "field": "age", // group by age
        "size": 100,
        "order": {
          "_key": "asc"
        }
      },
      "aggs": { // sub-aggregation, computed within each bucket of the parent aggregation
        "balanceAvg": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}
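For readers more at home with Java collections, a terms aggregation with an avg sub-aggregation is roughly a group-by followed by an average per group. The sketch below mirrors that idea in memory only; AgeBalanceAggSketch, its Account class, and the three sample accounts are assumptions for illustration and have nothing to do with the ES client API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class AgeBalanceAggSketch {
    static final class Account {
        final int age;
        final double balance;

        Account(int age, double balance) {
            this.age = age;
            this.balance = balance;
        }
    }

    // Buckets accounts by age (keys ascending) and averages the balance inside each bucket.
    static Map<Integer, Double> avgBalanceByAge(List<Account> accounts) {
        return accounts.stream().collect(Collectors.groupingBy(
                a -> a.age,                                 // like terms on "age"
                TreeMap::new,                               // like ordering buckets by key, ascending
                Collectors.averagingDouble(a -> a.balance)  // like the avg sub-aggregation
        ));
    }

    public static void main(String[] args) {
        List<Account> accounts = Arrays.asList(
                new Account(30, 100),
                new Account(30, 300),
                new Account(25, 50));
        System.out.println(avgBalanceByAge(accounts)); // {25=50.0, 30=200.0}
    }
}
```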
List the age distribution and, for each age group, the average balance of M, the average balance of F, and the overall average balance of the group.
GET bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageTerms": {
      "terms": {
        "field": "age",
        "size": 1000,
        "order": {
          "_key": "asc"
        }
      },
      "aggs": {
        "genderTerms": {
          "terms": {
            "field": "gender.keyword",
            "size": 1000
          },
          "aggs": {
            "balanceAvg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        },
        "balanceAvgs": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  },
  "size": 0
}
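Mapping
GET bank/_mapping
Shows the mapping information.
Creating a mapping
PUT /my_index
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "email": { "type": "keyword" },
      "name": { "type": "text" }
    }
  }
}
Adding a new field to an existing mapping
PUT /my_index/_mapping
{
  "properties": {
    "id": { "type": "keyword", "index": false } // index defaults to true (the field is searchable); false means the field is not indexed and cannot be searched
  }
}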
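Updating a mapping
An existing field mapping cannot be changed. To change it, create a new index with the desired mapping and migrate the data.
Data migration
First create the new index with the correct mapping, then migrate the data with _reindex (the endpoint name is fixed):
POST _reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}
To migrate only the data under one type of the old index:
POST _reindex
{
  "source": { "index": "old_index", "type": "old_type" },
  "dest": { "index": "new_index" }
}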
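Tokenization
https://github.com/medcl/elasticsearch-analysis-ik/releases
Download the IK analyzer release that matches your ES version and place it in the plugins directory (mounted earlier at /mydata/elasticsearch/plugins).
nginx
mkdir nginx
docker run -p 80:80 --name nginx -d nginx:1.10
Copy the configuration files from the container into the current directory:
docker container cp nginx:/etc/nginx .
Stop and remove this temporary container so that the name nginx can be reused, and make sure the copied configuration ends up in /mydata/nginx/conf, the path mounted in the next command:
docker stop nginx
docker rm nginx
Create the nginx container:
docker run -p 80:80 --name nginx \
-v /mydata/nginx/html:/usr/share/nginx/html \
-v /mydata/nginx/logs:/var/log/nginx \
-v /mydata/nginx/conf:/etc/nginx \
-d nginx:1.10
Configure the custom tokenization rules under this directory.
(Restart ES after configuring.)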
Integrating ES with Spring Boot
In pom.xml, pin the ES version in <properties> and add the high-level REST client dependency:
<elasticsearch.version>7.4.2</elasticsearch.version>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.4.2</version>
</dependency>
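Configuration class
import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class ElasticConfig {
    // Request options shared by all calls made through the client
    public static final RequestOptions COMMON_OPTIONS;
    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        // builder.addHeader("Authorization", "Bearer " + TOKEN);
        // builder.setHttpAsyncResponseConsumerFactory(
        //         new HttpAsyncResponseConsumerFactory
        //                 .HeapBufferedResponseConsumerFactory(30 * 1024 * 1024 * 1024));
        COMMON_OPTIONS = builder.build();
    }
    @Bean
    public RestHighLevelClient client() {
        RestClientBuilder builder = RestClient.builder(new HttpHost("192.168.217.128", 9200, "http"));
        return new RestHighLevelClient(builder);
    }
}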
Unit test: indexing
@Autowired
RestHighLevelClient client;
/**
 * Index a document
 */
@Test
void indexTest() throws IOException {
    IndexRequest indexRequest = new IndexRequest("person");
    indexRequest.id("1");
    // Index the document from its JSON representation
    Person person = new Person("迪丽热巴", "M", 18);
    String json = JSON.toJSONString(person);
    indexRequest.source(json, XContentType.JSON);
    IndexResponse index = client.index(indexRequest, ElasticConfig.COMMON_OPTIONS);
    System.out.println(index);
}
@Data
static class Person {
    private String userName;
    private String gender;
    private Integer age;
    public Person(String userName, String gender, Integer age) {
        this.userName = userName;
        this.gender = gender;
        this.age = age;
    }
}
Aggregation test
@Autowired
RestHighLevelClient client;
/**
 * Complex search
 */
@Test
void searchTest() throws IOException {
    // Create the search request
    SearchRequest searchRequest = new SearchRequest();
    // Specify the indices to search
    searchRequest.indices("bank", "person");
    /*
     * Specify the DSL:
     * create a SearchSourceBuilder and build the search conditions
     */
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchQuery("address", "mill"));
    // Aggregations
    // (1) Group by age (terms aggregation)
    TermsAggregationBuilder ageAgg = AggregationBuilders.terms("ageAgg").field("age").size(10);
    searchSourceBuilder.aggregation(ageAgg);
    // (2) Average balance
    AvgAggregationBuilder balanceAvgAgg = AggregationBuilders.avg("balanceAvg").field("balance");
    searchSourceBuilder.aggregation(balanceAvgAgg);
    System.out.println(searchSourceBuilder.toString());
    searchRequest.source(searchSourceBuilder);
    // Execute the search
    SearchResponse searchResponse = client.search(searchRequest, ElasticConfig.COMMON_OPTIONS);
    //System.out.println(searchResponse.toString());
    SearchHits hits = searchResponse.getHits();
    SearchHit[] searchHits = hits.getHits();
    for (SearchHit searchHit : searchHits) {
        String sourceAsString = searchHit.getSourceAsString();
        AccountVO accountVO = JSON.parseObject(sourceAsString, AccountVO.class);
        System.out.println(accountVO);
    }
    Aggregations aggregations = searchResponse.getAggregations();
    // Fetch each aggregation by name, typed by its kind
    Terms ageAggRes = aggregations.get("ageAgg");
    for (Terms.Bucket bucket : ageAggRes.getBuckets()) {
        String key = bucket.getKeyAsString();
        long docCount = bucket.getDocCount();
        System.out.println("age: " + key + " count: " + docCount);
    }
    Avg balanceAvg = aggregations.get("balanceAvg");
    double value = balanceAvg.getValue();
    System.out.println(value);
}
@Data
static class AccountVO {
    private int account_number;
    private int balance;
    private String firstname;
    private String lastname;
    private int age;
    private String gender;
    private String address;
    private String employer;
    private String email;
    private String city;
    private String state;
}