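Elasticsearch: A Brief Look at the Principles and Common Operations
Preface
These are my first-pass study notes on the hands-on side of ES. The theory is only touched on here and will be filled in once I have digested it. If anything is wrong, corrections are very welcome!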
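Introduction
Features
- A distributed full-text search engine built on top of Lucene
- Uses an inverted index (also called a reverse index), which is built from the keywords found in document content
- Master-slave style architecture (an elected master node plus primary and replica shards) that provides data sharding and backup
- Clustered and scalable
Comparison
- Relational databases
- Solr
- Elasticsearch
Basic concepts
- Index (analogy: a database in MySQL)
- Type (analogy: a table in MySQL; types are deprecated in 6.x and removed in later versions)
- Document (analogy: a row in MySQL)
- Inverted index
- Difference between the keyword and text types
- keyword fields support sorting, aggregation, and exact-match filtering
- text fields are analyzed for full-text search and cannot be sorted or aggregated by default
Three main stages
- Crawl the content
- Tokenize and filter
- Build the inverted index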
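The tokenize-then-index stages above can be sketched in a few lines of plain Java. This is only an illustration of the idea, not how Lucene stores its index: the class name `InvertedIndexSketch` and the `analyze` helper (lower-casing and splitting on non-alphanumeric characters) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class InvertedIndexSketch {
    // term -> sorted IDs of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Hypothetical analyzer: lower-case the text and split on non-alphanumeric characters
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Index one document: every token points back to the document ID
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look up the documents containing a single term
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch is built on Lucene");
        index.add(2, "Lucene builds an inverted index");
        System.out.println(index.lookup("lucene"));   // [1, 2]
        System.out.println(index.lookup("inverted")); // [2]
    }
}
```

A search then becomes a lookup by term rather than a scan over every document, which is why the index is called "inverted".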
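Basic operations
ES exposes a REST-style API (GET, POST, PUT, DELETE, HEAD), so we can operate on it from a client.
Reference post: basic CRUD
Reference post: common queries and aggregations
- Create an index
PUT /example/
{
"settings":{
"index": {
"number_of_shards":5, // number of primary shards
"number_of_replicas":1 // number of replicas per primary shard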
}
}
}
- Query index settings
// Get the settings of the index example
GET /example/_settings
// Get the settings of all indices
GET _all/_settings
- Add a document
PUT /example/student/1
{
"name" : "shwuan",
"age" : 18,
"createTime":"2020-12-10 00:41:00"
}
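// With PUT, student is the type and 1 is the document ID
// With POST, the ID can be omitted and ES generates one automatically
- Update a document
// PUT replaces the entire existing document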
PUT /example/student/1
{
"name" : "shwuan01",
"age" : 22,
"createTime":"2020-12-10 00:41:00"
}
// POST with _update changes only the listed fields, which is lighter than PUT
POST /example/student/1/_update
{
"doc": {
"age" :24
}
}
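The difference between the two update styles can be modelled with plain maps. This is a minimal sketch under stated assumptions: the class and method names are made up for illustration, and the merge is shallow (top-level fields only), which is enough for the flat student document used here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UpdateSemanticsSketch {
    // PUT /example/student/1: the request body becomes the whole document
    static Map<String, Object> put(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // POST /example/student/1/_update with "doc": only the given fields change
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "shwuan01");
        stored.put("age", 22);
        stored.put("createTime", "2020-12-10 00:41:00");

        Map<String, Object> change = new LinkedHashMap<>();
        change.put("age", 24);

        System.out.println(put(stored, change));           // {age=24}
        System.out.println(partialUpdate(stored, change)); // {name=shwuan01, age=24, createTime=2020-12-10 00:41:00}
    }
}
```

In other words, sending only `age` with PUT would drop `name` and `createTime`, while `_update` keeps them.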
- Delete an index or a document
DELETE /example/student/1 // delete the document
DELETE example // delete the index
- Query documents
// 1. Get the document with ID 1
GET /example/student/1
// 2. Get all documents
GET /example/student/_search
{
"query":{
"match_all":{}
}
}
or
GET /example/student/_search
// 3. Paginated query (using term as an example)
GET /example/student/_search
{
"from":0,
"size":100,
"query":{
"term":{
"name":"huan"
}
}
}
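Note that `from` is a document offset, not a page number. A small helper for converting a 1-based page number into `from`, and for deriving the page count from the total hits, might look like the sketch below; the class and method names are hypothetical.

```java
final class PageOffsetSketch {
    // Offset of the first hit on a 1-based page
    static int from(int pageNum, int pageSize) {
        if (pageNum < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNum and pageSize must be >= 1");
        }
        return (pageNum - 1) * pageSize;
    }

    // Number of pages needed for totalHits documents (ceiling division)
    static long totalPages(long totalHits, int pageSize) {
        return (totalHits + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(from(1, 100));      // 0
        System.out.println(from(3, 6));        // 12
        System.out.println(totalPages(13, 6)); // 3
    }
}
```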
// 4. Sorting
GET /example/student/_search
{
"query":{
"term":{
"name":"swhuan"
}
},
"sort":[
{"age":{"order":"asc"}}
]
}
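// 5. Full-text queries
// The query string is analyzed before execution with each field's analyzer (or search analyzer) and then matched against the indexed terms.
// (1) match query
{
"query": {
"match": {
"name": {
"query": "人类与自然",
"operator": "and" // default is or: a document matches if any analyzed term appears; and: all terms must appear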
}
}
}
}
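// (2) match_phrase query
// A document matches only if both conditions hold: (i) every term produced by analysis appears in the field, and (ii) the terms appear adjacent to each other in the same order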
{
"query": {
"match_phrase": {
"name": "人类与自然"
}
}
}
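The three matching modes above (match with or, match with and, match_phrase) can be compared side by side on token lists. This is a sketch only: it assumes a whitespace analyzer and English sample text for readability, whereas real results depend on the analyzer configured for the field, and it ignores scoring and slop.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class MatchSemanticsSketch {
    // Hypothetical analyzer: lower-case and split on whitespace
    static List<String> analyze(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // match with operator "or": at least one query term occurs in the field
    static boolean matchOr(String field, String query) {
        List<String> fieldTokens = analyze(field);
        return analyze(query).stream().anyMatch(fieldTokens::contains);
    }

    // match with operator "and": every query term occurs in the field
    static boolean matchAnd(String field, String query) {
        List<String> queryTokens = analyze(query);
        return !queryTokens.isEmpty() && analyze(field).containsAll(queryTokens);
    }

    // match_phrase: the query terms occur consecutively and in the same order
    static boolean matchPhrase(String field, String query) {
        List<String> queryTokens = analyze(query);
        return !queryTokens.isEmpty()
                && Collections.indexOfSubList(analyze(field), queryTokens) >= 0;
    }

    public static void main(String[] args) {
        String field = "man and nature in harmony";
        System.out.println(matchOr(field, "nature ocean"));       // true
        System.out.println(matchAnd(field, "nature ocean"));      // false
        System.out.println(matchAnd(field, "nature man"));        // true
        System.out.println(matchPhrase(field, "nature man"));     // false
        System.out.println(matchPhrase(field, "man and nature")); // true
    }
}
```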
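// 6. Term-level queries
// Term-level queries match exactly against the terms stored in the inverted index; they are typically used for structured data such as numbers, dates, and enums
// (1) term query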
{
"query": {
"term": {
"createTime": "2020-12-10 00:41:00"
}
}
}
//(2)terms query
{
"query": {
"terms": {
"createTime": [
"2015-12-10 00:41:00",
"2016-02-01 01:39:00"
]
}
}
}
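// (3) range query
// Matches documents whose numeric, date, or string field falls within a range; note that each range clause targets a single field, not several
// Supported operators: gt (greater than), gte (greater than or equal), lt (less than), lte (less than or equal)
// (i) Numeric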
{
"query": {
"range": {
"age": {
"gte": 16,
"lte": 50
}
}
}
}
// (ii) Date
{
"query": {
"range": {
"createTime": {
"gte": "2016-09-01 00:00:00",
"lte": "2016-09-30 23:59:59",
"format": "yyyy-MM-dd HH:mm:ss" // format can be omitted if the values already match the field's date format
}
}
}
}
// (4) exists query
// Returns documents that have at least one non-null value in the given field
{
"query": {
"exists": {
"field": "name"
}
}
}
// (5) ids query
// Finds documents with the given IDs
{
"query": {
"ids": {
"type": "student", // type is optional
"values": ["1"]
}
}
}
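// 7. Compound queries
// (1) bool query (used a lot in practice)
// must: documents must match these clauses; equivalent to logical AND
// should: documents may or may not match these clauses; equivalent to logical OR
// must_not: the opposite of must; documents matching these clauses are not returned
// filter: like must, only matching documents are returned, but filter clauses do not affect the score and only filter
// Note on field types: a term query matches a keyword field exactly, but may not match a text field. It matches only if the analyzer configured for the field emits the search term as a token; with the default analyzer, which splits Chinese text into single characters, a multi-character term will not be found.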
{
"size": 1,
"query": {
"bool": {
"must": [
{
"match": {
"name": "swhuan"
}
},
{
"match": {
"name": "人类"
}
}
]
}
},
"sort": [
{
"id": {
"order": "desc"
}
}
]
}
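The four bool clauses can be read as predicates over a document. The sketch below models only whether a document matches, not its score; the `Doc` class, the sample data, and the helper names are assumptions for illustration. It also encodes the default rule that should clauses become mandatory (at least one must match) only when the query has no must or filter clause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

final class BoolQuerySketch {
    // Hypothetical document with the fields used in this article
    static final class Doc {
        final int id;
        final String name;
        final int age;

        Doc(int id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    // Combine the clause lists into a single matching predicate (scoring is ignored)
    static Predicate<Doc> bool(List<Predicate<Doc>> must, List<Predicate<Doc>> should,
                               List<Predicate<Doc>> mustNot, List<Predicate<Doc>> filter) {
        return doc -> {
            boolean required = must.stream().allMatch(p -> p.test(doc))
                    && filter.stream().allMatch(p -> p.test(doc));
            boolean excluded = mustNot.stream().anyMatch(p -> p.test(doc));
            // With no must/filter clause, at least one should clause has to match
            boolean shouldIsMandatory = must.isEmpty() && filter.isEmpty() && !should.isEmpty();
            boolean shouldOk = !shouldIsMandatory || should.stream().anyMatch(p -> p.test(doc));
            return required && !excluded && shouldOk;
        };
    }

    // Return the IDs of the documents that match the query
    static List<Integer> search(List<Doc> docs, Predicate<Doc> query) {
        List<Integer> ids = new ArrayList<>();
        for (Doc doc : docs) {
            if (query.test(doc)) {
                ids.add(doc.id);
            }
        }
        return ids;
    }

    static List<Doc> sampleDocs() {
        return Arrays.asList(new Doc(1, "zhangsan", 12), new Doc(2, "lisi", 18),
                new Doc(3, "wangwu", 22), new Doc(4, "zhaoliu", 25), new Doc(5, "zhaoliu", 27));
    }

    public static void main(String[] args) {
        List<Predicate<Doc>> none = Collections.emptyList();
        Predicate<Doc> adult = d -> d.age >= 18;
        Predicate<Doc> isWangwu = d -> d.name.equals("wangwu");
        Predicate<Doc> isZhangsan = d -> d.name.equals("zhangsan");
        Predicate<Doc> is27 = d -> d.age == 27;

        // must: age >= 18, must_not: name = wangwu
        System.out.println(search(sampleDocs(),
                bool(Arrays.asList(adult), none, Arrays.asList(isWangwu), none))); // [2, 4, 5]
        // should only: name = zhangsan OR age = 27
        System.out.println(search(sampleDocs(),
                bool(none, Arrays.asList(isZhangsan, is27), none, none)));         // [1, 5]
    }
}
```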
// 8. Scroll query
GET spnews/news/_search?scroll=1m
{
"query": {
"match_all": {}
},
"size": 10,
"_source": ["id"]
}
GET _search/scroll
{
"scroll":"1m",
"scroll_id":"DnF1ZXJ5VGhlbkZldGNoAwAAAAAAADShFmpBMjJJY2F2U242RFU5UlAzUzA4MWcAAAAAAAA0oBZqQTIySWNhdlNuNkRVOVJQM1MwODFnAAAAAAAANJ8WakEyMkljYXZTbjZEVTlSUDNTMDgxZw==" // the scroll_id is valid only within this time window
}
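- Aggregations
- Metric aggregations (analogous to MySQL aggregate functions)
// 1. max
{
"size": 0, // if not 0, the matching documents (hits) are returned in addition to the aggregation results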
"aggs": {
"max_id": {
"max": {
"field": "id"
}
}
}
}
//2.min
{
"size": 0,
"aggs": {
"min_id": {
"min": {
"field": "id"
}
}
}
}
//3.avg
{
"size": 0,
"aggs": {
"avg_id": {
"avg": {
"field": "id"
}
}
}
}
//4.sum
{
"size": 0,
"aggs": {
"sum_id": {
"sum": {
"field": "id"
}
}
}
}
//5.stats
{
"size": 0,
"aggs": {
"stats_id": {
"stats": {
"field": "id"
}
}
}
}
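For intuition, the stats aggregation bundles count, min, max, avg, and sum in one pass, much like `IntSummaryStatistics` in the JDK. The following sketch only mirrors the idea on an in-memory array; the class name and sample values are made up.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

final class StatsAggSketch {
    // One pass over the values yields count, min, max, average, and sum together
    static IntSummaryStatistics stats(int[] values) {
        return IntStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics s = stats(new int[] {1, 2, 3, 4, 5});
        System.out.println(s.getCount());   // 5
        System.out.println(s.getMin());     // 1
        System.out.println(s.getMax());     // 5
        System.out.println(s.getAverage()); // 3.0
        System.out.println(s.getSum());     // 15
    }
}
```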
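- Bucket aggregations (analogous to GROUP BY in MySQL)
Do not run bucket aggregations on text fields in ES; the request fails by default
// 1. Terms
// Equivalent to a grouped query: documents are bucketed by field value, and metric aggregations can be nested inside each bucket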
{
"size": 0,
"aggs": {
"per_count": {
"terms": {
"field": "age"
},
"aggs": {
"sum_id": {
"sum": {
"field": "id"
}
}
}
}
}
}
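The terms aggregation with a nested sum behaves roughly like a `groupingBy` collector with a downstream collector. The sketch below is an in-memory analogy only; `Student` and the sample data are assumptions, and the buckets are ordered by key here for readability, whereas ES orders terms buckets by document count by default.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class TermsAggSketch {
    // Hypothetical document with the two fields used by the aggregation
    static final class Student {
        private final int id;
        private final int age;

        Student(int id, int age) {
            this.id = id;
            this.age = age;
        }

        int getId() { return id; }
        int getAge() { return age; }
    }

    // Bucket by age and count the documents in each bucket (like doc_count)
    static Map<Integer, Long> docCountByAge(List<Student> students) {
        return students.stream()
                .collect(Collectors.groupingBy(Student::getAge, TreeMap::new, Collectors.counting()));
    }

    // Bucket by age and sum the id field inside each bucket (the nested sum_id aggregation)
    static Map<Integer, Integer> sumIdByAge(List<Student> students) {
        return students.stream()
                .collect(Collectors.groupingBy(Student::getAge, TreeMap::new,
                        Collectors.summingInt(Student::getId)));
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(new Student(1, 18), new Student(2, 18), new Student(3, 22));
        System.out.println(docCountByAge(students)); // {18=2, 22=1}
        System.out.println(sumIdByAge(students));    // {18=3, 22=3}
    }
}
```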
// 2. Filter
// Equivalent to filtering rows with a WHERE clause in MySQL and then applying max, min, avg, sum, or stats to the result
{
"size": 0,
"aggs": {
"gender_1": {
"filter": {
"term": {
"gender": 0
}
},
"aggs": {
"sum_age": {
"sum": {
"field": "age"
}
}
}
}
}
}
// 3. Range
// from: inclusive lower bound (>=); to: exclusive upper bound (<)
{
"size": 0,
"aggs": {
"age_ranges": {
"range": {
"field": "age",
"ranges": [
{
"to": 12
},
{
"from": 15,
"to": 20
}
]
}
}
}
}
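The boundary rule (from inclusive, to exclusive) is easy to get wrong, so here is a small in-memory sketch of the two ranges from the request above. The `Range` class, its `key()` format, and the sample ages are assumptions for illustration; the bucket keys ES returns are formatted differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RangeAggSketch {
    static final class Range {
        final Integer from; // inclusive lower bound, null means unbounded
        final Integer to;   // exclusive upper bound, null means unbounded

        Range(Integer from, Integer to) {
            this.from = from;
            this.to = to;
        }

        boolean contains(int value) {
            return (from == null || value >= from) && (to == null || value < to);
        }

        // Hypothetical bucket label such as "*-12" or "15-20"
        String key() {
            String lower = from == null ? "*" : String.valueOf(from);
            String upper = to == null ? "*" : String.valueOf(to);
            return lower + "-" + upper;
        }
    }

    // Count how many values fall into each range
    static Map<String, Long> count(int[] values, Range... ranges) {
        Map<String, Long> buckets = new LinkedHashMap<>();
        for (Range range : ranges) {
            long docCount = 0;
            for (int value : values) {
                if (range.contains(value)) {
                    docCount++;
                }
            }
            buckets.put(range.key(), docCount);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] ages = {10, 12, 15, 18, 20};
        // 12 and 20 fall outside both buckets because "to" is exclusive
        System.out.println(count(ages, new Range(null, 12), new Range(15, 20))); // {*-12=1, 15-20=2}
    }
}
```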
// 4. Date Histogram
GET /example/student/_search
{
"size": 0,
"aggs": {
"agg_year": {
"date_histogram": {
"field": "createTime",
"interval": "day", // data can be bucketed by year, month, or day
"order": {
"_key": "asc"
}
}
}
}
}
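Bucketing by day amounts to truncating each timestamp to its date and counting per date. The sketch below shows that idea in memory, assuming timestamps in the `yyyy-MM-dd HH:mm:ss` format used for `createTime` in this article. It ignores time zones and emits only non-empty buckets, which may differ from what ES returns depending on `min_doc_count`.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class DateHistogramSketch {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Truncate each timestamp to its day and count documents per day, keys ascending
    static SortedMap<LocalDate, Long> countPerDay(List<String> timestamps) {
        SortedMap<LocalDate, Long> buckets = new TreeMap<>();
        for (String timestamp : timestamps) {
            LocalDate day = LocalDateTime.parse(timestamp, FORMAT).toLocalDate();
            buckets.merge(day, 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> createTimes = Arrays.asList(
                "2020-12-11 08:30:00", "2020-12-10 00:41:00", "2020-12-10 09:00:00");
        System.out.println(countPerDay(createTimes)); // {2020-12-10=2, 2020-12-11=1}
    }
}
```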
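Use cases
- Search engines
- ELK stack (log analysis system)
- E (Elasticsearch), L (Logstash), K (Kibana)
- Diagram (image taken from the web; it will be removed on request)
Installation
Connecting to ES from Java
Transport
Accesses ES over TCP (Java only). According to the official roadmap, TransportClient is deprecated starting with 7.0 and removed entirely in 8.0.
REST
Accesses ES through the HTTP API (no language restriction)
Low Level REST Client (rarely used)
High Level REST Client (commonly used, recommended)
- Add the dependencies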
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>6.8.10</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>6.8.10</version>
</dependency>
- Configuration (.properties file)
# ES search engine settings
es.host=localhost
es.port=9200
es.scheme=http
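- Configuration class
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration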
public class ESConfig {
@Value("${es.host}")
private String host;
@Value("${es.port}")
private Integer port;
@Value("${es.scheme}")
private String scheme;
@Bean
public RestHighLevelClient restHighLevelClient() {
return new RestHighLevelClient(
RestClient.builder(
new HttpHost(host, port, scheme) ));
}
}
- Utility class
@Component
@Slf4j
public class EsUtil<T> {
public static final char UNDERLINE = '_';
@Autowired
@Qualifier(value = "restHighLevelClient")
private RestHighLevelClient client;
/**
* Index a single document
*
* @param t the entity to index
* @return true if the request succeeded, false otherwise
*/
public boolean save(T t) {
String indexName = camelToUnderline(t.getClass().getSimpleName(), 1);
// Use the value of the field annotated with @Id as the document ID
String id = JSON.parseObject(JSON.toJSONString(t)).getString(getIdName(t));
IndexRequest indexRequest = new IndexRequest(indexName, indexName, id);
indexRequest.source(JSON.toJSONString(t), XContentType.JSON);
try {
IndexResponse indexResponse = client.index(indexRequest, RequestOptions.DEFAULT);
log.info("restHighLevelClient save index success and result is : {}", indexResponse.getResult());
return true;
} catch (IOException e) {
log.error("restHighLevelClient save index failed", e);
}
return false;
}
/**
* Bulk insert
*
* @param ts the entities to index
* @return true if the bulk request was sent successfully, false otherwise
*/
public boolean saveAll(List<T> ts) {
BulkRequest bulkRequest = new BulkRequest();
for (T t : ts) {
String indexName = camelToUnderline(t.getClass().getSimpleName(), 1);
// Use the value of the field annotated with @Id as the document ID
String id = JSON.parseObject(JSON.toJSONString(t)).getString(getIdName(t));
IndexRequest indexRequest = new IndexRequest(indexName, indexName, id);
indexRequest.source(JSON.toJSONString(t), XContentType.JSON);
bulkRequest.add(indexRequest);
}
try {
// Send the bulk request to ES
client.bulk(bulkRequest, RequestOptions.DEFAULT);
return true;
} catch (IOException e) {
log.error("restHighLevelClient saveAll index failed", e);
}
return false;
}
/**
* Delete by ID
*
* @param id the document ID
* @param classT the entity class, used to derive the index name
* @return true if the request succeeded, false otherwise
*/
public boolean deleteById(String id, Class<T> classT) {
String indexName = camelToUnderline(classT.getSimpleName(), 1);
try {
// 1. Build the delete request with the index, type, and ID
DeleteRequest deleteRequest = new DeleteRequest(indexName, indexName, id);
// 2. Send the request to ES
DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
return true;
} catch (IOException e) {
log.error("restHighLevelClient deleteById failed", e);
}
return false;
}
/**
* Query
*
* @param queryBuilder the query to run
* @param t the entity class, used to derive the index name
* @return the _source of each hit; empty if the request failed
*/
public JSONArray find(QueryBuilder queryBuilder, Class<T> t) {
JSONArray results = new JSONArray();
String indexName = camelToUnderline(t.getSimpleName(), 1);
SearchRequest searchRequest = new SearchRequest(indexName);
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(queryBuilder);
searchRequest.source(sourceBuilder);
try {
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHit[] hits = searchResponse.getHits().getHits();
for (SearchHit hit : hits) {
String sourceAsString = hit.getSourceAsString();
results.add(JSONObject.parseObject(sourceAsString));
}
return results;
} catch (IOException e) {
log.error("restHighLevelClient find failed", e);
}
return results;
}
/**
* Paginated query
*
* @param queryBuilder the query to run
* @param sortBuilderList sort clauses, applied in order
* @param pageNum 1-based page number
* @param pageSize number of hits per page
* @param t the entity class, used to derive the index name
* @return a JSONObject holding pageInfo and the list of hits
*/
public JSONObject findPage(QueryBuilder queryBuilder,List<SortBuilder> sortBuilderList,Integer pageNum,Integer pageSize, Class<T> t) {
JSONObject result = new JSONObject();
JSONObject pageInfo = new JSONObject();
JSONArray list = new JSONArray();
String indexName = camelToUnderline(t.getSimpleName(), 1);
SearchRequest searchRequest = new SearchRequest(indexName);
// Build the search request
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(queryBuilder);
// from is a document offset, not a page index
sourceBuilder.from((pageNum - 1) * pageSize);
sourceBuilder.size(pageSize);
// sourceBuilder.sort("_score", SortOrder.DESC)
// .sort("heat", SortOrder.DESC);
if (sortBuilderList != null) {
for (SortBuilder sortBuilder : sortBuilderList) {
sourceBuilder.sort(sortBuilder);
}
}
searchRequest.source(sourceBuilder);
try {
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHit[] hits = searchResponse.getHits().getHits();
for (SearchHit hit : hits) {
String sourceAsString = hit.getSourceAsString();
list.add(JSONObject.parseObject(sourceAsString));
}
pageInfo.put("totalPages",(searchResponse.getHits().totalHits+pageSize-1)/pageSize);
pageInfo.put("totalElements",searchResponse.getHits().totalHits);
pageInfo.put("pageNum",pageNum);
pageInfo.put("pageSize",pageSize);
result.put("pageInfo",pageInfo);
result.put("list",list);
return result;
} catch (IOException e) {
log.error("restHighLevelClient findPage failed", e);
}
return result;
}
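/**
* Convert camelCase to underscore-separated form
*
* @param param the camelCase string
* @param charType 2 for upper-case output, any other value for lower-case output
* @return the converted string, or an empty string for blank input
*/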
public static String camelToUnderline(String param, Integer charType) {
if (param == null || "".equals(param.trim())) {
return "";
}
int len = param.length();
StringBuilder sb = new StringBuilder(len);
for (int i = 0; i < len; i++) {
char c = param.charAt(i);
if (Character.isUpperCase(c) && i > 0) {
sb.append(UNDERLINE);
}
if (charType == 2) {
// Convert everything to upper case
sb.append(Character.toUpperCase(c));
} else {
// Convert everything to lower case
sb.append(Character.toLowerCase(c));
}
}
return sb.toString();
}
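/**
* Get the name of the field annotated with @Id
*
* @param instance the entity to inspect
* @return the field name, or an empty string if no field is annotated
*/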
public static String getIdName(Object instance) {
try {
Class<?> clazz = instance.getClass();
Field[] fields = clazz.getDeclaredFields();
for (int i = 0; i < fields.length; i++) {
boolean annotationPresent = fields[i].isAnnotationPresent(Id.class);
if (annotationPresent) {
// Return the name of the annotated field
String idName = fields[i].getName();
return idName;
}
}
} catch (Exception e) {
log.error("not found id");
}
return "";
}
}
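Two conventions in the utility class are worth seeing in isolation: the index name is derived from the class name (so `EsTest` maps to the index `es_test`), and the document ID comes from whichever field carries the ID annotation. The sketch below reproduces both with the JDK only; `DocId` is a hypothetical stand-in for the `@Id` annotation, and the nested `EsTest` class just mirrors the shape of the bean used later.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

final class IndexNamingSketch {
    // Hypothetical stand-in for the @Id annotation; must be retained at runtime for reflection
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface DocId {}

    // Sample entity mirroring the shape of the EsTest bean
    static final class EsTest {
        @DocId
        private final String id;
        private final String name;

        EsTest(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Lower-case variant of the conversion: an underscore before each upper-case letter except the first
    static String camelToUnderline(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isUpperCase(c) && i > 0) {
                sb.append('_');
            }
            sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    // Find the first declared field carrying the ID annotation
    static String idFieldName(Class<?> type) {
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(DocId.class)) {
                return field.getName();
            }
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(camelToUnderline(EsTest.class.getSimpleName())); // es_test
        System.out.println(idFieldName(EsTest.class));                      // id
    }
}
```

One consequence of this design is that renaming an entity class silently changes the index it reads from and writes to.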
- Usage example
- Bean
@Data
public class EsTest {
@Id
private String id;
private String name;
private Integer age;
public EsTest(String id, String name, Integer age) {
this.id = id;
this.name = name;
this.age = age;
}
}
- Test
/**
* Tests for the ES utility methods
* @author swhuan
*/
@RequestMapping("/test")
@RestController
public class TestEsController {
@Autowired
private EsUtil<EsTest> esUtil;
/**
* Insert
* @return a success marker
*/
@GetMapping(value = "testCrud")
public Object testCrud() {
List<EsTest> esTests = new ArrayList<>();
esTests.add(new EsTest("1", "张三", 12));
esTests.add(new EsTest("2", "李四", 18));
esTests.add(new EsTest("3", "王五", 22));
esTests.add(new EsTest("4", "赵六", 25));
esTests.add(new EsTest("5", "赵六", 27));
esUtil.saveAll(esTests);
return "success";
}
/**
* Plain query
* @return the matching documents
*/
@GetMapping(value = "testQuery")
public Object testQuery(){
// Condition => name = "张三" OR (name LIKE "%赵六%" AND age = 27) OR (age BETWEEN 18 AND 27)
BoolQueryBuilder builder = QueryBuilders.boolQuery()
// Term query on the keyword sub-field
.should(QueryBuilders.termQuery("name.keyword","张三"))
.should(QueryBuilders.boolQuery()
// Full-text match query
.must(QueryBuilders.matchQuery("name","赵六"))
// Exact-value query
.must(QueryBuilders.termQuery("age","27")))
// Range query
.should(QueryBuilders.rangeQuery("age").from(18).to(27));
return esUtil.find(builder,EsTest.class);
}
/**
* Paginated query
* @return the page info and the hits on the requested page
*/
@GetMapping(value = "testPageQuery")
public Object testPageQuery(){
BoolQueryBuilder builder = QueryBuilders.boolQuery()
.should(QueryBuilders.rangeQuery("age").from(18).to(27));
List<SortBuilder> sortBuilderList = new ArrayList<>();
sortBuilderList.add(SortBuilders.fieldSort("age").order(SortOrder.DESC));
return esUtil.findPage(builder,sortBuilderList,1,6,EsTest.class);
}
}
Integrating ES with Spring Boot
ElasticsearchTemplate approach
ElasticsearchRepository approach
- Add the dependency (watch out for Spring version compatibility)
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
- Configuration (.properties file)
spring.elasticsearch.rest.uris=http://localhost:9200
- Usage example
- Bean
@Data
@Document(indexName = "test",shards = 5,replicas = 0,createIndex = true)
public class EsTest {
@Id
private String id;
@Field(type = FieldType.Text)
private String name;
@Field(type = FieldType.Integer)
private Integer age;
public EsTest(String id, String name, Integer age) {
this.id = id;
this.name = name;
this.age = age;
}
}
- Dao
public interface EsTestRepository extends ElasticsearchRepository<EsTest,String> {
List<EsTest> findAllByNameIn(List<String> names);
}
- Test
Usage is the same as with ordinary JPA repository operations
/**
* Insert
*/
public Object testCrud() {
List<EsTest> esTests = new ArrayList<>();
esTests.add(new EsTest("1", "张三", 12));
esTests.add(new EsTest("2", "李四", 18));
esTests.add(new EsTest("3", "王五", 22));
Iterable<EsTest> esTests1 = esTestRepository.saveAll(esTests);
return esTests1;
}
/**
* Query
*/
public Object testQuery() {
List<String> names = new ArrayList<>();
names.add("张三");
names.add("李四");
Iterable<EsTest> esTests1 = esTestRepository.findAllByNameIn(names);
return esTests1;
}