Elasticsearch: A Brief Look at Its Principles and Common Operations

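Preface

These are my first-pass study notes on the hands-on side of ES. The theory is only touched on briefly and will be expanded once I have digested it. If anything here is wrong, corrections are very welcome.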

Introduction

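Features

  1. A distributed full-text search engine built on top of Lucene
  2. Uses an inverted index (also called a reverse index): the index is built from the keywords in the document content and maps each keyword to the documents that contain it
  3. A cluster coordinated by an elected master node, with data split into shards and backed up as replicas
  4. Clustered and horizontally scalable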

Comparison

  1. Relational databases
  2. Solr
  3. Elasticsearch

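Basic concepts

  1. Index (analogy: a MySQL database)
  2. Type (analogy: a MySQL table; types are deprecated from 6.x onward and removed in later versions)
  3. Document (analogy: a MySQL row)
  4. Inverted index
  5. Difference between the keyword and text types
  • keyword fields are stored as a single un-analyzed term, so they support sorting, aggregation and exact-match filtering
  • text fields are analyzed into terms for full-text search and by default cannot be used for sorting or aggregation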

Three main stages

  1. Collect (crawl) the content
  2. Tokenize and filter it
  3. Build the inverted index
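
The three stages can be illustrated with a toy example. This is a minimal sketch in plain Java, not how Lucene stores its index: the class name `InvertedIndexSketch` and the lowercase-and-split tokenizer are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: term -> sorted set of document ids.
// Hypothetical helper for illustration; real analyzers and postings lists are far richer.
class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Stage 2: tokenize and filter (lowercase, split on anything that is not a letter or digit).
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Stage 3: record, for every term, which document it came from.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Searching is a lookup by term instead of a scan over all documents.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        // Stage 1: the "collected" content is just two hard-coded strings here.
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch is built on Lucene");
        index.add(2, "Lucene builds an inverted index");
        System.out.println(index.lookup("lucene"));
        System.out.println(index.lookup("index"));
    }
}
```

The point of the structure is that a search for a word goes straight to the list of matching documents, which is why full-text lookups stay fast as the number of documents grows.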

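Basic operations

ES exposes a REST-style API (GET, POST, PUT, DELETE, HEAD), so it can be operated from any HTTP client.
Reference post: basic CRUD
Reference post: common queries and aggregations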

  1. Create an index
PUT /example/
{
  "settings":{
    "index": {
      "number_of_shards":5, // number of primary shards
      "number_of_replicas":1 // number of replica copies per primary shard
    }
  }
}
  2. View index settings
// Settings of the index "example"
GET /example/_settings

// Settings of all indices
GET _all/_settings
  3. Add a document
PUT /example/student/1
{
  "name" : "shwuan",
  "age"  :  18,
  "createTime":"2020-12-10 00:41:00"
}
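// With PUT, "student" is the type and 1 is the document ID
// With POST the ID may be omitted, in which case ES generates one automatically
  4. Update a document
// PUT replaces the whole existing document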
PUT /example/student/1    
{
  "name" : "shwuan01", 
  "age"  :  22,
  "createTime":"2020-12-10 00:41:00"
}

// POST with _update changes only the given fields, so the request is lighter than a full PUT
POST /example/student/1/_update
{
  "doc": {
    "age" :24
  }
}
  5. Delete an index or a document
DELETE /example/student/1  // delete the document
DELETE example    // delete the index
  6. Query documents
//1. Fetch the document with ID 1
GET /example/student/1  


//2. Match all documents
GET /example/student/_search
{
    "query":{
        "match_all":{}
    }
}
or
GET /example/student/_search

//3. Paginated query (using term as an example)
GET /example/student/_search
{
    "from":0,
    "size":100,
    "query":{
        "term":{
            "name":"huan"
        }
    }
}

//4. Sorting
GET /example/student/_search
{
    "query":{
        "term":{
            "name":"swhuan"
        }
    },
    "sort":[
        {"age":{"order":"asc"}}
    ]
}
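//5. Full-text queries
// The target field is analyzed at index time; before the query runs, the field's analyzer (or its search analyzer) is applied to the query string as well.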

//(1)match query
{
  "query": {
    "match": {
      "name": {
        "query": "人类与自然",
        "operator": "and" // default is or: a document matches if any analyzed term appears; and: every term must appear
      }
    }
  }
}

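//(2)match_phrase query
// A document matches only if both conditions hold: (i) every analyzed term appears in the field, and (ii) the terms appear in the same order as in the query (adjacent by default)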

{
  "query": {
    "match_phrase": {
      "name": "人类与自然"
    }
  }
}
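
The difference between `match` with `or`, `match` with `and`, and `match_phrase` can be sketched over plain token lists. This is a simplified model, not Elasticsearch code: the class `MatchSemanticsSketch` and its whitespace tokenizer are hypothetical, and scoring is ignored entirely.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Simplified matching rules over whitespace-separated, lowercased tokens.
// Hypothetical model for illustration only; no relevance scoring is computed.
class MatchSemanticsSketch {
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // match with operator "or": at least one query term occurs in the field.
    static boolean matchOr(String field, String query) {
        List<String> fieldTerms = analyze(field);
        return analyze(query).stream().anyMatch(fieldTerms::contains);
    }

    // match with operator "and": every query term occurs in the field, in any order.
    static boolean matchAnd(String field, String query) {
        return analyze(field).containsAll(analyze(query));
    }

    // match_phrase: the query terms occur next to each other and in the same order.
    static boolean matchPhrase(String field, String query) {
        return Collections.indexOfSubList(analyze(field), analyze(query)) >= 0;
    }

    public static void main(String[] args) {
        String field = "humans and nature live together";
        System.out.println(matchOr(field, "nature city"));
        System.out.println(matchAnd(field, "nature humans"));
        System.out.println(matchPhrase(field, "nature humans"));
    }
}
```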

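//6. Term-level queries
// A term-level query matches exactly against the terms stored in the inverted index, without analyzing the query string. It is typically used for structured data such as numbers, dates and enumerated values.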

//(1)term query
{
  "query": {
    "term": {
      "createTime": "2020-12-10 00:41:00"
    }
  }
}

//(2)terms query
{
  "query": {
    "terms": {
      "createTime": [
        "2015-12-10 00:41:00",
        "2016-02-01 01:39:00"
      ]
    }
  }
}

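//(3)range query
// Matches documents whose numeric, date or string field falls within a range. Each range clause applies to a single field, not to several fields at once.
// Supported operators: gt (greater than), gte (greater than or equal), lt (less than), lte (less than or equal)
//(i) Numeric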
{
  "query": {
    "range": {
      "age": {
        "gte": 16,
        "lte": 50
      }
    }
  }
}

//(ii) Date
{
  "query": {
    "range": {
      "createTime": {
        "gte": "2016-09-01 00:00:00",
        "lte": "2016-09-30 23:59:59",
        "format": "yyyy-MM-dd HH:mm:ss" // optional when the values already match the date format of the field mapping
      }
    }
  }
}

//(4)exists query
// Returns documents that have at least one non-null value in the given field
{
  "query": {
    "exists": {
      "field": "name"
    }
  }
}

//(5)ids query
// Returns the documents with the given IDs
{
  "query": {
    "ids": {
      "type": "student", // the type is optional
      "values": ["1"]
    }
  }
}

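//7. Compound queries
//(1)bool query (the one used most in practice)
// must: a document has to match the clauses under must; equivalent to a logical AND
// should: a document may match the clauses under should but does not have to; equivalent to a logical OR (when there is no must or filter clause, at least one should clause has to match)
// must_not: the opposite of must; documents matching these clauses are not returned
// filter: like must, only documents matching these clauses are returned, but filter does not contribute to the score and only filters
// Note on field types: on a keyword field a term query matches the exact value. On a text field it may not match, because the value was analyzed at index time. With a suitable analyzer configured the term can still be found, but the default analyzer splits Chinese text into single characters, so a multi-character term is not found.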
{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "swhuan"
          }
        },
        {
          "match": {
            "name": "人类"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "id": {
        "order": "desc"
      }
    }
  ]
}
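
The way the four bool clauses combine can be modelled with ordinary predicates. The sketch below is a plain-Java illustration under stated assumptions: `BoolQuerySketch` is a hypothetical class, it only decides whether a document matches, and it does not model scoring (so `must` and `filter` behave identically here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Models only the match/no-match logic of a bool query; scoring is not modelled.
// Hypothetical helper for illustration, not part of any Elasticsearch client.
class BoolQuerySketch<T> {
    private final List<Predicate<T>> required = new ArrayList<>(); // must + filter
    private final List<Predicate<T>> optional = new ArrayList<>(); // should
    private final List<Predicate<T>> excluded = new ArrayList<>(); // must_not

    BoolQuerySketch<T> must(Predicate<T> clause) { required.add(clause); return this; }
    BoolQuerySketch<T> filter(Predicate<T> clause) { required.add(clause); return this; }
    BoolQuerySketch<T> should(Predicate<T> clause) { optional.add(clause); return this; }
    BoolQuerySketch<T> mustNot(Predicate<T> clause) { excluded.add(clause); return this; }

    boolean matches(T doc) {
        boolean allRequired = required.stream().allMatch(p -> p.test(doc));
        boolean noneExcluded = excluded.stream().noneMatch(p -> p.test(doc));
        // should is optional when must/filter clauses exist; otherwise at least one must match.
        boolean shouldOk = !required.isEmpty() || optional.isEmpty()
                || optional.stream().anyMatch(p -> p.test(doc));
        return allRequired && noneExcluded && shouldOk;
    }

    public static void main(String[] args) {
        // Documents are reduced to a single "age" value for brevity.
        BoolQuerySketch<Integer> query = new BoolQuerySketch<Integer>()
                .must(age -> age >= 18)
                .mustNot(age -> age == 20);
        System.out.println(query.matches(19));
        System.out.println(query.matches(20));
    }
}
```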

//8. Scroll query
GET spnews/news/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "size": 10,
  "_source": ["id"]
}

GET _search/scroll
{
  "scroll":"1m",
  "scroll_id":"DnF1ZXJ5VGhlbkZldGNoAwAAAAAAADShFmpBMjJJY2F2U242RFU5UlAzUzA4MWcAAAAAAAA0oBZqQTIySWNhdlNuNkRVOVJQM1MwODFnAAAAAAAANJ8WakEyMkljYXZTbjZEVTlSUDNTMDgxZw==" // the scroll_id is only valid within this time window
}

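  7. Aggregations
  • Metric aggregations (analogous to MySQL aggregate functions)
//1.max
{
  "size": 0, // with a non-zero size, the matching documents (hits) are returned alongside the aggregation result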
  "aggs": {
    "max_id": {
      "max": {
        "field": "id"
      }
    }
  }
}

//2.min
{
  "size": 0,
  "aggs": {
    "min_id": {
      "min": {
        "field": "id"
      }
    }
  }
}
//3.avg
{
  "size": 0,
  "aggs": {
    "avg_id": {
      "avg": {
        "field": "id"
      }
    }
  }
}

//4.sum
{
  "size": 0,
  "aggs": {
    "sum_id": {
      "sum": {
        "field": "id"
      }
    }
  }
}
//5.stats
{
  "size": 0,
  "aggs": {
    "stats_id": {
      "stats": {
        "field": "id"
      }
    }
  }
}
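
For readers more at home in Java than in the aggregation DSL, the `stats` aggregation is roughly what `IntSummaryStatistics` gives over an in-memory list: count, min, max, avg and sum in one pass. A minimal sketch, where `StatsAggSketch` is a hypothetical name and the ids are made-up sample values:

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

// In-memory analogue of the stats metric aggregation over an "id" field.
// Hypothetical helper for illustration; sample values are invented.
class StatsAggSketch {
    static IntSummaryStatistics stats(int... ids) {
        return IntStream.of(ids).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics s = stats(1, 2, 3, 4, 5);
        System.out.println("count=" + s.getCount() + " min=" + s.getMin()
                + " max=" + s.getMax() + " avg=" + s.getAverage() + " sum=" + s.getSum());
    }
}
```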
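  • Bucket aggregations (analogous to MySQL GROUP BY)
    Do not run bucket aggregations on text fields: by default the request fails. Aggregate on a keyword field instead.
//1.Terms
// Works like a GROUP BY on the field; metric aggregations can be nested inside each bucket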
{
  "size": 0,
  "aggs": {
    "per_count": {
      "terms": {
        "field": "age"
      },
      "aggs": {
        "sum_id": {
          "sum": {
            "field": "id"
          }
        }
      }
    }
  }
}
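
The terms aggregation with a nested sum reads much like `GROUP BY age` with `SUM(id)`. The sketch below shows the same grouping over an in-memory list using Java streams. It is an analogy only: `TermsAggSketch` and `Student` are hypothetical names, and the real aggregation additionally returns a document count per bucket.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory analogue of: terms on "age" with a nested sum on "id".
// Hypothetical types for illustration only.
class TermsAggSketch {
    static final class Student {
        final int id;
        final int age;

        Student(int id, int age) {
            this.id = id;
            this.age = age;
        }
    }

    // One bucket per distinct age; the value is the sum of ids in that bucket.
    static Map<Integer, Integer> sumIdByAge(List<Student> students) {
        return students.stream().collect(
                Collectors.groupingBy(s -> s.age, TreeMap::new, Collectors.summingInt(s -> s.id)));
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
                new Student(1, 18), new Student(2, 18), new Student(3, 22));
        System.out.println(sumIdByAge(students));
    }
}
```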

//2.Filter
// Like filtering with a WHERE clause in MySQL first, then applying max, min, avg, sum or stats to the result
{
  "size": 0,
  "aggs": {
    "gender_1": {
      "filter": {
        "term": {
          "gender": 0
        }
      },
      "aggs": {
        "sum_age": {
          "sum": {
            "field": "age"
          }
        }
      }
    }
  }
}

//3.Range
// from is inclusive (greater than or equal), to is exclusive (less than)
{
  "size": 0,
  "aggs": {
    "age_ranges": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "to": 12
          },
          {
            "from": 15,
            "to": 20
          }
        ]
      }
    }
  }
}
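
The boundary rule (from inclusive, to exclusive) is easy to get wrong, so here is a small plain-Java sketch of the two buckets defined above. `RangeAggSketch` and the bucket labels `"*-12"` and `"15-20"` are hypothetical; the real response uses its own key format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Counts values into the two ranges from the example: (*, 12) and [15, 20).
// Hypothetical helper; bucket labels are illustrative, not the real response keys.
class RangeAggSketch {
    // A null bound means the range is open on that side.
    static boolean inRange(int value, Integer from, Integer to) {
        return (from == null || value >= from) && (to == null || value < to);
    }

    static Map<String, Integer> ageRanges(int... ages) {
        Map<String, Integer> buckets = new LinkedHashMap<>();
        buckets.put("*-12", 0);
        buckets.put("15-20", 0);
        for (int age : ages) {
            if (inRange(age, null, 12)) {
                buckets.merge("*-12", 1, Integer::sum);
            }
            if (inRange(age, 15, 20)) {
                buckets.merge("15-20", 1, Integer::sum);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // 12 and 20 sit exactly on an exclusive upper bound, so neither is counted.
        System.out.println(ageRanges(11, 12, 15, 19, 20));
    }
}
```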

//4.Date Histogram
GET /example/student/_search
{
  "size": 0,
  "aggs": {
    "agg_year": {
      "date_histogram": {
        "field": "createTime",
        "interval": "day", // bucket by year, month or day (6.x parameter; from 7.2 calendar_interval is preferred)
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}
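
Conceptually, a date histogram with a day interval truncates every timestamp to its day and counts per day. A minimal in-memory sketch in Java, assuming the `yyyy-MM-dd HH:mm:ss` format used for `createTime` in this article; `DateHistogramSketch` is a hypothetical name, and unlike the real aggregation this version never emits empty buckets for days without documents.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory analogue of a date_histogram with a one-day interval, keys ascending.
// Hypothetical helper for illustration only.
class DateHistogramSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static Map<LocalDate, Long> countPerDay(List<String> createTimes) {
        Map<LocalDate, Long> buckets = new TreeMap<>(); // TreeMap keeps the days in ascending order
        for (String createTime : createTimes) {
            LocalDate day = LocalDateTime.parse(createTime, FORMAT).toLocalDate();
            buckets.merge(day, 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(countPerDay(Arrays.asList(
                "2020-12-10 00:41:00", "2020-12-10 09:00:00", "2020-12-11 01:00:00")));
    }
}
```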

Use cases

  1. Search engines
  2. The ELK stack (log analysis)
  • E (Elasticsearch), L (Logstash), K (Kibana)
  • Diagram of the ELK architecture (image from the web, not reproduced here)

Installation

See my separate blog post.

Connecting to ES from Java

Transport

Accesses ES over TCP (Java only). According to the official roadmap, TransportClient is deprecated starting with 7.0 and removed completely in 8.0.

REST

Accesses ES through the HTTP API (no language restriction)

Low Level REST Client (rarely used)
High Level REST Client (commonly used, recommended)
  1. Add the dependencies
   <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>6.8.10</version>
   </dependency>
   <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>6.8.10</version>
   </dependency>
  2. Configuration (.properties file)
# Elasticsearch connection settings
es.host=localhost
es.port=9200
es.scheme=http
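  3. Configuration class
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration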
public class ESConfig {

    @Value("${es.host}")
    private String host;
    @Value("${es.port}")
    private Integer port;
    @Value("${es.scheme}")
    private String scheme;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost(host, port, scheme) ));
    }

}

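  4. Utility class
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
// Assumption: the bean marks its id field with Spring Data's @Id; swap in your own annotation if different
import org.springframework.data.annotation.Id;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.lang.reflect.Field;
import java.util.List;

@Component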
@Slf4j
public class EsUtil<T> {

    public static final char UNDERLINE = '_';

    @Autowired
    @Qualifier(value = "restHighLevelClient")
    private RestHighLevelClient client;

    /**
     * Index a single document
     *
     * @param t
     * @return
     */
    public boolean save(T t) {
        String indexName = camelToUnderline(t.getClass().getSimpleName(), 1);
        // Read the value of the field annotated with @Id
        String id = JSON.parseObject(JSON.toJSONString(t)).getString(getIdName(t));
        IndexRequest indexRequest = new IndexRequest(indexName, indexName, id);
        indexRequest.source(JSON.toJSONString(t), XContentType.JSON);
        try {
            IndexResponse indexResponse = client.index(indexRequest, RequestOptions.DEFAULT);
            log.info("restHighLevelClient save index success and result is : {}", indexResponse.getResult());
            return true;
        } catch (IOException e) {
            log.error("restHighLevelClient save index failed", e);
        }
        return false;
    }

    /**
     * Bulk insert
     *
     * @param ts
     * @return
     */
    public boolean saveAll(List<T> ts) {
        BulkRequest bulkRequest = new BulkRequest();
        for (T t : ts) {
            String indexName = camelToUnderline(t.getClass().getSimpleName(), 1);
            // Read the value of the field annotated with @Id
            String id = JSON.parseObject(JSON.toJSONString(t)).getString(getIdName(t));
            IndexRequest indexRequest = new IndexRequest(indexName, indexName, id);
            indexRequest.source(JSON.toJSONString(t), XContentType.JSON);
            bulkRequest.add(indexRequest);
        }
        try {
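            // Send the bulk request; individual items can fail even when the call itself succeeds
            BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
            return !bulkResponse.hasFailures();
        } catch (IOException e) {
            log.error("restHighLevelClient saveAll index failed", e);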
        }
        return false;
    }

    /**
     * Delete by id
     *
     * @param id
     * @param classT
     * @return
     */
    public boolean deleteById(String id, Class<T> classT) {
        String indexName = camelToUnderline(classT.getSimpleName(), 1);
        try {
            // 1. Build the delete request with index, type and id
            DeleteRequest deleteRequest = new DeleteRequest(indexName, indexName, id);

            // 2. Send the request
            client.delete(deleteRequest, RequestOptions.DEFAULT);
            return true;
        } catch (IOException e) {
            log.error("restHighLevelClient deleteById failed", e);
        }
        return false;
    }

    /**
     * Query
     *
     * @param queryBuilder
     * @param t
     * @return
     */
    public JSONArray find(QueryBuilder queryBuilder, Class<T> t) {
        JSONArray results = new JSONArray();
        String indexName = camelToUnderline(t.getSimpleName(), 1);
        SearchRequest searchRequest = new SearchRequest(indexName);
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(queryBuilder);
        searchRequest.source(sourceBuilder);
        try {
            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            SearchHit[] hits = searchResponse.getHits().getHits();
            for (SearchHit hit : hits) {
                String sourceAsString = hit.getSourceAsString();
                results.add(JSONObject.parseObject(sourceAsString));
            }
            return results;
        } catch (IOException e) {
            log.error("restHighLevelClient find failed", e);
        }
        return results;
    }

    /**
     * Paginated query
     * @param queryBuilder
     * @param sortBuilderList
     * @param pageNum
     * @param pageSize
     * @param t
     * @return
     */
    public JSONObject findPage(QueryBuilder queryBuilder,List<SortBuilder> sortBuilderList,Integer pageNum,Integer pageSize, Class<T> t) {
        JSONObject result  = new JSONObject();
        JSONObject pageInfo = new JSONObject();
        JSONArray list = new JSONArray();
        String indexName = camelToUnderline(t.getSimpleName(), 1);
        SearchRequest searchRequest = new SearchRequest(indexName);
        // Build the search source
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(queryBuilder);
        // from is a document offset, so convert the 1-based page number
        sourceBuilder.from((pageNum - 1) * pageSize);
        sourceBuilder.size(pageSize);
//        sourceBuilder.sort("_score", SortOrder.DESC)
//                .sort("heat", SortOrder.DESC);
        if (sortBuilderList != null) {
            for (SortBuilder sortBuilder : sortBuilderList) {
                sourceBuilder.sort(sortBuilder);
            }
        }
        searchRequest.source(sourceBuilder);
        try {
            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            SearchHit[] hits = searchResponse.getHits().getHits();
            for (SearchHit hit : hits) {
                String sourceAsString = hit.getSourceAsString();
                list.add(JSONObject.parseObject(sourceAsString));
            }
            pageInfo.put("totalPages",(searchResponse.getHits().totalHits+pageSize-1)/pageSize);
            pageInfo.put("totalElements",searchResponse.getHits().totalHits);
            pageInfo.put("pageNum",pageNum);
            pageInfo.put("pageSize",pageSize);
            result.put("pageInfo",pageInfo);
            result.put("list",list);
            return result;
        } catch (IOException e) {
            log.error("restHighLevelClient findPage failed", e);
        }
        return result;
    }

    /**
     * Convert camelCase to snake_case (charType 2 = upper case, anything else = lower case)
     *
     * @param param
     * @param charType
     * @return
     */
    public static String camelToUnderline(String param, Integer charType) {
        if (param == null || "".equals(param.trim())) {
            return "";
        }
        int len = param.length();
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            char c = param.charAt(i);
            if (Character.isUpperCase(c) && i > 0) {
                sb.append(UNDERLINE);
            }
            if (charType == 2) {
                // convert everything to upper case
                sb.append(Character.toUpperCase(c));
            } else {
                // convert everything to lower case
                sb.append(Character.toLowerCase(c));
            }
        }

        return sb.toString();
    }

    /**
     * Find the name of the field annotated with @Id
     *
     * @param instance
     * @return
     */
    public static String getIdName(Object instance) {
        try {
            Class<?> clazz = instance.getClass();
            Field[] fields = clazz.getDeclaredFields();
            for (int i = 0; i < fields.length; i++) {
                boolean annotationPresent = fields[i].isAnnotationPresent(Id.class);
                if (annotationPresent) {
                    // The name of the annotated field is used as the id property
                    String idName = fields[i].getName();
                    return idName;
                }
            }
        } catch (Exception e) {
            log.error("not found id");
        }
        return "";
    }

}
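
A note on pagination: the `from` parameter of a search is a document offset, not a page index, so a 1-based page number has to be multiplied out before it is passed to `SearchSourceBuilder.from`. The arithmetic is small enough to check in isolation. The sketch below uses a hypothetical helper class `PageMathSketch`; it is not part of the utility class above.

```java
// Pagination arithmetic for from/size style paging.
// Hypothetical helper for illustration; pageNum is 1-based.
class PageMathSketch {
    // Offset of the first document on the given page.
    static int fromOffset(int pageNum, int pageSize) {
        if (pageNum < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNum and pageSize must be >= 1");
        }
        return (pageNum - 1) * pageSize;
    }

    // Number of pages needed for totalHits documents (ceiling division).
    static long totalPages(long totalHits, int pageSize) {
        return (totalHits + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(3, 6));
        System.out.println(totalPages(13, 6));
    }
}
```

Passing `pageNum - 1` directly as `from` only works for the first page; on later pages it shifts the window by one document instead of one page.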

  5. Usage example
  • Bean
@Data
public class EsTest {
    @Id
    private String id;


    private String name;

    private Integer age;


    public EsTest(String id, String name, Integer age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }
}
  • Test
/**
 * Tests for the ES utility methods
 * @author swhuan
 */
@RequestMapping("/test")
@RestController
public class TestEsController {
    @Autowired
    private EsUtil<EsTest> esUtil;

    /**
     * Bulk insert
     * @return
     */
    @GetMapping(value = "testCrud")
    public Object testCrud() {
        List<EsTest> esTests = new ArrayList<>();
        esTests.add(new EsTest("1", "张三", 12));
        esTests.add(new EsTest("2", "李四", 18));
        esTests.add(new EsTest("3", "王五", 22));
        esTests.add(new EsTest("4", "赵六", 25));
        esTests.add(new EsTest("5", "赵六", 27));
        esUtil.saveAll(esTests);
        return "success";
    }

    /**
     * Basic query
     * @return
     */
    @GetMapping(value = "testQuery")
    public Object testQuery(){

        // Condition: name = "张三" OR (name LIKE "%赵六%" AND age = 27) OR (age BETWEEN 18 AND 27)
        BoolQueryBuilder builder = QueryBuilders.boolQuery()
                // term query (exact match on the keyword sub-field)
                .should(QueryBuilders.termQuery("name.keyword","张三"))
                .should(QueryBuilders.boolQuery()
                        // match query (analyzed)
                        .must(QueryBuilders.matchQuery("name","赵六"))
                        // exact match
                        .must(QueryBuilders.termQuery("age","27")))
                // range query
                .should(QueryBuilders.rangeQuery("age").from(18).to(27));
        return esUtil.find(builder,EsTest.class);

    }

    /**
     * Paginated query
     * @return
     */
    @GetMapping(value = "testPageQuery")
    public Object testPageQuery(){
        BoolQueryBuilder builder = QueryBuilders.boolQuery()
                .should(QueryBuilders.rangeQuery("age").from(18).to(27));
        List<SortBuilder> sortBuilderList = new ArrayList<>();
        sortBuilderList.add(SortBuilders.fieldSort("age").order(SortOrder.DESC));
        return esUtil.findPage(builder,sortBuilderList,1,6,EsTest.class);
    }

}

Integrating ES with Spring Boot

ElasticsearchTemplate approach

ElasticsearchRepository approach

  1. Add the dependency (mind the Spring / ES version compatibility)
 <dependency>
           <groupId>org.springframework.boot</groupId>
           <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
 </dependency>
  2. Configuration (.properties file)
spring.elasticsearch.rest.uris=http://localhost:9200
  3. Usage example
  • Bean

@Data
@Document(indexName = "test",shards = 5,replicas = 0,createIndex = true)
public class EsTest {
    @Id
    private String id;

    @Field(type = FieldType.Text)
    private String name;

    @Field(type = FieldType.Integer)
    private Integer age;


    public EsTest(String id, String name, Integer age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }
}

  • Dao
public interface EsTestRepository extends ElasticsearchRepository<EsTest,String> {

List<EsTest> findAllByNameIn(List<String> names);

}
  • Test
Usage is the same as with an ordinary Spring Data JPA repository.

/**
* Insert
*/
 public Object testCrud() {
        List<EsTest> esTests = new ArrayList<>();
        esTests.add(new EsTest("1", "张三", 12));
        esTests.add(new EsTest("2", "李四", 18));
        esTests.add(new EsTest("3", "王五", 22));
        Iterable<EsTest> esTests1 = esTestRepository.saveAll(esTests);
        return esTests1;
 }

 /**
 * Query
 */
  public Object testQuery() {
        List<String> names = new ArrayList<>();
        names.add("张三");
        names.add("李四");
        Iterable<EsTest> esTests1 = esTestRepository.findAllByNameIn(names);
        return esTests1;
    }
