Elasticsearch stores data as documents, serialized to JSON. Inverted-index concepts:
Document: each record of data is one document
Term: a word obtained by splitting a document's text by meaning
Field: a field in the JSON document
Index: a collection of documents of the same type
Mapping: constraints on the documents in an index, e.g. field names and types
Division of labor: the database handles transactional operations; es handles search, analysis, and computation over massive data.
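The inverted-index idea above can be sketched in a few lines of plain Java (a toy illustration, not es's actual data structure; the analyzer is replaced by a whitespace split): each term maps to the set of ids of the documents that contain it.

```java
import java.util.*;

public class InvertedIndex {
    // term -> ids of documents containing that term
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // a real engine would run an analyzer here; we just split on whitespace
    public void addDocument(int docId, String text) {
        for (String term : text.split("\\s+")) {
            index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // look up the posting list for one term
    public Set<Integer> search(String term) {
        return index.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.addDocument(1, "hotel near the river");
        idx.addDocument(2, "cheap hotel downtown");
        System.out.println(idx.search("hotel")); // [1, 2]
        System.out.println(idx.search("river")); // [1]
    }
}
```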
Installing Elasticsearch
We will also deploy a kibana container later, so the es and kibana containers need to reach each other. First create a network:
docker network create es-net
Upload es.tar to the virtual machine, then load the image:
# load the image
docker load -i es.tar
Do the same for the kibana tar package.
Run the docker command to deploy a single-node es:
docker run -d \
--name es \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
-e "discovery.type=single-node" \
-v es-data:/usr/share/elasticsearch/data \
-v es-plugins:/usr/share/elasticsearch/plugins \
--privileged \
--network es-net \
-p 9200:9200 \
-p 9300:9300 \
elasticsearch:7.12.1
Command explanation:
- -e "cluster.name=es-docker-cluster": set the cluster name
- -e "http.host=0.0.0.0": listen address; allows access from outside
- -e "ES_JAVA_OPTS=-Xms512m -Xmx512m": JVM heap size
- -e "discovery.type=single-node": single-node mode, not a cluster
- -v es-data:/usr/share/elasticsearch/data: named volume bound to es's data directory
- -v es-logs:/usr/share/elasticsearch/logs: named volume bound to es's log directory
- -v es-plugins:/usr/share/elasticsearch/plugins: named volume bound to es's plugins directory
- --privileged: grant access to the volumes
- --network es-net: join the network named es-net
- -p 9200:9200: port mapping (9200 is the HTTP port; 9300 is the inter-node transport port)
To verify, open http://your-ip:9200 in a browser.
Deploying kibana
Run the docker command to deploy kibana:
docker run -d \
--name kibana \
-e ELASTICSEARCH_HOSTS=http://es:9200 \
--network=es-net \
-p 5601:5601 \
kibana:7.12.1
- --network es-net: join the es-net network, the same one as elasticsearch
- -e ELASTICSEARCH_HOSTS=http://es:9200: the elasticsearch address; kibana is on the same network, so the container name es resolves directly
- -p 5601:5601: port mapping
kibana is usually slow to start; wait a while, and follow its logs with:
docker logs -f kibana
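If preferred, the two docker run commands above can be written as one compose file (a sketch reusing the image tags, network, and volume names from the text; start it with docker compose up -d):

```yaml
version: "3"
services:
  es:
    image: elasticsearch:7.12.1
    environment:
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - discovery.type=single-node
    volumes:
      - es-data:/usr/share/elasticsearch/data
      - es-plugins:/usr/share/elasticsearch/plugins
    privileged: true
    networks:
      - es-net
    ports:
      - "9200:9200"
      - "9300:9300"
  kibana:
    image: kibana:7.12.1
    environment:
      - ELASTICSEARCH_HOSTS=http://es:9200
    networks:
      - es-net
    ports:
      - "5601:5601"
volumes:
  es-data:
  es-plugins:
networks:
  es-net:
```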
ik analyzer
Online install:
# enter the container (named es above)
docker exec -it es /bin/bash
# download and install the plugin online
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.12.1/elasticsearch-analysis-ik-7.12.1.zip
# exit the container
exit
# restart the container
docker restart es
Offline install (local load)
Installing offline requires the location of elasticsearch's plugins directory. Since we mounted it as a named volume, inspect the volume to find its host path:
docker volume inspect es-plugins
Result:
[
{
"CreatedAt": "2022-05-06T10:06:34+08:00",
"Driver": "local",
"Labels": null,
"Mountpoint": "/var/lib/docker/volumes/es-plugins/_data",
"Name": "es-plugins",
"Options": null,
"Scope": "local"
}
]
So the plugins directory is mounted at: /var/lib/docker/volumes/es-plugins/_data.
Upload the ik folder into that _data directory, then restart the container:
# restart the container
docker restart es
# follow the es logs
docker logs -f es
Testing:
The IK analyzer has two modes:
- ik_smart: fewest splits; smart, coarse-grained segmentation
- ik_max_word: most splits; fine-grained segmentation
In the directory /var/lib/docker/volumes/es-plugins/_data/ik/config, the file IKAnalyzer.cfg.xml configures the extension-dictionary and stopword files; list the terms to add or to stop in those files.
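For reference, a minimal IKAnalyzer.cfg.xml might look like the sketch below; ext.dic and stopword.dic are file names you choose (one term per line), placed in the same config directory:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- extension dictionary: extra terms IK should recognize -->
    <entry key="ext_dict">ext.dic</entry>
    <!-- stopword dictionary: terms IK should drop -->
    <entry key="ext_stopwords">stopword.dic</entry>
</properties>
```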
Index operations
Mapping properties:
type: the field's data type
strings: text (analyzed), keyword (exact value, not analyzed)
numbers: long, integer, short, byte, double, float
boolean: boolean
date: date
object: object
index: whether to build an index for the field; default true
analyzer: which analyzer to use
properties: sub-fields of this field
Create an index:
#create an index
PUT /index-name
{
"mappings":{
"properties":{
"info":{
"type":"text",
"analyzer":"ik_smart"
},
"email":{
"type":"keyword",
"index":false
},
"name":{
"type":"object",
"properties":{
"firstname":{
"type":"keyword"
},
"lastName":{
"type":"keyword"
}
}
}
}
}
}
#get an index
GET /index-name
#delete an index
DELETE /index-name
eg: DELETE /zz
#add a new field to an existing index
PUT /index-name/_mapping
{
"properties":{
"age":{
"type":"long"
}
}
}
eg:
PUT /zz/_mapping
{
"properties":{
"age":{
"type":"long"
}
}
}
Adding documents
DSL syntax:
#add a document
POST /index-name/_doc/doc-id
{
  "field1": "value1",
  "field2": "value2",
  "field3": {
    "subfield1": "value3",
    "subfield2": "value4"
  }
}
eg:
POST /zz/_doc/1
{
"info":"码农的历程",
"email":"23651@12",
"name":{
"firstName":"云",
"lastName":"赵"
}
}
#get a document
GET /index-name/_doc/doc-id
eg: GET /zz/_doc/1
#delete a document
DELETE /index-name/_doc/doc-id
eg: DELETE /zz/_doc/1
Updating documents
Method 1: full replacement; deletes the old document and adds a new one
PUT /index-name/_doc/doc-id
{
  "field1": "value1",
  "field2": "value2"
}
eg:
PUT /zz/_doc/1
{
"info":"码农的33历程",
"email":"23651@12",
"name":{
"firstName":"云",
"lastName":"赵"
}
}
Method 2: incremental update; changes only the specified fields
POST /index-name/_update/doc-id
{
  "doc": {
    "field-name": "new value"
  }
}
eg:
POST /zz/_update/1
{
"doc":{
"email":"3652621@ddd"
}
}
RestClient: index operations
1. Add the elasticsearch rest-high-level-client dependency, pinning the version to match the server:
<properties>
<java.version>1.8</java.version>
<elasticsearch.version>7.12.1</elasticsearch.version>
</properties>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-client</artifactId>
</dependency>
2. Create the client:
package cn.itcast.hotel;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
/**
 * Tests for index operations with RestHighLevelClient.
 */
public class HotelIndexTest {

    private RestHighLevelClient client;

    @BeforeEach
    void setUp() throws Exception {
        // address of the es instance deployed above
        this.client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://192.168.216.132:9200")));
    }

    @AfterEach
    void tearDown() throws Exception {
        this.client.close();
    }

    @Test
    void testInit() {
        System.out.println(client);
    }
}
Creating an index:
@Test
void createHotelIndex() throws IOException {
    // 1. create the request object; "hotel" is the index name
    CreateIndexRequest request = new CreateIndexRequest("hotel");
    // 2. request body: the mapping DSL (MAPPING_TEMPLATE is a String constant holding it)
    request.source(MAPPING_TEMPLATE, XContentType.JSON);
    // 3. send the request
    client.indices().create(request, RequestOptions.DEFAULT);
}
Deleting an index:
@Test
void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("hotel");
    // send the request
    client.indices().delete(request, RequestOptions.DEFAULT);
}
Checking whether an index exists:
@Test
void testExistsIndex() throws IOException {
    GetIndexRequest request = new GetIndexRequest("hotel");
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    System.err.println(exists ? "index exists" : "index does not exist");
}
Indexing a document:
@Test
void testAddDocument() throws Exception {
    // load the record from the database, then convert it to the es document shape
    Hotel hotel = hotelService.getById(61083L);
    HotelDoc hotelDoc = new HotelDoc(hotel);
    IndexRequest request = new IndexRequest("hotel").id(hotel.getId().toString());
    request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
    client.index(request, RequestOptions.DEFAULT);
}
Getting a document:
@Test
void testGetDocument() throws Exception {
    GetRequest request = new GetRequest("hotel", "61083");
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    String sourceAsString = response.getSourceAsString();
    // deserialize the _source JSON back into a HotelDoc
    HotelDoc hotelDoc = JSON.parseObject(sourceAsString, HotelDoc.class);
    System.out.println(hotelDoc);
}
Updating a document:
@Test
void testUpdateDocument() throws Exception {
    UpdateRequest request = new UpdateRequest("hotel", "61083");
    // partial update: only the listed fields change
    request.doc(
            "price", "666"
    );
    client.update(request, RequestOptions.DEFAULT);
}
Deleting a document:
@Test
void testDeleteDocument() throws Exception {
    client.delete(new DeleteRequest("hotel", "61083"), RequestOptions.DEFAULT);
}
DSL query syntax
match_all: return all documents; mainly used for testing
Full-text queries: analyze the user's input, then match the resulting terms against the inverted index, e.g. match, multi_match
Term-level (exact) queries: match exact values without analysis, e.g. ids, range, term
Geo queries: query by latitude/longitude, e.g. geo_distance, geo_bounding_box
Compound queries: combine the conditions above into one query, e.g. bool, function_score
Basic query syntax:
GET /hotel/_search
{
"query": {
"QUERY_TYPE": {
"QUERY_FIELD": "VALUE"
}
}
}
#match: single-field full-text query
GET /hotel/_search
{
"query": {
"match": {
"FIELD": "VALUE"
}
}
}
#multi_match: query multiple fields
GET /hotel/_search
{
"query": {
"multi_match": {
"query": "如家",
"fields": ["brand","name","business"]
}
}
}
#term: exact query
GET /hotel/_search
{
"query": {
"term": {
"city": {
"value": "上海"
}
}
}
}
#range: range query
GET /hotel/_search
{
"query": {
"range": {
"price": {
"gte": 1000,
"lte": 2000
}
}
}
}
#geo_bounding_box: match points inside a rectangle
GET /hotel/_search
{
"query": {
"geo_bounding_box": {
"location": {
"top_left": {
"lat":31.1,
"lon":121.5
},
"bottom_right":{
"lat":30.9,
"lon":121.7
}
}
}
}
}
#geo_distance: match points within a distance of a center
GET /hotel/_search
{
"query": {
"geo_distance":{
"distance":"2km",
"location": "31.21,121.5"
}
}
}
#compound queries
Relevance scoring in ES
A function_score query has three parts: a filter condition, a scoring function, and a boost mode (how the function score combines with the query score).
TF (term frequency) = times the term appears in the document / total terms in the document
IDF (inverse document frequency) = log(total documents / documents containing the term)
TF-IDF score = TF × IDF
BM25, the default algorithm in current es versions, refines TF-IDF: its score saturates as term frequency grows instead of increasing without bound.
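The formulas above, sketched in plain Java (a toy scorer; es's BM25 additionally normalizes by document length and saturates tf):

```java
import java.util.*;

public class TfIdfScorer {
    /** tf = occurrences of the term in the doc / total terms in the doc */
    static double tf(String term, List<String> docTerms) {
        long hits = docTerms.stream().filter(term::equals).count();
        return (double) hits / docTerms.size();
    }

    /** idf = log(total docs / docs containing the term); toy version, assumes the term occurs somewhere */
    static double idf(String term, List<List<String>> corpus) {
        long containing = corpus.stream().filter(d -> d.contains(term)).count();
        return Math.log((double) corpus.size() / containing);
    }

    /** tf-idf score of one term for one document */
    static double score(String term, List<String> doc, List<List<String>> corpus) {
        return tf(term, doc) * idf(term, corpus);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("如家", "酒店", "外滩"),
                Arrays.asList("酒店", "上海"));
        // "如家" appears in 1 of 2 docs -> idf = ln 2; tf in doc 0 = 1/3
        System.out.println(score("如家", corpus.get(0), corpus));
    }
}
```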
FunctionScoreQuery can modify a document's relevance score. Available functions:
- script_score: custom scoring via a script you write
- weight: a fixed score when a condition matches
- random_score: random scoring
- field_value_factor: use a field's value as a scoring factor
- decay functions: gauss, linear, exp (the closer, the better the score)
#example: boost "如家" hotels by a weight of 10, added to the query score
GET /hotel/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "all": "外滩"
        }
      },
      "functions": [
        {
          "filter": {
            "term": {
              "brand": "如家"
            }
          },
          "weight": 10
        }
      ],
      "boost_mode": "sum"
    }
  }
}
Compound bool query clauses:
(1) must: every sub-query must match; like AND
(2) should: sub-queries may match; like OR
(3) must_not: must not match; does not affect the score; like NOT
(4) filter: must match; does not affect the score
GET /hotel/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "如家"
}
}
],
"must_not": [
{
"range": {
"price": {
"gt": 400
}
}
}
],
"filter": [
{
"geo_distance": {
"distance": "10km",
"location": {
"lat": 31.21,
"lon": 121.5
}
}
}
]
}
}
}
Sorting:
#regular sorting: by relevance score descending, then price ascending
GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}
Sorting by distance from a given point:
GET /hotel/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"_geo_distance": {
"location": {
"lat": 31.03,
"lon": 121.61228
},
"order": "asc",
"unit": "km"
}
}
]
}
es returns 10 results by default. Control paging with from (the offset of the first document) and size (the number of documents to return):
GET /hotel/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"price": {
"order": "asc"
}
}
],
"from": 10,
"size": 10
}
Because es is distributed, deep paging is expensive: the result window (from + size) is capped at 10000 by default. Alternatives:
search after: requires a sort; each page is fetched starting from the sort values of the previous page's last hit.
scroll: takes a snapshot of the sorted data and keeps it in memory; no longer recommended for deep paging.
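A search after request might look like the sketch below; the id tiebreaker field and the values in search_after are hypothetical and would come from the last hit of the previous page:

```
GET /hotel/_search
{
  "query": { "match_all": {} },
  "size": 10,
  "sort": [
    { "price": "asc" },
    { "id": "asc" }
  ],
  "search_after": [300, "60398"]
}
```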
Highlighting
Emphasize the search keywords in the results: es wraps the matching terms in tags (by default <em>), and the page styles those tags with css.
#by default the highlight field must be the same field the query searches ("name" here);
#if they differ, set require_field_match to false
GET /hotel/_search
{
  "query": {
    "match": {
      "name": "如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "require_field_match": "true"
      }
    }
  }
}
RestClient: querying documents
match_all query:
@Test
void testMatchAll() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.matchAllQuery());
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    SearchHits searchHits = response.getHits();
    long total = searchHits.getTotalHits().value;
    System.out.println("found " + total + " hits");
    SearchHit[] hits = searchHits.getHits();
    Arrays.stream(hits).map(SearchHit::getSourceAsString).forEach(System.out::println);
}
Paging and sorting:
@Test
void testPageAndSort() throws IOException {
    int page = 1, size = 5;
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.matchAllQuery());
    request.source().sort("price", SortOrder.ASC); // sort by price ascending
    request.source().from((page - 1) * size).size(size); // offset and page size
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    SearchHits searchHits = response.getHits();
    long total = searchHits.getTotalHits().value;
    System.out.println("found " + total + " hits");
    SearchHit[] hits = searchHits.getHits();
    Arrays.stream(hits).map(SearchHit::getSourceAsString).forEach(System.out::println);
}
Highlighting:
@Test
void testHighlight() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.matchQuery("name", "如家"));
    // highlight the name field; requireFieldMatch(false) lets it differ from the queried field
    request.source().highlighter(new HighlightBuilder().field("name").requireFieldMatch(false));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}

private void handleResponse(SearchResponse response) {
    SearchHits searchHits = response.getHits();
    long total = searchHits.getTotalHits().value;
    System.out.println("found " + total + " hits");
    for (SearchHit hit : searchHits.getHits()) {
        String json = hit.getSourceAsString();
        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        if (!CollectionUtils.isEmpty(highlightFields)) {
            // replace the raw name with the highlighted fragment
            HighlightField highlightField = highlightFields.get("name");
            String name = highlightField.getFragments()[0].toString();
            hotelDoc.setName(name);
        }
        System.out.println("hotelDoc: " + hotelDoc);
    }
}