Getting Started with Elasticsearch (1)

GalenZhang888

Category: JAVA  Tags: elasticsearch, search engine, big data

Original article: https://blog.csdn.net/zf14840/article/details/126068240

Test environment: three CentOS 7 servers,
run as virtual machines with VirtualBox and Vagrant installed on Windows 10

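Introduction to Elasticsearch

Elasticsearch is a search server built on Lucene. It provides a distributed full-text search engine behind a RESTful web interface.
Elasticsearch is written in Java and is a popular enterprise search engine.

Typical uses:
Full-text keyword search for end users
A solution for processing and analyzing massive amounts of enterprise data (Elasticsearch, Logstash, Kibana)
An OLAP-style database for statistical analysis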

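Distributed data storage in Elasticsearch
Elasticsearch splits the data of an index into multiple shards; together the shards make up the complete data set, and they can be placed on different nodes of the cluster. As data grows, the shards can be spread over more servers by adding nodes (the number of primary shards of an index is set when the index is created), which gives load balancing and horizontal scaling.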
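
How a document ends up on a particular shard can be sketched as follows. Elasticsearch routes a document with a formula of the form shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document id. The sketch below is only an illustration under assumptions: zlib.crc32 stands in for the Murmur3 hash that Elasticsearch uses, and the function names are made up.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (by default the document id) to a shard number."""
    # Assumption: crc32 is a stand-in hash; Elasticsearch uses Murmur3.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document ids by the shard they are routed to."""
    shards = {number: [] for number in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    for shard, ids in distribute([str(i) for i in range(1, 11)], 3).items():
        print(shard, ids)
```

Because the shard number depends on the shard count, changing the count would send existing ids to different shards, which is why the number of primary shards is fixed per index.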

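Inverted index
By default Elasticsearch indexes every field; you only need to say so explicitly when a field should not be indexed.

A traditional store maps record -> words
An inverted index maps word -> records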

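Building the inverted index with tokenization
When a record is saved, its text is not simply written to storage as-is.
The system first tokenizes (analyzes) the data and then stores it in the inverted index structure.
When a user searches, the search keywords are tokenized as well and then matched against the index; the hits are scored and sorted by how well they match.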
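
The tokenize, index, search and score steps above can be sketched in a few lines. This is a minimal illustration, not how Lucene is implemented: the tokenizer is a lowercase-and-split stand-in for a real analyzer, and the score is simply the number of matched tokens rather than the BM25 similarity that Elasticsearch 7 uses by default.

```python
from collections import defaultdict


def tokenize(text):
    """Stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map every token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Tokenize the query too, then rank documents by matched-token count."""
    scores = defaultdict(int)
    for token in tokenize(query):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {1: "operation red sea", 2: "incident red sea", 3: "operation meigong river"}
index = build_inverted_index(docs)
print(search(index, "operation red"))  # [(1, 2), (2, 1), (3, 1)]
```

Document 1 contains both query tokens and ranks first; documents 2 and 3 each contain one.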

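Lucene inverted index structure

On top of the term dictionary of the inverted index, Lucene adds another layer, the term index, to locate terms quickly
The term index is cached in memory
Term index -> Term dictionary -> Posting list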
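
The three-level lookup can be illustrated with a simplified sketch. Note the assumptions: Lucene's real term index is a compact FST over term prefixes, while the sketch uses a plain sparse index (the first term of every block) searched with bisect; the class name and block size are made up for illustration.

```python
import bisect


class TermLookup:
    def __init__(self, postings_by_term, block_size=2):
        # Term dictionary: all terms in sorted order (on disk in Lucene).
        self.terms = sorted(postings_by_term)
        # Posting lists, stored in the same order as the terms.
        self.postings = [sorted(postings_by_term[term]) for term in self.terms]
        self.block_size = block_size
        # Term index: only the first term of each block (small, kept in memory).
        self.index_terms = self.terms[::block_size]

    def lookup(self, term):
        # Step 1: the term index tells us which dictionary block to read.
        block = bisect.bisect_right(self.index_terms, term) - 1
        if block < 0:
            return []
        # Step 2: scan only that block of the term dictionary.
        start = block * self.block_size
        end = min(start + self.block_size, len(self.terms))
        for position in range(start, end):
            if self.terms[position] == term:
                # Step 3: return the posting list of the term.
                return self.postings[position]
        return []


lookup = TermLookup({
    "operation": [1, 3], "red": [1, 2], "sea": [1, 2],
    "incident": [2], "meigong": [3], "river": [3],
})
print(lookup.lookup("red"))  # [1, 2]
```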

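Forward index (doc values, column-oriented storage)
An inverted index is very efficient at finding the documents that contain a given term, but performs poorly at the opposite operation, such as finding which terms a given document contains
So the inverted index is the most efficient structure for search, but sorting, aggregations and other operations that need the values of a field are inefficient on it; these use doc_values instead
Doc values are a column-oriented storage structure; they are enabled by default for every field type that supports them (analyzed text fields do not)

Column-oriented storage suits sorting, aggregations and scripts that access field values. It also compresses well, especially for numeric types; compression greatly reduces disk usage and speeds up access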
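
A small sketch of the idea behind column-oriented storage: each field is kept as one array with a slot per document, so sorting or averaging reads a single array. The data reuses the sample movies from later in this article; the helper name is an assumption, and real doc values are an on-disk, compressed structure rather than a Python list.

```python
docs = [
    {"_id": 1, "name": "operation red sea", "doubanScore": 8.5},
    {"_id": 2, "name": "incident red sea", "doubanScore": 5.0},
    {"_id": 3, "name": "operation meigong river", "doubanScore": 8.0},
]


def build_doc_values(docs, field):
    """Column store: one value per document, addressed by document position."""
    return [doc[field] for doc in docs]


scores = build_doc_values(docs, "doubanScore")
ids = build_doc_values(docs, "_id")

# Sorting and aggregating read only the needed column, never the token index.
order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print([ids[i] for i in order])    # document ids, highest score first
print(sum(scores) / len(scores))  # average score
```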

Official download links
https://www.elastic.co/cn/downloads/past-releases/elasticsearch-7-8-0
https://github.com/medcl/elasticsearch-analysis-ik/releases
https://www.elastic.co/cn/downloads/past-releases/kibana-7-8-0
https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.8.0/elasticsearch-analysis-ik-7.8.0.zip

https://blog.csdn.net/wpc2018/article/details/121156880
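Elasticsearch: downloading and using the IK analyzer

After downloading the IK analyzer, unzip the package into the plugins directory of the Elasticsearch installation (the installation steps later in this article use plugins/ik)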

System configuration changes

Raise the maximum number of open files to 65536

sudo vi /etc/security/limits.conf
# append the following lines at the end of the file
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 65536

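Raise the number of virtual memory areas (memory maps) a process may own

sudo vi /etc/sysctl.conf
# append the following line at the end of the file
vm.max_map_count=262144


# reload /etc/sysctl.conf so the setting takes effect without a reboot
sudo sysctl -p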

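Raise the maximum number of threads (nproc) to 4096

sudo vi /etc/security/limits.d/20-nproc.conf
# change the following line
* soft nproc 4096

Distribute the modified files to the other servers (xsync is a custom file-distribution script, not a standard command)
sudo xsync /etc/security/limits.conf /etc/security/limits.d/20-nproc.conf /etc/sysctl.conf
Reboot the Linux servers so that the configuration takes effect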
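
After the reboot, the two values that matter most for starting Elasticsearch can be checked from a script. The sketch below assumes a Linux host and uses the thresholds from this section (65536 open files, vm.max_map_count 262144); the function names are illustrative only.

```python
import resource

REQUIRED_NOFILE = 65536
REQUIRED_MAX_MAP_COUNT = 262144


def find_problems(nofile_soft, max_map_count):
    """Compare the given values with the thresholds used in this article."""
    problems = []
    unlimited = nofile_soft == resource.RLIM_INFINITY
    if not unlimited and nofile_soft < REQUIRED_NOFILE:
        problems.append(f"nofile soft limit {nofile_soft} < {REQUIRED_NOFILE}")
    if max_map_count < REQUIRED_MAX_MAP_COUNT:
        problems.append(f"vm.max_map_count {max_map_count} < {REQUIRED_MAX_MAP_COUNT}")
    return problems


def read_current_values():
    """Read the limits of the current process and kernel (Linux only)."""
    nofile_soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    with open("/proc/sys/vm/max_map_count") as handle:
        max_map_count = int(handle.read().strip())
    return nofile_soft, max_map_count


if __name__ == "__main__":
    try:
        found = find_problems(*read_current_values())
    except OSError as error:
        found = [f"could not read system values: {error}"]
    for line in found or ["all checks passed"]:
        print(line)
```

Run it as the user that will start Elasticsearch, since limits.conf entries apply per user session.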

Deploying the Elasticsearch cluster

tar -zxvf elasticsearch-7.8.0-linux-x86_64.tar.gz -C /opt/module/
cd /opt/module/elasticsearch-7.8.0/config


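vi elasticsearch.yml
cluster.name: my-es
# name of the current node; must be unique in the cluster, e.g. node-1, node-2, node-3
node.name: node-1
path.data: /opt/module/elasticsearch-7.8.0/data
path.logs: /opt/module/elasticsearch-7.8.0/log

# do not lock the process memory at startup (this setting does not disable the bootstrap checks)
bootstrap.memory_lock: false

# bind to all network interfaces so the node is reachable from any IP
network.host: 0.0.0.0
# HTTP (REST) port
http.port: 9200
# port for communication between cluster nodes
transport.tcp.port: 9301


# discovery: the hosts a new node contacts to find and join the cluster
# (the seed nodes of the cluster)
discovery.seed_hosts: ["hadoop101:9301", "hadoop102:9301", "hadoop103:9301"]
# master-eligible nodes used to bootstrap the cluster the first time it starts
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
# fault-detection timeout and retry count (settings from the legacy Zen discovery; check whether your version still honors them)
discovery.zen.fd.ping_timeout: 1m
discovery.zen.fd.ping_retries: 5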

If the machines do not have enough memory, lower the Elasticsearch JVM heap size

vi jvm.options
-Xms512m
-Xmx512m

Distribute Elasticsearch to the other nodes
xsync /opt/module/elasticsearch-7.8.0
On hadoop102 and hadoop103, change node.name in elasticsearch.yml to node-2 and node-3 respectively

Starting Elasticsearch

cd /opt/module/elasticsearch-7.8.0
bin/elasticsearch

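Problems you may run into at startup:

future versions of Elasticsearch will require Java 11;

# note: this warning appears when JDK 8 is already installed locally and JAVA_HOME points to it
# fix: edit elasticsearch-env so that it uses the JDK bundled with Elasticsearch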
vi /opt/module/elasticsearch-7.8.0/bin/elasticsearch-env
# now set the path to java
if [ ! -z "$JAVA_HOME" ]; then
  JAVA="/opt/module/elasticsearch-7.8.0/jdk/bin/java"
  JAVA_TYPE="JAVA_HOME"

Distribute the modified file
xsync /opt/module/elasticsearch-7.8.0/bin/elasticsearch-env

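Vagrant's internal address prevents the nodes from talking to each other

# org.elasticsearch.transport.ConnectTransportException: [node-2][10.0.2.15:9301] handshake failed.
# Vagrant gives the VMs a default address 10.0.2.15, which breaks cluster startup; bind to a specific IP instead
# on each machine, set these values to that machine's own IP address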
vi /opt/module/elasticsearch-7.8.0/config/elasticsearch.yml
network.bind_host: ["192.168.56.101"]
network.publish_host: 192.168.56.101

Once all three Elasticsearch nodes have started

curl http://hadoop101:9200
curl http://hadoop101:9200/_cat/nodes
# add ?v to print the column headers (the URL is quoted so the shell does not interpret the ?)
curl 'http://hadoop101:9200/_cat/nodes?v'
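
The same check can be scripted. The _cat APIs accept format=json, which is easier to parse than the text table. The sketch below is an assumption-laden illustration: the helper names are made up, the host is the one used in this article, and the elected master is recognized by the "*" in the master column.

```python
import json
from urllib.request import urlopen


def fetch_nodes(base_url):
    """Call GET /_cat/nodes?format=json and return the parsed list of node rows."""
    with urlopen(f"{base_url}/_cat/nodes?format=json", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(nodes):
    """Return (sorted node names, name of the elected master or None)."""
    names = sorted(node["name"] for node in nodes)
    masters = [node["name"] for node in nodes if node.get("master") == "*"]
    return names, (masters[0] if masters else None)


# Usage against the cluster from this article (requires the nodes to be running):
#   names, master = summarize(fetch_nodes("http://hadoop101:9200"))
#   print(names, master)
```

With all three nodes up, the returned name list should contain node-1, node-2 and node-3.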

Deploying Kibana on a single node

tar -zxvf kibana-7.8.0-linux-x86_64.tar.gz -C /opt/module/
mv /opt/module/kibana-7.8.0-linux-x86_64/ /opt/module/kibana-7.8.0
cd /opt/module/kibana-7.8.0/config

vi kibana.yml
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://hadoop101:9200", "http://hadoop102:9200"]

Starting Kibana

bin/kibana
# Kibana is a Node.js application, so jps will not show its process
sudo netstat -nltp | grep 5601
# the following command kills Kibana
sudo netstat -nltp | grep 5601 | awk '{print $7}' | awk -F / '{print $1}' | xargs -n1 kill

Open in a browser
http://hadoop101:5601/
Click Dev Tools in the left-hand menu

GET _cat/nodes?v

GET _cat/health?v
GET _cat/indices
GET _cat/shards/.kibana_1?v
GET _cat/allocation?v

# h selects the columns to display
GET _cat/count?v&h=count

GET _cat/count?v&format=json

# s selects the sort order
GET _cat/indices?v&s=docs.count:desc

Writing a start/stop script for Elasticsearch

sudo yum install -y net-tools
cd ~/bin
vi es.sh

#!/bin/bash
es_home=/opt/module/elasticsearch-7.8.0
kibana_home=/opt/module/kibana-7.8.0
if [ $# -lt 1 ]
then
	echo "USAGE:es.sh {start|stop}"
	exit 1
fi

case $1 in
"start")
	# start Elasticsearch on every node
	for i in hadoop101 hadoop102 hadoop103
	do
		echo "--- starting elasticsearch on $i ---"
		ssh $i nohup ${es_home}/bin/elasticsearch >/dev/null 2>&1 &
	done
	# start Kibana
	ssh hadoop101 nohup ${kibana_home}/bin/kibana >/dev/null 2>&1 &
;;
"stop")
	# Kibana is a Node.js process, so it is found by its listening port (5601) rather than with jps
	# local form: sudo netstat -nltp | grep 5601 | awk '{print $7}' | awk -F / '{print $1}' | xargs -n1 kill
	ssh hadoop101 "sudo netstat -nltp | grep 5601 | awk '{print \$7}' | awk -F / '{print \$1}' | xargs -n1 kill"
	# stop Elasticsearch on every node
	for i in hadoop101 hadoop102 hadoop103
	do
		echo "--- stopping elasticsearch on $i ---"
		ssh $i "ps -ef|grep $es_home |grep -v grep| awk '{print\$2}'|xargs -n1 kill" >/dev/null 2>&1
	done
;;

*)
	echo "USAGE:es.sh {start|stop}"
	exit 1
;;
esac

Make the script executable

chmod u+x es.sh
es.sh start
es.sh stop

Working with data in Elasticsearch

In Elasticsearch a document is represented as a JSON object

{
    "id":"1",
    "name":"operation red",
    "doubanScore":"8.5",
    "actorList":[
        {
            "id":"1",
            "name":"zs"
        },
        {
            "id":"2",
            "name":"ls"
        },
        {
            "id":"3",
            "name":"w5"
        }
    ]
}

# create an index without defining fields; the mapping is inferred dynamically from the first document written
PUT /movie_index

# delete the index
DELETE /movie_index

# view the index mapping (similar to DESC table in MySQL)
GET /movie_index/_mapping

# write a document into the index; specify the document id to make the write idempotent
PUT /movie_index/_doc/1
{
    "id": 1,
    "name":"operation red sea",
    "doubanScore":8.5,
    "actorList":[
        {
            "id":1,
            "name":"zhang yi"
        },
        {
            "id":2,
            "name":"hai qing"
        },
        {
            "id":3,
            "name":"zhang han yu"
        }
    ]
}

PUT /movie_index/_doc/2
{
    "id": 2,
    "name":"incident red sea",
    "doubanScore":5,
    "actorList":[
        {
            "id":4,
            "name":"intmall"
        }
    ]
}

PUT /movie_index/_doc/3
{
    "id": 3,
    "name":"operation meigong river",
    "doubanScore":8.0,
    "actorList":[
        {
            "id":3,
            "name":"zhang han yu"
        }
    ]
}

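String fields have two types: text (analyzed into tokens that go into the inverted index) and keyword (not analyzed; the whole value is indexed as a single term and kept in column-oriented doc values)
A keyword field is not tokenized, so it matches only the complete field value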

# query all documents in the index (a search returns the first 10 hits by default)
GET /movie_index/_search

# non-idempotent write: no document id is given, so every request creates a new document with a generated id
POST /movie_index/_doc
{
    "id": 1,
    "name":"operation meigong river",
    "doubanScore":8.0,
    "actorList":[
        {
            "id":3,
            "name":"zhang han yu"
        }
    ]
}

# update by replacing the whole document
PUT /movie_index/_doc/3
{
    "id": 3,
    "name":"hello world"
}

# partial update
POST /movie_index/_update/1
{
	"doc": {
		"name":"operation red sea haha"
	}
}

# get a single document by id
GET /movie_index/_doc/1

# delete a single document
DELETE /movie_index/_doc/1

# full-text match (the query text is analyzed into tokens)
GET /movie_index/_search
{
  "query": {
    "match": {
      "name": "red"
    }
  }
}

# full-text match on a field of the actorList objects
GET /movie_index/_search
{
  "query": {
    "match": {
      "actorList.name": "zhang yi"
    }
  }
}

# term query: the search value is not analyzed; "red" matches because it is one of the indexed tokens
GET /movie_index/_search
{
  "query": {
    "term": {
      "name": {
        "value": "red"
      }
    }
  }
}


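# note: a term query with a whole phrase finds nothing on a text field, because the index holds the individual tokens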
GET /movie_index/_search
{
  "query": {
    "term": {
      "name": {
        "value": "operation red sea"
      }
    }
  }
}
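
Why the first term query finds the document and the second does not can be shown with a small sketch. It assumes a lowercase-and-split analyzer as a rough stand-in for the standard analyzer on English text, and models only one document; the function names are illustrative.

```python
def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split on spaces."""
    return text.lower().split()


# What the inverted index holds for the text field of "operation red sea".
indexed_tokens = set(analyze("operation red sea"))


def term_query(value):
    # term: the value is looked up exactly as given, without analysis.
    return value in indexed_tokens


def match_query(text):
    # match: the text is analyzed first; by default any one token may match.
    return any(token in indexed_tokens for token in analyze(text))


print(term_query("red"))                 # True
print(term_query("operation red sea"))   # False
print(match_query("operation red sea"))  # True
```

The whole phrase is never stored as one term in a text field, so only the keyword sub-field can be matched with the complete value.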


# query the keyword sub-field (the value is not analyzed; the whole value must match)
GET /movie_index/_search
{
  "query": {
    "match": {
      "actorList.name.keyword": "zhang yi"
    }
  }
}


# phrase match: the query is analyzed, and its tokens must appear next to each other in the same order
GET /movie_index/_search
{
  "query": {
    "match_phrase": {
      "name": "operation red"
    }
  }
}
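
A sketch of what a phrase match checks, assuming the default slop of 0 and the same simplified lowercase-and-split analyzer as a stand-in; real phrase queries work on token positions recorded in the index, and the function names here are made up.

```python
def analyze(text):
    """Stand-in analyzer: lowercase and split on spaces."""
    return text.lower().split()


def phrase_match(field_text, phrase):
    """True when the phrase tokens occur consecutively and in order."""
    tokens = analyze(field_text)
    wanted = analyze(phrase)
    if not wanted:
        return False
    last_start = len(tokens) - len(wanted)
    return any(tokens[i:i + len(wanted)] == wanted for i in range(last_start + 1))


print(phrase_match("operation red sea", "operation red"))  # True
print(phrase_match("operation red sea", "red operation"))  # False
print(phrase_match("operation red sea", "operation sea"))  # False
```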

# filter by exact value
GET /movie_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "name.keyword": "operation red sea"
          }
        }
      ]
    }
  }
}

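# filter: must match; does not contribute to the score
# must:   must match; contributes to the score
# should: optional when the query also has a must or filter clause; a document that matches scores higher, one that does not gets no score from the clause, and both appear in the results
# full-text match + filter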
GET /movie_index/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "red sea"
        }}
      ], 
      "filter": [
        {
          "term": {
            "actorList.name.keyword": "zhang yi"
          }
        }
      ]
    }
  }
}

GET /movie_index/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {
          "name": "red sea"
        }}
      ], 
      "filter": [
        {
          "term": {
            "actorList.name.keyword": "zhang yi"
          }
        }
      ]
    }
  }
}
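
The difference between the two queries above can be modelled in a few lines. This is a sketch under assumptions: each clause is a plain Python predicate, every matching scoring clause adds 1 instead of a real relevance score, and the rule that should becomes mandatory only when there is no must or filter follows the default of minimum_should_match. The second document is hypothetical.

```python
def bool_score(doc, must=(), should=(), filters=()):
    """Return a score for a matching document, or None when it is excluded."""
    if not all(clause(doc) for clause in must):
        return None
    if not all(clause(doc) for clause in filters):
        return None
    should_hits = sum(1 for clause in should if clause(doc))
    # With no must/filter, at least one should clause has to match.
    if should and not must and not filters and should_hits == 0:
        return None
    return len(must) + should_hits  # filter clauses add nothing to the score


def name_matches(doc):
    return any(word in doc["name"].split() for word in ("red", "sea"))


def acted_by_zhang_yi(doc):
    return "zhang yi" in doc["actors"]


red_sea = {"name": "operation red sea", "actors": ["zhang yi", "hai qing"]}
other = {"name": "hello world", "actors": ["zhang yi"]}  # hypothetical document

print(bool_score(other, must=[name_matches], filters=[acted_by_zhang_yi]))      # None
print(bool_score(other, should=[name_matches], filters=[acted_by_zhang_yi]))    # 0
print(bool_score(red_sea, should=[name_matches], filters=[acted_by_zhang_yi]))  # 1
```

With must, the hypothetical document is dropped; with should, it is still returned, only with a lower score.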

# range filter
GET /movie_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "doubanScore": {
              "gte": 5,
              "lte": 8
            }
          }
        }
      ]
    }
  }
}


# update the documents that match a filter
POST /movie_index/_update_by_query
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "actorList.name.keyword": "zhang han yu"
          }
        }
      ]
    }
  },
  "script": {
    "source": "ctx._source['doubanScore']=params.newScore",
    "params": {
      "newScore": 9.0
    }, 
    "lang": "painless"
  }
}


# delete the documents that match a filter
POST /movie_index/_delete_by_query
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "actorList.name.keyword": "zhang han yu"
          }
        }
      ]
    }
  }
}


# sort
GET /movie_index/_search
{
  "sort": [
    {
      "doubanScore": {
        "order": "desc"
      }
    }
  ]
}

# query + sort
GET /movie_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "actorList.name.keyword": "zhang han yu"
          }
        }
      ]
    }
  }, 
  "sort": [
    {
      "doubanScore": {
        "order": "desc"
      }
    }
  ]
}


# paged query (from is the offset of the first hit, size is the page length)
GET /movie_index/_search
{
  "from": 0,
  "size": 2
}
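
Turning a page number into from and size is simple arithmetic, sketched below. The helper name is an assumption; the limit reflects the default of the index.max_result_window setting (10000), beyond which from + size requests are rejected.

```python
MAX_RESULT_WINDOW = 10000  # default value of index.max_result_window


def page_request(page, page_size):
    """Build the from/size part of a search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    if start + page_size > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds the result window")
    return {"from": start, "size": page_size}


print(page_request(1, 2))  # {'from': 0, 'size': 2}
print(page_request(3, 2))  # {'from': 4, 'size': 2}
```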

# highlight the matched terms
GET /movie_index/_search
{
  "query": {
    "match": {
      "name": "red sea"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}

Aggregations

# aggregation: group by actor name
GET /movie_index/_search
{
  "aggs": {
    "groupByActionName": {
      "terms": {
        "field": "actorList.name.keyword",
        "size": 10
      }
    }
  }
}


# group, compute the average score within each group, then sort by it
GET /movie_index/_search
{
  "aggs": {
    "groupByActionName": {
      "terms": {
        "field": "actorList.name.keyword",
        "size": 10,
        "order": {
          "doubanScoreAvg": "desc"
        }
      },
      "aggs": {
        "doubanScoreAvg": {
          "avg": {
            "field": "doubanScore"
          }
        }
      }
    }
  }
}


# to get only the aggregation result without the matching documents, set "size": 0
GET /movie_index/_search
{
  "aggs": {
    "groupByActionName": {
      "terms": {
        "field": "actorList.name.keyword",
        "size": 10,
        "order": {
          "doubanScoreAvg": "desc"
        }
      },
      "aggs": {
        "doubanScoreAvg": {
          "avg": {
            "field": "doubanScore"
          }
        }
      }
    }
  },
  "size": 0
}
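
What the terms aggregation with a nested avg computes can be reproduced in plain Python. The sketch uses the three sample movies written earlier in this article (before the later updates and deletes) and illustrates the result only, not how Elasticsearch computes it; ties are ordered by name here to keep the output deterministic.

```python
from collections import defaultdict

movies = [
    {"doubanScore": 8.5, "actors": ["zhang yi", "hai qing", "zhang han yu"]},
    {"doubanScore": 5.0, "actors": ["intmall"]},
    {"doubanScore": 8.0, "actors": ["zhang han yu"]},
]


def avg_score_by_actor(movies):
    """Return (actor, doc_count, average score), highest average first."""
    buckets = defaultdict(list)
    for movie in movies:
        for actor in movie["actors"]:
            # A movie counts once in the bucket of each of its actors.
            buckets[actor].append(movie["doubanScore"])
    rows = [(actor, len(s), sum(s) / len(s)) for actor, s in buckets.items()]
    return sorted(rows, key=lambda row: (-row[2], row[0]))


for actor, doc_count, average in avg_score_by_actor(movies):
    print(actor, doc_count, average)
```

zhang han yu appears in two movies, so that bucket has a document count of 2 and an average of 8.25.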

Querying with SQL

GET _sql?format=txt
{
  "query": """
  SELECT actorList.name.keyword, avg(doubanScore) doubanScoreAvg
  FROM "movie_index"
  group by actorList.name.keyword
  order by doubanScoreAvg desc
  """
}


# SQL with a full-text match
GET _sql?format=txt
{
  "query": """
  SELECT actorList.name.keyword, avg(doubanScore) doubanScoreAvg
  FROM "movie_index"
  where match(name, 'red sea')
  group by actorList.name.keyword
  order by doubanScoreAvg desc
  """
}

Testing Chinese text analysis

PUT /movie_index_cn/_doc/1
{
    "id":1,
    "name":"红海行动",
    "doubanScore":8.5,
    "actorList":[
        {
            "id":1,
            "name":"张译"
        },
        {
            "id":2,
            "name":"海清"
        },
        {
            "id":3,
            "name":"张涵予"
        }
    ]
}

PUT /movie_index_cn/_doc/2
{
    "id": 2,
    "name":"湄公河行动",
    "doubanScore":8,
    "actorList":[
        {
            "id":3,
            "name":"张涵予"
        }
    ]
}

PUT /movie_index_cn/_doc/3
{
    "id": 3,
    "name":"红海事件",
    "doubanScore":5.0,
    "actorList":[
        {
            "id":4,
            "name":"张三"
        }
    ]
}


GET /movie_index_cn/_search
{
  "query": {
    "match": {
      "name": "红海"
    }
  }
}


# default analysis of Chinese text in Elasticsearch (split into single characters)
GET _analyze
{
  "text": ["上海银行"]
}

Installing the IK analyzer

sudo yum install -y unzip
mkdir /opt/module/elasticsearch-7.8.0/plugins/ik
unzip /opt/software/elasticsearch-analysis-ik-7.8.0.zip -d /opt/module/elasticsearch-7.8.0/plugins/ik/
xsync /opt/module/elasticsearch-7.8.0/plugins/ik/
es.sh stop
es.sh start

Testing the analyzer

# IK provides two analyzers: ik_smart (coarsest-grained split) and ik_max_word (finest-grained split)
GET _analyze
{
  "text": ["上海银行"],
  "analyzer": "ik_smart"
}

GET _analyze
{
  "text": ["我是中国人"],
  "analyzer": "ik_max_word"
}
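
The contrast between character-by-character splitting and dictionary-based segmentation can be sketched as below. This is explicitly not IK's algorithm: it is plain forward maximum matching over a tiny hypothetical dictionary, shown only to illustrate why a dictionary yields words instead of single characters.

```python
# Hypothetical mini dictionary; a real analyzer ships a large word list.
DICTIONARY = {"上海", "银行", "上海银行", "中国", "中国人"}


def split_by_char(text):
    """What the default analysis does with Chinese text: one token per character."""
    return list(text)


def forward_max_match(text, dictionary=DICTIONARY):
    """At each position take the longest dictionary word, else one character."""
    max_len = max(len(word) for word in dictionary)
    tokens, pos = [], 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                pos += size
                break
    return tokens


print(split_by_char("上海银行"))       # ['上', '海', '银', '行']
print(forward_max_match("上海银行"))   # ['上海银行']
print(forward_max_match("我是中国人"))  # ['我', '是', '中国人']
```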

Creating an index with an explicitly specified analyzer

DELETE movie_index_cn
PUT movie_index_cn
{
    "settings":{
        "number_of_shards":1
    },
    "mappings":{
        "properties":{
            "actorList":{
                "properties":{
                    "id":{
                        "type":"long"
                    },
                    "name":{
                        "type":"keyword"
                    }
                }
            },
            "doubanScore":{
                "type":"float"
            },
            "id":{
                "type":"long"
            },
            "name":{
                "type":"text",
                "analyzer":"ik_smart"
            }
        }
    }
}


GET /movie_index_cn/_mapping
# write the documents again, then run the search again
GET /movie_index_cn/_search
{
  "query": {
    "match": {
      "name": "上海银行"
    }
  }
}

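Splitting indices by time

Elasticsearch does not allow the mapping of an existing field to be changed. Splitting an index means cutting one business index into several indices by time interval.
For example, the index that stores the data of some business can be designed as a series of indices split by hour, day or week. Each new split is then an opportunity to apply a change to the fields.

Benefits:
Queries can be limited to the indices of the relevant time range
Flexibility when the structure changes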
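
A sketch of how an application might derive the index names for a daily split. The naming scheme prefix_YYYYMMDD mirrors the movie_test_20220601 index used later in this article; the helper names are assumptions.

```python
from datetime import date, timedelta


def daily_index(prefix, day):
    """Name of the index that holds the data of one day."""
    return f"{prefix}_{day:%Y%m%d}"


def indices_for_range(prefix, first_day, last_day):
    """Names of all daily indices that cover an inclusive date range."""
    days = (last_day - first_day).days
    return [daily_index(prefix, first_day + timedelta(days=n)) for n in range(days + 1)]


print(daily_index("movie_test", date(2022, 6, 1)))
print(indices_for_range("movie_test", date(2022, 5, 31), date(2022, 6, 2)))
```

Writes go to the index of the current day, while a query for a period only needs to address the indices returned for that range.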

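Index aliases
An index alias works like a shortcut or symbolic link: it can point to one or more indices and can be used in any API that expects an index name.

Uses of aliases:
Grouping several indices
Splitting indices handles changes of the data structure, but when indices are split frequently, the data for a long period is scattered over many indices and is awkward to aggregate.
Giving the split indices the same alias lets a query address all of them through the alias.
Creating a view on a subset of an index: an alias with a filter condition exposes only part of an index's data, which can then be queried through the alias
Switching from one index to another seamlessly in a running cluster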

# add an alias to existing indices
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "movie_index",
        "alias": "movie_index_2021"
      }
    },
    {
      "add": {
        "index": "movie_index_cn",
        "alias": "movie_index_2021"
      }
    }
  ]
}


GET /movie_index_2021/_search


# an alias can also be specified when the index is created
PUT movie_index_1
{
    "settings":{
        "number_of_shards":1
    },
    "aliases": {
      "movie_index_2022": {}
    }, 
    "mappings":{
        "properties":{
            "actorList":{
                "properties":{
                    "id":{
                        "type":"long"
                    },
                    "name":{
                        "type":"keyword"
                    }
                }
            },
            "doubanScore":{
                "type":"float"
            },
            "id":{
                "type":"long"
            },
            "name":{
                "type":"text",
                "analyzer":"ik_smart"
            }
        }
    }
}


# create a view (filtered alias) on a subset of an index
GET /movie_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "doubanScore": {
              "gte": 5,
              "lte": 8
            }
          }
        }
      ]
    }
  }
}

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "movie_index",
        "alias": "movie_index_dbs",
        "filter": {
          "range": {
            "doubanScore": {
              "gte": 5,
              "lte": 8
            }
          }
        }
      }
    }
  ]
}

GET /movie_index_dbs/_search


# seamless switch from one index to another
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "movie_index",
        "alias": "movie_2022"
      }
    }
  ]
}

GET /movie_2022/_search

POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "movie_index",
        "alias": "movie_2022"
      }
    },
    {
      "add": {
        "index": "movie_index_cn",
        "alias": "movie_2022"
      }
    }
  ]
}


GET /movie_2022/_search


# list the aliases
GET /_cat/aliases?v
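
The request body for the seamless switch can be generated by a small helper, sketched below. The function name is an assumption. Putting remove and add into one _aliases request matters because the actions of a single request are applied atomically, so the alias never points at no index or at both.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Body for POST _aliases that moves an alias from one index to another."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = swap_alias_actions("movie_2022", "movie_index", "movie_index_cn")
print(json.dumps(body, indent=2))
```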

Index templates

PUT _template/template_movie_2022
{
  "index_patterns": ["movie_test*"],
  "settings": {
    "number_of_shards": 1
  },
  "aliases": {
    "{index}_query": {},
    "movie_test_query": {}
  }, 
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "movie_name": {
        "type": "text",
        "analyzer": "ik_smart"
      }
    }
  }
}

PUT movie_test_20220601/_doc/1
{
  "id": "1001",
  "movie_name": "瞬息全宇宙"
}

GET movie_test_20220601/_mapping
GET movie_test_query/_search

# list the existing templates
GET _cat/templates?v

# view the details of one template
GET _template/template_movie_2022
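
How the template above applies to a new index can be approximated in a few lines: the index name is compared with index_patterns, and the {index} placeholder in alias names is replaced by the actual index name. The sketch uses fnmatchcase as an approximation of the wildcard matching (it understands more wildcard syntax than the simple * used here), and the helper names are assumptions.

```python
from fnmatch import fnmatchcase


def template_applies(index_patterns, index_name):
    """True when the new index name matches one of the template patterns."""
    return any(fnmatchcase(index_name, pattern) for pattern in index_patterns)


def resolve_aliases(alias_names, index_name):
    """Replace the {index} placeholder with the name of the created index."""
    return [name.replace("{index}", index_name) for name in alias_names]


patterns = ["movie_test*"]
aliases = ["{index}_query", "movie_test_query"]

print(template_applies(patterns, "movie_test_20220601"))  # True
print(template_applies(patterns, "movie_index"))          # False
print(resolve_aliases(aliases, "movie_test_20220601"))
```

So writing to movie_test_20220601 creates the index with the template's mapping and makes it reachable through both movie_test_20220601_query and the shared movie_test_query alias.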