This article is meant to get developers who have never used Elasticsearch up and running quickly. None of it is difficult, but a few places reward careful reading and hands-on practice.
I. Introduction to Elasticsearch
The following is excerpted from Baidu Baike:
Elasticsearch is a Lucene-based search server. It provides a distributed, multi-tenant full-text search engine over a RESTful web interface. Elasticsearch is written in Java, released as open source under the Apache License, and is a popular enterprise search engine. It is used in cloud environments, delivering near-real-time search while being stable, reliable, fast, and easy to install. Official clients are available for Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.
II. Installing Elasticsearch
This article uses elasticsearch-6.3.1.tar.gz as an example (single-node installation):
https://www.elastic.co/cn/downloads/past-releases/elasticsearch-6-3-1
1. Prepare a VM with Java 1.8 installed and a configured IP address.
2. Extract the archive into /opt/module.
3. Edit the configuration file /opt/module/elasticsearch-6.3.1/config/elasticsearch.yml:
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
# 0.0.0.0 binds to all interfaces (no remote-connection restriction)
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes:
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
4. Start Elasticsearch:
/opt/module/elasticsearch-6.3.1/bin/elasticsearch
5. Troubleshooting
Problem 1:
max file descriptors [4096] for elasticsearch process likely too low, increase to at least [65536]
Cause: the maximum number of open files the system allows the Elasticsearch process must be raised to 65536.
Fix: edit /etc/security/limits.conf and append:
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 65536
Note: do not omit the leading "*".
Problem 2:
max number of threads [1024] for user [judy2] likely too low, increase to at least [4096] (no change needed on CentOS 7.x)
Cause: the maximum number of processes a user may create must be raised to 4096.
Fix: edit /etc/security/limits.d/90-nproc.conf and change:
* soft nproc 1024
# to
* soft nproc 4096
Problem 3:
max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144] (no change needed on CentOS 7.x)
Cause: vm.max_map_count limits the number of virtual memory areas a process may own.
Fix: append the following line to /etc/sysctl.conf to make the change permanent:
vm.max_map_count=262144
Then reboot Linux.
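The three fixes above can be applied in one pass. A minimal sketch, assuming a CentOS-style system and root privileges (file paths as used above):

```shell
# Append the open-file and process limits (values from problems 1 and 2)
cat >> /etc/security/limits.conf <<'EOF'
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 65536
EOF

# Raise vm.max_map_count permanently (problem 3)
echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
sysctl -p   # reload kernel parameters without a full reboot
```

Log out and back in (or reboot) so the new limits apply to the session that launches Elasticsearch.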
III. Basic Elasticsearch concepts
1. Cluster:
A group of nodes that share the same cluster name.
2. Node:
A single Elasticsearch instance within a cluster.
3. Index:
Elasticsearch stores its data in one or more indices. In SQL terms, an index is like a database: you write documents to an index and read documents from it.
4. Type:
A type defines the data types and other constraints for a document's fields, much like a table in a relational database. One index can contain multiple types.
5. Document:
A document corresponds to a row in a relational database.
6. Field:
Corresponds to a column in a database table.
7. Mapping:
Corresponds to a database schema; it constrains the types of fields. A mapping can be defined explicitly or generated automatically when a document is first indexed.
8. Shard:
A subset of an index. An index can be split into multiple shards distributed across the nodes of a cluster; each shard is a Lucene index. Shards come in two kinds, primary shards and replica shards, and each primary shard can have zero or more replicas.
The analogy between Elasticsearch and relational-database concepts:

Relational DB | Databases | Tables | Rows | Columns
---|---|---|---|---
Elasticsearch | Indices | Types | Documents | Fields
IV. Installing Kibana
Kibana is a free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack. It supports everything from tracking query load to understanding how requests flow through your application.
Installing Kibana:
After extracting the archive, edit the configuration file:
vi /opt/module/kibana-6.3.1/config/kibana.yml
# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601 # the port Kibana listens on
# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "192.168.5.100" # the address the Kibana server binds to; 0.0.0.0 accepts connections from any IP
# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""
# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false
# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576
# The Kibana server's name. This is used for display purposes.
#server.name: "your-hostname"
# The URL of the Elasticsearch instance to use for all your queries.
elasticsearch.url: "http://192.168.5.100:9200" # ip:port of the Elasticsearch instance to connect to
# When this setting's value is true Kibana uses the hostname specified in the server.host
# setting. When the value of this setting is false, Kibana uses the hostname of the host
# that connects to this Kibana instance.
#elasticsearch.preserveHost: true
# Kibana uses an index in Elasticsearch to store saved searches, visualizations and
# dashboards. Kibana creates a new index if the index doesn't already exist.
kibana.index: ".kibana"
# The default application to load.
#kibana.defaultAppId: "home"
# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
#elasticsearch.username: "user"
#elasticsearch.password: "pass"
# Enables SSL and paths to the PEM-format SSL certificate and SSL key files, respectively.
# These settings enable SSL for outgoing requests from the Kibana server to the browser.
#server.ssl.enabled: false
#server.ssl.certificate: /path/to/your/server.crt
#server.ssl.key: /path/to/your/server.key
# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files validate that your Elasticsearch backend uses the same key files.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key
# Optional setting that enables you to specify a path to the PEM file for the certificate
# authority for your Elasticsearch instance.
#elasticsearch.ssl.certificateAuthorities: [ "/path/to/your/CA.pem" ]
# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full
# Time in milliseconds to wait for Elasticsearch to respond to pings. Defaults to the value of
# the elasticsearch.requestTimeout setting.
#elasticsearch.pingTimeout: 1500
# Time in milliseconds to wait for responses from the back end or Elasticsearch. This value
# must be a positive integer.
#elasticsearch.requestTimeout: 30000
# List of Kibana client-side headers to send to Elasticsearch. To send *no* client-side
# headers, set this value to [] (an empty list).
#elasticsearch.requestHeadersWhitelist: [ authorization ]
# Header names and values that are sent to Elasticsearch. Any custom headers cannot be overwritten
# by client-side headers, regardless of the elasticsearch.requestHeadersWhitelist configuration.
#elasticsearch.customHeaders: {}
# Time in milliseconds for Elasticsearch to wait for responses from shards. Set to 0 to disable.
#elasticsearch.shardTimeout: 30000
# Time in milliseconds to wait for Elasticsearch at Kibana startup before retrying.
#elasticsearch.startupTimeout: 5000
# Logs queries sent to Elasticsearch. Requires logging.verbose set to true.
#elasticsearch.logQueries: false
# Specifies the path where Kibana creates the process ID file.
#pid.file: /var/run/kibana.pid
# Enables you specify a file where Kibana stores log output.
#logging.dest: stdout
# Set the value of this setting to true to suppress all logging output.
#logging.silent: false
# Set the value of this setting to true to suppress all logging output other than error messages.
#logging.quiet: false
# Set the value of this setting to true to log all events, including system usage information
# and all requests.
#logging.verbose: false
# Set the interval in milliseconds to sample system and process performance
# metrics. Minimum is 100ms. Defaults to 5000.
#ops.interval: 5000
# The default locale. This locale can be used in certain circumstances to substitute any missing
# translations.
#i18n.defaultLocale: "en"
Start Kibana:
/opt/module/kibana-6.3.1/bin/kibana
V. The Elasticsearch RESTful API (DSL)
Elasticsearch's query language differs greatly from that of traditional relational databases. The third-party plugin Elasticsearch-SQL does let you query Elasticsearch with SQL, though its dialect still diverges somewhat from relational SQL, so we will not cover it here. Elasticsearch has its own query syntax, known as the DSL (Domain Specific Language).
1. List the indices in Elasticsearch
GET _cat/indices?v
Column | Meaning
---|---
health | green (fully allocated), yellow (primaries allocated, cluster incomplete), red (some primaries unassigned)
status | whether the index is available for use
index | index name
uuid | unique index identifier
pri | number of primary shards
rep | number of replica shards
docs.count | number of documents
docs.deleted | number of deleted documents
store.size | total disk space used
pri.store.size | disk space used by primary shards
2. Create an index
PUT movie_index
3. Delete an index
DELETE movie_index
4. Index (create) a document
# If the index or type does not exist yet, Elasticsearch creates it automatically
PUT movie_index/movie/1
{
"id":1,
"name":"operation red sea",
"doubanScore":8.5,
"actorList":[
{"id":1,"name":"zhangsan"},
{"id":2,"name":"hai qing"},
{"id":3,"name":"zhang han yu"}
]
}
5. Fetch a document directly by id
GET movie_index/movie/1
6. Update: full replacement
# Same as indexing a new document; note that every field must be supplied
# For create-or-replace, PUT and POST behave the same here; the practical difference is that PUT addresses an explicit id, while POST without an id lets Elasticsearch generate one
PUT movie_index/movie/1
{
"id":1,
"name":"operation red sea",
"doubanScore":8.5,
"actorList":[
{"id":1,"name":"zhangsan"}
]
}
7. Update: a single field
# Use POST with the _update endpoint to modify individual fields
POST movie_index/movie/3/_update
{
"doc":{
"doubanScore":"7.8"
}
}
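Conceptually, `_update` with a `doc` body performs a shallow merge into the stored document: fields present in `doc` overwrite the stored values, everything else is kept. A small Python sketch of that semantics (illustrative only, not the actual Elasticsearch implementation):

```python
# Sketch of the `doc` merge semantics of the _update endpoint:
# fields in `doc` overwrite the stored source, all other fields are kept.
def partial_update(stored_source, update_body):
    merged = dict(stored_source)        # copy, do not mutate the original
    merged.update(update_body["doc"])   # shallow merge of the changed fields
    return merged

stored = {"id": 3, "name": "incredible", "doubanScore": 8.0}
result = partial_update(stored, {"doc": {"doubanScore": "7.8"}})
print(result)  # id and name untouched, doubanScore replaced
```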
8. Delete a document
DELETE movie_index/movie/3
9. Search all documents of a type
GET movie_index/movie/_search
10. Query with conditions
GET movie_index/movie/_search
{
"query":{
"match_all":{}
}
}
11. Full-text (analyzed) match query
GET movie_index/movie/_search
{
"query":{
"match":{"name":"red"}
}
}
12. Match query on a sub-field of a nested object
GET movie_index/movie/_search
{
"query":{
"match":{"actorList.id":"1"}
}
}
13. match_phrase (phrase query)
GET movie_index/movie/_search
{
"query":{
"match_phrase":{
"name":"jidushan" # 以短语进行分词
}
}
}
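The difference from a plain `match` is that `match_phrase` requires all query tokens to appear adjacently and in order. A rough Python sketch of the two behaviors (a toy whitespace tokenizer, not Lucene's analysis chain):

```python
# Illustrative sketch (not the Lucene implementation): `match` hits when any
# query token appears; `match_phrase` needs all tokens, adjacent and in order.
def tokenize(text):
    return text.lower().split()

def match(doc, query):
    return any(t in tokenize(doc) for t in tokenize(query))

def match_phrase(doc, query):
    d, q = tokenize(doc), tokenize(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

doc = "operation red sea"
print(match(doc, "sea red"))         # True: both tokens occur somewhere
print(match_phrase(doc, "sea red"))  # False: not adjacent in this order
print(match_phrase(doc, "red sea"))  # True
```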
14. fuzzy query (similar to SQL-style fuzzy/LIKE matching)
# Tolerant matching: when a term has no exact match, Elasticsearch can still match and score terms that are very close to it, at the cost of extra work per query.
GET movie_index/movie/_search
{
"query":{
"fuzzy":{"name":"lubinxu"}
}
}
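Under the hood, fuzzy matching is based on edit (Levenshtein) distance between terms; Elasticsearch caps the allowed fuzziness at 2 edits. A classic dynamic-programming sketch of that distance (illustrative; the example strings are made up):

```python
# Levenshtein distance via dynamic programming: the minimum number of
# single-character insertions, deletions, or substitutions between two strings.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(edit_distance("lubinxu", "lubinxun"))  # 1: one insertion away
print(edit_distance("red", "red"))           # 0: exact match
```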
15. Filtering: filter after the query (post_filter)
GET movie_index/movie/_search
{
  "query": {
    "match": {"name": "red"}
  },
  "post_filter": {
    "term": {"actorList.id": "3"}
  }
}
16. Filtering: filter inside the query (recommended)
GET movie_index/movie/_search
{
  "query": {
    "bool": {      // bool query: combines multiple clauses
      "must": [    // all of these clauses must match (they are scored)
        {"match": {
          "name": "red"
        }}
      ],
      "filter": {  // this clause must match too, but is not scored
        "term": {
          "actorList.id": "3"
        }
      }
    }
  }
}
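The difference between the two clause types: `must` clauses contribute to the relevance score, while `filter` clauses only include or exclude documents and are cheap to cache. A small Python helper that assembles such a request body (a sketch, not part of any official client):

```python
# Build a bool-query request body programmatically.
def bool_query(must=None, filter_=None):
    bool_body = {}
    if must:
        bool_body["must"] = must        # scored clauses
    if filter_:
        bool_body["filter"] = filter_   # non-scoring, cacheable clause
    return {"query": {"bool": bool_body}}

body = bool_query(
    must=[{"match": {"name": "red"}}],
    filter_={"term": {"actorList.id": "3"}},
)
print(body["query"]["bool"]["filter"]["term"])  # {'actorList.id': '3'}
```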
17. Filtering: by range
GET movie_index/movie/_search
{
"query": {
"bool": {
"filter": {
"range": {
"doubanScore": {
"gte": 0,
"lte": 20
}
}
}
}
}
}
gt | greater than
---|---
lt | less than
gte | greater than or equal to
lte | less than or equal to
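The range semantics above can be sketched in a few lines of Python (`gte`/`lte` inclusive, `gt`/`lt` exclusive; the sample values are made up):

```python
# Mimic the range filter: all supplied bounds must hold for a value to pass.
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True

scores = [0, 8.5, 20, 20.1]
print([s for s in scores if in_range(s, gte=0, lte=20)])  # [0, 8.5, 20]
```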
18. Sorting
GET movie_index/movie/_search
{
"query":{
"match": {"name":"red sea"}
}
, "sort": [
{
"doubanScore": {
"order": "desc"
}
}
]
}
19. Paging
# from: number of hits to skip (start offset)
# size: page size
GET movie_index/movie/_search
{
"query": {"match": {
"name": "red"
}}
, "from": 0
, "size": 20
}
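`from`/`size` paging is equivalent to slicing the ranked hit list, which also explains why deep paging gets expensive: the cluster must still collect the first `from + size` hits. A minimal sketch:

```python
# from/size paging as a slice over the ranked hit list.
def paginate(hits, from_=0, size=10):
    return hits[from_:from_ + size]

hits = [f"doc{i}" for i in range(25)]
print(paginate(hits, from_=0, size=20))   # first page: doc0..doc19
print(paginate(hits, from_=20, size=20))  # second page: doc20..doc24
```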
20. Restrict which fields are returned
GET movie_index/movie/_search
{
"query": {"match_all": {}},
"_source": ["name","doubanScore"]
}
21. Highlighting
Highlighting matched terms in results is an operation specific to search engines such as Elasticsearch.
GET movie_index/movie/_search
{
"query":{
"match": {"name":"red sea"}
},
"highlight": {
"fields": {"name":{} }
}
}
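By default, highlighting wraps each matched term in `<em>` tags within the returned fragment. A rough Python imitation of that output shape (whole-word, case-insensitive; not the actual highlighter):

```python
import re

# Wrap every whole-word occurrence of `term` in <em> tags,
# mimicking the default highlight markup in the search response.
def highlight(text, term):
    return re.sub(rf"\b({re.escape(term)})\b", r"<em>\1</em>",
                  text, flags=re.IGNORECASE)

print(highlight("operation red sea", "red"))  # operation <em>red</em> sea
```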
22. Aggregations
22.1 How many movies has each actor appeared in?
Equivalent SQL (table and column names are illustrative):
select actor_name, count(*) from movie group by actor_name
# Analyzed (tokenized) fields cannot be used directly for filtering or grouping
# A keyword field is indexed but not analyzed, so it can be used for filtering and grouping
GET movie_index/movie/_search
{
"query": {
"match_all": {}
},
"aggs": { // 分组标志,类似sql中的group by
"groupby_actor": { //为这个分组起一个名子
"terms": { //默认用一个count 聚合操作
"field": "actorList.name.keyword", // 用于分组的字段
"size": 10 //默认为10,最多分多少组
}
}
}
}
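What the `terms` aggregation computes can be sketched with a counter over made-up sample documents: one bucket with a count per distinct `actorList.name` value:

```python
from collections import Counter

# Made-up sample documents, shaped like the movie documents above.
movies = [
    {"name": "operation red sea",
     "actorList": [{"name": "zhangsan"}, {"name": "hai qing"}]},
    {"name": "operation mekong",
     "actorList": [{"name": "zhang han yu"}]},
    {"name": "red sea action",
     "actorList": [{"name": "zhangsan"}]},
]

# One bucket (count) per distinct actor name, like the terms aggregation.
buckets = Counter(actor["name"] for m in movies for actor in m["actorList"])
print(buckets.most_common(10))  # top-10 buckets, like "size": 10
```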
22.2 What is the average score of each actor's movies, ordered by score?
Equivalent SQL (table and column names are illustrative):
select actor_name, avg(doubanScore) as avg_score from movie group by actor_name order by avg_score
GET movie_index/movie/_search
{
"query": {
"match_all": {}
},
"aggs": {
"groupby_avg_actor": { //分组
"terms": {
"field": "actorList.name.keyword",
"size":1000,
"order": { //排序
"avg_doubanScore": "asc"
}
},
"aggs": { //聚合操作
"avg_doubanScore": {
"avg": {
"field": "doubanScore"
}
}
}
}
}
}
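The nested aggregation can likewise be sketched in Python: group scores by actor, average them, and order the buckets by the average ascending, as the `order` clause requests (made-up sample documents):

```python
from collections import defaultdict

# Made-up sample documents with a score and an actor list each.
movies = [
    {"doubanScore": 8.5,
     "actorList": [{"name": "zhangsan"}, {"name": "hai qing"}]},
    {"doubanScore": 6.5,
     "actorList": [{"name": "zhangsan"}]},
]

# Collect every score under each actor's name (the terms buckets).
scores = defaultdict(list)
for m in movies:
    for actor in m["actorList"]:
        scores[actor["name"]].append(m["doubanScore"])

# Average per bucket, ordered ascending like the avg_doubanScore order above.
buckets = sorted(
    ((name, sum(v) / len(v)) for name, v in scores.items()),
    key=lambda kv: kv[1],
)
print(buckets)  # [('zhangsan', 7.5), ('hai qing', 8.5)]
```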
VI. Using Elasticsearch from Java