This article is meant to get developers who have never used Elasticsearch up and running quickly. None of it is difficult, but a few places reward careful reading and hands-on practice.
I. Introduction to Elasticsearch
The following is excerpted from Baidu Baike:
Elasticsearch is a Lucene-based search server. It provides a distributed, multi-tenant full-text search engine over a RESTful web interface. Elasticsearch is written in Java, released as open source under the Apache License, and is a popular enterprise search engine. It is used in cloud environments, delivering near-real-time search while being stable, reliable, fast, and easy to install. Official clients are available for Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.
II. Installing Elasticsearch
This article uses elasticsearch-6.3.1.tar.gz as an example (single-node installation):
https://www.elastic.co/cn/downloads/past-releases/elasticsearch-6-3-1
1. Prepare a VM with Java 1.8 installed and a configured IP address.
2. Extract the archive into /opt/module.
3. Edit the configuration file /opt/module/elasticsearch-6.3.1/config/elasticsearch.yml:
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
# 0.0.0.0 binds to all interfaces (no remote-connection restriction)
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes:
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
4. Start Elasticsearch:
/opt/module/elasticsearch-6.3.1/bin/elasticsearch
5. Troubleshooting
Problem 1:
max file descriptors [4096] for elasticsearch process likely too low, increase to at least [65536]
Cause: the maximum number of open files the system allows the Elasticsearch process must be raised to 65536.
Fix: edit /etc/security/limits.conf and append:
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 65536
Note: do not omit the leading "*".
Problem 2:
max number of threads [1024] for user [judy2] likely too low, increase to at least [4096] (no change needed on CentOS 7.x)
Cause: the maximum number of processes a user may create must be raised to 4096.
Fix: edit /etc/security/limits.d/90-nproc.conf and change:
* soft nproc 1024
# to
* soft nproc 4096
Problem 3:
max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144] (no change needed on CentOS 7.x)
Cause: vm.max_map_count limits the number of virtual memory areas a process may own.
Fix: append the following line to /etc/sysctl.conf to make the change permanent:
vm.max_map_count=262144
Then reboot Linux.
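The three fixes above can be applied in one pass. A minimal sketch, assuming a CentOS-style system and root privileges (file paths as used above):

```shell
# Append the open-file and process limits (values from problems 1 and 2)
cat >> /etc/security/limits.conf <<'EOF'
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 65536
EOF

# Raise vm.max_map_count permanently (problem 3)
echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
sysctl -p   # reload kernel parameters without a full reboot
```

Log out and back in (or reboot) so the new limits apply to the session that launches Elasticsearch.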
III. Basic Elasticsearch concepts
1. Cluster:
A group of nodes that share the same cluster name.
2. Node:
A single Elasticsearch instance within a cluster.
3. Index:
Elasticsearch stores its data in one or more indices. In SQL terms, an index is like a database: you write documents to an index and read documents from it.
4. Type:
A type defines the data types and other constraints for a document's fields, much like a table in a relational database. One index can contain multiple types.
5. Document:
A document corresponds to a row in a relational database.
6. Field:
Corresponds to a column in a database table.
7. Mapping:
Corresponds to a database schema; it constrains the types of fields. A mapping can be defined explicitly or generated automatically when a document is first indexed.
8. Shard:
A subset of an index. An index can be split into multiple shards distributed across the nodes of a cluster; each shard is a Lucene index. Shards come in two kinds, primary shards and replica shards, and each primary shard can have zero or more replicas.
The analogy between Elasticsearch and relational-database concepts:

Relational DB | Databases | Tables | Rows | Columns
---|---|---|---|---
Elasticsearch | Indices | Types | Documents | Fields
IV. Installing Kibana
Kibana is a free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack. It supports everything from tracking query load to understanding how requests flow through your application.
Installing Kibana:
After extracting the archive, edit the configuration file:
vi /opt/module/kibana-6.3.1/config/kibana.yml
# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601 # the port Kibana listens on
# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "192.168.5.100" # the address the Kibana server binds to; 0.0.0.0 accepts connections from any IP
# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""
# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false
# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576
# The Kibana server's name. This is used for display purposes.
#server.name: "your-hostname"
# The URL of the Elasticsearch instance to use for all your queries.
elasticsearch.url: "http://192.168.5.100:9200" # ip:port of the Elasticsearch instance to connect to
# When this setting's value is true Kibana uses the hostname specified in the server.host
# setting. When the value of this setting is false, Kibana uses the hostname of the host
# that connects to this Kibana instance.
#elasticsearch.preserveHost: true
# Kibana uses an index in Elasticsearch to store saved searches, visualizations and
# dashboards. Kibana creates a new index if the index doesn't already exist.
kibana.index: ".kibana"
# The default application to load.
#kibana.defaultAppId: "home"
# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
#elasticsearch.username: "user"
#elasticsearch.password: "pass"
# Enables SSL and paths to the PEM-format SSL certificate and SSL key files, respectively.
# These settings enable SSL for outgoing requests from the Kibana server to the browser.
#server.ssl.enabled: false
#server.ssl.certificate: /path/to/your/server.crt
#server.ssl.key: /path/to/your/server.key
# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files validate that your Elasticsearch backend uses the same key files.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key
# Optional setting that enables you to specify a path to the PEM file for the certificate
# authority for your Elasticsearch instance.
#elasticsearch.ssl.certificateAuthorities: [ "/path/to/your/CA.pem" ]
# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full
# Time in milliseconds to wait for Elasticsearch to respond to pings. Defaults to the value of
# the elasticsearch.requestTimeout setting.
#elasticsearch.pingTimeout: 1500
# Time in milliseconds to wait for responses from the back end or Elasticsearch. This value
# must be a positive integer.
#elasticsearch.requestTimeout: 30000
# List of Kibana client-side headers to send to Elasticsearch. To send *no* client-side
# headers, set this value to [] (an empty list).
#elasticsearch.requestHeadersWhitelist: [ authorization ]
# Header names and values that are sent to Elasticsearch. Any custom headers cannot be overwritten
# by client-side headers, regardless of the elasticsearch.requestHeadersWhitelist configuration.
#elasticsearch.customHeaders: {}
# Time in milliseconds for Elasticsearch to wait for responses from shards. Set to 0 to disable.
#elasticsearch.shardTimeout: 30000
# Time in milliseconds to wait for Elasticsearch at Kibana startup before retrying.
#elasticsearch.startupTimeout: 5000
# Logs queries sent to Elasticsearch. Requires logging.verbose set to true.
#elasticsearch.logQueries: false
# Specifies the path where Kibana creates the process ID file.
#pid.file: /var/run/kibana.pid
# Enables you specify a file where Kibana stores log output.
#logging.dest: stdout
# Set the value of this setting to true to suppress all logging output.
#logging.silent: false
# Set the value of this setting to true to suppress all logging output other than error messages.
#logging.quiet: false
# Set the value of this setting to true to log all events, including system usage information
# and all requests.
#logging.verbose: false
# Set the interval in milliseconds to sample system and process performance
# metrics. Minimum is 100ms. Defaults to 5000.
#ops.interval: 5000
# The default locale. This locale can be used in certain circumstances to substitute any missing
# translations.
#i18n.defaultLocale: "en"
Start Kibana:
/opt/module/kibana-6.3.1/bin/kibana
V. The Elasticsearch RESTful API (DSL)
Elasticsearch's query language differs greatly from that of traditional relational databases. The third-party plugin Elasticsearch-SQL does let you query Elasticsearch with SQL, though its dialect still diverges somewhat from relational SQL, so we will not cover it here. Elasticsearch has its own query syntax, known as the DSL (Domain Specific Language).
1. List the indices in Elasticsearch
GET _cat/indices?v
Column | Meaning
---|---
health | green (fully allocated), yellow (primaries allocated, cluster incomplete), red (some primaries unassigned)
status | whether the index is available for use
index | index name
uuid | unique index identifier
pri | number of primary shards
rep | number of replica shards
docs.count | number of documents
docs.deleted | number of deleted documents
store.size | total disk space used
pri.store.size | disk space used by primary shards
2. Create an index
PUT movie_index
3. Delete an index
DELETE movie_index
4. Index (create) a document
# If the index or type does not exist yet, Elasticsearch creates it automatically
PUT movie_index/movie/1
{
"id":1,
"name":"operation red sea",
"doubanScore":8.5,
"actorList":[
{"id":1,"name":"zhangsan"},
{"id":2,"name":"hai qing"},
{"id":3,"name":"zhang han yu"}
]
}
5. Fetch a document directly by id
GET movie_index/movie/1
6. Update: full replacement
# Same as indexing a new document; note that every field must be supplied
# For create-or-replace, PUT and POST behave the same here; the practical difference is that PUT addresses an explicit id, while POST without an id lets Elasticsearch generate one
PUT movie_index/movie/1
{
"id":1,
"name":"operation red sea",
"doubanScore":8.5,
"actorList":[
{"id":1,"name":"zhangsan"}
]
}
7. Update: a single field
# Use POST with the _update endpoint to modify individual fields
POST movie_index/movie/3/_update
{
"doc":{
"doubanScore":"7.8"
}
}
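Conceptually, `_update` with a `doc` body performs a shallow merge into the stored document: fields present in `doc` overwrite the stored values, everything else is kept. A small Python sketch of that semantics (illustrative only, not the actual Elasticsearch implementation):

```python
# Sketch of the `doc` merge semantics of the _update endpoint:
# fields in `doc` overwrite the stored source, all other fields are kept.
def partial_update(stored_source, update_body):
    merged = dict(stored_source)        # copy, do not mutate the original
    merged.update(update_body["doc"])   # shallow merge of the changed fields
    return merged

stored = {"id": 3, "name": "incredible", "doubanScore": 8.0}
result = partial_update(stored, {"doc": {"doubanScore": "7.8"}})
print(result)  # id and name untouched, doubanScore replaced
```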
8. Delete a document
DELETE movie_index/movie/3
9. Search all documents of a type
GET movie_index/movie/_search
10. Query with conditions
GET movie_index/movie/_search
{
"query":{
"match_all":{}
}
}
11. Full-text (analyzed) match query
GET movie_index/movie/_search
{
"query":{
"match":{"name":"red"}
}
}
12. Match query on a sub-field of a nested object
GET movie_index/movie/_search
{
"query":{
"match":{"actorList.id":"1"}
}
}
13. match_phrase (phrase query)
GET movie_index/movie/_search
{
"query":{
"match_phrase":{
"name":"jidushan" # 以短语进行分词
}
}
}
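The difference from a plain `match` is that `match_phrase` requires all query tokens to appear adjacently and in order. A rough Python sketch of the two behaviors (a toy whitespace tokenizer, not Lucene's analysis chain):

```python
# Illustrative sketch (not the Lucene implementation): `match` hits when any
# query token appears; `match_phrase` needs all tokens, adjacent and in order.
def tokenize(text):
    return text.lower().split()

def match(doc, query):
    return any(t in tokenize(doc) for t in tokenize(query))

def match_phrase(doc, query):
    d, q = tokenize(doc), tokenize(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

doc = "operation red sea"
print(match(doc, "sea red"))         # True: both tokens occur somewhere
print(match_phrase(doc, "sea red"))  # False: not adjacent in this order
print(match_phrase(doc, "red sea"))  # True
```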
14. fuzzy query (similar to SQL-style fuzzy/LIKE matching)
# Tolerant matching: when a term has no exact match, Elasticsearch can still match and score terms that are very close to it, at the cost of extra work per query.
GET movie_index/movie/_search
{
"query":{
"fuzzy":{"name":"lubinxu"}
}
}
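Under the hood, fuzzy matching is based on edit (Levenshtein) distance between terms; Elasticsearch caps the allowed fuzziness at 2 edits. A classic dynamic-programming sketch of that distance (illustrative; the example strings are made up):

```python
# Levenshtein distance via dynamic programming: the minimum number of
# single-character insertions, deletions, or substitutions between two strings.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(edit_distance("lubinxu", "lubinxun"))  # 1: one insertion away
print(edit_distance("red", "red"))           # 0: exact match
```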
15. Filtering: filter after the query (post_filter)
GET movie_index/movie/_search
{
  "query": {
    "match": {"name": "red"}
  },
  "post_filter": {
    "term": {"actorList.id": "3"}
  }
}
16. Filtering: filter inside the query (recommended)
GET movie_index/movie/_search
{
  "query": {
    "bool": {      // bool query: combines multiple clauses
      "must": [    // all of these clauses must match (they are scored)
        {"match": {
          "name": "red"
        }}
      ],
      "filter": {  // this clause must match too, but is not scored
        "term": {
          "actorList.id": "3"
        }
      }
    }
  }
}
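The difference between the two clause types: `must` clauses contribute to the relevance score, while `filter` clauses only include or exclude documents and are cheap to cache. A small Python helper that assembles such a request body (a sketch, not part of any official client):

```python
# Build a bool-query request body programmatically.
def bool_query(must=None, filter_=None):
    bool_body = {}
    if must:
        bool_body["must"] = must        # scored clauses
    if filter_:
        bool_body["filter"] = filter_   # non-scoring, cacheable clause
    return {"query": {"bool": bool_body}}

body = bool_query(
    must=[{"match": {"name": "red"}}],
    filter_={"term": {"actorList.id": "3"}},
)
print(body["query"]["bool"]["filter"]["term"])  # {'actorList.id': '3'}
```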
17. Filtering: by range
GET movie_index/movie/_search
{
"query": {
"bool": {
"filter": {
"range": {
"doubanScore": {
"gte": 0,
"lte": 20
}
}
}
}
}
}
gt | greater than
---|---
lt | less than
gte | greater than or equal to
lte | less than or equal to
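The range semantics above can be sketched in a few lines of Python (`gte`/`lte` inclusive, `gt`/`lt` exclusive; the sample values are made up):

```python
# Mimic the range filter: all supplied bounds must hold for a value to pass.
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True

scores = [0, 8.5, 20, 20.1]
print([s for s in scores if in_range(s, gte=0, lte=20)])  # [0, 8.5, 20]
```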
18. Sorting
GET movie_index/movie/_search
{
"query":{
"match": {"name":"red sea"}
}
, "sort": [
{
"doubanScore": {
"order": "desc"
}
}
]
}
19. Paging
# from: number of hits to skip (start offset)
# size: page size
GET movie_index/movie/_search
{
"query": {"match": {
"name": "red"
}}
, "from": 0
, "size": 20
}
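`from`/`size` paging is equivalent to slicing the ranked hit list, which also explains why deep paging gets expensive: the cluster must still collect the first `from + size` hits. A minimal sketch:

```python
# from/size paging as a slice over the ranked hit list.
def paginate(hits, from_=0, size=10):
    return hits[from_:from_ + size]

hits = [f"doc{i}" for i in range(25)]
print(paginate(hits, from_=0, size=20))   # first page: doc0..doc19
print(paginate(hits, from_=20, size=20))  # second page: doc20..doc24
```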
20. Restrict which fields are returned
GET movie_index/movie/_search
{
"query": {"match_all": {}},
"_source": ["name","doubanScore"]
}
21. Highlighting
Highlighting matched terms in results is an operation specific to search engines such as Elasticsearch.
GET movie_index/movie/_search
{
"query":{
"match": {"name":"red sea"}
},
"highlight": {
"fields": {"name":{} }
}
}
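By default, highlighting wraps each matched term in `<em>` tags within the returned fragment. A rough Python imitation of that output shape (whole-word, case-insensitive; not the actual highlighter):

```python
import re

# Wrap every whole-word occurrence of `term` in <em> tags,
# mimicking the default highlight markup in the search response.
def highlight(text, term):
    return re.sub(rf"\b({re.escape(term)})\b", r"<em>\1</em>",
                  text, flags=re.IGNORECASE)

print(highlight("operation red sea", "red"))  # operation <em>red</em> sea
```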
22. Aggregations
22.1 How many movies has each actor appeared in?
Equivalent SQL (table and column names are illustrative):
select actor_name, count(*) from movie group by actor_name
# Analyzed (tokenized) fields cannot be used directly for filtering or grouping
# A keyword field is indexed but not analyzed, so it can be used for filtering and grouping
GET movie_index/movie/_search
{
"query": {
"match_all": {}
},
"aggs": { // 分组标志,类似sql中的group by
"groupby_actor": { //为这个分组起一个名子
"terms": { //默认用一个count 聚合操作
"field": "actorList.name.keyword", // 用于分组的字段
"size": 10 //默认为10,最多分多少组
}
}
}
}
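What the `terms` aggregation computes can be sketched with a counter over made-up sample documents: one bucket with a count per distinct `actorList.name` value:

```python
from collections import Counter

# Made-up sample documents, shaped like the movie documents above.
movies = [
    {"name": "operation red sea",
     "actorList": [{"name": "zhangsan"}, {"name": "hai qing"}]},
    {"name": "operation mekong",
     "actorList": [{"name": "zhang han yu"}]},
    {"name": "red sea action",
     "actorList": [{"name": "zhangsan"}]},
]

# One bucket (count) per distinct actor name, like the terms aggregation.
buckets = Counter(actor["name"] for m in movies for actor in m["actorList"])
print(buckets.most_common(10))  # top-10 buckets, like "size": 10
```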
22.2 What is the average score of each actor's movies, ordered by score?
Equivalent SQL (table and column names are illustrative):
select actor_name, avg(doubanScore) as avg_score from movie group by actor_name order by avg_score
GET movie_index/movie/_search
{
"query": {
"match_all": {}
},
"aggs": {
"groupby_avg_actor": { //分组
"terms": {
"field": "actorList.name.keyword",
"size":1000,
"order": { //排序
"avg_doubanScore": "asc"
}
},
"aggs": { //聚合操作
"avg_doubanScore": {
"avg": {
"field": "doubanScore"
}
}
}
}
}
}
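The nested aggregation can likewise be sketched in Python: group scores by actor, average them, and order the buckets by the average ascending, as the `order` clause requests (made-up sample documents):

```python
from collections import defaultdict

# Made-up sample documents with a score and an actor list each.
movies = [
    {"doubanScore": 8.5,
     "actorList": [{"name": "zhangsan"}, {"name": "hai qing"}]},
    {"doubanScore": 6.5,
     "actorList": [{"name": "zhangsan"}]},
]

# Collect every score under each actor's name (the terms buckets).
scores = defaultdict(list)
for m in movies:
    for actor in m["actorList"]:
        scores[actor["name"]].append(m["doubanScore"])

# Average per bucket, ordered ascending like the avg_doubanScore order above.
buckets = sorted(
    ((name, sum(v) / len(v)) for name, v in scores.items()),
    key=lambda kv: kv[1],
)
print(buckets)  # [('zhangsan', 7.5), ('hai qing', 8.5)]
```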
VI. Using Elasticsearch from Java