What is Elasticsearch?
* One or more independent process nodes on a network
* Externally, it exposes a search service (over HTTP or the transport protocol)
* Internally, it is a search database
- An open-source search engine built on Apache Lucene;
- Written in Java, with a simple, easy-to-use RESTful API;
- Scales out easily and can handle PB-scale structured or unstructured data;
Application scenarios
- Massive-data analytics engine
- Site search engine
- Data warehouse
Some well-known users:
- The Guardian: real-time analysis of readers' responses to articles
- Wikipedia, GitHub: real-time site search
- Baidu: real-time log monitoring platform
Installation
For Windows, see https://blog.csdn.net/yx1214442120/article/details/55102298
Basic concepts
- Index: a collection of documents sharing similar properties
- Type: an index can define one or more types; every document must belong to a type
- Document: the basic unit of data that can be indexed
- Shard: each index is split into several shards, and each shard is itself a Lucene index
- Replica: copying a shard produces a replica (backup) of that shard
"Index" is used in two senses:
1. As a noun: the database/table-like definition used for search
2. As a verb: the act of indexing performed when a document is stored
Analysis (tokenization):
1. Search treats the term (word) as the basic unit of matching
2. Terms are produced by an analyzer
3. The terms are used to build the inverted index
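The three steps above can be sketched in Python. This is a toy model (the whitespace "analyzer", the sample documents, and the field contents are all made up for illustration; a real analyzer such as ik for Chinese is far more sophisticated):

```python
from collections import defaultdict

def analyze(text):
    # Toy analyzer: lowercase and split on whitespace.
    return text.lower().split()

def build_inverted_index(docs):
    # Map each term to the sorted list of document IDs containing it
    # (a simplified posting list, without positions or frequencies).
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {1: "open source search engine", 2: "search engine built on Lucene"}
index = build_inverted_index(docs)
# index["search"] -> [1, 2]; index["lucene"] -> [2]
```

Searching then becomes a lookup in `index` rather than a scan over every document, which is the whole point of the inverted index.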
Search process
The essence and principle of search
For example, if a query such as "Chinese" hits two different terms, how do we decide which match ranks higher?
TF-IDF scoring
- TF (term frequency): how many times this document contains the term; the more occurrences, the more relevant
- DF (document frequency): the total number of documents containing the term
- IDF (inverse document frequency): the inverse of DF, typically log(N / DF), where N is the total number of documents
Suppose the query is analyzed into terms and the term "谷歌" occurs in 5 documents; its posting list records that it appears once in document ID 1, at position 1.
The score is then proportional to TF × IDF: with DF = 5, a document containing the term once scores 1/5, another containing it once also scores 1/5, and one containing it twice scores 2/5.
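A minimal sketch of the scoring idea above, using the classic idf = log(N / DF) form (real Lucene scoring adds smoothing and document-length normalization, so treat this as the idea, not the implementation):

```python
import math

def tf_idf_score(tf, df, n_docs):
    # tf: occurrences of the term in this document
    # df: number of documents containing the term
    # n_docs: total number of documents in the index
    idf = math.log(n_docs / df)
    return tf * idf

# The example from the text: a term found in 5 of (say) 100 documents.
# A document containing it twice scores twice as high as one containing it once.
once = tf_idf_score(1, 5, 100)
twice = tf_idf_score(2, 5, 100)
```

The key intuition: rare terms (small DF) get a large IDF and dominate the score, while terms appearing in almost every document contribute almost nothing.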
Principles of the distributed index
- number_of_shards: the number of primary shards in the index, which serve write requests (and can serve reads too)
- number_of_replicas: the number of replica shards per primary, which serve read requests
- The distributed index spreads user requests evenly across nodes according to the shard configuration
- A master is elected from the master-eligible nodes via a quorum-based election (similar in spirit to Paxos); cluster-state changes go through the master, and every write for a document must go through that document's primary shard
- Read requests need not touch the master: they can be served directly by replica shards, and if the receiving node holds no relevant shard, the request is routed to a node that does.
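How a document is routed to a shard can be sketched as follows. Elasticsearch actually computes murmur3(_routing) % number_of_shards, where _routing defaults to the document _id; crc32 stands in for murmur3 here, so this is an illustration of the scheme, not the exact hash:

```python
import zlib

def pick_shard(routing_value, number_of_shards):
    # ES: shard = hash(_routing) % number_of_shards.
    # This is why number_of_shards is fixed at index creation time:
    # changing it would re-map every existing document to a different shard.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards

# The same document ID always lands on the same shard:
shard = pick_shard("doc-1", 3)
```

Any node can run this computation, which is why a read can start at any node and still be forwarded to a node that actually holds the target shard.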
Installing ES
Go to the official site: www.elastic.co
Download both packages (Elasticsearch and Kibana)
Run the .bat file under the Elasticsearch bin directory to start Elasticsearch
Likewise, run Kibana's .bat file to start Kibana
Usage
Once both applications are running, open localhost:5601
From there you can create an index
But we find the cluster health is yellow
That is because a simple create uses the default primary and replica shard counts, and on a single node the replica cannot be allocated on the same node as its primary
If we instead create the index with
PUT /test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}
then the health turns green, but with no replicas the index loses redundancy and scalability.
Distributed principles
1. Sharding (default primary shard count: 1 in recent versions)
2. Primary and replica shards
3. Routing
The cluster must know where each primary and replica shard lives.
When a node is added to the cluster,
shards are rebalanced across the nodes,
and a new master is elected if the current master is lost.
Building a cluster
Make two more copies of the Elasticsearch directory.
Note that an index was already created during the earlier exercise, so first delete the node folders under each copy's data directory;
then edit each copy's .yml configuration:
node-1
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: dianping-app
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 127.0.0.1
#
# Set a custom port for HTTP:
#
http.port: 9200
transport.tcp.port: 9300
http.cors.enabled: true
http.cors.allow-origin: "*"
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["127.0.0.1:9300", "127.0.0.1:9301","127.0.0.1:9302"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["127.0.0.1:9300", "127.0.0.1:9301","127.0.0.1:9302"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
node-2
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: dianping-app
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-2
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 127.0.0.1
#
# Set a custom port for HTTP:
#
http.port: 9201
transport.tcp.port: 9301
http.cors.enabled: true
http.cors.allow-origin: "*"
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.seed_hosts: ["127.0.0.1:9300", "127.0.0.1:9301","127.0.0.1:9302"]
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
#
cluster.initial_master_nodes: ["127.0.0.1:9300", "127.0.0.1:9301","127.0.0.1:9302"]
#
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
node-3 is configured the same way, with its own node.name and ports.
Once the cluster is built, create the index in Kibana again; the health now shows green.
Basic syntax
Creating an index
1. Structured index
2. Unstructured index
DELETE employee
PUT /employee
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
PUT /employee/_doc/1
{
  "name": "xintu",
  "age": 30
}
To update, simply change the fields in the request body and run it again (note that a plain PUT replaces the whole document).
3. Forced create
A forced create (PUT /employee/_create/1) fails if the document already exists.
Retrieving documents
1. Query all documents
GET /employee/_search
2. Query all documents with an explicit (empty) condition
GET /employee/_search
{
  "query": {
    "match_all": {}
  }
}
3. Paged query
GET /employee/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 1
}
4. Keyword (match) query
// query with a keyword condition
GET /employee/_search
{
  "query": {
    "match": { "name": "兄弟" }
  }
}
This finds the matching document.
Replacing 兄弟 with 兄, 兄长, and so on still finds it, because match analyzes the query into terms and matches on any of them.
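Why "兄" also matches: with the standard analyzer, Chinese text is broken into single-character terms, so "兄弟" is indexed as the terms "兄" and "弟", and a match query succeeds if any analyzed query term is among them. A toy sketch of that behavior (the single-character split is an assumption about the default analyzer, not a general rule for all analyzers):

```python
def analyze(text):
    # Toy stand-in for the standard analyzer's handling of Chinese:
    # split the text into single-character terms.
    return list(text)

def match(indexed_text, query):
    # A match query succeeds if any analyzed query term
    # appears among the analyzed terms of the field.
    doc_terms = set(analyze(indexed_text))
    return any(term in doc_terms for term in analyze(query))

# "兄弟" is indexed; the queries "兄弟", "兄", and "兄长" all match it.
```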
5. Query with sorting
// with sorting
GET /employee/_search
{
  "query": {
    "match": { "name": "兄" }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
} // age in descending order
6. Query with a filter
GET /employee/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "age": "30" } }
      ]
    }
  }
} // find documents whose age is 30
A filter does not compute relevance scores (which also makes its results cacheable).
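The filter semantics can be sketched as a plain predicate: a document either passes or it does not, and no score is computed. The sample documents below are made up:

```python
def term_filter(docs, field, value):
    # A term filter is an exact yes/no test on a field value;
    # unlike a match query, no relevance score is computed,
    # which is what makes filter results cheap to cache.
    return [d for d in docs if d.get(field) == value]

docs = [{"name": "xintu", "age": 30}, {"name": "kaijie", "age": 25}]
thirty = term_filter(docs, "age", 30)  # -> [{"name": "xintu", "age": 30}]
```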
7. Query with an aggregation
// with an aggregation
GET /employee/_search
{
  "query": {
    "match": { "name": "兄" }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "group_by_age": {
      "terms": {
        "field": "age"
      }
    }
  }
}
This query counts how many matching documents fall into each age bucket:
"aggregations" : {
  "group_by_age" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : 30,
        "doc_count" : 2
      }
    ]
  }
}
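What the terms aggregation computes can be sketched over the matching documents (the documents here are made up; a real terms agg also bounds the number of buckets and may report count errors across shards, as the doc_count_error_upper_bound field hints):

```python
from collections import Counter

def terms_agg(docs, field):
    # Bucket the matching documents by the field value and count
    # the documents per bucket, like the "group_by_age" agg above.
    counts = Counter(d[field] for d in docs)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]

docs = [{"age": 30}, {"age": 30}]
buckets = terms_agg(docs, "age")  # -> [{"key": 30, "doc_count": 2}]
```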
Modifying documents
1. Partial update of specified fields
// update only the specified fields
POST /employee/_update/1
{
  "doc": {
    "name": "凯杰4"
  }
}
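Unlike a plain PUT, which replaces the whole document, the _update endpoint merges the supplied doc into the existing source. The merge semantics for flat fields can be sketched as (the sample documents are made up):

```python
def partial_update(source, doc):
    # Fields in `doc` overwrite the matching fields in `source`;
    # all other fields are kept (a plain PUT would drop them).
    merged = dict(source)
    merged.update(doc)
    return merged

source = {"name": "xintu", "age": 30}
updated = partial_update(source, {"name": "凯杰4"})
# -> {"name": "凯杰4", "age": 30}; the age field survives the update
```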