Building a Cluster with Elasticsearch

逐梦如风

Article link: https://blog.csdn.net/cabing2005/article/details/54709178

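Data types supported by ES mappings

Basic JSON types:
    String: string
    Number: byte, short, integer, long, float, double
    Date: date
    Boolean: boolean (values true, false)
    Array: array
    Object: object
Elasticsearch-specific types:
    Multi-field: multi_field
    Geographic point: geo_point
    IP address: ip
    Nested object: nested
    Binary: binary
    Attachment: attachment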

Creating an ES "table": sensible mappings and mapping terminology

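type: the field's data type.
index: how the field is indexed. There are three options:
    analyzed: the default. The string is analyzed and indexed as standard full text.
    not_analyzed: exact indexing. The string is not analyzed; the field's exact value is indexed as-is.
    no: the field is not indexed.
analyzer:
    For string fields whose index is analyzed, the analyzer parameter
    specifies which analyzer is used at both index time and search time.
    Elasticsearch uses the standard analyzer by default, but you can switch to another built-in analyzer such as whitespace, simple, or english.
index_analyzer: the analyzer used at index time.
search_analyzer: the analyzer used at search time.
boost: a weight applied to the field's relevance score, so that a match on this field counts for more (or less) than a match on other fields.
_all: automatically holds the content of one or more fields of the indexed document. If a search
    does not name the field to search, Elasticsearch searches the _all field.
term_vector: term vectors.
store: defaults to no.
        If a mapping sets the data type to integer and specifies
        nothing else, index defaults to not_analyzed. As for store,
        the default (no) is usually appropriate; these defaults are in fact
        Lucene's defaults. store='yes' is rarely needed unless
        a field has to be highlighted.
include_in_all: if a field should not be added to _all, set "include_in_all": false.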
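
As a rough illustration of how these options combine, the sketch below builds a mapping as a Python dict and serializes it to JSON. The type name article and all field names are made up for the example, and the syntax assumes the legacy string-type mappings described above (newer Elasticsearch versions replaced string with text and keyword).

```python
import json

# Hypothetical "article" type; the field names are illustrative only.
# The option names (type, index, analyzer, boost, include_in_all) are the
# ones explained in the list above.
article_mapping = {
    "article": {
        "properties": {
            # Full-text field: analyzed with the standard analyzer, weighted higher.
            "title": {
                "type": "string",
                "index": "analyzed",
                "analyzer": "standard",
                "boost": 2.0,
            },
            # Exact-value field: indexed as-is and kept out of _all.
            "tag": {
                "type": "string",
                "index": "not_analyzed",
                "include_in_all": False,
            },
            "views": {"type": "integer"},
            "published_at": {"type": "date"},
        }
    }
}


def mapping_body(mapping):
    """Serialize the mapping as the JSON body of a create-index request."""
    return json.dumps({"mappings": mapping}, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(mapping_body(article_mapping))
```

The resulting JSON could be sent as the body of an index-creation request; keeping the mapping in code makes it easy to review which fields are analyzed and which are exact-match.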

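Query keywords

match: simple query; the query text is analyzed before matching.
match_phrase: phrase match; the terms must appear together and in order (slop: 1 lets the terms be one position apart).

multi_match: the document is returned if any one of the listed fields matches. Its types include:
    best_fields: the document is scored by its single best-matching field.
    most_fields: the more fields that match, the higher the document's score.
    cross_fields: the analyzed terms may be spread across different fields.
term: exact match. The search term is not analyzed, and the document must contain the whole term.
bool: compound query built from must, should, and must_not
    must: the document must match the condition
    should: holds one or more conditions;
            a document satisfies should if it matches at least one of them
    must_not: the document must not match the condition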
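
To make the bool structure concrete, here is a minimal sketch that assembles query bodies as Python dicts. The helper bool_query and the field names title, body, and tag are assumptions made for this example; only the query keywords themselves come from the list above.

```python
import json


def bool_query(must=None, should=None, must_not=None):
    """Build a bool query body, leaving out clauses that were not given."""
    clauses = {"must": must, "should": should, "must_not": must_not}
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


# Must mention the words, prefers the exact phrase, excludes drafts.
query = bool_query(
    must=[{"match": {"title": "elasticsearch cluster"}}],
    should=[{"match_phrase": {"title": "cluster setup"}}],
    must_not=[{"term": {"tag": "draft"}}],
)

# multi_match over two fields, scored by the best-matching field.
multi = {
    "query": {
        "multi_match": {
            "query": "elasticsearch cluster",
            "fields": ["title", "body"],
            "type": "best_fields",
        }
    }
}

print(json.dumps(query, indent=2))
```

Either dict can be serialized with json.dumps and used as the body of a search request.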

Deliverable: create the mapping for client-side search, taking tokenization into account.

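ES configuration notes

How to configure ES

set.default.ES_HOME=<Path to ElasticSearch Home>
    The ES home path.

set.default.ES_HEAP_SIZE=1024
    Amount of memory allocated to ES (in MB).

wrapper.startup.timeout=300
    Startup wait timeout (in seconds).

wrapper.shutdown.timeout=300
    Shutdown wait timeout (in seconds).

wrapper.ping.timeout=300
    Ping timeout (in seconds).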

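Basic ES configuration

    There are two files, elasticsearch.yml and logging.yml. The first is the main ES configuration file and the second is the logging configuration. ES logs through log4j, so the settings in logging.yml follow the usual log4j configuration conventions.

    cluster.name: elasticsearch
    The cluster name; the default is elasticsearch. ES automatically discovers other ES nodes on the same network segment, so when several clusters share a segment this property is what tells them apart.

    node.name: "FranzKafka"
    The node name. By default a name is picked at random from a list kept in the name.txt file in the config folder inside the ES jar, which contains many amusing names added by the authors.

    node.master: true
    Whether this node is eligible to be elected master; the default is true. By default the first machine in the cluster becomes master, and if it goes down a new master is elected.

    node.data: true
    Whether this node stores index data; the default is true.

    index.number_of_shards: 5
    Default number of shards per index; the default is 5.

    index.number_of_replicas: 1
    Default number of replicas per index; the default is 1.

    path.conf: /path/to/conf
    Location of the configuration files; defaults to the config folder under the ES root directory.

    path.data: /path/to/data
    Location of the index data; defaults to the data folder under the ES root directory.
    Several paths can be given, separated by commas, for example:

        path.data: /path/to/data1,/path/to/data2

    path.work: /path/to/work
    Location of temporary files; defaults to the work folder under the ES root directory.

    path.logs: /path/to/logs
    Location of the log files; defaults to the logs folder under the ES root directory.

    path.plugins: /path/to/plugins
    Location of the plugins; defaults to the plugins folder under the ES root directory.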

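    bootstrap.mlockall: true
    Set to true to lock the process memory. ES slows down badly once the JVM starts swapping, so swapping must be prevented: set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value and make sure the machine has enough memory for ES. The elasticsearch process must also be allowed to lock memory; on Linux this can be done with `ulimit -l unlimited`.

    network.bind_host: 192.168.0.1
    The IP address to bind to, either IPv4 or IPv6; the default is 0.0.0.0.

    network.publish_host: 192.168.0.1
    The IP address other nodes use to talk to this node. If unset it is determined automatically; the value must be a real IP address.

    network.host: 192.168.0.1
    Sets both of the parameters above, bind_host and publish_host, at once.

    transport.tcp.port: 9300
    TCP port for node-to-node communication; the default is 9300.

    transport.tcp.compress: true
    Whether to compress data sent over TCP; the default is false (no compression).

    http.port: 9200
    HTTP port for external clients; the default is 9200.

    http.max_content_length: 100mb
    Maximum size of HTTP request content; the default is 100mb.

    http.enabled: false
    Whether to serve external requests over HTTP; the default is true (enabled).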

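    gateway.type: local
    The gateway type. The default, local, is the local file system. It can be set to the local file system, a distributed file system, Hadoop HDFS, or Amazon S3; configuring the other file systems will be covered in detail another time.

    gateway.recover_after_nodes: 1
    Start data recovery once N nodes in the cluster have started; the default is 1.

    gateway.recover_after_time: 5m
    Timeout before the initial data recovery process starts; the default is 5 minutes.

    gateway.expected_nodes: 2
    The number of nodes expected in the cluster; the default is 2. As soon as these N nodes have started, data recovery begins immediately.

    cluster.routing.allocation.node_initial_primaries_recoveries: 4
    Number of concurrent recovery threads during initial data recovery; the default is 4.

    cluster.routing.allocation.node_concurrent_recoveries: 2
    Number of concurrent recovery threads when nodes are added or removed or when shards are rebalanced; the default is 2.

    indices.recovery.max_size_per_sec: 0
    Bandwidth limit for data recovery, for example 100mb; the default is 0, meaning unlimited.

    indices.recovery.concurrent_streams: 5
    Maximum number of concurrent streams opened when recovering data from other shards; the default is 5.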

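    discovery.zen.minimum_master_nodes: 1
    The minimum number of master-eligible nodes a node must be able to see before a master can be elected. The default is 1; larger clusters should use a higher value (2-4). A common rule is a majority of the master-eligible nodes, (N / 2) + 1, which guards against split brain.

    discovery.zen.ping.timeout: 3s
    Ping timeout used when automatically discovering other nodes in the cluster; the default is 3 seconds. On a poor network a higher value helps prevent discovery errors.

    discovery.zen.ping.multicast.enabled: false
    Whether multicast discovery is enabled; the default is true.

    discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]
    The initial list of master nodes in the cluster; nodes newly joining the cluster are discovered through these hosts.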
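
For discovery.zen.minimum_master_nodes, a widely used rule of thumb is a strict majority of the master-eligible nodes. The helper below is a small sketch of that arithmetic; the function name is made up for this example and is not part of any Elasticsearch API.

```python
def minimum_master_nodes(master_eligible):
    """Return the majority quorum for a count of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division, then +1, gives a strict majority.
    return master_eligible // 2 + 1


for n in (1, 2, 3, 5):
    print(n, "master-eligible ->", minimum_master_nodes(n))
```

With three master-eligible nodes this gives 2, so a single isolated node cannot elect itself master.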

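Configuring the cluster

Types of cluster nodes

A node in an Elasticsearch cluster generally takes one of three roles: master node, data node, or client node.

master node: the master node mainly handles metadata, such as creating and deleting indices and allocating shards.
Its configuration is:
    node.master: true
    node.data: true
(These are also the defaults for every node; a dedicated master that holds no data would set node.data: false.)

data node: data nodes hold the data shards. They handle data-related operations such as CRUD on shards, searching, and aggregating results. These operations are heavy on CPU, memory, and I/O.
Its configuration is:
    node.master: false
    node.data: true

client node: client nodes route requests and can be thought of as load balancers.
Its configuration is:
    node.master: false
    node.data: false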
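
The role of a node follows directly from the two flags. The sketch below spells out that mapping; node_role is a hypothetical helper written for this article, and the labels are informal descriptions rather than official names.

```python
def node_role(master, data):
    """Map the node.master / node.data flags to an informal role label."""
    if master and data:
        return "master-eligible data node (the default)"
    if master:
        return "dedicated master node"
    if data:
        return "data node"
    return "client node"


for flags in ((True, True), (True, False), (False, True), (False, False)):
    print("node.master=%s node.data=%s -> %s" % (flags[0], flags[1], node_role(*flags)))
```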

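Configuring an ES cluster on a single machine

HTTP ports exposed to clients: 9201, 9202, 9203
Internal TCP transport ports: 9301, 9302, 9303

Make three copies of the ES distribution on the local machine, change the configuration of each as shown below, and then start all three.

Main cluster node: node1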

cluster.name: elasticsearch_test
node.name: "node1"    ## node name
node.master: true   ## whether this node is master-eligible
node.data: true   ## whether this node stores data
#index.number_of_replicas: 1   ## number of replicas, set to 1 here
#path.data: /usr/local/var/elasticsearch1    ## data storage path for this node
transport.tcp.port: 9301   ## TCP transport port
http.port: 9201    ## HTTP port
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.timeout: 3s    ## response timeout for node discovery
#discovery.zen.ping.unicast.hosts: ["localhost"]    ## node discovery; the master node is localhost
discovery.zen.ping.unicast.hosts: ["localhost:9301"]    ## node discovery; the master node is localhost
discovery.zen.ping.multicast.enabled: true

node2

cluster.name: elasticsearch_test
node.name: "node2"    ## node name
node.master: true   ## whether this node is master-eligible
node.data: true   ## whether this node stores data
index.number_of_replicas: 1   ## number of replicas, set to 1 here
#path.data: /usr/local/var/elasticsearch2    ## data storage path for this node
transport.tcp.port: 9302   ## TCP transport port
http.port: 9202    ## HTTP port
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.timeout: 3s    ## response timeout for node discovery
discovery.zen.ping.unicast.hosts: ["localhost:9301"]    ## node discovery; the master node is localhost

node3

cluster.name: elasticsearch_test
node.name: "node3"    ## node name
node.master: true   ## whether this node is master-eligible
node.data: true   ## whether this node stores data
#index.number_of_replicas: 1   ## number of replicas, set to 1 here
#path.data: /usr/local/var/elasticsearch    ## data storage path for this node
transport.tcp.port: 9303   ## TCP transport port
http.port: 9203    ## HTTP port
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.timeout: 3s    ## response timeout for node discovery
discovery.zen.ping.unicast.hosts: ["localhost:9301"]    ## node discovery; the master node is localhost
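
Since the three files differ only in node name and ports, they can be generated instead of edited by hand. The following is a minimal sketch under that assumption; node_config is a hypothetical helper, and it emits only the settings that are active (not commented out) in all three listings above.

```python
def node_config(index, cluster="elasticsearch_test", seed_port=9301):
    """Render elasticsearch.yml text for the index-th local node (1-based)."""
    lines = [
        f"cluster.name: {cluster}",
        f'node.name: "node{index}"',
        "node.master: true",
        "node.data: true",
        # Ports follow the scheme above: 9301-9303 for transport, 9201-9203 for HTTP.
        f"transport.tcp.port: {9300 + index}",
        f"http.port: {9200 + index}",
        "discovery.zen.minimum_master_nodes: 1",
        "discovery.zen.ping.timeout: 3s",
        # Every node points at node1's transport port for unicast discovery.
        f'discovery.zen.ping.unicast.hosts: ["localhost:{seed_port}"]',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for i in (1, 2, 3):
        print(f"# --- config for node{i} ---")
        print(node_config(i))
```

Each rendered block would be written to the elasticsearch.yml of the corresponding copy of the distribution.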

Running the cluster on two machines

Deliverable: deploy the ES cluster on the test machines

On the two test machines

cluster.name: elasticsearch_kang
node.name: "test_node_196"    ## node name
node.master: true   ## whether this node is master-eligible
node.data: true   ## whether this node stores data
index.number_of_replicas: 1   ## number of replicas, set to 1 here
path.data: /data1/wap/var/data/elasticsearch   ## data storage path for this node
transport.tcp.port: 9300   ## TCP transport port
http.port: 9200    ## HTTP port
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.timeout: 3s    ## response timeout for node discovery
discovery.zen.ping.unicast.hosts: ["<master IP>"]    ## node discovery; IP of the master node

network.bind_host: <this machine's IP>
network.host: <this machine's IP>

path.logs: /data1/wap/var/data/elasticsearch/logs

cluster.name: elasticsearch_kang
node.name: "test_node_197"    ## node name
node.master: false   ## whether this node is master-eligible
node.data: true   ## whether this node stores data
index.number_of_replicas: 1   ## number of replicas, set to 1 here
path.data: /data1/wap/var/data/elasticsearch   ## data storage path for this node
transport.tcp.port: 9300   ## TCP transport port
http.port: 9200    ## HTTP port
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.timeout: 3s    ## response timeout for node discovery
discovery.zen.ping.unicast.hosts: ["<master IP>"]    ## node discovery; IP of the master node

network.bind_host: <this machine's IP>
network.host: <this machine's IP>

path.logs: /data1/wap/var/data/elasticsearch/logs
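
Once both nodes are running, one way to confirm that they joined the same cluster is to query the _cluster/health endpoint on the HTTP port. The sketch below uses only the Python standard library; the helper names are made up for this example, and the host passed in is whatever IP was configured as network.host.

```python
import json
from urllib.request import urlopen


def health_url(host, port=9200):
    """Build the URL of the cluster health endpoint for one node."""
    return f"http://{host}:{port}/_cluster/health"


def summarize(health):
    """Reduce a _cluster/health response to (status, number_of_nodes)."""
    return (health["status"], health["number_of_nodes"])


def fetch_health(host, port=9200, timeout=5):
    """Fetch and decode the health document from a running node."""
    with urlopen(health_url(host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling summarize(fetch_health(host)) against either test machine should report two nodes if discovery worked; a count of one suggests the nodes could not reach each other over the transport port.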
