Configuring Elasticsearch



Elasticsearch ships with good defaults and requires very little configuration. Most settings can be changed on a running cluster using the Cluster Update Settings API.



The configuration files should contain settings which are node-specific (such as node.name and paths), or settings which a node requires in order to be able to join a cluster, such as cluster.name and network.host.



Config file location



Elasticsearch has two configuration files:



    elasticsearch.yml for configuring Elasticsearch, and


    log4j2.properties for configuring Elasticsearch logging.



These files are located in the config directory, whose location defaults to $ES_HOME/config/. The Debian and RPM packages set the config directory location to /etc/elasticsearch/.



The location of the config directory can be changed with the path.conf setting, as follows:



./bin/elasticsearch -Epath.conf=/path/to/my/config/


Config file format



The configuration format is YAML. Here is an example of changing the path of the data and logs directories:




path:
    data: /var/lib/elasticsearch
    logs: /var/log/elasticsearch


Settings can also be flattened as follows:



path.data: /var/lib/elasticsearch

path.logs: /var/log/elasticsearch


Environment variable substitution



Environment variables referenced with the ${...} notation within the configuration file will be replaced with the value of the environment variable, for instance:



node.name:    ${HOSTNAME}

network.host: ${ES_NETWORK_HOST}
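
To make the substitution rule concrete, here is a minimal Python sketch of the idea. It is not Elasticsearch's own implementation: the function name `substitute_env`, the placeholder pattern, the sample values, and the choice to raise an error for an unset variable are all assumptions made for illustration.

```python
import re

# Matches ${NAME} placeholders. The allowed NAME characters are an assumption
# of this sketch; names containing dots (such as prompt.text) do not match.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(text, env):
    """Replace each ${NAME} in text with env[NAME]; raise if NAME is unset."""
    def repl(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return _PLACEHOLDER.sub(repl, text)


config = "node.name:    ${HOSTNAME}\nnetwork.host: ${ES_NETWORK_HOST}"
# Hypothetical environment values, passed as a dict instead of os.environ
# so that the example is reproducible.
resolved = substitute_env(config, {"HOSTNAME": "prod-data-2", "ES_NETWORK_HOST": "_site_"})
print(resolved)
```

Passing the environment as a dictionary keeps the sketch deterministic; in practice the values would come from the process environment.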


Prompting for settings



For settings that you do not wish to store in the configuration file, you can use the value ${prompt.text} or ${prompt.secret} and start Elasticsearch in the foreground. ${prompt.secret} has echoing disabled so that the value entered will not be shown in your terminal; ${prompt.text} will allow you to see the value as you type it in. For example:




node:
  name: ${prompt.text}


When starting Elasticsearch, you will be prompted to enter the actual value like so:



Enter value for [node.name]:





Elasticsearch will not start if ${prompt.text} or ${prompt.secret} is used in the settings and the process is run as a service or in the background.



Logging configuration



Elasticsearch uses Log4j 2 for logging. Log4j 2 can be configured using the log4j2.properties file. Elasticsearch exposes three properties, ${sys:es.logs.base_path}, ${sys:es.logs.cluster_name}, and ${sys:es.logs.node_name} (if the node name is explicitly set via node.name) that can be referenced in the configuration file to determine the location of the log files. The property ${sys:es.logs.base_path} will resolve to the log directory, ${sys:es.logs.cluster_name} will resolve to the cluster name (used as the prefix of log filenames in the default configuration), and ${sys:es.logs.node_name} will resolve to the node name (if the node name is explicitly set).



For example, if your log directory (path.logs) is /var/log/elasticsearch and your cluster is named production then ${sys:es.logs.base_path} will resolve to /var/log/elasticsearch and ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log will resolve to /var/log/elasticsearch/production.log.



appender.rolling.type = RollingFile //1

appender.rolling.name = rolling

appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log //2

appender.rolling.layout.type = PatternLayout

appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %.10000m%n

appender.rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}.log //3

appender.rolling.policies.type = Policies

appender.rolling.policies.time.type = TimeBasedTriggeringPolicy //4

appender.rolling.policies.time.interval = 1 //5

appender.rolling.policies.time.modulate = true //6


1. Configure the RollingFile appender

2. Log to /var/log/elasticsearch/production.log

3. Roll logs to /var/log/elasticsearch/production-yyyy-MM-dd.log

4. Using a time-based roll policy

5. Roll logs on a daily basis

6. Align rolls on the day boundary (as opposed to rolling every twenty-four hours)
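
The way the ${sys:...} lookups expand into a concrete file name can be sketched in a few lines of Python. This only illustrates the resolution described above and is not how Log4j 2 implements lookups; the helper name `resolve_sys_lookups` is hypothetical, and the property values are the ones from the example in the text.

```python
import re


def resolve_sys_lookups(value, props):
    """Expand every ${sys:key} lookup in value using the props mapping."""
    return re.sub(r"\$\{sys:([^}]+)\}", lambda m: props[m.group(1)], value)


# Values taken from the example above: path.logs and the cluster name.
props = {
    "es.logs.base_path": "/var/log/elasticsearch",
    "es.logs.cluster_name": "production",
    "file.separator": "/",
}

file_name = resolve_sys_lookups(
    "${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log",
    props,
)
file_pattern = resolve_sys_lookups(
    "${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}.log",
    props,
)
print(file_name)
print(file_pattern)
```

The date pattern %d{yyyy-MM-dd} is left untouched by the sketch because it is filled in at roll time, not at configuration time.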






Log4j's configuration parsing gets confused by any extraneous whitespace; if you copy and paste any Log4j settings on this page, or enter any Log4j configuration in general, be sure to trim any leading and trailing whitespace.



If you append .gz or .zip to appender.rolling.filePattern, then the logs will be compressed as they are rolled.



If you want to retain log files for a specified period of time, you can use a rollover strategy with a delete action.



appender.rolling.strategy.type = DefaultRolloverStrategy //1

appender.rolling.strategy.action.type = Delete //2

appender.rolling.strategy.action.basepath = ${sys:es.logs.base_path} //3

appender.rolling.strategy.action.condition.type = IfLastModified //4

appender.rolling.strategy.action.condition.age = 7D //5

appender.rolling.strategy.action.PathConditions.type = IfFileName //6

appender.rolling.strategy.action.PathConditions.glob = ${sys:es.logs.cluster_name}-* //7


1. Configure the DefaultRolloverStrategy

2. Configure the Delete action for handling rollovers

3. The base path to the Elasticsearch logs

4. The condition to apply when handling rollovers

5. Retain logs for seven days

6. Only delete files older than seven days if they match the specified glob

7. Delete files from the base path matching the glob ${sys:es.logs.cluster_name}-*. This is the glob that log files are rolled to; it is needed so that only the rolled Elasticsearch logs are deleted, and not the deprecation and slow logs.
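
The combined effect of the age condition and the file name glob can be illustrated with a small Python sketch. It is a simplified model, not Log4j's Delete action: the function name, the sample file names (including the underscore-style deprecation log name), and the inclusive treatment of the seven-day boundary are assumptions.

```python
from fnmatch import fnmatchcase


def rolled_logs_to_delete(files, cluster_name, max_age_days=7):
    """Return names that match '<cluster_name>-*' and are at least max_age_days old.

    files is an iterable of (file_name, age_in_days) pairs.
    """
    pattern = f"{cluster_name}-*"
    return [
        name
        for name, age_days in files
        if fnmatchcase(name, pattern) and age_days >= max_age_days
    ]


# Hypothetical directory listing for a cluster named "production".
files = [
    ("production.log", 0),               # active log: does not match the glob
    ("production-2017-07-01.log", 10),   # rolled log, old enough
    ("production-2017-07-09.log", 2),    # rolled log, too recent
    ("production_deprecation.log", 30),  # other log: does not match the glob
]
deleted = rolled_logs_to_delete(files, "production")
print(deleted)
```

Only files that satisfy both conditions are selected, which is why the glob protects logs that are old but were never rolled to the `<cluster name>-` prefix.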



Multiple configuration files can be loaded (in which case they will be merged) as long as they are named log4j2.properties and have the Elasticsearch config directory as an ancestor; this is useful for plugins that expose additional loggers. The logger section contains the Java packages and their corresponding log levels. The appender section contains the destinations for the logs. Extensive information on how to customize logging, and on all the supported appenders, can be found in the Log4j documentation.



Configuring logging levels



There are four ways to configure logging levels, each appropriate in different situations.



    1. Via the command-line: -E <name of logging hierarchy>=<level> (e.g., -E logger.org.elasticsearch.transport=trace). This is most appropriate when you are temporarily debugging a problem on a single node (for example, a problem with startup, or during development).


    2. Via elasticsearch.yml: <name of logging hierarchy>: <level> (e.g., logger.org.elasticsearch.transport: trace). This is most appropriate when you are temporarily debugging a problem but are not starting Elasticsearch via the command-line (e.g., via a service) or you want a logging level adjusted on a more permanent basis.


    3. Via cluster settings:



    PUT /_cluster/settings
    {
      "transient": {
        "<name of logging hierarchy>": "<level>"
      }
    }




    For example:



    PUT /_cluster/settings
    {
      "transient": {
        "logger.org.elasticsearch.transport": "trace"
      }
    }




    This is most appropriate when you need to dynamically adjust a logging level on an actively-running cluster.


    4. Via log4j2.properties:



    logger.<unique_identifier>.name = <name of logging hierarchy>

    logger.<unique_identifier>.level = <level>


    For example:



    logger.transport.name = org.elasticsearch.transport

    logger.transport.level = trace


    This is most appropriate when you need fine-grained control over the logger (for example, you want to send the logger to another file, or manage the logger differently; this is a rare use-case).
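
The four forms differ only in syntax, which the following Python sketch tries to show side by side. The helper `logging_level_forms` and its dictionary keys are hypothetical names for illustration; the sketch only builds the strings and the request body, and sends nothing to a cluster.

```python
import json


def logging_level_forms(hierarchy, level, identifier):
    """Render one logger/level pair in each of the four forms listed above.

    hierarchy is the name of the logging hierarchy (e.g. a Java package),
    identifier is the unique identifier used in log4j2.properties.
    """
    setting = f"logger.{hierarchy}"
    return {
        "command_line": f"-E {setting}={level}",
        "elasticsearch_yml": f"{setting}: {level}",
        # Body for PUT /_cluster/settings
        "cluster_settings": json.dumps({"transient": {setting: level}}),
        "log4j2_properties": [
            f"logger.{identifier}.name = {hierarchy}",
            f"logger.{identifier}.level = {level}",
        ],
    }


forms = logging_level_forms("org.elasticsearch.transport", "trace", "transport")
for name, value in forms.items():
    print(name, "->", value)
```

Note that the first three forms share the logger.<hierarchy> setting name, while log4j2.properties splits the name and level across two keys tied together by the identifier.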



Deprecation logging


In addition to regular logging, Elasticsearch allows you to enable logging of deprecated actions. This lets you determine early whether you will need to migrate certain functionality in the future. By default, deprecation logging is enabled at the WARN level, the level at which all deprecation log messages will be emitted.



logger.deprecation.level = warn


This will create a daily rolling deprecation log file in your log directory. Check this file regularly, especially when you intend to upgrade to a new major version.



The default logging configuration has set the roll policy for the deprecation logs to roll and compress after 1 GB, and to preserve a maximum of five log files (four rolled logs, and the active log).



You can disable deprecation logging in the config/log4j2.properties file by setting the deprecation log level to error.



Important Elasticsearch configuration



While Elasticsearch requires very little configuration, there are a number of settings which need to be configured manually and should definitely be configured before going into production.



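    path.data and path.logs

    cluster.name

    node.name

    bootstrap.memory_lock

    network.host

    discovery.zen.ping.unicast.hosts

    discovery.zen.minimum_master_nodes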








path.data and path.logs


If you are using the .zip or .tar.gz archives, the data and logs directories are sub-folders of $ES_HOME. If these important folders are left in their default locations, there is a high risk of them being deleted while upgrading Elasticsearch to a new version.



In production use, you will almost certainly want to change the locations of the data and log folder:




path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch


The RPM and Debian distributions already use custom paths for data and logs.



The path.data setting can be set to multiple paths, in which case all paths will be used to store data (although the files belonging to a single shard will all be stored on the same data path):





path:
  data:
    - /mnt/elasticsearch_1
    - /mnt/elasticsearch_2
    - /mnt/elasticsearch_3




cluster.name


A node can only join a cluster when it shares its cluster.name with all the other nodes in the cluster. The default name is elasticsearch, but you should change it to an appropriate name which describes the purpose of the cluster.



cluster.name: logging-prod


Make sure that you don't reuse the same cluster names in different environments, otherwise you might end up with nodes joining the wrong cluster.





node.name


By default, Elasticsearch uses the first seven characters of the node ID, a randomly generated UUID, as the node name. The node ID is persisted and does not change when a node restarts, so the default node name does not change either.



It is worth configuring a more meaningful name which will also have the advantage of persisting after restarting the node:



node.name: prod-data-2


The node.name can also be set to the server's HOSTNAME as follows:



node.name: ${HOSTNAME}




bootstrap.memory_lock


It is vitally important to the health of your node that none of the JVM is ever swapped out to disk. One way of achieving that is to set the bootstrap.memory_lock setting to true.



For this setting to have effect, other system settings need to be configured first. See Enable bootstrap.memory_lock for more details about how to set up memory locking correctly.





network.host


By default, Elasticsearch binds to loopback addresses only, e.g. 127.0.0.1 and [::1]. This is sufficient to run a single development node on a server.






In fact, more than one node can be started from the same $ES_HOME location on a single server. This can be useful for testing Elasticsearch's ability to form clusters, but it is not a configuration recommended for production.



In order to communicate and to form a cluster with nodes on other servers, your node will need to bind to a non-loopback address. While there are many network settings, usually all you need to configure is network.host.





The network.host setting also understands some special values such as _local_, _site_, _global_ and modifiers like :ip4 and :ip6, details of which can be found in Special values for network.host (https://www.elastic.co/guide/en/elasticsearch/reference/5.5/modules-network.html#network-interface-values).





As soon as you provide a custom setting for network.host, Elasticsearch assumes that you are moving from development mode to production mode, and upgrades a number of system startup checks from warnings to exceptions. See Development mode vs production mode for more information.





discovery.zen.ping.unicast.hosts


Out of the box, without any network configuration, Elasticsearch will bind to the available loopback addresses and will scan ports 9300 to 9305 to try to connect to other nodes running on the same server. This provides an auto-clustering experience without having to do any configuration.



When the moment comes to form a cluster with nodes on other servers, you have to provide a seed list of other nodes in the cluster that are likely to be live and contactable. This can be specified as follows:





discovery.zen.ping.unicast.hosts:

   - //1

   - seeds.mydomain.com //2


1. The port will default to transport.profiles.default.port and fall back to transport.tcp.port if not specified.

2. A hostname that resolves to multiple IP addresses will try all resolved addresses.




discovery.zen.minimum_master_nodes


To prevent data loss, it is vital to configure the discovery.zen.minimum_master_nodes setting so that each master-eligible node knows the minimum number of master-eligible nodes that must be visible in order to form a cluster.



Without this setting, a cluster that suffers a network failure is at risk of splitting into two independent clusters (a split brain), which will lead to data loss. A more detailed explanation is provided in Avoiding split brain with minimum_master_nodes.



To avoid a split brain, this setting should be set to a quorum of master-eligible nodes:



(master_eligible_nodes / 2) + 1


In other words, if there are three master-eligible nodes, then minimum master nodes should be set to (3 / 2) + 1 or 2:



discovery.zen.minimum_master_nodes: 2
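
The quorum formula uses integer division, which is easy to get wrong when computing it by hand. A minimal Python sketch (the function name is illustrative, and the input validation is an addition of this sketch):

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: (n / 2) + 1 using integer division."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 4, 5):
    print(n, "master-eligible nodes ->", minimum_master_nodes(n))
```

With three master-eligible nodes the result is 2, matching the setting above. Note that two master-eligible nodes also require a quorum of 2, so losing either one prevents a master from being elected.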


Secure Settings



Some settings are sensitive, and relying on filesystem permissions to protect their values is not sufficient. For this use case, Elasticsearch provides a keystore, which may be password protected, and the elasticsearch-keystore tool to manage the settings in the keystore.






All commands here should be run as the user which will run Elasticsearch.






Only some settings are designed to be read from the keystore. See documentation for each setting to see if it is supported as part of the keystore.



Creating the keystore



To create the elasticsearch.keystore, use the create command:



bin/elasticsearch-keystore create


The file elasticsearch.keystore will be created alongside elasticsearch.yml.



Listing settings in the keystore



A list of the settings in the keystore is available with the list command:



bin/elasticsearch-keystore list


Adding string settings



Sensitive string settings, like authentication credentials for cloud plugins, can be added using the add command:



bin/elasticsearch-keystore add the.setting.name.to.set


The tool will prompt for the value of the setting. To pass the value through stdin, use the --stdin flag:



cat /file/containing/setting/value | bin/elasticsearch-keystore add --stdin the.setting.name.to.set


Removing settings



To remove a setting from the keystore, use the remove command:



bin/elasticsearch-keystore remove the.setting.name.to.remove


Bootstrap Checks



Collectively, we have a lot of experience with users suffering unexpected issues because they have not configured important settings. In previous versions of Elasticsearch, misconfigurations of some of these settings were logged as warnings. Understandably, users sometimes miss these log messages. To ensure that these settings receive the attention that they deserve, Elasticsearch has bootstrap checks upon startup.



These bootstrap checks inspect a variety of Elasticsearch and system settings and compare them to values that are safe for the operation of Elasticsearch. If Elasticsearch is in development mode, any bootstrap checks that fail appear as warnings in the Elasticsearch log. If Elasticsearch is in production mode, any bootstrap checks that fail will cause Elasticsearch to refuse to start.



There are some bootstrap checks that are always enforced to prevent Elasticsearch from running with incompatible settings. These checks are documented individually.



Development vs. production mode



By default, Elasticsearch binds to localhost for HTTP and transport (internal) communication. This is fine for downloading and playing with Elasticsearch, and for everyday development, but it's useless for production systems. To form a cluster, Elasticsearch instances must be reachable via transport communication, so they must bind transport to an external interface. Thus, we consider an Elasticsearch instance to be in development mode if it does not bind transport to an external interface (the default), and in production mode if it does.



Note that HTTP can be configured independently of transport via http.host and transport.host; this can be useful for configuring a single instance to be reachable via HTTP for testing purposes without triggering production mode.



We recognize that some users need to bind transport to an external interface for testing their usage of the transport client. For this situation, we provide the discovery type single-node (configure it by setting discovery.type to single-node); in this situation, a node will elect itself master and will not form a cluster with any other node.



If you are running a single node in production, it is possible to evade the bootstrap checks (either by not binding transport to an external interface, or by binding transport to an external interface and setting the discovery type to single-node). For this situation, you can force execution of the bootstrap checks by setting the system property es.enforce.bootstrap.checks to true (set this in Setting JVM options, or by adding -Des.enforce.bootstrap.checks=true to the environment variable ES_JAVA_OPTS). We strongly encourage you to do this if you are in this specific situation. This system property can be used to force execution of the bootstrap checks independent of the node configuration.
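
The rules in this section can be condensed into a small decision function. The following Python sketch is a simplified reading of the text, not Elasticsearch's actual logic: the function and parameter names are hypothetical, "zen" as the default discovery type and the sample addresses are assumptions, and the checks that are always enforced are not modelled.

```python
import ipaddress


def bootstrap_checks_enforced(transport_addresses, discovery_type="zen", enforce=False):
    """Return True if a failed bootstrap check would stop the node from starting.

    transport_addresses: IP addresses that transport is bound to.
    discovery_type: value of discovery.type ("zen" is assumed as the default).
    enforce: models the system property es.enforce.bootstrap.checks=true.
    """
    if enforce:
        return True
    if discovery_type == "single-node":
        return False
    # Production mode: transport is bound to at least one non-loopback address.
    return any(
        not ipaddress.ip_address(addr).is_loopback for addr in transport_addresses
    )


print(bootstrap_checks_enforced(["127.0.0.1", "::1"]))                 # development mode
print(bootstrap_checks_enforced(["192.0.2.10"]))                       # production mode
print(bootstrap_checks_enforced(["192.0.2.10"], "single-node"))        # checks evaded
print(bootstrap_checks_enforced(["127.0.0.1"], enforce=True))          # checks forced
```

The last two calls correspond to the single-node situations described above: the checks are skipped unless es.enforce.bootstrap.checks is set.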



Heap size check



If a JVM is started with unequal initial and max heap size, it can be prone to pauses as the JVM heap is resized during system usage. To avoid these resize pauses, it's best to start the JVM with the initial heap size equal to the maximum heap size. Additionally, if bootstrap.memory_lock is enabled, the JVM will lock the initial size of the heap on startup. If the initial heap size is not equal to the maximum heap size, after a resize it will not be the case that all of the JVM heap is locked in memory. To pass the heap size check, you must configure the heap size.
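
As a rough illustration of the condition being checked, the Python sketch below compares the -Xms and -Xmx values in a list of JVM options. This is an assumption-laden approximation: the real check inspects the running JVM rather than parsing options, and the helper names and the supported unit suffixes (k, m, g) are choices made for this sketch.

```python
import re

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_heap_option(options, flag):
    """Return the size in bytes of the last occurrence of flag, or None."""
    value = None
    for opt in options:
        match = re.fullmatch(rf"{re.escape(flag)}(\d+)([kKmMgG]?)", opt)
        if match:
            value = int(match.group(1)) * _UNITS[match.group(2).lower()]
    return value


def heap_sizes_equal(options):
    """True when both -Xms and -Xmx are present and denote the same size."""
    xms = parse_heap_option(options, "-Xms")
    xmx = parse_heap_option(options, "-Xmx")
    return xms is not None and xms == xmx


print(heap_sizes_equal(["-Xms2g", "-Xmx2g"]))
print(heap_sizes_equal(["-Xms1g", "-Xmx2g"]))
```

Sizes are compared in bytes, so -Xms2048m and -Xmx2g count as equal.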



File descriptor check



File descriptors are a Unix construct for tracking open "files". In Unix though, everything is a file. For example, "files" could be a physical file, a virtual file (e.g., /proc/loadavg), or network sockets. Elasticsearch requires lots of file descriptors (e.g., every shard is composed of multiple segments and other files, plus connections to other nodes, etc.). This bootstrap check is enforced on OS X and Linux. To pass the file descriptor check, you might have to configure file descriptors.



Memory lock check



When the JVM does a major garbage collection it touches every page of the heap. If any of those pages are swapped out to disk they will have to be swapped back into memory. That causes lots of disk thrashing, using I/O capacity that Elasticsearch would much rather use to service requests. There are several ways to configure a system to disallow swapping. One way is by requesting the JVM to lock the heap in memory through mlockall (Unix) or virtual lock (Windows). This is done via the Elasticsearch setting bootstrap.memory_lock. However, there are cases where this setting can be passed to Elasticsearch but Elasticsearch is not able to lock the heap (e.g., if the elasticsearch user does not have memlock unlimited). The memory lock check verifies that, if the bootstrap.memory_lock setting is enabled, the JVM was successfully able to lock the heap. To pass the memory lock check, you might have to configure mlockall.



Maximum number of threads check



Elasticsearch executes requests by breaking the request down into stages and handing those stages off to different thread pool executors. There are different thread pool executors for a variety of tasks within Elasticsearch. Thus, Elasticsearch needs the ability to create a lot of threads. The maximum number of threads check ensures that the Elasticsearch process has the rights to create enough threads under normal use. This check is enforced only on Linux. If you are on Linux, to pass the maximum number of threads check, you must configure your system to allow the Elasticsearch process the ability to create at least 2048 threads. This can be done via /etc/security/limits.conf using the nproc setting (note that you might have to increase the limits for the root user too).
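
To see what limit a process currently has, the limit can be read with Python's standard resource module (Unix only). This is a sketch of the comparison described above, not the check Elasticsearch itself runs; the function name is hypothetical, and the limit is read for the Python process running the sketch, not for Elasticsearch.

```python
import resource

REQUIRED_THREADS = 2048  # minimum stated in the text above


def thread_limit_ok(soft_limit, required=REQUIRED_THREADS):
    """True if the soft limit allows at least `required` threads."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required


# RLIMIT_NPROC is not available on every platform, so guard the lookup.
if hasattr(resource, "RLIMIT_NPROC"):
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    print("maximum number of threads check would pass:", thread_limit_ok(soft))
```

The unlimited case is handled explicitly because resource.RLIM_INFINITY is a sentinel value and cannot be compared numerically with the threshold.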



Maximum size virtual memory check



Elasticsearch and Lucene use mmap to great effect to map portions of an index into the Elasticsearch address space. This keeps certain index data off the JVM heap but in memory for blazing fast access. For this to be effective, Elasticsearch should have unlimited address space. The maximum size virtual memory check enforces that the Elasticsearch process has unlimited address space and is enforced only on Linux. To pass the maximum size virtual memory check, you must configure your system to allow the Elasticsearch process the ability to have unlimited address space. This can be done via /etc/security/limits.conf by setting as to unlimited (note that you might have to increase the limits for the root user too).
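
The address space limit of a process can be inspected in the same way with Python's standard resource module (Unix only). The sketch below is illustrative and reads the limit of the Python process itself; the function name is hypothetical.

```python
import resource


def address_space_unlimited(soft_limit):
    """True if the address space (RLIMIT_AS) soft limit is unlimited."""
    return soft_limit == resource.RLIM_INFINITY


# Guard the lookup in case the platform does not define RLIMIT_AS.
if hasattr(resource, "RLIMIT_AS"):
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    print("maximum size virtual memory check would pass:", address_space_unlimited(soft))
```

Any finite value, however large, fails this comparison, which mirrors the requirement that the limit be unlimited rather than merely high.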



Maximum map count check



Continuing from the previous point, to use mmap effectively, Elasticsearch also requires the ability to create many memory-mapped areas. The maximum map count check checks that the kernel allows a process to have at least 262,144 memory-mapped areas and is enforced on Linux only. To pass the maximum map count check, you must configure vm.max_map_count via sysctl to be at least 262144.
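
On Linux, the current value of vm.max_map_count is exposed under /proc/sys/vm/max_map_count, so the comparison can be sketched in Python as follows. The function name is hypothetical and the sketch only reads the value; changing it still has to be done via sysctl.

```python
from pathlib import Path

REQUIRED_MAP_COUNT = 262144  # minimum stated in the text above


def max_map_count_ok(raw_value, required=REQUIRED_MAP_COUNT):
    """True if the kernel value (as read from /proc) is at least `required`."""
    return int(raw_value.strip()) >= required


proc_file = Path("/proc/sys/vm/max_map_count")
if proc_file.exists():  # only present on Linux
    print("maximum map count check would pass:", max_map_count_ok(proc_file.read_text()))
```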



Client JVM check



There are two different JVMs provided by OpenJDK-derived JVMs: the client JVM and the server JVM. These JVMs use different compilers for producing executable machine code from Java bytecode. The client JVM is tuned for startup time and memory footprint while the server JVM is tuned for maximizing performance. The difference in performance between the two VMs can be substantial. The client JVM check ensures that Elasticsearch is not running inside the client JVM. To pass the client JVM check, you must start Elasticsearch with the server VM. On modern systems and operating systems, the server VM is the default. Additionally, Elasticsearch is configured by default to force the server VM.



Use serial collector check



There are various garbage collectors for the OpenJDK-derived JVMs targeting different workloads. The serial collector in particular is best suited for single logical CPU machines or extremely small heaps, neither of which are suitable for running Elasticsearch. Using the serial collector with Elasticsearch can be devastating for performance. The serial collector check ensures that Elasticsearch is not configured to run with the serial collector. To pass the serial collector check, you must not start Elasticsearch with the serial collector (whether it's from the defaults for the JVM that you're using, or you've explicitly specified it with -XX:+UseSerialGC). Note that the default JVM configuration that ships with Elasticsearch configures Elasticsearch to use the CMS collector.



System call filter check



Elasticsearch installs system call filters of various flavors depending on the operating system (e.g., seccomp on Linux). These system call filters are installed to prevent the ability to execute system calls related to forking as a defense mechanism against arbitrary code execution attacks on Elasticsearch. The system call filter check ensures that if system call filters are enabled, then they were successfully installed. To pass the system call filter check you must either fix any configuration errors on your system that prevented system call filters from installing (check your logs), or at your own risk disable system call filters by setting bootstrap.system_call_filter to false.



OnError and OnOutOfMemoryError checks



The JVM options OnError and OnOutOfMemoryError enable executing arbitrary commands if the JVM encounters a fatal error (OnError) or an OutOfMemoryError (OnOutOfMemoryError). However, by default, Elasticsearch system call filters (seccomp) are enabled and these filters prevent forking. Thus, OnError and OnOutOfMemoryError are incompatible with system call filters. The OnError and OnOutOfMemoryError checks prevent Elasticsearch from starting if either of these JVM options is used and system call filters are enabled. This check is always enforced. To pass this check, do not enable either OnError or OnOutOfMemoryError; instead, upgrade to Java 8u92 and use the JVM flag ExitOnOutOfMemoryError. While this does not have the full capabilities of OnError or OnOutOfMemoryError, arbitrary forking will not be supported with seccomp enabled.



Early-access check



The OpenJDK project provides early-access snapshots of upcoming releases. These releases are not suitable for production. The early-access check detects these early-access snapshots. To pass this check, you must start Elasticsearch on a release build of the JVM.



G1GC check



Early versions of the HotSpot JVM that shipped with JDK 8 are known to have issues that can lead to index corruption when the G1GC collector is enabled. The versions impacted are those earlier than the version of HotSpot that shipped with JDK 8u40. The G1GC check detects these early versions of the HotSpot JVM.
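
The version comparison can be approximated from the Java version string. The Python sketch below is an assumption-heavy illustration, not the check Elasticsearch performs: it assumes version strings of the form 1.8.0_NN (optionally followed by a suffix such as -b17), treats any other format as unaffected, and uses a hypothetical function name.

```python
import re


def g1gc_check_fails(java_version, g1gc_enabled):
    """True if G1GC is enabled on a JDK 8 build earlier than update 40."""
    match = re.fullmatch(r"1\.8\.0(?:_(\d+))?(?:-.*)?", java_version)
    if not match:
        return False  # not a JDK 8 version string in the format this sketch handles
    update = int(match.group(1) or 0)
    return g1gc_enabled and update < 40


print(g1gc_check_fails("1.8.0_25", True))
print(g1gc_check_fails("1.8.0_131", True))
```

Builds at or after update 40, and any build with G1GC disabled, pass in this sketch.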








