How do you set up an Elasticsearch environment with Docker?

I spent several days on this without finding a good way to build an Elasticsearch cluster with Docker. Services like Elasticsearch and MySQL are stateful, so to cluster them with Docker you have to give the containers a way to discover each other. Below is an example shared on GitHub. The approach is somewhat clumsy, and each node effectively gets a host machine to itself. The key point is --add-host, which simply adds a record to the container's hosts file. The export step can also be skipped, as long as the two host machines can reach each other.

Elasticsearch 2.3.1 cluster with docker

18 APRIL 2016 on elasticsearch, docker

How to deploy an Elasticsearch 2.3.1 cluster using docker

We will deploy:

  1. Elasticsearch 2.3.1
  2. Two nodes
  3. Authentication enabled via NGINX proxy
  4. Persistent data to each node local file system

To follow this tutorial you must have docker installed on your servers or VMs. You can find instructions to do so here
I'll also assume you can run docker without sudo and that you are using Debian or one of its derivatives.

Official Elasticsearch cluster documentation can be found here.

Step One:

Get the IPs of the two servers by running the following command on each one:

ifconfig eth0:1 | grep "inet addr" | cut -d: -f2 | awk '{print $1}'  

(If you are using a different network interface other than eth0:1, make sure to modify the above command accordingly)

Then export them on every machine:

yourUsername@yourServerName1:~$ export node1=192.168.206.177  
yourUsername@yourServerName2:~$ export node2=192.168.207.165  

(Make sure to change the IP addresses, to match your servers ones, before exporting.)

In a production environment, also make sure each of the servers is reachable via resolvable DNS or hostnames. Either set up /etc/hosts to reflect this configuration or configure your DNS names.
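For example, a minimal /etc/hosts on either server could look like the fragment below (the hostnames and domain are the placeholders used throughout this tutorial; substitute your own):

```
127.0.0.1          localhost
192.168.206.177    yourServerName1.yourDomain.com    yourServerName1
192.168.207.165    yourServerName2.yourDomain.com    yourServerName2
```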

Step Two:

For this blog post I'll use the ~/docker/elasticsearch directory. Create it on both servers:

yourUsername@yourServerName1:~$ mkdir -p ~/docker/elasticsearch  
yourUsername@yourServerName2:~$ mkdir -p ~/docker/elasticsearch  
Step Three:

On yourServerName1, start Elasticsearch docker container with:

docker run --name="esNode1" -p 9300:9300 --hostname="yourServerName1" \  
--add-host yourServerName2.yourDomain.com:192.168.207.165 \
-v "$PWD/docker/elasticsearch/data":/usr/share/elasticsearch/data \
-v "$PWD/docker/elasticsearch/plugins":/usr/share/elasticsearch/plugins \
-d elasticsearch:2.3.1 \
-Des.node.name="esNode1" \
-Des.network.host=_eth0:ipv4_ \
-Des.network.bind_host=0.0.0.0 \
-Des.cluster.name=yourClusterName \
-Des.network.publish_host=192.168.206.177 \
-Des.discovery.zen.ping.multicast.enabled=false \
-Des.discovery.zen.ping.unicast.hosts=192.168.207.165 \
-Des.discovery.zen.ping.timeout=3s \
-Des.discovery.zen.minimum_master_nodes=1 \
--env="ES_HEAP_SIZE=8g" 

and on yourServerName2, start Elasticsearch docker container with:

docker run --name="esNode2" -p 9300:9300 --hostname="yourServerName2" \  
--add-host yourServerName1.yourDomain.com:192.168.206.177 \
-v "$PWD/docker/elasticsearch/data":/usr/share/elasticsearch/data \
-v "$PWD/docker/elasticsearch/plugins":/usr/share/elasticsearch/plugins \
-d elasticsearch:2.3.1 \
-Des.node.name="esNode2" \
-Des.network.host=_eth0:ipv4_ \
-Des.network.bind_host=0.0.0.0 \
-Des.cluster.name=yourClusterName \
-Des.network.publish_host=192.168.207.165 \
-Des.discovery.zen.ping.multicast.enabled=false \
-Des.discovery.zen.ping.unicast.hosts=192.168.206.177 \
-Des.discovery.zen.ping.timeout=3s \
-Des.discovery.zen.minimum_master_nodes=1 \
--env="ES_HEAP_SIZE=8g"

The --add-host flag is used to edit /etc/hosts inside the Elasticsearch docker container, so we can use hostnames instead of IPs. In a production environment these entries would be resolved via DNS, so those lines could be skipped.
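To confirm the entry was written, you can peek at the hosts file inside the running container (a command sketch, using the container name from the docker run above):

```
docker exec esNode1 cat /etc/hosts
# the output should include a line mapping
# 192.168.207.165 to yourServerName2.yourDomain.com
```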

The -v lines let us choose where the Elasticsearch container's data and plugins directories are mounted on the host. These mounts are what give you persistence outside the docker container.

The elasticsearch:2.3.1 argument selects which image and version to pull from Docker Hub; the -d flag runs the container detached, in the background.

The -Des.* lines are all configuration options passed to Elasticsearch.
Some are self-explanatory, such as -Des.node.name="esNode2" and -Des.cluster.name=yourClusterName, but others might require further explanation. 
Check out the following links to learn more about network settings and discovery.

A good rule of thumb is to set the heap size to half of your memory, but don't cross 32GB if you are lucky enough to have that much. Also disable swap on your servers. Learn why from the official Elasticsearch documentation about heap and swap.
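The rule of thumb above can be sketched in shell, reading total memory from /proc/meminfo (the 31 GB cap keeps the JVM comfortably below the 32 GB compressed-oops threshold; the division rounds down):

```shell
# Print a heap size equal to half of total RAM, capped at 31 GB
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
half_gb=$(( total_kb / 1024 / 1024 / 2 ))
if [ "$half_gb" -gt 31 ]; then half_gb=31; fi
if [ "$half_gb" -lt 1 ]; then half_gb=1; fi
echo "${half_gb}g"
```

You could then pass the printed value via --env="ES_HEAP_SIZE=..." instead of hard-coding 8g as above.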

To disable swap:

sudo swapoff -a  

and also edit /etc/fstab and comment out all lines where swap is present. 
If disabling swap completely is not an option, there are other techniques described in the link above that might work for you.
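One of those techniques is lowering swappiness instead of disabling swap entirely; the Elasticsearch documentation suggests a value of 1, which can be made persistent in /etc/sysctl.conf:

```
# /etc/sysctl.conf -- swap only under severe memory pressure
vm.swappiness = 1
```

Apply it without a reboot with sudo sysctl -p.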

We now have a fully working Elasticsearch 2.3.1 cluster, but it is totally exposed and unprotected: anyone can not only read your data, but also erase it all with ease. 
In the next steps we are going to see how to set up access control for our cluster, using NGINX as a proxy with basic authentication.
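You can see the problem from the docker host itself. Assuming the container got the default bridge IP 172.17.0.2 (an assumption; check with docker inspect), anyone who can reach port 9200 can query the cluster, and just as easily destroy it:

```
# Unauthenticated read access
curl "http://172.17.0.2:9200/_cluster/health?pretty"

# Unauthenticated destruction -- this would wipe every index
curl -XDELETE "http://172.17.0.2:9200/_all"
```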

Step Four

If you don't already have nginx installed, do it now on both servers:

sudo apt-get install nginx  

We need to generate two password files, one for standard users and another for administrators. We can do this with openssl, but then we are limited to 8-character passwords; or we can use apache2-utils, which has no such limit. Choose what's best for you. I used the latter. 
Also remember to pick two meaningful usernames, for example stdusers and admins.

If you went the openssl route:

printf "stduser:$(openssl passwd -crypt sup3rs3cr3t)" > users_password  
printf "admin:$(openssl passwd -crypt ub3rs3cr3t)" > admins_password  

change their owner and group to root, move them to /etc/nginx/conf.d/, and remember to point the auth_basic_user_file directives in Step Five at these filenames instead of the .htpasswd ones:

sudo chown root:root users_password admins_password  
sudo mv users_password admins_password /etc/nginx/conf.d/  

else you'll need to install apache2-utils first, if it's not already installed:

sudo apt-get install apache2-utils  

and then generate the password files:

sudo htpasswd -c /etc/nginx/conf.d/search_users.htpasswd user  
sudo htpasswd -c /etc/nginx/conf.d/search_admins.htpasswd admin  
Step Five

Let's create an NGINX configuration file on each server and open it with an editor. I use vim:

sudo vim /etc/nginx/sites-available/elastic  

On yourServerName1, insert these lines:

upstream elasticsearch {  
  server 172.17.0.2:9200;
  server 192.168.207.165:9200;
  keepalive 15;
}

server {  
    listen 8081 default_server;
    listen [::]:8081 default_server ipv6only=on;

    server_name yourServerName1.yourDomain.com;

    location / {
      return 403;
    }

    location ~* /[a-zA-Z0-9_]*[a-zA-Z0-9,_]*/(health|_health|state|stats) {
      return 405;
    }

    location ~* (/_search|/_analyze|_mget)$ {
      if ( $request_method !~ ^(GET|HEAD)$ ) {
        return 405;
      }

      if ( $request_uri = /_health ) {
        return 405;
      }

      if ( $request_uri = /_bulk ) {
        return 405;
      }

      auth_basic "Elasticsearch Users";
      auth_basic_user_file /etc/nginx/conf.d/search_users.htpasswd;
      proxy_pass http://elasticsearch;
      proxy_redirect off;
      proxy_http_version 1.1;
      proxy_set_header Connection "Keep-Alive";
      proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

server {  
    listen 8082 default_server;
    listen [::]:8082 default_server ipv6only=on;

    server_name yourServerName1.yourDomain.com;

    location / {
      auth_basic "Elasticsearch Admins";
      auth_basic_user_file /etc/nginx/conf.d/search_admins.htpasswd;
      proxy_pass http://elasticsearch;
      proxy_redirect off;
      proxy_http_version 1.1;
      proxy_set_header Connection "Keep-Alive";
      proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

On yourServerName2, insert these lines instead:

upstream elasticsearch {  
  server 172.17.0.2:9200;
  server 192.168.206.177:9200;
  keepalive 15;
}

server {  
    listen 8081 default_server;
    listen [::]:8081 default_server ipv6only=on;

    server_name yourServerName2.yourDomain.com;

    location / {
      return 403;
    }

    location ~* /[a-zA-Z0-9_]*[a-zA-Z0-9,_]*/(health|_health|state|stats) {
      return 405;
    }

    location ~* (/_search|/_analyze|_mget)$ {
      if ( $request_method !~ ^(GET|HEAD)$ ) {
        return 405;
      }

      if ( $request_uri = /_health ) {
        return 405;
      }

      if ( $request_uri = /_bulk ) {
        return 405;
      }

      auth_basic "Elasticsearch Users";
      auth_basic_user_file /etc/nginx/conf.d/search_users.htpasswd;
      proxy_pass http://elasticsearch;
      proxy_redirect off;
      proxy_http_version 1.1;
      proxy_set_header Connection "Keep-Alive";
      proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

server {  
    listen 8082 default_server;
    listen [::]:8082 default_server ipv6only=on;

    server_name yourServerName2.yourDomain.com;

    location / {
      auth_basic "Elasticsearch Admins";
      auth_basic_user_file /etc/nginx/conf.d/search_admins.htpasswd;
      proxy_pass http://elasticsearch;
      proxy_redirect off;
      proxy_http_version 1.1;
      proxy_set_header Connection "Keep-Alive";
      proxy_set_header Proxy-Connection "Keep-Alive";
    }
}

As you can see, the two NGINX configuration files are pretty similar. They only differ in server_name and in the upstream section. 
Now we need to enable the configuration we just created, on both servers:

sudo ln -s /etc/nginx/sites-available/elastic /etc/nginx/sites-enabled/elastic  

and then reload the NGINX configuration, again on both servers:

sudo service nginx reload  

Thanks to the upstream directive, we have just set up a simple load balancer that allows access only to authenticated users, with different roles and permissions.

On port 8081 we only allow GET and HEAD requests to endpoints containing _search, _analyze, or _mget. In other words, we only allow methods that retrieve data, not ones that modify existing data, delete it, or insert new data. That's what regular users will use.

On port 8082 we are allowed to do anything we like. That is, after all, the admin account we'll use to manage our cluster.
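A quick way to verify that the two ports behave as intended is a few curl calls (hostnames and credentials are the placeholders used earlier in this post; myIndex is a hypothetical index name):

```
# Allowed: authenticated read-only search through the users port
curl -u stduser:sup3rs3cr3t "http://yourServerName1.yourDomain.com:8081/myIndex/_search?q=*"

# Rejected: writes on 8081 are blocked regardless of credentials
curl -u stduser:sup3rs3cr3t -XDELETE "http://yourServerName1.yourDomain.com:8081/myIndex"

# Allowed: admins on 8082 can do anything, including deleting an index
curl -u admin:ub3rs3cr3t -XDELETE "http://yourServerName1.yourDomain.com:8082/myIndex"
```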

Step Six

It is usually handy to have an upstart script or something equivalent to manage your docker container instances.

On node1 (the one running on yourServerName1):

sudo vim /etc/init/esNode1.conf  

and insert those lines:

description "Elasticsearch 2.3.1 node 1"  
author "yourMailUsername@yourDomain.com"  
start on filesystem and started docker  
stop on runlevel [!2345]  
respawn  
script  
    /usr/bin/docker start -a esNode1
end script  

and on node2 (the one running on yourServerName2):

sudo vim /etc/init/esNode2.conf  

and insert those lines:

description "Elasticsearch 2.3.1 node 2"  
author "yourMailUsername@yourDomain.com"  
start on filesystem and started docker  
stop on runlevel [!2345]  
respawn  
script  
    /usr/bin/docker start -a esNode2
end script  

With those upstart scripts in place, you can issue commands in the form:

sudo service serviceName status|start|stop|restart  

So, for example, if we would like to know whether or not Elasticsearch is up and running on yourServerName1, we'd type:

sudo service esNode1 status  

and if it is up and running it will output something like:

esNode1 start/running, process 23163  

Note that if you already had your docker containers running when you created the upstart scripts, you will need to stop them manually with:

yourUsername@yourServerName1:~$ docker stop esNode1  
yourUsername@yourServerName2:~$ docker stop esNode2  

and then start them with the upstart scripts:

yourUsername@yourServerName1:~$ sudo service esNode1 start  
yourUsername@yourServerName2:~$ sudo service esNode2 start  

From this moment on, upstart will be responsible for keeping your docker containers running and restarting them on server reboots.
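If your distribution has moved from upstart to systemd (as Debian 8+ and Ubuntu 15.04+ have), a rough equivalent for node 1 would be a unit file such as this hypothetical /etc/systemd/system/esNode1.service:

```
[Unit]
Description=Elasticsearch 2.3.1 node 1
Requires=docker.service
After=docker.service

[Service]
ExecStart=/usr/bin/docker start -a esNode1
ExecStop=/usr/bin/docker stop esNode1
Restart=always

[Install]
WantedBy=multi-user.target
```

Enable and start it with sudo systemctl enable esNode1 followed by sudo systemctl start esNode1.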

Conclusion

We now have a fully operational Elasticsearch 2.3.1 cluster running with docker! Take a tour of the official documentation to learn how to create indexes and mappings and then import or insert some data.

In an upcoming post we'll explore how to create a very fast autocomplete box using Elasticsearch.

