Elasticsearch-jdbc实现MySQL同步到ElasticSearch详解-CSDN博客

elasticsearch-jdbc V2.3.2.0版本不需要安装。以下使用的elasticsearch也是2.3.2测试。
操作系统：CentOS release 6.5
1）elasticsearch 2.4.4 安装成功，测试ok。
2）mysql安装成功，能实现增、删、改、查。

可供测试的数据库为test，表为cc，具体信息如下：

mysql> select * from cc;
+----+------------+
| id | name |
+----+------------+
| 1 | laoyang |
| 2 | dluzhang |
| 3 | dlulaoyang |
+----+------------+
3 rows in set (0.00 sec)

第一步：下载工具。
地址：http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.2.0/elasticsearch-jdbc-2.3.2.0-dist.zip
第二步：导入Centos。路径自己定，笔者放到根目录下，解压。unzip elasticsearch-jdbc-2.3.2.0-dist.zip
第三步：设置环境变量。

[root@5b9dbaaa148a /]# vi /etc/profile
export JDBC_IMPORTER_HOME=/elasticsearch-jdbc-2.3.2.0

使环境变量生效：
[root@5b9dbaaa148a /]# source /etc/profile
第四步：配置使用。详细参考：https://github.com/jprante/elasticsearch-jdbc

1、 elasticsearch-jdbc 同步方法一

1）、根目录下新建文件夹odbc_es 如下：

[root@5b9dbaaa148a /]# ll /odbc_es/ drwxr-xr-x 2 root root 4096 Jun 16 03:11 logs -rwxrwxrwx 1 root root 542 Jun 16 04:03 mysql_import_es.sh

2）、新建脚本mysql_import_es.sh，内容如下；

[root@5b9dbaaa148a odbc_es]# cat mysql_import_es.sh
’#!/bin/sh
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
echo '{
"type" : "jdbc",
"jdbc": {
"elasticsearch.autodiscover":true,
"elasticsearch.cluster":"my-application", #簇名，详见：/usr/local/elasticsearch/config/elasticsearch.yml
"url":"jdbc:mysql://10.8.5.101:3306/test", #mysql数据库地址
"user":"root", #mysql用户名
"password":"123456", #mysql密码
"sql":"select * from cc",
"elasticsearch" : {
  "host" : "10.8.5.101",
  "port" : 9300
},
"index" : "myindex", #新的index
"type" : "mytype" #新的type
}
}'| java \
  -cp "${lib}/*" \
  -Dlog4j.configurationFile=${bin}/log4j2.xml \
  org.xbib.tools.Runner \
  org.xbib.tools.JDBCImporter

3）、为 mysql_import_es.sh 添加可执行权限。
[root@5b9dbaaa148a odbc_es]# chmod a+x mysql_import_es.sh
4）执行脚本mysql_import_es.sh
[root@5b9dbaaa148a odbc_es]# ./mysql_import_es.sh

第五步：测试数据同步是否成功。
使用elasticsearch检索查询：

[root@5b9dbaaa148a odbc_es]# curl -XGET 'http://10.8.5.101:9200/myindex/mytype/_search?pretty'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
  "total" : 8,
  "successful" : 8,
  "failed" : 0
  },
  "hits" : {
  "total" : 3,
  "max_score" : 1.0,
  "hits" : [ {
  "_index" : "myindex",
  "_type" : "mytype",
  "_id" : "AVVXKgeEun6ksbtikOWH",
  "_score" : 1.0,
  "_source" : {
  "id" : 1,
  "name" : "laoyang"
  }
  }, {
  "_index" : "myindex",
  "_type" : "mytype",
  "_id" : "AVVXKgeEun6ksbtikOWI",
  "_score" : 1.0,
  "_source" : {
  "id" : 2,
  "name" : "dluzhang"
  }
  }, {
  "_index" : "myindex",
  "_type" : "mytype",
  "_id" : "AVVXKgeEun6ksbtikOWJ",
  "_score" : 1.0,
  "_source" : {
  "id" : 3,
  "name" : "dlulaoyang"
  }
  } ]
  }
}

出现以上包含mysql数据字段的信息则为同步成功。

2、 elasticsearch-jdbc 同步方法二

[root@5b9dbaaa148a odbc_es]# cat mysql_import_es_simple.sh
#!/bin/sh
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
  java \
  -cp "${lib}/*" \
  -Dlog4j.configurationFile=${bin}/log4j2.xml \
  org.xbib.tools.Runner \
  org.xbib.tools.JDBCImporter statefile.json

[root@5b9dbaaa148a odbc_es]# cat statefile.json
{
"type" : "jdbc",
"jdbc": {
"elasticsearch.autodiscover":true,
"elasticsearch.cluster":"my-application",
"url":"jdbc:mysql://10.8.5.101:3306/test",
"user":"root",
"password":"123456",
"sql":"select * from cc",
"elasticsearch" : {
  "host" : "10.8.5.101",
  "port" : 9300
},
"index" : "myindex_2",
"type" : "mytype_2"
}
}

脚本和json文件分开，脚本执行前先加载json文件。
执行方式：直接运行脚本 ./mysql_import_es_simple.sh 即可。

3、Mysql与elasticsearch等价查询

目标：实现从表cc中查询id=3的name信息。
1）MySQL中sql语句查询：

mysql> select * from cc where id=3;
+----+------------+
| id | name |
+----+------------+
| 3 | dlulaoyang |
+----+------------+
1 row in set (0.00 sec)

2）elasticsearch检索：

[root@5b9dbaaa148a odbc_es]# curl http://10.8.5.101:9200/myindex/mytype/_search?pretty -d '
{
"filter" : { "term" : { "id" : "3" } }
}'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
  "total" : 8,
  "successful" : 8,
  "failed" : 0
  },
  "hits" : {
  "total" : 1,
  "max_score" : 1.0,
  "hits" : [ {
  "_index" : "myindex",
  "_type" : "mytype",
  "_id" : "AVVXKgeEun6ksbtikOWJ",
  "_score" : 1.0,
  "_source" : {
  "id" : 3,
  "name" : "dlulaoyang"
  }
  } ]
  }
}

常见错误：

错误日志位置：/odbc_es/logs
日志内容：
[root@5b9dbaaa148a logs]# tail -f jdbc.log
[04:03:39,570][INFO ][org.xbib.elasticsearch.helper.client.BaseTransportClient][pool-3-thread-1] after auto-discovery connected to [{5b9dbaaa148a}{aksn2ErNRlWjUECnp_8JmA}{10.8.5.101}{10.8.5.101:9300}{master=true}]

Bug1、[02:46:23,894][ERROR][importer.jdbc ][pool-3-thread-1] error while processing request: cluster state is RED and not YELLOW, from here on, everything will fail!
原因：
you created an index with replicas but you had only one node in the cluster. One way to solve this problem is by allocating them on a second node. Another way is by turning replicas off.
你创建了带副本 replicas 的索引，但是在你的簇中只有一个节点。

解决方案：
方案一：允许分配‘它们’到第二个节点。
方案二：关闭副本replicas（非常可行）。如下：

curl -XPUT 'localhost:9200/_settings' -d '
{
  "index" : {
  "number_of_replicas" : 0
  }
}

Bug2、[13:00:37,137][ERROR][importer.jdbc ][pool-3-thread-1] error while processing request: no cluster nodes available, check settings {autodiscover=false, client.transport.ignore_cluster_name=false, client.transport.nodes_sampler_interval=5s, client.transport.ping_timeout=5s, cluster.name=elasticsearch,
org.elasticsearch.client.transport.NoNodeAvailableException: no cluster nodes available, check
解决方案：
见上脚本中新增：
“elasticsearch.cluster”:”my-application”, #簇名，和/usr/local/elasticsearch/config/elasticsearch.yml 簇名保持一致。

转载于:https://my.oschina.net/nonstop/blog/869184