Syncing MySQL to Elasticsearch with elasticsearch-jdbc: An In-Depth Guide

JhonXie

Original link: https://my.oschina.net/u/2274056/blog/1582387

1. How do you sync data between MySQL and Elasticsearch?

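Converting rows to JSON one at a time is clearly impractical; you need either a third-party tool or your own implementation. The core requirement: inserts, deletes, and updates made in MySQL must be propagated to Elasticsearch so that queries on both sides return the same data.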

2. What are the options for syncing MySQL with Elasticsearch, and how do they compare?

The best-known tools in this area are:

1) elasticsearch-jdbc. Strictly speaking it is no longer a plugin; it has become a standalone third-party tool. https://github.com/jprante/elasticsearch-jdbc
2) The elasticsearch-river-mysql plugin. https://github.com/scharron/elasticsearch-river-mysql
3) go-mysql-elasticsearch (by siddontang, a developer based in China). https://github.com/siddontang/go-mysql-elasticsearch

Comparison of tools 1-3:

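go-mysql-elasticsearch is still under development and not yet stable. Why choose elasticsearch-jdbc over the elasticsearch-river-mysql plugin? (See: http://stackoverflow.com/questions/23658534/using-elasticsearch-river-mysql-to-stream-data-from-mysql-database-to-elasticsea)
1) Generality: elasticsearch-jdbc is more general-purpose.
2) Release activity: elasticsearch-jdbc is actively maintained on GitHub; the latest version, 2.3.3.0 (May 28, 2016), is compatible with Elasticsearch 2.3.3. elasticsearch-river-mysql has not been updated since December 13, 2012.
Taken together, elasticsearch-jdbc is the natural choice for syncing MySQL to Elasticsearch.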

Drawbacks and limitations of elasticsearch-jdbc (as reported by others):

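1) siddontang, the author of go-mysql-elasticsearch, wrote on his blog: elasticsearch-river-jdbc is very powerful, but it does not handle incremental updates well; it requires that the table only ever grows and never shrinks, which is almost impossible in a real project. http://www.jianshu.com/p/05cff717563c
2) The blogger leotse90 points out a drawback of elasticsearch-jdbc in a post: delete operations (physical deletes) cannot be synced. http://leotse90.com/2015/11/11/ElasticSearch与MySQL数据同步以及修改表结构/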

As of June 16, 2016 I have not tested these cases myself, so I will not comment further.

3. How do you use elasticsearch-jdbc? Does it need to be installed?

3.1 Differences from earlier versions

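elasticsearch-jdbc V2.3.2.0 does not need to be installed. The tests below also use Elasticsearch 2.3.2. Operating system: CentOS release 6.6 (Final). At this point you may ask how the earlier versions differ. Quite a lot. From the material I gathered, the differences are:
1) Early 1.x versions were plugins and had to be installed.
2) The configuration also differs.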

3.2 Using elasticsearch-jdbc (sync method 1)

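Prerequisites:
1) Elasticsearch 2.3.2 is installed and tested.
2) MySQL is installed and supports insert, delete, update, and select. The test database is test and the table is cc, with the following contents: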

mysql> select * from cc;
+----+------------+
| id | name       |
+----+------------+
|  1 | laoyang    |
|  2 | dluzhang   |
|  3 | dlulaoyang |
+----+------------+
3 rows in set (0.00 sec)

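Step 1: Download the tool. URL: http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.2.0/elasticsearch-jdbc-2.3.2.0-dist.zip
Step 2: Copy it to the CentOS host. The location is up to you; I put it in the root directory and unpacked it there: unzip elasticsearch-jdbc-2.3.2.0-dist.zip
Step 3: Set the environment variable.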

[root@5b9dbaaa148a /]# vi /etc/profile
export JDBC_IMPORTER_HOME=/elasticsearch-jdbc-2.3.2.0

Apply the environment variable:
[root@5b9dbaaa148a /]# source /etc/profile
Step 4: Configure and run. For details see: https://github.com/jprante/elasticsearch-jdbc
1) Create a folder named odbc_es in the root directory, as follows:

[root@5b9dbaaa148a /]# ll /odbc_es/
drwxr-xr-x 2 root root 4096 Jun 16 03:11 logs
-rwxrwxrwx 1 root root  542 Jun 16 04:03 mysql_import_es.sh

2) Create the script mysql_import_es.sh with the following content:

[root@5b9dbaaa148a odbc_es]# cat mysql_import_es.sh
#!/bin/sh
# "elasticsearch.cluster": cluster name, see /usr/local/elasticsearch/config/elasticsearch.yml
# "url", "user", "password": MySQL address, user name, and password
# "index", "type": the new index and type to write to
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
echo '{
  "type" : "jdbc",
  "jdbc" : {
    "elasticsearch.autodiscover" : true,
    "elasticsearch.cluster" : "my-application",
    "url" : "jdbc:mysql://10.8.5.101:3306/test",
    "user" : "root",
    "password" : "123456",
    "sql" : "select * from cc",
    "elasticsearch" : {
      "host" : "10.8.5.101",
      "port" : 9300
    },
    "index" : "myindex",
    "type" : "mytype"
  }
}' | java \
  -cp "${lib}/*" \
  -Dlog4j.configurationFile=${bin}/log4j2.xml \
  org.xbib.tools.Runner \
  org.xbib.tools.JDBCImporter

3) Make mysql_import_es.sh executable.
[root@5b9dbaaa148a odbc_es]# chmod a+x mysql_import_es.sh
4) Run the script mysql_import_es.sh.
[root@5b9dbaaa148a odbc_es]# ./mysql_import_es.sh

Step 5: Check that the data was synced. Query Elasticsearch:

[root@5b9dbaaa148a odbc_es]# curl -XGET 'http://10.8.5.101:9200/myindex/mytype/_search?pretty'

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
  "total" : 8,
  "successful" : 8,
  "failed" : 0
  },
  "hits" : {
  "total" : 3,
  "max_score" : 1.0,
  "hits" : [ {
  "_index" : "myindex",
  "_type" : "mytype",
  "_id" : "AVVXKgeEun6ksbtikOWH",
  "_score" : 1.0,
  "_source" : {
  "id" : 1,
  "name" : "laoyang"
  }
  }, {
  "_index" : "myindex",
  "_type" : "mytype",
  "_id" : "AVVXKgeEun6ksbtikOWI",
  "_score" : 1.0,
  "_source" : {
  "id" : 2,
  "name" : "dluzhang"
  }
  }, {
  "_index" : "myindex",
  "_type" : "mytype",
  "_id" : "AVVXKgeEun6ksbtikOWJ",
  "_score" : 1.0,
  "_source" : {
  "id" : 3,
  "name" : "dlulaoyang"
  }
  } ]
  }
}

If the response contains the MySQL fields as shown above, the sync succeeded.
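A slightly stricter check than reading the response by eye is to compare hits.total with the row count in MySQL. The following is only a minimal sketch: hits_total is a hypothetical helper name (it is not part of elasticsearch-jdbc), and it assumes the pretty-printed Elasticsearch 2.x response layout shown above, where each key sits on its own line.

```shell
#!/bin/sh
# Hypothetical helper (not part of elasticsearch-jdbc): read a pretty-printed
# Elasticsearch 2.x search response on stdin and print hits.total.
# Assumption: the request used "?pretty", so each key is on its own line.
hits_total() {
    awk '
        # Start looking once the top-level "hits" object opens.
        /"hits" *: *[{]/ { in_hits = 1; next }
        # The first "total" after that is hits.total; keep only its digits.
        in_hits && /"total"/ { gsub(/[^0-9]/, ""); print; exit }
    '
}

# Example (needs a running cluster, so it is left commented out):
# curl -s 'http://10.8.5.101:9200/myindex/mytype/_search?pretty' | hits_total
```

The printed number can then be compared with the result of select count(*) from cc in MySQL; for the test table above both should be 3.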

4. elasticsearch-jdbc sync method 2

[root@5b9dbaaa148a odbc_es]# cat mysql_import_es_simple.sh
#!/bin/sh
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
java \
  -cp "${lib}/*" \
  -Dlog4j.configurationFile=${bin}/log4j2.xml \
  org.xbib.tools.Runner \
  org.xbib.tools.JDBCImporter statefile.json

[root@5b9dbaaa148a odbc_es]# cat statefile.json

{
"type" : "jdbc",
"jdbc": {
"elasticsearch.autodiscover":true,
"elasticsearch.cluster":"my-application",
"url":"jdbc:mysql://10.8.5.101:3306/test",
"user":"root",
"password":"123456",
"sql":"select * from cc",
"elasticsearch" : {
  "host" : "10.8.5.101",
  "port" : 9300
},
"index" : "myindex_2",
"type" : "mytype_2"
}
}

Here the script and the JSON file are kept separate, and the script loads the JSON file when it runs. To execute, simply run ./mysql_import_es_simple.sh.
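Because the definition now lives in its own file, it can be generated rather than edited by hand when several tables need to be imported. The sketch below is an illustration under stated assumptions, not something the tool provides: importer_config is a hypothetical function name, the connection values are copied from the example above, and the SQL statement is assumed to contain no double quotes.

```shell
#!/bin/sh
# Hypothetical helper: print an importer definition for one table.
# Arguments: index name, type name, SQL statement (no double quotes inside).
# The connection settings mirror the statefile.json shown in this article.
importer_config() {
    cfg_index=$1
    cfg_type=$2
    cfg_sql=$3
    cat <<EOF
{
  "type" : "jdbc",
  "jdbc" : {
    "elasticsearch.autodiscover" : true,
    "elasticsearch.cluster" : "my-application",
    "url" : "jdbc:mysql://10.8.5.101:3306/test",
    "user" : "root",
    "password" : "123456",
    "sql" : "${cfg_sql}",
    "elasticsearch" : {
      "host" : "10.8.5.101",
      "port" : 9300
    },
    "index" : "${cfg_index}",
    "type" : "${cfg_type}"
  }
}
EOF
}

# Example: regenerate the file used by mysql_import_es_simple.sh
# importer_config myindex_2 mytype_2 "select * from cc" > statefile.json
```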

5. Equivalent queries in MySQL and Elasticsearch

Goal: look up the name for id=3 in table cc.
1) SQL query in MySQL:

mysql> select * from cc where id=3;
+----+------------+
| id | name       |
+----+------------+
|  3 | dlulaoyang |
+----+------------+
1 row in set (0.00 sec)

2) Elasticsearch search:

[root@5b9dbaaa148a odbc_es]# curl http://10.8.5.101:9200/myindex/mytype/_search?pretty -d '

{
"filter" : { "term" : { "id" : "3" } }
}'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
  "total" : 8,
  "successful" : 8,
  "failed" : 0
  },
  "hits" : {
  "total" : 1,
  "max_score" : 1.0,
  "hits" : [ {
  "_index" : "myindex",
  "_type" : "mytype",
  "_id" : "AVVXKgeEun6ksbtikOWJ",
  "_score" : 1.0,
  "_source" : {
  "id" : 3,
  "name" : "dlulaoyang"
  }
  } ]
  }
}
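The mapping used above is a simple one: an equality condition in the SQL WHERE clause becomes a term filter on the same field. As a rough sketch, the request body can be built from a field name and a value. term_filter_body is a hypothetical name, the body format simply mirrors the Elasticsearch 2.x request shown in this section, and the value is assumed to need no JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: build the request body for "where <field> = <value>".
# Mirrors the top-level "filter"/"term" body used in this article (ES 2.x).
# Assumption: field and value contain no characters that need JSON escaping.
term_filter_body() {
    printf '{ "filter" : { "term" : { "%s" : "%s" } } }\n' "$1" "$2"
}

# Example (needs a running cluster, so it is left commented out):
# curl 'http://10.8.5.101:9200/myindex/mytype/_search?pretty' -d "$(term_filter_body id 3)"
```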

Common errors:

Error log location: /odbc_es/logs. Log contents:
[root@5b9dbaaa148a logs]# tail -f jdbc.log
[04:03:39,570][INFO ][org.xbib.elasticsearch.helper.client.BaseTransportClient][pool-3-thread-1] after auto-discovery connected to [{5b9dbaaa148a}{aksn2ErNRlWjUECnp_8JmA}{10.8.5.101}{10.8.5.101:9300}{master=true}]

Bug 1: [02:46:23,894][ERROR][importer.jdbc ][pool-3-thread-1] error while processing request: cluster state is RED and not YELLOW, from here on, everything will fail!
Cause: you created an index with replicas, but the cluster has only one node. One way to solve this is to allocate the replicas on a second node; another is to turn replicas off.

Solutions: Option 1: add a second node so that the replicas can be allocated to it. Option 2: turn replicas off (very practical), as follows:

curl -XPUT 'localhost:9200/_settings' -d '
{
  "index" : {
    "number_of_replicas" : 0
  }
}'
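Before and after changing the replica setting, the cluster state can be read from the _cluster/health endpoint, whose response carries a "status" field of green, yellow, or red. The helper below is a small sketch for pulling that field out of the response; cluster_status is a hypothetical name, and it uses plain sed so that no extra tools are required.

```shell
#!/bin/sh
# Hypothetical helper: read a _cluster/health response on stdin and print
# the value of its "status" field (green, yellow, or red).
cluster_status() {
    sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p' | head -n 1
}

# Example (needs a running cluster, so it is left commented out):
# curl -s 'localhost:9200/_cluster/health?pretty' | cluster_status
```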

Bug 2: [13:00:37,137][ERROR][importer.jdbc ][pool-3-thread-1] error while processing request: no cluster nodes available, check settings {autodiscover=false, client.transport.ignore_cluster_name=false, client.transport.nodes_sampler_interval=5s, client.transport.ping_timeout=5s, cluster.name=elasticsearch, org.elasticsearch.client.transport.NoNodeAvailableException: no cluster nodes available, check
Solution: as in the script above, add "elasticsearch.cluster":"my-application" to the definition. The cluster name must match the one in /usr/local/elasticsearch/config/elasticsearch.yml.
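To avoid a mismatch, the cluster name can be read straight from elasticsearch.yml instead of being typed by hand. This is a minimal sketch under the assumption that the file contains an uncommented line of the form cluster.name: value without quotes or trailing comments; cluster_name_from_yml is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the cluster.name value from an elasticsearch.yml.
# Assumption: an uncommented "cluster.name: value" line, with no quotes and
# no trailing comment. Commented-out lines (starting with #) are ignored.
cluster_name_from_yml() {
    sed -n 's/^ *cluster\.name *: *\(.*[^ ]\) *$/\1/p' "$1" | head -n 1
}

# Example:
# cluster_name_from_yml /usr/local/elasticsearch/config/elasticsearch.yml
```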

Reference: http://stackoverflow.com/questions/11944915/getting-an-elasticsearch-cluster-to-green-cluster-setup-on-os-x

Download URL: http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.3.0/elasticsearch-jdbc-2.3.3.0-dist.zip
Unpack it, set the environment variable, then edit and run the scripts in bin.

Note: the downloaded package does not include a statefile.json file. It is generated the first time the sh script runs, and every later run uses that file.

./mysql-goodstaxi.sh & touch jdbc.log

#!/bin/sh

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
bin=/httx/run/elasticsearch-jdbc-2.3.3.0/bin
lib=/httx/run/elasticsearch-jdbc-2.3.3.0/lib

echo '
{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://10.7.*.*:8066/tete?useUnicode=true&characterEncoding=utf-8",
        "statefile" : "statefile.json",
        "schedule" : "0 0-59 0-23 ? * *",
        "user" : "54645",
        "password" : "456456",
        "sql" :  [
            {
                "statement" : "select *,TradeId as _id from Trade where stampDate > ?",
                "parameter" : [ "$metrics.lastexecutionstart" ]
            }
        ],
        "elasticsearch" : {
            "cluster" : "565",
            "host" : "10.7.*.*",
            "port" : 9300
        },
        "index" : "goods",
        "type" : "goods",
        "index_settings" : {
            "index" : {
                "number_of_shards" : 1
            },
            "analysis" : {
                "analyzer" : {
                    "ik" : {
                        "tokenizer" : "ik"
                    }
                }
            }
        }
    }
}
' | java \
    -cp "${lib}/*" \
    -Dlog4j.configurationFile=${bin}/log4j2.xml \
    org.xbib.tools.Runner \
    org.xbib.tools.JDBCImporter

#!/bin/sh

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
bin=/httx/6/elasticsearch-jdbc-2.3.3.0/bin
lib=/httx/6/elasticsearch-jdbc-2.3.3.0/lib

echo '
{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://10.7.*.*:8066/good?useUnicode=true&characterEncoding=utf-8",
        "statefile" : "statefile.json",
        "schedule" : "0 0-59 0-23 ? * *",
        "user" : "admin",
        "password" : "45456",
        "sql" : "select *,6TradeId as _id from 6Trade",
        "elasticsearch" : {
			 "cluster" : "6",
			 "host" : "10.7.*.*",
			 "port" : 9300
		},
        "index" : "good",
        "type" : "goods",
        "index_settings" : {
			"index" : {
				"number_of_shards" : 1
			}
		}
    }
}
' | java \
    -cp "${lib}/*" \
    -Dlog4j.configurationFile=${bin}/log4j2.xml \
    org.xbib.tools.Runner \
    org.xbib.tools.JDBCImporter

Reposted from: https://my.oschina.net/u/2274056/blog/1582387