Syncing data to Elasticsearch with elasticsearch-river-jdbc

Reposted 2015-11-20 16:33:55

Tags: elasticsearch plugin
1 Plugin installation

The plugin's GitHub page: https://github.com/jprante/elasticsearch-jdbc/
Pick the plugin build that matches your ES version. The plugin cannot be installed straight from GitHub; you can package it with Maven and then install it.

Here is a package that can be installed directly (it matches ES 1.4.5):
cd /usr/share/elasticsearch
bin/plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.4.0.8/elasticsearch-river-jdbc-1.4.0.8-plugin.zip

After installation there is a jdbc directory under {es_home}/plugins/.

Note: if the ES version and the plugin version do not match, indexing (syncing) data will fail with: "Failed to load class with value [jdbc]".

2 Download the MySQL driver, mysql-connector-java-5.1.30-bin.jar

Search for it online, download it, and place it in {es_home}/lib.

3 Create the river and index the data

curl -XPUT '192.168.1.116:9200/_river/userinfo/_meta' -d '{
    "strategy" : "simple",
    "interval" : 0,
    "flush_interval" : "5s",
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://<your-mysql-host>:3306/testdb",
        "user" : "mysql1",
        "password" : "Admin123",
        "sql" : "select UserID as _id, username, createdate, memo from userinfo",
        "interval" : "1800",
        "index" : "userinfo",
        "type" : "userinfo"
    }
}'

Selecting UserID as _id enables incremental syncing: _id is Elasticsearch's default document ID field, so rows fetched again on later runs overwrite their existing documents instead of creating duplicates.

"interval":"1800", 这里是同步数据的频率 1800s,半小时,可以按需要设成 1s或其它


If something goes wrong during creation, you can delete the river with the commands below and recreate it:
curl -XDELETE '192.168.1.116:9200/_river/userinfo/_meta'
curl -XDELETE '192.168.1.116:9200/_river/userinfo/_status'

The XDELETE URL path format is: url/_index/_type/_id

Parameter reference

Quoted from: https://github.com/jprante/elasticsearch-jdbc/

Parameters outside of the jdbc block

strategy - the strategy of the JDBC plugin, currently implemented: "simple", "column"

schedule - a single or a list of cron expressions for scheduled execution. Syntax is equivalent to the Quartz cron expression format (see below).

threadpoolsize - a thread pool size for the scheduled executions for schedule parameter. If set to 1, all jobs will be executed serially. Default is 4.

interval - a time value for the delay between two river runs (default: not set)

max_bulk_actions - the length of each bulk index request submitted (default: 10000)

max_concurrent_bulk_requests - the maximum number of concurrent bulk requests (default: 2 * number of CPU cores)

max_bulk_volume - a byte size parameter for the maximum volume allowed for a bulk request (default: "10m")

max_request_wait - a time value for the maximum wait time for a response of a bulk request (default: "60s")

flush_interval - a time value for the interval period of flushing index docs to a bulk action (default: "5s")

Parameters inside of the jdbc block

url - the JDBC driver URL

user - the JDBC database user

password - the JDBC database password

sql - SQL statement(s), either a string or a list. If a statement ends with .sql, the statement is looked up in the file system. Example for a list of SQL statements:

"sql" : [
    {
        "statement" : "select ... from ... where a = ?, b = ?, c = ?",
        "parameter" : [ "value for a", "value for b", "value for c" ]
    },
    {
        "statement" : "insert into  ... where a = ?, b = ?, c = ?",
        "parameter" : [ "value for a", "value for b", "value for c" ],
        "write" : "true"
    },
    {
        "statement" : ...
    }
]

sql.statement - the SQL statement

sql.write - boolean flag, if true, the SQL statement is interpreted as an insert/update statement that needs write access (default: false).

sql.callable - boolean flag, if true, the SQL statement is interpreted as a JDBC CallableStatement for stored procedures (default: false).

sql.parameter - bind parameters for the SQL statement (in order). Some special values can be used with the following meanings:

  • $now - the current timestamp
  • $job - a job counter
  • $count - last number of rows merged
  • $river.name - the river name
  • $last.sql.start - a timestamp value for the time when the last SQL statement started
  • $last.sql.end - a timestamp value for the time when the last SQL statement ended
  • $last.sql.sequence.start - a timestamp value for the time when the last SQL sequence started
  • $last.sql.sequence.end - a timestamp value for the time when the last SQL sequence ended
  • $river.state.started - the timestamp of river start (from river state)
  • $river.state.timestamp - last timestamp of river activity (from river state)
  • $river.state.counter - counter from river state, counts the numbers of runs

locale - the default locale (used for parsing numerical values and the floating point character; recommended value is "en_US")

timezone - the timezone for JDBC setTimestamp() calls when binding parameters with timestamp values

rounding - rounding mode for parsing numeric values. Possible values "ceiling", "down", "floor", "halfdown", "halfeven", "halfup", "unnecessary", "up"

scale - the precision of parsing numeric values

autocommit - true if each statement should be automatically executed. Default is false

fetchsize - the fetchsize for large result sets, most drivers use this to control the amount of rows in the buffer while iterating through the result set

max_rows - limit the number of rows fetched by a statement; the rest of the rows are ignored

max_retries - the number of retries to (re)connect to a database

max_retries_wait - a time value for the time that should be waited between retries. Default is "30s"

resultset_type - the JDBC result set type, can be TYPE_FORWARD_ONLY, TYPE_SCROLL_SENSITIVE, TYPE_SCROLL_INSENSITIVE. Default is TYPE_FORWARD_ONLY

resultset_concurrency - the JDBC result set concurrency, can be CONCUR_READ_ONLY, CONCUR_UPDATABLE. Default is CONCUR_UPDATABLE

ignore_null_values - if NULL values should be ignored when constructing JSON documents. Default is false

prepare_database_metadata - if the driver metadata should be prepared as parameters for access by the river. Default is false

prepare_resultset_metadata - if the result set metadata should be prepared as parameters for access by the river. Default is false

column_name_map - a map of aliases that should be used as a replacement for column names of the database. Useful for Oracle 30 char column name limit. Default is null
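As a sketch of column_name_map (the column and alias names here are hypothetical), a long Oracle column name can be mapped to a shorter document field name like this:

```json
"column_name_map" : {
    "a_column_name_longer_than_oracle_allows" : "short_name"
}
```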

query_timeout - a value in seconds for how long an SQL statement is allowed to execute before it is considered lost. Default is 1800

connection_properties - a map of connection properties for driver connection creation. Default is null

index - the Elasticsearch index used for indexing

type - the Elasticsearch type of the index used for indexing

index_settings - optional settings for the Elasticsearch index

type_mapping - optional mapping for the Elasticsearch index type
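index_settings and type_mapping take ordinary Elasticsearch index settings and mapping JSON. A minimal sketch for the userinfo example above (values are illustrative, using ES 1.x mapping syntax):

```json
"index_settings" : {
    "index" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 0
    }
},
"type_mapping" : {
    "userinfo" : {
        "properties" : {
            "username" : { "type" : "string", "index" : "not_analyzed" }
        }
    }
}
```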

Overview about the default parameter settings

{
    "strategy" : "simple",
    "schedule" : null,
    "interval" : 0L,
    "threadpoolsize" : 4,
    "max_bulk_actions" : 10000,
    "max_concurrent_bulk_requests" : 2 * available CPU cores,
    "max_bulk_volume" : "10m",
    "max_request_wait" : "60s",
    "flush_interval" : "5s",
    "jdbc" : {
        "url" : null,
        "user" : null,
        "password" : null,
        "sql" : null,
        "locale" : Locale.getDefault().toLanguageTag(),
        "timezone" : TimeZone.getDefault(),
        "rounding" : null,
        "scale" : 2,
        "autocommit" : false,
        "fetchsize" : 10, /* MySQL: Integer.MIN */
        "max_rows" : 0,
        "max_retries" : 3,
        "max_retries_wait" : "30s",
        "resultset_type" : "TYPE_FORWARD_ONLY",
        "resultset_concurreny" : "CONCUR_UPDATABLE",
        "ignore_null_values" : false,
        "prepare_database_metadata" : false,
        "prepare_resultset_metadata" : false,
        "column_name_map" : null,
        "query_timeout" : 1800,
        "connection_properties" : null,
        "index" : "jdbc",
        "type" : "jdbc",
        "index_settings" : null,
        "type_mapping" : null,
    }
}

Time scheduled execution of JDBC river

Setting a cron expression in the parameter schedule enables repeated (time scheduled) runs of the JDBC river.

You can also define a list of cron expressions (in a JSON array) to schedule for many different time schedules.

Example of a schedule parameter:

    "schedule" : "0 0-59 0-23 ? * *"

This executes JDBC river every minute, every hour, all the days in the week/month/year.
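A schedule given as a JSON array runs the river at each listed time. For example (the expressions here are illustrative), to run at 02:00 and 14:00 every day:

```json
"schedule" : [ "0 0 2 * * ?", "0 0 14 * * ?" ]
```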

The following documentation about the syntax of the cron expression is copied from the Quartz scheduler javadoc page.

Cron expressions provide the ability to specify complex time combinations such as "At 8:00am every Monday through Friday" or "At 1:30am every last Friday of the month".

Cron expressions are comprised of 6 required fields and one optional field separated by white space. The fields respectively are described as follows:

Field Name       Allowed Values      Allowed Special Characters
Seconds          0-59                , - * /
Minutes          0-59                , - * /
Hours            0-23                , - * /
Day-of-month     1-31                , - * ? / L W
Month            1-12 or JAN-DEC     , - * /
Day-of-Week      1-7 or SUN-SAT      , - * ? / L #
Year (optional)  empty, 1970-2199    , - * /
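The two phrases quoted above can be written in this syntax as follows (sketches; 6L means the last Friday of the month, since Friday is day 6 in Quartz):

```
0 0 8 ? * MON-FRI    at 8:00am every Monday through Friday
0 30 1 ? * 6L        at 1:30am every last Friday of the month
```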
