21.HDFS引擎
21.1.HDFS引擎
ClickHouse通过该引擎管理HDFS上的数据,从而与Apache Hadoop生态系统集成。
指定表引擎:
ENGINE = HDFS(URI, format)
URI参数是HDFS文件URI,format参数指定文件格式。
URI路径支持globs,这样该表是只读的:
1.* : 替换任意字符,除了空字符和’/’。
2.? : 替换任意单个字符。
3.{some_string,another_string,yet_another_one} : 替换字符串列表中的任意一个。
4.{N…M} : 替换N到M范围的任意数字,包括左右的边界值。
示例:假设有如下HDFS的URI的文件,格式为TSV
'hdfs://hdfs1:9000/some_dir/some_file_1'
'hdfs://hdfs1:9000/some_dir/some_file_2'
'hdfs://hdfs1:9000/some_dir/some_file_3'
'hdfs://hdfs1:9000/another_dir/some_file_1'
'hdfs://hdfs1:9000/another_dir/some_file_2'
'hdfs://hdfs1:9000/another_dir/some_file_3'
有几种方法可以生成包含所有6个文件的表:
CREATE TABLE table_with_range (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/{some,another}_dir/some_file_{1..3}', 'TSV')
CREATE TABLE table_with_question_mark (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/{some,another}_dir/some_file_?', 'TSV')
CREATE TABLE table_with_asterisk (name String, value UInt32) ENGINE = HDFS('hdfs://hdfs1:9000/{some,another}_dir/*', 'TSV')
21.2.案例
1.创建HDFS引擎表(以下的ClickHouse最好是hdfs所在的机器上,这样的话才可以按照下面的方式访问通。)
drop table hdfs_engine_table;
CREATE TABLE hdfs_engine_table (name String,value UInt32) ENGINE=HDFS('hdfs://tqHadoopCluster/toto_2/tmp/hdfs','TSV'); 此处配置了ha,设置了名称。tqHadoopCluster在单击下是:ip:8021
效果图:
middleware :) CREATE TABLE hdfs_engine_table (name String,value UInt32) ENGINE=HDFS('hdfs://tqHadoopCluster/toto_2/tmp/hdfs','TSV');
CREATE TABLE hdfs_engine_table
(
`name` String,
`value` UInt32
)
ENGINE = HDFS('hdfs://tqHadoopCluster/toto_2/tmp/hdfs', 'TSV')
Ok.
0 rows in set. Elapsed: 0.007 sec.
middleware :)
2.插入数据
INSERT INTO hdfs_engine_table VALUES (‘one’, 1), (‘two’, 2), (‘three’, 3);
3.查看数据
SELECT * FROM hdfs_engine_table LIMIT 2;
┌─name──┬─value─┐
│ one │ 1 │
│ two │ 2 │
│ three │ 3 │
└───────┴───────┘
4.查看HDFS路径
hdfs dfs -ls hdfs://tqHadoopCluster/tmp/hdfs
再如:
CREATE TABLE hdfs_engine_table1 (name String, value UInt32) ENGINE=HDFS('hdfs://tqHadoopCluster/tmp/other_storage1/*', 'TSV');
22.JDBC引擎
22.1.JDBC引擎
ClickHouse通过JDBC引擎连接到外部数据库, 如MySQL、Oracle、PostgreSQL等。
ClickHouse使用一个单独的项目clickhouse-jdbc-bridge,其作为一个守护进程运行。
Clickhouse-jdbc-bridge项目链接:https://github.com/ClickHouse/clickhouse-jdbc-bridge (git地址:https://github.com/ClickHouse/clickhouse-jdbc-bridge.git)
Jar包地址:https://github.com/ClickHouse/clickhouse-jdbc-bridge/releases/tag/release_1.0.1
指定表引擎:
ENGINE = JDBC(dbms_uri, external_database, external_table);
1.dbms_uri : 外部的DBMS的URI。格式:
jdbc:<driver_name>://<host_name>:<port>/?user=<username>&password=<password>。
例如:MySQL的格式:jdbc:mysql://localhost:3306/?user=root&password=root。
2.external_database : 外部DBMS的数据库。
3.external_database : 外部DBMS的数据库。
启动服务:
java -jar ./clickhouse-jdbc-bridge-1.0.1.jar --driver-path /root/jdbc/lib/
其中/root/jdbc/lib为数据库驱动包所在的目录。
修改配置:在config.xml修改:
<jdbc_bridge>
<host>192.168.106.102</host>
<port>9019</port>
</jdbc_bridge>
然后重启ClickHouse
下面是关于clickhouse-jdbc-bridge的一张图:
22.2.案例
下载或编译源码,生成jar包:clickhouse-jdbc-bridge-2.0.0-SNAPSHOT.jar
clickhouse-jdbc-bridge-2.0.0-SNAPSHOT.jar放到服务器上。
启动服务:
[root@middleware ~]# pwd
/root
[root@middleware ~]# java -jar ./clickhouse-jdbc-bridge-1.0.1.jar /root/mysql-connector-java-5.1.34.jar
2020-12-09 16:33:11.011 [main] [INFO ] {Utils:837} - Loading JSON from file [config/vertx.json]...
2020-12-09 16:33:12.012 [main] [WARN ] {Utils:846} - Failed to load JSON from file config/vertx.json
2020-12-09 16:33:17.017 [vert.x-eventloop-thread-0] [INFO ] {Utils:837} - Loading JSON from file [config/server.json]...
2020-12-09 16:33:18.018 [vert.x-eventloop-thread-0] [WARN ] {Utils:846} - Failed to load JSON from file config/server.json
2020-12-09 16:33:18.018 [vert.x-eventloop-thread-0] [INFO ] {JdbcBridgeVerticle:531} - Registering consumer to monitor configuration file(s) at [config/datasources]
2020-12-09 16:33:18.018 [vert.x-eventloop-thread-0] [INFO ] {JdbcBridgeVerticle:531} - Registering consumer to monitor configuration file(s) at [config/schemas]
2020-12-09 16:33:18.018 [vert.x-eventloop-thread-0] [INFO ] {JdbcBridgeVerticle:531} - Registering consumer to monitor configuration file(s) at [config/queries]
2020-12-09 16:33:18.018 [vert.x-eventloop-thread-0] [INFO ] {DefaultDataSourceManager:196} - Registering new type of datasource: [jdbc] -> [ru.yandex.clickhouse.jdbcbridge.impl.JdbcDataSource]
2020-12-09 16:33:18.018 [vert.x-eventloop-thread-0] [INFO ] {DefaultDataSourceManager:200} - Default datasource type is set to [jdbc]
2020-12-09 16:33:18.018 [vert.x-eventloop-thread-0] [INFO ] {DefaultDataSourceManager:196} - Registering new type of datasource: [config] -> [ru.yandex.clickhouse.jdbcbridge.impl.ConfigDataSource]
2020-12-09 16:33:18.018 [vert.x-eventloop-thread-0] [INFO ] {DefaultDataSourceManager:196} - Registering new type of datasource: [script] -> [ru.yandex.clickhouse.jdbcbridge.impl.ScriptDataSource]
2020-12-09 16:33:18.018 [vert.x-eventloop-thread-0] [INFO ] {Utils:837} - Loading JSON from file [config/httpd.json]...
2020-12-09 16:33:18.018 [vert.x-eventloop-thread-0] [WARN ] {Utils:846} - Failed to load JSON from file config/httpd.json
2020-12-09 16:33:18.018 [vert.x-eventloop-thread-0] [INFO ] {JdbcBridgeVerticle:220} - Starting web server...
2020-12-09 16:33:19.019 [vert.x-worker-thread-1] [INFO ] {DefaultDataSourceManager:262} - No datasource configuration found
2020-12-09 16:33:19.019 [vert.x-worker-thread-1] [INFO ] {DefaultQueryManager:77} - No schema configuration found
2020-12-09 16:33:19.019 [vert.x-worker-thread-1] [INFO ] {DefaultQueryManager:76} - No query configuration found
2020-12-09 16:33:19.019 [vert.x-eventloop-thread-0] [INFO ] {JdbcBridgeVerticle:224} - Server http://0.0.0.0:9019 started in 14397 ms
2 准备MySQL表的数据
use test;
drop table if exists `test`;
CREATE TABLE `test` (
`int_id` INT NOT NULL AUTO_INCREMENT,
`int_nullable` INT NULL DEFAULT NULL,
`float` FLOAT NOT NULL,
`float_nullable` FLOAT NULL DEFAULT NULL,
PRIMARY KEY (`int_id`));
插入数据:
insert into test (`int_id`, `float`) VALUES (1,2);
select * from test;
+--------+--------------+-------+----------------+
| int_id | int_nullable | float | float_nullable |
+--------+--------------+-------+----------------+
| 1 | NULL | 2 | NULL |
+--------+--------------+-------+----------------+
3 在ClickHouse创建JDBC引擎表
drop table if exists jdbc_table;
CREATE TABLE jdbc_table
(
`int_id` Int32,
`int_nullable` Nullable(Int32),
`float` Float32,
`float_nullable` Nullable(Float32)
)
ENGINE JDBC('jdbc:mysql://192.168.110.173:3306/?user=root&password=xxxx', 'test', 'test');
效果图:
middleware :) CREATE TABLE jdbc_table
:-] (
:-] `int_id` Int32,
:-] `int_nullable` Nullable(Int32),
:-] `float` Float32,
:-] `float_nullable` Nullable(Float32)
:-] )
:-] ENGINE JDBC('jdbc:mysql://xxx.xxx.xxx.xxx:3306/?user=root&password=xxxxxxx', 'test', 'test');
CREATE TABLE jdbc_table
(
`int_id` Int32,
`int_nullable` Nullable(Int32),
`float` Float32,
`float_nullable` Nullable(Float32)
)
ENGINE = JDBC('jdbc:mysql://xxx.xxx.xxx.xxx:3306/?user=root&password=xxxxxxx', 'test', 'test')
Ok.
0 rows in set. Elapsed: 0.011 sec.
middleware :)
查看数据:
:) select int_id, float from jdbc_table;
SELECT
int_id,
float
FROM jdbc_table
┌─int_id─┬─float─┐
│ 1 │ 2 │
└────────┴───────┘