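Background: at our company, Presto queries were slow on large data volumes. After weighing ClickHouse's technical characteristics against the nature of our data, we decided to replace Presto with ClickHouse.
Approach: incrementally sync Hive data into ClickHouse once a day.
Note: ClickHouse is abbreviated as ck below.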
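Implementation steps:
- Create a table with the Hive engine in ck
- Create a table with the MergeTree engine in ck
- Every day, incrementally sync the Hive-engine table into the MergeTree-engine table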
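The three steps above can be sketched as a small Python script that only builds the ClickHouse SQL and prints it, without connecting to anything. Everything concrete here is an assumption for illustration: the metastore address `thrift://hive-metastore:9083`, the database and table names, every column except `app_module`, a Hive partition column `day` holding `YYYY-MM-DD` strings, and the choice to drop the target partition before each load so that re-running a day replaces its rows instead of duplicating them. The `ENGINE = Hive('thrift://host:port', 'database', 'table')` plus `PARTITION BY` form follows the ClickHouse Hive table engine; check it against your ClickHouse version.

```python
from datetime import date, timedelta

# Hypothetical settings for illustration only: adjust to your environment.
HIVE_METASTORE = "thrift://hive-metastore:9083"
COLUMNS = [("app_module", "String"), ("user_id", "String"), ("day", "String")]
PARTITION_COL = "day"  # assumed Hive partition column, stored as 'YYYY-MM-DD'


def column_list(columns):
    """Render the `(name type, ...)` part of a CREATE TABLE statement."""
    body = ",\n    ".join(f"`{name}` {ctype}" for name, ctype in columns)
    return f"(\n    {body}\n)"


def hive_table_ddl(ck_table, hive_db, hive_table):
    """Step 1: a ck table backed by the Hive engine, reading the Hive table's files."""
    return (
        f"CREATE TABLE IF NOT EXISTS {ck_table}\n{column_list(COLUMNS)}\n"
        f"ENGINE = Hive('{HIVE_METASTORE}', '{hive_db}', '{hive_table}')\n"
        f"PARTITION BY {PARTITION_COL}"
    )


def mergetree_table_ddl(ck_table, order_by):
    """Step 2: the local MergeTree table that queries are meant to hit."""
    return (
        f"CREATE TABLE IF NOT EXISTS {ck_table}\n{column_list(COLUMNS)}\n"
        f"ENGINE = MergeTree\n"
        f"PARTITION BY {PARTITION_COL}\n"
        f"ORDER BY ({', '.join(order_by)})"
    )


def daily_sync_sql(hive_ck_table, mt_table, day):
    """Step 3: copy one day's partition from the Hive-engine table into MergeTree.

    Dropping the target partition first (an assumed design choice) makes a
    re-run for the same day replace that day's data rather than append to it.
    """
    cols = ", ".join(f"`{name}`" for name, _ in COLUMNS)
    d = day.isoformat()
    return [
        f"ALTER TABLE {mt_table} DROP PARTITION '{d}'",
        f"INSERT INTO {mt_table} ({cols}) "
        f"SELECT {cols} FROM {hive_ck_table} WHERE {PARTITION_COL} = '{d}'",
    ]


if __name__ == "__main__":
    print(hive_table_ddl("ck_db.app_log_hive", "ods", "app_log") + ";")
    print(mergetree_table_ddl("ck_db.app_log", ["app_module", "user_id"]) + ";")
    yesterday = date.today() - timedelta(days=1)
    for stmt in daily_sync_sql("ck_db.app_log_hive", "ck_db.app_log", yesterday):
        print(stmt + ";")
```

The INSERT lists its columns explicitly so it does not depend on the two tables declaring columns in the same order. Running the printed statements (for example with `clickhouse-client` from a daily scheduled job) is left to your own setup.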
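Pitfalls
- Reading from the Hive-engine table in ck failed with:
Column 0, name: app_module, type: String, parsed text: "<0x03><ASCII NUL><ASCII NUL><ASCII NUL><0x12><0x0F><BACKSPACE><0x11><0x12><TAB><BACKSPACE><ASCII NUL><0x10>▒I<0x18>▒"ERROR: There is no line feed. "P" found instead.It's like your file has more columns than expected.And if your file has the right number of columns, maybe it has an unquoted string value with a comma.: While executing HiveTextRowInputFormat: While executing Hive. (INCORRECT_DATA)
When we first created the table in ck, the Hive table was stored as text. We later switched the Hive table to ORC but did not recreate the table in ck, which produced the error above: as the log shows, ck was still reading the files with HiveTextRowInputFormat, so the binary ORC data was being parsed as text. Dropping and recreating the ck table fixed it.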
- For non-partitioned Hive tables, creating the H