Flume Hive Sink: Pitfalls and Notes

辉煌下的黑

Original article: https://blog.csdn.net/u013360689/article/details/80361278

I. Hive sink overview

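Compared with the HDFS sink, the Hive sink can load data into a Hive table in near real time. With the HDFS sink you have to create a Hive external table that points at the HDFS path, and the data shows up with noticeably more delay.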

II. Things to watch out for

1. The Hive table must be bucketed and stored as ORC.

2. The Hive column names configured in Flume must all be lowercase, i.e. every entry in fieldnames must be lowercase.

3. Create the partitions manually, i.e. set autoCreatePartitions = false.
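
The two config-side rules above are easy to get wrong by accident, so here is a minimal sketch of a pre-flight check. It is not part of Flume: `parse_properties` and `check_hive_sink` are hypothetical helper names, and the parser only handles simple `key = value` lines. The bucket/ORC rule lives on the Hive side and is not checked here.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_hive_sink(props, prefix):
    """Return a list of rule violations for the sink whose keys start with prefix."""
    problems = []
    fieldnames = props.get(prefix + "serializer.fieldnames", "")
    for name in (n.strip() for n in fieldnames.split(",")):
        if name != name.lower():
            problems.append(f"fieldname not lowercase: {name}")
    # Flume defaults autoCreatePartitions to true, so a missing key also fails.
    if props.get(prefix + "autoCreatePartitions", "true").lower() != "false":
        problems.append("autoCreatePartitions is not false")
    return problems


sample = """
a1.sinks.k2.type = hive
a1.sinks.k2.autoCreatePartitions = false
a1.sinks.k2.serializer.fieldnames = dstype,id,type,lastUploadTime
"""
for problem in check_hive_sink(parse_properties(sample), "a1.sinks.k2."):
    print(problem)
```

With the sample above, the only reported problem is the camelCase `lastUploadTime` entry.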

III. Configure the Hive sink

```

a1.sinks.k2.type = hive
a1.sinks.k2.channel = c2
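# URL of the Hive metastore
a1.sinks.k2.hive.metastore = thrift://192.168.3.150:9083
# Hive database name
a1.sinks.k2.hive.database = test
# Hive table name
a1.sinks.k2.hive.table = ods_table
# Hive partition values, comma-separated; %Y expands to 2018, %y to 18
a1.sinks.k2.hive.partition = %Y-%m-%d
# Automatic partition creation must be turned off here, otherwise the sink reports errors. Create the partitions manually.
a1.sinks.k2.autoCreatePartitions = false
# true = use local time; false = use the timestamp from the event header (events then need a timestamp header)
a1.sinks.k2.useLocalTimeStamp = false
#a1.sinks.k2.round = true
#a1.sinks.k2.roundValue = 1
#a1.sinks.k2.roundUnit = minute
a1.sinks.k2.serializer = DELIMITED
# Important: do not forget to escape the delimiter
a1.sinks.k2.serializer.delimiter = "\\001"
#a1.sinks.k2.serializer.serdeSeparator = "\\001"
# The Hive column names configured in Flume must all be lowercase. The Hive table must be bucketed and stored as ORC.
a1.sinks.k2.serializer.fieldnames = dstype,id,type,lastuploadtime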

```
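
To make the `DELIMITED` serializer and the partition pattern more concrete, here is a rough Python sketch of what the config implies for a single event. It only imitates the behavior and is not Flume code: `split_event` and `partition_value` are hypothetical names, the sample event body is made up, and UTC is assumed for the timestamp (Flume resolves the escapes in its configured time zone). Python's `strftime` happens to agree with Flume for `%Y`, `%y`, `%m` and `%d`, but not for every escape sequence.

```python
from datetime import datetime, timezone

FIELDNAMES = "dstype,id,type,lastuploadtime".split(",")
DELIMITER = "\x01"  # the Ctrl-A character that "\\001" in the config refers to


def split_event(body):
    """Map one delimited event body onto the configured column names."""
    values = body.split(DELIMITER)
    if len(values) != len(FIELDNAMES):
        raise ValueError(f"expected {len(FIELDNAMES)} fields, got {len(values)}")
    return dict(zip(FIELDNAMES, values))


def partition_value(timestamp_ms, pattern="%Y-%m-%d"):
    """Expand the partition pattern from an event timestamp header (epoch ms).

    Assumption: UTC, chosen here only to keep the example deterministic.
    """
    moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return moment.strftime(pattern)


# Hypothetical event: four fields joined by Ctrl-A, in fieldnames order.
body = DELIMITER.join(["mysql", "1001", "order", "2018-05-18 10:00:00"])
print(split_event(body))
print(partition_value(1526616000000))        # timestamp header in milliseconds
print(partition_value(1526616000000, "%y"))  # two-digit year
```

The point of the sketch: the order of `fieldnames` has to match the order of the fields in the event body, and the partition value comes from the event's timestamp header when `useLocalTimeStamp = false`.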

IV. Hive table

create table test.ods_table
(
  dsType string,
  id string,
  type string,
  lastUploadTime string
)
partitioned by (dt string)
clustered by (id) into 2 buckets
stored as orc
TBLPROPERTIES ('transactional'='true');

alter table test.ods_table add if not exists partition ( dt='2018-05-18');
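
Since automatic partition creation is switched off, every day's partition has to exist before events for that day arrive. One way to handle this is to generate the `alter table` statements ahead of time and run them from a scheduled job. The sketch below only builds the statements; `partition_ddl` is a hypothetical helper, and how you submit the output to Hive (CLI, beeline, scheduler) is up to your environment.

```python
from datetime import date, timedelta


def partition_ddl(table, start, days, column="dt"):
    """Build one 'alter table ... add partition' statement per day."""
    statements = []
    for offset in range(days):
        # Same %Y-%m-%d layout as hive.partition in the sink config.
        value = (start + timedelta(days=offset)).strftime("%Y-%m-%d")
        statements.append(
            f"alter table {table} add if not exists partition ({column}='{value}');"
        )
    return statements


for statement in partition_ddl("test.ods_table", date(2018, 5, 18), 3):
    print(statement)
```

Because of `if not exists`, running the generated statements more than once is harmless, so the job can safely create a few days in advance.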
