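1. Hive table requirements
1) The table must be a transactional table
2) The table must be bucketed
3) The table must be stored as ORC
4) Column names must be lowercase
5) Since the data flows from the source into the ODS layer, partitioning the table by time is a good idea
2. Example
2.1 Create the table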
create table hivesinktest(
user_id int
,user_name string
,age int
)
partitioned by (dt string)
clustered by (user_id) into 2 buckets
row format delimited fields terminated by '\t'
-- store the data as ORC files
stored as orc
-- enable transactions on the table
tblproperties('transactional'='true');
2.2 Edit hive-site.xml (add these properties) to enable transaction support
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
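A quick way to confirm that the three settings above actually made it into the file Hive reads is to parse hive-site.xml and compare. The following is a minimal sketch, assuming the file uses the usual `<configuration>` root element with `<property>` children; the helper names `read_properties` and `missing_or_wrong` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Expected values, copied from the hive-site.xml fragment in this section.
REQUIRED = {
    "hive.support.concurrency": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
}


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_or_wrong(xml_text, required=REQUIRED):
    """Return {name: actual value or None} for settings that differ from `required`."""
    actual = read_properties(xml_text)
    return {
        name: actual.get(name)
        for name, expected in required.items()
        if actual.get(name) != expected
    }
```

Passing the contents of hive-site.xml (read as bytes or text) to `missing_or_wrong` returns an empty dict when all three properties match, and otherwise lists the ones to fix.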
2.3 Copy the Hive jars into flume/lib
Path of the jars: HIVE_HOME/hcatalog/share/hcatalog
2.4 Flume configuration file
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type=http
a1.sources.r1.bind=ip
a1.sources.r1.handler=cn.shh.source.SourceHttp
a1.sources.r1.port=5000
a1.sources.r1.insertTimestamp=true
# Describe the sink
a1.sinks.k1.type=hive
a1.sinks.k1.hive.metastore=thrift://ip:9083
a1.sinks.k1.hive.database=default
a1.sinks.k1.hive.table=hivesinktest
a1.sinks.k1.hive.partition = %Y-%m-%d
a1.sinks.k1.autoCreatePartitions = false
a1.sinks.k1.useLocalTimeStamp = true
a1.sinks.k1.serializer=DELIMITED
a1.sinks.k1.serializer.delimiter="\t"
a1.sinks.k1.serializer.serdeSeparator='\t'
a1.sinks.k1.serializer.fieldnames=user_id,user_name,age
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 15000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
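Because the sink is configured with autoCreatePartitions = false, the dt partition for a given day presumably has to exist before events for that day arrive. The sketch below only builds the HiveQL statement to run. It assumes the table from 2.1 lives in the default database and that the `%Y-%m-%d` escape in hive.partition expands the same way as Python's strftime; the function names are hypothetical.

```python
from datetime import date


def partition_value(day, pattern="%Y-%m-%d"):
    """Format a date the way the sink's hive.partition pattern is expected to expand."""
    return day.strftime(pattern)


def add_partition_sql(table, column, value):
    """Build an idempotent ADD PARTITION statement for one partition value."""
    return f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({column}='{value}');"


if __name__ == "__main__":
    # Print the statement for today's partition; paste it into the Hive CLI or beeline.
    print(add_partition_sql("default.hivesinktest", "dt", partition_value(date.today())))
```

Running the statement once per day, before the agent starts writing that day's data, keeps the sink from hitting a missing partition.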
2.5 Start a netcat client and send tab-separated data to the listening port
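As a scripted alternative to typing into netcat, a record can be built and sent from Python. This is only a sketch under assumptions: it swaps netcat for an HTTP POST because the source type is http, and it assumes the custom handler cn.shh.source.SourceHttp (not shown in this post) accepts a plain tab-delimited line as the request body. The host name in the example call is a placeholder.

```python
import urllib.request


def format_record(user_id, user_name, age):
    """Join the fields in the order of serializer.fieldnames, separated by tabs."""
    fields = [str(user_id), user_name, str(age)]
    for field in fields:
        # A tab or newline inside a value would shift the columns on the Hive side.
        if "\t" in field or "\n" in field:
            raise ValueError(f"field contains a delimiter character: {field!r}")
    return "\t".join(fields)


def post_record(url, record, timeout=5.0):
    """POST one record as a plain-text body and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=record.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (hypothetical host; port 5000 comes from the Flume config):
# post_record("http://flume-host:5000", format_record(1, "alice", 30))
```

If the handler expects a different body format, only `post_record` needs to change; `format_record` still produces the column order the sink's serializer is configured for.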
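2.6 Check the table in Hive for the new data
Success! Note: the Hive metastore service must be started first.
Problem encountered while testing on a virtual machine: NoSuchMethodError
My guess is that the integrated jars (hcatalog) are the cause, since a method cannot be resolved at call time, but I have not found exactly where it goes wrong. On the first test run the Hive sink worked normally.
If anyone passing by knows the cause, please help.
Problems encountered when integrating Flume with Hive on CDH
java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.session.SessionState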
java.lang.ClassNotFoundException: org.apache.hadoop.hive.cli.CliSessionState
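Solution: copy the required jars from the CDH jars directory into flume-ng/lib
In the end, though, I ran into the first problem (NoSuchMethodError) again.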
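One way to narrow down both errors is to check which jars in Flume's lib directory actually contain the class in question. No hit would explain a ClassNotFoundException, and more than one hit suggests two versions of the same class on the classpath, which is a common cause of NoSuchMethodError. The sketch below is a generic diagnostic, not something confirmed as the root cause here; the function names are hypothetical.

```python
import zipfile
from pathlib import Path


def class_entry(class_name):
    """Convert a fully qualified class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(lib_dir, class_name):
    """Return the sorted file names of jars in lib_dir that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
    return hits
```

For example, calling `jars_containing` on the flume-ng/lib directory with org.apache.hadoop.hive.ql.session.SessionState shows whether the class is present, and whether it comes from a single jar.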