The project planned to have Flume deliver data directly into a Hive table, rather than onto HDFS, by using the Hive Sink; the Flume version is 1.9.0.
At startup the agent initially failed with a series of errors:
NoClassDefFoundError: org/apache/hadoop/hive/ql/session/SessionState
NoClassDefFoundError: org/apache/hadoop/hive/cli/CliSessionState
NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
NoClassDefFoundError: org/apache/hadoop/conf/Configuration
java.lang.NoClassDefFoundError: com/esotericsoftware/kryo/Serializer
java.lang.ClassNotFoundException: com.esotericsoftware.kryo.Serializer
NoClassDefFoundError: org/antlr/runtime/RecognitionException
Solution:
Copy all the relevant jars over in one batch.
For example, in CDH the jar directory is:
/data/cloudera/parcels/CDH-5.11.2-1.cdh5.11.2.p0.4/jars
From inside that directory:
scp hive-* root@172.28.65.106:/usr/local/flume/lib
scp hadoop-* root@172.28.65.106:/usr/local/flume/lib
scp antlr-* root@172.28.65.106:/usr/local/flume/lib
scp kryo-2.22.jar root@172.28.65.106:/usr/local/flume/lib
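Copying jars wholesale works, but when a single `NoClassDefFoundError` remains it helps to know exactly which jar provides the missing class. A small sketch of that lookup (the directory and class name below are just examples taken from the errors above):

```shell
# find_class_jar: print the name of every jar under $1 that contains
# the class file path given in $2 (e.g. org/apache/hadoop/hive/conf/HiveConf.class).
find_class_jar() {
  dir="$1"; class="$2"
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue
    # A jar is a zip archive, so 'unzip -l' lists its entries.
    if unzip -l "$jar" 2>/dev/null | grep -q "$class"; then
      basename "$jar"
    fi
  done
}

# Example (against the CDH parcel directory used above):
# find_class_jar /data/cloudera/parcels/CDH-5.11.2-1.cdh5.11.2.p0.4/jars \
#     'org/apache/hadoop/hive/conf/HiveConf.class'
```

Whatever jar this reports is the one to `scp` into `/usr/local/flume/lib`.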
Configure the Flume agent:
# example.conf: A single-node Flume configuration
# Name the components on this agent
video_hive.sources = r1
video_hive.sinks = k1
video_hive.channels = c1
# Describe/configure the source
video_hive.sources.r1.type = netcat
video_hive.sources.r1.bind = localhost
video_hive.sources.r1.port = 44444
# Describe the sink
video_hive.sinks.k1.type = hive
video_hive.sinks.k1.channel = c1
video_hive.sinks.k1.hive.metastore = thrift://dev07.hadoop.openpf:9083
#video_hive.sinks.k1.hive.metastore = thrift://172.28.23.21:9083
video_hive.sinks.k1.hive.database = recommend_video
video_hive.sinks.k1.hive.table = video_test
#video_hive.sinks.k1.hive.table = user_video_action_log
video_hive.sinks.k1.hive.partition = %Y-%m-%d
#video_hive.sinks.k1.autoCreatePartitions = false
video_hive.sinks.k1.useLocalTimeStamp = true
video_hive.sinks.k1.batchSize = 1500
#video_hive.sinks.k1.round = true
#video_hive.sinks.k1.roundValue = 10
#video_hive.sinks.k1.roundUnit = m
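As written, the snippet above stops short of a runnable agent: channel c1 is declared but never configured, the source r1 is never bound to it, and the Hive Sink has no serializer. A minimal completion might look like this (the field names are illustrative assumptions, not from the original config):

```
# Buffer events in a memory channel; transactionCapacity should be
# at least as large as the sink's batchSize (1500 above).
video_hive.channels.c1.type = memory
video_hive.channels.c1.capacity = 10000
video_hive.channels.c1.transactionCapacity = 1500

# Bind the source to the channel (the sink is already bound via k1.channel).
video_hive.sources.r1.channels = c1

# The Hive Sink needs a serializer to map event bodies onto table columns;
# the fieldnames here are placeholders for the actual column names.
video_hive.sinks.k1.serializer = DELIMITED
video_hive.sinks.k1.serializer.delimiter = ","
video_hive.sinks.k1.serializer.fieldnames = user_id,video_id,action_time
```

Note also that the Hive Sink uses Hive streaming ingest, which requires the target table to be transactional (bucketed and stored as ORC). After restarting the agent (something like `bin/flume-ng agent --name video_hive --conf conf --conf-file example.conf`), `nc localhost 44444` lets you type test lines into the netcat source.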