Hadoop 3.1.3 + Flume 1.8: fixing the failure to collect hive.log into HDFS
1. JAR issue
Flume 1.8 needs the Hadoop 3.x JARs copied into flume/lib/.
The JARs to copy are as follows:
cp /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-common-3.1.3.jar /opt/module/flume-1.8.0/lib/
cp /opt/module/hadoop-3.1.3/share/hadoop/common/lib/woodstox-core-5.0.3.jar /opt/module/flume-1.8.0/lib/
cp /opt/module/hadoop-3.1.3/share/hadoop/common/lib/stax2-api-3.1.4.jar /opt/module/flume-1.8.0/lib/
cp /opt/module/hadoop-3.1.3/share/hadoop/common/lib/commons-configuration2-2.1.1.jar /opt/module/flume-1.8.0/lib/
cp /opt/module/hadoop-3.1.3/share/hadoop/common/lib/hadoop-auth-3.1.3.jar /opt/module/flume-1.8.0/lib/
cp /opt/module/hadoop-3.1.3/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar /opt/module/flume-1.8.0/lib/
cp /opt/module/hadoop-3.1.3/share/hadoop/hdfs/hadoop-hdfs-3.1.3.jar /opt/module/flume-1.8.0/lib/
cp /opt/module/hadoop-3.1.3/share/hadoop/hdfs/hadoop-hdfs-client-3.1.3.jar /opt/module/flume-1.8.0/lib/
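The same copies can be scripted so that a missing JAR is reported instead of going unnoticed. This is only a minimal sketch: the function name copy_hadoop_jars is hypothetical, and it assumes the JAR names and versions listed above, so adjust them if your Hadoop build differs.

```shell
# Hypothetical helper: copy the Hadoop JARs listed above into Flume's lib directory.
# Usage: copy_hadoop_jars <hadoop_home> <flume_home>
# Returns 1 if any JAR was not found, 0 otherwise.
copy_hadoop_jars() {
  hadoop_share="$1/share/hadoop"
  flume_lib="$2/lib"
  missing=0
  # Paths are relative to <hadoop_home>/share/hadoop, taken from the cp commands above.
  for jar in \
    common/hadoop-common-3.1.3.jar \
    common/lib/woodstox-core-5.0.3.jar \
    common/lib/stax2-api-3.1.4.jar \
    common/lib/commons-configuration2-2.1.1.jar \
    common/lib/hadoop-auth-3.1.3.jar \
    common/lib/htrace-core4-4.1.0-incubating.jar \
    hdfs/hadoop-hdfs-3.1.3.jar \
    hdfs/hadoop-hdfs-client-3.1.3.jar
  do
    if [ -f "$hadoop_share/$jar" ]; then
      cp "$hadoop_share/$jar" "$flume_lib/"
    else
      echo "missing: $hadoop_share/$jar" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the install paths used in this post, the call would be: copy_hadoop_jars /opt/module/hadoop-3.1.3 /opt/module/flume-1.8.0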
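At this point, running bin/flume-ng agent -c conf/ -f job/file-flume-hdfs.conf -n a2
throws an exception:
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
Flume's lib directory ships the older guava-11.0.2.jar, which has to be replaced with the newer guava-27.0-jre.jar from Hadoop.
After copying the newer JAR, delete the older one; if it is left in place, the exception is still thrown.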
cd /opt/module/hadoop-3.1.3/share/hadoop/common/lib/
cp guava-27.0-jre.jar /opt/module/flume-1.8.0/lib/
rm /opt/module/flume-1.8.0/lib/guava-11.0.2.jar
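Because the error persists as long as the old JAR is still present, it is worth confirming which Guava JARs actually sit in Flume's lib directory. The helper below is a small sketch for that check; the name list_guava_jars is hypothetical.

```shell
# Hypothetical helper: print the file name of every Guava JAR in a lib directory.
# Usage: list_guava_jars <lib_dir>
list_guava_jars() {
  for f in "$1"/guava-*.jar; do
    # If nothing matches, the glob stays literal, so skip entries that are not files.
    [ -f "$f" ] && basename "$f"
  done
  return 0
}
```

Calling list_guava_jars /opt/module/flume-1.8.0/lib should then print guava-27.0-jre.jar and nothing else; if guava-11.0.2.jar still shows up, it has not been removed from the right directory.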
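Then run the Flume job again.
2. Port issue
My file-flume-hdfs.conf is listed in full further below. Note this line in it:
a2.sinks.k2.hdfs.path = hdfs://hadoop102:8020/flume/%Y-%m-%d/%H
The port here is 8020. In Hadoop 3.x the HDFS NameNode internal (RPC) port is usually one of 8020/9000/9820, while in Hadoop 2.x it is 8020/9000. I started out with port 9000 and the path was never created on HDFS; after switching to 8020 it was created successfully.
The port to use is the one configured for fs.defaultFS in ./etc/hadoop/core-site.xml under your own Hadoop directory.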
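To avoid guessing the port, the fs.defaultFS value can be read straight out of core-site.xml. The following is a rough sketch rather than a real XML parser: the function name get_default_fs is hypothetical, and it assumes the usual layout in which the <value> element follows the matching <name> element.

```shell
# Hypothetical helper: print the value of fs.defaultFS from a core-site.xml file.
# Usage: get_default_fs <path/to/core-site.xml>
# Assumes <value> appears on the same line as, or on a line after, the matching <name>.
get_default_fs() {
  awk '
    /<name>[ \t]*fs\.defaultFS[ \t]*<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>[ \t]*/, "")      # drop everything up to and including <value>
      sub(/[ \t]*<\/value>.*/, "")    # drop </value> and anything after it
      print
      exit
    }
  ' "$1"
}
```

For example, get_default_fs /opt/module/hadoop-3.1.3/etc/hadoop/core-site.xml prints the URI whose host and port belong in hdfs.path. On a machine with the Hadoop client installed, hdfs getconf -confKey fs.defaultFS reports the same setting. The complete file-flume-hdfs.conf follows.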
# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2
# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/module/hive-3.1.2/logs/hive.log
a2.sources.r2.shell = /bin/bash -c
# Describe the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://hadoop102:8020/flume/%Y-%m-%d/%H
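# Prefix for the uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-
# Whether to roll directories by time (round the timestamp down)
a2.sinks.k2.hdfs.round = true
# How many time units pass before a new directory is created
a2.sinks.k2.hdfs.roundValue = 1
# The time unit used for rounding
a2.sinks.k2.hdfs.roundUnit = hour
# Whether to use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# How many events to accumulate before flushing to HDFS
a2.sinks.k2.hdfs.batchSize = 1000
# File type; compression is also supported
a2.sinks.k2.hdfs.fileType = DataStream
# How often (in seconds) a new file is created
a2.sinks.k2.hdfs.rollInterval = 30
# File size (in bytes) that triggers a roll
a2.sinks.k2.hdfs.rollSize = 134217700
# File rolling does not depend on the number of events
a2.sinks.k2.hdfs.rollCount = 0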
# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100
# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
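With this configuration the sink writes into one directory per hour, named after the %Y-%m-%d/%H pattern in hdfs.path. As a quick way to see where the current output should land, the sketch below prints that directory for the current local hour. The function name current_flume_dir is hypothetical, and it assumes the shell runs in the same time zone as the Flume agent.

```shell
# Hypothetical helper: print the HDFS directory that the hdfs.path pattern
# /flume/%Y-%m-%d/%H resolves to for the current local hour.
current_flume_dir() {
  date +/flume/%Y-%m-%d/%H
}
```

Once the agent is running, hdfs dfs -ls "$(current_flume_dir)" should list files starting with the logs- prefix, provided hive.log has produced new lines in that hour.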
I originally used Flume 1.9.0 but could never get the problem resolved, so I downgraded to Flume 1.8.0, and the steps above fixed it.