[Foreword] The Phytium Developer Platform is built on Phytium's own technical strengths and open capabilities, bringing together outstanding resources from across the industry. It covers cutting-edge technology areas such as operating systems, algorithms, databases, security, platform tools, virtualization, storage, networking, and firmware, and comprises four sections: application enablement suites, software repository, software support, and software adaptation certification. Its goal is to share advanced technology and provide developers with a multi-domain development platform and tool suite.
This article is shared from the Phytium Developer Platform document "Flume 1.8 Porting and Installation Manual for the Phytium Platform".
1 Introduction
Flume is a distributed log collection system originally developed by Cloudera. It was donated to the Apache Software Foundation in 2009 and is one of the components of the Hadoop ecosystem.
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple, flexible architecture based on streaming data flows, is robust and fault tolerant thanks to tunable reliability mechanisms and many failover and recovery mechanisms, and uses a simple, extensible data model that supports online analytic applications.
This document describes how to install and deploy the ported and adapted Flume 1.8 on the Phytium platform.
2 Environment Requirements
2.1 Hardware Requirements
The hardware requirements are listed in the table below.
Item | Description |
---|---|
CPU | FT-2000+/64 server |
Network | No specific requirement |
Storage | No specific requirement |
Memory | No specific requirement |
2.2 Operating System Requirements
The operating system requirements are listed in the table below.
Item | Description |
---|---|
CentOS | 8 |
Kernel | 4.18.0-193.el8.aarch64 |
2.3 Software Requirements
The software requirements are listed in the table below.
Item | Description |
---|---|
Java | 1.8.0_281 |
Hadoop | 3.3.0 |
3 Installation and Deployment
3.1 Program Deployment
Download apache-flume:
wget http://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
mv apache-flume-1.8.0-bin.tar.gz /opt
cd /opt/
tar -zxvf apache-flume-1.8.0-bin.tar.gz
mv apache-flume-1.8.0-bin flume-1.8
3.2 Program Configuration
1) Configure environment variables
Edit the /etc/profile file and add the following:
export FLUME_HOME=/opt/flume-1.8
export PATH=$PATH:$FLUME_HOME/bin
export FLUME_CONF_DIR=$FLUME_HOME/conf
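To have the new variables take effect in the current session, reload the profile (a routine follow-up step, assumed here rather than stated in the original manual):
source /etc/profile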
2) Configure startup settings
# vim /opt/flume-1.8/conf/flume-env.sh
# JDK and JVM settings
export JAVA_HOME=/opt/jdk-11.0.11
# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
export JAVA_OPTS="-Xms2000m -Xmx5000m -Dcom.sun.management.jmxremote"
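Note: if conf/flume-env.sh does not yet exist, the binary release ships a template it can be copied from (a preparatory step assumed here):
cp /opt/flume-1.8/conf/flume-env.sh.template /opt/flume-1.8/conf/flume-env.sh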
3) Go to the $FLUME_HOME directory and create a new conf/file-to-hdfs.conf file with the following configuration.
Add the configuration file (it reads the specified file and writes its contents to HDFS):
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /tmp/test.log
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://master.hadoop:9000/flume/%y-%m-%d/%H-%M
# Prefix for the files saved to HDFS
a1.sinks.k1.hdfs.filePrefix = weichat_log
a1.sinks.k1.hdfs.fileSuffix = .dat
a1.sinks.k1.hdfs.batchSize = 100
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
# File size at which to roll the file stored on HDFS (bytes)
a1.sinks.k1.hdfs.rollSize = 262144
# Number of events to write before rolling the file
a1.sinks.k1.hdfs.rollCount = 10
# Time to wait before rolling the file (seconds)
a1.sinks.k1.hdfs.rollInterval = 120
# Switch to a new directory every 1 minute (time-based directory rounding)
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
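Since the exec source tails /tmp/test.log, the monitored file can be created in advance so the agent has something to follow as soon as it starts (an optional preparatory step, not part of the original manual):
touch /tmp/test.log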
3.3 Start the Service
1) Create the test directory
[hadoop@master flume-1.8]$ hadoop fs -mkdir /flume
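Optionally, confirm that the directory was created before starting the agent (an extra check not in the original manual):
[hadoop@master flume-1.8]$ hadoop fs -ls /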
2) Start the service
[hadoop@master flume-1.8]$ $FLUME_HOME/bin/flume-ng agent -c conf -f $FLUME_HOME/conf/file-to-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /opt/flume-1.8/conf/flume-env.sh
Info: Including Hadoop libraries found via (/opt/hadoop-3.3.0/bin/hadoop) for HDFS access
Info: Including Hive libraries found via (/opt/hive-3.1.2) for Hive access
+ exec /opt/jdk-11.0.11/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/opt/flume-1.8/conf:/opt/flume-1.8/lib/*:/opt/hadoop-3.3.0/etc/hadoop:/opt/hadoop-3.3.0/share/hadoop/common/lib/*:/opt/hadoop-3.3.0/share/hadoop/common/*:/opt/hadoop-3.3.0/share/hadoop/hdfs:/opt/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/opt/hadoop-3.3.0/share/hadoop/hdfs/*:/opt/hadoop-3.3.0/share/hadoop/mapreduce/*:/opt/hadoop-3.3.0/share/hadoop/yarn:/opt/hadoop-3.3.0/share/hadoop/yarn/lib/*:/opt/hadoop-3.3.0/share/hadoop/yarn/*:/opt/hive-3.1.2/lib/*' -Djava.library.path=:/opt/hadoop-3.3.0/lib/native org.apache.flume.node.Application -f /opt/flume-1.8/conf/file-to-hdfs.conf -n a1
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.zip.ZipFile$Source.initCEN(ZipFile.java:1502)
at java.base/java.util.zip.ZipFile$Source.<init>(ZipFile.java:1280)
at java.base/java.util.zip.ZipFile$Source.get(ZipFile.java:1243)
at java.base/java.util.zip.ZipFile$CleanableResource.<init>(ZipFile.java:732)
at java.base/java.util.zip.ZipFile$CleanableResource.get(ZipFile.java:841)
at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:247)
at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:177)
at java.base/java.util.jar.JarFile.<init>(JarFile.java:348)
at java.base/jdk.internal.loader.URLClassPath$JarLoader.getJarFile(URLClassPath.java:815)
at java.base/jdk.internal.loader.URLClassPath$JarLoader$1.run(URLClassPath.java:760)
at java.base/jdk.internal.loader.URLClassPath$JarLoader$1.run(URLClassPath.java:753)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/jdk.internal.loader.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:752)
at java.base/jdk.internal.loader.URLClassPath$JarLoader.<init>(URLClassPath.java:727)
at java.base/jdk.internal.loader.URLClassPath$3.run(URLClassPath.java:493)
at java.base/jdk.internal.loader.URLClassPath$3.run(URLClassPath.java:476)
at java.base/java.security.AccessController.doPrivileged(Native Method)
Cause: Flume's default JVM heap is only 20 MB, which prevents the Flume agent from starting.
Fix: in the Flume startup script flume-ng, change JAVA_OPTS="-Xmx20m" to JAVA_OPTS="-Xmx2048m".
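For example, the edit can be applied in place with sed (one possible way to make the change described above; adjust the path if Flume is installed elsewhere):
sed -i 's/JAVA_OPTS="-Xmx20m"/JAVA_OPTS="-Xmx2048m"/' /opt/flume-1.8/bin/flume-ng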
3) Start Flume again to test
[hadoop@master conf]$ $FLUME_HOME/bin/flume-ng agent -c conf -f $FLUME_HOME/conf/file-to-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
Info: Including Hadoop libraries found via (/opt/hadoop-3.3.0/bin/hadoop) for HDFS access
Info: Including Hive libraries found via (/opt/hive-3.1.2) for Hive access
+ exec /opt/jdk-11.0.11/bin/java -Xmx2000m -Dflume.root.logger=INFO,console -cp 'conf:/opt/flume-1.8/lib/*:/opt/hadoop-3.3.0/etc/hadoop:/opt/hadoop-3.3.0/share/hadoop/common/lib/*:/opt/hadoop-3.3.0/share/hadoop/common/*:/opt/hadoop-3.3.0/share/hadoop/hdfs:/opt/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/opt/hadoop-3.3.0/share/hadoop/hdfs/*:/opt/hadoop-3.3.0/share/hadoop/mapreduce/*:/opt/hadoop-3.3.0/share/hadoop/yarn:/opt/hadoop-3.3.0/share/hadoop/yarn/lib/*:/opt/hadoop-3.3.0/share/hadoop/yarn/*:/opt/hive-3.1.2/lib/*' -Djava.library.path=:/opt/hadoop-3.3.0/lib/native org.apache.flume.node.Application -f /opt/flume-1.8/conf/file-to-hdfs.conf -n a1
2021-08-13 16:34:01,202 ERROR hdfs.HDFSEventSink: process failed
java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1380)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1361)
at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1703)
at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:226)
at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:541)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
at java.base/java.lang.Thread.run(Thread.java:834)
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor"
java.lang.NoSuchMethodError:
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1380)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1361)
at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1703)
at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:226)
at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:541)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
at java.base/java.lang.Thread.run(Thread.java:834)
Analysis shows that the error above is a Guava version problem: the Guava version bundled with Flume conflicts with, and is incompatible with, the Guava version that Hadoop depends on.
Solution:
[hadoop@master flume-1.8]$ cp /opt/hadoop-3.3.0/share/hadoop/common/lib/guava-27.0-jre.jar ./lib/guava-11.0.2.jar
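The command above overwrites Flume's bundled guava-11.0.2.jar in lib/ with Hadoop's newer Guava 27 jar while keeping the old file name. An equivalent, commonly used alternative is to delete the old jar and copy the new one under its own name (shown here as an illustrative variant, not from the original manual):
[hadoop@master flume-1.8]$ rm lib/guava-11.0.2.jar
[hadoop@master flume-1.8]$ cp /opt/hadoop-3.3.0/share/hadoop/common/lib/guava-27.0-jre.jar lib/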
Flume now starts successfully:
[hadoop@master ~]$ ps -elf|grep flume
0 S hadoop 2593281 1 0 80 0 - 64 do_wai Aug13 ? 00:00:00 /bin/sh ./start_flume.sh
0 S hadoop 2593282 2593281 0 80 0 - 226106 futex_ Aug13 ? 00:01:29 /opt/jdk-11.0.11/bin/java -Xms2000m -Xmx5000m -Dcom.sun.management.jmxremote -Dflume.root.logger=INFO,console -cp /opt/flume-1.8/conf:/opt/flume-1.8/lib/*:/opt/hadoop-3.3.0/etc/hadoop:/opt/hadoop-3.3.0/share/hadoop/common/lib/*:/opt/hadoop-3.3.0/share/hadoop/common/*:/opt/hadoop-3.3.0/share/hadoop/hdfs:/opt/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/opt/hadoop-3.3.0/share/hadoop/hdfs/*:/opt/hadoop-3.3.0/share/hadoop/mapreduce/*:/opt/hadoop-3.3.0/share/hadoop/yarn:/opt/hadoop-3.3.0/share/hadoop/yarn/lib/*:/opt/hadoop-3.3.0/share/hadoop/yarn/*:/opt/hive-3.1.2/lib/* -Djava.library.path=:/opt/hadoop-3.3.0/lib/native org.apache.flume.node.Application -f /opt/flume-1.8/conf/file-to-hdfs.conf -n a1
0 S hadoop 2666015 2665971 0 80 0 - 58 pipe_w 17:13 pts/0 00:00:00 grep --color=auto flume
4 Functional Testing
4.1 Client Test
1) Write test data to the monitored file
[hadoop@master ~]$ echo 'hello'>>/tmp/test.log
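To generate enough events to exercise the roll settings configured earlier, more lines can be appended in a loop (an illustrative snippet, not part of the original test):
[hadoop@master ~]$ for i in $(seq 1 100); do echo "hello $i" >> /tmp/test.log; done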
While data is being written, Flume creates a temporary file on HDFS:
[hadoop@master ~]$ hadoop fs -ls /flume/21-08-13/16-41/
Found 1 items
-rw-r--r-- 1 hadoop supergroup 30 2021-08-13 16:41 /flume/21-08-13/16-41/weichat_log.1628844076756.dat.tmp
Once the configured roll conditions are met (for example, the file reaches the configured size), the temporary file on HDFS is automatically renamed to a data file ending in .dat:
[hadoop@master ~]$ hadoop fs -ls /flume/21-08-13/16-39/weichat_log.1628843971001.dat
-rw-r--r-- 1 hadoop supergroup 21 2021-08-13 16:40 /flume/21-08-13/16-39/weichat_log.1628843971001.dat
[hadoop@master ~]$ hadoop fs -cat /flume/21-08-13/16-39/weichat_log.1628843971001.dat
hello
hello
hello
The results show that the big-data component Flume 1.8 runs correctly on the Phytium platform: the output matches expectations and all functions work normally.