Environment
CentOS7、hadoop-3.2.2、flume-1.9.0、zookeeper-3.6.2、jdk1.8.0
Install the software used for testing:
[root@node-1 ~]# yum -y install telnet-server
[root@node-1 ~]# yum -y install telnet
[root@node-1 ~]# systemctl start telnet.socket
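To confirm the telnet service is up (an optional check; telnet.socket listens on port 23 by default):
[root@node-1 ~]# systemctl status telnet.socket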
Flume netcat-logger
Configure the ${FLUME_HOME}/conf/flume-env.sh file:
[bigdata@node-1 conf]$ cp flume-env.sh.template flume-env.sh
[bigdata@node-1 conf]$ vim flume-env.sh
...
# set JAVA_HOME
export JAVA_HOME=/opt/env/jdk1.8.0_181
...
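With JAVA_HOME set, the installation can be sanity-checked with flume-ng version (assuming ${FLUME_HOME}/bin is on the PATH):
[bigdata@node-1 conf]$ flume-ng version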
Create a directory for custom configuration files, ${FLUME_HOME}/myconf/, and add a configuration file, as follows:
[bigdata@node-1 flume-1.9.0]$ mkdir myconf
[bigdata@node-1 flume-1.9.0]$ cd myconf
[bigdata@node-1 myconf]$ vim logger-conf.properties
# a1 is the name of the agent
# r1 is the source of agent a1
a1.sources=r1
# k1 is the sink of agent a1
a1.sinks=k1
# c1 is the channel of agent a1
a1.channels=c1
# the source type is netcat
a1.sources.r1.type=netcat
# the IP address the source binds to
a1.sources.r1.bind=192.168.56.129
# the port the source listens on
a1.sources.r1.port=12345
# the sink type is logger
a1.sinks.k1.type=logger
# the channel type is memory
a1.channels.c1.type=memory
# the maximum number of events the channel can hold
a1.channels.c1.capacity=1000
# the maximum number of events the channel takes from the source or gives to the sink per transaction
a1.channels.c1.transactionCapacity=100
# connect the source to the channel
a1.sources.r1.channels=c1
# connect the sink to the channel
a1.sinks.k1.channel=c1
After the configuration is done, start the agent as follows:
[bigdata@node-1 myconf]$ flume-ng agent --conf conf --conf-file logger-conf.properties --name a1 -Dflume.root.logger=INFO,console
Info: Including Hadoop libraries found via (/opt/env/hadoop-3.2.2/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/opt/env/hbase-2.4.2/bin/hbase) for HBASE access
Info: Including Hive libraries found via (/opt/env/hive-2.3.8) for Hive access
+ exec /opt/env/jdk1.8.0_181/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp 'conf:/opt/env/flume-1.9.0/lib/*:/opt/env/hadoop-3.2.2/etc/hadoop:/opt/env/hadoop-3.2.2/share/hadoop/common/lib/*:/opt/env/hadoop-3.2.2/share/hadoop/common/*:/opt/env/hadoop-3.2.2/share/hadoop/hdfs:/opt/env/hadoop-3.2.2/share/hadoop/hdfs/lib/*:/opt/env/hadoop-3.2.2/share/hadoop/hdfs/*:/opt/env/hadoop-3.2.2/share/hadoop/mapreduce/lib/*:/opt/env/hadoop-3.2.2/share/hadoop/mapreduce/*:/opt/env/hadoop-3.2.2/share/hadoop/yarn:/opt/env/hadoop-3.2.2/share/hadoop/yarn/lib/*:/opt/env/hadoop-3.2.2/share/hadoop/yarn/*:/opt/env/hbase-2.4.2/conf:/opt/env/jdk1.8.0_181/lib/tools.jar:/opt/env/hbase-2.4.2:/opt/env/hbase-2.4.2/lib/shaded-clients/hbase-shaded-client-byo-hadoop-2.4.2.jar:/opt/env/hbase-2.4.2/lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/opt/env/hbase-2.4.2/lib/client-facing-thirdparty/commons-logging-1.2.jar:/opt/env/hbase-2.4.2/lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/opt/env/hbase-2.4.2/lib/client-facing-thirdparty/log4j-1.2.17.jar:/opt/env/hbase-2.4.2/lib/client-facing-thirdparty/slf4j-api-1.7.30.jar:/opt/env/hadoop-3.2.2/etc/hadoop:/opt/env/hadoop-3.2.2/share/hadoop/common/lib/*:/opt/env/hadoop-3.2.2/share/hadoop/common/*:/opt/env/hadoop-3.2.2/share/hadoop/hdfs:/opt/env/hadoop-3.2.2/share/hadoop/hdfs/lib/*:/opt/env/hadoop-3.2.2/share/hadoop/hdfs/*:/opt/env/hadoop-3.2.2/share/hadoop/mapreduce/lib/*:/opt/env/hadoop-3.2.2/share/hadoop/mapreduce/*:/opt/env/hadoop-3.2.2/share/hadoop/yarn:/opt/env/hadoop-3.2.2/share/hadoop/yarn/lib/*:/opt/env/hadoop-3.2.2/share/hadoop/yarn/*:/opt/env/hbase-2.4.2/conf:/opt/env/hive-2.3.8/lib/*' -Djava.library.path=:/opt/env/hadoop-3.2.2/lib/native:/opt/env/hadoop-3.2.2/lib/native org.apache.flume.node.Application --conf-file logger-conf.properties --name a1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/env/flume-1.9.0/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/env/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/env/hive-2.3.8/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2021-06-24 14:14:27,858 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
2021-06-24 14:14:27,862 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:logger-conf.properties
2021-06-24 14:14:27,864 INFO conf.FlumeConfiguration: Processing:c1
2021-06-24 14:14:27,865 INFO conf.FlumeConfiguration: Processing:c1
2021-06-24 14:14:27,865 INFO conf.FlumeConfiguration: Processing:r1
2021-06-24 14:14:27,865 INFO conf.FlumeConfiguration: Processing:r1
2021-06-24 14:14:27,865 INFO conf.FlumeConfiguration: Processing:r1
2021-06-24 14:14:27,865 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
2021-06-24 14:14:27,865 INFO conf.FlumeConfiguration: Processing:c1
2021-06-24 14:14:27,865 INFO conf.FlumeConfiguration: Processing:k1
2021-06-24 14:14:27,865 INFO conf.FlumeConfiguration: Processing:r1
2021-06-24 14:14:27,865 INFO conf.FlumeConfiguration: Processing:k1
2021-06-24 14:14:27,865 WARN conf.FlumeConfiguration: Agent configuration for 'a1' has no configfilters.
2021-06-24 14:14:27,876 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
2021-06-24 14:14:27,876 INFO node.AbstractConfigurationProvider: Creating channels
2021-06-24 14:14:27,886 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
2021-06-24 14:14:27,888 INFO node.AbstractConfigurationProvider: Created channel c1
2021-06-24 14:14:27,888 INFO source.DefaultSourceFactory: Creating instance of source r1, type netcat
2021-06-24 14:14:27,891 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
2021-06-24 14:14:27,893 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1]
2021-06-24 14:14:27,896 INFO node.Application: Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@55bba3fe counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
2021-06-24 14:14:27,899 INFO node.Application: Starting Channel c1
2021-06-24 14:14:27,900 INFO node.Application: Waiting for channel: c1 to start. Sleeping for 500 ms
2021-06-24 14:14:27,944 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
2021-06-24 14:14:27,944 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
2021-06-24 14:14:28,400 INFO node.Application: Starting Sink k1
2021-06-24 14:14:28,401 INFO node.Application: Starting Source r1
2021-06-24 14:14:28,402 INFO source.NetcatSource: Source starting
2021-06-24 14:14:28,408 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/192.168.56.129:12345]
Open another terminal and run the following command:
[bigdata@node-1 myconf]$ telnet node-1 12345
Trying 192.168.56.129...
Connected to node-1.
Escape character is '^]'.
hello flume
OK
test flume
OK
Back in the original terminal, the sink output can now be seen:
...
2021-06-24 14:14:28,400 INFO node.Application: Starting Sink k1
2021-06-24 14:14:28,401 INFO node.Application: Starting Source r1
2021-06-24 14:14:28,402 INFO source.NetcatSource: Source starting
2021-06-24 14:14:28,408 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/192.168.56.129:12345]
2021-06-24 14:18:10,767 INFO sink.LoggerSink: Event: { headers:{} body: 68 65 6C 6C 6F 20 66 6C 75 6D 65 0D hello flume. }
2021-06-24 14:18:48,793 INFO sink.LoggerSink: Event: { headers:{} body: 74 65 73 74 20 66 6C 75 6D 65 0D test flume. }
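Besides telnet, the netcat source can also be exercised with nc (assuming nmap-ncat, which provides nc, is installed); the session below is only for illustration:
[bigdata@node-1 ~]$ nc 192.168.56.129 12345
hello from nc
OK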
Flume dir-to-hdfs (spooldir source to HDFS sink)
Prerequisite: the Hadoop cluster is already deployed and running.
Create the configuration file dir-to-hdfs.properties, as follows:
[bigdata@node-1 myconf]$ vim dir-to-hdfs.properties
fh.sources = r1
fh.sinks = k1
fh.channels = c1
fh.sources.r1.type = spooldir
fh.sources.r1.spoolDir = /opt/env/test
fh.sources.r1.fileSuffix = .log
fh.sources.r1.fileHeader = true
fh.sources.r1.ignorePattern = ([^ ]*\.tmp)
fh.sinks.k1.type = hdfs
fh.sinks.k1.hdfs.path = hdfs://vmcluster/input/%Y-%m-%d
# the path uses time escapes but the spooldir source adds no timestamp header,
# so tell the HDFS sink to use the local time
fh.sinks.k1.hdfs.useLocalTimeStamp = true
fh.sinks.k1.hdfs.filePrefix = binlog-
fh.sinks.k1.hdfs.writeFormat = Text
fh.sinks.k1.hdfs.minBlockReplicas = 1
fh.sinks.k1.hdfs.rollInterval = 0
fh.sinks.k1.hdfs.rollSize = 134217728
fh.sinks.k1.hdfs.rollCount = 0
fh.sinks.k1.hdfs.fileType = DataStream
fh.channels.c1.type = memory
fh.channels.c1.capacity = 1000
fh.channels.c1.transactionCapacity = 100
fh.sources.r1.channels = c1
fh.sinks.k1.channel = c1
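Before starting the agent, make sure the spooling directory exists; the spooldir source refuses to start if /opt/env/test is missing:
[bigdata@node-1 myconf]$ mkdir -p /opt/env/test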
Copy the Hadoop jars that Flume depends on into ${FLUME_HOME}/lib, as follows:
[bigdata@node-1 common]$ cp ${HADOOP_HOME}/share/hadoop/common/hadoop-common-3.2.2.jar ${FLUME_HOME}/lib
[bigdata@node-1 lib]$ cp ${HADOOP_HOME}/share/hadoop/common/lib/commons-configuration2-2.1.1.jar ${FLUME_HOME}/lib
[bigdata@node-1 lib]$ cp ${HADOOP_HOME}/share/hadoop/common/lib/commons-io-2.5.jar ${FLUME_HOME}/lib
[bigdata@node-1 hdfs]$ cp ${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-3.2.2.jar ${FLUME_HOME}/lib
[bigdata@node-1 hdfs]$ cp ${HADOOP_HOME}/share/hadoop/hdfs/hadoop-hdfs-client-3.2.2.jar ${FLUME_HOME}/lib
[bigdata@node-1 lib]$ cp ${HADOOP_HOME}/share/hadoop/hdfs/lib/hadoop-auth-3.2.2.jar ${FLUME_HOME}/lib
[bigdata@node-1 lib]$ cp ${HADOOP_HOME}/share/hadoop/hdfs/lib/htrace-core4-4.1.0-incubating.jar ${FLUME_HOME}/lib
[bigdata@node-1 lib]$ cp ${HADOOP_HOME}/share/hadoop/hdfs/lib/stax2-api-3.1.4.jar ${FLUME_HOME}/lib
[bigdata@node-1 lib]$ cp ${HADOOP_HOME}/share/hadoop/hdfs/lib/woodstox-core-5.0.3.jar ${FLUME_HOME}/lib
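Note: Flume 1.9.0 bundles an old guava jar (guava-11.0.2.jar) that can conflict with the newer guava pulled in via the Hadoop 3.x classpath. If the agent later fails with a guava-related NoSuchMethodError, a common workaround is to move the bundled jar aside (the exact file name may differ in your installation):
[bigdata@node-1 lib]$ mv ${FLUME_HOME}/lib/guava-11.0.2.jar ${FLUME_HOME}/lib/guava-11.0.2.jar.bak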
Start the agent with the following command:
[bigdata@node-1 myconf]$ flume-ng agent --conf conf --conf-file dir-to-hdfs.properties --name fh
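To test the flow, drop a file into the spooling directory and list the target HDFS path (the file name and contents below are only for illustration, and the -ls path assumes fs.defaultFS points at hdfs://vmcluster):
[bigdata@node-1 myconf]$ echo "hello hdfs sink" > /opt/env/test/sample.txt
[bigdata@node-1 myconf]$ hdfs dfs -ls /input/$(date +%Y-%m-%d)
Once the file has been ingested, the spooldir source renames it to sample.txt.log (the configured fileSuffix).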