一、Extract the tar archive
二、Configure environment variables
1、Flume is installed on a node inside the Hadoop cluster
export JAVA_HOME=/usr/lib/jvm/java-6-sun
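In practice JAVA_HOME is set in Flume's conf/flume-env.sh, which is created from the template shipped in the conf directory. A minimal sketch, assuming a hypothetical install directory (adjust the path to your layout):

```shell
# Hypothetical Flume install directory; replace with your own.
cd /opt/app/flume-1.5.0-cdh5.3.6

# Flume ships a template; copy it to the file the launcher actually reads.
cp conf/flume-env.sh.template conf/flume-env.sh

# Point JAVA_HOME at the local JDK (path is an example).
echo 'export JAVA_HOME=/usr/lib/jvm/java-6-sun' >> conf/flume-env.sh
```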
2、Flume is installed on a node inside the Hadoop cluster, and HA is configured
#export JAVA_HOME=
You also need to copy Hadoop's core-site.xml and hdfs-site.xml into Flume's conf directory.
3、Flume is not installed on a cluster node
export JAVA_HOME=
You also need to copy Hadoop's core-site.xml and hdfs-site.xml into Flume's conf directory.
You also need to copy the relevant Hadoop jar files into Flume's lib directory.
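The copy steps for case 3 can be sketched as follows. The install paths are assumptions, and the exact jar set the HDFS sink needs varies by Hadoop version, so adjust both to your environment:

```shell
# Assumed install locations; replace with your own paths.
HADOOP_HOME=/opt/app/hadoop-2.5.0-cdh5.3.6
FLUME_HOME=/opt/app/flume-1.5.0-cdh5.3.6

# HDFS client configuration, so the HDFS sink can resolve the NameNode.
cp "$HADOOP_HOME"/etc/hadoop/core-site.xml "$FLUME_HOME"/conf/
cp "$HADOOP_HOME"/etc/hadoop/hdfs-site.xml "$FLUME_HOME"/conf/

# Hadoop client jars the HDFS sink depends on (typical Hadoop 2.x layout;
# your version may require additional dependency jars).
cp "$HADOOP_HOME"/share/hadoop/common/hadoop-common-*.jar "$FLUME_HOME"/lib/
cp "$HADOOP_HOME"/share/hadoop/hdfs/hadoop-hdfs-*.jar "$FLUME_HOME"/lib/
cp "$HADOOP_HOME"/share/hadoop/common/lib/hadoop-auth-*.jar "$FLUME_HOME"/lib/
```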
三、Commands
Example: storing a log file in HDFS
Contents of the flume-hdfs.properties configuration file:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# For each one of the sources, the type is defined
a1.sources.s1.type = exec
a1.sources.s1.command=tail -f /opt/app/hive-0.13.1-cdh5.3.6/logs/hive.log
# The channel can be defined as follows.
a1.sources.s1.channels = c1
# Each sink's type must be defined
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path=/hive/log
a1.sinks.k1.hdfs.filePrefix=hive-log/part/date-%Y-%m-%d
a1.sinks.k1.hdfs.rollSize=102400
a1.sinks.k1.hdfs.rollInterval=0
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.useLocalTimeStamp=true
#Specify the channel the sink should use
a1.sinks.k1.channel = c1
# Each channel's type is defined.
a1.channels.c1.type = memory
# Other config values specific to each type of channel (sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
a1.channels.c1.capacity = 100
#a1.channels.c1.checkpointDir =/opt/datas/flume/file/check
#a1.channels.c1.dataDirs =/opt/datas/flume/file/data
Then run the following command from the Flume home directory:
bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/flume-hdfs.properties -Dflume.root.logger=INFO,console
Finally, check the files in HDFS:
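For example, listing the sink's output directory (the path matches the hdfs.path configured above; the generated file names will vary):

```shell
# Recursively list everything the HDFS sink wrote under /hive/log.
hdfs dfs -ls -R /hive/log
```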