flume + log4j + HDFS (shipping logs to HDFS through Flume)
- log4j generates the logs.
- Flume, a log-collection system, gathers them.
- HDFS, the Hadoop Distributed File System, stores them; version used: hadoop-3.0.0-alpha1.tar.gz. This document runs the experiment in pseudo-distributed mode; cluster testing will follow later.
For the pseudo-distributed HDFS installation, see the post [hadoop基础环境搭建].
Flume installation reference link:
System environment: CentOS 6.5, 64-bit Linux.
1. Download apache-flume-1.6.0-bin.tar.gz from the official site.
2. Upload the tarball to /tmp, then extract it to /opt/:
tar -zxvf apache-flume-1.6.0-bin.tar.gz -C /opt/
3. Configure the Flume environment variables by editing /etc/profile (or ~/.bashrc):
export FLUME_HOME=/opt/apache-flume-1.6.0-bin
export PATH=$FLUME_HOME/bin:$PATH
Apply the changes:
source /etc/profile (or source ~/.bashrc)
4. Change into the Flume bin directory:
cd /opt/apache-flume-1.6.0-bin/bin
5. Run the version check:
./flume-ng version
If a version number is printed, the installation succeeded.
6. Usage example: create a file named example.conf in /opt/apache-flume-1.6.0-bin/conf with the following content:
# example.conf: A single-node Flume configuration
# Name the components on this agent
tier1.sources=source1
tier1.channels=channel1
tier1.sinks=sink1
tier1.sources.source1.type=avro
tier1.sources.source1.bind=0.0.0.0
tier1.sources.source1.port=44444
tier1.sources.source1.channels=channel1
tier1.channels.channel1.type=memory
tier1.channels.channel1.capacity=10000
tier1.channels.channel1.transactionCapacity=1000
tier1.channels.channel1.keep-alive=30
tier1.sinks.sink1.type=hdfs
tier1.sinks.sink1.channel=channel1
tier1.sinks.sink1.hdfs.path=hdfs://master68:8020/flume/events
tier1.sinks.sink1.hdfs.fileType=DataStream
tier1.sinks.sink1.hdfs.writeFormat=Text
tier1.sinks.sink1.hdfs.rollInterval=0
tier1.sinks.sink1.hdfs.rollSize=10240
tier1.sinks.sink1.hdfs.rollCount=0
tier1.sinks.sink1.hdfs.idleTimeout=60
Then start Flume from the bin directory under /opt/apache-flume-1.6.0-bin:
flume-ng agent -c ../conf -f ../conf/example.conf -Dflume.root.logger=INFO,console -n tier1 > ../logs/flume.log 2>&1 &
Parameter notes:
- -n: name of the agent
- -c: configuration directory
- -f: configuration file
- -Dflume.root.logger=DEBUG,console: log level and output target
Next, create a new Maven project in IntelliJ IDEA with the following class:
package com.besttone.flume;
import java.util.Date;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
public class WriteLog {
protected static final Log logger = LogFactory.getLog(WriteLog.class);
/**
* @param args
* @throws InterruptedException
*/
public static void main(String[] args) throws InterruptedException {
while (true) {
// log the current system timestamp every two seconds
logger.info(new Date().getTime());
Thread.sleep(2000);
}
}
}
The corresponding pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>flumeTest</groupId>
<artifactId>test</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>test</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.21</version>
</dependency>
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-core</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.flume.flume-ng-clients</groupId>
<artifactId>flume-ng-log4jappender</artifactId>
<version>1.6.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-war-plugin</artifactId>
<version>2.6</version>
<configuration>
<warSourceDirectory>WebContent</warSourceDirectory>
<failOnMissingWebXml>false</failOnMissingWebXml>
</configuration>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.5</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>
</project>
The log4j.properties configuration:
### set log levels ###
log4j.rootLogger=INFO, stdout, file, flume
log4j.logger.per.flume=INFO
### flume ###
log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.layout=org.apache.log4j.PatternLayout
log4j.appender.flume.Hostname=10.37.167.204
log4j.appender.flume.Port=44444
### stdout ###
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Threshold=INFO
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n
### file ###
log4j.appender.file=org.apache.log4j.DailyRollingFileAppender
log4j.appender.file.Threshold=INFO
log4j.appender.file.File=./logs/tracker/tracker.log
log4j.appender.file.Append=true
log4j.appender.file.DatePattern='.'yyyy-MM-dd
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n
Then write a launch script:
#!/bin/sh
jarlist=$(ls /../flume/lib/*.jar)
CLASSPATH=/.../flume/test-1.0-SNAPSHOT.jar
for jar in ${jarlist}
do
CLASSPATH=${CLASSPATH}:${jar}
done
echo ${CLASSPATH}
java -classpath $CLASSPATH com.besttone.flume.WriteLog
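The for-loop in the script simply joins every jar path onto the classpath with colons. A minimal, self-contained sketch of the same pattern (the /tmp/libdemo directory and jar names are invented stand-ins, not the real Flume layout):

```shell
# Stand-in jars so the loop can run anywhere; the real script globs flume/lib
mkdir -p /tmp/libdemo
touch /tmp/libdemo/a.jar /tmp/libdemo/b.jar

# Start from the application jar, then append each dependency jar
CLASSPATH=app.jar
for jar in /tmp/libdemo/*.jar; do
  CLASSPATH=${CLASSPATH}:${jar}
done
echo "$CLASSPATH"
```

The glob expands in sorted order, so the result is app.jar followed by each dependency, colon-separated.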
Finally, build the jar, copy it to the server, and run it.
A successful run produces output like this:
2018-01-08 21:11:40 WriteLog [INFO] 1515417100168
2018-01-08 21:11:42 WriteLog [INFO] 1515417102169
2018-01-08 21:11:44 WriteLog [INFO] 1515417104170
2018-01-08 21:11:46 WriteLog [INFO] 1515417106172
2018-01-08 21:11:48 WriteLog [INFO] 1515417108175
2018-01-08 21:11:50 WriteLog [INFO] 1515417110177
2018-01-08 21:11:52 WriteLog [INFO] 1515417112178
2018-01-08 21:11:54 WriteLog [INFO] 1515417114180
2018-01-08 21:11:56 WriteLog [INFO] 1515417116181
2018-01-08 21:11:58 WriteLog [INFO] 1515417118183
2018-01-08 21:12:00 WriteLog [INFO] 1515417120184
2018-01-08 21:12:02 WriteLog [INFO] 1515417122185
2018-01-08 21:12:04 WriteLog [INFO] 1515417124186
2018-01-08 21:12:06 WriteLog [INFO] 1515417126188
2018-01-08 21:12:08 WriteLog [INFO] 1515417128189
2018-01-08 21:12:10 WriteLog [INFO] 1515417130191
2018-01-08 21:12:12 WriteLog [INFO] 1515417132192
2018-01-08 21:12:14 WriteLog [INFO] 1515417134193
2018-01-08 21:12:16 WriteLog [INFO] 1515417136194
2018-01-08 21:12:18 WriteLog [INFO] 1515417138196
2018-01-08 21:12:20 WriteLog [INFO] 1515417140197
2018-01-08 21:12:22 WriteLog [INFO] 1515417142198
Finally, verify that the logs reached HDFS, either from the hdfs command line or in a browser:
http://localhost:9870
- [ ] Problem 1: Hadoop warns that it cannot load the native-hadoop library.
- [ ] Problem 2:
On Linux you will inevitably open, in Vim, text files that were last edited on Windows, and find a ^M$ marker at the end of every line: DOS editors and Linux editors treat the end-of-line carriage return differently.
Solution (in my view the most convenient): run in a terminal:
dos2unix filename
which converts the file to Unix format in place. If the command is not found, dos2unix is not installed:
yum install dos2unix -y
then run it again.
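If installing dos2unix is not an option, GNU sed can strip the carriage returns in place. A quick sketch against a throwaway file (the /tmp path is just for illustration; BSD/macOS sed needs a different -i syntax):

```shell
# Create a file with DOS (CRLF) line endings to convert
printf 'line1\r\nline2\r\n' > /tmp/dosfile.txt

# Strip the trailing carriage return from every line, in place (GNU sed)
sed -i 's/\r$//' /tmp/dosfile.txt
```

Afterwards the file contains plain LF line endings and Vim no longer shows ^M.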
- [ ] Problem 3
- Problem description: in pseudo-distributed Hadoop, after the first NameNode format and start-dfs.sh, jps shows the NameNode, DataNode, and SecondaryNameNode processes all running; but after formatting the NameNode a second time, the DataNode no longer starts (a similar case is described in the linked post). Solution: under the path configured in core-site.xml (/opt/temp/lih-temp), compare the VERSION files in the current directories of name, data, and namesecondary, as the two figures below show:
- the correct case (figure)
- the incorrect case (figure)
After a second NameNode format, the DataNode's clusterID must match the NameNode's clusterID; the SecondaryNameNode's need not.
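The fix amounts to copying the NameNode's clusterID line into the DataNode's VERSION file. A sketch using simulated VERSION files (the /tmp/demo paths and CID values are stand-ins; on a real install the files live under the name/current and data/current directories of the path set in core-site.xml):

```shell
# Simulated VERSION files; real ones sit under name/current and data/current
mkdir -p /tmp/demo/name/current /tmp/demo/data/current
printf 'clusterID=CID-new\nlayoutVersion=-64\n' > /tmp/demo/name/current/VERSION
printf 'clusterID=CID-old\nlayoutVersion=-57\n' > /tmp/demo/data/current/VERSION

# Copy the NameNode clusterID line over the DataNode's stale one
CID=$(grep '^clusterID=' /tmp/demo/name/current/VERSION)
sed -i "s/^clusterID=.*/$CID/" /tmp/demo/data/current/VERSION
```

After syncing the clusterIDs on a real cluster, restart HDFS and the DataNode should come up again.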
- [ ] Problem 4: with Hadoop installed and configured as shown in the link, run the wordcount example (the link below is used in what follows):
First create two txt files in a directory (as in the link above):
#./hdfs dfs -mkdir /hdfsInput
[bin]# ./hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha1.jar wordcount /hdfsInput /hdfsOutput
Running the MapReduce example jar with Hadoop then failed as follows:
Cause: the MapReduce classpath is not set correctly.
Fix: add the following to mapred-site.xml (the related web UIs listen on ports 8088 and 8042):
<property>
<name>mapreduce.application.classpath</name>
<value>
/opt/hadoop/share/hadoop/hdfs/*,
/opt/hadoop/share/hadoop/hdfs/lib/*,
/opt/hadoop/share/hadoop/mapreduce/*,
/opt/hadoop/share/hadoop/mapreduce/lib/*,
/opt/hadoop/share/hadoop/yarn/*,
/opt/hadoop/share/hadoop/yarn/lib/*
</value>
</property>
- [ ] Extra topic
Packaging a project as a jar and running it from the command line.
- Package and run from the command line. Write a Java program:
public class Helloword {
public static void main(String[] args) {
System.out.println("Hello word!!");
}
}
Then compile and run it:
javac Helloword.java
java Helloword
Then package it, using the e flag so the Main-Class entry is written into the manifest (a plain jar -cvf would leave it out, and java -jar would then fail with "no main manifest attribute"):
jar cvfe hello.jar Helloword Helloword.class
This packages Helloword into hello.jar, which runs with:
java -jar hello.jar
The same tool can produce a war (jar cvf hello.war Helloword.class), but a war is meant to be deployed to a servlet container, not launched with java -jar.
- Packaging with IntelliJ IDEA
- [for a plain project](http://jingyan.baidu.com/article/f25ef254a829a6482c1b8224.html)
- For a Maven project whose dependencies should also be packed into the jar, add a <build> section to the pom (between <dependencies> and <repositories>), configured as follows:
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<appendAssemblyId>false</appendAssemblyId>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>LIHAO.Helloword</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
Then run:
mvn clean package install -Dmaven.test.skip -X
and the dependency-bundled jar is produced under target/.