Data flow
Flume collects the Nginx logs and writes them into a Kafka queue; Storm reads the log messages from Kafka, processes them, and stores the results in HBase and MySQL.
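For context, here is a minimal sketch of the Storm side of this pipeline, assuming the storm-kafka KafkaSpout (Storm 0.9.x API) and the ZooKeeper address and topic name used in this walkthrough. The class name and the stand-in bolt are hypothetical; a real bolt would write to HBase and MySQL instead of printing, and a production topology would be submitted with StormSubmitter rather than LocalCluster.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class NginxLogTopology {

    // Stand-in bolt: a real implementation would parse each Nginx log line
    // and write the results to HBase and MySQL.
    public static class LogHandlerBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String line = input.getString(0);   // one Nginx log line from Kafka
            System.out.println(line);           // replace with HBase/MySQL writes
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt, no downstream fields
        }
    }

    public static void main(String[] args) {
        // KafkaSpout reads the "nginxlog" topic via the same ZooKeeper the broker uses
        SpoutConfig spoutConfig = new SpoutConfig(
                new ZkHosts("bigdata01.com:2181"), "nginxlog", "/kafka-spout", "nginxlog-reader");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
        builder.setBolt("log-handler", new LogHandlerBolt(), 2).shuffleGrouping("kafka-spout");

        // local test run; use StormSubmitter.submitTopology(...) on a real cluster
        new LocalCluster().submitTopology("nginxlog-topology", new Config(), builder.createTopology());
    }
}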
Install Kafka
Download the release tarball from the official site and extract it to the installation directory.
Download it from the Kafka downloads page at http://kafka.apache.org/downloads, version kafka_2.10-0.8.2.1.tgz:
$ tar -zxvf kafka_2.10-0.8.2.1.tgz -C /work/opt/modules/
Edit the configuration file
/work/opt/modules/kafka_2.10-0.8.2.1/config/server.properties:

broker.id=0
port=9092
host.name=bigdata01.com
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/work/opt/modules/kafka_2.10-0.8.2.1/log-data
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.cleaner.enable=false
zookeeper.connect=bigdata01.com:2181
zookeeper.connection.timeout.ms=6000
Start the broker
Before starting, make sure ZooKeeper is up and running. Then start the broker:
$ nohup bin/kafka-server-start.sh config/server.properties > logs/server-start.log 2>&1 &
Check that the process is running:
$ ps -ef | grep kafka
Check that port 9092 is listening:
$ netstat -tlnup | grep 9092
Create a topic
Once Kafka is up and running, execute the following command from the Kafka installation directory:
$ bin/kafka-topics.sh --create --topic nginxlog --partitions 1 --replication-factor 1 --zookeeper bigdata01.com:2181
View the topic details:
$ bin/kafka-topics.sh --describe --topic nginxlog --zookeeper bigdata01.com:2181
Start a console producer and send messages to the Kafka topic:
$ bin/kafka-console-producer.sh --broker-list bigdata01.com:9092 --topic nginxlog
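The console producer is enough for a quick test; to send messages from code instead, here is a minimal sketch using the Kafka 0.8 Java producer API. The class name is hypothetical; the broker address and topic match the configuration above.

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class NginxLogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "bigdata01.com:9092");    // broker from server.properties
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");                    // wait for the leader's ack

        Producer<String, String> producer = new Producer<>(new ProducerConfig(props));
        producer.send(new KeyedMessage<>("nginxlog", "test message from the java producer"));
        producer.close();
    }
}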
Start a console consumer and read the messages on the topic:
$ bin/kafka-console-consumer.sh --zookeeper bigdata01.com:2181 --topic nginxlog --from-beginning
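Likewise, a minimal sketch of a programmatic consumer using the 0.8 high-level consumer API. The class name and group id are arbitrary; auto.offset.reset=smallest plays the role of --from-beginning.

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class NginxLogConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "bigdata01.com:2181");
        props.put("group.id", "nginxlog-test");     // arbitrary consumer group
        props.put("auto.offset.reset", "smallest"); // read from the earliest offset

        ConsumerConnector consumer = Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                consumer.createMessageStreams(Collections.singletonMap("nginxlog", 1));

        // block and print each message as it arrives
        for (MessageAndMetadata<byte[], byte[]> mm : streams.get("nginxlog").get(0)) {
            System.out.println(new String(mm.message()));
        }
    }
}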
Generate sample Nginx logs
Create a working directory on the server:
$ mkdir -p /home/beifeng/project_workspace
Upload data-generate-1.0-SNAPSHOT-jar-with-dependencies.jar (download address) to the working directory you just created, then run:
$ java -jar data-generate-1.0-SNAPSHOT-jar-with-dependencies.jar 100 >> nginx.log
Watch the log being generated with:
$ tail -f nginx.log
To stop generating logs, find the process PID with jps, then kill it.
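The jar's source is not shown here; purely as an illustration, a generator with the same command-line behavior (first argument = lines per second, log lines written to stdout) might look like this hypothetical sketch:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Random;

// Hypothetical stand-in for data-generate-1.0-SNAPSHOT: emits fake Nginx
// access-log lines to stdout at the rate given by the first argument.
public class FakeNginxLogGenerator {
    private static final String[] IPS  = {"10.0.0.1", "10.0.0.2", "192.168.1.10"};
    private static final String[] URLS = {"/index.html", "/cart", "/item/42"};

    public static void main(String[] args) throws InterruptedException {
        int perSecond = args.length > 0 ? Integer.parseInt(args[0]) : 100;
        SimpleDateFormat fmt = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
        Random rnd = new Random();
        while (true) {   // runs until killed, matching the step above
            for (int i = 0; i < perSecond; i++) {
                System.out.printf("%s - - [%s] \"GET %s HTTP/1.1\" 200 %d%n",
                        IPS[rnd.nextInt(IPS.length)], fmt.format(new Date()),
                        URLS[rnd.nextInt(URLS.length)], 200 + rnd.nextInt(4000));
            }
            Thread.sleep(1000);
        }
    }
}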
Configure Flume
Write the Flume agent configuration file flume-kafka-storm.properties.
The contents are as follows:

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a1'
a1.sources = s1
a1.channels = c1
a1.sinks = kafka_sink

# define sources
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /home/beifeng/project_workspace/nginx.log

# define channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100
a1.channels.c1.transactionCapacity = 100

# define kafka sink
a1.sinks.kafka_sink.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.kafka_sink.topic = nginxlog
a1.sinks.kafka_sink.brokerList = bigdata01.com:9092
a1.sinks.kafka_sink.requiredAcks = 1
a1.sinks.kafka_sink.batchSize = 20

# Bind the source and sink to the channel
a1.sources.s1.channels = c1
a1.sinks.kafka_sink.channel = c1
Start the Flume agent
$ bin/flume-ng agent -n a1 -c conf/ --conf-file conf/flume-kafka-storm.properties -Dflume.root.logger=INFO,console
Start the Kafka console consumer (as above) to verify that log messages are flowing through.