1. Flume
- Flume is a log collection framework made up of three components: source, channel, and sink. An agent bundles these three components together, and each agent runs as its own JVM process.
- The detailed configuration options of these three components will not be covered here.
- Flume agent configuration (`hdfs.conf`):

```properties
# Components
a1.sources=r1
a1.channels=c1
a1.sinks=k1 k2

# Source
a1.sources.r1.type=avro
a1.sources.r1.bind=192.168.32.129
a1.sources.r1.port=44444
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=timestamp

# Sink 1: HDFS
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://192.168.32.129:9000/weblog/reportTime=%Y-%m-%d
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.rollInterval=30
a1.sinks.k1.hdfs.rollSize=0
a1.sinks.k1.hdfs.rollCount=1000

# Sink 2: Kafka
a1.sinks.k2.type=org.apache.flume.sink.kafka.KafkaSink
#a1.sinks.k2.kafka.bootstrap.servers=hadoop01:9092,hadoop02:9092,hadoop03:9092
a1.sinks.k2.brokerList=hadoop01:9092,hadoop02:9092,hadoop03:9092
a1.sinks.k2.topic=weblog
a1.sinks.k2.kafka.flumeBatchSize=20
a1.sinks.k2.kafka.producer.acks=1
a1.sinks.k2.kafka.producer.linger.ms=1
a1.sinks.k2.kafka.producer.compression.type=snappy

# Channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

# Bindings
a1.sources.r1.channels=c1
a1.sinks.k1.channel=c1
a1.sinks.k2.channel=c1
```

Since HDFS is not set up here, the HDFS-related configuration (sink k1 and its binding) can simply be commented out.
With the configuration done, go to Flume's bin directory and start the agent:

./flume-ng agent -n a1 -c ../conf -f ../conf/hdfs.conf -Dflume.root.logger=INFO,console
2. Kafka configuration
- Kafka is a distributed, topic-oriented message queue.
- Here the Flume sink acts as a Kafka producer; Kafka's overall architecture and individual features are not covered in detail.
- When Kafka receives a message from the Flume sink, it first looks up the /brokers/topics node in ZooKeeper — brokers in a Kafka cluster do not exchange this metadata directly, but coordinate entirely through their registrations in ZooKeeper. From the topic node it reads the partition metadata, locates the partition leader, and writes the message to the leader's local log file; the replicas then fetch the data from the leader and each write it to their own local log, as shown in the figure.
- With the producer and partitions in place, start Kafka: sh kafka-server-start.sh ../config/server.properties — next we set up Storm as the consumer.
3. Storm configuration
The structure of Storm
- Here I use a local Eclipse instance; first import the required jars:
- all jars under Kafka's lib folder
- the other lib jars: https://pan.baidu.com/s/1PtH4fMJfTbRmGb6s9Rruxg
- Storm's lib jars
- the kafka-storm integration jars: https://pan.baidu.com/s/1KcvTGWnPgFhscof0smwIaw
Create the Topology class:

```java
package cn.tedu.kafka_storm;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.generated.StormTopology;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class CatTopology {
    public static void main(String[] args) {
        Config conf = new Config();
        // Provided by the kafka-storm integration package
        ZkHosts zkHosts = new ZkHosts("hadoop01:2181,hadoop02:2181");
        // KafkaSpout configuration:
        // -- arg 1: the ZooKeeper hosts
        // -- arg 2: the topic to consume
        // -- args 3 and 4 relate to Storm's ack mechanism: they register a
        //    /info/my node in ZooKeeper that tracks the ids of the tuples
        //    emitted by the spout, giving at-least-once semantics;
        //    passing null for both yields at-most-once semantics
        SpoutConfig sc = new SpoutConfig(zkHosts, "cat", "/info", "my");
        // Treat the consumed data as plain text.
        // To customize, create a class implementing the Scheme interface:
        // override deserialize, and name the KafkaSpout's output fields so
        // that e.g. the Trident framework can extract them.
        sc.scheme = new SchemeAsMultiScheme(new StringScheme());
        // Consume from Kafka via the helper spout
        KafkaSpout spout = new KafkaSpout(sc);
        PrintBolt printBolt = new PrintBolt();
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafkaSpout", spout);
        builder.setBolt("printBolt", printBolt).shuffleGrouping("kafkaSpout"); // random distribution
        StormTopology topology = builder.createTopology();
        // Run locally
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("CatTopology", conf, topology);
    }
}
```
Create the bolt class
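The bolt code itself appears only as an image in the original. A minimal sketch of what the `PrintBolt` referenced by the topology might look like, assuming the same `backtype.storm` API as above (the class name comes from the topology; the body is illustrative):

```java
package cn.tedu.kafka_storm;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

// Illustrative sketch: prints every line the KafkaSpout emits.
public class PrintBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // StringScheme declares a single output field named "str"
        String line = input.getStringByField("str");
        System.out.println("received from kafka: " + line);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: emits nothing, so nothing to declare
    }
}
```

`BaseBasicBolt` is used here because it auto-acks each tuple after `execute` returns, which matches the at-least-once setup of the spout.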
4. Starting the cluster and output
- First, create a class that sends messages to Flume's source:
```java
package cn.tedu;

import org.apache.log4j.Logger;

public class MessageSend {
    public static void main(String[] args) throws InterruptedException {
        Logger logger = Logger.getLogger(MessageSend.class);
        for (int i = 0; i < 50; i++) {
            logger.info("this is" + i + "message");
            Thread.sleep(10000);
            System.out.println("Printing " + i + " message");
        }
    }
}
```
This uses log4j to send the messages to Flume; the related jars (including the log4j configuration) are at: https://pan.baidu.com/s/1IRV6fiDBsLrLzDKYXSbZRg
log4j configuration
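The log4j configuration is shown only as an image in the original. A minimal sketch of what such a `log4j.properties` might contain, using Flume's stock `Log4jAppender` (shipped in flume-ng-log4jappender) and assuming the avro source address from the Flume config above:

```properties
log4j.rootLogger=INFO,flume,stdout

# Flume log4j appender; Hostname/Port must match the agent's avro source
log4j.appender.flume=org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname=192.168.32.129
log4j.appender.flume.Port=44444
log4j.appender.flume.UnsafeMode=true

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.SimpleLayout
```

`UnsafeMode=true` keeps the client running even if the Flume agent is temporarily unreachable.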
Cluster start-up order: zookeeper---flume---kafka---storm
For convenience, all the commands are collected here:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Zookeeper:
-
-
- Start: /home/software/zookeeper-3.4.7/bin/zkServer.sh start
- Check ZooKeeper status: /home/software/zookeeper-3.4.7/bin/zkServer.sh status
-
HBase:
-
-
- Start HBase: /home/software/hbase-0.98.17-hadoop2/bin/start-hbase.sh
-
Storm :
-
-
- Start nimbus: /home/software/apache-storm-0.9.3/bin/storm nimbus >/dev/null 2>&1 &
- Start supervisor: /home/software/apache-storm-0.9.3/bin/storm supervisor >/dev/null 2>&1 &
- Start ui: /home/software/apache-storm-0.9.3/bin/storm ui >/dev/null 2>&1 &
-
Kafka :
-
-
- cd /home/software/kafka_2.10-0.10.0.1/bin/
- Start: sh kafka-server-start.sh ../config/server.properties
- Create a topic: sh kafka-topics.sh --create --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --replication-factor 1 --partitions 1 --topic weblog
- List topics: sh kafka-topics.sh --list --zookeeper hadoop01:2181
- Start a console producer: sh kafka-console-producer.sh --broker-list hadoop01:9092,hadoop02:9092,hadoop03:9092 --topic enbook
- Start a console consumer: sh kafka-console-consumer.sh --zookeeper hadoop01:2181 --topic enbook --from-beginning
-
Flume commands: cd /home/software/apache-flume-1.6.0-bin/bin
-
-
- Start Flume: ./flume-ng agent -n a1 -c ../conf -f ../conf/hdfs.conf -Dflume.root.logger=INFO,console
-
- ./flume-ng agent --conf /home/software/apache-flume-1.6.0-bin/conf --conf-file /home/software/apache-flume-1.6.0-bin/conf/my.conf --name a1 -Dflume.root.logger=INFO,console
-
-
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finally, Storm prints its output locally.
Success!