flume kafka spark streaming
Install Flume. Flume 1.6 probably does not support the Taildir source (my guess), so download version 1.7/1.8. Download link:
http://www.apache.org/dyn/closer.lua/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
Or find it yourself on the official site.
1. Set up Flume first
Extract the downloaded package: tar -zxvf **
1) Configure conf
cp flume-env.sh.template flume-env.sh
Set JAVA_HOME here (you can also skip this if your environment variables are already fully configured).
I did not set it here; my /etc/profile contents are pasted below:

export JAVA_HOME=/usr/local/soft/jdk/jdk1.8.0_45
export JRE_HOME=${JAVA_HOME}/jre 
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib 
export FLUME_HOME=/usr/local/soft/apache-flume-1.8.0-bin
export FLUME_CONF_DIR=$FLUME_HOME/conf
export FLUME_PATH=$FLUME_HOME/bin
export PATH=$SCALA_HOME/bin:$SPARK_HOME/bin:$HADOOP_HOME/bin:${JAVA_HOME}/bin:$FLUME_HOME/bin:$PATH

2) Key configuration
kafkanode.conf is configured as follows:

#Agent
flumeAgent.channels = c1
flumeAgent.sources  = s1
flumeAgent.sinks    = k1 
#flumeAgent Taildir Source
#Note (1)
flumeAgent.sources.s1.type = TAILDIR
flumeAgent.sources.s1.positionFile = /opt/apps/log4j/taildir_position.json
flumeAgent.sources.s1.fileHeader = true
#flumeAgent.sources.s1.deletePolicy =immediate
#flumeAgent.sources.s1.batchSize =1000
flumeAgent.sources.s1.channels =c1
flumeAgent.sources.s1.filegroups = f1 f2
flumeAgent.sources.s1.filegroups.f1=/usr/logs/.*log.*
flumeAgent.sources.s1.filegroups.f2=/logs/.*log.*
#flumeAgent.sources.s1.deserializer.maxLineLength =1048576
#flumeAgent FileChannel
#Note (2)
flumeAgent.channels.c1.type = file
flumeAgent.channels.c1.checkpointDir = /var/flume/spool/checkpoint
flumeAgent.channels.c1.dataDirs = /var/flume/spool/data
flumeAgent.channels.c1.capacity = 200000000
flumeAgent.channels.c1.keep-alive = 30
flumeAgent.channels.c1.write-timeout = 30
flumeAgent.channels.c1.checkpoint-timeout=600
# flumeAgent Sinks
#Note (3)
flumeAgent.sinks.k1.channel = c1
flumeAgent.sinks.k1.type = avro
# connect to CollectorMainAgent
flumeAgent.sinks.k1.hostname = data17.Hadoop
flumeAgent.sinks.k1.port = 44444

kafka.conf is configured as follows:

#flumeConsolidationAgent
flumeConsolidationAgent.channels = c1
flumeConsolidationAgent.sources  = s1
flumeConsolidationAgent.sinks    = k1 

#flumeConsolidationAgent Avro Source
#Note (4)
flumeConsolidationAgent.sources.s1.type = avro
flumeConsolidationAgent.sources.s1.channels = c1
flumeConsolidationAgent.sources.s1.bind = data17.Hadoop
flumeConsolidationAgent.sources.s1.port = 44444
flumeConsolidationAgent.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink  
flumeConsolidationAgent.sinks.k1.topic = myflume
flumeConsolidationAgent.sinks.k1.brokerList = data14.Hadoop:9092,data15.Hadoop:9092,data16.Hadoop:9092
flumeConsolidationAgent.sinks.k1.requiredAcks = 1  
flumeConsolidationAgent.sinks.k1.batchSize = 20  
flumeConsolidationAgent.sinks.k1.channel = c1
flumeConsolidationAgent.channels.c1.type = file
flumeConsolidationAgent.channels.c1.checkpointDir = /var/flume/spool/checkpoint
flumeConsolidationAgent.channels.c1.dataDirs = /var/flume/spool/data
flumeConsolidationAgent.channels.c1.capacity = 200000000
flumeConsolidationAgent.channels.c1.keep-alive = 30
flumeConsolidationAgent.channels.c1.write-timeout = 30
flumeConsolidationAgent.channels.c1.checkpoint-timeout=600

kafka.conf runs on the master (collector) node. Start command:

bin/flume-ng agent --conf conf --conf-file conf/kafka.conf --name flumeConsolidationAgent -Dflume.root.logger=DEBUG,console

kafkanode.conf runs on the worker nodes; there can be several of them. Start command:

bin/flume-ng agent --conf conf --conf-file conf/kafkanode.conf --name flumeAgent -Dflume.root.logger=DEBUG,console

[All worker nodes use the same configuration]
2. Kafka configuration
There is nothing special to cover here, since no special configuration is needed:
use your existing Kafka cluster and just create a new topic. Command:

/usr/local/soft/kafka_2.11-0.9.0.1/bin/kafka-topics.sh --create --topic myflume --replication-factor 2 --partitions 5 --zookeeper data4.Hadoop:2181
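
Before wiring Flume in, it can help to confirm that the new topic accepts messages. Below is a minimal Scala sanity-check sketch, not part of the Flume setup itself; it assumes a kafka-clients jar matching your broker version (0.9.x here) is on the classpath, and the object name MyflumeTopicCheck is just a hypothetical helper. The broker list follows the KafkaSink settings in kafka.conf above.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Hypothetical helper: send one test message to the myflume topic
object MyflumeTopicCheck {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker list taken from the kafka.conf KafkaSink above
    props.put("bootstrap.servers", "data14.Hadoop:9092,data15.Hadoop:9092,data16.Hadoop:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Send a single test message to the topic created above
    producer.send(new ProducerRecord[String, String]("myflume", "hello myflume"))
    producer.flush()
    producer.close()
  }
}

Run it once and check the topic with any consumer; if the message arrives, the broker list and topic name used in kafka.conf should work as well.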

3. Spark Streaming
The job below reads the myflume topic through the receiver-based Kafka API and counts words over a sliding window:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.log4j.Level
import org.apache.log4j.Logger

object RealTimeMonitorStart extends Serializable {
  def main(args: Array[String]): Unit = {
    // Quiet down Spark and Jetty logging so the streaming output stays readable
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.ERROR)

    // local[2]: one core for the Kafka receiver, one for processing
    val conf = new SparkConf().setAppName("stocker").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // 1-second batch interval
    val ssc = new StreamingContext(sc, Seconds(1))

    // Kafka configuration: consume the myflume topic with 3 receiver threads
    val topicMap = "myflume".split(",").map((_, 3)).toMap
    println(topicMap)

    // Receiver-based stream: (ZooKeeper quorum, consumer group, topic -> thread count)
    val kafkaStreams = KafkaUtils.createStream(
      ssc,
      "data4.Hadoop:2181,data5.Hadoop:2181,data6.Hadoop:2181",
      "myflumegroup",
      topicMap)

    // Each record is (key, message); split messages into words and pair each with 1
    val urlClickLogPairsDStream = kafkaStreams.flatMap(_._2.split(" ")).map((_, 1))

    // Word counts over a 60-second window, recomputed every 5 seconds
    val urlClickCountDaysDStream = urlClickLogPairsDStream.reduceByKeyAndWindow(
      (v1: Int, v2: Int) => v1 + v2,
      Seconds(60),
      Seconds(5))

    urlClickCountDaysDStream.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
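
The code above uses the receiver-based KafkaUtils.createStream API, so the Spark Streaming Kafka 0.8 connector must be on the classpath. A rough build.sbt sketch, assuming Spark 2.2.x built for Scala 2.11 (the version numbers are assumptions; match them to your cluster):

// build.sbt sketch; version numbers are assumptions, adjust to your environment
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "2.2.0",
  // receiver-based connector that provides org.apache.spark.streaming.kafka.KafkaUtils
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.2.0"
)

With setMaster("local[2]") the job can be run directly for testing; on a real cluster, drop setMaster from the code and pass the master URL to spark-submit instead.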

That's all for now. If you have questions, leave a comment.
