Spark Streaming (11): Flume as an advanced data source, pull mode (production)

1. Environment

(1) Production environment

Flume 1.6.0

Spark 2.1.0

(2) Download the required dependencies

Note: all of the JARs below must be placed on Flume's classpath, or the Flume agent will not run correctly. (I hit this pitfall myself; see the sketch after the list.)

(i) Custom sink JAR:

 groupId = org.apache.spark
 artifactId = spark-streaming-flume-sink_2.11
 version = 2.1.0

(ii) Scala library JAR:

 groupId = org.scala-lang
 artifactId = scala-library
 version = 2.11.7

(iii) Commons Lang 3 JAR: 

 groupId = org.apache.commons
 artifactId = commons-lang3
 version = 3.5
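
The simplest way to get these onto Flume's classpath is to drop them into Flume's lib/ directory. A minimal sketch, assuming Flume is installed at /opt/flume and the JARs were downloaded to the current directory (both paths are assumptions; adjust to your setup):

	# assumed install path: /opt/flume; adjust to your environment
	cp spark-streaming-flume-sink_2.11-2.1.0.jar /opt/flume/lib/
	cp scala-library-2.11.7.jar /opt/flume/lib/
	cp commons-lang3-3.5.jar /opt/flume/lib/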

 

2. Flume configuration file: flume_pull_streaming.conf

simple-agent.sources = netcat-source
simple-agent.sinks = spark-sink
simple-agent.channels = memory-channel

simple-agent.sources.netcat-source.type = netcat
simple-agent.sources.netcat-source.bind = hadoop
simple-agent.sources.netcat-source.port = 44444

simple-agent.sinks.spark-sink.type = org.apache.spark.streaming.flume.sink.SparkSink
simple-agent.sinks.spark-sink.hostname = hadoop
simple-agent.sinks.spark-sink.port = 41414

simple-agent.channels.memory-channel.type = memory

simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.spark-sink.channel = memory-channel
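
The memory channel above runs with Flume's default sizing. Optionally, the standard memory-channel properties can bound how many events get buffered between the source and the SparkSink; the values below are illustrative assumptions, not recommendations:

simple-agent.channels.memory-channel.capacity = 10000
simple-agent.channels.memory-channel.transactionCapacity = 1000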

3. Scala code

package Spark

import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
  * Spark Streaming + Flume integration, approach 2: pull-based (polling) mode.
  * Flume pushes events into the custom SparkSink, which buffers them;
  * Spark Streaming then pulls them with a reliable, transactional receiver.
  */
object FlumePullWordCount_product_server {
  def main(args: Array[String]): Unit = {

    // production usage: take the SparkSink hostname and port from the command line
    if (args.length != 2) {
      System.err.println("Usage: FlumePullWordCount_product_server <hostname> <port>")
      System.exit(1)
    }

    val Array(hostname, port) = args

    // master and app name are supplied via spark-submit; uncomment only for local runs
    val sparkConf = new SparkConf() //.setMaster("local[2]").setAppName("FlumePullWordCount_product")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // pull events from the Flume SparkSink (polling mode)
    val flumeStream = FlumeUtils.createPollingStream(ssc, hostname, port.toInt)

    // decode each event body and run a word count per 5-second batch
    flumeStream.map(x => new String(x.event.getBody.array()).trim)
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()

  }

}
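
FlumeUtils.createPollingStream also has an overload that takes a list of sink addresses, so a single job can pull from several Flume agents. A minimal sketch, assuming the same ssc as above (the second host hadoop2 is a made-up example):

import java.net.InetSocketAddress
import org.apache.spark.storage.StorageLevel

// pull from two SparkSinks; hadoop2 is a hypothetical second agent
val addresses = Seq(
  new InetSocketAddress("hadoop", 41414),
  new InetSocketAddress("hadoop2", 41414)
)
val multiStream = FlumeUtils.createPollingStream(
  ssc, addresses, StorageLevel.MEMORY_AND_DISK_SER_2)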


4. Testing

(1) Package the code into a JAR (note the extra compile-time dependency below)
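
Compiling the code in section 3 requires the spark-streaming-flume integration artifact, the same one passed to --packages in step (5):

 groupId = org.apache.spark
 artifactId = spark-streaming-flume_2.11
 version = 2.1.0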

(2) Start Flume

	bin/flume-ng agent \
	--name simple-agent \
	--conf conf \
	--conf-file conf/flume_pull_streaming.conf \
	-Dflume.root.logger=INFO,console

(3) Connect with telnet

telnet hadoop 44444

(4) Start HDFS (if it is not running, the Spark job reports errors)

(5) Submit the Spark job (the trailing hadoop 41414 arguments must match the SparkSink's hostname and port in the Flume config)

	bin/spark-submit \
	--class Spark.FlumePullWordCount_product_server \
	--master local[2] \
	--packages org.apache.spark:spark-streaming-flume_2.11:2.1.0 \
	/opt/datas/lib/scalaProjectMaven.jar \
	hadoop 41414

(6) Type test input into the telnet session

	OK
	s d f s 
	OK
	sd  fd f
	OK

(Result: success!)
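
With 5-second batches, the console output of the word count looks roughly like this (the timestamp will differ, and the two input lines may fall into separate batches):

	-------------------------------------------
	Time: 1496674020000 ms
	-------------------------------------------
	(s,2)
	(d,1)
	(f,2)
	(sd,1)
	(fd,1)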
