Spark Streaming + Flume Avro Real-Time Computation

1. Write the test code and package it into a jar. Spark ships the test code below already compiled; if you write your own code, build the jar with Maven or sbt (an sbt sketch follows the code):

package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam

object FlumeEventCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println(
        "Usage: FlumeEventCount <host> <port>")
      System.exit(1)
    }

    //StreamingExamples.setStreamingLogLevels()

    // IntParam is an extractor that parses the port argument as an Int
    val Array(host, IntParam(port)) = args

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("FlumeEventCount")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Create a flume stream
    val stream = FlumeUtils.createStream(ssc, host, port, StorageLevel.MEMORY_ONLY_SER_2)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received " + cnt + " flume events." ).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
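
If you package your own version of this code with sbt, a minimal build definition could look like the one below. This is only a sketch under assumptions: the project name is arbitrary, and the Spark 1.5.2 / Scala 2.10 versions are guessed from the log output further down, so adjust them to your cluster.

name := "flume-event-count"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"            % "1.5.2" % "provided",
  "org.apache.spark" %% "spark-streaming"       % "1.5.2" % "provided",
  "org.apache.spark" %% "spark-streaming-flume" % "1.5.2"
)

Note that spark-streaming-flume is not on the Spark classpath by default, so either build a fat jar (for example with the sbt-assembly plugin) or pass the dependency to spark-submit with --packages.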

2. Start the job. Here we use the bundled example via run-example; if you are running your own code, submit the jar with spark-submit instead (a sketch follows the log output below). You will then see log messages like these:
/opt/spark/bin/run-example org.apache.spark.examples.streaming.FlumeEventCount localhost 4141
...
15/11/16 21:38:24 INFO spark.SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:861
15/11/16 21:38:24 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 14 (MapPartitionsRDD[24] at map at FlumeEventCount.scala:64)
15/11/16 21:38:24 INFO scheduler.TaskSchedulerImpl: Adding task set 14.0 with 1 tasks
15/11/16 21:38:24 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 14.0 (TID 79, 192.168.10.155, PROCESS_LOCAL, 1980 bytes)
15/11/16 21:38:24 INFO storage.BlockManagerInfo: Added broadcast_11_piece0 in memory on 192.168.10.155:55101 (size: 1831.0 B, free: 530.3 MB)
15/11/16 21:38:24 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to 192.168.10.155:47912
15/11/16 21:38:24 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 14.0 (TID 79) in 48 ms on 192.168.10.155 (1/1)
15/11/16 21:38:24 INFO scheduler.DAGScheduler: ResultStage 14 (print at FlumeEventCount.scala:64) finished in 0.050 s
15/11/16 21:38:24 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 14.0, whose tasks have all completed, from pool 
15/11/16 21:38:24 INFO scheduler.DAGScheduler: Job 7 finished: print at FlumeEventCount.scala:64, took 0.064595 s
-------------------------------------------
Time: 1447727904000 ms
-------------------------------------------
Received 0 flume events.
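
If you are running your own jar instead of the bundled example, the equivalent spark-submit call would look roughly like this; the jar path and master URL are placeholders, so substitute your own:

/opt/spark/bin/spark-submit \
  --class org.apache.spark.examples.streaming.FlumeEventCount \
  --master local[2] \
  /path/to/flume-event-count_2.10-1.0.jar localhost 4141

When running locally, give the job at least two cores (local[2] or more): the Flume receiver permanently occupies one core, and the batches need another core to be processed.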


3. Open another terminal and configure the Flume Avro agent. The agent receives Avro events on port 41414, buffers them in a memory channel, and forwards them through an Avro sink to the Spark receiver listening on port 4141:
root@hadoop1:/opt/flume/conf# cat avro.conf 
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414

a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 4141

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000

4. Start the Flume agent (the Avro server side):
$FLUME_HOME/bin/flume-ng agent -n a1 -c conf -f $FLUME_HOME/conf/avro.conf

5. Run the Flume Avro client and enter five lines of data (1 through 5); each line typed on stdin is sent to the agent as one Flume event:
$FLUME_HOME/bin/flume-ng avro-client -H localhost -p 41414
1
2
3
4
5

6. In the terminal from step 2 (where the Spark job was started) you will see output like the following:

-------------------------------------------
Time: 1447689682000 ms
-------------------------------------------
Received 0 flume events.

15/11/17 00:01:22 WARN BlockManager: Block input-0-1447689682600 replicated to only 0 peer(s) instead of 1 peers
-------------------------------------------
Time: 1447689684000 ms
-------------------------------------------
Received 5 flume events.
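
FlumeEventCount only counts events. If you also want to print the five lines that were typed into the Avro client, the stream created above can be decoded as well; this is a sketch of a variation, not part of the shipped example:

    // Decode each Flume event body (a java.nio.ByteBuffer) as a UTF-8 string.
    // SparkFlumeEvent.event exposes the underlying AvroFlumeEvent.
    val lines = stream.map { sparkEvent =>
      val body = sparkEvent.event.getBody
      val bytes = new Array[Byte](body.remaining())
      body.get(bytes)
      new String(bytes, "UTF-8")
    }
    lines.print()

Place this next to the stream.count() line inside FlumeEventCount; with the same input, the batch that received the events would also print the lines 1 through 5.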