Spark Streaming application: receiving data from Flume

Original article, published 2018-04-16 09:27:47

This article shows two ways for Spark Streaming to receive data from Flume: pull-based and push-based.
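Both approaches depend on the external `spark-streaming-flume` integration module, which is not bundled with Spark core. Assuming a Spark 1.x build with Scala 2.10 (adjust the suffix and version to match your cluster), the Maven dependency looks roughly like this:

```xml
<!-- Assumed coordinates; match the Scala suffix and version to your Spark build -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-flume_2.10</artifactId>
  <version>1.6.3</version>
</dependency>
```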

1. Pull-based mode

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.flume.FlumeUtils;
import org.apache.spark.streaming.flume.SparkFlumeEvent;

import scala.Tuple2;

/**
 * Real-time word count program receiving data from Flume via the poll (pull) approach
 * @author Administrator
 *
 */
public class FlumePollWordCount {

   public static void main(String[] args) {
      SparkConf conf = new SparkConf()
            .setMaster("local[2]")          // at least 2 threads: one for the receiver, one for processing
            .setAppName("FlumePollWordCount");  
      // 5-second batch interval
      JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
      
      // Poll events from the SparkSink running inside the Flume agent at 192.168.0.103:8888
      JavaReceiverInputDStream<SparkFlumeEvent> lines =
            FlumeUtils.createPollingStream(jssc, "192.168.0.103", 8888);  
      
      JavaDStream<String> words = lines.flatMap(
            
            new FlatMapFunction<SparkFlumeEvent, String>() {

               private static final long serialVersionUID = 1L;

               @Override
               public Iterable<String> call(SparkFlumeEvent event) throws Exception {
                  // The payload lives in the Avro event body; decode it as one line of text
                  String line = new String(event.event().getBody().array());  
                  return Arrays.asList(line.split(" "));   
               }
               
            });
      
      JavaPairDStream<String, Integer> pairs = words.mapToPair(
            
            new PairFunction<String, String, Integer>() {

               private static final long serialVersionUID = 1L;

               @Override
               public Tuple2<String, Integer> call(String word) throws Exception {
                  return new Tuple2<String, Integer>(word, 1);
               }
               
            });
      
      JavaPairDStream<String, Integer> wordCounts = pairs.reduceByKey(
            
            new Function2<Integer, Integer, Integer>() {

               private static final long serialVersionUID = 1L;

               @Override
               public Integer call(Integer v1, Integer v2) throws Exception {
                  return v1 + v2;
               }
               
            });
      
      wordCounts.print();
      
      jssc.start();
      jssc.awaitTermination();
      jssc.close();
   }
   
}
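The pull mode only works if the Flume agent terminates in the custom `org.apache.spark.streaming.flume.sink.SparkSink` (shipped in the `spark-streaming-flume-sink` artifact, which must be on the Flume classpath). The sink buffers events until Spark polls them. A minimal agent config might look like the following sketch; the agent name `a1`, the netcat source, and the memory channel are assumptions for illustration, but the sink's hostname/port must match the `createPollingStream` call above:

```properties
# Hypothetical agent "a1": netcat source -> memory channel -> Spark sink
a1.sources = r1
a1.channels = c1
a1.sinks = spark

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Custom sink from spark-streaming-flume-sink; Spark polls this host:port
a1.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
a1.sinks.spark.hostname = 192.168.0.103
a1.sinks.spark.port = 8888
a1.sinks.spark.channel = c1
```

Start the agent first (e.g. `flume-ng agent --conf conf --conf-file spark-pull.conf --name a1`), then launch the Spark program; since Spark is the polling client, the order matters less than in push mode, where Spark must be listening before Flume pushes.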

2. Push-based mode


SparkFlumeEvent: when Spark Streaming connects to Flume, each record's content is read out of a SparkFlumeEvent, exactly as in the pull example. The push-mode stream is created with FlumeUtils.createStream, which takes the host and port Spark should listen on (here taken from the command line):

JavaReceiverInputDStream<SparkFlumeEvent> flumeStream =
      FlumeUtils.createStream(jssc, args[0], Integer.parseInt(args[1]));
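In push mode the roles are reversed: Flume uses an ordinary avro sink to push events to an address where Spark Streaming is already listening, and `FlumeUtils.createStream` sets up that listener. A complete sketch, analogous to the pull example above (the class name `FlumePushWordCount` and the use of `countByValue` instead of the explicit `mapToPair`/`reduceByKey` pair are my choices, not from the original article):

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.flume.FlumeUtils;
import org.apache.spark.streaming.flume.SparkFlumeEvent;

/**
 * Push-based variant: Spark Streaming listens on host:port and
 * Flume's avro sink pushes events to it.
 */
public class FlumePushWordCount {

   public static void main(String[] args) {
      // args[0] = hostname to bind, args[1] = port; both must match the Flume avro sink
      SparkConf conf = new SparkConf()
            .setMaster("local[2]")
            .setAppName("FlumePushWordCount");
      JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

      // Unlike createPollingStream, createStream makes Spark the (avro) server
      JavaReceiverInputDStream<SparkFlumeEvent> flumeStream =
            FlumeUtils.createStream(jssc, args[0], Integer.parseInt(args[1]));

      JavaDStream<String> words = flumeStream.flatMap(
            new FlatMapFunction<SparkFlumeEvent, String>() {
               private static final long serialVersionUID = 1L;
               @Override
               public Iterable<String> call(SparkFlumeEvent event) throws Exception {
                  // Same extraction as in the pull example: the payload is the event body
                  String line = new String(event.event().getBody().array());
                  return Arrays.asList(line.split(" "));
               }
            });

      // countByValue produces the same (word, count) pairs as mapToPair + reduceByKey
      words.countByValue().print();

      jssc.start();
      jssc.awaitTermination();
   }
}
```

Note the ordering constraint: the Spark application must be running (so the listener is up) before the Flume agent starts pushing, and the Flume agent's avro sink must point at the same hostname and port.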