There are two ways to integrate Spark Streaming with Flume.
Approach 1: Flume-style Push-based Approach. Here Flume pushes data to the application: Spark Streaming starts a receiver that looks to Flume like an Avro agent, and Flume's avro sink delivers events to it.
pom file dependencies (excerpt; the two Spark artifacts should share one version, aligned here with the 2.2.0 that spark-submit uses below)
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-flume_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
</dependencies>
<!-- packaging -->
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
Flume agent configuration for the push-based integration
[root@hadoop1 conf]# vim flume_push_streaming.conf
# define the agent's source, sink, and channel
simple-agent.sources = netcat-source
simple-agent.sinks = avro-sink
simple-agent.channels = memory-channel

# netcat source: listens for text lines on hadoop1.x:44444
simple-agent.sources.netcat-source.type = netcat
simple-agent.sources.netcat-source.bind = hadoop1.x
simple-agent.sources.netcat-source.port = 44444

# avro sink: pushes events to the Spark Streaming receiver
# (hostname/port must match what the application binds to below)
simple-agent.sinks.avro-sink.type = avro
simple-agent.sinks.avro-sink.hostname = 192.168.126.171
simple-agent.sinks.avro-sink.port = 41414

simple-agent.channels.memory-channel.type = memory

# wire the source and sink to the channel
simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.avro-sink.channel = memory-channel
Spark Streaming application code
package com.imooc.spark

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

/**
 * Spark Streaming + Flume integration, approach 1: the push-based approach.
 */
object FlumePushWordCount {
  def main(args: Array[String]): Unit = {
    // receiver host/port; must match the avro sink's hostname and port
    val Array(hostname, port) = args

    // master and app name are supplied by spark-submit in production
    val sparkConf = new SparkConf() //.setMaster("local[2]").setAppName("FlumePushWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // start an Avro receiver that Flume's avro sink pushes events into
    val flumeStream = FlumeUtils.createStream(ssc, hostname, port.toInt)

    // a Flume event body is a byte buffer; decode it to text before counting
    flumeStream.map(x => new String(x.event.getBody.array()).trim)
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
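One fragile spot in the code above: the bare pattern match val Array(hostname, port) = args throws a MatchError when the program arguments are missing. A minimal guard, sketched here and not part of the original code, fails fast with a usage hint instead:

// hypothetical guard for the top of main: validate the arguments before use
if (args.length != 2) {
  System.err.println("Usage: FlumePushWordCount <hostname> <port>")
  System.exit(1)
}
val Array(hostname, port) = args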
Running the job (the hostname and port are passed as program arguments)
Start the Flume agent on the VM. With the push-based approach the Spark Streaming receiver should be up before data flows, since the avro sink connects to it and keeps retrying until it can:
[root@hadoop1 bin]# ./flume-ng agent --name simple-agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/flume_push_streaming.conf -Dflume.root.logger=INFO,console
(Screenshot of the agent startup log omitted.)
Run the program and the word counts are printed to the console. Test input can be fed through the netcat source, for example with telnet hadoop1.x 44444, then typing a few words.
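Each 5-second batch is summarized on the driver console by print(). The output looks roughly like the following (illustrative only; it assumes the line "hello world hello" was typed into the telnet session):

-------------------------------------------
Time: 1586761200000 ms
-------------------------------------------
(hello,2)
(world,1)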
To submit to a production environment, the application must first be packaged (with the assembly setup above, mvn clean package produces the jar).
Upload the jar to the lib directory yourself:
[root@hadoop1 lib]# rz -be
rz waiting to receive.
Starting zmodem transfer. Press Ctrl+C to cancel.
Transferring sparktrain-1.0-SNAPSHOT.jar...
100% 7 KB 7 KB/sec 00:00:01 0 Errors
[root@hadoop1 lib]# pwd
/home/hadoop/lib
[root@hadoop1 lib]# ll
total 8
-rw-r--r--. 1 root root 7608 Apr 12 13:43 sparktrain-1.0-SNAPSHOT.jar
[root@hadoop1 lib]#
The following processes are now running:
[root@hadoop1 spark]# jps
12801 ResourceManager
12930 NodeManager
12470 DataNode
12646 SecondaryNameNode
13750 Jps
12071 Application
12330 NameNode
Submit the job with spark-submit. The --packages flag resolves spark-streaming-flume and its transitive dependencies from a repository at launch (via Ivy, as the log below shows), which is why the 7 KB application jar is enough:
[root@hadoop1 spark]# spark-submit --class com.imooc.spark.FlumePushWordCount --master local[2] --packages org.apache.spark:spark-streaming-flume_2.11:2.2.0 /home/hadoop/sparktrain-1.0-SNAPSHOT.jar hadoop1.x 41414
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/usr/local/etc/hadoop/module/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-streaming-flume_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
An error occurs:
20/04/13 16:33:29 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://hadoop1.x:9000/directory
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:93)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:531)
at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:836)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:84)
at com.imooc.spark.FlumePushWordCount$.main(FlumePushWordCount.scala:15)
at com.imooc.spark.FlumePushWordCount.main(FlumePushWordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/04/13 16:33:29 INFO ShutdownHookManager: Shutdown hook called
20/04/13 16:33:29 INFO ShutdownHookManager: Deleting directory /tmp/spark-1c4c0da4-e097-48c2-9e91-b97d81965a0a
To be resolved. A likely diagnosis from the stack trace: the failure happens in EventLoggingListener.start, which suggests Spark's event logging (spark.eventLog.enabled) is on and spark.eventLog.dir points to hdfs://hadoop1.x:9000/directory, a path that does not exist on HDFS. Creating it first (hdfs dfs -mkdir -p /directory) or disabling event logging in spark-defaults.conf should allow the job to start.