A Simple Flume -> Kafka (ZK) -> Spark Streaming -> MySQL Pipeline


	This article builds a simple Spark Streaming pipeline: Flume watches a directory for new files and ships their contents to Kafka, then Spark Streaming consumes the topic, processes each record, and writes the results into MySQL.

1. Flume and Kafka Configuration

1.1 Flume Configuration

Create flume-file-kafka.conf under Flume's conf/group/ directory:

agent.sources = r1
agent.channels = c1
agent.sinks = s1

agent.sources.r1.type = spooldir
# Directory to watch for incoming files
agent.sources.r1.spoolDir = /opt/module/data/log/
agent.sources.r1.fileHeader = true

# Each sink's type must be defined
#agent.sinks.s1.type = logger
agent.sinks.s1.type = org.apache.flume.sink.kafka.KafkaSink
# Kafka topic to publish to
agent.sinks.s1.topic = expr
agent.sinks.s1.brokerList = hadoop101:9092
agent.sinks.s1.requiredAcks = 1
agent.sinks.s1.batchSize = 2

# Each channel's type is defined.
agent.channels.c1.type = memory
agent.channels.c1.capacity = 100

# Bind the source and sink to the channel
agent.sources.r1.channels = c1
agent.sinks.s1.channel = c1

1.2 Create the Kafka Topic, Producer, and Consumer

(1) Create the topic

./bin/kafka-topics.sh --create --bootstrap-server hadoop101:9092 --replication-factor 3 --partitions 2 --topic expr

(2) Start a console producer

./kafka-console-producer.sh --broker-list hadoop101:9092 --topic expr

(3) Start a console consumer

./kafka-console-consumer.sh --bootstrap-server hadoop101:9092 --topic expr

1.3 Place vegetable.txt in /opt/module/data/log/
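The contents of vegetable.txt are not reproduced in this article. Judging from the Spark job in section 4, which splits each line on single spaces and keeps fields 0, 1 and 5, a line is presumably laid out roughly like this hypothetical example (f2-f4 stand in for columns the job ignores):

2021-05-01 cabbage f2 f3 f4 3.50

Note that the first field cannot itself contain a space, since the job splits on spaces; a bare date works where a full "date time" timestamp would not.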


2. Start Flume

2.1 Start Flume and observe

./bin/flume-ng agent --conf conf -f ./conf/group/flume-file-kafka.conf -n agent -Dflume.root.logger=INFO,console

The file's contents now show up in the Kafka console consumer.

2.2 Rename the processed file to re-ingest it

At this point vegetable.txt has been consumed, and the spooling directory source has renamed it with a .COMPLETED suffix. Rename it back so it gets picked up again:

mv vegetable.txt.COMPLETED vegetable.txt

Kafka then receives the data again.

3. Create the MySQL Database and Table

Create the table that will hold the results of the Spark job:

show databases;
drop database if exists test_1;
create database test_1 default charset utf8;
use test_1;
create table price_test(
    time timestamp,
    type varchar(100),
    price double
);
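One caveat: the time column is a MySQL TIMESTAMP, and the Spark job inserts the first field of each line into it as a string, so that field must be something MySQL can coerce to a timestamp; a bare date such as 2021-05-01 is accepted and stored as midnight.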

4. Write and Run the Spark Program

pom.xml dependencies:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>sparktest</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>
  <properties>
    <scala.version>2.12.7</scala.version>
    <spark.version>3.0.0</spark.version>
  </properties>

  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-Tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <dependencies>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.49</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.12</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.12</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-core</artifactId>
      <version>2.10.1</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.10.1</version>
    </dependency>
    <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>fastjson</artifactId>
      <version>1.2.83</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.4</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs</groupId>
      <artifactId>specs</artifactId>
      <version>1.2.5</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <args>
            <arg>-target:jvm-1.8</arg>
          </args>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <configuration>
          <downloadSources>true</downloadSources>
          <buildcommands>
            <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
          </buildcommands>
          <additionalProjectnatures>
            <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
          </additionalProjectnatures>
          <classpathContainers>
            <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
            <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
          </classpathContainers>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <reporting>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
</project>

test.scala:

package org.example

import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

import java.sql.{Connection, DriverManager, PreparedStatement}
object spark_test {
  def main(args: Array[String]): Unit = {
    //1. Create the SparkConf
    val sparkConf: SparkConf = new SparkConf().setAppName("ReceiverWordCount").setMaster("local[*]")
    //2. Create the StreamingContext with a 3-second batch interval
    val ssc = new StreamingContext(sparkConf, Seconds(3))
    //3. Define the Kafka consumer parameters
    val kafkaPara: Map[String, Object] = Map[String, Object](
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "192.168.226.40:9092",
      ConsumerConfig.GROUP_ID_CONFIG -> "expr",
      "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
      "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer"
    )
    //4. Create a DStream that reads from Kafka
    val kafkaDStream: InputDStream[ConsumerRecord[String, String]] =
      KafkaUtils.createDirectStream[String, String](ssc, LocationStrategies.PreferConsistent,
        ConsumerStrategies.Subscribe[String, String](Set("expr"), kafkaPara))
    //5. Keep only the value of each message
    val valueDStream: DStream[String] = kafkaDStream.map(record => record.value())
    //6. Split each line on spaces and keep fields 0 (date), 1 (type) and 5 (price)
    val value: DStream[(String, String, String)] = valueDStream.flatMap(_.split("\n")).map { line =>
      val fields = line.split(" ")
      (fields(0), fields(1), fields(5))
    }
    value.print()
    value.foreachRDD(rdd => rdd.foreachPartition(partition => {
      Class.forName("com.mysql.jdbc.Driver")
      // Open one MySQL connection per partition
      val conn = DriverManager.getConnection(
        "jdbc:mysql://192.168.226.40:3306/test_1?useSSL=false&useUnicode=true&characterEncoding=UTF-8",
        "root", "123456")
      // Insert the rows with a parameterized statement instead of string-concatenated SQL
      try {
        val stmt: PreparedStatement =
          conn.prepareStatement("insert into price_test(time, type, price) values(?, ?, ?)")
        for (row <- partition) {
          stmt.setString(1, row._1)
          stmt.setString(2, row._2)
          stmt.setString(3, row._3)
          stmt.executeUpdate()
        }
        stmt.close()
      } finally {
        conn.close()
      }
    }))
    //7. Start the job and block until termination
    ssc.start()
    ssc.awaitTermination()
  }
}
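Two notes on the write path: the JDBC connection is opened inside foreachPartition, so there is one connection per partition rather than one per record, and the parameterized PreparedStatement avoids the quoting problems of building SQL by string concatenation. There is no checkpointing or transactional bookkeeping here, so delivery is at-least-once: if a batch fails after a partial write, re-running it can insert duplicate rows.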

5. Check the Results

The startup order matters: start the Kafka consumer first -> then start Flume -> immediately start the Scala program. (If the program stops right away without reporting an error, simply start it again, then rename the file suffix as in section 2.2 so the data is re-ingested.)
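To confirm that rows actually landed in MySQL, query price_test from the mysql client, or use a small standalone program like the sketch below (a minimal sketch assuming the same host, credentials, and table as in spark_test; it is not part of the original pipeline):

package org.example

import java.sql.DriverManager

object check_results {
  def main(args: Array[String]): Unit = {
    Class.forName("com.mysql.jdbc.Driver")
    // Same connection settings as the streaming job (assumed)
    val conn = DriverManager.getConnection(
      "jdbc:mysql://192.168.226.40:3306/test_1?useSSL=false", "root", "123456")
    try {
      // Print up to ten of the rows written by the streaming job
      val rs = conn.createStatement().executeQuery(
        "select time, type, price from price_test limit 10")
      while (rs.next()) {
        println(s"${rs.getTimestamp("time")}  ${rs.getString("type")}  ${rs.getDouble("price")}")
      }
    } finally {
      conn.close()
    }
  }
}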

That's about it for this article. The example is quite basic; if you'd like the data file so you can work through it yourself, follow me and send a private message. Keep at it.
