Flink 1.10 from 0 to 1 (Part 3): WordCount

1. Development Environment

        Language: Scala 2.11.12

        Apache Flink: 1.10.0

2. Development Tools

        The official documentation recommends IntelliJ IDEA, since it integrates Scala and Maven support out of the box; Eclipse works as well. Flink programs can be written in Java, Python, or Scala. This series uses Scala, because the functional style keeps the code concise.

3. Configuring Dependencies

        Every Flink application needs a minimal set of API dependencies. At minimum these are flink-scala and flink-streaming-scala. Most applications also depend on specific connectors or other libraries, such as the Kafka connector, the Table API, or the CEP library. These are not part of Flink's core dependencies, so they must be added to the application explicitly, as in the sketch below.
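        For reference, a minimal pom sketch containing only those two required dependencies (assuming the Scala 2.11 / Flink 1.10.0 versions used in this series) looks like this:

  <dependencies>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-scala_2.11</artifactId>
      <version>1.10.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-scala_2.11</artifactId>
      <version>1.10.0</version>
    </dependency>
  </dependencies>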
 
        The full pom used in this series follows. It also carries some Spark and other dependencies; keep or drop them to suit your own project.
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <scala.version>2.11.12</scala.version>
    <spark.version>2.4.5</spark.version>
    <flink.version>1.10.0</flink.version>
    <scala.binary.version>2.11</scala.binary.version>
  </properties>

  <dependencies>
    <!-- Spark-core -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- SparkSQL -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>

    <!-- Spark SQL on Hive -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- MySQL JDBC driver -->
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.47</version>
    </dependency>
    <!-- Spark Streaming -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <!-- kafka -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_${scala.binary.version}</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_${scala.binary.version}</artifactId>
      <version>2.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>2.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-streams</artifactId>
      <version>2.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-streams-scala_${scala.binary.version}</artifactId>
      <version>2.1.0</version>
    </dependency>


    <!-- Flink Scala API -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-scala_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>

    <!-- Flink core -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-core</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-clients_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <!-- Kafka 0.10 connector -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <!-- Universal Kafka connector; in practice only one of these two is needed -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-walkthrough-common_${scala.binary.version}</artifactId>
      <version>${flink.version}</version>
    </dependency>



    <!-- Jedis, for connecting to Redis -->
    <dependency>
      <groupId>redis.clients</groupId>
      <artifactId>jedis</artifactId>
      <version>3.0.1</version>
    </dependency>

    <!-- Scala standard library, kept in sync with the 2.11.12 toolchain above -->
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-compiler</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-reflect</artifactId>
      <version>${scala.version}</version>
    </dependency>


    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.7</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.17</version>
      <scope>runtime</scope>
    </dependency>
    <dependency>
      <groupId>net.sf.json-lib</groupId>
      <artifactId>json-lib</artifactId>
      <version>2.2.2</version>
      <classifier>jdk15</classifier>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <!-- Java compiler plugin -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.1</version>
        <executions>
          <execution>
            <phase>compile</phase>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <!-- Scala compiler plugin -->
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.3.2</version>
        <executions>
          <execution>
            <id>scala-compile-first</id>
            <phase>process-resources</phase>
            <goals>
              <goal>add-source</goal>
              <goal>compile</goal>
            </goals>
          </execution>
          <execution>
            <id>scala-test-compile</id>
            <phase>process-test-resources</phase>
            <goals>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

    </plugins>
  </build>

4. Examples

Example 1: Streaming WordCount on Flink 1.10 (real-time)

Demo:

import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}

/**
 * How to test: run `nc -lk 8888` on hadoop003 (the host the job connects to below),
 * then type words into the session.
 */
object FlinkStreamWordCount {
  def main(args: Array[String]): Unit = {

    // 1. Initialize the stream execution environment
    val streamEnv: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    // Set the parallelism
    streamEnv.setParallelism(12)
    // 2. Import the implicit conversions required by the Scala API
    import org.apache.flink.streaming.api.scala._

    // 3. Read data from a socket
    val stream: DataStream[String] = streamEnv.socketTextStream("hadoop003", 8888)

    // 4. Transform and aggregate
    val result: DataStream[(String, Int)] = stream.flatMap(_.split(" "))
      .map((_, 1))
      .keyBy(0) // group by tuple index: in DataStream[(String, Int)], 0 is the word, 1 is its count
      .sum(1)   // aggregate: sum the counts per word
    // 5. Print the result
    result.print("result") // each output line is prefixed with the subtask (thread) number
    // 6. Launch the streaming job
    streamEnv.execute("WordCount")
  }
}
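        As a side note, later Flink releases deprecate the index-based keyBy(0) in favor of key-selector functions. A sketch of the same aggregation written with a selector (identical semantics on Flink 1.10):

    // Key by a selector function instead of a tuple index
    val result: DataStream[(String, Int)] = stream.flatMap(_.split(" "))
      .map((_, 1))
      .keyBy(_._1) // the word itself becomes the key
      .sum(1)      // sum the per-word counts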

Run output:

Notes:

1. The server must have netcat installed: yum install nc

2. The number after the print label is the subtask (thread) number.

        For example, in result:10> (apache,1), the 10 is the subtask number.
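        If the varying subtask prefix makes test output hard to read, one option (a fragment reusing the streamEnv and result values from the demo above) is to run only the print sink with a single parallel instance:

    // With a single sink instance every record is printed by one task,
    // so lines appear as "result> (apache,1)" with no varying subtask index.
    result.print("result").setParallelism(1)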

Example 2: Batch WordCount on Flink 1.10 (offline)

Demo:

import java.net.URL

import org.apache.flink.api.scala._
import org.apache.flink.api.scala.ExecutionEnvironment

object FlinkBatchWordCount {
  def main(args: Array[String]): Unit = {
    // Initialize the Flink batch execution environment
    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment

    // Resolve the full path of a file on the classpath
    val dataPath: URL = getClass.getResource("/WordCount.txt")

    // Flink's DataSet is roughly analogous to Spark's RDD
    val data: DataSet[String] = env.readTextFile(dataPath.getPath)

    data.flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(0)
      .sum(1)
      .print()
  }
}
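        To try the job without first creating WordCount.txt under src/main/resources, a self-contained variant (a sketch with hypothetical inline input) can replace the file read:

    // Hypothetical inline input in place of the resource file
    val data: DataSet[String] = env.fromElements("apache flink", "apache")
    data.flatMap(_.split(" "))
      .map((_, 1))
      .groupBy(0)
      .sum(1)
      .print() // prints, in some order: (apache,2) and (flink,1)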

Result:

Reflection:

        A day's labor earns a night of sound sleep; a lifetime of diligence earns a peaceful final rest.
