A tutorial on Flink's createRemoteEnvironment API

一、createRemoteEnvironment

  1. API introduction
  /**
   * Creates a remote execution environment. The remote environment sends (parts of) the program to
   * a cluster for execution. Note that all file paths used in the program must be accessible from
   * the cluster. The execution will use the cluster's default parallelism, unless the
   * parallelism is set explicitly via [[ExecutionEnvironment.setParallelism()]].
   *
   * @param host The host name or address of the master (JobManager),
   *             where the program should be executed.
   *             (In practice the host and port used here are not necessarily those of the
   *             JobManager; see item 5 below.)
   * @param port The port of the master (JobManager), where the program should be executed.
   * @param jarFiles The JAR files with code that needs to be shipped to the cluster. If the
   *                 program uses user-defined functions, user-defined input formats, or any
   *                 libraries, those must be provided in the JAR files.
   */
  def createRemoteEnvironment(host: String, port: Int, jarFiles: String*): ExecutionEnvironment = {
    new ExecutionEnvironment(JavaEnv.createRemoteEnvironment(host, port, jarFiles: _*))
  }
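Host and port identify the cluster entry point the client submits to (see item 5 below for which configuration values these actually are), and every jar containing user code must be listed. A minimal sketch of the call, with placeholder host, port, and jar path rather than values from this article:

import org.apache.flink.api.scala.ExecutionEnvironment

// Placeholder endpoint and jar path; substitute the REST address/port of your cluster
// and the jar produced by your own build.
val remoteEnv: ExecutionEnvironment = ExecutionEnvironment.createRemoteEnvironment(
  "flink-master.example.com",   // assumed cluster entry point (REST address)
  8081,                         // assumed REST port
  "/path/to/your-job.jar")      // jar(s) containing the user code

// Optional: otherwise the cluster's default parallelism is used.
remoteEnv.setParallelism(2)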

  2. Usage

package com.yss.flink

import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.core.fs.FileSystem

/**
 * @description: batch word count submitted to a remote Flink cluster
 * @author: wangshuai
 * @create: 2020-04-22 10:07
 **/
object WordCount {
  def main(args: Array[String]): Unit = {
    // get the execution environment
    //        val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val env: ExecutionEnvironment = ExecutionEnvironment.createRemoteEnvironment("master", 8082, "D:\\ysstech\\henghe\\flink_on_yarn\\src\\main\\java\\flink_on_yarn-1.0-SNAPSHOT.jar")

    import org.apache.flink.api.scala._

    // get input data: read a text file from HDFS (the socket source below is left commented out)
    //    val text = env.socketTextStream("localhost", port, '\n')
    val text = env.readTextFile("hdfs://master:8020/LICENSE.txt")
    //     parse the data, group it, window it, and aggregate the counts
//    val windowCounts = text
//      .flatMap { w => w.split("\\s") }
//      .map { w => WordWithCount(w, 1) }
//      //      .timeWindow(Time.seconds(5), Time.seconds(1))
//      .groupBy(0)
//      .sum(1)
//
//    // print the results with a single thread, rather than in parallel
//        windowCounts.writeAsText("hdfs://master:8020/wordcount.txt", FileSystem.WriteMode.OVERWRITE)
//        windowCounts.print()
    text.print()
    //    env.execute("Socket Window WordCount")
  }

  // Data type for words with count
  case class WordWithCount(word: String, count: Long)

}
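For reference, the commented-out word count above can be completed with the DataSet API roughly as follows. This is a sketch that reuses the env and text values from main; the output path is made up for illustration:

val counts = text
  .flatMap(_.toLowerCase.split("\\W+"))   // tokenize
  .filter(_.nonEmpty)
  .map(w => (w, 1))
  .groupBy(0)                              // group by word
  .sum(1)                                  // sum the counts

// Hypothetical output path; OVERWRITE so repeated runs do not fail.
counts.writeAsText("hdfs://master:8020/wordcount-output", FileSystem.WriteMode.OVERWRITE)

// Unlike print(), writeAsText is lazy in the DataSet API, so execute() is required.
env.execute("Remote WordCount")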


  3. pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.yss</groupId>
    <artifactId>flink_on_yarn</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <flink.version>1.10.0</flink.version>
        <hadoop.version>2.8.3</hadoop.version>
        <scala.version>2.11</scala.version>
        <scala.lib.version>2.11.8</scala.lib.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.lib.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-core</artifactId>
            <version>${flink.version}</version>

        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-hadoop-fs</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>${flink.version}</version>

        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.1.3</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.13</version>
                <configuration>
                    <useFile>false</useFile>
                    <disableXmlReport>true</disableXmlReport>
                    <includes>
                        <include>**/*Test.*</include>
                        <include>**/*Suite.*</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
  4. Start a Flink on YARN session
./bin/yarn-session.sh -n 2 -jm 1024 -tm 1024 -d

Startup succeeded (screenshot omitted).
Open the address printed after a successful startup to reach the web UI, which shows the DAG of the submitted program and its status; the related configuration is visible via the left-hand menu (screenshot omitted).

  5. Which hostname and port to use. There is a pitfall here: from the API doc one would expect the jobmanager.rpc.address and jobmanager.rpc.port settings, but when Flink runs on YARN or Kubernetes, rest.address and rest.port take over that role, so the option the API doc points at is not the one actually used at runtime. The official documentation explains it as follows:

Hostnames / Ports
These options are only necessary for standalone application- or session deployments (simple standalone or Kubernetes).
If you use Flink with Yarn, Mesos, or the active Kubernetes integration, the hostnames and ports are automatically discovered.
rest.address, rest.port: These are used by the client to connect to Flink. Set this to the hostname where the master (JobManager) runs, or to the hostname of the (Kubernetes) service in front of the Flink Master’s REST interface.
The jobmanager.rpc.address (defaults to “localhost”) and jobmanager.rpc.port (defaults to 6123) config entries are used by the TaskManager to connect to the JobManager/ResourceManager. Set this to the hostname where the master (JobManager) runs, or to the hostname of the (Kubernetes internal) service for the Flink master (JobManager). This option is ignored on setups with high-availability where the leader election mechanism is used to discover this automatically.
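In other words, the host and port passed to createRemoteEnvironment should be the rest.address and rest.port that the YARN session reports (visible under Job Manager -> Configuration in the web UI), not jobmanager.rpc.address/jobmanager.rpc.port. A minimal sketch using this article's example values, which will differ per session:

// rest.address = master, rest.port = 8082 were reported by this article's YARN session.
val env = ExecutionEnvironment.createRemoteEnvironment(
  "master",                          // rest.address, not jobmanager.rpc.address
  8082,                              // rest.port, not jobmanager.rpc.port (default 6123)
  "flink_on_yarn-1.0-SNAPSHOT.jar")  // jar with the user code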

