A tutorial on Flink's createRemoteEnvironment API

一、createRemoteEnvironment

  1. API introduction
  /**
   * Creates a remote execution environment. The remote environment sends (parts of) the program to
   * a cluster for execution. Note that all file paths used in the program must be accessible from
   * the cluster. The execution will use the cluster's default parallelism, unless the
   * parallelism is set explicitly via [[ExecutionEnvironment.setParallelism()]].
   *
   * @param host The host name or address of the master (JobManager),
   *             where the program should be executed.
   *             (In practice the host and port used here are not necessarily those of the
   *             JobManager; see item 5 below.)
   * @param port The port of the master (JobManager), where the program should be executed.
   * @param jarFiles The JAR files with code that needs to be shipped to the cluster. If the
   *                 program uses user-defined functions, user-defined input formats, or any
   *                 libraries, those must be provided in the JAR files.
   */
  def createRemoteEnvironment(host: String, port: Int, jarFiles: String*): ExecutionEnvironment = {
    new ExecutionEnvironment(JavaEnv.createRemoteEnvironment(host, port, jarFiles: _*))
  }
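Host and port identify the cluster entry point the client submits to (see item 5 below for which configuration values these actually are), and every jar containing user code must be listed. A minimal sketch of the call, with placeholder host, port, and jar path rather than values from this article:

import org.apache.flink.api.scala.ExecutionEnvironment

// Placeholder endpoint and jar path; substitute the REST address/port of your cluster
// and the jar produced by your own build.
val remoteEnv: ExecutionEnvironment = ExecutionEnvironment.createRemoteEnvironment(
  "flink-master.example.com",   // assumed cluster entry point (REST address)
  8081,                         // assumed REST port
  "/path/to/your-job.jar")      // jar(s) containing the user code

// Optional: otherwise the cluster's default parallelism is used.
remoteEnv.setParallelism(2)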

  2. Usage

package com.yss.flink

import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.core.fs.FileSystem

/**
 * @description: batch word count submitted to a remote Flink cluster
 * @author: wangshuai
 * @create: 2020-04-22 10:07
 **/
object WordCount {
  def main(args: Array[String]): Unit = {
    // get the execution environment
    //        val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val env: ExecutionEnvironment = ExecutionEnvironment.createRemoteEnvironment("master", 8082, "D:\\ysstech\\henghe\\flink_on_yarn\\src\\main\\java\\flink_on_yarn-1.0-SNAPSHOT.jar")

    import org.apache.flink.api.scala._

    // get input data: read a text file from HDFS (the socket source below is left commented out)
    //    val text = env.socketTextStream("localhost", port, '\n')
    val text = env.readTextFile("hdfs://master:8020/LICENSE.txt")
    //     parse the data, group it, window it, and aggregate the counts
//    val windowCounts = text
//      .flatMap { w => w.split("\\s") }
//      .map { w => WordWithCount(w, 1) }
//      //      .timeWindow(Time.seconds(5), Time.seconds(1))
//      .groupBy(0)
//      .sum(1)
//
//    // print the results with a single thread, rather than in parallel
//        windowCounts.writeAsText("hdfs://master:8020/wordcount.txt", FileSystem.WriteMode.OVERWRITE)
//        windowCounts.print()
    text.print()
    //    env.execute("Socket Window WordCount")
  }

  // Data type for words with count
  case class WordWithCount(word: String, count: Long)

}
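For reference, the commented-out word count above can be completed with the DataSet API roughly as follows. This is a sketch that reuses the env and text values from main; the output path is made up for illustration:

val counts = text
  .flatMap(_.toLowerCase.split("\\W+"))   // tokenize
  .filter(_.nonEmpty)
  .map(w => (w, 1))
  .groupBy(0)                              // group by word
  .sum(1)                                  // sum the counts

// Hypothetical output path; OVERWRITE so repeated runs do not fail.
counts.writeAsText("hdfs://master:8020/wordcount-output", FileSystem.WriteMode.OVERWRITE)

// Unlike print(), writeAsText is lazy in the DataSet API, so execute() is required.
env.execute("Remote WordCount")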


  3. pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.yss</groupId>
    <artifactId>flink_on_yarn</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <flink.version>1.10.0</flink.version>
        <hadoop.version>2.8.3</hadoop.version>
        <scala.version>2.11</scala.version>
        <scala.lib.version>2.11.8</scala.lib.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.lib.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-core</artifactId>
            <version>${flink.version}</version>

        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-hadoop-fs</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>${flink.version}</version>

        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.1.3</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.13</version>
                <configuration>
                    <useFile>false</useFile>
                    <disableXmlReport>true</disableXmlReport>
                    <includes>
                        <include>**/*Test.*</include>
                        <include>**/*Suite.*</include>
                    </includes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
  4. Start a Flink on YARN session
./bin/yarn-session.sh -n 2 -jm 1024 -tm 1024 -d

Startup succeeded (screenshot omitted).
Open the address printed after a successful startup to reach the web UI, which shows the DAG of the submitted program and its status; the related configuration is visible via the left-hand menu (screenshot omitted).

  5. Which hostname and port to use. There is a pitfall here: from the API doc one would expect the jobmanager.rpc.address and jobmanager.rpc.port settings, but when Flink runs on YARN or Kubernetes, rest.address and rest.port take over that role, so the option the API doc points at is not the one actually used at runtime. The official documentation explains it as follows:

Hostnames / Ports
These options are only necessary for standalone application- or session deployments (simple standalone or Kubernetes).
If you use Flink with Yarn, Mesos, or the active Kubernetes integration, the hostnames and ports are automatically discovered.
rest.address, rest.port: These are used by the client to connect to Flink. Set this to the hostname where the master (JobManager) runs, or to the hostname of the (Kubernetes) service in front of the Flink Master’s REST interface.
The jobmanager.rpc.address (defaults to “localhost”) and jobmanager.rpc.port (defaults to 6123) config entries are used by the TaskManager to connect to the JobManager/ResourceManager. Set this to the hostname where the master (JobManager) runs, or to the hostname of the (Kubernetes internal) service for the Flink master (JobManager). This option is ignored on setups with high-availability where the leader election mechanism is used to discover this automatically.
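In other words, the host and port passed to createRemoteEnvironment should be the rest.address and rest.port that the YARN session reports (visible under Job Manager -> Configuration in the web UI), not jobmanager.rpc.address/jobmanager.rpc.port. A minimal sketch using this article's example values, which will differ per session:

// rest.address = master, rest.port = 8082 were reported by this article's YARN session.
val env = ExecutionEnvironment.createRemoteEnvironment(
  "master",                          // rest.address, not jobmanager.rpc.address
  8082,                              // rest.port, not jobmanager.rpc.port (default 6123)
  "flink_on_yarn-1.0-SNAPSHOT.jar")  // jar with the user code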

