Job submission on our current data platform has always been a two-phase process: the job is submitted in one step, and its status is reported back in another. This two-phase communication causes duplicate submissions in the task queue, slow status updates, and inconsistent state. Starting with Flink 1.11, the Flink CLI reworked the startup path of flink run and added a new run-application mode. So let's walk through the Flink 1.11 source code to explore how the Flink CLI starts up and what the run-application mode does, looking for anything that could streamline our platform's job-submission flow.
After we submit a Flink job, running jps shows a background process named org.apache.flink.client.cli.CliFrontend that stays alive until the job is successfully running on the cluster, at which point it disappears. In essence, the Flink distribution ships a job-submission program, started through an ordinary Java main method. So let's begin with the main method of org.apache.flink.client.cli.CliFrontend and see what it actually does.
public static void main(final String[] args) {
    // 1. Logs information about the environment, like code revision, current user,
    //    Java version, and JVM parameters (only if INFO-level logging is enabled).
    EnvironmentInformation.logEnvironmentInfo(LOG, "Command Line Client", args);

    // 2. find the configuration directory
    final String configurationDirectory = getConfigurationDirectoryFromEnv();

    // 3. load the global configuration from flink-conf.yaml into a Configuration object
    final Configuration configuration = GlobalConfiguration.loadConfiguration(configurationDirectory);

    // 4. load the custom command lines.
    //    flinkYarnSessionCLI = "org.apache.flink.yarn.cli.FlinkYarnSessionCli" is loaded first;
    //    if it is not on the classpath, errorYarnSessionCLI = "org.apache.flink.yarn.cli.FallbackYarnSessionCli"
    //    is loaded instead. Either way, the default CLI (new DefaultCLI(configuration)) is added last,
    //    because getActiveCustomCommandLine(..) picks the active CustomCommandLine in order,
    //    and DefaultCLI.isActive() always returns true.
    final List<CustomCommandLine> customCommandLines =
            loadCustomCommandLines(configuration, configurationDirectory);

    try {
        // 5. Create the CliFrontend object. The constructor initializes the file system and
        //    adds an org.apache.commons.cli.Options object for each custom command line.
        final CliFrontend cli = new CliFrontend(configuration, customCommandLines);

        // 6. Load the security-related configuration from the config files
        SecurityUtils.install(new SecurityConfiguration(cli.configuration));

        // 7. Run the client program
        int retCode = SecurityUtils.getInstalledContext()
                .runSecured(() -> cli.parseParameters(args));
        System.exit(retCode);
    } catch (Throwable t) {
        final Throwable strippedThrowable =
                ExceptionUtils.stripException(t, UndeclaredThrowableException.class);
        LOG.error("Fatal error while running command line interface.", strippedThrowable);
        strippedThrowable.printStackTrace();
        System.exit(31);
    }
}
Step-by-step analysis:
Step 1: Logging environment information
/**
 * Logs information about the environment, like code revision, current user, Java version,
 * and JVM parameters.
 *
 * @param log The logger to log the information to.
 * @param componentName The component name to mention in the log.
 * @param commandLineArgs The arguments accompanying the starting of the component.
 */
public static void logEnvironmentInfo(Logger log, String componentName, String[] commandLineArgs) {
    // Only do the work if INFO-level logging is enabled
    if (log.isInfoEnabled()) {
        // 1. The abbreviated git commit id and commit date:
        //    public static RevisionInformation getRevisionInformation() {
        //        return new RevisionInformation(getGitCommitIdAbbrev(), getGitCommitTimeString());
        //    }
        RevisionInformation rev = getRevisionInformation();
        // The Flink version
        String version = getVersion();
        // The Scala version
        String scalaVersion = getScalaVersion();
        // The JVM version, obtained through the JDK's ManagementFactory
        String jvmVersion = getJvmVersion();
        // The JVM startup options, also obtained through the JDK's ManagementFactory
        String[] options = getJvmStartupOptionsArray();
        // JAVA_HOME from the environment
        String javaHome = System.getenv("JAVA_HOME");
        // Pre-configured Flink logs from the environment
        String inheritedLogs = System.getenv("FLINK_INHERITED_LOGS");
        // The maximum JVM heap size, in MiB
        long maxHeapMegabytes = getMaxJvmHeapMemory() >>> 20;

        if (inheritedLogs != null) {
            log.info("--------------------------------------------------------------------------------");
            log.info(" Preconfiguration: ");
            log.info(inheritedLogs);
        }

        log.info("--------------------------------------------------------------------------------");
        log.info(" Starting " + componentName + " (Version: " + version + ", Scala: " + scalaVersion + ", "
                + "Rev:" + rev.commitId + ", " + "Date:" + rev.commitDate + ")");
        log.info(" OS current user: " + System.getProperty("user.name"));
        log.info(" Current Hadoop/Kerberos user: " + getHadoopUser());
        log.info(" JVM: " + jvmVersion);
        log.info(" Maximum heap size: " + maxHeapMegabytes + " MiBytes");
        log.info(" JAVA_HOME: " + (javaHome == null ? "(not set)" : javaHome));

        String hadoopVersionString = getHadoopVersionString();
        // Print the Hadoop version, if one is on the classpath
        if (hadoopVersionString != null) {
            log.info(" Hadoop version: " + hadoopVersionString);
        } else {
            log.info(" No Hadoop Dependency available");
        }

        if (options.length == 0) {
            log.info(" JVM Options: (none)");
        } else {
            log.info(" JVM Options:");
            for (String s : options) {
                log.info("    " + s);
            }
        }

        if (commandLineArgs == null || commandLineArgs.length == 0) {
            log.info(" Program Arguments: (none)");
        } else {
            log.info(" Program Arguments:");
            for (String s : commandLineArgs) {
                log.info("    " + s);
            }
        }

        log.info(" Classpath: " + System.getProperty("java.class.path"));
        log.info("--------------------------------------------------------------------------------");
    }
}
Step 2: Checking the configuration directory
public static String getConfigurationDirectoryFromEnv() {
    // Read the Flink conf directory from the environment (FLINK_CONF_DIR)
    String location = System.getenv(ConfigConstants.ENV_FLINK_CONF_DIR);
    if (location != null) {
        // Check that it exists; throw an exception otherwise
        if (new File(location).exists()) {
            return location;
        } else {
            throw new RuntimeException("The configuration directory '" + location
                + "', specified in the '" + ConfigConstants.ENV_FLINK_CONF_DIR
                + "' environment variable, does not exist.");
        }
    }
    // If unset, fall back to the conf directory one level up, if it exists
    else if (new File(CONFIG_DIRECTORY_FALLBACK_1).exists()) {
        location = CONFIG_DIRECTORY_FALLBACK_1;
    }
    // Otherwise fall back to the conf directory in the current directory, if it exists
    else if (new File(CONFIG_DIRECTORY_FALLBACK_2).exists()) {
        location = CONFIG_DIRECTORY_FALLBACK_2;
    } else {
        throw new RuntimeException("The configuration directory was not specified. "
            + "Please specify the directory containing the configuration file through the '"
            + ConfigConstants.ENV_FLINK_CONF_DIR + "' environment variable.");
    }
    return location;
}
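The lookup order (environment variable first, then the two fallback conf directories) is a simple fallback chain. Sketched generically, with hypothetical directory names of my own choosing:

```java
import java.io.File;
import java.util.Optional;

public class ConfDirLookup {
    // Return the first existing candidate. If the env value is set, it is tried
    // first and must exist, mirroring getConfigurationDirectoryFromEnv().
    public static Optional<String> resolve(String envValue, String... fallbacks) {
        if (envValue != null) {
            if (new File(envValue).exists()) {
                return Optional.of(envValue);
            }
            throw new RuntimeException("Configured directory does not exist: " + envValue);
        }
        for (String candidate : fallbacks) {
            if (new File(candidate).exists()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "." always exists, so it wins when the env var is unset
        // and the first fallback is missing
        System.out.println(resolve(null, "no-such-dir", ".").orElse("(none)"));
    }
}
```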
Step 3: Loading the configuration
// Load the configuration; the dynamic properties default to null
public static Configuration loadConfiguration(final String configDir) {
    return loadConfiguration(configDir, null);
}

/**
 * Loads the configuration files from the specified directory. If the dynamic properties
 * configuration is not null, then it is added to the loaded configuration.
 *
 * @param configDir directory to load the configuration from
 * @param dynamicProperties configuration file containing the dynamic properties. Null if none.
 * @return The configuration loaded from the given configuration directory
 */
public static Configuration loadConfiguration(final String configDir, @Nullable final Configuration dynamicProperties) {
    // The configuration directory must have been passed in
    if (configDir == null) {
        throw new IllegalArgumentException("Given configuration directory is null, cannot load configuration");
    }
    // Check that the configuration directory exists
    final File confDirFile = new File(configDir);
    if (!(confDirFile.exists())) {
        throw new IllegalConfigurationException(
            "The given configuration directory name '" + configDir + "' ("
                + confDirFile.getAbsolutePath() + ") does not describe an existing directory.");
    }

    // get Flink yaml configuration file
    // Check that the Flink configuration file flink-conf.yaml exists
    final File yamlConfigFile = new File(confDirFile, FLINK_CONF_FILENAME);
    if (!yamlConfigFile.exists()) {
        throw new IllegalConfigurationException(
            "The Flink config file '" + yamlConfigFile + "' ("
                + confDirFile.getAbsolutePath() + ") does not exist.");
    }

    // Parse the configuration file into a Configuration object
    Configuration configuration = loadYAMLResource(yamlConfigFile);

    // Apply any dynamically passed properties on top
    if (dynamicProperties != null) {
        configuration.addAll(dynamicProperties);
    }
    return configuration;
}
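Because the dynamic properties are added after the file-based entries, they win on key collisions. The layering behaves like an ordered map merge; a plain-Java sketch (not Flink's actual Configuration class):

```java
import java.util.HashMap;
import java.util.Map;

public class LayeredConfig {
    // Later layers override earlier ones, mirroring Configuration#addAll semantics
    @SafeVarargs
    public static Map<String, String> merge(Map<String, String>... layers) {
        Map<String, String> result = new HashMap<>();
        for (Map<String, String> layer : layers) {
            result.putAll(layer);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fromFile = Map.of("parallelism.default", "1",
                                              "taskmanager.numberOfTaskSlots", "2");
        Map<String, String> dynamic = Map.of("parallelism.default", "4"); // e.g. passed via -D
        // parallelism.default=4 wins because the dynamic layer is applied last
        System.out.println(merge(fromFile, dynamic));
    }
}
```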
Step 4: Wrapping the user-supplied command-line options
public static List<CustomCommandLine> loadCustomCommandLines(Configuration configuration, String configurationDirectory) {
    List<CustomCommandLine> customCommandLines = new ArrayList<>();
    customCommandLines.add(new GenericCLI(configuration, configurationDirectory));

    // Command line interface of the YARN session, with a special initialization here
    // to prefix all options with y/yarn.
    final String flinkYarnSessionCLI = "org.apache.flink.yarn.cli.FlinkYarnSessionCli";
    try {
        // Load flinkYarnSessionCLI reflectively; this throws if the class is not on the classpath.
        // The YARN session options are prefixed with y or yarn.
        customCommandLines.add(
            loadCustomCommandLine(flinkYarnSessionCLI, configuration, configurationDirectory, "y", "yarn"));
    } catch (NoClassDefFoundError | Exception e) {
        // Fall back to an instance of errorYarnSessionCLI
        final String errorYarnSessionCLI = "org.apache.flink.yarn.cli.FallbackYarnSessionCli";
        try {
            LOG.info("Loading FallbackYarnSessionCli");
            customCommandLines.add(loadCustomCommandLine(errorYarnSessionCLI, configuration));
        } catch (Exception exception) {
            // If that is missing too, continue without a YARN CLI
            LOG.warn("Could not load CLI class {}.", flinkYarnSessionCLI, e);
        }
    }

    // Tips: DefaultCLI must be added at last, because getActiveCustomCommandLine(..) will get the
    // active CustomCommandLine in order and DefaultCLI isActive always return true.
    customCommandLines.add(new DefaultCLI(configuration));
    return customCommandLines;
}

private static CustomCommandLine loadCustomCommandLine(String className, Object... params) throws Exception {
    // Resolve the class by name; it must implement org.apache.flink.client.cli.CustomCommandLine
    Class<? extends CustomCommandLine> customCliClass =
        Class.forName(className).asSubclass(CustomCommandLine.class);

    // construct class types from the parameters
    Class<?>[] types = new Class<?>[params.length];
    for (int i = 0; i < params.length; i++) {
        checkNotNull(params[i], "Parameters for custom command-lines may not be null.");
        types[i] = params[i].getClass();
    }

    // Look up the constructor matching the parameter types...
    Constructor<? extends CustomCommandLine> constructor = customCliClass.getConstructor(types);
    // ...and invoke it; this runs the constructor of the CLI implementation class
    return constructor.newInstance(params);
}
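loadCustomCommandLine is plain constructor reflection: derive the parameter types from the arguments, look up a matching constructor, and invoke it. The same pattern in isolation, with a hypothetical Greeter class standing in for the CLI implementations:

```java
import java.lang.reflect.Constructor;

public class ReflectiveFactory {
    // A stand-in for a CLI implementation that is loaded by class name
    public static class Greeter {
        private final String prefix;
        public Greeter(String prefix) { this.prefix = prefix; }
        public String greet(String name) { return prefix + name; }
    }

    // Mirrors loadCustomCommandLine: class name -> instance,
    // with the constructor chosen by the runtime types of the arguments
    public static Object instantiate(String className, Object... params) throws Exception {
        Class<?> clazz = Class.forName(className);
        Class<?>[] types = new Class<?>[params.length];
        for (int i = 0; i < params.length; i++) {
            types[i] = params[i].getClass();
        }
        Constructor<?> constructor = clazz.getConstructor(types);
        return constructor.newInstance(params);
    }

    public static void main(String[] args) throws Exception {
        Greeter g = (Greeter) instantiate("ReflectiveFactory$Greeter", "Hello, ");
        System.out.println(g.greet("Flink")); // Hello, Flink
    }
}
```

One limitation worth noting: because the types come from getClass(), a constructor declared against an interface or supertype would not be found by getConstructor; Flink's CLI constructors take concrete types, so the exact-match lookup works.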
The constructors of org.apache.flink.yarn.cli.FlinkYarnSessionCli:
// The constructor invoked reflectively above
public FlinkYarnSessionCli(
        Configuration configuration,
        String configurationDirectory,
        String shortPrefix,
        String longPrefix) throws FlinkException {
    this(configuration, new DefaultClusterClientServiceLoader(), configurationDirectory, shortPrefix, longPrefix, true);
}

// The final constructor in the chain
public FlinkYarnSessionCli(
        Configuration configuration,
        ClusterClientServiceLoader clusterClientServiceLoader,
        String configurationDirectory,
        String shortPrefix,
        String longPrefix,
        // whether interactive parameter input is accepted
        boolean acceptInteractiveInput) throws FlinkException {
    super(configuration);
    this.clusterClientServiceLoader = checkNotNull(clusterClientServiceLoader);
    this.configurationDirectory = checkNotNull(configurationDirectory);
    this.acceptInteractiveInput = acceptInteractiveInput;

    // Create the command line options supported by the YARN session CLI
    query = new Option(shortPrefix + "q", longPrefix + "query", false,
        "Display available YARN resources (memory, cores)");
    applicationId = new Option(shortPrefix + "id", longPrefix + "applicationId", true,
        "Attach to running YARN session");
    queue = new Option(shortPrefix + "qu", longPrefix + "queue", true, "Specify YARN queue.");
    shipPath = new Option(shortPrefix + "t", longPrefix + "ship", true,
        "Ship files in the specified directory (t for transfer)");
    flinkJar = new Option(shortPrefix + "j", longPrefix + "jar", true, "Path to Flink jar file");
    jmMemory = new Option(shortPrefix + "jm", longPrefix + "jobManagerMemory", true,
        "Memory for JobManager Container with optional unit (default: MB)");
    tmMemory = new Option(shortPrefix + "tm", longPrefix + "taskManagerMemory", true,
        "Memory per TaskManager Container with optional unit (default: MB)");
    slots = new Option(shortPrefix + "s", longPrefix + "slots", true, "Number of slots per TaskManager");
    dynamicproperties = Option.builder(shortPrefix + "D")
        .argName("property=value")
        .numberOfArgs(2)
        .valueSeparator()
        .desc("use value for given property")
        .build();
    name = new Option(shortPrefix + "nm", longPrefix + "name", true,
        "Set a custom name for the application on YARN");
    applicationType = new Option(shortPrefix + "at", longPrefix + "applicationType", true,
        "Set a custom application type for the application on YARN");
    zookeeperNamespace = new Option(shortPrefix + "z", longPrefix + "zookeeperNamespace", true,
        "Namespace to create the Zookeeper sub-paths for high availability mode");
    nodeLabel = new Option(shortPrefix + "nl", longPrefix + "nodeLabel", true,
        "Specify YARN node label for the YARN application");
    help = new Option(shortPrefix + "h", longPrefix + "help", false, "Help for the Yarn session CLI.");

    allOptions = new Options();
    allOptions.addOption(flinkJar);
    allOptions.addOption(jmMemory);
    allOptions.addOption(tmMemory);
    allOptions.addOption(queue);
    allOptions.addOption(query);
    allOptions.addOption(shipPath);
    allOptions.addOption(slots);
    allOptions.addOption(dynamicproperties);
    allOptions.addOption(DETACHED_OPTION);
    allOptions.addOption(YARN_DETACHED_OPTION);
    allOptions.addOption(name);
    allOptions.addOption(applicationId);
    allOptions.addOption(applicationType);
    allOptions.addOption(zookeeperNamespace);
    allOptions.addOption(nodeLabel);
    allOptions.addOption(help);

    // try loading a potential yarn properties file
    this.yarnPropertiesFileLocation = configuration.getString(YarnConfigOptions.PROPERTIES_FILE_LOCATION);
    final File yarnPropertiesLocation = getYarnPropertiesLocation(yarnPropertiesFileLocation);

    yarnPropertiesFile = new Properties();

    if (yarnPropertiesLocation.exists()) {
        LOG.info("Found Yarn properties file under {}.", yarnPropertiesLocation.getAbsolutePath());

        try (InputStream is = new FileInputStream(yarnPropertiesLocation)) {
            yarnPropertiesFile.load(is);
        } catch (IOException ioe) {
            throw new FlinkException("Could not read the Yarn properties file "
                + yarnPropertiesLocation + ". Please delete the file at "
                + yarnPropertiesLocation.getAbsolutePath() + '.', ioe);
        }

        // Read the application id string from the YARN properties
        final String yarnApplicationIdString = yarnPropertiesFile.getProperty(YARN_APPLICATION_ID_KEY);

        if (yarnApplicationIdString == null) {
            throw new FlinkException("Yarn properties file found but doesn't contain a "
                + "Yarn application id. Please delete the file at "
                + yarnPropertiesLocation.getAbsolutePath());
        }

        try {
            // try converting the id string to an ApplicationId object
            yarnApplicationIdFromYarnProperties = ConverterUtils.toApplicationId(yarnApplicationIdString);
        } catch (Exception e) {
            throw new FlinkException("YARN properties contain an invalid entry for "
                + "application id: " + yarnApplicationIdString + ". Please delete the file at "
                + yarnPropertiesLocation.getAbsolutePath(), e);
        }
    } else {
        yarnApplicationIdFromYarnProperties = null;
    }
}
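The YARN properties file is a plain java.util.Properties file, and the constructor follows a load-then-validate pattern: load it, then fail loudly if the required application id key is missing. A minimal sketch of that pattern (the key name and file content below are illustrative, not copied from Flink):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class YarnPropsDemo {
    // Load properties from text and require a given key, mirroring the
    // "file found but doesn't contain an id" check in the constructor above
    public static String requireProperty(String text, String key) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalStateException("Properties found but missing key: " + key);
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical content of a YARN session properties file
        String text = "applicationID=application_1610000000000_0042\nparallelism=4\n";
        System.out.println(requireProperty(text, "applicationID"));
    }
}
```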
Step 5: Creating the CliFrontend object (i.e. an instance of the current class)
public CliFrontend(
        Configuration configuration,
        ClusterClientServiceLoader clusterClientServiceLoader,
        List<CustomCommandLine> customCommandLines) {
    this.configuration = checkNotNull(configuration);
    this.customCommandLines = checkNotNull(customCommandLines);
    this.clusterClientServiceLoader = checkNotNull(clusterClientServiceLoader);

    // Initialize the file system
    FileSystem.initialize(configuration, PluginUtils.createPluginManagerFromRootFolder(configuration));

    this.customCommandLineOptions = new Options();

    // Collect the options of every custom command line
    for (CustomCommandLine customCommandLine : customCommandLines) {
        // add the general options...
        customCommandLine.addGeneralOptions(customCommandLineOptions);
        // ...and the run options
        customCommandLine.addRunOptions(customCommandLineOptions);
    }

    // The client timeout
    this.clientTimeout = configuration.get(ClientOptions.CLIENT_TIMEOUT);
    // The default parallelism
    this.defaultParallelism = configuration.getInteger(CoreOptions.DEFAULT_PARALLELISM);
}
Step 6: Installing the security configuration
SecurityUtils.install(new SecurityConfiguration(cli.configuration));
Step 7: Running the client code according to the command-line arguments
// Run the lambda under the installed security context and return its exit code
int retCode = SecurityUtils.getInstalledContext()
    // invoke parseParameters on the CliFrontend instance
    .runSecured(() -> cli.parseParameters(args));

/**
 * Parses the command line arguments and starts the requested action.
 *
 * @param args command line arguments of the client.
 * @return The return code of the program
 *
 * Supported actions:
 *   ACTION_RUN = "run";
 *   ACTION_RUN_APPLICATION = "run-application";
 *   ACTION_INFO = "info";
 *   ACTION_LIST = "list";
 *   ACTION_CANCEL = "cancel";
 *   ACTION_STOP = "stop";
 *   ACTION_SAVEPOINT = "savepoint";
 */
public int parseParameters(String[] args) {
    // check for action
    if (args.length < 1) {
        CliFrontendParser.printHelp(customCommandLines);
        System.out.println("Please specify an action.");
        return 1;
    }

    // get the action, i.e. the first argument. For example, in
    // flink run -m yarn-cluster -ynm datasync-item-itemcommon -yqu common -ys 2 -ytm 2048 -d \
    //     -c com.wwdz.bigdata.flink.streaming.ItemDataSyncJob \
    //     /home/flink/submitjar/trace-etl/datasync/master/20210114/datasync-0.1.jar
    // the action is "run"
    String action = args[0];

    // remove action from parameters, keeping the rest
    final String[] params = Arrays.copyOfRange(args, 1, args.length);

    try {
        // do action
        switch (action) {
            case ACTION_RUN:
                // regular run mode: submit the client program
                run(params);
                return 0;
            case ACTION_RUN_APPLICATION:
                // application mode: submit the client program
                runApplication(params);
                return 0;
            case ACTION_LIST:
                // list the running jobs
                list(params);
                return 0;
            case ACTION_INFO:
                info(params);
                return 0;
            case ACTION_CANCEL:
                cancel(params);
                return 0;
            case ACTION_STOP:
                stop(params);
                return 0;
            case ACTION_SAVEPOINT:
                savepoint(params);
                return 0;
            case "-h":
            case "--help":
                CliFrontendParser.printHelp(customCommandLines);
                return 0;
            case "-v":
            case "--version":
                String version = EnvironmentInformation.getVersion();
                String commitID = EnvironmentInformation.getRevisionInformation().commitId;
                System.out.print("Version: " + version);
                System.out.println(commitID.equals(EnvironmentInformation.UNKNOWN)
                    ? "" : ", Commit ID: " + commitID);
                return 0;
            default:
                System.out.printf("\"%s\" is not a valid action.\n", action);
                System.out.println();
                System.out.println("Valid actions are \"run\", \"list\", \"info\", \"savepoint\", \"stop\", or \"cancel\".");
                System.out.println();
                System.out.println("Specify the version option (-v or --version) to print Flink version.");
                System.out.println();
                System.out.println("Specify the help option (-h or --help) to get help on the command.");
                return 1;
        }
    } catch (CliArgsException ce) {
        return handleArgException(ce);
    } catch (ProgramParametrizationException ppe) {
        return handleParametrizationException(ppe);
    } catch (ProgramMissingJobException pmje) {
        return handleMissingJobException();
    } catch (Exception e) {
        return handleError(e);
    }
}
The run method:
/**
 * Executes the run action.
 *
 * @param args Command line arguments for the run action.
 */
protected void run(String[] args) throws Exception {
    LOG.info("Running 'run' command.");

    // The options of the run command
    final Options commandOptions = CliFrontendParser.getRunCommandOptions();
    // Merge the custom command line options into the run options and parse the arguments
    final CommandLine commandLine = getCommandLine(commandOptions, args, true);

    // evaluate help flag: print help and return
    if (commandLine.hasOption(HELP_OPTION.getOpt())) {
        CliFrontendParser.printHelpForRun(customCommandLines);
        return;
    }

    // Determine which CustomCommandLine is active; this delegates to:
    //
    // /**
    //  * Gets the custom command-line for the arguments.
    //  * @param commandLine The input to the command-line.
    //  * @return custom command-line which is active (may only be one at a time)
    //  */
    // public CustomCommandLine validateAndGetActiveCommandLine(CommandLine commandLine) {
    //     LOG.debug("Custom commandlines: {}", customCommandLines);
    //     // customCommandLines is the list initialized in main()
    //     for (CustomCommandLine cli : customCommandLines) {
    //         LOG.debug("Checking custom commandline {}, isActive: {}", cli, cli.isActive(commandLine));
    //         if (cli.isActive(commandLine)) {
    //             return cli;
    //         }
    //     }
    //     throw new IllegalStateException("No valid command-line found.");
    // }
    final CustomCommandLine activeCommandLine = validateAndGetActiveCommandLine(checkNotNull(commandLine));

    // Parse the submission parameters:
    //
    // public static ProgramOptions create(CommandLine line) throws CliArgsException {
    //     if (isPythonEntryPoint(line) || containsPythonDependencyOptions(line)) {
    //         // PyFlink submission: create the Python program options
    //         return createPythonProgramOptions(line);
    //     } else {
    //         // Java submission
    //         return new ProgramOptions(line);
    //     }
    // }
    final ProgramOptions programOptions = ProgramOptions.create(commandLine);

    // Build the program entry point
    final PackagedProgram program = getPackagedProgram(programOptions);

    // The URLs of all dependency jars in lib
    final List<URL> jobJars = program.getJobJarAndDependencies();

    // Collect the effective parameters into a Configuration
    final Configuration effectiveConfiguration =
        getEffectiveConfiguration(activeCommandLine, commandLine, programOptions, jobJars);

    LOG.debug("Effective executor configuration: {}", effectiveConfiguration);

    try {
        executeProgram(effectiveConfiguration, program);
    } finally {
        // Clean up the libraries extracted into the temp directory
        program.deleteExtractedLibraries();
    }
}
1.1 The ProgramOptions(CommandLine) constructor
protected ProgramOptions(CommandLine line) throws CliArgsException {
    super(line);

    // The program entry-point main class, if given via -c/--class
    this.entryPointClass = line.hasOption(CLASS_OPTION.getOpt())
        ? line.getOptionValue(CLASS_OPTION.getOpt()) : null;

    // The jar location, if given via -j/--jarfile
    this.jarFilePath = line.hasOption(JAR_OPTION.getOpt())
        ? line.getOptionValue(JAR_OPTION.getOpt()) : null;

    // The user-supplied program arguments, as an array
    this.programArgs = extractProgramArgs(line);

    // Additional classpath URLs for the user jars, given via -C/--classpath
    List<URL> classpaths = new ArrayList<>();
    if (line.hasOption(CLASSPATH_OPTION.getOpt())) {
        for (String path : line.getOptionValues(CLASSPATH_OPTION.getOpt())) {
            try {
                classpaths.add(new URL(path));
            } catch (MalformedURLException e) {
                throw new CliArgsException("Bad syntax for classpath: " + path);
            }
        }
    }
    this.classpaths = classpaths;

    // The parallelism, given via -p/--parallelism; must be a positive number
    if (line.hasOption(PARALLELISM_OPTION.getOpt())) {
        String parString = line.getOptionValue(PARALLELISM_OPTION.getOpt());
        try {
            parallelism = Integer.parseInt(parString);
            if (parallelism <= 0) {
                throw new NumberFormatException();
            }
        } catch (NumberFormatException e) {
            throw new CliArgsException("The parallelism must be a positive number: " + parString);
        }
    } else {
        // -1 when absent, meaning "use the default parallelism"
        parallelism = ExecutionConfig.PARALLELISM_DEFAULT;
    }

    // Detached (background) mode when -d or -yd is given
    detachedMode = line.hasOption(DETACHED_OPTION.getOpt()) || line.hasOption(YARN_DETACHED_OPTION.getOpt());
    // With -sae, an attached job triggers a best-effort cluster shutdown
    // when the CLI terminates, e.g. when the user hits Ctrl+C
    shutdownOnAttachedExit = line.hasOption(SHUTDOWN_IF_ATTACHED_OPTION.getOpt());

    // Savepoint restore settings; when the user passes -n,
    // state that cannot be mapped is skipped (see the option list below)
    this.savepointSettings = CliFrontendParser.createSavepointRestoreSettings(line);
}
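The parallelism check above collapses "not a number" and "not positive" into one error by rethrowing NumberFormatException from inside the try block. The pattern in isolation (class and exception choice are mine):

```java
public class PositiveIntParser {
    // Parse a strictly positive integer; both bad syntax and non-positive
    // values surface as the same exception, as in the ProgramOptions constructor
    public static int parsePositive(String s) {
        try {
            int value = Integer.parseInt(s);
            if (value <= 0) {
                // funnel "not positive" through the same catch as "not a number"
                throw new NumberFormatException();
            }
            return value;
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("The parallelism must be a positive number: " + s);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePositive("4")); // 4
    }
}
```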
1.2 Common run options:
org.apache.flink.client.cli.CliFrontendParser
static final Option HELP_OPTION = new Option("h", "help", false,
    "Show the help message for the CLI Frontend or the action.");

static final Option JAR_OPTION = new Option("j", "jarfile", true, "Flink program JAR file.");

static final Option CLASS_OPTION = new Option("c", "class", true,
    "Class with the program entry point (\"main()\" method). Only needed if the "
        + "JAR file does not specify the class in its manifest.");

static final Option CLASSPATH_OPTION = new Option("C", "classpath", true,
    "Adds a URL to each user code "
        + "classloader on all nodes in the cluster. The paths must specify a protocol (e.g. file://) and be "
        + "accessible on all nodes (e.g. by means of a NFS share). You can use this option multiple "
        + "times for specifying more than one URL. The protocol must be supported by the "
        + "{@link java.net.URLClassLoader}.");

public static final Option PARALLELISM_OPTION = new Option("p", "parallelism", true,
    "The parallelism with which to run the program. Optional flag to override the default value "
        + "specified in the configuration.");

/** @deprecated This has no effect anymore, we're keeping it to not break existing bash scripts. */
@Deprecated
static final Option LOGGING_OPTION = new Option("q", "sysoutLogging", false,
    "If present, suppress logging output to standard out.");

public static final Option DETACHED_OPTION = new Option("d", "detached", false,
    "If present, runs the job in detached mode");

public static final Option SHUTDOWN_IF_ATTACHED_OPTION = new Option(
    "sae", "shutdownOnAttachedExit", false,
    "If the job is submitted in attached mode, perform a best-effort cluster shutdown "
        + "when the CLI is terminated abruptly, e.g., in response to a user interrupt, such as typing Ctrl + C.");

/** @deprecated use non-prefixed variant {@link #DETACHED_OPTION} for both YARN and non-YARN deployments */
@Deprecated
public static final Option YARN_DETACHED_OPTION = new Option("yd", "yarndetached", false,
    "If present, runs the job in detached mode (deprecated; use non-YARN specific option instead)");

public static final Option ARGS_OPTION = new Option("a", "arguments", true,
    "Program arguments. Arguments can also be added without -a, simply as trailing parameters.");

public static final Option ADDRESS_OPTION = new Option("m", "jobmanager", true,
    "Address of the JobManager to which to connect. "
        + "Use this flag to connect to a different JobManager than the one specified in the configuration.");

public static final Option SAVEPOINT_PATH_OPTION = new Option("s", "fromSavepoint", true,
    "Path to a savepoint to restore the job from (for example hdfs:///flink/savepoint-1537).");

public static final Option SAVEPOINT_ALLOW_NON_RESTORED_OPTION = new Option("n", "allowNonRestoredState", false,
    "Allow to skip savepoint state that cannot be restored. "
        + "You need to allow this if you removed an operator from your "
        + "program that was part of the program when the savepoint was triggered.");

static final Option SAVEPOINT_DISPOSE_OPTION = new Option("d", "dispose", true, "Path of savepoint to dispose.");

// list specific options
static final Option RUNNING_OPTION = new Option("r", "running", false,
    "Show only running programs and their JobIDs");

static final Option SCHEDULED_OPTION = new Option("s", "scheduled", false,
    "Show only scheduled programs and their JobIDs");

static final Option ALL_OPTION = new Option("a", "all", false, "Show all programs and their JobIDs");

static final Option ZOOKEEPER_NAMESPACE_OPTION = new Option("z", "zookeeperNamespace", true,
    "Namespace to create the Zookeeper sub-paths for high availability mode");

// ... (the remaining options are omitted for brevity)
1.3 Building the program entry point
// The call chain starts here...
private PackagedProgram getPackagedProgram(ProgramOptions programOptions)
        throws ProgramInvocationException, CliArgsException {
    PackagedProgram program;
    try {
        LOG.info("Building program from JAR file");
        program = buildProgram(programOptions);
    } catch (FileNotFoundException e) {
        throw new CliArgsException("Could not build the program from JAR file: " + e.getMessage(), e);
    }
    return program;
}

// ...and ends up in this constructor
private PackagedProgram(
        @Nullable File jarFile,
        List<URL> classpaths,
        @Nullable String entryPointClassName,
        Configuration configuration,
        SavepointRestoreSettings savepointRestoreSettings,
        String... args) throws ProgramInvocationException {
    this.classpaths = checkNotNull(classpaths);
    this.savepointSettings = checkNotNull(savepointRestoreSettings);
    this.args = checkNotNull(args);

    // At least one of the jar file and the entry-point class must be given
    checkArgument(jarFile != null || entryPointClassName != null,
        "Either the jarFile or the entryPointClassName needs to be non-null.");

    // whether the job is a Python job.
    this.isPython = isPython(entryPointClassName);

    // load the jar file if it exists, and keep it in this.jarFile
    this.jarFile = loadJarFile(jarFile);

    assert this.jarFile != null || entryPointClassName != null;

    // now that we have an entry point, we can extract the nested jar files (if any)
    // into a temporary directory
    this.extractedTempLibraries = this.jarFile == null
        ? Collections.emptyList()
        : extractContainedLibraries(this.jarFile);

    // build the user-code class loader
    this.userCodeClassLoader = ClientUtils.buildUserCodeClassLoader(
        getJobJarAndDependencies(), classpaths, getClass().getClassLoader(), configuration);

    // load the entry point class, i.e. the user's main class
    this.mainClass = loadMainClass(
        // if no entryPointClassName name was given, we try and look one up through the manifest
        entryPointClassName != null ? entryPointClassName : getEntryPointClassNameFromJar(this.jarFile),
        userCodeClassLoader);

    // fail if the class has no main(String[]) entry point
    if (!hasMainMethod(mainClass)) {
        throw new ProgramInvocationException("The given program class does not have a main(String[]) method.");
    }
}
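The hasMainMethod check can be reproduced with plain reflection: look for a public static void main(String[]) on the class. A self-contained sketch of that check (my own helper, not Flink's implementation):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class MainMethodCheck {
    // True if clazz declares a public static void main(String[]) entry point
    public static boolean hasMainMethod(Class<?> clazz) {
        try {
            Method main = clazz.getMethod("main", String[].class);
            return Modifier.isStatic(main.getModifiers())
                && Modifier.isPublic(main.getModifiers())
                && main.getReturnType() == void.class;
        } catch (NoSuchMethodException e) {
            // no main(String[]) at all
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasMainMethod(MainMethodCheck.class)); // true
        System.out.println(hasMainMethod(String.class));          // false
    }
}
```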
1.4 Preparing the configuration before execution
private <T> Configuration getEffectiveConfiguration(
        final CustomCommandLine activeCustomCommandLine,
        final CommandLine commandLine,
        final ProgramOptions programOptions,
        final List<T> jobJars) throws FlinkException {

    // Null-check and wrap the program options and job jars into an accessor
    final ExecutionConfigAccessor executionParameters =
        ExecutionConfigAccessor.fromProgramOptions(checkNotNull(programOptions), checkNotNull(jobJars));

    // Let the active command line translate its CLI options into Configuration entries.
    // Below we use org.apache.flink.yarn.cli.FlinkYarnSessionCli#applyCommandLineOptionsToConfiguration
    // as the example.
    final Configuration executorConfig =
        checkNotNull(activeCustomCommandLine).applyCommandLineOptionsToConfiguration(commandLine);

    final Configuration effectiveConfiguration = new Configuration(executorConfig);
    executionParameters.applyToConfiguration(effectiveConfiguration);

    LOG.debug("Effective executor configuration: {}", effectiveConfiguration);
    return effectiveConfiguration;
}

@Override
public Configuration applyCommandLineOptionsToConfiguration(CommandLine commandLine) throws FlinkException {
    // we ignore the addressOption because it can only contain "yarn-cluster"
    final Configuration effectiveConfiguration = new Configuration(configuration);
    applyDescriptorOptionToConfig(commandLine, effectiveConfiguration);

    // Resolve the ApplicationId object, if one was given
    final ApplicationId applicationId = getApplicationId(commandLine);
    if (applicationId != null) {
        final String zooKeeperNamespace;
        if (commandLine.hasOption(zookeeperNamespace.getOpt())) {
            // ZooKeeper namespace from the command line
            zooKeeperNamespace = commandLine.getOptionValue(zookeeperNamespace.getOpt());
        } else {
            zooKeeperNamespace = effectiveConfiguration.getString(HA_CLUSTER_ID, applicationId.toString());
        }

        // Set the HA cluster id
        effectiveConfiguration.setString(HA_CLUSTER_ID, zooKeeperNamespace);
        // Set the YARN application id
        effectiveConfiguration.setString(YarnConfigOptions.APPLICATION_ID, ConverterUtils.toString(applicationId));
        // Deployment target: attach to the existing YARN session
        effectiveConfiguration.setString(DeploymentOptions.TARGET, YarnSessionClusterExecutor.NAME);
    } else {
        // Deployment target: a per-job YARN cluster
        effectiveConfiguration.setString(DeploymentOptions.TARGET, YarnJobClusterExecutor.NAME);
    }

    // JobManager memory; a bare number defaults to megabytes
    if (commandLine.hasOption(jmMemory.getOpt())) {
        String jmMemoryVal = commandLine.getOptionValue(jmMemory.getOpt());
        if (!MemorySize.MemoryUnit.hasUnit(jmMemoryVal)) {
            jmMemoryVal += "m";
        }
        effectiveConfiguration.set(JobManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse(jmMemoryVal));
    }

    // TaskManager memory; a bare number defaults to megabytes
    if (commandLine.hasOption(tmMemory.getOpt())) {
        String tmMemoryVal = commandLine.getOptionValue(tmMemory.getOpt());
        if (!MemorySize.MemoryUnit.hasUnit(tmMemoryVal)) {
            tmMemoryVal += "m";
        }
        effectiveConfiguration.set(TaskManagerOptions.TOTAL_PROCESS_MEMORY, MemorySize.parse(tmMemoryVal));
    }

    // Number of slots per TaskManager
    if (commandLine.hasOption(slots.getOpt())) {
        effectiveConfiguration.setInteger(TaskManagerOptions.NUM_TASK_SLOTS,
            Integer.parseInt(commandLine.getOptionValue(slots.getOpt())));
    }

    // Dynamic properties from the remaining command-line arguments
    dynamicPropertiesEncoded = encodeDynamicProperties(commandLine);
    if (!dynamicPropertiesEncoded.isEmpty()) {
        Map<String, String> dynProperties = getDynamicProperties(dynamicPropertiesEncoded);
        for (Map.Entry<String, String> dynProperty : dynProperties.entrySet()) {
            effectiveConfiguration.setString(dynProperty.getKey(), dynProperty.getValue());
        }
    }

    // In YARN properties-file mode, apply those properties too; then return the result
    if (isYarnPropertiesFileMode(commandLine)) {
        return applyYarnProperties(effectiveConfiguration);
    } else {
        return effectiveConfiguration;
    }
}
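The memory handling shows a small but easy-to-miss convention: a bare value like -yjm 2048 is interpreted as megabytes by appending "m" before parsing. A standalone sketch of that normalization (hasUnit here is a simplified stand-in for Flink's MemorySize.MemoryUnit.hasUnit, not its real implementation):

```java
public class MemorySpec {
    // Does the value end in a unit letter?
    // (simplified stand-in for MemorySize.MemoryUnit.hasUnit)
    public static boolean hasUnit(String value) {
        return !value.isEmpty() && Character.isLetter(value.charAt(value.length() - 1));
    }

    // Default a bare number to megabytes, as the YARN CLI does for -yjm/-ytm
    public static String normalize(String value) {
        return hasUnit(value) ? value : value + "m";
    }

    public static void main(String[] args) {
        System.out.println(normalize("2048")); // 2048m
        System.out.println(normalize("2g"));   // 2g
    }
}
```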
1.5 Executing the user program
protected void executeProgram(final Configuration configuration, final PackagedProgram program)
        throws ProgramInvocationException {
    ClientUtils.executeProgram(new DefaultExecutorServiceLoader(), configuration, program, false, false);
}

public static void executeProgram(
        PipelineExecutorServiceLoader executorServiceLoader,
        Configuration configuration,
        PackagedProgram program,
        boolean enforceSingleJobExecution,
        boolean suppressSysout) throws ProgramInvocationException {
    checkNotNull(executorServiceLoader);

    // The user-code class loader built inside the PackagedProgram in the previous step
    final ClassLoader userCodeClassLoader = program.getUserCodeClassLoader();
    final ClassLoader contextClassLoader = Thread.currentThread().getContextClassLoader();

    try {
        Thread.currentThread().setContextClassLoader(userCodeClassLoader);

        LOG.info("Starting program (detached: {})", !configuration.getBoolean(DeploymentOptions.ATTACHED));

        // Install the execution context environment...
        ContextEnvironment.setAsContext(
            executorServiceLoader, configuration, userCodeClassLoader, enforceSingleJobExecution, suppressSysout);
        // ...and the streaming context environment
        StreamContextEnvironment.setAsContext(
            executorServiceLoader, configuration, userCodeClassLoader, enforceSingleJobExecution, suppressSysout);

        try {
            // Invoke the user's main class; this is where the program finally runs
            program.invokeInteractiveModeForExecution();
        } finally {
            // Restore the context environment...
            ContextEnvironment.unsetAsContext();
            // ...and the streaming context environment
            StreamContextEnvironment.unsetAsContext();
        }
    } finally {
        Thread.currentThread().setContextClassLoader(contextClassLoader);
    }
}
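The try/finally discipline around the context class loader is the standard pattern for running foreign code under its own loader while guaranteeing the original loader comes back even if the code throws. The pattern in isolation (helper name is mine):

```java
public class ContextClassLoaderSwap {
    // Run an action with the given context class loader installed,
    // restoring the previous one afterwards, as executeProgram does
    public static void runWithClassLoader(ClassLoader loader, Runnable action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        try {
            current.setContextClassLoader(loader);
            action.run();
        } finally {
            // runs even when action.run() throws
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        runWithClassLoader(ContextClassLoaderSwap.class.getClassLoader(),
            () -> System.out.println("running under the user-code class loader"));
        // The original loader is restored afterwards
        System.out.println(Thread.currentThread().getContextClassLoader() == before); // true
    }
}
```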
This concludes the execution flow of the run mode based on org.apache.flink.yarn.cli.FlinkYarnSessionCli. Walking through the client-side code shows that 1.11 already differs considerably from earlier versions. On top of that, 1.11 introduces the run-application mode, whose entry path closely resembles the run mode:
protected void runApplication(String[] args) throws Exception {
    LOG.info("Running 'run-application' command.");

    // Collect the command-line option definitions
    final Options commandOptions = CliFrontendParser.getRunCommandOptions();
    // Merge the user arguments into the run options and wrap them in a CommandLine object
    final CommandLine commandLine = getCommandLine(commandOptions, args, true);

    // If this is a help request, print usage and return
    if (commandLine.hasOption(HELP_OPTION.getOpt())) {
        CliFrontendParser.printHelpForRun(customCommandLines);
        return;
    }

    // As before, validate the arguments and pick the active CustomCommandLine
    final CustomCommandLine activeCommandLine =
            validateAndGetActiveCommandLine(checkNotNull(commandLine));

    // Wrap the client-side parameters in a ProgramOptions object
    final ProgramOptions programOptions = new ProgramOptions(commandLine);

    // Create the application deployer
    final ApplicationDeployer deployer =
            new ApplicationClusterDeployer(clusterClientServiceLoader);

    // Verify that the user jar exists
    programOptions.validate();
    final URI uri = PackagedProgramUtils.resolveURI(programOptions.getJarFilePath());

    // Compute the effective configuration and wrap it in a Configuration object
    final Configuration effectiveConfiguration = getEffectiveConfiguration(
            activeCommandLine, commandLine, programOptions, Collections.singletonList(uri.toString()));

    // Build an ApplicationConfiguration from the user arguments and the entry-point class
    final ApplicationConfiguration applicationConfiguration =
            new ApplicationConfiguration(programOptions.getProgramArgs(), programOptions.getEntryPointClassName());

    // Deploy the application with the deployer instance
    deployer.run(effectiveConfiguration, applicationConfiguration);
}
Next, follow the call into org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer#run:
public <ClusterID> void run(
        final Configuration configuration,
        final ApplicationConfiguration applicationConfiguration) throws Exception {
    checkNotNull(configuration);
    checkNotNull(applicationConfiguration);

    LOG.info("Submitting application in 'Application Mode'.");

    // Load the cluster client factory for the configured deployment target
    final ClusterClientFactory<ClusterID> clientFactory =
            clientServiceLoader.getClusterClientFactory(configuration);
    try (final ClusterDescriptor<ClusterID> clusterDescriptor =
            clientFactory.createClusterDescriptor(configuration)) {
        // Derive the cluster specification (JobManager/TaskManager sizing, slots)
        final ClusterSpecification clusterSpecification =
                clientFactory.getClusterSpecification(configuration);
        // Deploy the application cluster
        clusterDescriptor.deployApplicationCluster(clusterSpecification, applicationConfiguration);
    }
}
Deployment goes through deployApplicationCluster, declared on the ClusterDescriptor interface, which has two implementations:
YarnClusterDescriptor and StandaloneClusterDescriptor.
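This interface-to-implementation dispatch can be sketched without Flink: a factory keyed by the deployment target decides which descriptor handles the deployment. The "yarn-application" string mirrors Flink's target name; everything else below is illustrative:

```java
import java.util.Map;
import java.util.function.Supplier;

public class DescriptorDispatchDemo {

    // Minimal stand-in for the ClusterDescriptor interface.
    interface ClusterDescriptor {
        String deployApplicationCluster();
    }

    // One factory per deployment target, mimicking how the client service
    // loader ends up picking YarnClusterDescriptor vs. StandaloneClusterDescriptor.
    static final Map<String, Supplier<ClusterDescriptor>> FACTORIES = Map.of(
            "yarn-application", () -> () -> "deploying on YARN",
            "remote", () -> () -> "deploying standalone");

    static ClusterDescriptor descriptorFor(String target) {
        Supplier<ClusterDescriptor> factory = FACTORIES.get(target);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown deployment target: " + target);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        System.out.println(descriptorFor("yarn-application").deployApplicationCluster());
    }
}
```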
We focus on the YarnClusterDescriptor implementation:
@Override
public ClusterClientProvider<ApplicationId> deployApplicationCluster(
        final ClusterSpecification clusterSpecification,
        final ApplicationConfiguration applicationConfiguration) throws ClusterDeploymentException {
    checkNotNull(clusterSpecification);
    checkNotNull(applicationConfiguration);

    // Read the deployment target from the configuration
    final YarnDeploymentTarget deploymentTarget = YarnDeploymentTarget.fromConfig(flinkConfiguration);
    // Only the run-application (yarn-application) target is allowed here
    if (YarnDeploymentTarget.APPLICATION != deploymentTarget) {
        throw new ClusterDeploymentException(
                "Couldn't deploy Yarn Application Cluster." +
                        " Expected deployment.target=" + YarnDeploymentTarget.APPLICATION.getName() +
                        " but actual one was \"" + deploymentTarget.getName() + "\"");
    }

    // Merge the application settings into the Flink configuration
    applicationConfiguration.applyToConfiguration(flinkConfiguration);

    // Fetch the pipeline jars and make sure there is exactly one
    final List<String> pipelineJars = flinkConfiguration.getOptional(PipelineOptions.JARS).orElse(Collections.emptyList());
    Preconditions.checkArgument(pipelineJars.size() == 1, "Should only have one jar");

    try {
        // Deploy the application onto the YARN cluster
        return deployInternal(
                clusterSpecification,
                "Flink Application Cluster",
                YarnApplicationClusterEntryPoint.class.getName(),
                null,
                false);
    } catch (Exception e) {
        throw new ClusterDeploymentException("Couldn't deploy Yarn Application Cluster", e);
    }
}
Now the final step:
/**
 * This method will block until the ApplicationMaster/JobManager have been deployed on YARN.
 *
 * @param clusterSpecification Initial cluster specification for the Flink cluster to be deployed
 * @param applicationName name of the Yarn application to start
 * @param yarnClusterEntrypoint Class name of the Yarn cluster entry point.
 * @param jobGraph A job graph which is deployed with the Flink cluster, {@code null} if none
 * @param detached True if the cluster should be started in detached mode
 */
private ClusterClientProvider<ApplicationId> deployInternal(
        ClusterSpecification clusterSpecification,
        String applicationName,
        String yarnClusterEntrypoint,
        @Nullable JobGraph jobGraph,
        boolean detached) throws Exception {

    // Determine the submitting user and, if Kerberos is enabled, validate the credentials
    final UserGroupInformation currentUser = UserGroupInformation.getCurrentUser();
    if (HadoopUtils.isKerberosSecurityEnabled(currentUser)) {
        boolean useTicketCache = flinkConfiguration.getBoolean(SecurityOptions.KERBEROS_LOGIN_USETICKETCACHE);
        if (!HadoopUtils.areKerberosCredentialsValid(currentUser, useTicketCache)) {
            throw new RuntimeException("Hadoop security with Kerberos is enabled but the login user " +
                    "does not have Kerberos credentials or delegation tokens!");
        }
    }

    isReadyForDeployment(clusterSpecification);

    // ------------------ Check if the specified queue exists --------------------
    checkYarnQueues(yarnClient);

    // ------------------ Check if the YARN ClusterClient has the requested resources --------------

    // Create the application via yarnClient
    final YarnClientApplication yarnApplication = yarnClient.createApplication();
    final GetNewApplicationResponse appResponse = yarnApplication.getNewApplicationResponse();

    // Maximum resources a single container may request
    Resource maxRes = appResponse.getMaximumResourceCapability();

    // Currently free cluster resources
    final ClusterResourceDescription freeClusterMem;
    try {
        freeClusterMem = getCurrentFreeClusterResources(yarnClient);
    } catch (YarnException | IOException e) {
        failSessionDuringDeployment(yarnClient, yarnApplication);
        throw new YarnDeploymentException("Could not retrieve information about free cluster resources.", e);
    }

    // Minimum memory allocation the YARN scheduler will hand out, in MB
    final int yarnMinAllocationMB = yarnConfiguration.getInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 0);

    final ClusterSpecification validClusterSpecification;
    try {
        validClusterSpecification = validateClusterResources(
                clusterSpecification, yarnMinAllocationMB, maxRes, freeClusterMem);
    } catch (YarnDeploymentException yde) {
        failSessionDuringDeployment(yarnClient, yarnApplication);
        throw yde;
    }

    LOG.info("Cluster specification: {}", validClusterSpecification);

    final ClusterEntrypoint.ExecutionMode executionMode = detached ?
            ClusterEntrypoint.ExecutionMode.DETACHED : ClusterEntrypoint.ExecutionMode.NORMAL;
    flinkConfiguration.setString(ClusterEntrypoint.EXECUTION_MODE, executionMode.toString());

    // Start the ApplicationMaster
    ApplicationReport report = startAppMaster(
            flinkConfiguration,
            applicationName,
            yarnClusterEntrypoint,
            jobGraph,
            yarnClient,
            yarnApplication,
            validClusterSpecification);

    // In detached mode, print the application id so the user can cancel the job themselves
    if (detached) {
        final ApplicationId yarnApplicationId = report.getApplicationId();
        logDetachedClusterInformation(yarnApplicationId, LOG);
    }

    // Write the YARN details from the report (e.g. the applicationId) back into flinkConfiguration
    setClusterEntrypointInfoToConfig(report);

    return () -> {
        try {
            // Return a RestClusterClient, through which the caller can query the deployed cluster
            return new RestClusterClient<>(flinkConfiguration, report.getApplicationId());
        } catch (Exception e) {
            throw new RuntimeException("Error while creating RestClusterClient.", e);
        }
    };
}
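Notice that deployInternal does not return a client directly but a ClusterClientProvider: a trailing lambda that builds a fresh RestClusterClient each time getClusterClient() is called. The deferred-construction pattern, stripped of Flink types (all names below are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyClientDemo {

    // Illustrative stand-in for ClusterClientProvider.
    interface ClientProvider<T> {
        T getClusterClient();
    }

    // Counts client constructions, just to make the laziness observable.
    static final AtomicInteger CLIENTS_BUILT = new AtomicInteger();

    // Like deployInternal's trailing lambda: capture the deployment result
    // now, build the (stub) client only when the caller asks for it.
    static ClientProvider<String> deploy(String applicationId) {
        return () -> {
            CLIENTS_BUILT.incrementAndGet();
            return "client-for-" + applicationId;
        };
    }

    public static void main(String[] args) {
        ClientProvider<String> provider = deploy("application_1234_0001");
        // Nothing has been constructed yet at this point.
        System.out.println(CLIENTS_BUILT.get());
        System.out.println(provider.getClusterClient());
        System.out.println(CLIENTS_BUILT.get());
    }
}
```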
With that, the run-application deployment flow is complete.
Looking back over the whole process, org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer#run is the heart of the deployment. This suggests that if we prepare a Configuration and an ApplicationConfiguration ourselves, we can skip the CLI steps entirely and submit a job to the cluster directly. Following the sample code from the WeChat account [大数据技术与应用实战] (https://github.com/zhangjun0x01/bigdata-examples.git, class cluster.SubmitJobApplicationMode), we can assemble the parameters in code, deploy the job to the YARN cluster, and use the returned client to obtain the applicationId and other details of the running application. This offers a lightweight way to integrate Flink job deployment into our own big data platform.
Reference code:
public static void main(String[] args) {
    // Local Flink conf directory, used to load the Flink configuration
    String configurationDirectory = "/Users/user/work/flink/conf/";
    // Directory holding the Flink cluster jars
    String flinkLibs = "hdfs://hadoopcluster/data/flink/libs";
    // The user jar
    String userJarPath = "hdfs://hadoopcluster/data/flink/user-lib/TopSpeedWindowing.jar";
    String flinkDistJar = "hdfs://hadoopcluster/data/flink/libs/flink-yarn_2.11-1.11.0.jar";

    YarnClient yarnClient = YarnClient.createYarnClient();
    YarnConfiguration yarnConfiguration = new YarnConfiguration();
    yarnClient.init(yarnConfiguration);
    yarnClient.start();

    YarnClusterInformationRetriever clusterInformationRetriever =
            YarnClientYarnClusterInformationRetriever.create(yarnClient);

    // Load the Flink configuration
    Configuration flinkConfiguration = GlobalConfiguration.loadConfiguration(configurationDirectory);
    flinkConfiguration.set(CheckpointingOptions.INCREMENTAL_CHECKPOINTS, true);
    flinkConfiguration.set(PipelineOptions.JARS, Collections.singletonList(userJarPath));

    Path remoteLib = new Path(flinkLibs);
    flinkConfiguration.set(YarnConfigOptions.PROVIDED_LIB_DIRS, Collections.singletonList(remoteLib.toString()));
    flinkConfiguration.set(YarnConfigOptions.FLINK_DIST_JAR, flinkDistJar);

    // Use application mode
    flinkConfiguration.set(DeploymentOptions.TARGET, YarnDeploymentTarget.APPLICATION.getName());

    // YARN application name
    flinkConfiguration.set(YarnConfigOptions.APPLICATION_NAME, "jobName");

    // Cluster specification (JobManager/TaskManager sizing; defaults here)
    ClusterSpecification clusterSpecification = new ClusterSpecification.ClusterSpecificationBuilder()
            .createClusterSpecification();

    // Program arguments and entry-point class of the user jar (null = read from the jar manifest)
    ApplicationConfiguration appConfig = new ApplicationConfiguration(args, null);

    YarnClusterDescriptor yarnClusterDescriptor = new YarnClusterDescriptor(
            flinkConfiguration,
            yarnConfiguration,
            yarnClient,
            clusterInformationRetriever,
            true);

    ClusterClientProvider<ApplicationId> clusterClientProvider = null;
    try {
        clusterClientProvider = yarnClusterDescriptor.deployApplicationCluster(
                clusterSpecification, appConfig);
    } catch (ClusterDeploymentException e) {
        e.printStackTrace();
    }

    ClusterClient<ApplicationId> clusterClient = clusterClientProvider.getClusterClient();
    ApplicationId applicationId = clusterClient.getClusterId();
    System.out.println(applicationId);
}
The end. Corrections are welcome.
References:
https://blog.csdn.net/weixin_43161811/article/details/103152867
https://blog.csdn.net/zhangjun5965/article/details/107511615