1. Related configuration
public static final ConfigOption<List<String>> SHIP_FILES =
        key("yarn.ship-files")
                .stringType()
                .asList()
                .noDefaultValue()
                .withDeprecatedKeys("yarn.ship-directories")
                .withDescription(
                        "A semicolon-separated list of files and/or directories to be shipped to the YARN cluster.");
This option is used to ship third-party files and directories to the YARN cluster.
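As the description above says, the option value is a semicolon-separated list. A minimal sketch of how such a value might be split into individual entries (this is an illustration, not Flink's actual configuration parser):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Simplified sketch: split a semicolon-separated option value such as
// yarn.ship-files into individual entries, trimming whitespace and
// dropping empty segments. Illustrative only, not Flink code.
public class ShipFilesParser {
    public static List<String> parse(String raw) {
        if (raw == null || raw.isEmpty()) {
            return List.of();
        }
        return Arrays.stream(raw.split(";"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parse("udf.jar; conf/; data/dict.txt"));
    }
}
```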
2. File loading
In YarnClusterDescriptor, the option is read and the files are added to a collection:
decodeFilesToShipToCluster(flinkConfiguration, YarnConfigOptions.SHIP_FILES)
        .ifPresent(this::addShipFiles);
The files are then uploaded in startAppMaster:
// Register all files in provided lib dirs as local resources with public visibility
// and upload the remaining dependencies as local resources with APPLICATION visibility.
final List<String> systemClassPaths = fileUploader.registerProvidedLocalResources();
final List<String> uploadedDependencies =
        fileUploader.registerMultipleLocalResources(
                systemShipFiles.stream()
                        .map(e -> new Path(e.toURI()))
                        .collect(Collectors.toSet()),
                Path.CUR_DIR,
                LocalResourceType.FILE);
systemClassPaths.addAll(uploadedDependencies);
The actual upload happens in YarnApplicationFileUploader#uploadLocalFileToRemote, which copies the data to HDFS:
final Path applicationDir = getApplicationDirPath(homeDir, applicationId);
final String suffix =
        (relativeDstPath.isEmpty() ? "" : relativeDstPath + "/") + localSrcPath.getName();
final Path dst = new Path(applicationDir, suffix);
LOG.debug(
        "Copying from {} to {} with replication factor {}",
        localSrcPath,
        dst,
        replicationFactor);
fileSystem.copyFromLocalFile(false, true, localSrcPath, dst);
fileSystem.setReplication(dst, (short) replicationFactor);
return dst;
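The destination path above is the application staging directory, plus an optional relative path, plus the local file name. A sketch of that suffix logic with plain strings instead of Hadoop's Path (method name is mine, for illustration):

```java
// Sketch of how the remote destination path is assembled from the
// application directory, an optional relative destination path, and the
// local file name, mirroring the suffix logic in uploadLocalFileToRemote.
// Plain strings are used instead of Hadoop's Path; illustrative only.
public class RemotePathSketch {
    public static String buildDst(String applicationDir, String relativeDstPath, String localFileName) {
        String suffix = (relativeDstPath.isEmpty() ? "" : relativeDstPath + "/") + localFileName;
        return applicationDir + "/" + suffix;
    }

    public static void main(String[] args) {
        System.out.println(buildDst("hdfs:///user/flink/.flink/app_1", "", "udf.jar"));
        System.out.println(buildDst("hdfs:///user/flink/.flink/app_1", "lib", "udf.jar"));
    }
}
```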
One thing to note: the upload is performed through fileSystem (which also determines homeDir), and fileSystem is passed in when the YarnApplicationFileUploader is created. Both homeDir and fileSystem are determined by yarn.staging-directory; if that option is not set, they are determined by fs.defaultFS, which defaults to the local file system.
Path stagingDirPath = getStagingDir(fs);
FileSystem stagingDirFs = stagingDirPath.getFileSystem(yarnConfiguration);
final YarnApplicationFileUploader fileUploader =
        YarnApplicationFileUploader.from(
                stagingDirFs,
                stagingDirPath,
                providedLibDirs,
                appContext.getApplicationId(),
                getFileReplication());
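The staging-directory resolution described above can be sketched as a simple fallback: use yarn.staging-directory when it is set, otherwise fall back to the file system's home directory (which in turn depends on fs.defaultFS). Paths are plain strings here and the helper is illustrative, not Flink's getStagingDir:

```java
import java.util.Optional;

// Sketch of the staging-directory fallback: a configured
// yarn.staging-directory wins; otherwise the file system's home
// directory is used. Illustrative only.
public class StagingDirSketch {
    public static String resolveStagingDir(Optional<String> configuredStagingDir, String fsHomeDir) {
        return configuredStagingDir.orElse(fsHomeDir);
    }

    public static void main(String[] args) {
        System.out.println(resolveStagingDir(Optional.of("hdfs:///tmp/flink-staging"), "hdfs:///user/flink"));
        System.out.println(resolveStagingDir(Optional.empty(), "hdfs:///user/flink"));
    }
}
```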
3. Startup
Depending on the deployment mode, the entrypoint is YarnApplicationClusterEntryPoint, YarnJobClusterEntrypoint, etc.; in each case the REST endpoint started is a MiniDispatcherRestEndpoint.
The most important point here is the configuration-file directory, which is also the directory where the yarn.ship-files files end up (the container's working directory):
Map<String, String> env = System.getenv();
final String workingDirectory = env.get(ApplicationConstants.Environment.PWD.key());
This configuration directory is then passed in when loading the configuration:
final Configuration configuration =
        YarnEntrypointUtils.loadConfiguration(workingDirectory, dynamicParameters, env);
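Inside the container, the working directory comes from the YARN-provided PWD environment variable, and the Flink configuration file (flink-conf.yaml) is expected inside it. A minimal sketch of resolving that file path (the helper name is mine):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: resolve the Flink configuration file inside the container's
// working directory, which YARN exposes via the PWD environment
// variable. Illustrative only, not YarnEntrypointUtils itself.
public class WorkingDirSketch {
    public static Path configFile(String workingDirectory) {
        return Paths.get(workingDirectory, "flink-conf.yaml");
    }

    public static void main(String[] args) {
        System.out.println(configFile("/data/nm-local-dir/usercache/flink/appcache/app_1/container_1"));
    }
}
```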
4. Addendum
Note that there is an option that can speed up application submission by avoiding shipping Flink's jar files on every run; the upload flow described earlier will exclude the directories listed here.
public static final ConfigOption<List<String>> PROVIDED_LIB_DIRS =
        key("yarn.provided.lib.dirs")
                .stringType()
                .asList()
                .noDefaultValue()
                .withDescription(
                        "A semicolon-separated list of provided lib directories. They should be pre-uploaded and "
                                + "world-readable. Flink will use them to exclude the local Flink jars(e.g. flink-dist, lib/, plugins/)"
                                + "uploading to accelerate the job submission process. Also YARN will cache them on the nodes so that "
                                + "they doesn't need to be downloaded every time for each application. An example could be "
                                + "hdfs://$namenode_address/path/of/flink/lib");
If this option is empty, Flink's lib folders are added to SHIP_FILES:
if (providedLibDirs == null || providedLibDirs.isEmpty()) {
    addLibFoldersToShipFiles(systemShipFiles);
}
The jar directory comes from an environment variable:
String libDir = System.getenv().get(ENV_FLINK_LIB_DIR);
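Putting the fallback together: when yarn.provided.lib.dirs is empty, the local lib directory from the FLINK_LIB_DIR environment variable is added to the ship files. A self-contained sketch of that decision (class and method names are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the fallback described above: only when no provided lib
// dirs are configured is the local Flink lib directory (taken from the
// FLINK_LIB_DIR environment variable) shipped to the cluster.
// Illustrative only, not Flink's addLibFoldersToShipFiles.
public class LibShipSketch {
    public static List<String> shipFiles(List<String> providedLibDirs, Map<String, String> env) {
        List<String> shipFiles = new ArrayList<>();
        if (providedLibDirs == null || providedLibDirs.isEmpty()) {
            String libDir = env.get("FLINK_LIB_DIR");
            if (libDir != null) {
                shipFiles.add(libDir);
            }
        }
        return shipFiles;
    }

    public static void main(String[] args) {
        System.out.println(shipFiles(List.of(), Map.of("FLINK_LIB_DIR", "/opt/flink/lib")));
        System.out.println(shipFiles(List.of("hdfs:///flink/lib"), Map.of("FLINK_LIB_DIR", "/opt/flink/lib")));
    }
}
```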