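Configuration is Hadoop's configuration class. On the client side, the most common ways to supply configuration are the addResource() and set() methods of the Java Configuration class. A property can also be set from the shell with the -D option. Starting from the code below, this article walks through the source to trace how the client loads its configuration.
The analysis is based on Hadoop 3.3.4.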
@Test
public void test2() throws Exception {
Configuration conf = new Configuration();
conf.addResource(new Path("/Users/didi/core-site.xml"));
conf.addResource(new Path("/Users/didi/hdfs-site.xml"));
conf.set("aaa", "bbb");
FsShell shell = new FsShell(conf);
String[] args = {"-D", "ccc=ddd", "-cat", "hdfs://ns1/input/test1.txt"};
int res;
try {
res = ToolRunner.run(shell, args);
} catch (Exception e) {
throw new RuntimeException(e);
} finally {
shell.close();
}
System.exit(res);
}
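1. The addResource() method
We start from conf.addResource(new Path("/Users/didi/core-site.xml"));.
(1). addResource() has many overloads.
In practice they all do the same thing: wrap the configuration source in a new Resource object and store it in resources. Resource is an inner class of Configuration that identifies where a set of properties comes from; resources is an ArrayList that holds the Resource objects in insertion order. The Path overload of addResource() looks like this: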
public void addResource(Path file) {
addResourceObject(new Resource(file));
}
private synchronized void addResourceObject(Resource resource) {
resources.add(resource); // add to resources
restrictSystemProps |= resource.isParserRestricted();
loadProps(properties, resources.size() - 1, false);
}
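(2). Here loadProps() is called with startIdx = resources.size() - 1, so if properties has already been initialized it loads only the resource that was just appended (everything from startIdx to the end of resources):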
private synchronized void loadProps(final Properties props,
final int startIdx, final boolean fullReload) {
if (props != null) {
Map<String, String[]> backup =
updatingResource != null
? new ConcurrentHashMap<>(updatingResource) : null;
loadResources(props, resources, startIdx, fullReload, quietmode);
if (overlay != null) {
props.putAll(overlay);
if (backup != null) {
for (Map.Entry<Object, Object> item : overlay.entrySet()) {
String key = (String) item.getKey();
String[] source = backup.get(key);
if (source != null) {
updatingResource.put(key, source);
}
}
}
}
}
}
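The method does two things: it loads resources, and it preserves the recorded source of every property held in overlay. Because properties is still null at this point, the props != null check fails and the method returns without loading anything. The method is called again later in the flow, where we look at it in more detail, focusing on loadResources(), the method that actually loads the resources.
conf.addResource(new Path("/Users/didi/hdfs-site.xml")); follows the same path, so we skip it.
To sum up, addResource() appends the resource to the resource list. If properties has already been loaded (properties is not null), the newly added resource is loaded right away; otherwise nothing is loaded yet.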
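The lazy behavior summarized above can be modeled in a few lines. This is only a sketch under simplifying assumptions: LazyConfSketch, loadCount and the use of plain Properties objects in place of Resource are all hypothetical and are not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical, simplified model of Configuration's lazy loading; not Hadoop code.
class LazyConfSketch {
  private final List<Properties> resources = new ArrayList<>(); // stands in for Resource objects
  private Properties properties;                                // null until the first read
  int loadCount = 0;                                            // number of resources loaded so far

  void addResource(Properties resource) {
    resources.add(resource);
    // Same shape as addResourceObject(): try to load only the new entry.
    loadProps(resources.size() - 1);
  }

  private void loadProps(int startIdx) {
    if (properties == null) {
      return; // nothing has been loaded yet, so the new resource just waits in the list
    }
    for (int i = startIdx; i < resources.size(); i++) {
      properties.putAll(resources.get(i));
      loadCount++;
    }
  }

  String get(String key) {
    if (properties == null) { // first access triggers the full load from index 0
      properties = new Properties();
      loadProps(0);
    }
    return properties.getProperty(key);
  }
}
```

In this model, adding a resource before the first get() loads nothing, while adding one afterwards loads exactly that one resource.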
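2. conf.set("aaa", "bbb")
This call puts the property with key aaa and value bbb into properties.
properties is a field of Configuration that stores the loaded configuration as key-value pairs, for example dfs.namenode.rpc-address -> hadoop01:8020. The names aaa and bbb are used only to show that any key-value pair can be set; whether it has any effect after loading depends on whether some code actually reads that key.
(1). Now look at conf.set(). The two-argument call ends up in the three-argument overload with source set to null: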
public void set(String name, String value, String source) {
Preconditions.checkArgument(
name != null,
"Property name must not be null");
Preconditions.checkArgument(
value != null,
"The value of property %s must not be null", name);
name = name.trim();
DeprecationContext deprecations = deprecationContext.get();
if (deprecations.getDeprecatedKeyMap().isEmpty()) {
getProps();
}
getOverlay().setProperty(name, value);
getProps().setProperty(name, value);
String newSource = (source == null ? "programmatically" : source);
if (!isDeprecated(name)) {
putIntoUpdatingResource(name, new String[] {newSource});
String[] altNames = getAlternativeNames(name);
if(altNames != null) {
for(String n: altNames) {
if(!n.equals(name)) {
getOverlay().setProperty(n, value);
getProps().setProperty(n, value);
putIntoUpdatingResource(n, new String[] {newSource});
}
}
}
}
else {
String[] names = handleDeprecation(deprecationContext.get(), name);
String altSource = "because " + name + " is deprecated";
for(String n : names) {
getOverlay().setProperty(n, value);
getProps().setProperty(n, value);
putIntoUpdatingResource(n, new String[] {altSource});
}
}
}
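The method contains a lot of handling logic: null checks, deprecated-key handling, writing to overlay, writing to properties, and recording the property's source. Note overlay: it is a field of Configuration that keeps the properties set through set(), so that they can be applied again on top of the resources whenever the resources are loaded (see props.putAll(overlay) in loadProps()). Here the pair "aaa" -> "bbb" is stored in overlay and then written to properties through setProperty().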
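The role of overlay is easier to see in isolation. The following is a minimal sketch, assuming a single in-memory Properties object stands in for all parsed resources; OverlaySketch, putFileValue and reload are hypothetical names and not Hadoop APIs.

```java
import java.util.Properties;

// Hypothetical model of the overlay idea; not the Hadoop implementation.
class OverlaySketch {
  private final Properties fileValues = new Properties(); // stands in for parsed resources
  private final Properties overlay = new Properties();    // values set programmatically
  private Properties properties = new Properties();

  void putFileValue(String key, String value) {
    fileValues.setProperty(key, value);
  }

  void set(String key, String value) {
    overlay.setProperty(key, value);    // remembered for later reloads
    properties.setProperty(key, value); // visible immediately
  }

  void reload() {
    properties = new Properties();
    properties.putAll(fileValues); // load the resources first
    properties.putAll(overlay);    // then reapply everything that went through set()
  }

  String get(String key) {
    return properties.getProperty(key);
  }
}
```

Because the overlay is applied last, a value set in code keeps winning over the value from a file even after a reload.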
(2). The key call is getProps():
protected synchronized Properties getProps() {
if (properties == null) {
properties = new Properties();
loadProps(properties, 0, true);
}
return properties;
}
private void loadResources(Properties properties,
ArrayList<Resource> resources,
int startIdx,
boolean fullReload,
boolean quiet) {
if(loadDefaults && fullReload) {
for (String resource : defaultResources) {
loadResource(properties, new Resource(resource, false), quiet);
}
}
for (int i = startIdx; i < resources.size(); i++) {
Resource ret = loadResource(properties, resources.get(i), quiet);
if (ret != null) {
resources.set(i, ret);
}
}
this.addTags(properties);
}
This time properties is initialized, then loadProps() is called, which in turn calls loadResources().
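(3). loadResources() first loads the default resources (only when loadDefaults and fullReload are both true) and then the resources already stored in resources, starting at startIdx. The core of it is loadResource(), which loads a single resource: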
private Resource loadResource(Properties properties,
Resource wrapper, boolean quiet) {
String name = UNKNOWN_RESOURCE;
try {
Object resource = wrapper.getResource();
name = wrapper.getName();
boolean returnCachedProperties = false;
if (resource instanceof InputStream) {
returnCachedProperties = true;
} else if (resource instanceof Properties) {
overlay(properties, (Properties)resource);
}
XMLStreamReader2 reader = getStreamReader(wrapper, quiet);
if (reader == null) {
if (quiet) {
return null;
}
throw new RuntimeException(resource + " not found");
}
Properties toAddTo = properties;
if(returnCachedProperties) {
toAddTo = new Properties();
}
List<ParsedItem> items = new Parser(reader, wrapper, quiet).parse();
for (ParsedItem item : items) {
loadProperty(toAddTo, item.name, item.key, item.value,
item.isFinal, item.sources);
}
reader.close();
if (returnCachedProperties) {
overlay(properties, toAddTo);
return new Resource(toAddTo, name, wrapper.isParserRestricted());
}
return null;
} catch (IOException e) {
LOG.error("error parsing conf " + name, e);
throw new RuntimeException(e);
} catch (XMLStreamException e) {
LOG.error("error parsing conf " + name, e);
throw new RuntimeException(e);
}
}
Leaving the conditions and control flow aside, only two parts really matter: building the reader, and parsing followed by loadProperty():
① (4). XMLStreamReader2 reader = getStreamReader(wrapper, quiet)
private XMLStreamReader2 getStreamReader(Resource wrapper, boolean quiet)
throws XMLStreamException, IOException {
Object resource = wrapper.getResource();
boolean isRestricted = wrapper.isParserRestricted();
XMLStreamReader2 reader = null;
if (resource instanceof URL) { // an URL resource
reader = (XMLStreamReader2)parse((URL)resource, isRestricted);
} else if (resource instanceof String) { // a CLASSPATH resource
URL url = getResource((String)resource);
reader = (XMLStreamReader2)parse(url, isRestricted);
} else if (resource instanceof Path) { // a file resource
// Can't use FileSystem API or we get an infinite loop
// since FileSystem uses Configuration API. Use java.io.File instead.
File file = new File(((Path)resource).toUri().getPath())
.getAbsoluteFile();
if (file.exists()) {
if (!quiet) {
LOG.debug("parsing File " + file);
}
reader = (XMLStreamReader2)parse(new BufferedInputStream(
Files.newInputStream(file.toPath())), ((Path) resource).toString(),
isRestricted);
}
} else if (resource instanceof InputStream) {
reader = (XMLStreamReader2)parse((InputStream)resource, null,
isRestricted);
}
return reader;
}
This method calls parse() according to the type of the resource and returns the resulting reader.
(5). All parse() overloads end up in the following method:
private XMLStreamReader parse(InputStream is, String systemIdStr,
boolean restricted) throws IOException, XMLStreamException {
if (!quietmode) {
LOG.debug("parsing input stream " + is);
}
if (is == null) {
return null;
}
SystemId systemId = SystemId.construct(systemIdStr);
ReaderConfig readerConfig = XML_INPUT_FACTORY.createPrivateConfig();
if (restricted) {
readerConfig.setProperty(XMLInputFactory.SUPPORT_DTD, false);
}
return XML_INPUT_FACTORY.createSR(readerConfig, systemId,
StreamBootstrapper.getInstance(null, systemId, is), false, true);
}
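It uses WstxInputFactory.createSR() to wrap the input stream and the system id and returns an XMLStreamReader. When the resource is marked as restricted, DTD support is switched off first.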
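The same idea can be tried without Woodstox. The sketch below uses the JDK's built-in StAX factory instead of WstxInputFactory, which is a deliberate substitution; RestrictedReaderSketch, open and countProperties are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: a "restricted" StAX reader built with the JDK factory, not Woodstox.
class RestrictedReaderSketch {
  static XMLStreamReader open(InputStream in, boolean restricted) throws XMLStreamException {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    if (restricted) {
      // Same switch that parse() flips on its ReaderConfig.
      factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    }
    return factory.createXMLStreamReader(in);
  }

  // Counts <property> elements, just to show the reader being consumed.
  static int countProperties(String xml) {
    try {
      InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
      XMLStreamReader reader = open(in, true);
      int count = 0;
      while (reader.hasNext()) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT
            && "property".equals(reader.getLocalName())) {
          count++;
        }
      }
      reader.close();
      return count;
    } catch (XMLStreamException e) {
      throw new IllegalStateException("could not read XML", e);
    }
  }
}
```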
② (6). Parser.parse()
List<ParsedItem> items = new Parser(reader, wrapper, quiet).parse();
List<ParsedItem> parse() throws IOException, XMLStreamException {
while (reader.hasNext()) {
parseNext();
}
return results;
}
void parseNext() throws IOException, XMLStreamException {
switch (reader.next()) {
case XMLStreamConstants.START_ELEMENT:
handleStartElement();
break;
case XMLStreamConstants.CHARACTERS:
case XMLStreamConstants.CDATA:
if (parseToken) {
char[] text = reader.getTextCharacters();
token.append(text, reader.getTextStart(), reader.getTextLength());
}
break;
case XMLStreamConstants.END_ELEMENT:
handleEndElement();
break;
default:
break;
}
}
}
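parse() advances the reader through the XML file and dispatches on each event it reads. handleStartElement() handles an opening tag such as <name>; reader.getTextCharacters() reads the text inside the tag and appends it to token (a StringBuilder), for example the characters f, s, ., d, e, f, a, u, l, t, F, S; finally handleEndElement() turns token into a string, giving the value "fs.defaultFS", and handles it according to the tag type. We look at them one by one.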
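To see the start element / characters / end element cycle end to end, here is a stripped-down parser built on the JDK's StAX reader. It is a sketch only: MiniConfParser is a hypothetical class that handles just <name> and <value>, and it ignores final, source, tag, include and deprecation handling.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical, simplified stand-in for Configuration.Parser; handles only name and value.
class MiniConfParser {
  static Map<String, String> parse(String xml) {
    Map<String, String> results = new LinkedHashMap<>();
    StringBuilder token = new StringBuilder();
    boolean parseToken = false;
    String confName = null;
    String confValue = null;
    try {
      XMLStreamReader reader =
          XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
      while (reader.hasNext()) {
        switch (reader.next()) {
          case XMLStreamConstants.START_ELEMENT: {
            String tag = reader.getLocalName();
            if ("property".equals(tag)) {
              confName = null;          // start a fresh property
              confValue = null;
            } else if ("name".equals(tag) || "value".equals(tag)) {
              parseToken = true;        // begin collecting text for this tag
              token.setLength(0);
            }
            break;
          }
          case XMLStreamConstants.CHARACTERS:
          case XMLStreamConstants.CDATA:
            if (parseToken) {
              token.append(reader.getTextCharacters(),
                  reader.getTextStart(), reader.getTextLength());
            }
            break;
          case XMLStreamConstants.END_ELEMENT: {
            String tag = reader.getLocalName();
            if ("name".equals(tag)) {
              confName = token.toString().trim();
            } else if ("value".equals(tag)) {
              confValue = token.toString();
            } else if ("property".equals(tag) && confName != null) {
              results.put(confName, confValue);
            }
            parseToken = false;
            break;
          }
          default:
            break;
        }
      }
      reader.close();
    } catch (XMLStreamException e) {
      throw new IllegalStateException("could not parse configuration XML", e);
    }
    return results;
  }
}
```

As in the original code, the name is trimmed while the value is kept verbatim.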
(7). handleStartElement()
private void handleStartElement() throws XMLStreamException, IOException {
switch (reader.getLocalName()) {
case "property":
handleStartProperty();
break;
case "name":
case "value":
case "final":
case "source":
case "tag":
parseToken = true;
token.setLength(0);
break;
case "include":
handleInclude();
break;
case "fallback":
fallbackEntered = true;
break;
case "configuration":
break;
default:
break;
}
}
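This method dispatches on the tag that was just read from the XML configuration file; handleStartProperty() handles the case where <property> has just been read.
As a side note, handleInclude() handles <include> elements, which pull another configuration resource into the file; this is where a remotely hosted configuration would be resolved. The parsing details are not covered here.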
(8). handleStartProperty()
private void handleStartProperty() {
confName = null;
confValue = null;
confFinal = false;
confTag = null;
confSource.clear();
// First test for short format configuration
int attrCount = reader.getAttributeCount();
for (int i = 0; i < attrCount; i++) {
String propertyAttr = reader.getAttributeLocalName(i);
if ("name".equals(propertyAttr)) {
confName = StringInterner.weakIntern(
reader.getAttributeValue(i));
} else if ("value".equals(propertyAttr)) {
confValue = StringInterner.weakIntern(
reader.getAttributeValue(i));
} else if ("final".equals(propertyAttr)) {
confFinal = "true".equals(reader.getAttributeValue(i));
} else if ("source".equals(propertyAttr)) {
confSource.add(StringInterner.weakIntern(
reader.getAttributeValue(i)));
} else if ("tag".equals(propertyAttr)) {
confTag = StringInterner
.weakIntern(reader.getAttributeValue(i));
}
}
}
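The main job of this method is to reset the confXXX variables in preparation for assembling a property. It also handles the short format, in which name, value, final, source and tag are written as attributes of the <property> element instead of child elements. Configuration files written in the usual element style do not take this path, so we do not expand on it.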
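For readers who have not seen the short format, the attribute loop can be reproduced with the JDK's StAX reader. This is a sketch under the assumption that only name and value matter; ShortFormatSketch and firstProperty are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch of reading a short-format property such as
// <property name="aaa" value="bbb"/>; not Hadoop code.
class ShortFormatSketch {
  // Returns {name, value} of the first <property> element, or null if there is none.
  static String[] firstProperty(String xml) {
    try {
      XMLStreamReader reader =
          XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
      while (reader.hasNext()) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT
            && "property".equals(reader.getLocalName())) {
          String name = null;
          String value = null;
          // Same attribute walk as handleStartProperty().
          for (int i = 0; i < reader.getAttributeCount(); i++) {
            String attr = reader.getAttributeLocalName(i);
            if ("name".equals(attr)) {
              name = reader.getAttributeValue(i);
            } else if ("value".equals(attr)) {
              value = reader.getAttributeValue(i);
            }
          }
          return new String[] {name, value};
        }
      }
      return null;
    } catch (XMLStreamException e) {
      throw new IllegalStateException("could not parse configuration XML", e);
    }
  }
}
```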
(9). handleEndElement()
void handleEndElement() throws IOException {
String tokenStr = token.toString();
switch (reader.getLocalName()) {
case "name":
if (token.length() > 0) {
confName = StringInterner.weakIntern(tokenStr.trim());
}
break;
case "value":
if (token.length() > 0) {
confValue = StringInterner.weakIntern(tokenStr);
}
break;
case "final":
confFinal = "true".equals(tokenStr);
break;
case "source":
confSource.add(StringInterner.weakIntern(tokenStr));
break;
case "tag":
if (token.length() > 0) {
confTag = StringInterner.weakIntern(tokenStr);
}
break;
case "include":
if (fallbackAllowed && !fallbackEntered) {
throw new IOException("Fetch fail on include for '"
+ confInclude + "' with no fallback while loading '"
+ name + "'");
}
fallbackAllowed = false;
fallbackEntered = false;
break;
case "property":
handleEndProperty();
break;
default:
break;
}
}
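This handles a closing tag, again dispatching on the tag name; mostly it turns the characters collected in token into the matching confXXX variable. The one to follow is handleEndProperty().
(10). handleEndProperty()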
void handleEndProperty() {
if (confName == null || (!fallbackAllowed && fallbackEntered)) {
return;
}
String[] confSourceArray;
if (confSource.isEmpty()) {
confSourceArray = nameSingletonArray;
} else {
confSource.add(name);
confSourceArray = confSource.toArray(new String[confSource.size()]);
}
// Read tags and put them in propertyTagsMap
if (confTag != null) {
readTagFromConfig(confTag, confName, confValue, confSourceArray);
}
DeprecatedKeyInfo keyInfo =
deprecations.getDeprecatedKeyMap().get(confName);
if (keyInfo != null) {
keyInfo.clearAccessed();
for (String key : keyInfo.newKeys) {
// update new keys with deprecated key's value
results.add(new ParsedItem(
name, key, confValue, confFinal, confSourceArray));
}
} else {
results.add(new ParsedItem(name, confName, confValue, confFinal,
confSourceArray));
}
}
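handleEndProperty() records the source of the assembled property (for example core-site.xml), processes the tag, and finally wraps everything in a ParsedItem that is added to results, an ArrayList. results is what parse() returns as items.
The figure shows the structure of items, which holds each property's key-value pair and its source.
(11). Finally we return to the last part of loadResource(), where each item in items is read into properties through loadProperty() and its source is recorded. That code is fairly simple and is omitted here.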
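Since loadProperty() is skipped above, here is a rough sketch of the one rule in it that matters most to users: once a resource marks a key as final, resources loaded later cannot change it. The sketch follows the documented behavior of final parameters rather than the actual source; FinalAwareLoader and its fields are hypothetical, and source tracking and logging are left out.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch of final-parameter handling while loading resources; not Hadoop code.
class FinalAwareLoader {
  private final Properties properties = new Properties();
  private final Set<String> finalKeys = new HashSet<>();

  void loadProperty(String key, String value, boolean isFinal) {
    if (!finalKeys.contains(key)) {
      properties.setProperty(key, value);
    }
    // Otherwise a later resource tried to change a final key; the earlier value is kept.
    if (isFinal) {
      finalKeys.add(key);
    }
  }

  String get(String key) {
    return properties.getProperty(key);
  }
}
```

Note that the set() listing shown earlier does not consult any final-key set, so this rule concerns values loaded from resources.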
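To sum up, conf.set() first loads the resources stored in resources and then writes the new property into properties. Unlike addResource(), which only registers a resource and defers the work, set() triggers the actual loading. My guess at the reason for this design is that the value being set may override a previously loaded property, or the property may have been marked final earlier, so the resources have to be loaded first to avoid conflicts.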
3. hadoop fs -D
The help output gives a concise description of -D:
$ ./bin/hadoop fs -help
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
......
......
Generic options supported are:
......
-D <property=value> define a value for a given property
......
The general command line syntax is:
command [genericOptions] [commandOptions]
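The order of the command is also clear: command [genericOptions] [commandOptions]
For example: ./bin/hadoop fs -D fs.defaultFS=freeToWrite -ls /
As expected, the property with key fs.defaultFS and value freeToWrite is loaded into properties, just as with conf.set(); in fact conf.set() is exactly what is called underneath.
(1). In the Java API, the entry point for a shell-style command is ToolRunner.run():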
public static int run(Tool tool, String[] args)
throws Exception{
return run(tool.getConf(), tool, args);
}
public static int run(Configuration conf, Tool tool, String[] args)
throws Exception{
if (CallerContext.getCurrent() == null) {
CallerContext ctx = new CallerContext.Builder("CLI").build();
CallerContext.setCurrent(ctx);
}
// Note the entry point in the audit context; this
// may be used in audit events set to cloud store logs
// or elsewhere.
CommonAuditContext.noteEntryPoint(tool);
if(conf == null) {
conf = new Configuration();
}
GenericOptionsParser parser = new GenericOptionsParser(conf, args);
//set the configuration back, so that Tool can configure itself
tool.setConf(conf);
//get the args w/o generic hadoop args
String[] toolArgs = parser.getRemainingArgs();
return tool.run(toolArgs);
}
(2). Ignoring the control code, we focus on GenericOptionsParser, whose constructor calls parseGeneralOptions():
private boolean parseGeneralOptions(Options opts, String[] args)
throws IOException {
opts = buildGeneralOptions(opts);
CommandLineParser parser = new GnuParser();
boolean parsed = false;
try {
commandLine = parser.parse(opts, preProcessForWindows(args), true);
processGeneralOptions(commandLine);
parsed = true;
} catch(ParseException e) {
LOG.warn("options parsing failed: "+e.getMessage());
HelpFormatter formatter = new HelpFormatter();
formatter.printHelp("general options are: ", opts);
}
return parsed;
}
This method parses the input command, splits it, and stores the result in commandLine.
At this point genericOptions and commandOptions have been separated.
(3). Next, processGeneralOptions() processes commandLine:
private void processGeneralOptions(CommandLine line) throws IOException {
......
......
if (line.hasOption('D')) {
String[] property = line.getOptionValues('D');
for(String prop : property) {
String[] keyval = prop.split("=", 2);
if (keyval.length == 2) {
conf.set(keyval[0], keyval[1], "from command line");
}
}
}
......
......
}
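Everything in this method has the form if (line.hasOption("XXX")) {}, so we leave out the rest and look only at the -D part. The logic is simple: each -D argument is split at the first "=" into a key and a value, and conf.set() is called with them. An argument without "=" is silently ignored.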
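The split and the separation of generic options from command options can be imitated with plain Java. This is a deliberately naive sketch: the real parsing is done by commons-cli and supports many more options, while DashDSketch below is a hypothetical class that understands only leading "-D key=value" pairs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of -D handling; the real work is done by GenericOptionsParser.
class DashDSketch {
  final Map<String, String> defined = new LinkedHashMap<>(); // what would go to conf.set()
  final List<String> remaining = new ArrayList<>();          // what the tool itself receives

  static DashDSketch parse(String[] args) {
    DashDSketch result = new DashDSketch();
    int i = 0;
    // Generic options come first; stop at the first argument that is not -D.
    while (i + 1 < args.length && "-D".equals(args[i])) {
      // Limit 2 keeps any further '=' characters inside the value.
      String[] keyval = args[i + 1].split("=", 2);
      if (keyval.length == 2) {
        result.defined.put(keyval[0], keyval[1]);
      }
      i += 2;
    }
    for (; i < args.length; i++) {
      result.remaining.add(args[i]);
    }
    return result;
  }
}
```

With the arguments from the test at the top of this article, defined would hold ccc=ddd and remaining would hold -cat followed by the file path.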
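That completes the analysis of loading through -D: in essence, conf.set() is called before the fs command runs.
Addendum: in the spirit of seeing the analysis through (we have come this far, after all), a later article will walk through how the shell command is then executed.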