Fields in the mirrorwriter class:
/**
 * Key to use asking settings for character map.
 */
public static final String ATTR_CHAR_MAP = "character-map";
addElementToDefinition(new StringList(ATTR_CHAR_MAP,
    "This list is grouped in pairs. "
    + "The first string in each pair must have a length of one. "
    + "If it occurs in a URI path, "
    + "it is replaced by the second string in the pair. "
    + "For UNIX, no character mapping is normally needed. "
    + "For Macintosh, the recommended value is [: %%3A]. "
    + "For Windows, the recommended value is "
    + "[' ' %%20 \" %%22 * %%2A : %%3A < %%3C "
    + "\\> %%3E ? %%3F \\\\ %%5C ^ %%5E | %%7C]."));
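As I read the description above, the purpose of this field is: if the URI path contains a special character such as a space, ", *, : or <, that character is replaced by its counterpart from the list (%%20, %%22 and so on).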
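To make the pair-grouped list concrete, here is a minimal sketch of how such a replacement could work. The class CharMapSketch and the method applyCharMap are hypothetical names of my own, not taken from the crawler source, and the sketch writes the replacements with a single % (the doubled %% in the description is left as the source shows it).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the crawler's own code.
class CharMapSketch {
    // pairs is a flat list: [from1, to1, from2, to2, ...], each "from" one character long.
    static String applyCharMap(String path, List<String> pairs) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < path.length(); i++) {
            String ch = String.valueOf(path.charAt(i));
            String replacement = ch; // unchanged unless a pair matches
            for (int p = 0; p + 1 < pairs.size(); p += 2) {
                if (pairs.get(p).equals(ch)) {
                    replacement = pairs.get(p + 1);
                    break;
                }
            }
            out.append(replacement);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> map = Arrays.asList(" ", "%20", "*", "%2A", ":", "%3A");
        System.out.println(applyCharMap("a b:c*", map)); // prints a%20b%3Ac%2A
    }
}
```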
addElementToDefinition(new StringList(ATTR_CONTENT_TYPE_MAP,
    "This list is grouped in pairs. "
    + "If the content type of a resource begins (case-insensitive) "
    + "with the first string in a pair, the suffix is set to "
    + "the second string in the pair, replacing any suffix that may "
    + "have been in the URI. For example, to force all HTML files "
    + "to have the same suffix, use [text/html html]."));
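This field: if the content type begins (case-insensitive) with the first string of a pair, the file suffix is changed to the second string of that pair, replacing any suffix the URI already had.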
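The prefix match described here can be sketched as follows. ContentTypeSuffixSketch and suffixFor are hypothetical names, and returning null when nothing matches is my own assumption for the sketch, not something the source states.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; not the crawler's own code.
class ContentTypeSuffixSketch {
    // pairs is a flat list: [contentTypePrefix1, suffix1, contentTypePrefix2, suffix2, ...]
    static String suffixFor(String contentType, List<String> pairs) {
        if (contentType == null) {
            return null;
        }
        String ct = contentType.toLowerCase(Locale.ROOT);
        for (int p = 0; p + 1 < pairs.size(); p += 2) {
            // Case-insensitive "begins with" test, as the description says.
            if (ct.startsWith(pairs.get(p).toLowerCase(Locale.ROOT))) {
                return pairs.get(p + 1);
            }
        }
        return null; // assumption: no pair matched, so no suffix is forced
    }

    public static void main(String[] args) {
        List<String> map = Arrays.asList("text/html", "html");
        System.out.println(suffixFor("Text/HTML; charset=UTF-8", map)); // prints html
    }
}
```

Because the test is a prefix match, a content type carrying parameters such as a charset still maps to the configured suffix.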
e = addElementToDefinition(new SimpleType(ATTR_DIRECTORY_FILE,
    "Implicitly append this to a URI ending with '/'.",
    "index.html"));
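If a link ends with '/', this name (index.html by default) is appended to it, so the resource is stored as an index.html file.
Take a 58 group-buying link as an example: http://t.58.com/xm/68526387987971009/?linkid=xm_liebiao_home_1
The path of this link ends with '/', so the directory file name is appended when the local path is built.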
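A minimal sketch of the "append to a URI ending with '/'" rule, using the path part of the link above. DirectoryFileSketch and appendDirectoryFile are hypothetical names, and the sketch ignores the query string and every other conversion step.

```java
// Hypothetical illustration; not the crawler's own code.
class DirectoryFileSketch {
    // Appends the directory file name only when the URI path ends with '/'.
    static String appendDirectoryFile(String uriPath, String directoryFile) {
        return uriPath.endsWith("/") ? uriPath + directoryFile : uriPath;
    }

    public static void main(String[] args) {
        // prints /xm/68526387987971009/index.html
        System.out.println(appendDirectoryFile("/xm/68526387987971009/", "index.html"));
    }
}
```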
e = addElementToDefinition(new SimpleType(ATTR_DOT_BEGIN,
    "If a segment starts with '.', the '.' is replaced by this.",
    DEFAULT_DOT_BEGIN));
addElementToDefinition(new SimpleType(ATTR_DOT_END,
    "If a directory name ends with '.' it is replaced by this. "
    + "For all file systems except Windows, '.' is recommended. "
    + "For Windows, %%2E is recommended.",
    "."));
These two fields are presumably also used during the path conversion.
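As a rough illustration of the two dot rules, the sketch below rewrites a leading and a trailing '.' of one path segment. DotSegmentSketch and fixSegment are hypothetical names; the value %2E passed in main is only an example, since the excerpt does not show what DEFAULT_DOT_BEGIN is, and the real code may apply the trailing-dot rule to directory names only.

```java
// Hypothetical illustration; not the crawler's own code.
class DotSegmentSketch {
    static String fixSegment(String segment, String dotBegin, String dotEnd) {
        String s = segment;
        if (s.startsWith(".")) {
            // Replace the leading '.' with the configured value.
            s = dotBegin + s.substring(1);
        }
        if (s.endsWith(".")) {
            // Replace the trailing '.' with the configured value.
            s = s.substring(0, s.length() - 1) + dotEnd;
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(fixSegment(".hidden", "%2E", "%2E")); // prints %2Ehidden
        System.out.println(fixSegment("dir.", "%2E", "%2E"));    // prints dir%2E
    }
}
```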
addElementToDefinition(new StringList(ATTR_HOST_MAP,
    "This list is grouped in pairs. "
    + "If a host name matches (case-insensitive) the first string "
    + "in a pair, it is replaced by the second string in the pair. "
    + "This can be used for consistency when several names are used "
    + "for one host, for example "
    + "[12.34.56.78 www42.foo.com]."));
This is a bit like DNS: one host name (or IP address) is mapped to another name.
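The host lookup could look roughly like this. HostMapSketch and mapHost are hypothetical names; the pair in main is the example from the description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the crawler's own code.
class HostMapSketch {
    // pairs is a flat list: [host1, replacement1, host2, replacement2, ...]
    static String mapHost(String host, List<String> pairs) {
        for (int p = 0; p + 1 < pairs.size(); p += 2) {
            if (pairs.get(p).equalsIgnoreCase(host)) {
                return pairs.get(p + 1);
            }
        }
        return host; // no pair matched: keep the host unchanged
    }

    public static void main(String[] args) {
        List<String> map = Arrays.asList("12.34.56.78", "www42.foo.com");
        System.out.println(mapHost("12.34.56.78", map)); // prints www42.foo.com
    }
}
```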
addElementToDefinition(new SimpleType(ATTR_PATH,
    "Top-level directory for mirror files.", "mirror"));
This one is self-explanatory: it is the location where the mirror is stored.
addElementToDefinition(new SimpleType(ATTR_SUFFIX_AT_END,
    "If true, the suffix is placed at the end of the path, "
    + "after the query (if any). If false, the suffix is placed "
    + "before the query.",
    Boolean.TRUE));
If the link contains a query, this decides whether the suffix is placed before the query or after it.
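The two placements can be compared with a small sketch. SuffixPlacementSketch and placeSuffix are hypothetical names, and the sketch is deliberately simplified: it only concatenates strings and does not replace an existing suffix or map the '?' character.

```java
// Hypothetical illustration; not the crawler's own code.
class SuffixPlacementSketch {
    static String placeSuffix(String path, String query, String suffix, boolean suffixAtEnd) {
        String q = (query == null || query.isEmpty()) ? "" : "?" + query;
        String s = "." + suffix;
        // true: suffix after the query; false: suffix before the query.
        return suffixAtEnd ? path + q + s : path + s + q;
    }

    public static void main(String[] args) {
        System.out.println(placeSuffix("page", "id=1", "html", true));  // prints page?id=1.html
        System.out.println(placeSuffix("page", "id=1", "html", false)); // prints page.html?id=1
    }
}
```

Placing the suffix at the very end means the stored file name ends with the suffix even when the URI has a query.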
e = addElementToDefinition(new SimpleType(ATTR_TOO_LONG_DIRECTORY,
    "If all the directories in the URI would exceed, "
    + "or come close to exceeding, the file system maximum "
    + "path length, then they are all replaced by this.",
    DEFAULT_TOO_LONG_DIRECTORY));
If the path is too long, the directories are automatically replaced by this value.
/** Default value for ATTR_TOO_LONG_DIRECTORY.*/
private static final String DEFAULT_TOO_LONG_DIRECTORY = "LONG";
addElementToDefinition(new StringList(ATTR_UNDERSCORE_SET,
    "If a directory name appears (case-insensitive) in this list "
    + "then an underscore is placed before it. "
    + "For all file systems except Windows, this is not needed. "
    + "For Windows, the following is recommended: "
    + "[com1 com2 com3 com4 com5 com6 com7 com8 com9 "
    + "lpt1 lpt2 lpt3 lpt4 lpt5 lpt6 lpt7 lpt8 lpt9 "
    + "con nul prn]."));
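I do not fully understand this one. The names in the list are reserved device names on Windows, which cannot be used as file or directory names, so that is presumably why an underscore is put in front of them.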
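A minimal sketch of the underscore rule, assuming the list is held as a set of lower-case names. UnderscoreSetSketch and guardReserved are hypothetical names of my own.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; not the crawler's own code.
class UnderscoreSetSketch {
    // underscoreSet is assumed to contain lower-case names only.
    static String guardReserved(String dirName, Set<String> underscoreSet) {
        boolean reserved = underscoreSet.contains(dirName.toLowerCase(Locale.ROOT));
        return reserved ? "_" + dirName : dirName;
    }

    public static void main(String[] args) {
        Set<String> reserved = new HashSet<>(Arrays.asList("con", "nul", "prn", "com1", "lpt1"));
        System.out.println(guardReserved("CON", reserved));  // prints _CON
        System.out.println(guardReserved("docs", reserved)); // prints docs
    }
}
```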
The innerProcess method:
String scheme = uuri.getScheme();
if (!"http".equalsIgnoreCase(scheme)
        && !"https".equalsIgnoreCase(scheme)) {
    return;
}
Anything that is not http(s) returns immediately.
RecordingInputStream recis = curi.getHttpRecorder().getRecordedInput();
if (0L == recis.getResponseContentLength()) {
    return;
}
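If the recorded response has a content length of zero, that is, no content was obtained for the page the link points to, return immediately.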
String baseDir = null; // Base directory.
String baseSeg = null; // ATTR_PATH value.
try {
    baseSeg = (String) getAttribute(ATTR_PATH, curi);
} catch (AttributeNotFoundException e) {
    logger.warning(e.getLocalizedMessage());
    return;
}
By default, the value of baseSeg is "mirror".
// Trim any trailing File.separatorChar characters from baseSeg.
while ((baseSeg.length() > 1) && baseSeg.endsWith(File.separator)) {
    baseSeg = baseSeg.substring(0, baseSeg.length() - 1);
}
Strip any trailing file separators.
if (0 == baseSeg.length()) {
    baseDir = getController().getDisk().getPath();
} else if ((new File(baseSeg)).isAbsolute()) {
    baseDir = baseSeg;
} else {
    baseDir = getController().getDisk().getPath() + File.separator
        + baseSeg;
}
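If the base path is empty, the directory is taken from the controller, that is, from the order.xml file,
which by default is the project path. An absolute baseSeg is used as-is, and a relative one is appended to the controller's disk path.
The reason for the first case is in the code above: when the value in the configuration file is empty, only the working directory path can be obtained.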
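The trimming and the three branches above can be tried out in isolation with the sketch below. BaseDirSketch and resolveBaseDir are hypothetical names, and the diskPath parameter stands in for getController().getDisk().getPath(), which is not available outside the crawler.

```java
import java.io.File;

// Hypothetical illustration; not the crawler's own code.
class BaseDirSketch {
    static String resolveBaseDir(String diskPath, String baseSeg) {
        String seg = baseSeg;
        // Trim trailing separators, but never shrink below one character.
        while (seg.length() > 1 && seg.endsWith(File.separator)) {
            seg = seg.substring(0, seg.length() - 1);
        }
        if (seg.isEmpty()) {
            return diskPath;                     // empty: use the disk path itself
        } else if (new File(seg).isAbsolute()) {
            return seg;                          // absolute: use as-is
        } else {
            return diskPath + File.separator + seg; // relative: resolve under the disk path
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveBaseDir("jobs", "mirror" + File.separator));
    }
}
```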
// Already have a path for this URI.
boolean reCrawl = curi.containsKey(A_MIRROR_PATH);
if (reCrawl) {
    mps = curi.getString(A_MIRROR_PATH);
    destFile = new File(baseDir + File.separator + mps);
    File parent = destFile.getParentFile();
    if (null != parent) {
        IoUtils.ensureWriteableDirectory(parent);
    }
}
If a path already exists for this URI, that path is used directly (no conversion is needed).
else {
    URIToFileReturn r = null; // Return from uriToFile().
    try {
        r = uriToFile(baseDir, curi);
    } catch (AttributeNotFoundException e) {
        logger.warning(e.getLocalizedMessage());
        return;
    }
    destFile = r.getFile();
    mps = r.getRelativePath();
}
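Otherwise, the URI has to be converted to a file path, which is done by calling uriToFile.
The uriToFile method does all of the URI-to-file-path conversion work:
Recall the earlier list whose entries work like map entries, with an IP address as the first item and the matching domain name as the second; here the IP address is presumably replaced by the domain name.
Based on the fields described earlier, it decides whether the port number is shown and sets the suffix; for example, for the content type text/html the suffix is html.
Illegal characters are converted; for example, a Windows path cannot contain a question mark.
After the conversion,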
logger.warning(uuri.toString() + " -> " + destFile.getPath());
this logs a message showing the local path string that the URI was converted to.
That's all for now.