Java --- Guava Escapers
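Guava's Escaper classes provide a framework for replacing special characters in a string, and Guava ships two ready-made implementations, one for XML and one for HTML.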
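Structure
Basic structure:
Escaper: the root abstract base class. It defines the conversion method public abstract String escape(String string);
CharEscaper: extends Escaper and implements the char-by-char replacement process, leaving protected abstract char[] escape(char c) for subclasses to supply the actual replacement for each character.
UnicodeEscaper: similar to CharEscaper, but it works on Unicode code points rather than individual char values, leaving protected abstract char[] escape(int cp) for subclasses to supply the actual replacement.
ArrayBasedCharEscaper: a subclass of CharEscaper that uses a two-dimensional array to store the characters to be replaced and their replacements.
ArrayBasedUnicodeEscaper: a subclass of UnicodeEscaper that uses a two-dimensional array to store the characters to be replaced and their replacements.
Builder: a nested class of Escapers, obtained through the static method Escapers.builder(). You register the mapping from each character to its replacement on the Builder and then call build() to get an Escaper instance. That instance is a subclass of ArrayBasedCharEscaper.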
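To make the array-based lookup more concrete, here is a minimal plain-Java sketch of the idea. It is not Guava's actual code: the class name ArrayLookupEscaperSketch and its constructor are made up for illustration, and details such as safe ranges are left out. It only shows how a char[][] indexed by the character value can hold the replacements, with a per-character hook playing the role of escape(char c).

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified sketch of array-based escaping. Hypothetical class, not Guava's implementation. */
public class ArrayLookupEscaperSketch {
  // replacements[c] holds the replacement for char c, or null if c is left unchanged.
  private final char[][] replacements;

  public ArrayLookupEscaperSketch(Map<Character, String> rules) {
    int max = -1;
    for (char c : rules.keySet()) {
      max = Math.max(max, c);
    }
    // The array only needs to reach the largest char that has a rule.
    replacements = new char[max + 1][];
    for (Map.Entry<Character, String> e : rules.entrySet()) {
      replacements[e.getKey()] = e.getValue().toCharArray();
    }
  }

  /** Per-character hook, playing the role of CharEscaper's escape(char c). */
  char[] escape(char c) {
    return c < replacements.length ? replacements[c] : null;
  }

  /** String-level entry point, playing the role of Escaper's escape(String). */
  public String escape(String s) {
    StringBuilder out = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      char[] r = escape(c);
      if (r == null) {
        out.append(c); // no rule: copy the char through
      } else {
        out.append(r); // rule found: append the replacement
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    Map<Character, String> rules = new HashMap<>();
    rules.put('<', "&lt;");
    rules.put('>', "&gt;");
    rules.put('&', "&amp;");
    ArrayLookupEscaperSketch escaper = new ArrayLookupEscaperSketch(rules);
    System.out.println(escaper.escape("a < b && c > d"));
    // prints: a &lt; b &amp;&amp; c &gt; d
  }
}
```

Indexing the table by the character value makes each lookup a single array access, which is presumably why the array-based classes are the ones the Builder produces.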
Obtaining an Escaper instance
Guava offers a convenient way to obtain a CharEscaper-based instance: Escapers.builder().addEscape(..,..)…build().
For example, this is how XmlEscapers creates its XML escaper objects:
    static {
      Escapers.Builder builder = Escapers.builder();
      // The char values \uFFFE and \uFFFF are explicitly not allowed in XML
      // (Unicode code points above \uFFFF are represented via surrogate pairs
      // which means they are treated as pairs of safe characters).
      builder.setSafeRange(Character.MIN_VALUE, '\uFFFD');
      // Unsafe characters are replaced with the Unicode replacement character.
      builder.setUnsafeReplacement("\uFFFD");

      /*
       * Except for \n, \t, and \r, all ASCII control characters are replaced with
       * the Unicode replacement character.
       *
       * Implementation note: An alternative to the following would be to make a
       * map that simply replaces the allowed ASCII whitespace characters with
       * themselves and to set the minimum safe character to 0x20. However this
       * would slow down the escaping of simple strings that contain \t, \n, or
       * \r.
       */
      for (char c = MIN_ASCII_CONTROL_CHAR; c <= MAX_ASCII_CONTROL_CHAR; c++) {
        if (c != '\t' && c != '\n' && c != '\r') {
          builder.addEscape(c, "\uFFFD");
        }
      }

      // Build the content escaper first and then add quote escaping for the
      // general escaper.
      builder.addEscape('&', "&amp;");
      builder.addEscape('<', "&lt;");
      builder.addEscape('>', "&gt;");
      XML_CONTENT_ESCAPER = builder.build();
      builder.addEscape('\'', "&apos;");
      builder.addEscape('"', "&quot;");
      XML_ESCAPER = builder.build();
      builder.addEscape('\t', "&#x9;");
      builder.addEscape('\n', "&#xA;");
      builder.addEscape('\r', "&#xD;");
      XML_ATTRIBUTE_ESCAPER = builder.build();
    }
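As the code above shows, building your own CharEscaper instance is straightforward. The steps are:
1. Call Escapers.builder() to get a Builder.
2. Set the safe range with builder.setSafeRange(safeMin, safeMax).
3. Set the default replacement for characters that fall outside the safe range and have no explicit replacement, with builder.setUnsafeReplacement().
4. Add replacement rules with builder.addEscape ….
5. Call builder.build() to get the Escaper object.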
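The five steps above can be sketched as follows. The use case (escaping fields for a pipe-delimited, printable-ASCII format) and the names CustomEscaperDemo and buildPipeFieldEscaper are invented for illustration. Only the Escapers.Builder calls come from Guava, and the example assumes Guava is on the classpath.

```java
import com.google.common.escape.Escaper;
import com.google.common.escape.Escapers;

public class CustomEscaperDemo {
  /** Builds a hypothetical escaper for pipe-delimited, printable-ASCII fields. */
  static Escaper buildPipeFieldEscaper() {
    return Escapers.builder()          // 1. get a Builder
        .setSafeRange(' ', '~')        // 2. printable ASCII passes through unchanged
        .setUnsafeReplacement("?")     // 3. default for chars outside the safe range
        .addEscape('|', "\\|")         // 4. explicit rules take priority over the safe range
        .addEscape('\\', "\\\\")
        .build();                      // 5. build the Escaper
  }

  public static void main(String[] args) {
    Escaper escaper = buildPipeFieldEscaper();
    // Input characters: a | b \ c é
    System.out.println(escaper.escape("a|b\\c\u00e9"));
    // prints: a\|b\\c?
  }
}
```

The explicit rules for | and \ apply even though both characters lie inside the safe range, while é lies outside it and has no rule of its own, so it falls back to the unsafe replacement.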
Using the XML escapers
Guava provides ready-made XML escapers. XmlEscapers returns Escaper instances for element content and for attribute values. For example:
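    import com.google.common.escape.Escaper;
    import com.google.common.xml.XmlEscapers;

    public class XmlEscaperExample {
      public static void main(String[] args) {
        Escaper escaper = XmlEscapers.xmlContentEscaper();
        String smContent = "<test>Dosda/jlsdaf><ifdsa<M<>";

        // Without escaping, the content breaks the XML structure.
        String input = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<content>" + smContent + "</content>";
        System.out.println(input);

        // With escaping, the content is safe to embed in an element.
        input = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<content>" + escaper.escape(smContent) + "</content>";
        System.out.println(input);
      }
    }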
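Output:
    <?xml version="1.0" encoding="UTF-8"?><content><test>Dosda/jlsdaf><ifdsa<M<></content>
    <?xml version="1.0" encoding="UTF-8"?><content>&lt;test&gt;Dosda/jlsdaf&gt;&lt;ifdsa&lt;M&lt;&gt;</content>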
As the output shows, escape() replaces special characters such as < and > with their XML entities.
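The attribute escaper can be compared with the content escaper in the same way. The sketch below assumes Guava is on the classpath. The class name XmlAttributeEscaperDemo, its two helper methods, and the note element are made up for illustration.

```java
import com.google.common.xml.XmlEscapers;

public class XmlAttributeEscaperDemo {
  /** Escapes text for use between element tags. */
  static String escapeContent(String s) {
    return XmlEscapers.xmlContentEscaper().escape(s);
  }

  /** Escapes text for use inside a quoted attribute value. */
  static String escapeAttribute(String s) {
    return XmlEscapers.xmlAttributeEscaper().escape(s);
  }

  public static void main(String[] args) {
    String value = "Tom & \"Jerry\" <b>";
    System.out.println(escapeContent(value));
    // prints: Tom &amp; "Jerry" &lt;b&gt;
    System.out.println(escapeAttribute(value));
    // prints: Tom &amp; &quot;Jerry&quot; &lt;b&gt;
    System.out.println("<note title=\"" + escapeAttribute(value) + "\"/>");
  }
}
```

The content escaper leaves quotes alone, so it is not enough for attribute values. The attribute escaper additionally escapes both quote characters and the whitespace characters \t, \n and \r.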
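URL escapers
Special characters in a URL often need to be replaced as well; otherwise the URL may be parsed in unexpected ways. Guava provides URL escapers for this.
URL format: protocol://hostname[:port]/path/[;parameters][?query]#fragment
Guava provides Escaper instances for three parts of a URL: path segments, form parameters, and fragments.
They are obtained from the following methods of the UrlEscapers class: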
urlFormParameterEscaper()
urlPathSegmentEscaper()
urlFragmentEscaper()
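The difference between the three escapers is easiest to see by running the same input through each of them. The sketch below assumes Guava is on the classpath. The class name UrlEscaperDemo and the sample input are invented for illustration.

```java
import com.google.common.net.UrlEscapers;

public class UrlEscaperDemo {
  static String asFormParameter(String s) {
    return UrlEscapers.urlFormParameterEscaper().escape(s);
  }

  static String asPathSegment(String s) {
    return UrlEscapers.urlPathSegmentEscaper().escape(s);
  }

  static String asFragment(String s) {
    return UrlEscapers.urlFragmentEscaper().escape(s);
  }

  public static void main(String[] args) {
    String raw = "a b&c=d/e?";
    System.out.println(asFormParameter(raw)); // prints: a+b%26c%3Dd%2Fe%3F
    System.out.println(asPathSegment(raw));   // prints: a%20b&c=d%2Fe%3F
    System.out.println(asFragment(raw));      // prints: a%20b&c=d/e?
  }
}
```

Each escaper treats a different set of characters as safe, so the right one depends on where the value ends up in the URL. The form parameter escaper, for instance, turns a space into + and escapes & and =, while the fragment escaper leaves / and ? untouched.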