Ran into a jBPM encoding problem T T

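Last week, while writing a certain something, I needed to deploy a process with jBPM. The data source was an XML string sent over by Saito. But whenever the XML contained any Chinese, deployment failed. To make things worse, the thing I was writing is extremely hard to debug: when many parts of a program are wrapped in proxies, debugging gets painful.
Roughly, the situation was this: the XML string Saito sends has no XML declaration. On my side I ask jBPM to create a NewDeployment, put the string in, and call deploy, at which point an exception wrapped in many layers is thrown. The innermost one is MalformedByteSequenceException.
There were other, bigger headaches to deal with first, so this one got set aside. On Friday I finally had time to fix it.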

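I grabbed the jBPM 4.1 source, took one look, and was left speechless. From the point where I call jBPM, the XML string passes through roughly the following methods before it is parsed into a DOM: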

org.jbpm.pvm.internal.repository.DeploymentImpl:
public NewDeployment addResourceFromString(String resourceName, String text) {
  addResourceFromStreamInput(resourceName, new StringStreamInput(text));
  return this;
}


org.jbpm.pvm.internal.stream.StringStreamInput:
public class StringStreamInput extends StreamInput {

  String string;

  public StringStreamInput(String string) {
    this.name = "string";
    this.string = string;
  }

  public InputStream openStream() {
    byte[] bytes = string.getBytes();
    return new ByteArrayInputStream(bytes);
  }
}


org.jbpm.pvm.internal.xml.Parse:
protected InputSource getInputSource() {
  if (inputSource!=null) {
    return inputSource;
  }

  if (streamInput!=null) {
    inputStream = streamInput.openStream();
    return new InputSource(inputStream);
  }

  addProblem("no source specified to parse");
  return null;
}


org.jbpm.pvm.internal.xml.Parser:
protected Document buildDom(Parse parse) {
  Document document = null;

  try {
    SAXParser saxParser = saxParserFactory.newSAXParser();
    XMLReader xmlReader = saxParser.getXMLReader();

    // ...

    InputSource inputSource = parse.getInputSource();
    xmlReader.parse(inputSource);

  } catch (Exception e) {
    parse.addProblem("couldn't parse xml document", e);
  }

  return document;
}


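In short, after winding through all sorts of interfaces and being wrapped, unwrapped, rewrapped, and unwrapped again, the poor string finally lands in the hands of the SAX parser. The problem is that jBPM calls String.getBytes() along the way. That method encodes the Java string (Unicode internally) into the platform's default charset and returns the resulting byte[]. But when the InputSource carries no encoding information and the document has no XML declaration or byte order mark, the SAX parser reads the byte stream as UTF-8. My development machine's default encoding is GBK, so things go wrong. jBPM using the no-argument String.getBytes() left me speechless... couldn't it have used the overload that takes an encoding? T T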
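To make the mismatch concrete, here is a minimal sketch that does not involve jBPM or an XML parser at all. It encodes the attribute value from my failing XML with GBK (standing in for what String.getBytes() produces on a GBK machine) and then checks whether those bytes are well-formed UTF-8. The class name CharsetMismatchDemo and the helper isValidUtf8 are made up for illustration, and it assumes the running JDK ships the GBK charset, which full JDK installs normally do. The string literal uses Unicode escapes so that the source file's own encoding does not matter.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of jBPM.
class CharsetMismatchDemo {
    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The attribute value from the failing XML, written with escapes.
        String text = "\u540d\u79f0";
        // Assumption: GBK stands in for the platform default of my dev machine.
        byte[] gbk = text.getBytes(Charset.forName("GBK"));
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("GBK bytes valid as UTF-8:   " + isValidUtf8(gbk));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));
    }
}
```

The GBK bytes are rejected by the strict UTF-8 decoder, which is the same kind of failure the parser reports as MalformedByteSequenceException.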

The problem can be reproduced with this small program:
import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.*;

public class TestSAX {
    private static InputSource getInputSource(String src) {
        return new InputSource(new ByteArrayInputStream(src.getBytes()));
    }

    public static void main(String[] args) throws Exception {
        String xmlStr = "<test name=\"名称\"></test>";
        System.out.println(xmlStr);

        SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
        SAXParser saxParser = saxParserFactory.newSAXParser();
        XMLReader xmlReader = saxParser.getXMLReader();

        InputSource input = getInputSource(xmlStr);
        xmlReader.parse(input);
    }
}

Running it on my machine gives:
D:\temp>java TestSAX
<test name="名称"></test>
Exception in thread "main" com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanLiteral(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$ContentDriver.scanRootElementHook(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at TestSAX.main(TestSAX.java:19)


With a small change, adding an XML declaration that names this machine's default encoding, the problem goes away:
import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.*;

public class TestSAX {
    private static InputSource getInputSource(String src) {
        return new InputSource(new ByteArrayInputStream(src.getBytes()));
    }

    public static void main(String[] args) throws Exception {
        String xmlStr = "<?xml version=\"1.0\" encoding=\"" + System.getProperty("file.encoding") + "\"?><test name=\"名称\"></test>";
        System.out.println(xmlStr);

        SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
        SAXParser saxParser = saxParserFactory.newSAXParser();
        XMLReader xmlReader = saxParser.getXMLReader();

        InputSource input = getInputSource(xmlStr);
        xmlReader.parse(input);
    }
}


This reminded me of an article I once read on developerWorks, [url=http://www.ibm.com/developerworks/xml/library/x-tipdecl.html]Tip: Always use an XML declaration[/url]. One really should write the XML declaration.

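Writing just any XML declaration does not help, though. If the encoding in the declaration above is changed to "UTF-8", running it on my GBK-default system still hits MalformedByteSequenceException, because the bytes are still GBK. Since I should not depend on whatever encoding a particular test or production environment happens to use, I call System.getProperty("file.encoding") to get the system default encoding, so that the declaration matches what String.getBytes() produces.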
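For comparison, here is a sketch of how the byte encoding could be taken out of the picture when the code that builds the InputSource is under your control (which, for the jBPM internals above, it was not in my case). It shows two options: hand the parser a character stream via StringReader, or encode with an explicit charset and record it with InputSource.setEncoding. The class CharsetSafeSax and its method names are hypothetical, and the string literal uses Unicode escapes so the example does not depend on the source file's encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of jBPM.
class CharsetSafeSax {
    // Option 1: give the parser characters, so no byte encoding is involved.
    static InputSource fromReader(String src) {
        return new InputSource(new StringReader(src));
    }

    // Option 2: encode with an explicit charset and tell the parser which one.
    static InputSource fromUtf8Bytes(String src) {
        InputSource input = new InputSource(
                new ByteArrayInputStream(src.getBytes(StandardCharsets.UTF_8)));
        input.setEncoding("UTF-8");
        return input;
    }

    // Parses the input and returns the "name" attribute of the first element
    // that carries one, or null if there is none.
    static String firstNameAttribute(InputSource input) {
        final String[] result = new String[1];
        try {
            XMLReader xmlReader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            xmlReader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    if (result[0] == null) {
                        result[0] = attrs.getValue("name");
                    }
                }
            });
            xmlReader.parse(input);
        } catch (Exception e) {
            // Wrapped only to keep the sketch's call sites simple.
            throw new IllegalStateException("could not parse XML", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        String xmlStr = "<test name=\"\u540d\u79f0\"></test>";
        System.out.println(firstNameAttribute(fromReader(xmlStr)));
        System.out.println(firstNameAttribute(fromUtf8Bytes(xmlStr)));
    }
}
```

Both variants should parse the same string regardless of the platform default charset, since neither calls the no-argument String.getBytes().";
if (!"\u540d\u79f0".equals(CharsetSafeSax.firstNameAttribute(CharsetSafeSax.fromReader(xml)))) throw new AssertionError("reader-based source should yield the attribute value");
if (!"\u540d\u79f0".equals(CharsetSafeSax.firstNameAttribute(CharsetSafeSax.fromUtf8Bytes(xml)))) throw new AssertionError("UTF-8 byte source should yield the attribute value");
if (CharsetSafeSax.firstNameAttribute(CharsetSafeSax.fromReader("<a><b/></a>")) != null) throw new AssertionError("no name attribute should yield null");
</test>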