Ran into a jBPM encoding problem T T

rednaxelafx

Category: Java. Tags: JBPM, Apache, XML, SUN, IBM

Article link: https://blog.csdn.net/rednaxelafx/article/details/83492670

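Last week, while writing a certain something, I had to deploy a process with jBPM, with the data coming from an XML string sent over by Saito. But as soon as the XML contained any Chinese, deployment failed. And the thing I was writing is extremely hard to debug: when many parts of a program are wrapped in proxies, debugging gets painful.
Roughly, the situation was this: the XML string from Saito had no XML declaration. On my side I asked jBPM to create a NewDeployment, put the string in, and called deploy, which then threw an exception wrapped in many layers, the innermost being a MalformedByteSequenceException.
There were other, nastier problems to solve first, so this one was set aside. On Friday I finally had time to fix it.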

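I grabbed the jBPM 4.1 source, took a look, and was left speechless. From the point where I call jBPM, the XML string I pass in goes through roughly the following methods before it is parsed into a DOM: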

org.jbpm.pvm.internal.repository.DeploymentImpl:

public NewDeployment addResourceFromString(String resourceName, String text) {
  addResourceFromStreamInput(resourceName, new StringStreamInput(text));
  return this;
}

org.jbpm.pvm.internal.stream.StringStreamInput:

public class StringStreamInput extends StreamInput {

  String string;

  public StringStreamInput(String string) {
    this.name = "string";
    this.string = string;
  }

  public InputStream openStream() {
    byte[] bytes = string.getBytes();
    return new ByteArrayInputStream(bytes);
  }
}

org.jbpm.pvm.internal.xml.Parse:

protected InputSource getInputSource() {
  if (inputSource!=null) {
    return inputSource;
  }

  if (streamInput!=null) {
    inputStream = streamInput.openStream();
    return new InputSource(inputStream);
  }

  addProblem("no source specified to parse");
  return null;
}

org.jbpm.pvm.internal.xml.Parser:

protected Document buildDom(Parse parse) {
  Document document = null;

  try {
    SAXParser saxParser = saxParserFactory.newSAXParser();
    XMLReader xmlReader = saxParser.getXMLReader();

    // ...

    InputSource inputSource = parse.getInputSource(); 
    xmlReader.parse(inputSource);

  } catch (Exception e) {
    parse.addProblem("couldn't parse xml document", e);
  }

  return document;
}

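In short, after winding through various interfaces and being wrapped, unwrapped, rewrapped, and unwrapped again, the poor string finally lands in the SAX parser's hands. The problem is that jBPM calls String.getBytes() along the way: this method encodes the Java string (UTF-16 internally) into a byte[] using the platform default encoding. But when the InputSource carries no encoding information and the document itself has no XML declaration or byte order mark, the SAX parser falls back to reading the input stream as UTF-8. The default encoding on my development machine is GBK, so things broke. jBPM's use of String.getBytes() left me speechless... couldn't it use the overload that takes an encoding parameter? T T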
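To make "the overload that takes an encoding parameter" concrete, here is a minimal sketch of what an explicit-charset version could look like. Utf8StringInput is a hypothetical stand-in written for illustration only: it is not a jBPM class, does not extend jBPM's StreamInput, and is not a patch that jBPM ever shipped as far as I know. The Chinese attribute value is written with Unicode escapes so the example does not depend on the source file's own encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a string-backed stream source; NOT jBPM code.
class Utf8StringInput {
  private final String string;

  Utf8StringInput(String string) {
    this.string = string;
  }

  // Encode explicitly as UTF-8 instead of relying on the platform default.
  InputStream openStream() {
    return new ByteArrayInputStream(string.getBytes(StandardCharsets.UTF_8));
  }

  // Returns true if the JAXP SAX parser accepts the stream without error.
  static boolean parses(InputStream in) {
    try {
      SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler());
      return true;
    } catch (Exception e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // "\u540d\u79f0" is the same two-character Chinese value used in this post.
    String xml = "<test name=\"\u540d\u79f0\"></test>";
    System.out.println(parses(new Utf8StringInput(xml).openStream()));
  }
}
```

Because the bytes are always UTF-8, a document without an XML declaration should parse the same way whatever the machine's default encoding is.").openStream()) : "UTF-8 bytes without a declaration should parse";
assert !Utf8StringInput.parses(new java.io.ByteArrayInputStream("<test".getBytes(java.nio.charset.StandardCharsets.UTF_8))) : "malformed XML should not parse";
</test>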

The problem can be reproduced simply with the following program:

import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.*;

public class TestSAX {
  private static InputSource getInputSource(String src) {
    return new InputSource(new ByteArrayInputStream(src.getBytes()));
  }

  public static void main(String[] args) throws Exception {
    String xmlStr = "<test name=\"名称\"></test>";
    System.out.println(xmlStr);

    SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
    SAXParser saxParser = saxParserFactory.newSAXParser();
    XMLReader xmlReader = saxParser.getXMLReader();

    InputSource input = getInputSource(xmlStr);
    xmlReader.parse(input);
  }
}

Running it shows:

D:\temp>java TestSAX
<test name="名称"></test>
Exception in thread "main" com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanLiteral(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$ContentDriver.scanRootElementHook(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at TestSAX.main(TestSAX.java:19)
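The program above only fails on a machine whose default encoding is not UTF-8. As a sketch of a reproduction that does not depend on the machine, the variant below encodes the same document explicitly, once as GBK and once as UTF-8. EncodingMismatchDemo and tryParse are names made up for this example, and it assumes the Java runtime ships the GBK charset (standard JDK builds normally do). In GBK the first Chinese character should encode as the bytes C3 FB; a UTF-8 reader sees C3 as the lead byte of a 2-byte sequence and FB is not a valid continuation byte, which fits the "Invalid byte 2 of 2-byte UTF-8 sequence" message.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class EncodingMismatchDemo {
  // Parses the bytes with the JAXP SAX parser. Returns null on success,
  // otherwise the class name of the exception that was thrown.
  static String tryParse(byte[] bytes) {
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new ByteArrayInputStream(bytes), new DefaultHandler());
      return null;
    } catch (Exception e) {
      return e.getClass().getName();
    }
  }

  public static void main(String[] args) {
    // Same document as in the post, with the Chinese text as Unicode escapes.
    String xml = "<test name=\"\u540d\u79f0\"></test>";

    // Assumption: the runtime provides GBK; forName throws otherwise.
    Charset gbk = Charset.forName("GBK");

    // Simulates String.getBytes() on a machine whose default encoding is GBK.
    System.out.println("GBK bytes:   " + tryParse(xml.getBytes(gbk)));
    // Simulates the same call on a machine whose default encoding is UTF-8.
    System.out.println("UTF-8 bytes: " + tryParse(xml.getBytes(StandardCharsets.UTF_8)));
  }
}
```

The exact exception class reported for the GBK case may differ between JDK versions, so the sketch prints whatever it gets rather than assuming one.";
assert EncodingMismatchDemo.tryParse(demoXml.getBytes(java.nio.charset.Charset.forName("GBK"))) != null : "GBK bytes without a declaration should fail";
assert EncodingMismatchDemo.tryParse(demoXml.getBytes(java.nio.charset.StandardCharsets.UTF_8)) == null : "UTF-8 bytes without a declaration should parse";
</test>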

With a small change, adding an XML declaration that names this machine's system default encoding, the problem goes away:

import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.*;

public class TestSAX {
  private static InputSource getInputSource(String src) {
    return new InputSource(new ByteArrayInputStream(src.getBytes()));
  }

  public static void main(String[] args) throws Exception {
    String xmlStr = "<?xml version=\"1.0\" encoding=\"" + System.getProperty("file.encoding") + "\"?><test name=\"名称\"></test>";
    System.out.println(xmlStr);

    SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
    SAXParser saxParser = saxParserFactory.newSAXParser();
    XMLReader xmlReader = saxParser.getXMLReader();

    InputSource input = getInputSource(xmlStr);
    xmlReader.parse(input);
  }
}

This reminded me of an article I once read on developerWorks, [url=http://www.ibm.com/developerworks/xml/library/x-tipdecl.html]Tip: Always use an XML declaration[/url]. An XML declaration really should be written.

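Writing just any XML declaration doesn't work, though. If the encoding in the declaration above is changed to "UTF-8", running it on my GBK-default system still hits the MalformedByteSequenceException. Since I shouldn't depend on what encoding the test and production environments happen to use, I use System.getProperty("file.encoding") here to get the system default encoding, so that it matches what String.getBytes() produces.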
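The workaround can be wrapped in a small helper. The sketch below is only an illustration: XmlDeclarationHelper, withDeclaration and parses are hypothetical names, and it asks Charset.defaultCharset() for the encoding instead of reading the file.encoding property, on the assumption that this is the charset String.getBytes() actually uses. One caveat I have not checked exhaustively: some canonical Java charset names may not be names the XML parser recognizes. The last two lines of main use explicit GBK bytes (assuming the runtime ships GBK) to show that the declaration must match the bytes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class XmlDeclarationHelper {
  // Prepends an XML declaration naming the given charset, unless the text
  // already starts with one. Simplification: only "<?xml " is recognized.
  static String withDeclaration(String xml, Charset charset) {
    if (xml.startsWith("<?xml ")) {
      return xml;
    }
    return "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>" + xml;
  }

  // Returns true if the JAXP SAX parser accepts the bytes without error.
  static boolean parses(byte[] bytes) {
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new ByteArrayInputStream(bytes), new DefaultHandler());
      return true;
    } catch (Exception e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String xml = "<test name=\"\u540d\u79f0\"></test>";

    // The workaround: declare whatever charset getBytes() is going to use.
    String declared = withDeclaration(xml, Charset.defaultCharset());
    System.out.println("default charset: " + parses(declared.getBytes()));

    // Assumption: the runtime provides GBK.
    Charset gbk = Charset.forName("GBK");
    // Declaration says UTF-8 but the bytes are GBK: the mismatch remains.
    System.out.println("UTF-8 declared, GBK bytes: "
        + parses(withDeclaration(xml, StandardCharsets.UTF_8).getBytes(gbk)));
    // Declaration and bytes agree.
    System.out.println("GBK declared, GBK bytes:   "
        + parses(withDeclaration(xml, gbk).getBytes(gbk)));
  }
}
```

This keeps the string handed to jBPM consistent with the bytes jBPM will produce from it, without hard-coding any particular encoding.";
java.nio.charset.Charset helperGbk = java.nio.charset.Charset.forName("GBK");
java.nio.charset.Charset helperUtf8 = java.nio.charset.StandardCharsets.UTF_8;
assert XmlDeclarationHelper.withDeclaration("<a/>", helperUtf8).equals("<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>") : "declaration should be prepended";
assert XmlDeclarationHelper.withDeclaration("<?xml version=\"1.0\"?><a/>", helperUtf8).equals("<?xml version=\"1.0\"?><a/>") : "existing declaration should be kept";
assert !XmlDeclarationHelper.parses(XmlDeclarationHelper.withDeclaration(helperXml, helperUtf8).getBytes(helperGbk)) : "UTF-8 declaration over GBK bytes should fail";
assert XmlDeclarationHelper.parses(XmlDeclarationHelper.withDeclaration(helperXml, helperGbk).getBytes(helperGbk)) : "GBK declaration over GBK bytes should parse";
assert XmlDeclarationHelper.parses(XmlDeclarationHelper.withDeclaration(helperXml, java.nio.charset.Charset.defaultCharset()).getBytes()) : "default charset declaration should parse";
</test>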
