Java XML parsing: avoiding entity reference resolution

I am currently parsing XHTML documents with a DOM parser, like:

final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
final DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(MY_ENTITY_RESOLVER);
db.setErrorHandler(MY_ERROR_HANDLER);
...
final Document doc = db.parse(inputSource);

My problem is that when the document contains an entity reference such as "&pound;", the parser creates a Text node containing "£" instead of "&pound;". That is, it resolves the entity the way it is supposed to (the XHTML 1.0 Strict DTD links to the ENTITIES Latin1 DTD, which in turn establishes the equivalence of "&pound;" with "&#163;").

The problem is, I don't want the parser to do this. I would like to keep the "&pound;" text unmodified.

I've already tried with:

final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setExpandEntityReferences(false);

But:

I don't like this, because I fear it might make some parser implementations not navigate from the XHTML 1.0 Strict DTD to the ENTITIES Latin1 DTD, and therefore not consider "&pound;" a declared entity.

When I do this, it weirdly creates two nodes: a "pound" Entity node, and a Text node with the "£" symbol after it.

Any ideas? Is it possible to configure this in a DOM parser without resorting to preprocessing the XHTML and substituting every "&" symbol with something else?

Solutions could be for a DOM parser or a SAX one; I wouldn't mind using SAX parsing and then creating my DOM using a transformation.

Also, I cannot switch to a non-standard XML parsing library. No jdom, no jsoup, no HtmlCleaner, etc.

Thanks a lot.

Solution

The approach I took was to replace each entity reference in the input with a unique marker that Xerces treats as plain text. Once the input has been parsed into a Document object, the markers are replaced with Entity Reference objects.
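The answer gives no code, so the following is only a rough sketch of how such a marker scheme might look with the standard JAXP/DOM APIs. The class name EntityMarkerSketch, the method names protect, restore and parse, and the choice of the private-use characters U+E000 and U+E001 as marker delimiters are all assumptions made for illustration, not the answerer's actual implementation. The sketch only handles markers in text nodes; entity references inside attribute values, comments, or CDATA sections would need extra care.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class EntityMarkerSketch {
    // Assumed marker delimiters: private-use characters, which are legal in XML
    // and unlikely to occur in real XHTML content.
    private static final char OPEN = '\uE000';
    private static final char CLOSE = '\uE001';
    private static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");
    private static final Pattern MARKER =
            Pattern.compile(OPEN + "([A-Za-z][A-Za-z0-9]*)" + CLOSE);
    // The five predefined XML entities are left for the parser to handle.
    private static final Set<String> PREDEFINED =
            new HashSet<>(Arrays.asList("amp", "lt", "gt", "quot", "apos"));

    // Step 1: turn "&name;" into a marker so the parser sees only plain text.
    public static String protect(String xml) {
        Matcher m = NAMED_ENTITY.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String repl = PREDEFINED.contains(name) ? m.group() : OPEN + name + CLOSE;
            m.appendReplacement(out, Matcher.quoteReplacement(repl));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Step 2: walk the tree and swap each marker for an EntityReference node.
    public static void restore(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // grab before the tree is modified
            if (child.getNodeType() == Node.TEXT_NODE) {
                replaceMarkers((Text) child);
            } else {
                restore(child);
            }
            child = next;
        }
    }

    private static void replaceMarkers(Text text) {
        Document doc = text.getOwnerDocument();
        Node parent = text.getParentNode();
        String data = text.getData();
        Matcher m = MARKER.matcher(data);
        int last = 0;
        boolean found = false;
        while (m.find()) {
            found = true;
            if (m.start() > last) {
                parent.insertBefore(doc.createTextNode(data.substring(last, m.start())), text);
            }
            parent.insertBefore(doc.createEntityReference(m.group(1)), text);
            last = m.end();
        }
        if (!found) {
            return; // no markers: leave the original text node alone
        }
        if (last < data.length()) {
            parent.insertBefore(doc.createTextNode(data.substring(last)), text);
        }
        parent.removeChild(text);
    }

    // Convenience wrapper: protect, parse, restore.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(protect(xml))));
        restore(doc);
        return doc;
    }
}
```

Calling EntityMarkerSketch.parse on a fragment such as "<p>Price: &pound;5</p>" should yield a p element whose children are a Text node, an EntityReference node named "pound", and another Text node. Because the parser never sees the original reference, this variant does not depend on the DTD declaring the entity, which may or may not be desirable.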
