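A dirty way to parse a DTD is to abuse Xerces internals. You can treat it as a starting point for something acceptable, since it is already available in recent JREs, the source is available (with the JDK or from Apache), and it can be modified to your liking (Apache License). Note that for real-world DTDs with external entities and the like, you will have to configure the XMLDTDLoader with adapters (e.g. setEntityResolver / Feature / Property).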
Here is some standalone code to try (it seems to work for me on OpenJDK 1.7.0 and Oracle JDK 1.8.0):
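import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.xml.sax.InputSource;

import com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammar;
import com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDLoader;
import com.sun.org.apache.xerces.internal.util.SAXInputSource;
import com.sun.org.apache.xerces.internal.xni.parser.XMLInputSource;
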
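public class So26391485 {
    public static void main(String[] args) throws Exception {
        // minimal example DTD (the declarations are placeholders; substitute your own)
        StringWriter sw = new StringWriter();
        sw.write("<!DOCTYPE note [");
        sw.write("<!ELEMENT note (to, body)>");
        sw.write("<!ELEMENT to (#PCDATA)>");
        sw.write("<!ELEMENT body (#PCDATA)>");
        sw.write("]>");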
        // read DTD
        InputStream dtdStream = new ByteArrayInputStream(sw.toString().getBytes());
        //InputStream dtdStream = So26391485.class.getResourceAsStream("your.dtd");
        Scanner scanner = new Scanner(dtdStream);
        String dtdText = scanner.useDelimiter("\\z").next();
        scanner.close();
        // DIRTY: use Xerces internals to parse the DTD
        // group 1 = root element name, group 2 = internal subset between [ and ]>
        Pattern dtdPattern = Pattern.compile("^\\s*<!DOCTYPE\\s+(\\S+)\\s*\\[(.*)\\]>\\s*$", Pattern.DOTALL);
        Matcher m = dtdPattern.matcher(dtdText);
        if (m.matches()) {
            String docType = m.group(1);
            // hand only the internal subset to the loader, not the DOCTYPE wrapper
            InputSource is = new InputSource(new StringReader(m.group(2)));
            XMLInputSource source = new SAXInputSource(is);
            XMLDTDLoader d = new XMLDTDLoader();
            DTDGrammar g = (DTDGrammar) d.loadGrammar(source);
            g.printElements();
        }
    }
}
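(I had to chop off the DOCTYPE declaration because I did not manage to get Xerces to read the DTD as it is. XMLDTDLoader isn't meant to be used that way after all...)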
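
To make the chopping step easier to see on its own, here is a minimal sketch that uses only java.util.regex and no Xerces classes. It assumes the input has the shape `<!DOCTYPE name [ ... ]>`; the class name DoctypeSplitter and the method split are hypothetical helpers made up for this illustration, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: separates the DOCTYPE wrapper from the internal subset.
public class DoctypeSplitter {
    // group 1 = root element name, group 2 = everything between '[' and the final ']>'
    private static final Pattern DOCTYPE = Pattern.compile(
            "^\\s*<!DOCTYPE\\s+(\\S+)\\s*\\[(.*)\\]>\\s*$", Pattern.DOTALL);

    // Returns {rootName, internalSubset}, or null if the text is not
    // a DOCTYPE declaration with an internal subset.
    public static String[] split(String dtdText) {
        Matcher m = DOCTYPE.matcher(dtdText);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] parts = split("<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>");
        System.out.println(parts[0]); // prints "note"
        System.out.println(parts[1]); // prints "<!ELEMENT note (#PCDATA)>"
    }
}
```

The second array element is the text you would wrap in a StringReader and pass on to the loader. A file that contains only bare declarations (no DOCTYPE wrapper) makes split return null, in which case the text can presumably be passed through unchanged.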