My goal is to do standalone HTML5 markup validation with an XSD schema in Java.
In the following I describe my approach.
Any help is appreciated, including pointers to an alternative or better way to do this.
Trang [3] is an open-source converter between XML schema languages and should be able to convert from RELAX NG to XSD. With the WHATTF schema, Trang can be invoked as follows:
$ java -jar ./trang.jar ./whattf/syntax/relaxng/html5.rnc html5.xsd
However, Trang produces many warnings like the following about datatypes it cannot convert:
whattf/syntax/relaxng/applications.rnc:265:51: warning: cannot convert datatype library "http://whattf.org/datatype-draft"; using datatype "string"
[...]
I think that for Trang to work, the pluggable datatypes [4] need to be made available to Jing. Jing [5]
is a RELAX NG validator, and I think it is used by Trang.
The whattf/syntax/relaxng/datatype folder provides a Java implementation of these pluggable datatypes. I therefore built an html5-datatypes.jar and added it to Trang's classpath as follows:
$ java -cp ./html5-datatypes.jar -jar ./trang.jar ./whattf/syntax/relaxng/html5.rnc html5.xsd
However, this results in the same warnings.
Beyond that, using the generated XSD files with javax.xml.validation.Validator as follows:
SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = schemaFactory.newSchema( new File("html5.xsd") );
Validator validator = schema.newValidator();
validator.validate( new StreamSource( new File("example.html") ) );
produces an exception:
org.xml.sax.SAXParseException: cos-element-consistent: Error for type 'time.inner'. Multiple elements with name 'script', with different types, appear in the model group.
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.error(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.xs.XSConstraints.reportSchemaError(Unknown Source)
at org.apache.xerces.impl.xs.XSConstraints.fullSchemaChecking(Unknown Source)
at org.apache.xerces.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
at org.apache.xerces.impl.xs.XMLSchemaLoader.loadGrammar(Unknown Source)
at org.apache.xerces.jaxp.validation.XMLSchemaFactory.newSchema(Unknown Source)
at javax.xml.validation.SchemaFactory.newSchema(SchemaFactory.java:594)
at javax.xml.validation.SchemaFactory.newSchema(SchemaFactory.java:610)
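The exception above is raised while the schema itself is compiled, before example.html is read. It seems to correspond to the XSD rule that two elements with the same name in one content model must have the same type. A minimal sketch that should reproduce this kind of error in isolation follows; the class name, the helper method, and the miniature schema are hypothetical and are not taken from the generated html5.xsd:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ElementConsistencyDemo {

    // Hypothetical miniature schema: the content model of "time.inner"
    // contains two local elements named "script" with different types.
    static final String CONFLICTING_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:complexType name='time.inner'>"
      + "<xs:sequence>"
      + "<xs:element name='script' type='xs:string'/>"
      + "<xs:element name='script' type='xs:int'/>"
      + "</xs:sequence>"
      + "</xs:complexType>"
      + "</xs:schema>";

    /** Compiles the schema; returns null on success, else the error message. */
    static String compile(String xsd) {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.newSchema(new StreamSource(new StringReader(xsd)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Expected to print a message mentioning cos-element-consistent.
        System.out.println(compile(CONFLICTING_XSD));
    }
}
```

If this reading is right, the problem lies in the generated XSD rather than in the validation code: RELAX NG allows the same element name to carry different content in different contexts, while XSD does not allow that within a single content model.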
[3] thaiopensource.com/relaxng/trang.html
[4] thaiopensource.com/relaxng/pluggable-datatypes.html
[5] thaiopensource.com/relaxng/jing.html
Solution
There seem to be some XHTML 5 XSDs available on the Web. For instance, an open-source XHTML 5 schema was available at: http://www.xmlmind.com/xmleditor/download.shtml
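
Once a usable XSD is at hand, validation with javax.xml.validation could look roughly like the sketch below. It collects all reported problems through an ErrorHandler instead of stopping at the first one. The class name, the helper method, and the tiny inline schema are hypothetical stand-ins for a real XHTML 5 XSD. Note that this approach only works for documents that are well-formed XML (XHTML syntax), since the validator parses its input as XML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class XsdValidationSketch {

    // Hypothetical stand-in for a real XHTML 5 XSD: an <html> element
    // holding exactly one <title> child.
    static final String TINY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='html'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns one message per problem found; an empty list means valid. */
    static List<String> validate(String xsd, String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                // Record recoverable problems and keep validating.
                public void warning(SAXParseException e) {
                    problems.add("warning: " + e.getMessage());
                }
                public void error(SAXParseException e) {
                    problems.add("error: " + e.getMessage());
                }
                // Fatal errors (e.g. malformed XML) abort validation.
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(validate(TINY_XSD, "<html><title>ok</title></html>"));
        System.out.println(validate(TINY_XSD, "<html><body/></html>"));
    }
}
```

For real files, the StringReader sources would be replaced by new StreamSource(new File(...)) as in the question.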