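In many cases we need to split a schema into several files to make the XSD more readable, easier to maintain, and reusable. This article focuses on validating a single XML file against multiple schema files (XML validation for multiple schemas).
The following steps show how to validate a single XML file with multiple schema (XSD) files:
1. Create the XML file to be validated
2. Derive the XSD files from the XML
3. Validate the XML file against multiple schemas
4. Run the test
Let us go through them one by one:
1. Create the XML file to be validated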
<?xml version="1.0" encoding="utf-8"?>
<employees xmlns:admin="http://www.company.com/management/employees/admin">
    <admin:employee>
        <admin:userId>johnsmith@company.com</admin:userId>
        <admin:password>abc123_</admin:password>
        <admin:name>John Smith</admin:name>
        <admin:age>24</admin:age>
        <admin:gender>Male</admin:gender>
    </admin:employee>
    <admin:employee>
        <admin:userId>christinechen@company.com</admin:userId>
        <admin:password>123456</admin:password>
        <admin:name>Christine Chen</admin:name>
        <admin:age>27</admin:age>
        <admin:gender>Female</admin:gender>
    </admin:employee>
</employees>
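2. Derive the XSD files from the XML
Note: in this article the XSD files are generated from the XML. If you already have XSD files, you can skip this step.
By studying the structure of employees.xml we could write employees.xsd by hand, but it is quicker to use an XML-to-XSD conversion tool. Here I use trang: http://www.thaiopensource.com/relaxng/trang.html
First download the latest trang.jar, put employees.xml and trang.jar in the same directory, and run the following command:
java -jar trang.jar employees.xml employees.xsd
This generates two XSD files in the current directory, employees.xsd and admin.xsd, shown below: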
employees.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" xmlns:admin="http://www.company.com/management/employees/admin">
    <xs:import namespace="http://www.company.com/management/employees/admin" schemaLocation="admin.xsd"/>
    <xs:element name="employees">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" ref="admin:employee"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
admin.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" targetNamespace="http://www.company.com/management/employees/admin" xmlns:admin="http://www.company.com/management/employees/admin">
    <xs:import schemaLocation="employees.xsd"/>
    <xs:element name="employee">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="admin:userId"/>
                <xs:element ref="admin:password"/>
                <xs:element ref="admin:name"/>
                <xs:element ref="admin:age"/>
                <xs:element ref="admin:gender"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="userId" type="xs:string"/>
    <xs:element name="password" type="xs:NMTOKEN"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="age" type="xs:integer"/>
    <xs:element name="gender" type="xs:NCName"/>
</xs:schema>
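Of course, you can also write the XSD files by hand.
3. Validate the XML file against multiple schemas
Validating XML against a single schema should not cause much trouble. For example: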
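// Requires java.io.File, javax.xml.XMLConstants, javax.xml.transform.stream.StreamSource
// and javax.xml.validation.Schema, SchemaFactory, Validator.
public static boolean validateSingleSchema(File xml, File xsd) {
    boolean legal = false;
    try {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(xsd);
        Validator validator = schema.newValidator();
        // validate() throws an exception if the document violates the schema
        validator.validate(new StreamSource(xml));
        legal = true;
    } catch (Exception e) {
        legal = false;
        log.error(e.getMessage());
    }
    return legal;
}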
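As a quick way to try the single-schema case without any files on disk, here is a minimal sketch that compiles a one-element schema from a string. The class name SingleSchemaDemo and the tiny inline schema are made up for illustration and are not part of the employees example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Hypothetical demo class: validates small XML strings against one inline schema.
class SingleSchemaDemo {

    // Illustrative schema: a single <age> element that must hold an integer.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='age' type='xs:integer'/>"
        + "</xs:schema>";

    static boolean isValid(String xml) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler installed, a validation error is thrown as an exception.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

Calling SingleSchemaDemo.isValid("<age>24</age>") should accept the document, while a non-numeric age should be rejected.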
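However, once several schemas are involved, XSD files outside the classpath that are pulled in through <xs:import>/<xs:include> may fail to load, which results in an error message like this: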
org.xml.sax.SAXParseException: src-resolve: Cannot resolve the name 'admin:employee' to a(n) 'element declaration' component.
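One way to see this failure in isolation is to compile the main schema from a stream that has no system ID, so the relative schemaLocation cannot be found. The sketch below is only an illustration: UnresolvedImportDemo is a hypothetical class name, and it assumes there is no file called admin.xsd in the working directory of the JVM.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class: compiles a schema whose import cannot be located.
class UnresolvedImportDemo {

    static final String NS = "http://www.company.com/management/employees/admin";

    // Same shape as employees.xsd: it refers to admin:employee, declared in admin.xsd.
    static final String MAIN_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " xmlns:admin='" + NS + "' elementFormDefault='qualified'>"
        + "<xs:import namespace='" + NS + "' schemaLocation='admin.xsd'/>"
        + "<xs:element name='employees'><xs:complexType><xs:sequence>"
        + "<xs:element maxOccurs='unbounded' ref='admin:employee'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Returns the compile error message, or null if the schema compiled. */
    static String compileError() {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // No system ID is given, and (by assumption) admin.xsd is not reachable.
            sf.newSchema(new StreamSource(new StringReader(MAIN_XSD)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }
}
```

On the JDK's built-in schema processor the returned message is expected to be of the src-resolve kind quoted above, although the exact wording may vary between versions and locales.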
To solve this problem we use an LSResourceResolver: while parsing a schema, SchemaFactory can use an LSResourceResolver to load external resources.
The code is as follows:
package com.javaeye.terrencexu.jaxb;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.Reader;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.log4j.Logger;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

/**
 * Implement LSResourceResolver to customize resource resolution when parsing schemas.
 * <p>
 * SchemaFactory uses a LSResourceResolver when it needs to locate external resources
 * while parsing schemas, although exactly what constitutes "locating external resources"
 * is up to each schema language.
 * </p>
 * <p>
 * For example, for W3C XML Schema, this includes files pulled in via {@code <include>}
 * or {@code <import>}, and DTDs referenced from schema files, etc.
 * </p>
 */
class SchemaResourceResolver implements LSResourceResolver {

    private static final Logger log = Logger.getLogger(SchemaResourceResolver.class);

    /**
     * Allow the application to resolve external resources.
     *
     * <p>
     * The LSParser will call this method before opening any external resource, including
     * the external DTD subset, external entities referenced within the DTD, and external
     * entities referenced within the document element (however, the top-level document
     * entity is not passed to this method). The application may then request that the
     * LSParser resolve the external resource itself, that it use an alternative URI,
     * or that it use an entirely different input source.
     * </p>
     *
     * <p>
     * Application writers can use this method to redirect external system identifiers to
     * secure and/or local URI, to look up public identifiers in a catalogue, or to read
     * an entity from a database or other input source (including, for example, a dialog box).
     * </p>
     */
    public LSInput resolveResource(String type, String namespaceURI, String publicId, String systemId, String baseURI) {
        log.info("\n>> Resolving " + "\n"
                + "TYPE: " + type + "\n"
                + "NAMESPACE_URI: " + namespaceURI + "\n"
                + "PUBLIC_ID: " + publicId + "\n"
                + "SYSTEM_ID: " + systemId + "\n"
                + "BASE_URI: " + baseURI + "\n");

        // Returning null asks the parser to resolve the resource itself. Do so when
        // there is nothing to resolve against, or when the location is a remote URL.
        if (systemId == null || baseURI == null
                || systemId.startsWith("http://") || systemId.startsWith("https://")) {
            return null;
        }

        try {
            // Accept both a file: URI and a plain file-system path as the base.
            URI base = baseURI.startsWith("file:") ? new URI(baseURI) : new File(baseURI).toURI();
            // Resolve the (usually relative) schemaLocation against the importing schema.
            URI uri = base.resolve(systemId);

            LSInput lsInput = new LSInputImpl();
            lsInput.setSystemId(uri.toString());
            lsInput.setByteStream(new FileInputStream(new File(uri)));
            return lsInput;
        } catch (URISyntaxException e) {
            log.error("Invalid base URI: " + baseURI, e);
        } catch (IllegalArgumentException e) {
            log.error("Cannot resolve " + systemId + " against " + baseURI, e);
        } catch (FileNotFoundException e) {
            log.error("Schema file not found for " + systemId, e);
        }
        return null;
    }

    /**
     * Represents an input source for data.
     */
    class LSInputImpl implements LSInput {

        private String publicId;
        private String systemId;
        private String baseURI;
        private InputStream byteStream;
        private Reader charStream;
        private String stringData;
        private String encoding;
        private boolean certifiedText;

        public LSInputImpl() {}

        public LSInputImpl(String publicId, String systemId, InputStream byteStream) {
            this.publicId = publicId;
            this.systemId = systemId;
            this.byteStream = byteStream;
        }

        public String getBaseURI() {
            return baseURI;
        }

        public InputStream getByteStream() {
            return byteStream;
        }

        public boolean getCertifiedText() {
            return certifiedText;
        }

        public Reader getCharacterStream() {
            return charStream;
        }

        public String getEncoding() {
            return encoding;
        }

        public String getPublicId() {
            return publicId;
        }

        public String getStringData() {
            return stringData;
        }

        public String getSystemId() {
            return systemId;
        }

        public void setBaseURI(String baseURI) {
            this.baseURI = baseURI;
        }

        public void setByteStream(InputStream byteStream) {
            this.byteStream = byteStream;
        }

        public void setCertifiedText(boolean certifiedText) {
            this.certifiedText = certifiedText;
        }

        public void setCharacterStream(Reader characterStream) {
            this.charStream = characterStream;
        }

        public void setEncoding(String encoding) {
            this.encoding = encoding;
        }

        public void setPublicId(String publicId) {
            this.publicId = publicId;
        }

        public void setStringData(String stringData) {
            this.stringData = stringData;
        }

        public void setSystemId(String systemId) {
            this.systemId = systemId;
        }
    }
}
The last step is to create a validator class that wraps the XML validation logic:
package com.javaeye.terrencexu.jaxb;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.apache.log4j.Logger;
import org.xml.sax.SAXException;

public final class XMLParser {

    private static final Logger log = Logger.getLogger(XMLParser.class);

    private XMLParser() {}

    /** Validates an XML file against a single schema file. */
    public static boolean validateWithSingleSchema(File xml, File xsd) {
        boolean legal = false;
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(xsd);
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(xml));
            legal = true;
        } catch (Exception e) {
            legal = false;
            log.error(e.getMessage(), e);
        }
        return legal;
    }

    /** Validates an XML stream against a schema assembled from several XSD files. */
    public static boolean validateWithMultiSchemas(InputStream xml, List<File> schemas) {
        boolean legal = false;
        try {
            Schema schema = createSchema(schemas);
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(xml));
            legal = true;
        } catch (Exception e) {
            legal = false;
            log.error(e.getMessage(), e);
        }
        return legal;
    }

    /**
     * Create a Schema object from the given schema files.
     *
     * @param schemas the XSD files to combine
     * @return the compiled schema
     * @throws ParserConfigurationException if no DocumentBuilder can be created
     * @throws SAXException if a schema file cannot be parsed or compiled
     * @throws IOException if a schema file cannot be read
     */
    private static Schema createSchema(List<File> schemas) throws ParserConfigurationException, SAXException, IOException {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        SchemaResourceResolver resourceResolver = new SchemaResourceResolver();
        sf.setResourceResolver(resourceResolver);

        Source[] sources = new Source[schemas.size()];
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        docFactory.setValidating(false);
        docFactory.setNamespaceAware(true);
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

        for (int i = 0; i < schemas.size(); i++) {
            File file = schemas.get(i);
            org.w3c.dom.Document doc = docBuilder.parse(file);
            // The system ID serves as the base for relative schemaLocation values,
            // so pass a file: URI rather than a bare file-system path.
            sources[i] = new DOMSource(doc, file.toURI().toString());
        }
        return sf.newSchema(sources);
    }
}
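The resolver does not have to read from the file system. As a rough sketch of the same mechanism, the variant below serves the imported schema from an in-memory map and obtains its LSInput from the JDK's DOM Load/Save implementation instead of a hand-written LSInputImpl. The class name InMemoryResolverDemo and the cut-down schemas are assumptions made for this example only.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

// Hypothetical demo class: resolves xs:import from strings held in memory.
class InMemoryResolverDemo {

    static final String NS = "http://www.company.com/management/employees/admin";

    static final String MAIN_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " xmlns:admin='" + NS + "' elementFormDefault='qualified'>"
        + "<xs:import namespace='" + NS + "' schemaLocation='admin.xsd'/>"
        + "<xs:element name='employees'><xs:complexType><xs:sequence>"
        + "<xs:element maxOccurs='unbounded' ref='admin:employee'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Cut-down admin.xsd with a single child element, for brevity.
    static final String ADMIN_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
        + "<xs:element name='employee'><xs:complexType><xs:sequence>"
        + "<xs:element name='age' type='xs:integer'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static Schema compile() throws Exception {
        final Map<String, String> known = new HashMap<String, String>();
        known.put("admin.xsd", ADMIN_XSD);
        final DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");

        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        sf.setResourceResolver(new LSResourceResolver() {
            public LSInput resolveResource(String type, String namespaceURI,
                    String publicId, String systemId, String baseURI) {
                String text = known.get(systemId);
                if (text == null) {
                    return null; // unknown location: let the parser try on its own
                }
                LSInput input = ls.createLSInput();
                input.setSystemId(systemId);
                input.setCharacterStream(new StringReader(text));
                return input;
            }
        });
        return sf.newSchema(new StreamSource(new StringReader(MAIN_XSD)));
    }

    static boolean isValid(String xml) {
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

Because the lookup key is the schemaLocation exactly as written in the importing schema, the map in this sketch has to use the same spelling ("admin.xsd").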
4. Run the test
public static void testValidate() throws SAXException, FileNotFoundException {
    InputStream xml = new FileInputStream(new File("C:\\eclipse\\workspace1\\JavaStudy\\test\\employees.xml"));
    List<File> schemas = new ArrayList<File>();
    schemas.add(new File("C:\\eclipse\\workspace1\\JavaStudy\\test\\employees.xsd"));
    schemas.add(new File("C:\\eclipse\\workspace1\\JavaStudy\\test\\admin.xsd"));
    // The result was previously discarded; print it so the outcome is visible.
    boolean valid = XMLParser.validateWithMultiSchemas(xml, schemas);
    System.out.println("employees.xml is valid: " + valid);
}
Note: if the two schema files are in the same directory, it is enough to pass only the main schema file (employees.xsd); SchemaResourceResolver will load admin.xsd for us.
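The note above can be tried out with a small self-contained sketch. It writes two cut-down schemas into a temporary directory, compiles only the main one, and records which locations the resolver was asked for. MainSchemaOnlyDemo, the inline schemas and the compact resolver are illustrative stand-ins rather than the classes from this article, and the temporary files are not cleaned up.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.SAXException;

// Hypothetical demo class: only employees.xsd is passed to the SchemaFactory.
class MainSchemaOnlyDemo {

    static final String NS = "http://www.company.com/management/employees/admin";

    static final String MAIN_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " xmlns:admin='" + NS + "' elementFormDefault='qualified'>"
        + "<xs:import namespace='" + NS + "' schemaLocation='admin.xsd'/>"
        + "<xs:element name='employees'><xs:complexType><xs:sequence>"
        + "<xs:element maxOccurs='unbounded' ref='admin:employee'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static final String ADMIN_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='" + NS + "' elementFormDefault='qualified'>"
        + "<xs:element name='employee'><xs:complexType><xs:sequence>"
        + "<xs:element name='age' type='xs:integer'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Every schemaLocation the factory asked the resolver for. */
    static final List<String> requested = new ArrayList<String>();

    static boolean validate(String xml) {
        try {
            Path dir = Files.createTempDirectory("xsd-demo");
            Path main = dir.resolve("employees.xsd");
            Files.write(main, MAIN_XSD.getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("admin.xsd"), ADMIN_XSD.getBytes(StandardCharsets.UTF_8));

            final DOMImplementationLS ls = (DOMImplementationLS)
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            sf.setResourceResolver(new LSResourceResolver() {
                public LSInput resolveResource(String type, String namespaceURI,
                        String publicId, String systemId, String baseURI) {
                    requested.add(systemId);
                    try {
                        // Look for the imported file next to the importing schema.
                        URI target = URI.create(baseURI).resolve(systemId);
                        LSInput input = ls.createLSInput();
                        input.setSystemId(target.toString());
                        input.setByteStream(new FileInputStream(new File(target)));
                        return input;
                    } catch (Exception e) {
                        return null; // fall back to the parser's own resolution
                    }
                }
            });

            // Only the main schema is handed over; admin.xsd arrives via the resolver.
            Schema schema = sf.newSchema(main.toFile());
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema or document problem
        } catch (Exception e) {
            throw new IllegalStateException(e); // setup problem (I/O, DOM registry)
        }
    }
}
```

After a call to validate, the requested list should contain "admin.xsd", which indicates that the imported file was fetched through the resolver rather than passed in explicitly.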