How to Validate XML using Java

Configure Java APIs (SAX, DOM, dom4j, XOM) using JAXP 1.3 to validate XML Documents with DTD and Schema(s).

Many Java XML APIs provide mechanisms to validate XML documents, the JAXP API can be used for most of these XML APIs but subtle configuration differences exists. This article shows five ways of how to configure different Java APIs (including DOM, SAX, dom4j and XOM) using JAXP 1.3 for checking and validating XML with DTD and Schema(s).

Setup

All underlying examples can be compiled and executed using Java 5.0 (JAXP 1.3) or higher and make use of the following components and settings.

Error Handler

To report errors, it is necessary to provide an ErrorHandler to the underlying implementation. The ErrorHandler used for the examples is a very simple one which reports the error to System.out and continues until the XML document has been fully parsed or until a fatal-error has been reported.

public class SimpleErrorHandler implements ErrorHandler {
    public void warning(SAXParseException e) throws SAXException {
        System.out.println(e.getMessage());
    }

    public void error(SAXParseException e) throws SAXException {
        System.out.println(e.getMessage());
    }

    public void fatalError(SAXParseException e) throws SAXException {
        System.out.println(e.getMessage());
    }
}

Namespace Aware

Namespaces have been introduced to XML after the first specification of XML had received the official W3C Recommendation status. This is the reason why (most of the) XML parser implementations do not support XML Namespaces by default, to handle the validation of XML documents with namespaces correctly it is therefore necessary to configure the underlying parsers to provide support for XML Namespaces.

Input Document

The input document ("contacts.xml") that has been used for all the code examples is shown below.

<!DOCTYPE contacts SYSTEM "contacts.dtd">

<contacts xsi:noNamespaceSchemaLocation="contacts.xsd" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
  <contact title="citizen"> 
    <firstname>Edwin</firstname> 
    <lastname>Dankert</lastname>
  </contact> 
</contacts>

XML Schema

The XML Schema ("contacts.xsd") as defined below has been used in the code examples to validate the input document. The input document contains an extra attribute which has not been defined in the XML Schema, this shows that the XML Schema has been used for the validation. When using this XML Schema to validate the input XML document, the following error gets reported:

cvc-complex-type.3.2.2: Attribute 'title' is not allowed to appear in element 'contact'.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contacts">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="contact"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:NCName"/>
        <xs:element name="lastname" type="xs:NCName"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

DTD

The DTD ("contacts.dtd") as defined below has been used in the code examples to validate the input document. To highlight that the DTD has been used for the validation, the title attribute in the input document has a value which is not allowed according to this DTD. When using this DTD to validate the input XML document, the following error gets reported:

Attribute "title" with value "citizen" must have a value from the list "MR MS MRS ".

<!ELEMENT contacts (contact*)>
<!ATTLIST contacts xsi:noNamespaceSchemaLocation CDATA #IMPLIED>
<!ATTLIST contacts xmlns:xsi CDATA #IMPLIED>

<!ELEMENT contact (firstname,lastname)>
<!ATTLIST contact title (MR|MS|MRS) "MS">

<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>

Checking Wellformed-ness

Before a document can be called XML and not csv, simple text or any other format, it needs to support the basic rules as defined by the XML Recommendation, when it adheres to these rules it is said to be   Wellformed XML.

Code Fragments: DOMSAXdom4jXOM

DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);

DocumentBuilder builder = factory.newDocumentBuilder();

builder.setErrorHandler(new SimpleErrorHandler());

Document document = builder.parse(new InputSource("document.xml"));

SAX

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("document.xml"));

dom4j

SAXReader reader = new SAXReader();
reader.setValidation(false);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");

XOM

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());

Builder builder = new Builder(reader);
builder.build("contacts.xml");

Validate using internal DTD

Parse the input document using only the DTD (contacts.dtd), as defined by the DOCTYPE in the input document, for validation.

Code Fragments: DOMSAXdom4jXOM

DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

DocumentBuilder builder = factory.newDocumentBuilder();

builder.setErrorHandler(new SimpleErrorHandler());

Document document = builder.parse(new InputSource("document.xml"));

SAX

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("document.xml"));

dom4j

SAXReader reader = new SAXReader();
reader.setValidation(true);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");

XOM

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());

Builder builder = new Builder(reader);
builder.build("contacts.xml");

Validate using internal XSD

Parse the input document using only the XML Schema (contacts.xsd), as defined by the noNamespaceSchemaLocation attribute in the input document, for validation.

Code Fragments: DOMSAXdom4jXOM

DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);
factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage", 
      "http://www.w3.org/2001/XMLSchema");

DocumentBuilder builder = factory.newDocumentBuilder();

builder.setErrorHandler(new SimpleErrorHandler());

Document document = builder.parse(new InputSource("document.xml"));

SAX

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();
parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", 
      "http://www.w3.org/2001/XMLSchema");

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("document.xml"));

dom4j

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);

SAXParser parser = factory.newSAXParser();
parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", 
      "http://www.w3.org/2001/XMLSchema");

SAXReader reader = new SAXReader(parser.getXMLReader());
reader.setValidation(true);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");

XOM

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SAXParser parser = factory.newSAXParser();
parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", 
      "http://www.w3.org/2001/XMLSchema");

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());

Builder builder = new Builder(reader);
builder.build("contacts.xml");

Validate using external Schema

Parse the input document using the schema (contacts.xsd), as defined externally by the source-code.

Code Fragments: DOMSAXdom4jXOM

DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);

SchemaFactory schemaFactory = 
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

DocumentBuilder builder = factory.newDocumentBuilder();

builder.setErrorHandler(new SimpleErrorHandler());

Document document = builder.parse(new InputSource("document.xml"));

SAX

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);

SchemaFactory schemaFactory = 
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("document.xml"));

dom4j

SAXParserFactory factory = SAXParserFactory.newInstance();

SchemaFactory schemaFactory = 
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

SAXParser parser = factory.newSAXParser();

SAXReader reader = new SAXReader(parser.getXMLReader());
reader.setValidation(false);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");

XOM

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);

SchemaFactory schemaFactory = 
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());

Builder builder = new Builder(reader);
builder.build("contacts.xml");

Validate using internal DTD and external Schema

Parse the input document using the schema (contacts.xsd), as defined externally by the source-code and the DTD (contacts.dtd), as defined by the DOCTYPE in the input document, for validation.

Code Fragments: DOMSAXdom4jXOM

DOM

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SchemaFactory schemaFactory = 
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

DocumentBuilder builder = factory.newDocumentBuilder();

builder.setErrorHandler(new SimpleErrorHandler());

Document document = builder.parse(new InputSource("document.xml"));

SAX

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SchemaFactory schemaFactory = 
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

SAXParser parser = factory.newSAXParser();

XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());
reader.parse(new InputSource("document.xml"));

dom4j

SAXParserFactory factory = SAXParserFactory.newInstance();

SchemaFactory schemaFactory = 
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");

factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

SAXParser parser = factory.newSAXParser();

SAXReader reader = new SAXReader(parser.getXMLReader());
reader.setValidation(true);
reader.setErrorHandler(new SimpleErrorHandler());
reader.read("contacts.xml");

XOM

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
factory.setNamespaceAware(true);

SchemaFactory schemaFactory = 
    SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
factory.setSchema(schemaFactory.newSchema(
    new Source[] {new StreamSource("contacts.xsd")}));

SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new SimpleErrorHandler());

Builder builder = new Builder(reader);
builder.build("contacts.xml");

Conclusion

Several mechanisms and XML APIs can be used to parse and validate XML, by using JAXP 1.3 the mechanism can mostly stay the same for these different APIs.

Sample Code

Download any of the archives to get the full source-code for the examples above.

The archives consist of the ./contacts.xml input XML file, ./contacts.xsd the XML Schema document, ./contacts.dtdthe DTD document and the source-code for the fragments above, located in the ./src directory.

The archive also contains a number XML Hammer validation projects included in the ./xmlhammer-projects directory. To be able to execute these XML Hammer projects, you will need to have the XML Hammer application installed. This can be downloaded from:

http://www.xmlhammer.org/downloads.html.

1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。
1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。1、资源项目源码均已通过严格测试验证,保证能够正常运行; 2、项目问题、技术讨论,可以给博主私信或留言,博主看到后会第一时间与您进行沟通; 3、本项目比较适合计算机领域相关的毕业设计课题、课程作业等使用,尤其对于人工智能、计算机科学与技术等相关专业,更为适合; 4、下载使用后,可先查看README.md文件(如有),本项目仅用作交流学习参考,请切勿用于商业用途。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值