Processing XML with Java

Elliotte Rusty Harold


Dedication

In memory of all the victims of the attacks on September 11, 2001

Table of Contents

Preface
Who You Are
What You Need to Know What You Need to Have
How to Use This Book The Online Edition Some grammatical notes Contacting the Author
Acknowledgements
1. XML for Data
Motivating XML
A Thought Experiment Robustness Extensibility Ease of Use
XML Syntax
XML Documents XML Applications Elements and Tags Text Attributes XML Declaration Comments Processing Instructions Entities Namespaces
Validity
DTDs Schemas Schematron The Last Mile
Style sheets
CSS Associating Style Sheets with XML Documents XSL
Summary
2. XML Protocols: XML-RPC and SOAP
XML as a Message Format
Envelopes Data Representation
HTTP as a Transport Protocol
How HTTP Works HTTP in Java
RSS Customizing the Request
Query Strings How POST Works
XML-RPC
Data Structures Faults Validating XML-RPC
SOAP
A SOAP Example Posting SOAP documents Faults Encoding Styles SOAP Headers SOAP Limitations Validating SOAP
Custom Protocols Summary
3. Writing XML with Java
Fibonacci Numbers Writing XML
Better Coding Practices Attributes Producing Valid XML Namespaces
Output Streams, Writers, and Encodings A Simple XML-RPC Client A Simple SOAP Client Servlets Summary
4. Converting Flat Files to XML
The Budget The Model Input Determining the Output Format
Validation Attributes
Building Hierarchical Structures from Flat Data Alternatives to Java
Imposing Hierarchy with XSLT The XML Query Language
Relational Databases Summary
5. Reading XML
InputStreams and Readers XML Parsers
Choosing an XML API Choosing an XML Parser Available Parsers
SAX DOM JAXP JDOM dom4j ElectricXML XMLPULL Summary
6. SAX
What is SAX? Parsing Callback Interfaces
Implementing ContentHandler Using the ContentHandler The DefaultHandler Adapter Class
Receiving Documents Receiving Elements Handling Attributes Receiving Characters Receiving Processing Instructions Receiving Namespace Mappings Ignorable White Space Receiving Skipped Entities Receiving Locators What the ContentHandler Doesn’t Tell You Summary
7. The XMLReader Interface
Building Parser Objects Input
InputSource EntityResolver
Exceptions and Errors
SAXExceptions The ErrorHandler interface
Features and Properties
Getting and Setting Features Getting and Setting Properties Required Features Standard Features Standard Properties Xerces Custom Features Xerces Custom Properties
DTDHandler Summary
8. SAX Filters
The Filter Architecture The XMLFilter interface Content Filters
Filtering Tags Filtering Elements Filtering attributes Filters that add content Filters vs. Transforms
The XMLFilterImpl Class Parsing non-XML Documents Multihandler adapters Summary
9. The Document Object Model
The Evolution of DOM DOM Modules Application Specific DOMs Trees
Document nodes Element nodes Attribute nodes Leaf nodes Non-tree nodes What is and isn’t in the tree
DOM Parsers for Java Parsing documents with a DOM Parser
JAXP DocumentBuilder and DocumentBuilderFactory DOM3 Load and Save
The Node Interface
Node Types Node Properties Navigating the tree Modifying the tree Utility Methods
The NodeList interface JAXP Serialization DOMException Choosing between SAX and DOM Summary
10. Creating XML Documents with DOM
DOMImplementation Locating a DOMImplementation
Implementation Specific Class JAXP DocumentBuilder DOM3 DOMImplementationRegistry
The Document Interface as an Abstract Factory The Document Interface as a Node Type
Getter methods Finding elements Transferring nodes between documents
Normalization Summary
11. The Document Object Model Core
The Element Interface
Extracting Elements Attributes
The NamedNodeMap Interface The CharacterData interface The Text Interface The CDATASection Interface The EntityReference Interface The Attr Interface The ProcessingInstruction Interface The Comment Interface The DocumentType Interface The Entity Interface The Notation Interface Summary
12. The DOM Traversal Module
NodeIterator
Constructing NodeIterators with DocumentTraversal Liveness Filtering by Node Type
NodeFilter TreeWalker Summary
13. Output from DOM
Xerces Serialization OutputFormat DOM Level 3
Creating DOMWriters Serialization Features Filtering Output
Summary
14. JDOM
What is JDOM? Creating XML Elements with JDOM Creating XML Documents with JDOM Writing XML Documents with JDOM Document Type Declarations Namespaces Reading XML Documents with JDOM Navigating JDOM Trees Talking to DOM Programs Talking to SAX Programs
Configuring SAXBuilder SAXOutputter
Java Integration
Serializing JDOM Objects Synchronizing JDOM Objects Testing Equality Hash codes String representations Cloning
What JDOM doesn’t do Summary
15. The JDOM Model
The Document Class The Element Class
Constructors Navigation and Search Attributes
The Attribute Class The Text Class The CDATA Class The ProcessingInstruction Class The Comment Class Namespaces The DocType class The EntityRef Class Summary
16. XPath
Queries The XPath Data Model Location Paths
Axes Node tests Predicates Compound Location Paths Absolute Location Paths Abbreviated Location paths Combining location paths
Expressions
Literals Operators Functions
XPath Engines
XPath with Saxon XPath with Xalan
DOM Level 3 XPath
Namespace Bindings Snapshots Compiled Expressions
Jaxen Summary
17. XSLT
XSL Transformations
Template Rules Stylesheets Taking the Value of a Node Applying Templates The Default Template Rules Selection Calling Templates by Name
TrAX
Thread Safety Locating Transformers The xml-stylesheet processing instruction Features XSLT Processor Attributes URI Resolution Error Handling Passing Parameters to Style Sheets Output Properties Sources and Results
Extending XSLT with Java
Extension Functions Extension Elements
Summary
A. XML APIs Quick Reference
SAX
org.xml.sax org.xml.sax.ext org.xml.sax.helpers
DOM
The DOM Data Model org.w3c.dom org.w3c.dom.traversal
JAXP
javax.xml.parsers
TrAX
javax.xml.transform javax.xml.transform.stream javax.xml.transform.dom javax.xml.transform.sax
JDOM Quick Reference
org.jdom org.jdom.filter org.jdom.input org.jdom.output org.jdom.transform org.jdom.xpath
XMLPULL
org.xmlpull.v1
B. SOAP 1.1 Schemas
The SOAP 1.1 Envelope Schema The SOAP 1.1 Encoding Schema W3C® SOFTWARE NOTICE AND LICENSE
Recommended Reading
Index

List of Examples

1.1. A plain text document indicating an order for 12 Birdsong Clocks, SKU 244
1.2. An XML document indicating an order for 12 Birdsong Clocks, SKU 244
1.3. A document indicating an order for 12 Birdsong Clocks, SKU 244?
1.4. Still an order for 12 Birdsong Clocks, SKU 244
1.5. An XML document indicating an order for multiple products shipped to multiple addresses
1.6. An XML document that uses a default namespace
1.7. An XML document that uses two default namespaces
1.8. A DTD for order documents
1.9. order.xsd: a schema for order documents
1.10. order.sct: a Schematron schema for order documents
1.11. A CSS stylesheet for order documents
1.12. An XSLT stylesheet for order documents
1.13. An XSL-FO document for the clock order
2.1. An XML document that labels elements with schema simple types
2.2. URLGrabber
2.3. URLGrabberTest
2.4. An RSS 0.91 document
2.5. An RSS 1.0 document
2.6. An XML-RPC request document
2.7. POSTing an XML-RPC request document
2.8. An XML-RPC response
2.9. An XML-RPC request that passes an array as an argument
2.10. An XML-RPC response document that returns an array
2.11. An XML-RPC Request that passes a struct as an argument
2.12. An XML-RPC fault
2.13. A DTD for XML-RPC
2.14. A Schema for XML-RPC
2.15. A SOAP document requesting the current stock price of Red Hat
2.16. A SOAP Response
2.17. A SOAP document requesting the current stock price of Red Hat
2.18. A SOAP document returning the current stock price of Red Hat
2.19. A SOAP fault response
2.20. A SOAP document that specifies the encoding style
2.21. A schema that assigns type to elements in the http://namespaces.cafeconleche.org/xmljava/ch2/ namespace
2.22. A SOAP Request with a digital signature in the header
2.23. A SOAP Request with two header entries
2.24. A SOAP Request with a mustUnderstand attribute
2.25. A Master Schema for SOAP Trading documents
3.1. A program that calculates the Fibonacci numbers
3.2. The first 10 Fibonacci numbers in an XML document
3.3. A program that outputs the Fibonacci numbers as an XML document
3.4. Using named constants for element names
3.5. A Java program that writes an XML document that uses attributes
3.6. A Java program that generates a valid document
3.7. A MathML document containing Fibonacci numbers
3.8. A Java program that generates a MathML document
3.9. A Java program that writes an XML file
3.10. Connecting an XML-RPC server with URLConnection
3.11. Connecting to a SOAP server with URLConnection
3.12. A servlet that generates XML
4.1. A class that parses comma separated values into a List of HashMaps
4.2. Naively reproducing the original table structure in XML
4.3. A schema for the XML budget data
4.4. Converting to XML with attributes
4.5. A hierarchical arrangement of the budget data
4.6. The Budget class
4.7. The Agency class
4.8. The Bureau Class
4.9. An Account Class
4.10. The Subfunction Class
4.11. The driver class that builds the data structure and writes it out again
4.12. An XSLT stylesheet that converts flat XML data to hierarchical XML data
4.13. An XQuery that converts flat data to hierarchical data
4.14. A program that connects to a relational database using JDBC and converts the table to hierarchical XML
5.1. A response from the Fibonacci XML-RPC server
5.2. Reading an XML-RPC Response
5.3. A SAX based client for the Fibonacci XML-RPC server
5.4. The ContentHandler for the SAX client for the Fibonacci XML-RPC server
5.5. A DOM based client for the Fibonacci XML-RPC server
5.6. A JAXP based client for the Fibonacci XML-RPC server
5.7. A JDOM based client for the Fibonacci XML-RPC server
5.8. A dom4j based client for the Fibonacci XML-RPC server
5.9. An ElectricXML based client for the Fibonacci XML-RPC server
5.10. An XMLPULL based client for the Fibonacci XML-RPC server
6.1. A SAX program that parses a document
6.2. The SAX ContentHandler interface
6.3. A SAX ContentHandler that writes all #PCDATA onto a java.io.Writer
6.4. The driver method for the text extractor program
6.5. A subclass of DefaultHandler that writes all #PCDATA onto a java.io.Writer
6.6. A ContentHandler interface that resets its data structures between documents
6.7. A ContentHandler class that builds a GUI representation of an XML document
6.8. The SAX Attributes interface
6.9. A ContentHandler class that spiders XLinks
6.10. A SAX client for the Fibonacci XML-RPC server
6.11. A ContentHandler that prints processing instruction targets and data on System.out
6.12. The NamespaceSupport class
6.13. A document that uses ignorable white space to prettify the XML
6.14. An XML document containing a potentially skipped entity reference
6.15. The SAX Locator interface
6.16. Determining the locations of events
7.1. The SAX InputSource class
7.2. The EntityResolver interface
7.3. An XHTML EntityResolver
7.4. The SAXException class
7.5. The SAXParseException class
7.6. A SAX program that parses a document and identifies the line numbers of any well-formedness errors
7.7. The ErrorHandler interface
7.8. A SAX program that reports all problems found in an XML document
7.9. A SAX program that validates documents
7.10. A SAX program that echoes the parsed document
7.11. The LexicalHandler interface
7.12. An implementation of the LexicalHandler interface
7.13. The DeclHandler interface
7.14. A program that prints out a complete DTD
7.15. Making maximal use of Xerces’s special abilities
7.16. The DTDHandler interface
7.17. A caching DTDHandler
7.18. A Notation utility class
7.19. An UnparsedEntity utility class
7.20. A program that lists the unparsed entities and notations used in an XML document
8.1. The XMLFilter interface
8.2. A filter that blocks all events
8.3. A filter that filters nothing
8.4. A filter that times all parsing
8.5. Parsing a document through a filter
8.6. A ContentHandler filter
8.7. A filter that substitutes its own ContentHandler
8.8. A program that filters documents
8.9. A ContentHandler filter that throws away non-XHTML elements
8.10. The AttributesImpl helper class
8.11. Changing one element into another
8.12. A subclass of XMLFilterImpl
8.13. Accessing databases through SAX
8.14. A very simple user interface for extracting XML data from a relational database
8.15. Attaching multiple handlers of the same type to a single parser
9.1. Which modules does Oracle support?
9.2. An XML-RPC request document
9.3. A program that uses Xerces to check documents for well-formedness
9.4. A program that uses the Oracle XML parser to check documents for well-formedness
9.5. A program that uses JAXP to check documents for well-formedness
9.6. Using JAXP to check documents for well-formedness
9.7. A program that uses DOM3 to check documents for well-formedness
9.8. The Node interface
9.9. Changing short type constants to strings
9.10. A class to inspect the properties of a node
9.11. Walking the tree with the Node interface
9.12. A method that changes a document by reordering nodes
9.13. The NodeList interface
9.14. Using JAXP to both read and write an XML document
9.15. The DOMException class
10.1. The DOMImplementation interface
10.2. The DOMImplementationRegistry class
10.3. The DOMImplementationSource interface
10.4. The Document interface
10.5. Building an SVG document in memory using DOM
10.6. A DOM program that outputs the Fibonacci numbers as an XML document
10.7. A valid MathML document containing Fibonacci numbers
10.8. A DOM program that outputs the Fibonacci numbers as a MathML document
10.9. A valid MathML document using prefixed names
10.10. The properties of a Document object
10.11. An XML-RPC request document
10.12. An XML-RPC response document
10.13. A DOM based XML-RPC servlet
10.14. A DOM based SOAP servlet
11.1. The Element interface
11.2. Extracting examples from DocBook
11.3. A document that uses attributes
11.4. A DOM program that adds attributes
11.5. The NamedNodeMap interface
11.6. An XLink spider that uses DOM
11.7. The CharacterData interface
11.8. ROT13 encoder for XML documents
11.9. The Text interface
11.10. Printing the text nodes in an XML document
11.11. The CDATASection interface
11.12. Merging CDATA sections with text nodes
11.13. The EntityReference interface
11.14. Inserting entity references into a document
11.15. The Attr interface
11.16. Specifying all attributes
11.17. The ProcessingInstruction interface
11.18. Reading PseudoAttributes from a ProcessingInstruction
11.19. The Comment interface
11.20. Printing comments
11.21. The DocumentType interface
11.22. The Entity interface
11.23. Listing parsed entities used in the document
11.24. The Notation interface
11.25. Listing the Notations declared in a DTD
12.1. The NodeIterator interface
12.2. The DocumentTraversal factory interface
12.3. Using a NodeIterator to extract all the comments from a document
12.4. Using a NodeIterator to retrieve the complete text content of an element
12.5. The NodeFilter interface
12.6. An implementation of the NodeFilter interface
12.7. The TreeWalker interface
12.8. The ExampleFilter class
12.9. Navigating a sub-tree with TreeWalker
13.1. Using Xerces’ OutputFormat class to pretty print XML
13.2. Using Xerces’ OutputFormat class to pretty print MathML
13.3. The DOM3 DOMWriter interface
13.4. The DOM3 DOMErrorHandler interface
13.5. Serializing with DOMWriter
13.6. The DOM3 DOMImplementationLS interface
13.7. An implementation independent DOM3 program to build and serialize an XML document
13.8. The DOMWriterFilter interface
13.9. Filtering everything that isn’t XHTML on output
13.10. Using a DOMWriterFilter
14.1. A JDOM program that produces an XML document containing Fibonacci numbers
14.2. A Fibonacci DTD
14.3. A JDOM program that produces an XML document containing Fibonacci numbers
14.4. A MathML document containing the first three Fibonacci numbers
14.5. A JDOM program that uses namespaces
14.6. A JDOM program that uses the default namespace
14.7. A JDOM program that checks XML documents for well-formedness
14.8. A JDOM program that validates XML documents
14.9. A JDOM program that lists the elements used in a document
14.10. A JDOM program that lists the nodes used in a document
14.11. A JDOM program that schema validates documents
14.12. A JDOM program that passes documents to a SAX ContentHandler
15.1. The JDOM Document class
15.2. Inspecting elements
15.3. An XML-RPC request document
15.4. The JDOM Filter interface
15.5. The ContentFilter class
15.6. The ElementFilter class
15.7. A filter for xml-stylesheet processing instructions in the prolog
15.8. Moving elements between documents
15.9. Searching for RDDL resources
15.10. The JDOM Attribute class
15.11. The JDOM Text class
15.12. JDOM based ROT13 encoder for XML documents
15.13. The JDOM CDATA class
15.14. The JDOM ProcessingInstruction class
15.15. The JDOM Comment class
15.16. Printing comments
15.17. The JDOM Namespace class
15.18. An XML document that uses namespace prefixes in attribute values
15.19. The JDOM DocType class
15.20. Validating XHTML with the DocType class
15.21. The JDOM EntityRef class
16.1. Weather data in XML
16.2. A SOAP response document
16.3. An XML-RPC request document
16.4. A SOAP request document
16.5. The Xalan XPathAPI class
16.6. The XPathEvaluator interface
16.7. The XPathResult interface
16.8. An XML document containing namespace bindings and an XPath search expression
16.9. The DOM3 XPathExpression interface
17.1. An XSLT stylesheet for XML-RPC request documents
17.2. An XSLT stylesheet that echoes XML-RPC requests
17.3. An XML-RPC request document
17.4. An XML-RPC response document
17.5. An XSLT stylesheet that calculates Fibonacci numbers
17.6. A servlet that uses TrAX and XSLT to respond to XML-RPC requests
17.7. Testing the availability of TrAX features
17.8. The TrAX URIResolver interface
17.9. A URIResolver class
17.10. The TrAX ErrorListener interface
17.11. An ErrorListener that uses the Logging API
17.12. The TrAX OutputKeys class
17.13. The TrAX DOMSource class
17.14. The TrAX DOMResult class
17.15. The TrAX SAXSource class
17.16. The TrAX SAXResult class
17.17. The TrAX StreamSource class
17.18. The TrAX StreamResult class
17.19. A Java class that calculates Fibonacci numbers
17.20. The Xalan ExpressionContext interface
17.21. A stylesheet that uses an extension element

Copyright 2001, 2002 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified September 25, 2002