In this section, we explain the syntax rules of XML. Documents that follow these rules are called well-formed, although, as we'll see, they are not necessarily valid. If your document breaks any of these rules, it will be rejected by most, if not all, XML parsers.
Well-Formedness
The minimal requirement for an XML document is that it be well-formed, meaning that it adheres to a small number of syntax rules, which are summarized in Table 3-1 and explained in the following sections. However, a document can abide by all these rules and still be invalid. To be valid, a document must both be well-formed and adhere to the constraints imposed by a DTD or XML Schema.
Table 3-1 XML Syntax Rules (Well-Formedness Constraints)
- The document must have a consistent, well-defined structure.
- All attribute values must be quoted (single or double quotes).
- White space in content, including line breaks, is significant.
- All start tags must have corresponding end tags (exception: empty elements).
- The root element must contain all others, which must nest properly by start/end tag pairing.
- Elements must not overlap, although they may be nested. (Technically, this is also true of HTML; browsers tolerate overlapping elements in HTML, but XML parsers do not.)
- Each element except the root element must have exactly one parent element that contains it.
- Element and attribute names are case-sensitive: Price and PRICE are different elements.
- Keywords such as DOCTYPE and ENTITY must always appear in uppercase; similarly for other DTD keywords such as ELEMENT and ATTLIST.
- Elements without content are called empty elements; when an empty element is written as a single tag, that tag must end in "/>".
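
Several of the rules in the table can be checked by handing small documents to a parser and seeing which ones it rejects. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as the parser; the helper name is_well_formed and the Order, Price, and Shipped sample documents are made up for illustration. It checks well-formedness only, not validity against a DTD or XML Schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents: one well-formed, the rest each breaking one rule.
samples = {
    "well-formed": '<Order id="1"><Price>9.99</Price><Shipped/></Order>',
    "unquoted attribute": '<Order id=1><Price>9.99</Price></Order>',
    "missing end tag": '<Order><Price>9.99</Order>',
    "overlapping elements": '<Order><b><i>text</b></i></Order>',
    "case mismatch": '<Order><Price>9.99</PRICE></Order>',
    "two root elements": '<Order/><Order/>',
}

results = {label: is_well_formed(doc) for label, doc in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")

# White space in content, including line breaks, is passed through unchanged.
price = ET.fromstring("<Price>\n  9.99\n</Price>")
print(repr(price.text))
```

With this parser, only the first sample is accepted. The case-mismatch sample fails because Price and PRICE are different names, so the start tag has no matching end tag.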