Integration with the XML Data Type

Integration with the XML Data Type

With the introduction of the XML data type, we wanted to also give FOR XML the ability to generate an instance of XML directly (more precisely, it generates a single row, single column rowset where the cell contains the XML data type instance).

Because of the backwards-compatibility considerations outlined above, we added a new TYPE directive to generate the result as XML. For example,

SELECT * FROM Customers FOR XML AUTO, TYPE

returns the Customers elements as an XML data type instance, instead of the nvarchar(max) instance that would have been the case without the TYPE directive.

This result is guaranteed to conform to the well-formedness constraints provided by the XML data type. Since the result is an XML data type instance, you can also use XQuery expressions to query and reshape the result. For example, the following expression retrieves the Customer contact name into a new Person element.

SELECT (SELECT * FROM Customers FOR XML AUTO, TYPE).query(
'<doc>{
   for $c in /Customers
   return 
     <Person name="{data($c/@ContactName)}"/>
 }</doc>')

returns (only first elements shown),

<doc>
  <Person name="Maria Anders" />
  <Person name="Ana Trujillo" />
  <Person name="Antonio Moreno" />
  ...
</doc>

Assigning FOR XML Results

Since FOR XML queries now return assignable values, the result of a FOR XML query can be assigned to a variable, or inserted into a column.

DECLARE @cust XML;
SET @cust = (SELECT * FROM Customers FOR XML AUTO, TYPE)
CREATE TABLE T(i int, x XML)
INSERT INTO T SELECT 1, (SELECT * FROM Customers FOR XML AUTO, TYPE)

Nesting of FOR XML Expressions

FOR XML, in SQL Server 2005, recognizes XML data type columns, and will inline them as sub-elements. Thus, we can nest FOR XML queries to generate hierarchies, instead of having to rely on the AUTO mode heuristic, or writing an EXPLICIT mode query.

Let's look at the example. The following FOR XML EXPLICIT query is returning Customer elements, containing their orders and the employees that work on their orders. For simplification, we are only returning a single property per element.

SELECT 1 as TAG,
       NULL as Parent,
       CustomerID as "Customer!1!CustomerID",
       NULL as "Order!2!OrderID",
       NULL as "Employee!3!LastName"
FROM Customers
UNION ALL
SELECT 2,
       1,
       Customers.CustomerID,
       Orders.OrderID,
       NULL
FROM Orders 
JOIN Customers ON Orders.CustomerID = Customers.CustomerID
UNION ALL
SELECT DISTINCT 3,
       1,
       Customers.CustomerID,
       NULL,
       Employees.LastName
FROM Customers 
JOIN Orders ON Customers.CustomerID = Orders.CustomerID
JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY "Customer!1!CustomerID","Employee!3!LastName","Order!2!OrderID"
FOR XML EXPLICIT

returns (only first customer shown),

<Customer CustomerID="ALFKI">
  <Order OrderID="10643" />
  <Order OrderID="10692" />
  <Order OrderID="10702" />
  <Order OrderID="10835" />
  <Order OrderID="10952" />
  <Order OrderID="11011" />
  <Employee LastName="Davolio" />
  <Employee LastName="Leverling" />
  <Employee LastName="Peacock" />
  <Employee LastName="Suyama" />
</Customer>
...

As you can see, we need a select statement for each element. We also repeat the parent's identification with every child, so that the order by groups the children with their parents. The streaming serialization from the rowset to XML depends on this grouping to get the correct nesting.

Now let's look at how we can rewrite this by using nesting FOR XML expressions. We can leverage the new TYPE directive to generate XML data type instances (otherwise, you will get a textual result that will be entitized if it is embedded in another FOR XML query) and nest sub selections to define the hierarchy.

We use a separate FOR XML query for each of the three "entity" elements, and nest them to express the hierarchy. We now rewrite the previous EXPLICIT mode query using the AUTO mode and nesting:

SELECT CustomerID as "CustomerID",
      (SELECT OrderID as "OrderID"
       FROM Orders "Order"
       WHERE "Order".CustomerID = Customer.CustomerID
       FOR XML AUTO, TYPE),
      (SELECT DISTINCT LastName as "LastName"
       FROM Employees Employee
       JOIN Orders "Order" ON "Order".EmployeeID = Employee.EmployeeID
       WHERE Customer.CustomerID = "Order".CustomerID
       FOR XML AUTO, TYPE)
FROM Customers Customer
FOR XML AUTO, TYPE

This returns, except for guaranteeing the element order, the same result as the EXPLICIT mode query (if you care about the order, you can add an order by statement).

Obviously, this query is easier to write, understand, and maintain, even though it contains basically the same number of select statements and join conditions.

The New PATH Mode

In the above section, we looked at how we can utilize some of the FOR XML capabilities to rewrite a simple FOR XML EXPLICIT query into a simpler query that exploits the nestability of FOR XML, the AUTO mode, and the new XML data type.

One may say, however, that the previous query was too simple to reflect a real world scenario with FOR XML EXPLICIT.

For example, some of the strengths of the explicit mode are to mix attributes and elements at will, create wrappers and nested, complex properties, and even create space-separated value lists and mixed content. None of these results can be achieved by nesting FOR XML AUTO queries. So do we still have to write EXPLICIT mode queries to get these mapping capabilities?

Don't despair! A new FOR XML mode gives you the same flexibility in a much simpler way. The new PATH mode, together with the ability to nest FOR XML expressions, has become the simplest way to generate complex XML documents.

The PATH mode allows you to use an XPath-like syntax as a column name, which then is mapped into an attribute (e.g., "@a"), element (e.g., "e"), sub element structure ("e1/e2"), element content ("*"), text node ("text()"), or data value ("data()"). As with the RAW mode, the default name for the row element is row and can be overwritten with an NCName (a name without a prefix).

Let's look at some examples. First, let us give the PATH mode formulation of the above EXPLICIT mode query.

SELECT CustomerID as "@CustomerID",
           (SELECT OrderID as "@OrderID"
            FROM Orders
            WHERE Orders.CustomerID = Customers.CustomerID
            FOR XML PATH('Order'), TYPE),
          (SELECT DISTINCT LastName as "@LastName"
           FROM Employees
           JOIN Orders ON Orders.EmployeeID = Employees.EmployeeID
           WHERE Customers.CustomerID = Orders.CustomerID
           FOR XML PATH('Employee'), TYPE)
FROM Customers
FOR XML PATH('Customer')

This example is similar to the AUTO mode version and returns the same result.

Now let's look at some of the PATH mode specific capabilities. The next query takes the customer information and groups the address and contact information into separate sub elements using a more complex path expression as column alias, and—using the new ROOT directive—adds a root node around it for good measure.

SELECT CustomerID as "@CustomerID", 
           CompanyName,
           Address as "address/street",
           City as "address/city",
           Region as "address/region",
           PostalCode as "address/zip",
           Country as "address/country",
           ContactName as "contact/name",
           ContactTitle as "contact/title",
           Phone as "contact/phone", 
           Fax as "contact/fax"
FROM Customers
FOR XML PATH('Customer'), ROOT('doc')

This query results in the following document (only showing the first customer element).

<doc>
  <Customer CustomerID="ALFKI">
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <address>
      <street>Obere Str. 57</street>
      <city>Berlin</city>
      <zip>12209</zip>
      <country>Germany</country>
    </address>
    <contact>
      <name>Maria Anders</name>
      <title>Sales Representative</title>
      <phone>030-0074321</phone>
      <fax>030-0076545</fax>
    </contact>
  </Customer>
  ...
</doc>

How would this query look like if it used the EXPLICIT mode? We need four select clauses—one for each non-leaf element—instead of only one selection.

SELECT top 1 
       1 as TAG,
       NULL as Parent,
       1 as "doc!1!dummy!hide",
       NULL as "Customer!2!CustomerID",
       NULL as "Customer!2!CompanyName!element",
       NULL as "address!3!street!element",
       NULL as "address!3!city!element",
       NULL as "address!3!region!element",
       NULL as "address!3!zip!element",
       NULL as "address!3!country!element",
       NULL as "contact!4!name!element",
       NULL as "contact!4!title!element",
       NULL as "contact!4!phone!element",
       NULL as "contact!4!fax!element"
FROM Customers
UNION ALL
SELECT 2, 1,
       1,
       CustomerID, CompanyName,
       NULL, NULL, NULL, NULL, NULL,
       NULL, NULL, NULL, NULL
FROM Customers
UNION ALL
SELECT 3, 2,
       1,
       CustomerID, NULL,
       Address, City, Region, PostalCode, Country,
       NULL, NULL, NULL, NULL
FROM Customers
UNION ALL
SELECT 4, 2,
       1,
       CustomerID, NULL,
       NULL, NULL, NULL, NULL, NULL,
       ContactName, ContactTitle, Phone, Fax
FROM Customers
ORDER BY "doc!1!dummy!hide","Customer!2!CustomerID"
FOR XML EXPLICIT, TYPE

Now we know why the EXPLICIT mode is sometimes called the "query from hell."

Last but not least, the following gives us an example of generating a value list, and shows the use of a text node.

SELECT CustomerID as "@ID",
      (SELECT OrderID as "data()"
       FROM Orders
       WHERE Customers.CustomerID=Orders.CustomerID
       FOR XML PATH('')
      ) as "@OrderIDs",
       CompanyName,
       ContactTitle as "ContactName/@ContactTitle",
       ContactName as "ContactName/text()",
       PostalCode as "Address/@ZIP",
       Address as "Address/Street",
       City as "Address/City"
FROM Customers
FOR XML PATH('Customer')

This creates a result of the form (showing one example customer),

<Customer ID="HUNGC" OrderIDs="10375 10394 10415 10600 10660">
  <CompanyName>Hungry Coyote Import Store</CompanyName>
  <ContactName 
     ContactTitle="Sales Representative">Yoshi Latimer</ContactName>
  <Address ZIP="97827">
    <Street>City Center Plaza 516 Main St.</Street>
    <City>Elgin</City>
  </Address>
</Customer>

Let's dissect the relevant parts of the query.

The sub query, which generates the OrderIDs attribute list, maps the OrderID column values as atomic values (using the path data()). These values will then be serialized as a text node by adding a space between sibling atomic values, which are provided in the next cell in the rowset. We then avoid generating a name for the row by using the zero-length string as the PATH mode argument, so that we get a single string (note, there is no TYPE directive!) as the result of the FOR XML PATH expression. That string gets mapped to the OrderIDs attribute by the containing FOR XML expression.

The CompanyName gets mapped to a sub element of the same name.

The ContactTitle produces the ContactTitle attribute of the ContactName element, while the ContactName column value is mapped into the text node of the same element. Note that in this case, the same result would have been achieved by mapping ContactName to the ContactName element directly.

Finally, the properties for the Address element parts are put together.

Adding XML Namespaces

XML Namespaces are becoming an increasingly important aspect of authoring XML documents for information interchange. They are used to disambiguate different vocabularies, identify ownership of a vocabulary, and to associate XML Schema information (and potentially other information) with an XML element or attribute.

FOR XML, in SQL Server 2000, puts the burden of generating and maintaining XML namespaces completely on the query writer. The XML namespace declaration attributes had to be created, like every other attribute, with the namespace URI being the column value. Unless the generated XML was in attribute-centric form, this meant that the query had to be written using the EXPLICIT mode. For example, the following query puts the resulting Customer elements and its property elements into the namespace urn:example.com/customer:

SELECT 1 as tag, NULL as parent,
   'urn:example.com/customer' as "cust:Customer!1!xmlns:cust", 
   CustomerID as "cust:Customer!1!cust:CustomerID!element",
   ContactName as "cust:Customer!1!cust:ContactName!element"
FROM Customers 
FOR XML EXPLICIT

The result of the query is (only first two elements shown):

<cust:Customer xmlns:cust="urn:example.com/customer">
  <cust:CustomerID>ALFKI</cust:CustomerID>
  <cust:ContactName>Maria Anders</cust:ContactName>
</cust:Customer>
<cust:Customer xmlns:cust="urn:example.com/customer">
  <cust:CustomerID>ANATR</cust:CustomerID>
  <cust:ContactName>Ana Trujillo</cust:ContactName>
</cust:Customer>

Namespace declarations are not really attributes in the XML data model. Thus, the PATH mode does not allow them to be specified as attributes.

In order to simplify the use of XML namespaces in FOR XML, we have added support for the WITH XMLNAMESPACES clause starting in the April CTP version of SQL Server 2005. The WITH XMLNAMESPACES clause is defined in the SQL:2003 standard as an extension to the general WITH clause that is commonly used to define common table expressions. The WITH clause can be placed on top-level SQL statements, such as SELECT, INSERT and UPDATE statements, and used inside a CREATE VIEW statement. The WITH XMLNAMESPACES clause can be used with the RAW, AUTO and PATH mode, but not with either the XMLSCHEMA and XMLDATA directives or the EXPLICIT mode.

The previous method of creating namespaces is still supported for the SQL Server 2000 modes, but cannot be mixed with the WITH XMLNAMESPACES clause. To disambiguate the WITH clause from other syntactic uses of WITH, T-SQL statements that precede WITH clauses need to be terminated with a semi-colon (;). The following query places the customer and order data into different namespaces and adds a root node in a default namespace.

WITH XMLNAMESPACES (
    DEFAULT 'urn:example.com/doc'
  , 'urn:example.com/customer' as "c"
  , 'urn:example.com/order' as "o"
)
SELECT CustomerID as "@ID",
      (SELECT OrderID as "@OrderID"
       from Orders
       where Customers.CustomerID=Orders.CustomerID
       FOR XML PATH('o:Order'), TYPE
      ) as "c:Orders",
       CompanyName as "c:CompanyName",
       ContactTitle as "c:ContactName/@ContactTitle",
       ContactName as "c:ContactName/text()",
       PostalCode as "c:Address/@ZIP",
       Address as "c:Address/c:Street",
       City as "c:Address/c:City"
FROM Customers
FOR XML PATH('c:Customer'), ROOT('doc')
 

As the following partial result shows, the XML namespace declarations are currently added at the top-level elements of every FOR XML selection:

<doc xmlns:o="urn:example.com/order" xmlns:c="urn:example.com/customer" 
     xmlns="urn:example.com/doc">
  <c:Customer ID="ALFKI">
    <c:Orders>
      <o:Order xmlns:o="urn:example.com/order"
         xmlns:c="urn:example.com/customer" xmlns="urn:example.com/doc" 
         OrderID="10643" />
      <o:Order xmlns:o="urn:example.com/order" 
         xmlns:c="urn:example.com/customer" xmlns="urn:example.com/doc" 
         OrderID="10692" />
      <o:Order xmlns:o="urn:example.com/order" 
         xmlns:c="urn:example.com/customer" xmlns="urn:example.com/doc" 
         OrderID="10702" />
      <o:Order xmlns:o="urn:example.com/order" 
         xmlns:c="urn:example.com/customer" xmlns="urn:example.com/doc" 
         OrderID="10835" />
      <o:Order xmlns:o="urn:example.com/order" 
         xmlns:c="urn:example.com/customer" xmlns="urn:example.com/doc" 
         OrderID="10952" />
      <o:Order xmlns:o="urn:example.com/order" 
         xmlns:c="urn:example.com/customer" xmlns="urn:example.com/doc" 
         OrderID="11011" />
    </c:Orders>
    <c:CompanyName>Alfreds Futterkiste</c:CompanyName>
    <c:ContactName 
       ContactTitle="Sales Representative">Maria Anders</c:ContactName>
    <c:Address ZIP="12209">
      <c:Street>Obere Str. 57</c:Street>
      <c:City>Berlin</c:City>
    </c:Address>
  </c:Customer>
...

The above query used the DEFAULT clause for adding a default namespace. Note that if there are nested XML documents with no default namespace included in the result, a slight performance penalty has to be paid to make sure that they preserve their absence of a default namespace.

Finally, the WITH XMLNAMESPACES clause can also be used to provide namespace bindings for the XQuery and XML DML methods on the XML data type.

Recursion and FOR XML

One of the strengths of the XML format is that it can easily represent hierarchies, including structurally recursive hierarchies such as parts lists. In SQL Server 2000, you could not generate such structures without knowing the maximum depth at query formulation time. Since we now can nest FOR XML expressions, we can easily generate the recursive hierarchies using user-defined functions.

For example, the following user-defined function creates a nested XML document detailing the parts list for a specific part. First, let us define some example data:

CREATE TABLE PARTS(id int, parent int, name nvarchar(500))
GO
INSERT INTO PARTS 
  SELECT 1, NULL, N'car'
  UNION
  SELECT 2, 1, N'engine'
  UNION
  SELECT 3, 1, N'body'
  UNION
  SELECT 4, 3, N'door'
  UNION
  SELECT 5, 3, N'fender'
  UNION
  SELECT 6, 4, N'window'
  UNION
  SELECT 7, 2, N'piston'

Next, we define the function that for a given part number returns the subparts in XML form:

CREATE FUNCTION PartsList(@PartsNo int)
RETURNS XML
WITH RETURNS NULL ON NULL INPUT 
BEGIN RETURN 
  (SELECT id as "@id", name as "@name", 
      CASE WHEN parent=@PartsNo
      THEN dbo.PartsList(id)
      END
   FROM dbo.PARTS WHERE parent=@PartsNo
   FOR XML PATH('Parts'), TYPE)
END

We use a CASE statement to make sure that the query execution does not calculate the function recursively if the optimizer decides to apply the filter after executing the selection. Executing the following expression will return part 3 with it subparts.

select id as "@id", name as "@name", 
       CASE WHEN id=3
       THEN dbo.PartsList(id)
       END
FROM PARTS
WHERE id=3
FOR XML PATH('Parts'), TYPE

Returns:

<Parts id="3" name="body">
  <Parts id="4" name="door">
    <Parts id="6" name="window" />
  </Parts>
  <Parts id="5" name="fender" />
</Parts>

Note that SQL Server 2005 has a maximum limit of 32 recursively nested function invocations. If your parts hierarchy exceeds the limit, you will need to use the old approach of getting the XML in flat form and applying an XSLT style sheet to create the hierarchy.

More FOR XML Extensions

Besides the new features listed above, SQL Server 2005 provides the following new features:

  1. RAW mode can be combined with the ELEMENTS directive, and take a parameter to overwrite the row element name. For example,
    SELECT *
    FROM Customers 
    FOR XML RAW('Customer'), ELEMENTS
    
    

    returns (only showing the first customer),

    <Customer>
      <CustomerID>ALFKI</CustomerID>
      <CompanyName>Alfreds Futterkiste</CompanyName>
      <ContactName>Maria Anders</ContactName>
      <ContactTitle>Sales Representative</ContactTitle>
      <Address>Obere Str. 57</Address>
      <City>Berlin</City>
      <PostalCode>12209</PostalCode>
      <Country>Germany</Country>
      <Phone>030-0074321</Phone>
      <Fax>030-0076545</Fax>
    </Customer>
    
    
  2. The ELEMENTS directive provides an XSINIL option to map NULL values to an element with an attribute xsi:nil="true". For example,
    SELECT * 
    FROM Customers 
    WHERE Region is null 
    FOR XML PATH('Customer'), ELEMENTS XSINIL
    
    

    returns (only showing the first customer),

    <Customer xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <CustomerID>ALFKI</CustomerID>
      <CompanyName>Alfreds Futterkiste</CompanyName>
      <ContactName>Maria Anders</ContactName>
      <ContactTitle>Sales Representative</ContactTitle>
      <Address>Obere Str. 57</Address>
      <City>Berlin</City>
      <Region xsi:nil="true" />
      <PostalCode>12209</PostalCode>
      <Country>Germany</Country>
      <Phone>030-0074321</Phone>
      <Fax>030-0076545</Fax>
    </Customer>
    
    
  3. A new in-line schema inference directive XMLSCHEMA that takes a target namespace URI as an optional argument has been added for the RAW and AUTO modes. For example,
    SELECT *
    FROM Customers 
    FOR XML RAW('Customer'), XMLSCHEMA('urn:example.com')
    
    

    returns (only showing parts of the schema and data),

    <xsd:schema targetNamespace="urn:example.com" 
       xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
       xmlns:sqltypes=
         "http://schemas.microsoft.com/sqlserver/2004/sqltypes" 
       elementFormDefault="qualified">
      <xsd:import namespace=
            "http://schemas.microsoft.com/sqlserver/2004/sqltypes" />
      <xsd:element name="Customer">
        ...
      </xsd:element>
    </xsd:schema>
    <Customer xmlns="urn:example.com" CustomerID="ALFKI" CompanyName="Alfreds Futterkiste" ContactName="Maria Anders" ContactTitle="Sales Representative" Address="Obere Str. 57" City="Berlin" PostalCode="12209" Country="Germany" Phone="030-0074321" Fax="030-0076545" />
    ...
    

    Note that the EXPLICIT and PATH modes – unlike the RAW and AUTO modes – are normally used when generating the XML according to a previously provided schema. Therefore we do not provide the schema inference directive with the EXPLICIT and PATH modes.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值