Using XML with MySQL(转载)

Using XML with MySQL

Paul DuBois
paul@kitebird.com

Document revision: 1.01
Last update: 2003-01-24

Table of Contents


Introduction

A growing number of applications today use data represented in the form of XML documents. XML use is on the rise because it is a simple ASCII format that has a well-defined yet extensible structure. As a result, many standardized XML processing tools has been written. What is the impact of these developments for users of the MySQL database system? MySQL has no native facilities for dealing with XML--does this mean we are left out of the XML movement? By no means. Many of the most popular languages for writing MySQL applications also have XML support, so these languages provide a natural bridge for spanning the gap between XML and relational databases. The following list indicates just some of the possibilities open to you for employing XML processing techniques to make more productive use of your MySQL server:

  • XML as a data transfer medium. Writing a query result as an XML document results in a platform-neutral ASCII file that can be used by other applications, even those that are not necessarily database-oriented. The recipient of such a document can employ standard XML tools to parse it and recover the original data values. Used this way, XML serves as an interface between your MySQL database and other applications that can read XML but may know nothing about MySQL. This works in the other direction, too. If an application can produce XML-formatted documents, you can read them and store the information contained therein into MySQL by using simple XML parsing techniques.
  • XML as a web delivery format. XML's simple, well-defined structure makes it useful for information delivery in a web environment. For example, you can set up an information feed that clients can use on an automated basis: Define XML formatting conventions with which to express the information and provide access to it through a web script. Clients can send requests to the script, which connects to MySQL, retrieves the desired information, and formats it as an XML document that is returned to the client. The client then extracts information from the document using standard XML tools.
  • Using XML to write web pages. As the limitations of HTML for writing web pages become more keenly felt, web developers turn increasingly to the greater expressive capabilities of XML. HTML serves primarily as a destination format, whereas XML is useful both as source and destination formats. For example, an XML document can incorporate the results of database queries and then, with the help of a rendering engine such as AxKit, be transformed into a format that matches the type of client you wish to serve. You can send HTML, WML, or plain text to web browsers, wireless devices, or printers. (Or, as indicated in the previous item, you can serve the document directly to clients that understand XML.) Contrast this with HTML, which does not render well into other formats.
  • Storing XML directly. You can of course store XML itself in your database. You might store templates for documents such as form letters that you combine with customer records to produce mailings, for example.

To help you get started, this article focuses on the tasks described in the data transfer and web delivery items in the preceding list. It shows how to create XML documents from query results, how to create new database records from information contained in XML documents, and how to set up a web-based XML delivery service.

You can use XML from within any language that has the appropriate processing tools available. For example, XML APIs exist for languages such as PHP, Python, Java, and Tcl, all of which also have MySQL capabilities. This article uses Perl, another language that enjoys strong XML and MySQL API support. The examples use the Perl DBI module to interact with MySQL, in conjunction with a variety of XML processing modules. MySQL can be obtained at www.mysql.com . The Perl DBI and XML modules can be obtained from the CPAN (the Perl archive at cpan.perl.org ).

The examples shown here assume that you have a MySQL database named test on the local host, accessed through a MySQL account with a user name and password of testuser and testpass . The article also assumes that you have a basic working knowledge of XML and the Perl DBI module.

Writing Query Results as XML

 

Suppose that you want to produce XML output from a table named animal that has two string columns:

   +--------+-----------+
   | name   | category  |
   +--------+-----------+
   | snake  | reptile   |
   | frog   | amphibian |
   | tuna   | fish      |
   | racoon | mammal    |
   +--------+-----------+

The information in the table can be retrieved easily using a simple SQL statement:

   SELECT name, category FROM animal

The question is how to convert the information residing in MySQL to an XML representation. The two methods shown in the following discussion demonstrate how to write an XML document "manually" by adding the XML tags yourself, or by using one of the available Perl modules to do most of the work.

Writing XML By Adding Element Tags Yourself

One way to generate XML from the contents of the animal table is to write all the document tags explicitly using print statements. Connect to MySQL, issue the query, fetch the results, and wrap them within the appropriate tags required to produce a properly formatted XML document:

   use strict;
   use DBI;

   my $dbh = DBI->connect ("DBI:mysql:test",
                           "testuser", "testpass",
                           { RaiseError => 1, PrintError => 0});
   my $sth = $dbh->prepare ("SELECT name, category FROM animal");
   $sth->execute ();
   print "/n";
   print "/n";
   while (my ($name, $category) = $sth->fetchrow_array ())
   {
       print " /n";
       print "  $name/n";
       print "  $category/n";
       print " /n";
   }
   $dbh->disconnect ();
   print "/n";

This script produces the following XML representation of the data set, where the root element contains a element for each row in the table and each row contains an element per column:

   
   
    
     snake
     reptile
    
    
     frog
     amphibian
    
    
     tuna
     fish
    
    
     racoon
     mammal
    
   

This script has the advantage of being simple to write, and thus can be implemented relatively quickly. However, it also has some specific disadvantages:

  • Knowledge of the table structure is built in, such as the column names to use for element tags within the rows. This means the code could not be used for a different query without modifying the processing loop.
  • The script does not encode any special characters that might occur within data values (such as '< ' or '& '). The animal table does not have any, but another table might, in which case any output generated from it would be malformed.

The script could be rewritten to be less query-specific and to perform encoding, but an easier approach is to use existing Perl modules that do the work for you.

Writing XML By Using a Utility Module

A number of Perl modules are available for writing XML documents; the example shown in this section uses XML::Generator::DBI . This module is designed to work in concert with DBI, which makes it especially convenient for writing scripts that fetch information from MySQL or other databases. The API for XML::Generator::DBI consists of two methods:

  • $gen = XML::Generator::DBI->new( arguments );
    new() creates a new generator object. This method requires that you first open a connection to MySQL to get a database handle, and that you supply a handler object that understands the SAX (Simple API for XML) protocol. Pass the database handle and the SAX object to new() to obtain an object to use for executing queries.
  • $gen->execute( query );
    The execute() method issues the query and writes the results as an XML document. The generator uses the database handle to read information from the database, and it posts SAX events to the SAX object to write the information in XML format.

The following script shows how to use XML::Generator::DBI to convert the contents of the animal table to XML. The SAX handler is obtained from the XML::Hander::YAWriter (yet another writer) module.

   use strict;
   use DBI;
   use XML::Generator::DBI;
   use XML::Handler::YAWriter;

   my $dbh = DBI->connect ("DBI:mysql:test",
                           "testuser", "testpass",
                           { RaiseError => 1, PrintError => 0});
   my $out = XML::Handler::YAWriter->new (AsFile => "-");
   my $gen = XML::Generator::DBI->new (
                                   Handler => $out,
                                   dbh => $dbh
                               );
   $gen->execute ("SELECT name, category FROM animal");
   $dbh->disconnect ();

This example involves about the same amount of code as the one in the previous section, but it's more general. For example, to generate XML for a different query, all you have to do is change the argument to the execute() call. The XML that this second script generates (shown below) is somewhat different than for the preceding example. Compare it the XML document shown earlier:

   
   
    

      reptile
     
     
      frog
      amphibian
     
     
      tuna
      fish
     
     
      racoon
      mammal
     
    
   

This output differs in the following ways:

  • The tag includes an encoding attribute specifying the UTF-8 character set. If the table had contained any characters that lie outside this set, XML::Generator::DBI would convert them to base64 encoding automatically.
  • The document root element is rather than . You could change this if desired by providing a RootElement argument when creating the generator object:
       my $gen = XML::Generator::DBI->new (
                                       Handler => $out,
                                       dbh => $dbh,
                                       RootElement => "dataset"
                                   );
  • The elements are placed within a

The XML::Generator::DBI and XML::Handler::YAWriter modules have other options you can use to modify their behavior. For more information, read their documentation using the perldoc command:

   % perldoc XML::Generator::DBI

   % perldoc XML::Handler::YAWriter

Reading XML Documents into MySQL

The previous section showed how to convert query results to XML for use by other applications. This section illustrates how to go in the opposite direction, that is, how to extract records from an XML document and insert them into MySQL. This task generally requires that you know something about the structure of the document and the table, so that you can determine the correspondence between document elements and table columns. The examples assume that you want to process an XML document, animal.xml , that contains new records to be added to the animal table. The records are contained within elements, each of which includes elements for the columns in the record:

   
   
    

      reptile
     
     
      frog
      amphibian
     
     
      tuna
      fish
     
     
      racoon
      mammal
     
    
   

You also need a module that implements some kind of parsing mechanism. The most popular Perl parser is XML::Parser , a module that you can use directly, or indirectly through one of the higher-level parser modules that are built on top of it. Higher-level parsers implement a variety of approaches. Some modules convert the entire document to an in-memory structure. For example, XML:DOM produces a structure that conforms to the Document Object Model standard. If you use such a module, you can iterate through the structure that it produces to access each row's contents. Other modules, such as those that use SAX-based parsers, implement a streaming approach that returns elements of the document as they are encountered. Using an in-memory structure may be more convenient in some ways (particularly if you want to check relationships between column values), but if you're working with large amounts of data, a streaming approach that does not require holding the entire document in memory at once may be preferable.

The examples in this section shows how to read animal.xml two ways, first by using XML::Parser directly, then by using XML::XPath, a module that (like XML::DOM ) holds the document in memory.

Reading XML with XML::Parser

 

XML::Parser supports a variety of ways to parse XML documents. The following example uses its Handler interface, in which you create a parser object and register callback functions to be invoked when the parser encounters opening and closing element tags, or text within the body of elements. The main part of the parsing script sets up a %row hash that contains a member for each column name in the animal table, connects to MySQL, and creates a parser object. Then it parses the input file, which causes the handler functions to be called at appropriate places in the file:

   use strict;
   use DBI;
   use XML::Parser;

   # create hash to hold values for expected column names
   my %row = ("name" => undef, "category" => undef);

   # connect to database and create parser object
   my $dbh = DBI->connect ("DBI:mysql:test",
                           "testuser", "testpass",
                           { RaiseError => 1, PrintError => 0});
   my $parser = new XML::Parser (
                           Handlers => {
                               Start => /&handle_start,
                               End   => /&handle_end,
                               Char  => /&handle_text
                           }
                       );

   # parse file and disconnect
   $parser->parsefile ("animal.xml");
   $dbh->disconnect ();

The callback functions coordinate to recognize the start and end of rows and to extract column values contained within each row. They work together using the following logic:

  • When a element begins, handle_start() empties the %row hash in preparation for collecting a new set of column values:
       sub handle_start
       {
       my ($p, $tag) = @_;     # parser, tag name
    
           if ($tag eq "row")
           {
               foreach my $key (keys (%row))
               {
                   $row{$key} = undef;
               }
           }
       }
    
  • When the parser finds text data, handle_text() saves it in a %row hash member if the current element corresponds to one of the column names in the animal table (name , category ):
       sub handle_text
       {
       my ($p, $data) = @_;        # parser, text
    
           my $tag = $p->current_element ();
           $row{$tag} .= $data if exists ($row{$tag});
       }
    
    The reason that handle_text() concatenates the text to the current value of the hash member is that XML parsers do not necessarily return all the text for an XML element at once. (An implementation is allowed to return the text a character at a time, for example.)
  • When a element ends, handle_end() uses the values collected in the %row hash to construct and issue an INSERT statement that creates a new record:
       sub handle_end
       {
       my ($p, $tag) = @_;     # parser, tag name
    
           if ($tag eq "row")
           {
               my $str;
               # construct column assignments for INSERT statement
               foreach my $key (keys (%row))
               {
                   $str .= "," if $str;
                   $str .= "$key=" . $dbh->quote($row{$key});
               }
               $dbh->do ("INSERT INTO animal SET $str");
           }
       }
    
    handle_end() creates INSERT statements that look like this:
       INSERT INTO animal SET name='name_val
    ',category='category_val
    '
    
    The quote() function escapes any special characters in the data values, to make them legal for inclusion in a SQL query string.

The preceding example only scratches the surface of the ways you can interact with the XML::Parser module. For more information, read its documentation with the perldoc command:

   % perldoc XML::Parser

Reading XML with XML::XPath

The XML::Parser example just shown represents a low-level approach to XML input processing that has the advantage of placing a low memory burden on your script. But it isn't necessarily very easy to understand, due to the fragmentation of different parsing tasks into separate handler functions. If you're willing to hold the document in memory in exchange for being able to use a method that that is conceptually more intuitive, consider XML::XPath . This module implements the XPath specification, which allows you to specify absolute or relative element paths to the parts of the document you're interested in. For example, the absolute path /database/select/row selects elements that are reached specifically by traveling through and The first couple of lines are the HTTP response headers that indicate the length of the response and the response type. This is followed by a blank line to separate the headers from the body, and then the body itself, which is an XML document containing the requested information. We'll assume that clients know that they should look for and tags within a element to get the information. For a more sophisticated application, you might want to define and publish a document type definition (DTD) that describes how clients should interpret documents that the application produces.

The animserv.pl script is implemented as shown below. It obtains the name parameter supplied by the client, constructs a query to look for the appropriate record, and generates a web response that includes the appropriate headers:

   use strict;
   use CGI qw(param header);
   use DBI;
   use XML::Generator::DBI;
   use XML::Handler::YAWriter;

   # get animal name from client; exit if none found
   my $name = param ("name");
   exit (0) unless defined ($name);

   # run query to look for given animal, generating result
   # as a string so the length can be determined
   my $dbh = DBI->connect ("DBI:mysql:test",
                           "testuser", "testpass",
                           { RaiseError => 1, PrintError => 0});
   my $out = XML::Handler::YAWriter->new (AsString => 1);
   my $gen = XML::Generator::DBI->new (
                                   Handler => $out,
                                   dbh => $dbh
                               );
   $name = $dbh->quote ($name);
   my $doc = $gen->execute (
           "SELECT name, category FROM animal
           WHERE name = $name"
           );
   $dbh->disconnect ();

   # generate type and length headers, then print document
   print header (-Content_Type => "text/xml",
                   -charset => "UTF-8",
                   -Content_Length => length ($doc));
   print $doc;

The animserv.pl script is similar in many ways to the earlier example that used XML::Generate::DBI to produce an XML document. The primary differences are:

  • animserv.pl must determine what information the client wants, based on the value of the name parameter that indicates the desired animal. The script uses the param() function from the CGI.pm module to get this value.
  • The script returns a data set that contains only part of the animal table, so the execute() call issues a query that includes a WHERE clause to indicate which record to retrieve. Note that the $name value is converted using quote() before it's inserted into the query string. It's dangerous to include client input directly into queries (someone may attempt to break the script by passing something nasty), so animserv.pl sanitizes the animal name value by escaping any special characters in the value.
  • The XML document that is sent to the client must be preceded by HTTP headers that tell the client the length and type of the response. The length is obtained by passing the SAX handler an AsString argument to cause it to return the document as a string rather than printing it immediately. The length of the string becomes the value used in the Content-Length: header. The script prints the HTTP headers by invoking header() , another function from the CGI.pm module.

The animserv.pl application is very simple, but could be modified to act as the basis for a variety of information servers. For example, if you have a dictionary of words and meanings stored in MySQL, a few minor changes to the application would allow you to set up a dictionary server to which clients submit words and from which they receive XML-format definitions in response.

Resources

This article illustrates some of the ways that MySQL can be used in applications that process XML documents, but the XML modules used in the example scripts represent only a few of the many available to you. To see others, visit the CPAN at:

   http://cpan.perl.org/



For more information on using XML from within Perl scripts, check out the Perl & XML column at this web site; it contains a very helpful series of articles:

   http://xml.com/



If you're interested in more ways to use the Perl DBI module for web programming, see my book MySQL and Perl for the Web (New Riders). The book's companion web site is:

   http://www.kitebird.com/mysql-perl/



Acknowledgment

The original version of this article was written for NuSphere Corporation. The current version is an updated revision of the original.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值