Parsing XML with SimpleXML

culh2177

Original article: https://www.sitepoint.com/parsing-xml-with-simplexml/

Parsing XML essentially means navigating through an XML document and returning the relevant data. An increasing number of web services return data in JSON format, but a large number still return XML, so you need to master parsing XML if you really want to consume the full breadth of APIs available.

PHP’s SimpleXML extension, introduced back in PHP 5.0, makes working with XML very easy. In this article I’ll show you how.

Basic Usage

Let’s start with the following sample as languages.xml:

<?xml version="1.0" encoding="utf-8"?>
<languages>
 <lang name="C">
  <appeared>1972</appeared>
  <creator>Dennis Ritchie</creator>
 </lang>
 <lang name="PHP">
  <appeared>1995</appeared>
  <creator>Rasmus Lerdorf</creator>
 </lang>
 <lang name="Java">
  <appeared>1995</appeared>
  <creator>James Gosling</creator>
 </lang>
</languages>

The above XML document encodes a list of programming languages, giving two details about each language: its year of implementation and the name of its creator.

The first step is to load the XML using either simplexml_load_file() or simplexml_load_string(). As you might expect, the former loads the XML from a file and the latter loads it from a given string.

<?php
$languages = simplexml_load_file("languages.xml");
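
The string-based loader works the same way. The following is a minimal sketch, assuming the XML arrives as a string (the inline document is a hypothetical stand-in for an API response); it also shows one possible way to check for parse failures, since both loader functions return false when the XML is not well formed.

```php
<?php
// Hypothetical inline document standing in for XML received from an API.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<languages>
 <lang name="PHP">
  <appeared>1995</appeared>
  <creator>Rasmus Lerdorf</creator>
 </lang>
</languages>
XML;

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$languages = simplexml_load_string($xml);
if ($languages === false) {
    // Report every parse error that libxml recorded, then stop.
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), PHP_EOL;
    }
    libxml_clear_errors();
    exit(1);
}

echo $languages->lang[0]->creator, PHP_EOL;
```

Checking the return value before touching the object avoids a fatal error further down when a service sends back something that is not valid XML.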

Both functions read the entire document tree into memory and return a SimpleXMLElement object representing it (or false if the XML cannot be parsed). In the above example, the object is stored in the $languages variable. You can then use var_dump() or print_r() to inspect the returned object if you like.

SimpleXMLElement Object
(
    [lang] => Array
        (
            [0] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => C
                        )
                    [appeared] => 1972
                    [creator] => Dennis Ritchie
                )
            [1] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => PHP
                        )
                    [appeared] => 1995
                    [creator] => Rasmus Lerdorf
                )
            [2] => SimpleXMLElement Object
                (
                    [@attributes] => Array
                        (
                            [name] => Java
                        )
                    [appeared] => 1995
                    [creator] => James Gosling
                )
        )
)

The XML contains a root languages element which wraps three lang elements, which is why the SimpleXMLElement has the public property lang, shown here as an array of three SimpleXMLElement objects. Each element of the array corresponds to a lang element in the XML document.

You can access the properties of the object in the usual way with the -> operator. For example, $languages->lang[0] will give you a SimpleXMLElement object which corresponds to the first lang element. This object then has two public properties: appeared and creator.

<?php
echo $languages->lang[0]->appeared; // 1972
echo $languages->lang[0]->creator;  // Dennis Ritchie

Iterating through the list of languages and showing their details can be done very easily with standard looping methods, such as foreach.

<?php
foreach ($languages->lang as $lang) {
    printf(
        "<p>%s appeared in %d and was created by %s.</p>",
        $lang["name"],
        $lang->appeared,
        $lang->creator
    );
}

Notice that I accessed the lang element’s name attribute to retrieve the name of the language. You can access any attribute of an element represented as a SimpleXMLElement object using array notation like this.
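
One detail worth illustrating: attribute and element values come back as SimpleXMLElement objects rather than plain strings. The sketch below, which uses a hypothetical one-element document with an invented paradigm attribute, shows how you might cast them and how attributes() can be used to iterate over all attributes of an element.

```php
<?php
// Hypothetical single element; the "paradigm" attribute is made up for illustration.
$lang = simplexml_load_string(
    '<lang name="PHP" paradigm="imperative"><appeared>1995</appeared></lang>'
);

// Attribute and element values are SimpleXMLElement objects, not plain
// strings, so cast them before comparing or storing them.
$name = (string) $lang["name"];
$year = (int) $lang->appeared;

// attributes() lets you iterate over every attribute of the element.
$attributes = [];
foreach ($lang->attributes() as $key => $value) {
    $attributes[$key] = (string) $value;
}

var_dump($name === "PHP");
var_dump($year);
print_r($attributes);
```

Casting early is a design choice rather than a requirement: printf() and echo convert the objects for you, but strict comparisons and array keys do not.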

Dealing With Namespaces

Many times you’ll encounter namespaced elements while working with XML from different web services. Let’s modify our languages.xml example to reflect the usage of namespaces:

<?xml version="1.0" encoding="utf-8"?>
<languages
 xmlns:dc="http://purl.org/dc/elements/1.1/">
 <lang name="C">
  <appeared>1972</appeared>
  <dc:creator>Dennis Ritchie</dc:creator>
 </lang>
 <lang name="PHP">
  <appeared>1995</appeared>
  <dc:creator>Rasmus Lerdorf</dc:creator>
 </lang>
 <lang name="Java">
  <appeared>1995</appeared>
  <dc:creator>James Gosling</dc:creator>
 </lang>
</languages>

Now the creator element is placed under the namespace prefix dc, which is bound to the URI http://purl.org/dc/elements/1.1/. If you try to print the creator of a language using our previous technique, it won’t work: $lang->creator comes back empty because the element is no longer among the non-namespaced children. In order to read namespaced elements like this you need to use one of the following approaches.

The first approach is to use the namespace URI directly in your code when accessing namespaced elements. The following example demonstrates how:

<?php
$dc = $languages->lang[1]->children("http://purl.org/dc/elements/1.1/");
echo $dc->creator;

The children() method takes a namespace and returns the children of the element that belong to it. It accepts two arguments: the first is the XML namespace and the second is an optional Boolean which defaults to false. If you pass true, the first argument is treated as a prefix rather than the actual namespace URI.
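
To make the second argument concrete, here is a small sketch, assuming a trimmed-down inline copy of the namespaced document, that looks up the same element by prefix and shows what a call without any namespace returns.

```php
<?php
// Trimmed-down inline copy of the namespaced sample document.
$xml = <<<XML
<languages xmlns:dc="http://purl.org/dc/elements/1.1/">
 <lang name="PHP">
  <appeared>1995</appeared>
  <dc:creator>Rasmus Lerdorf</dc:creator>
 </lang>
</languages>
XML;

$languages = simplexml_load_string($xml);

// Second argument true: "dc" is interpreted as a prefix, not as a URI.
$dc = $languages->lang[0]->children("dc", true);
echo $dc->creator, PHP_EOL;

// Without a namespace argument only non-namespaced children are visible,
// so creator looks empty while appeared is still reachable.
$plain = $languages->lang[0]->children();
var_dump((string) $plain->creator === "");
echo $plain->appeared, PHP_EOL;
```

The prefix form is shorter, but the prefix is chosen by whoever wrote the document, while the URI is the stable identifier. Matching on the URI is therefore likely the more robust option when you do not control the feed.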

The second approach is to read the namespace URI from the document and use it while accessing namespaced elements. This is actually a cleaner way of accessing elements because you don’t have to hardcode the URI.

<?php
$namespaces = $languages->getNamespaces(true);
$dc = $languages->lang[1]->children($namespaces["dc"]);

echo $dc->creator;

The getNamespaces() method returns an array of namespace prefixes with their associated URIs. It accepts an optional Boolean parameter which defaults to false. If you set it to true, the method returns the namespaces used in the node and in its child nodes. Otherwise, it returns only the namespaces used by the node itself.
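
The difference between the two modes is easy to miss, so here is a brief sketch, again assuming a trimmed-down inline copy of the sample document. It also calls getDocNamespaces(), which is not covered in this article and reports declared rather than used namespaces, purely for comparison.

```php
<?php
// Trimmed-down inline copy of the namespaced sample document.
$xml = <<<XML
<languages xmlns:dc="http://purl.org/dc/elements/1.1/">
 <lang name="PHP">
  <appeared>1995</appeared>
  <dc:creator>Rasmus Lerdorf</dc:creator>
 </lang>
</languages>
XML;

$languages = simplexml_load_string($xml);

// Non-recursive: only namespaces used by the root element itself.
$rootOnly = $languages->getNamespaces();

// Recursive: also namespaces used by descendant elements.
$all = $languages->getNamespaces(true);

// getDocNamespaces() reports declared namespaces instead of used ones.
$declared = $languages->getDocNamespaces();

print_r($rootOnly);
print_r($all);
print_r($declared);
```

Because the root element itself uses no namespace, the non-recursive call comes back empty here, which is why the examples in this article pass true.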

Now you can iterate through the list of languages like so:

<?php
$languages = simplexml_load_file("languages.xml");
$ns = $languages->getNamespaces(true);

foreach($languages->lang as $lang) {
    $dc = $lang->children($ns["dc"]);
    printf(
        "<p>%s appeared in %d and was created by %s.</p>",
        $lang["name"],
        $lang->appeared,
        $dc->creator
    );
}

A Practical Example – Parsing YouTube Video Feed

Let’s walk through an example that retrieves the RSS feed from a YouTube channel and displays links to all of the videos in it. For this we need to make a call to the following URL:

http://gdata.youtube.com/feeds/api/users//uploads

The URL returns a list of the latest videos from the given channel in XML format. We’ll parse the XML and get the following pieces of information for each video:

Video URL
Thumbnail
Title

We’ll start out by retrieving and loading the XML:

<?php
$channel = "channelName";
$url = "http://gdata.youtube.com/feeds/api/users/" . $channel . "/uploads";
$xml = file_get_contents($url);

$feed = simplexml_load_string($xml);
$ns = $feed->getNamespaces(true);

If you take a look at the XML feed you can see there are several entry elements, each of which stores the details of a specific video from the channel. We are concerned only with the thumbnail image, video URL, and title. These three elements are children of group, which is a child of entry:

<entry>
   …
   <media:group>
      …
      <media:player url="video url"/>
      <media:thumbnail url="thumbnail url" height="height" width="width"/>
      <media:title type="plain">Title…</media:title>
      …
   </media:group>
   …
</entry>

We simply loop through all the entry elements, and for each one we can extract the relevant information. Note that player, thumbnail, and title are all under the media namespace. So, we need to proceed like the earlier example. We get the namespaces from the document and use the namespace while accessing the elements.

<?php
foreach ($feed->entry as $entry) {
    // Children of <entry> that live in the media namespace.
    $group = $entry->children($ns["media"])->group;

    // Use the second thumbnail of the group.
    $thumbnail_attrs = $group->thumbnail[1]->attributes();
    $image = $thumbnail_attrs["url"];

    // The video URL is the url attribute of <media:player>.
    $player = $group->player->attributes();
    $link = $player["url"];

    $title = $group->title;

    printf('<p><a href="%s"><img src="%s" alt="%s"></a></p>',
            $link, $image, $title);
}
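
To try the loop without a network call, the following self-contained sketch runs the same steps against a hypothetical, heavily trimmed feed. The example.com URLs, the sample title, and the $videos array are assumptions made for illustration; only the element names and the media namespace URI follow the fragment shown earlier. It also escapes the values before printing them, which the loop above leaves out.

```php
<?php
// Hypothetical, heavily trimmed feed modelled on the fragment shown earlier.
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:media="http://search.yahoo.com/mrss/">
 <entry>
  <media:group>
   <media:player url="http://example.com/watch?v=1"/>
   <media:thumbnail url="http://example.com/1-small.jpg" height="90" width="120"/>
   <media:thumbnail url="http://example.com/1-large.jpg" height="360" width="480"/>
   <media:title type="plain">First &amp; foremost</media:title>
  </media:group>
 </entry>
</feed>
XML;

$feed = simplexml_load_string($xml);
$ns = $feed->getNamespaces(true);

// Collect plain strings first so the data is easy to reuse or test.
$videos = [];
foreach ($feed->entry as $entry) {
    $group = $entry->children($ns["media"])->group;
    $videos[] = [
        "link"  => (string) $group->player->attributes()["url"],
        "image" => (string) $group->thumbnail[1]->attributes()["url"],
        "title" => (string) $group->title,
    ];
}

foreach ($videos as $video) {
    // Escape values before placing them into HTML.
    printf(
        '<p><a href="%s"><img src="%s" alt="%s"></a></p>' . PHP_EOL,
        htmlspecialchars($video["link"]),
        htmlspecialchars($video["image"]),
        htmlspecialchars($video["title"])
    );
}
```

Separating extraction from output is one possible design; it keeps the namespace handling in a single place and makes the markup step independent of SimpleXML.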

Conclusion

Now that you know how to use SimpleXML to parse XML data, you can improve your skills by parsing different XML feeds from various APIs. But an important point to consider is that SimpleXML reads the entire DOM into memory, so if you are parsing large data sets then you may face memory issues. In those cases it’s advisable to use something other than SimpleXML, preferably an event-based parser such as XML Parser. To learn more about SimpleXML, check out its documentation.
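
For comparison, an event-based approach might look like the sketch below. It uses PHP's XML Parser extension on a small hypothetical document and collects the name attribute of each lang element through a callback instead of building a tree. The $names array and the inline XML are assumptions made for illustration.

```php
<?php
// Small hypothetical document; a real use case would stream a large file.
$xml = <<<XML
<languages>
 <lang name="C"><appeared>1972</appeared></lang>
 <lang name="PHP"><appeared>1995</appeared></lang>
</languages>
XML;

$names = [];

$parser = xml_parser_create();
// Keep element and attribute names in their original case.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Called for every opening tag; $attrs holds its attributes.
    function ($parser, string $tag, array $attrs) use (&$names) {
        if ($tag === "lang") {
            $names[] = $attrs["name"];
        }
    },
    // Called for every closing tag; nothing to do in this sketch.
    function ($parser, string $tag) {
    }
);

// xml_parse() can be called repeatedly with chunks read from a stream;
// passing true marks the final chunk of the document.
if (xml_parse($parser, $xml, true) !== 1) {
    echo xml_error_string(xml_get_error_code($parser)), PHP_EOL;
}

print_r($names);
```

The callbacks only ever see one tag at a time, so memory use stays flat regardless of document size, at the cost of tracking any context you need yourself.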

Translated from: https://www.sitepoint.com/parsing-xml-with-simplexml/