Reading XML Namespaces Using PHP Without Regex

cunchi8090

Original article: https://www.experts-exchange.com/articles/11342/Reading-XML-Namespaces-using-PHP-Without-regex.html

There are a number of people out there who will tell you that the only way to parse an RSS feed containing namespaces is to use regular expressions. They are wrong and, frankly, should know better. In this essay I am going to show you how to parse such a feed using standard PHP libraries. Why namespaces are used in XML files is outside the scope of this document; I am just going to show you how to read them. This article assumes you already know how to code in PHP but are having difficulty extracting data from an RSS feed.

Standard PHP has many functions for dealing with XML. I am going to use SimpleXML because I find it the easiest. Other extensions give you more control over the information gathered from the feed, but when all you want to do is read the content of an RSS feed, SimpleXML will do everything you need. I am using a specific feed that conforms to the standards, but the methods discussed in this article can be applied to any feed.
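Before reading anything out of a feed, it helps to know whether the XML parsed at all. The following is a minimal sketch, not part of the original article: `loadFeed` is a hypothetical helper name, and the inline strings stand in for a downloaded feed so that the example runs without network access.

```php
<?php
// Hypothetical helper: parse an XML string with SimpleXML and
// return null (instead of emitting warnings) when parsing fails.
function loadFeed(string $contents): ?SimpleXMLElement
{
    // Collect libxml errors internally rather than as PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $xml = simplexml_load_string($contents);

    if ($xml === false) {
        // Log each parser message, then clear the error buffer.
        foreach (libxml_get_errors() as $error) {
            error_log(trim($error->message));
        }
        libxml_clear_errors();
    }

    // Restore whatever error handling mode was active before.
    libxml_use_internal_errors($previous);

    return $xml === false ? null : $xml;
}

// Inline samples used for illustration only.
$feed   = loadFeed('<rss version="2.0"><channel><title>Demo</title></channel></rss>');
$broken = loadFeed('<rss><channel>');

echo $feed === null ? 'feed failed' : 'feed title: ' . $feed->channel->title;
echo PHP_EOL;
echo $broken === null ? 'broken input rejected' : 'broken input parsed';
echo PHP_EOL;
```

With a live feed you would pass the result of `file_get_contents($url)` to the helper, after checking that the download itself did not return `false`.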

The supposed problem.

This is the source of a genuine RSS feed. It contains the usual suspects: channel, title, description, item and so on. It also contains namespaces and values stored as attributes.

<?xml version="1.0"?>
<rss version="2.0"
     xmlns:media="http://search.yahoo.com/mrss/" 
     xmlns:dcterms="http://purl.org/dc/terms/" 
     xmlns:pbscontent="http://www.pbs.org/rss/pbscontent/" 
     xmlns:pbsvideo="http://www.pbs.org/rss/pbsvideo/" >
<channel>
    <title>The Local Show | PBS Video</title>
    <description>The Local Show RSS feed for PBS programming.</description>
    <link>http://video.pbs.org</link><language>en-us</language>
    <generator>http://video.pbs.org</generator>
    <lastBuildDate>Fri, 15 Mar 2013 10:16:35 -0400</lastBuildDate>
    <pubDate>Fri, 15 Mar 2013 10:16:35 -0400</pubDate>
    <item>
        <title>The Local Show | KC Makers, Celebrating Extraordinary Women</title>
        <link>http://video.pbs.org/video/2338801013/</link>
        <description>This week, we celebrate the achievements of just a few of the extraordinary women who live in the Metro.</description>
        <guid>http://video.pbs.org/video/2338801013/</guid>
        <pubDate>02/25/2013</pubDate>
        <media:description>The Local Show celebrates Kansas City&#39;s Makers: Women Who Make America.</media:description>
        <media:content medium="video" duration="1611000" />
        <media:thumbnail url="http://pbs.merlin.cdn.prod.s3.amazonaws.com/Video%20Asset/KCPT/local-show/70943/images/567745_ThumbnailCOVEDefault_20130225174212.jpg.resize.142x80.jpg"
            type="image/jpeg" height="60" width="142" />
        <media:rating scheme="urn:v-chip">nr</media:rating>
        <media:player url="http://video.pbs.org/video/2338801013/" />
        <category domain="PBS/taxonomy/topic">Arts &amp; Entertainment</category>
        <media:category scheme="http://www.pbs.org/rss/pbscontent/taxonomy/topic">Arts &amp; Entertainment</media:category>
        <category domain="PBS/taxonomy/topic">Culture &amp; Society</category>
        <media:category scheme="http://www.pbs.org/rss/pbscontent/taxonomy/topic">Culture &amp; Society</media:category>
        <category domain="PBS/taxonomy/topic">Health</category>
        <media:category scheme="http://www.pbs.org/rss/pbscontent/taxonomy/topic">Health</media:category>
        <category domain="PBS/taxonomy/topic">News &amp; Public Affairs</category>
        <media:category scheme="http://www.pbs.org/rss/pbscontent/taxonomy/topic">News &amp; Public Affairs</media:category>
        <category domain="PBS/taxonomy/topic">Parents</category>
        <media:category scheme="http://www.pbs.org/rss/pbscontent/taxonomy/topic">Parents</media:category>
        <category domain="PBS/taxonomy/topic">Technology</category>
        <media:category scheme="http://www.pbs.org/rss/pbscontent/taxonomy/topic">Technology</media:category>
        <pbsvideo:content_type>Episode</pbsvideo:content_type>
    </item>
[...Many more items]
    </channel>
</rss>

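Just so there is no misunderstanding: for the purposes of this article, a namespaced element is an XML tag whose name contains a colon ':'. The part before the colon is a prefix, which an xmlns: declaration on the <rss> tag binds to a namespace URI. In this example the first namespaced element is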

<media:description></media:description>

This has the value

The Local Show celebrates Kansas City's Makers: Women Who Make America.

The element <media:content> contains no data, but it does have two attributes, medium and duration:

<media:content medium="video" duration="1611000" />
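Each prefix, such as media:, is shorthand for the URI declared on the <rss> tag, and SimpleXML can report that mapping. The sketch below is an illustration only: it uses a cut-down inline sample rather than the live feed, and the variable names are my own.

```php
<?php
// A cut-down stand-in for the feed, with one namespaced element.
$xmlString = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Demo</title>
      <media:description>Namespaced text</media:description>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($xmlString);

// Passing true asks for namespaces used by descendant elements too,
// not only by the root element itself.
$namespaces = $xml->getNamespaces(true);

// Print each prefix next to the URI it stands for.
foreach ($namespaces as $prefix => $uri) {
    echo $prefix . ' => ' . $uri . PHP_EOL;
}
```

The returned array is keyed by prefix, which is why an expression such as `$ns["media"]` yields the URI that `children()` expects.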

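// Fetch the feed and parse it with SimpleXML
$url = "http://video.pbs.org/program/local-show/rss/";

$contents = file_get_contents($url);

$xml = simplexml_load_string($contents);

// Walk every <item> and print its children. Only elements without a
// namespace prefix are visited; <media:...> and <pbsvideo:...> are skipped.
foreach ($xml->channel as $channel)
{
    foreach ($channel->item as $item)
    {
        foreach ($item as $feed)
        {
            echo "The feed : " . $feed . "<br />";
        }
    }
}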
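The loop above prints only the un-prefixed children of each <item>. A small self-contained way to see the difference, assuming an inline sample instead of the live feed (the variable names are illustrative, not from the original article):

```php
<?php
// One un-prefixed child and one child in the media: namespace.
$item = simplexml_load_string(
    '<item xmlns:media="http://search.yahoo.com/mrss/">'
    . '<title>Demo</title>'
    . '<media:rating scheme="urn:v-chip">nr</media:rating>'
    . '</item>'
);

// Plain iteration: collects the names of children without a namespace.
$plainNames = [];
foreach ($item as $element) {
    $plainNames[] = $element->getName();
}

// Iteration over children() with the namespace URI: collects the
// names of children that belong to that namespace.
$mediaNames = [];
foreach ($item->children('http://search.yahoo.com/mrss/') as $element) {
    $mediaNames[] = $element->getName();
}

echo 'plain: ' . implode(', ', $plainNames) . PHP_EOL;
echo 'media: ' . implode(', ', $mediaNames) . PHP_EOL;
```

Note that `getName()` returns the local name without the prefix, so the namespaced element is reported as `rating` rather than `media:rating`.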

foreach ($xml->channel as $channel)
{
    foreach ($channel->item as $item)
    {
        $ns = $item->getNamespaces(true);  // Map of prefix => URI used inside this <item>

        $child = $item->children($ns["media"]); // Children in the "media:" namespace

        // This loop still walks only the un-prefixed children of <item>
        foreach ($item as $feed)
        {
            echo "The feed : " . $feed . "<br />";
        }
    }
}

foreach ($xml->channel as $channel)
{
    foreach ($channel->item as $item)
    {
        $ns = $item->getNamespaces(true);

        $child = $item->children($ns["media"]);

        foreach ($item as $feed)
        {
            echo "The feed : ". $feed. "<br />";
        }

        foreach ($child as $name)
        {
            echo "the name space : ".$name ."<br />"; //Output namespace values
        }
    }
}
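The listing above looks the namespace URI up through `getNamespaces()`. As a sketch of two alternatives, `children()` can also be given the URI directly, or the prefix itself when its second argument is `true`. The sample XML is inline and the variable names are assumptions made for this illustration.

```php
<?php
$item = simplexml_load_string(
    '<item xmlns:media="http://search.yahoo.com/mrss/">'
    . '<media:description>Namespaced text</media:description>'
    . '</item>'
);

// Select by namespace URI (the default meaning of the first argument).
$byUri = $item->children('http://search.yahoo.com/mrss/');

// Select by prefix: the second argument marks the first as a prefix.
$byPrefix = $item->children('media', true);

echo 'by URI:    ' . $byUri->description . PHP_EOL;
echo 'by prefix: ' . $byPrefix->description . PHP_EOL;
```

Selecting by URI is arguably the more robust choice, since a feed publisher is free to bind a different prefix to the same URI, whereas the URI identifies the namespace itself.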

// $child holds the media: children of the current <item> (set inside the loops above)
foreach ($child->content->attributes() as $attrib_name => $attrib_value)
{
    echo "the attribute name : " . $attrib_name . " the attribute value : " . $attrib_value;
}

This will echo the name and the value of each attribute. Since we already know the name of the attribute we want, which is medium, we can access its value directly:

echo "medium : ".$child->content->attributes()->medium;
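Attribute values come back as SimpleXMLElement objects, so it is often convenient to cast them before storing or calculating with them. A minimal sketch, assuming an inline sample element and illustrative variable names:

```php
<?php
$item = simplexml_load_string(
    '<item xmlns:media="http://search.yahoo.com/mrss/">'
    . '<media:content medium="video" duration="1611000" />'
    . '</item>'
);

$media = $item->children('http://search.yahoo.com/mrss/');

// Copy every attribute of <media:content> into a plain PHP array,
// casting each value to a string.
$attributes = [];
foreach ($media->content->attributes() as $name => $value) {
    $attributes[$name] = (string) $value;
}

// Cast a single attribute to an integer for later arithmetic.
$durationMs = (int) $media->content->attributes()->duration;

print_r($attributes);
echo 'duration in ms: ' . $durationMs . PHP_EOL;
```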

In this feed there are several tags called <category> and several namespaced tags called <media:category>. SimpleXML exposes repeated tags as a list that can be indexed like an array. The count() method returns the number of elements in that list, and the result can be used to address the individual tags and their attributes.

$cats = $item->category->count();

for ($i = 0; $i < $cats; $i++)
{
    echo $item->category[$i];
    echo $child->category[$i]->attributes()->scheme;
}
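The loop above relies on every <category> having a <media:category> at the same index. That holds for this feed, but as a precaution the pairing can be guarded. The following sketch uses an inline two-category sample, and the `$pairs` structure is an assumption made for illustration.

```php
<?php
$item = simplexml_load_string(
    '<item xmlns:media="http://search.yahoo.com/mrss/">'
    . '<category domain="PBS/taxonomy/topic">Health</category>'
    . '<media:category scheme="http://www.pbs.org/rss/pbscontent/taxonomy/topic">Health</media:category>'
    . '<category domain="PBS/taxonomy/topic">Parents</category>'
    . '<media:category scheme="http://www.pbs.org/rss/pbscontent/taxonomy/topic">Parents</media:category>'
    . '</item>'
);

$media = $item->children('http://search.yahoo.com/mrss/');

$pairs = [];
$total = $item->category->count();

for ($i = 0; $i < $total; $i++) {
    // Only read the scheme when a media:category exists at this index.
    $scheme = isset($media->category[$i])
        ? (string) $media->category[$i]->attributes()->scheme
        : null;

    $pairs[] = [
        'name'   => (string) $item->category[$i],
        'scheme' => $scheme,
    ];
}

foreach ($pairs as $pair) {
    echo $pair['name'] . ' link : ' . ($pair['scheme'] ?? 'n/a') . PHP_EOL;
}
```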

Here is the complete listing, which will output some of the feed data to your screen. The lack of HTML is deliberate: I am not a web designer, and line breaks are as much as I need.

<?php
$url = "http://video.pbs.org/program/local-show/rss/";

$contents = file_get_contents($url);
$xml = simplexml_load_string($contents);

foreach ($xml->channel as $channel)
{
    foreach ($channel->item as $item)
    {
        $ns = $item->getNamespaces(true);
        $child = $item->children($ns["media"]);
        
        echo "Programme Title : " . $item->title . "<br />";
        echo "Video Link : " . $item->link . "<br />";
        echo "Description : " . $item->description . "<br />";
        echo "Transmitted  : " . $item->pubDate . "<br />";

        $cats = $item->category->count();

        echo "Found in the following categories :<br />";

        for ($i = 0; $i < $cats; $i++)
        {
            echo $item->category[$i];
            echo " link : ". $child->category[$i]->attributes()->scheme . "<br/>";
        }

        echo "Rating: " . $child->rating . "<br/>";

        // Convert the duration from milliseconds to minutes
        $x = (int) $child->content->attributes()->duration / (60 * 1000);
        $m = floor($x);
        $s = number_format(($x - $m) * 60);
        
        echo "Duration :  $m minutes  $s seconds <br/><br/><br/>";
    }
}
?>
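The duration arithmetic in the listing works in floating point and rounds the seconds, so a remainder of 59.6 seconds would be displayed as 60. As an alternative sketch using integer arithmetic (`formatDuration` is a hypothetical helper, not part of the original listing, and it truncates partial seconds rather than rounding them):

```php
<?php
// Hypothetical helper: turn a duration in milliseconds into
// whole minutes and whole seconds using integer division.
function formatDuration(int $milliseconds): string
{
    $totalSeconds = intdiv($milliseconds, 1000); // drop partial seconds
    $minutes = intdiv($totalSeconds, 60);
    $seconds = $totalSeconds % 60;

    return sprintf('%d minutes %d seconds', $minutes, $seconds);
}

// The duration attribute from the sample item in this article.
$formatted = formatDuration(1611000);

echo 'Duration : ' . $formatted . PHP_EOL; // prints "Duration : 26 minutes 51 seconds"
```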

There you have it: a parsed feed with namespaces and attributes, and no regex in sight.
