![Parsing pages with XPath](https://i-blog.csdnimg.cn/blog_migrate/bd5a2487f0676b6fef61cf89872a20e7.png)
Parsing pages with XPath. Today I will show you how to build parsers for remote HTML pages in PHP by running XPath queries against them. XPath is a query language for selecting elements of an XML or XHTML document, so to obtain the data we need, we only have to write the right query. For this work we also need the Mozilla Firefox browser with the Firebug and Firepath plugins. For our experiment I suggest the Google Sci/Tech News page, but of course you can choose any other web page.
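In PHP, the overall pattern can be sketched with the built-in DOM extension: load the HTML into `DOMDocument`, wrap it in `DOMXPath`, and run a query. This is a minimal sketch, not the code from the downloadable package; the helper name `xpathTexts` and the sample markup are hypothetical, and fetching a remote page with `file_get_contents()` assumes `allow_url_fopen` is enabled on your server.

```php
<?php
// Hypothetical helper: run an XPath query against an HTML string
// and return the trimmed text content of every matching node.
function xpathTexts(string $html, string $query): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings internally
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($query); // false if the expression is invalid

    $texts = [];
    if ($nodes !== false) {
        foreach ($nodes as $node) {
            $texts[] = trim($node->textContent);
        }
    }
    return $texts;
}

// For a remote page, fetch the markup first (requires allow_url_fopen):
// $html = file_get_contents('https://example.com/');
$html = '<html><body><p class="lead">Hello</p><p>World</p></body></html>';
print_r(xpathTexts($html, "//p[@class='lead']"));
```

Separating fetching from querying keeps the parsing logic testable on a saved copy of the page, which is handy while you are still refining the query in Firepath.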
Here is the downloadable package:
Download package
OK, let's start. First, make sure that both plugins are installed in your browser. Then open the news page (Google Sci/Tech News). Right-click any description text, for example inside the 'A Samsung Electronics Co. Galaxy S smartphone, top…' text, and select 'Inspect in Firepath' from the popup menu.
![firepath - step 1](https://i-blog.csdnimg.cn/blog_migrate/6789a271ff786e0983f794f4b9d7720b.png)
As a result, we will see the following:
![firepath - step 2](https://i-blog.csdnimg.cn/blog_migrate/e2d03b8f6e4a952d9267ac46ddac53d4.png)
Note the XPath: `.//*[@id='top-stories']/div[2]/div[3]/div[1]`
The matched element is highlighted with a dashed line. After removing all unnecessary indexes and making a few small corrections, we get the following query, which is no longer tied to one particular story: `.//*[@id='top-stories']/div/div[@class='body']/div[1]`
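To see why dropping the indexes matters, the sketch below runs both queries with `DOMXPath` against a small hypothetical fragment. The markup is invented to mimic the nesting the queries imply (a `top-stories` container holding story `div`s, each with a `div class="body"` as its third child); the real Google News page may have been structured differently.

```php
<?php
// Hypothetical markup that only mimics the nesting implied by the queries.
$html = <<<HTML
<html><body>
<div id="top-stories">
  <div class="story">
    <div class="title">Story A</div>
    <div class="source">Source A</div>
    <div class="body"><div>Description A</div><div>Related A</div></div>
  </div>
  <div class="story">
    <div class="title">Story B</div>
    <div class="source">Source B</div>
    <div class="body"><div>Description B</div><div>Related B</div></div>
  </div>
</div>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate imperfect HTML quietly
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Query as copied from Firepath: positional indexes pin it to one element
// (2nd child div of #top-stories, its 3rd child div, that div's 1st child div).
$positional = ".//*[@id='top-stories']/div[2]/div[3]/div[1]";

// Cleaned-up query: every child div of #top-stories, its div with
// class "body", and that div's first child div.
$cleaned = ".//*[@id='top-stories']/div/div[@class='body']/div[1]";

foreach (['positional' => $positional, 'cleaned' => $cleaned] as $label => $query) {
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    echo $label, ': ', implode(', ', $texts), PHP_EOL;
}
```

On this sample, the positional query returns only "Description B", while the cleaned one returns both descriptions. Note that `[@class='body']` compares the whole attribute value, so an element with `class="body extra"` would not match; `contains(concat(' ', normalize-space(@class), ' '), ' body ')` is the usual XPath 1.0 workaround.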