Using Firebug to Inspect Web Page Elements

Latest recommended article published on 2019-08-12 02:23:16

streamind_xd

Category: Web crawlers


The previous post described using WebKit as Scrapy's Downloader. When building XPath expressions, we can use Firebug to inspect the elements of a page.

See this article: http://doc.scrapy.org/en/latest/topics/firefox.html#topics-firefox-addons

Using Firefox for scraping

Since Firefox add-ons operate on a live browser DOM, what you’ll actually see when inspecting the page source is not the original HTML, but a modified one after applying some browser clean up and executing JavaScript code. Firefox, in particular, is known for adding <tbody> elements to tables. Scrapy, on the other hand, does not modify the original page HTML, so you won’t be able to extract any data if you use <tbody> in your XPath expressions.

Therefore, you should keep in mind the following things when working with Firefox and XPath:

Disable Firefox Javascript while inspecting the DOM looking for XPaths to be used in Scrapy
Never use full XPath paths, use relative and clever ones based on attributes (such as id, class, width, etc) or any identifying features like contains(@href, 'image').
Never include <tbody> elements in your XPath expressions unless you really know what you’re doing

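Summary:

Keep the following points in mind when using Scrapy:

When inspecting the DOM to work out XPaths, disable JavaScript in Firefox.
Do not use full XPath paths; use relative ones based on attributes such as id and class.
Do not use <tbody> elements in XPath expressions.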
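
The <tbody> point can be illustrated with a minimal sketch. It assumes a made-up HTML fragment (the table id "prices" and its rows are hypothetical) and uses Python's standard-library xml.etree.ElementTree as a dependency-free stand-in for Scrapy's own selectors, so it only demonstrates the idea, not Scrapy's API. Like Scrapy, the parser sees the markup exactly as written, with no <tbody> inserted by a browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw HTML as a server might send it: the table has no <tbody>.
# A browser inspector would typically display one anyway.
RAW_HTML = """
<html>
  <body>
    <table id="prices">
      <tr><td>apple</td><td>1.20</td></tr>
      <tr><td>pear</td><td>0.80</td></tr>
    </table>
  </body>
</html>
""".strip()

root = ET.fromstring(RAW_HTML)

# Path as copied from the browser's view of the DOM, including <tbody>.
with_tbody = root.findall(".//table[@id='prices']/tbody/tr")

# Relative path anchored on the id attribute, without <tbody>.
without_tbody = root.findall(".//table[@id='prices']/tr")

# First cell of every matched row.
first_cells = [row.find("td").text for row in without_tbody]

print(len(with_tbody))   # no rows match, because the raw HTML has no <tbody>
print(first_cells)
```

The path that includes tbody matches nothing, while the attribute-based relative path finds both rows. The same reasoning applies to XPath expressions written for Scrapy selectors.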

Firebug

Firebug is a widely known tool among web developers and it’s also very useful for scraping. In particular, its Inspect Element feature comes in very handy when you need to construct the XPaths for extracting data because it allows you to view the HTML code of each page element while moving your mouse over it.

Firebug's Inspect Element feature is very helpful when building XPaths.
