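Usually, a page like this loads a bunch of JavaScript (jQuery and so on), which then builds the interface and retrieves the data to display from a data source.
So what you need to do is open the page in Firefox with Firebug, or a similar browser and tool, and look at the requests that are actually made. If you're lucky, you'll find the one you need right in the list of XHR requests. In this case: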
http://www.govliquidation.com/json/buyer_ux/salescalendar.js
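Note that doing this may violate a license or the terms of use. Clear it with the webmaster/data source/copyright owner before proceeding: detecting and blocking this kind of scraping is very easy, and identifying you is probably only slightly harder.
Anyway, if you issue the same call from PHP, you can scrape the data directly with very simple code (assuming there are no session/authentication issues, which is the case here):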
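<?php
$url = "http://www.govliquidation.com/json/buyer_ux/salescalendar.js";
// Fetch the raw JSON that the page itself requests via XHR
$json = file_get_contents($url);
// Decode it into nested stdClass objects
$data = json_decode($json);
?>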
This yields a data object that you can inspect and convert to CSV with a simple loop.
stdClass Object
(
[result] => stdClass Object
(
[events] => Array
(
[0] => stdClass Object
(
[yahoo_dur] => 11300
[closing_today] => 0
[language_code] => en
[mixed_id] => 9297
[event_id] => 9297
[close_meridian] => PM
[commercial_sale_flag] => 0
[close_time] => 01/06/2014
[award_time_unixtime] => 1389070800
[category] => Tires, Parts & Components
[open_time_unixtime] => 1388638800
[yahoo_date] => 20140102T000000Z
[open_time] => 01/02/2014
[event_close_time] => 2014-01-06 17:00:00
[display_event_id] => 9297
[type_code] => X3
[title] => Truck Drive Axles @ Killeen, TX
[special_flag] => 1
[demil_flag] => 0
[google_close] => 20140106
[event_open_time] => 2014-01-02 00:00:00
[google_open] => 20140102
[third_party_url] =>
[bid_package_flag] => 0
[is_open] => 1
[fda_count] => 0
[close_time_unixtime] => 1389045600
You retrieve $data->result->events, call fputcsv() on each of its items after casting it to an array, and Bob's your uncle.
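To make that last step concrete, here is a minimal sketch of the fetch-and-convert loop. The helper names fetchEvents() and eventsToCsv() are my own (hypothetical, not part of any library), and I'm assuming every event carries the same properties as the first one, so the first event's keys are used as the header row. Treat it as a starting point rather than a finished scraper.

```php
<?php
// Hypothetical helper: fetch the JSON feed and return the list of events.
// Assumes the structure shown in the dump above: result -> events (array).
function fetchEvents(string $url): array
{
    $json = file_get_contents($url);
    if ($json === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    $data = json_decode($json);
    // isset() is also false when json_decode() returned null (invalid JSON).
    if (!isset($data->result->events) || !is_array($data->result->events)) {
        throw new RuntimeException('Unexpected JSON structure');
    }
    return $data->result->events;
}

// Hypothetical helper: write the events to an open stream as CSV.
// Returns the number of data rows written (the header row is not counted).
function eventsToCsv(array $events, $handle): int
{
    if (count($events) === 0) {
        return 0;
    }
    // Assumption: the first event's property names are the columns for all rows.
    $columns = array_keys((array) $events[0]);
    fputcsv($handle, $columns, ',', '"', '\\');

    $written = 0;
    foreach ($events as $event) {
        $fields = (array) $event; // stdClass -> associative array
        $row = [];
        foreach ($columns as $column) {
            $value = $fields[$column] ?? ''; // missing or null -> empty cell
            // Nested arrays/objects cannot go into a cell directly; store them as JSON.
            $row[] = is_scalar($value) ? $value : json_encode($value);
        }
        fputcsv($handle, $row, ',', '"', '\\');
        $written++;
    }
    return $written;
}
```

To use it, open an output file with fopen('events.csv', 'w') (the file name is just an example), pass the result of fetchEvents($url) together with that handle to eventsToCsv(), and fclose() the handle afterwards. Writing to 'php://output' instead sends the CSV straight to the browser or console.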