html dom find id, php – Finding a table by ID with Simple HTML DOM Parser

Latest recommended article published 2024-06-11 09:57:04

weixin_39878247


Tags: html dom find id

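Last year I wrote a database seeder that scrapes a statistics website. On revisiting the code, I found it no longer works, and I'm a bit stumped as to why. $html->find() is supposed to return an array of the matching elements, but it only seems to find the first table.

Following the documentation, I also tried calling find() with each table's ID, but that fails as well.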

$table_passing = $html->find('table[id=passing]');

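Can anyone help me figure out what is wrong here? I don't understand why neither approach works: the page source clearly shows multiple tables with IDs, so both should succeed.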

private function getTeamStats()
{
    $url = 'http://www.pro-football-reference.com/years/2016/opp.htm';
    $html = file_get_html($url);

    $tables = $html->find('table');
    $table_defense = $tables[0];
    $table_passing = $tables[1];
    $table_rushing = $tables[2];
    //$table_passing = $html->find('table[id=passing]');

    $teams = array();

    # OVERALL DEFENSIVE STATISTICS #
    foreach ($table_defense->find('tr') as $row)
    {
        $stats = $row->find('td');

        // Ignore the lines that don't have ranks, these aren't teams
        if (isset($stats[0]) && !empty($stats[0]->plaintext))
        {
            $name = $stats[1]->plaintext;
            $rank = $stats[0]->plaintext;
            $games = $stats[2]->plaintext;
            $yards = $stats[4]->plaintext;

            // Calculate the Yards Allowed per Game by dividing Total / Games
            $tydag = $yards / $games;

            $teams[$name]['rank'] = $rank;
            $teams[$name]['games'] = $games;
            $teams[$name]['tydag'] = $tydag;
        }
    }

    # PASSING DEFENSIVE STATISTICS #
    foreach ($table_passing->find('tr') as $row)
    {
        $stats = $row->find('td');

        // Ignore the lines that don't have ranks, these aren't teams
        if (isset($stats[0]) && !empty($stats[0]->plaintext))
        {
            $name = $stats[1]->plaintext;
            $pass_rank = $stats[0]->plaintext;
            $pass_yards = $stats[14]->plaintext;

            $teams[$name]['pass_rank'] = $pass_rank;
            $teams[$name]['paydag'] = $pass_yards;
        }
    }

    # RUSHING DEFENSIVE STATISTICS #
    foreach ($table_rushing->find('tr') as $row)
    {
        $stats = $row->find('td');

        // Ignore the lines that don't have ranks, these aren't teams
        if (isset($stats[0]) && !empty($stats[0]->plaintext))
        {
            $name = $stats[1]->plaintext;
            $rush_rank = $stats[0]->plaintext;
            $rush_yards = $stats[7]->plaintext;

            $teams[$name]['rush_rank'] = $rush_rank;
            $teams[$name]['ruydag'] = $rush_yards;
        }
    }
}
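The three loops in the method above share one pattern: walk the rows, skip any row whose first cell has no rank, then read a few cells by index. As a rough sketch of that pattern only, the helper below uses PHP's built-in DOM extension rather than Simple HTML DOM so that it runs without extra files. The function name extractRankedRows, the column map, and the sample table are all made up for illustration and are not part of the original code.

```php
<?php
/**
 * Hypothetical helper: read chosen cells from every ranked row of a table.
 * $columns maps an output key to a zero-based <td> index.
 */
function extractRankedRows(DOMElement $table, array $columns): array
{
    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = $tr->getElementsByTagName('td');
        // Skip header and summary rows: they have no rank in the first <td>.
        if ($cells->length === 0 || trim($cells->item(0)->textContent) === '') {
            continue;
        }
        $row = [];
        foreach ($columns as $key => $index) {
            $cell = $cells->item($index);
            // A missing cell becomes null instead of raising an error.
            $row[$key] = $cell === null ? null : trim($cell->textContent);
        }
        $rows[] = $row;
    }
    return $rows;
}

// Made-up table standing in for the real statistics page.
$sample = '<table id="defense">'
    . '<tr><th>Rk</th><th>Tm</th><th>G</th></tr>'
    . '<tr><td>1</td><td>Team A</td><td>16</td></tr>'
    . '<tr><td></td><td>Avg Team</td><td></td></tr>'
    . '</table>';

libxml_use_internal_errors(true);   // suppress warnings from imperfect HTML
$dom = new DOMDocument();
$dom->loadHTML($sample);
libxml_clear_errors();

$table = $dom->getElementsByTagName('table')->item(0);
$stats = extractRankedRows($table, ['rank' => 0, 'name' => 1, 'games' => 2]);
print_r($stats);
```

With a helper along these lines, each statistics section would reduce to one call with its own column map, for example index 14 for passing yards and index 7 for rushing yards as in the loops above.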

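Solution:

I have never used simplexml or its derivatives, but when an XPath query looks up an attribute such as an ID, the attribute name is normally prefixed with @ and the value is quoted, so in your case it might be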

$table_passing = $html->find('table[@id="passing"]');
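To show the @ prefix and quoted value in a setting where they are known to apply, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. The inline markup and its table IDs are invented for the example; only the query string mirrors the one above.

```php
<?php
// Hypothetical sample markup; the real page is far larger.
$sample = '<html><body>'
    . '<table id="defense"><tr><td>1</td></tr></table>'
    . '<table id="passing"><tr><td>2</td></tr></table>'
    . '</body></html>';

libxml_use_internal_errors(true);   // suppress warnings from imperfect HTML
$dom = new DOMDocument();
$dom->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// @id selects the attribute; the value is quoted inside the predicate.
$matches = $xpath->query('//table[@id="passing"]');

// query() returns a DOMNodeList, so the first match is taken explicitly.
$table = $matches->length > 0 ? $matches->item(0) : null;

echo $table === null ? "not found\n" : $table->getAttribute('id') . "\n";
```

Checking the list length before calling item(0) keeps the "no match" case explicit, which is useful when a table that is visible in the page source turns out to be missing from the parsed document.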

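The problem with the standard DOMDocument and DOMXPath approach is that the tables are actually "commented out" in the page source. A simple string replacement that strips the HTML comment markers makes the following work, and it can easily be adapted to the original code.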

$url='http://www.pro-football-reference.com/years/2016/opp.htm';
$html=file_get_contents( $url );

/* remove the html comment markers so the wrapped tables become real markup */
$html=str_replace( array('<!--','-->'), '', $html );

libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->formatOutput=false;
$dom->loadHTML( $html );
libxml_clear_errors();

$xp=new DOMXPath( $dom );
$tbl=$xp->query( '//table[@id="passing"]' );
foreach( $tbl as $n )echo $n->tagName.' > '.$n->getAttribute('id');

/* outputs */
table > passing
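Stripping every comment marker in the page also exposes any other commented-out markup. As an alternative sketch, and not the answer's own method, the comment nodes can be read through XPath's comment() node test and only those containing a table parsed as HTML fragments. The function name findTableIds and the sample markup are assumptions made for this example.

```php
<?php
// Hypothetical sample: one live table and one table hidden in an HTML comment.
$sample = '<html><body>'
    . '<table id="defense"><tr><td>1</td></tr></table>'
    . '<div><!-- <table id="passing"><tr><td>2</td></tr></table> --></div>'
    . '</body></html>';

/**
 * Collect the id of every table, including tables wrapped in HTML comments.
 */
function findTableIds(string $html): array
{
    libxml_use_internal_errors(true);   // suppress warnings from imperfect HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $ids = [];

    // Tables that are ordinary markup.
    foreach ($xpath->query('//table[@id]') as $table) {
        $ids[] = $table->getAttribute('id');
    }

    // comment() matches comment nodes; nodeValue holds the text between the markers.
    foreach ($xpath->query('//comment()') as $comment) {
        if (stripos($comment->nodeValue, '<table') === false) {
            continue;   // not a hidden table, leave this comment alone
        }
        $inner = new DOMDocument();
        $inner->loadHTML($comment->nodeValue);
        $innerXpath = new DOMXPath($inner);
        foreach ($innerXpath->query('//table[@id]') as $table) {
            $ids[] = $table->getAttribute('id');
        }
    }

    libxml_clear_errors();
    return $ids;
}

print_r(findTableIds($sample));
```

This keeps the original document untouched and makes it clear which tables came from comments, at the cost of a second parse per matching comment.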

Tags: php,parsing,web-scraping,html,simple-html-dom

Source: https://codeday.me/bug/20190706/1393907.html
