Parsing HTML tags with PHP Simple HTML DOM Parser

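I tried out PHP Simple HTML DOM Parser for parsing HTML pages, and it works quite well: it builds a DOM tree that makes it easy to get at the content inside the HTML. It is handy for scraping.

An example follows; you can also download the archive from SourceForge and look at the examples it contains: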

Scraping data with PHP Simple HTML DOM Parser

PHP Simple HTML DOM Parser, written for PHP5+, lets you manipulate HTML very easily. Because it tolerates invalid HTML, it is a better choice than PHP scripts that rely on complicated regexes to extract information from web pages.

Before you can extract anything, a DOM must be created from either a URL or a file. The following script extracts the links and images from a website:

// Create DOM from URL or file
$html = file_get_html('http://www.microsoft.com/');

// Extract links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}

// Extract images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}
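
The loops above assume that the page loaded and that every link carries an href. As a rough sketch only (collect_links is a hypothetical helper name, not part of the library, and the location of simple_html_dom.php next to the script is an assumption), the same extraction can be wrapped so that a failed load or a missing attribute does not break the script. In some versions of the library the loader functions return false on failure, which is what the first check guards against:

```php
<?php
// Hypothetical helper (not part of the library): collect the distinct,
// non-empty href values from any object that exposes find() the way
// a simple_html_dom object does.
function collect_links($dom)
{
    $links = array();
    if (!$dom) {
        // The loader may have returned false (empty or oversized page).
        return $links;
    }
    foreach ($dom->find('a') as $element) {
        // A missing attribute is cast to an empty string and skipped.
        $href = trim((string) $element->href);
        if ($href !== '') {
            $links[] = $href;
        }
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($links));
}

// Usage, assuming simple_html_dom.php sits next to this script.
if (is_file(__DIR__ . '/simple_html_dom.php')) {
    include_once __DIR__ . '/simple_html_dom.php';
    $html = str_get_html('<a href="/a">A</a><a href="/a">A again</a><a>no href</a>');
    foreach (collect_links($html) as $link) {
        echo $link . '<br>';
    }
}
```

Passing the DOM object in, rather than a URL, keeps the helper usable with both file_get_html and str_get_html.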

The parser can also be used to modify HTML elements:

// Create DOM from string
$html = str_get_html('<div id="simple">Simple</div><div id="parser">Parser</div>');

$html->find('div', 1)->class = 'bar';

$html->find('div[id=simple]', 0)->innertext = 'Foo';

// Output: <div id="simple">Foo</div><div id="parser" class="bar">Parser</div>
echo $html;
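
One caveat with the snippet above: when find() is called with an index and nothing matches, it returns null, so assigning a property on the result fails. A minimal sketch of a guard, assuming a hypothetical helper named set_attribute_if_found (not part of the library) and simple_html_dom.php located next to the script:

```php
<?php
// Hypothetical helper (not part of the library): set an attribute only
// when the selector actually matches at the given index.
function set_attribute_if_found($dom, $selector, $index, $name, $value)
{
    $node = $dom->find($selector, $index);
    if (!$node) {
        // Nothing matched; leave the document untouched.
        return false;
    }
    // Equivalent to $node->class = 'bar' when $name is 'class'.
    $node->$name = $value;
    return true;
}

// Usage, assuming simple_html_dom.php sits next to this script.
if (is_file(__DIR__ . '/simple_html_dom.php')) {
    include_once __DIR__ . '/simple_html_dom.php';
    $html = str_get_html('<div id="simple">Simple</div><div id="parser">Parser</div>');
    set_attribute_if_found($html, 'div', 1, 'class', 'bar'); // second div exists
    set_attribute_if_found($html, 'div', 5, 'class', 'baz'); // no sixth div, returns false
    echo $html;
}
```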

Do you wish to retrieve content without any tags?

echo file_get_html('http://www.yahoo.com/')->plaintext;

The parser's package files (http://simplehtmldom.sourceforge.net/) include some scraping examples for Digg, IMDb, and Slashdot. Let's create one that extracts the first 10 results (titles only) for the keyword "php" from Google:

$url = 'http://www.google.com/search?hl=en&q=php&btnG=Search';

// Create DOM from URL
$html = file_get_html($url);

// Match all 'a' tags whose class attribute equals 'l'
foreach ($html->find('a[class=l]') as $key => $info) {
    echo ($key + 1) . '. ' . $info->plaintext . "<br />\n";
}

NOTE: Make sure to include the parser before using any of its functions:

include 'simple_html_dom.php';

For more information on using the parser, see the 'PHP Simple HTML DOM Parser' manual. To download the package files, use the following URL: http://sourceforge.net/project/showfiles.php?group_id=218559 .
