Parsing HTML in PHP

Have you ever wanted to get a list of the links contained in an HTML page? Or a list of images, the title, or any other non-nested tag for that matter? Then this is the class for you!

Example:

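<?php
// Load the parser class (the file must be reachable from this script).
include("phpHTMLParser.php");

// Fetch the page; file_get_contents() returns false on failure.
$content = file_get_contents("http://www.onderstekop.nl/");
if ($content === false) {
   die("Could not fetch the page.");
}

$parser = new phpHTMLParser($content);
// Only keep track of the 'a' and 'title' tags.
$HTMLObject = $parser->parse_tags(array("a", "title"));
$aTags = $HTMLObject->getTagsByName("a");
foreach ($aTags as $a) {
   // Skip anchors that have no href.
   if ($a->href != "") {
      echo $a->href . "<br/>";
      echo $a->innerHTML . "<br/><br/>";
   }
}
?>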


In this example the parser only keeps track of the 'a' and 'title' tags, and only the 'a' tag objects are requested afterwards. Running the code parses the HTML page fetched from http://www.onderstekop.nl/, returns an object holding the tracked tags, and outputs a list of links, each followed by its link text. This makes dealing with web pages fairly simple: you work with the page in an object-oriented way instead of walking through it character by character or relying on complicated, error-prone regular expressions.
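When the links are needed for further processing rather than printed straight away, the same loop can fill an array. The following is a minimal sketch, not part of phpHTMLParser: collectLinks is a hypothetical helper, and it assumes only what the example shows, namely that each tag object exposes href and innerHTML as properties. Plain stdClass objects stand in for parsed tags so the snippet runs without the library or network access.

```php
<?php
// Hypothetical helper, not part of phpHTMLParser: turns tag objects into
// an array of ['href' => ..., 'text' => ...] pairs, skipping empty links.
function collectLinks(array $aTags): array
{
    $links = [];
    foreach ($aTags as $a) {
        // Assumption: href and innerHTML are readable as properties,
        // exactly as in the example loop.
        $href = trim((string) $a->href);
        if ($href === '') {
            continue; // anchors without a target are ignored
        }
        $links[] = [
            'href' => $href,
            // strip_tags() removes nested markup such as <b> or <img>
            'text' => trim(strip_tags((string) $a->innerHTML)),
        ];
    }
    return $links;
}

// Stand-in objects shaped like the tag objects in the example.
$sampleLinks = [
    (object) ['href' => 'http://www.onderstekop.nl/', 'innerHTML' => '<b>Home</b>'],
    (object) ['href' => '', 'innerHTML' => 'named anchor'],
];

foreach (collectLinks($sampleLinks) as $link) {
    echo $link['href'] . ' => ' . $link['text'] . PHP_EOL;
}
```

With the real parser, the array returned by $HTMLObject->getTagsByName("a") would be passed in place of $sampleLinks.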

Some other features

Each tag object returned by a getTagsByName call currently supports href and innerHTML (as shown), as well as id, src and innerTag (which returns all of the tag's attributes as a single string).
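To illustrate the other properties, here is a rough sketch that summarises image tags using src, id and innerTag. describeImages is a hypothetical helper, not something the class provides, and the exact text that innerTag returns is an assumption (the stand-in objects simply use the raw attribute string). With the library, the tag objects would presumably come from getTagsByName("img") after calling parse_tags(array("img")).

```php
<?php
// Hypothetical helper, not part of phpHTMLParser: builds one summary line
// per tag from the src, id and innerTag properties.
function describeImages(array $imgTags): array
{
    $lines = [];
    foreach ($imgTags as $img) {
        if ($img->src == "") {
            continue; // skip images without a source
        }
        $id = $img->id != "" ? $img->id : "(no id)";
        $lines[] = $img->src . " [" . $id . "] " . $img->innerTag;
    }
    return $lines;
}

// Stand-in objects; the innerTag format shown here is assumed.
$sampleImages = array(
    (object) array("src" => "logo.png", "id" => "logo", "innerTag" => 'src="logo.png" id="logo"'),
    (object) array("src" => "", "id" => "", "innerTag" => 'alt="spacer"'),
);

foreach (describeImages($sampleImages) as $line) {
    echo $line . PHP_EOL;
}
```

Keeping the formatting in a separate function means it can be checked against stand-in objects first and then fed real tag objects unchanged.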

Another feature, most useful for dumping results and debugging, is the output() function available on the object returned by parse() or parse_tags() ($HTMLObject in our example). For even more debugging information, you can set $debug=True in the PHP file itself.

Download phpHTMLParser
