Extracting the contents of a web page table

Following up on yesterday's post about fetching web pages, here are the functions that process the fetched HTML.
<?php
// Convert an HTML table into a two-dimensional array of cell text.
function get_td_array($table) {
    // Remove the opening <table>, <tr> and <td> tags.
    $table = preg_replace("/<table[^>]*?>/is", "", $table);
    $table = preg_replace("/<tr[^>]*?>/si", "", $table);
    $table = preg_replace("/<td[^>]*?>/si", "", $table);
    // Turn the closing tags into row and cell delimiters.
    $table = str_replace("</tr>", "{tr}", $table);
    $table = str_replace("</td>", "{td}", $table);
    // The published post shows a plain space here; it was most likely the &nbsp; entity.
    $table = str_replace("&nbsp;", "", $table);
    // Strip any remaining HTML tags.
    $table = preg_replace("'<[/!]*?[^<>]*?>'si", "", $table);
    // Remove line breaks followed by whitespace (the backslashes were lost in the published post).
    $table = preg_replace("'([\r\n])[\s]+'", "", $table);
    $table = str_replace(" ", "", $table);
    $table = explode('{tr}', $table);
    array_pop($table); // drop the empty piece after the last {tr}
    $td_array = array();
    foreach ($table as $key => $tr) {
        $td = explode('{td}', $tr); // split the row string into an array, with {td} as the delimiter
        array_pop($td);
        $td_array[] = $td;
    }
    return $td_array;
}
?>

The version for our academic affairs office site

<?php
$url = "http://202.119.81.118:7777/pls/wwwxk/xk.login"; // login page (an earlier test posted to output.php, which contained print_r($_POST))
/*$post_data = array (
    "pwuser" => "jying1314",
    "pwpwd" => "fff12138"
);*/
$cookie_file = tempnam('./temp', 'cookie');
$post_fields = 'stuid=STUDENT_ID&pwd=PASSWORD'; // replace with your student ID and password
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_URL, $url); // address to request
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // 1: return the response as a string, 0: print it directly
curl_setopt($ch, CURLOPT_POST, 1); // send a regular POST request
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields); // POST body
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file); // save the session cookie to this file
curl_setopt($ch, CURLOPT_TIMEOUT, 30); // timeout in seconds
$output = curl_exec($ch); // execute and fetch the response
curl_close($ch);
// var_dump($output);

// Request the course view page with the cookie saved at login.
$url = 'http://202.119.81.118:7777/pls/wwwxk/xk.CourseView';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
$contents = curl_exec($ch);
curl_close($ch);

// Same steps as get_td_array(), adapted to this page: </TH> also ends a row.
$table = preg_replace("/<TABLE[^>]*?>/is", "", $contents);
$table = preg_replace("/<TR[^>]*?>/is", "", $table);
$table = preg_replace("/<TD[^>]*?>/is", "", $table);
$table = str_replace("</TR>", "{tr}", $table);
$table = str_replace("</TH>", "{tr}", $table);
$table = str_replace("</TD>", "{td}", $table);
$table = str_replace("&nbsp;", "", $table); // shown as a plain space in the published post; most likely &nbsp;
$table = preg_replace("'<[/!]*?[^<>]*?>'si", "", $table);
$table = preg_replace("'([\r\n])[\s]+'", "", $table);
$table = str_replace(" ", "", $table);
$arr = explode('{tr}', $table);
array_pop($arr);
// print_r($arr);
$td_array = array();
foreach ($arr as $key => $tr) {
    $td = explode('{td}', $tr);
    array_pop($td);
    $td_array[] = $td;
}
// print_r($td_array);
// echo "<br>";
// echo $td_array[12][0]; // the first index is the row, the second is the column; rows advance two at a time
// if (empty($td_array[12][0])) {
//     echo "this value is empty";
// } else { echo "NO"; }

// Columns 0 to 6 are the days of the week; rows 10, 12, ..., 20 are periods 1 to 6.
for ($b = 0; $b <= 6; $b++) {
    $c = $b + 1;
    echo "weekend" . $c . "<br>";
    for ($i = 10; $i <= 20; $i = $i + 2) {
        if (!empty($td_array[$i][$b])) {
            $d = ($i - 8) / 2; // period number
            echo $d . $td_array[$i][$b];
            echo "<br>";
        }
    }
}
/*echo $td_array[1][2];
echo $td_array[1][3];
echo $td_array[1][4];
echo $td_array[1][5];*/
/*$mode = "/./";
$string = "google";
if (preg_match($mode, $contents, $arr)) {
    echo "match succeeded" . $arr[0];
    print_r($arr);
} else {
    echo "match failed";
}*/
?>
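
The regex approach above depends on the exact spelling and case of the tags. As a point of comparison, here is a minimal sketch of the same table-to-array step using PHP's built-in DOM extension instead of regular expressions. This is a swapped-in technique, not the method used in the post. The function name `table_to_rows` is hypothetical, and the sketch assumes the HTML has already been converted to UTF-8 (the school site may well serve another encoding, in which case it would need converting first).

```php
<?php
// Hypothetical helper (not from the original post): extract the text of every
// table row into a two-dimensional array using DOMDocument.
// Assumption: $html is UTF-8.
function table_to_rows(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them,
    // since real-world pages are rarely valid HTML.
    $previous = libxml_use_internal_errors(true);
    // The encoding hint makes libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    // Note: this visits every <tr> in the document, including nested tables.
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $node) {
            // Keep only direct <td>/<th> children; skip whitespace text nodes.
            if ($node instanceof DOMElement
                && in_array(strtolower($node->nodeName), ['td', 'th'], true)) {
                // &nbsp; is decoded to U+00A0, which trim() does not remove.
                $text = str_replace("\u{00A0}", '', $node->textContent);
                $cells[] = trim($text);
            }
        }
        $rows[] = $cells;
    }
    return $rows;
}
```

Calling `table_to_rows($contents)` on the fetched page would give an array indexed as `[row][column]`, like `$td_array` above, although the row numbering may differ from the regex version because header rows are kept as their own entries. The parser handles upper-case and lower-case tags alike, so the separate `</TR>` and `</tr>` variants are not needed.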