Parsing HTML with PHP regular expressions
JamesLiu • November 25, 2018
Common expressions
Find a specific tag (e.g. the table tag)
preg_match("/<table[^>]*?>.*?<\/table>/si", $html, $match);
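Here is a minimal runnable sketch of this step. It assumes a hypothetical `$html` string that contains a single, non-nested table. The non-greedy `.*?` stops at the first closing tag, so nested tables would be cut short.

```php
<?php
// Hypothetical sample input; in practice this would be a fetched HTML page.
$html = '<p>intro</p><table class="t"><tr><td>A</td></tr></table><p>end</p>';

// /s lets . match newlines, /i makes tag names case-insensitive.
// The capture group (.*?) holds only what sits between the table tags.
if (preg_match('/<table[^>]*>(.*?)<\/table>/si', $html, $matches)) {
    $tableHtml = $matches[0]; // the whole <table>...</table> element
    $innerHtml = $matches[1]; // only the inner HTML
} else {
    $tableHtml = $innerHtml = '';
}

echo $innerHtml, "\n"; // prints <tr><td>A</td></tr>
```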
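Remove the opening tr tags (the same works for other tags)
$table = preg_replace("'<table[^>]*?>'si","",$table);
$table = preg_replace("'<tr[^>]*?>'si","",$table);
Replace the closing tr tag with another string, so the result is easy to split into an array later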
$table = str_replace("</tr>","{tr}",$table);
Remove all HTML tags
$table = preg_replace("'<[/!]*?[^<>]*?>'si","",$table);
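The sketch below runs these steps on a small hypothetical table. The table has no whitespace between tags, so the whitespace step is not needed yet.

```php
<?php
// Hypothetical two-row table used only for illustration.
$table = '<table border="1"><tr class="r"><td>a</td><td>b</td></tr><tr><td>c</td><td>d</td></tr></table>';

// 1. Drop the opening <table ...> and <tr ...> tags, whatever their attributes.
$table = preg_replace("'<table[^>]*?>'si", "", $table);
$table = preg_replace("'<tr[^>]*?>'si", "", $table);
// 2. Turn every </tr> into a {tr} row marker that survives tag stripping.
$table = str_replace("</tr>", "{tr}", $table);
// 3. Strip all remaining tags (<td>, </td>, </table>, comments, ...).
$table = preg_replace("'<[/!]*?[^<>]*?>'si", "", $table);

echo $table, "\n"; // prints ab{tr}cd{tr}
```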
Remove whitespace (a line break together with the run of whitespace after it, then every space)
$table = preg_replace("'([\r\n])[\s]+'","",$table);
$table = str_replace(" ","",$table);
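The sketch below applies the two whitespace steps to some hypothetical leftover text. The `str_replace` also deletes spaces inside cell text such as `James Liu`, which may not be what you want.

```php
<?php
// Hypothetical text left over after stripping tags from indented HTML.
$text = "Name\r\n    Age {tr}\n\tJames Liu";

// A line break followed by at least one whitespace character is deleted
// together with that whitespace (inside double quotes \r and \n are real CR/LF).
$text = preg_replace("'([\r\n])[\s]+'", "", $text);
// Then every ordinary space is removed as well.
$text = str_replace(" ", "", $text);

echo $text, "\n"; // prints NameAge{tr}JamesLiu
```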
Remove line breaks (note that \s matches every whitespace character, so spaces and tabs are removed too)
$table=preg_replace("/\s/","",$table);
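The sketch below contrasts `/\s/` with a pattern that targets only CR and LF. The sample string is hypothetical.

```php
<?php
$cell = "  East China\tJiaotong\r\nUniversity  ";

// \s matches space, tab, CR and LF (among others), so every whitespace
// character disappears, not just the line breaks.
$all = preg_replace("/\s/", "", $cell);

// To delete only line breaks and keep spaces and tabs, match \r and \n explicitly.
$noBreaks = preg_replace('/[\r\n]+/', "", $cell);
```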
Array functions
Split the parsed HTML on a delimiter string
$table = explode('{tr}', $table);
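The flattened string ends with a `{tr}` marker, so `explode()` returns one extra empty element at the end. The sketch below uses hypothetical input:

```php
<?php
// Flattened table text in the {tr}-marked form produced by the earlier steps.
$flat = "ab{tr}cd{tr}";

// explode() splits on every occurrence of the delimiter; the text after the
// final {tr} is an empty string, so $rows is ['ab', 'cd', ''].
$rows = explode('{tr}', $flat);

echo count($rows), "\n"; // prints 3
```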
Remove the last element of an array
array_pop($table);
Remove the first element of an array
array_shift($th);
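Both functions take the array by reference and return the element they remove. `array_shift()` also renumbers the remaining integer keys from 0. The sketch below uses hypothetical row strings:

```php
<?php
$rows = ['Name{td}Age{td}', 'James{td}30{td}', ''];

// array_pop() removes and returns the last element (here the empty trailing piece).
$last = array_pop($rows);

// array_shift() removes and returns the first element, e.g. a header row.
$header = array_shift($rows);

// Now $rows is ['James{td}30{td}'] with key 0.
```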
Combine two arrays into keys and values
$array1=array("name","school","type");
$array2=array("JamesLiu","华东交通大学","Science and Engineering");
$res = array_combine($array1,$array2);
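When using array_combine, make sure $array1 and $array2 have the same number of elements. Since PHP 8.0 a mismatch throws a ValueError; earlier versions emit a warning and return false.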
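Because the lengths must match, one option is to check `count()` before combining. The helper `combine_row()` below is hypothetical and not part of PHP. The sample values are also made up.

```php
<?php
/**
 * Hypothetical helper: combine only when both arrays have the same length,
 * returning null for mismatched rows instead of letting array_combine() fail.
 */
function combine_row(array $keys, array $values): ?array
{
    if (count($keys) !== count($values)) {
        return null;
    }
    return array_combine($keys, $values);
}

$keys = ["name", "school", "type"];
$row  = combine_row($keys, ["JamesLiu", "ECJTU", "engineering"]);
$bad  = combine_row($keys, ["JamesLiu"]); // null: only one value for three keys
```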
CSDN demo (processing a table)
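<?php
function get_td_array($table) {
    // Remove the opening table, tr and td tags (with any attributes)
    $table = preg_replace("'<table[^>]*?>'si", "", $table);
    $table = preg_replace("'<tr[^>]*?>'si", "", $table);
    $table = preg_replace("'<td[^>]*?>'si", "", $table);
    // Mark the end of each row and cell with a placeholder
    $table = str_replace("</tr>", "{tr}", $table);
    $table = str_replace("</td>", "{td}", $table);
    // Remove the remaining HTML tags
    $table = preg_replace("'<[/!]*?[^<>]*?>'si", "", $table);
    // Remove whitespace: a line break plus the whitespace after it, then all spaces
    $table = preg_replace("'([\r\n])[\s]+'", "", $table);
    $table = str_replace(" ", "", $table);
    // Split into rows; the piece after the last {tr} is empty, so drop it
    $table = explode('{tr}', $table);
    array_pop($table);
    $td_array = [];
    foreach ($table as $tr) {
        // Split each row into cells and drop the empty piece after the last {td}
        $td = explode('{td}', $tr);
        array_pop($td);
        $td_array[] = $td;
    }
    return $td_array;
}
?>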
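The rows returned by `get_td_array()` can be combined with the array functions from the previous section. For example, the first row can serve as column names. The `$td_array` below is hypothetical data in the shape the function returns, so the snippet runs on its own.

```php
<?php
// Hypothetical output in the shape get_td_array() returns: one array of
// cell strings per table row, with the header row first.
$td_array = [
    ['name', 'school', 'type'],
    ['JamesLiu', 'ECJTU', 'engineering'],
    ['Alice', 'ECJTU'], // malformed row: one cell missing
];

$header = array_shift($td_array); // take the header row off the front
$records = [];
foreach ($td_array as $cells) {
    // array_combine() needs equal lengths, so skip rows that don't match.
    if (count($cells) === count($header)) {
        $records[] = array_combine($header, $cells);
    }
}
```

Skipping mismatched rows is only one choice. Padding short rows with empty strings would keep them instead.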