Extracting HTML with YQL: html table is no longer supported

Thank you very much for your code.

It helped me write my own script to read the pages I need. I had never programmed in PHP before, but with your code and the wisdom of the internet I was able to adapt your script to my needs.

PHP

<?php
header('Access-Control-Allow-Origin: *'); // allow requests from any origin

// Placeholder domain; the length is computed from the prefix instead of hard-coded.
$allowedPrefix = "https://www.xxxx.yy";
$url = $_GET['url'] ?? '';
if (strncmp($url, $allowedPrefix, strlen($allowedPrefix)) !== 0) {
    echo "Only https://www.xxxx.yy allowed!";
    return;
}

$xpathQuery = $_GET['xpath'] ?? '';

// This is only a basic check; stricter validation is needed for security.

// Fetch the page at $target_url with cURL and return its body (false on failure).
function check($target_url) {
    $check = curl_init();
    //curl_setopt($check, CURLOPT_HTTPHEADER, array("REMOTE_ADDR: $ip", "HTTP_X_FORWARDED_FOR: $ip"));
    //curl_setopt($check, CURLOPT_INTERFACE, "xxx.xxx.xxx.xxx");
    curl_setopt($check, CURLOPT_COOKIEJAR, 'cookiemon.txt');
    curl_setopt($check, CURLOPT_COOKIEFILE, 'cookiemon.txt');
    curl_setopt($check, CURLOPT_TIMEOUT, 40); // CURLOPT_TIMEOUT is given in seconds
    curl_setopt($check, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($check, CURLOPT_URL, $target_url);
    curl_setopt($check, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT'] ?? '');
    curl_setopt($check, CURLOPT_FOLLOWLOCATION, false);
    $tmp = curl_exec($check);
    curl_close($check);
    return $tmp;
}

// get html
$html = check($url);
$temp_dom = new DOMDocument();

// Skip parsing when the fetch failed or returned nothing.
if (is_string($html) && $html !== '') {
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ hides warnings about malformed markup

    // apply xpath filter
    $xpath = new DOMXPath($dom);
    $elements = @$xpath->query($xpathQuery); // false if the expression is invalid
    if ($elements !== false) {
        foreach ($elements as $n) {
            $temp_dom->appendChild($temp_dom->importNode($n, true));
        }
    }
}
$renderedHtml = $temp_dom->saveHTML();

// return html in json response
// json structure:
// {html: "xxxx"}
$post_data = array(
    'html' => $renderedHtml
);
echo json_encode($post_data);
?>
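As the comment in the script says, the prefix comparison is only a basic check. For example, a URL such as https://www.xxxx.yy.evil.example/ starts with the allowed prefix but points to a different host. A minimal sketch of a stricter check, assuming you only want to allow https requests to one exact host, could look like this. The function name isAllowedUrl and its parameters are hypothetical and not part of the original script; parse_url is the standard PHP function.

```php
<?php
// Hypothetical helper (not in the original script): accept a URL only if it
// uses https and its host is exactly the allowed host.
function isAllowedUrl($url, $allowedHost) {
    if (!is_string($url)) {
        return false;
    }
    // parse_url returns false for seriously malformed URLs.
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    // Reject URLs that carry credentials, e.g. https://user@host/.
    if (isset($parts['user']) || isset($parts['pass'])) {
        return false;
    }
    // Scheme and host are case-insensitive, so compare them in lower case.
    return strtolower($parts['scheme']) === 'https'
        && strtolower($parts['host']) === strtolower($allowedHost);
}
```

In the script above this would replace the prefix comparison, for example with isAllowedUrl($url, "www.xxxx.yy"). It still does not validate the XPath expression or limit response size, so treat it as one layer of checking and not a complete solution.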

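JavaScript

$.ajax({
    url: "url of service",
    dataType: "json",
    data: {
        url: url,      // page to fetch; must start with the allowed prefix
        xpath: "//*"   // XPath expression applied on the server
    },
    type: 'GET',
    success: function(data) {
        // data.html contains the extracted markup
    },
    error: function(jqXHR, textStatus, errorThrown) {
    }
});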
