Grabbing information from other websites with PHP

Rascal_Wei

Article link: https://blog.csdn.net/weishuxiao1/article/details/38684771

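I needed to embed the latest news related to a page's topic. For a tiny project worth a few cents, building a real crawler is out of the question. Baidu and Google already have this content, so borrowing their search results should do the job.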

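Unfortunately, Baidu blocks pages requested directly from PHP code, and Google is not reachable (from mainland China); I am not going to make the server get around the Great Firewall just for this. Good old Sogou, however, is less picky and works. The code is as follows.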

Read the web page directly, as if it were a file, with file_get_contents, then work on the resulting string:

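<?php

// Baidu search cannot be used here: it blocks pages requested directly from code
// urlencode() percent-encodes the Chinese query (Tsinghua University)
$url = "http://www.sogou.com/web?query=" . urlencode ( "清华大学" ) . "&ie=utf8";
$contents = file_get_contents ( $url );
// Declare UTF-8 so the Chinese text displays correctly
echo "<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>";

// var_dump ( $contents ); // uncomment to inspect the raw page while debugging
// Start position of the news list (false if the download failed)
$start_ch = ($contents === false) ? false : strpos ( $contents, '<ul class="str-ul-list new-ul-list">' );
// End position: the first </ul> after the start
$end_ch = ($start_ch === false) ? false : strpos ( $contents, '</ul>', $start_ch );
// strpos() can return 0, so compare with === false rather than == false
if ($start_ch === false || $end_ch === false) {
	$new_str = "No news";
} else {
	$end_ch += 5; // strlen('</ul>'): include the closing tag itself
	$new_str = substr ( $contents, $start_ch, $end_ch - $start_ch );
}
echo "Start position: " . $start_ch . " End position: " . $end_ch;
echo $new_str . "<br/>";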
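
The strpos/substr cut above can be wrapped in a small reusable function. This is a minimal sketch; the name cut_between is hypothetical (not part of PHP or of the original post). It returns null when either marker is missing and, like the original approach, it stops at the first closing marker, so it does not handle nested tags of the same kind.

```php
<?php
/**
 * Hypothetical helper: return the part of $html that starts at the first
 * occurrence of $open and ends just after the first $close that follows it.
 * Returns null when either marker is missing.
 */
function cut_between(string $html, string $open, string $close): ?string {
    $start = strpos($html, $open);
    // strpos() returns 0 for a match at the very beginning, so test with ===
    if ($start === false) {
        return null;
    }
    // Search for the closing marker only after the opening marker
    $end = strpos($html, $close, $start + strlen($open));
    if ($end === false) {
        return null;
    }
    $end += strlen($close); // move past the closing marker itself
    return substr($html, $start, $end - $start);
}

$page = '<ul class="str-ul-list new-ul-list"><li>news</li></ul><p>footer</p>';
$list = cut_between($page, '<ul class="str-ul-list new-ul-list">', '</ul>');
echo $list ?? "No news";
// prints <ul class="str-ul-list new-ul-list"><li>news</li></ul>
```

Returning null instead of a placeholder string keeps the "nothing found" decision with the caller.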

Using cURL:

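<?php
$curl = curl_init ();
// Set the URL to fetch; urlencode() percent-encodes the Chinese query (Nanjing University)
curl_setopt ( $curl, CURLOPT_URL, 'http://www.sogou.com/web?query=' . urlencode ( '南京大学' ) . '&ie=utf8' );
// 0 = leave the HTTP response headers out of the returned data
curl_setopt ( $curl, CURLOPT_HEADER, 0 );
// 1 = return the page as a string instead of printing it
curl_setopt ( $curl, CURLOPT_RETURNTRANSFER, 1 );

// Run cURL and request the page (returns false on failure)
$data = curl_exec ( $curl );

// Close the cURL handle
curl_close ( $curl );

// Declare UTF-8 so the Chinese text displays correctly
echo "<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>";

// Start position of the result box (false if the request failed)
$start_ch = ($data === false) ? false : strpos ( $data, '<div class="strBox">' );
// End position: the first </div> after the start (assumes no nested <div> inside the box)
$end_ch = ($start_ch === false) ? false : strpos ( $data, '</div>', $start_ch );
// strpos() can return 0, so compare with === false rather than == false
if ($start_ch === false || $end_ch === false) {
	$new_str = "No news";
} else {
	$end_ch += 6; // strlen('</div>'): include the closing tag itself
	$new_str = substr ( $data, $start_ch, $end_ch - $start_ch );
}
echo $new_str . "<br/>";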
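
The request part can also be packaged as two helpers: one that builds the search URL with http_build_query, so the Chinese query is always percent-encoded, and one that fetches a page with cURL and reports failures. This is a sketch under assumptions: the function names build_sogou_url and fetch_html are hypothetical, the query and ie parameter names are simply copied from the URLs above, and the timeout values are arbitrary illustrative choices.

```php
<?php
/**
 * Hypothetical helper: build a Sogou search URL with the query percent-encoded.
 * The "query" and "ie" parameter names are taken from the URLs in this post.
 */
function build_sogou_url(string $query): string {
    return 'http://www.sogou.com/web?' . http_build_query(['query' => $query, 'ie' => 'utf8']);
}

/**
 * Hypothetical helper: fetch a page with cURL and return its body,
 * or false on a transport error or a non-200 HTTP status.
 */
function fetch_html(string $url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_HEADER, false);        // body only, no response headers
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 5);    // seconds to wait for the connection
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);          // seconds for the whole request
    $data = curl_exec($curl);
    if ($data === false) {
        error_log('cURL error: ' . curl_error($curl));
    } elseif (curl_getinfo($curl, CURLINFO_HTTP_CODE) !== 200) {
        $data = false; // treat error pages as failures
    }
    curl_close($curl);
    return $data;
}

echo build_sogou_url('南京大学'), "\n";
```

A typical call would be fetch_html(build_sogou_url('南京大学')), followed by the same strpos/substr extraction as above.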

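If PHP reports "Call to undefined function curl_init()", the cURL extension is not enabled: uncomment the line extension=php_curl.dll in php.ini (on Windows; since PHP 7.2 the line reads extension=curl), then restart the web server.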
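
Before calling curl_init(), a script can check whether the extension is actually loaded and print which php.ini the running PHP reads, which helps when several PHP installations exist. A minimal sketch; the function name curl_available is hypothetical, while extension_loaded and php_ini_loaded_file are standard PHP functions.

```php
<?php
// Hypothetical helper: true when the cURL extension is loaded and usable
function curl_available(): bool {
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_available()) {
    echo "cURL is enabled\n";
} else {
    // php_ini_loaded_file() returns the path of the loaded php.ini, or false
    $ini = php_ini_loaded_file();
    echo "cURL is not enabled; edit " . ($ini !== false ? $ini : "php.ini (none loaded)") . "\n";
}
```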

Downloading an image from a page

<?php
function grab_pic($img_url, $save_url) {
	// file_exists() cannot check whether a file exists on a remote (http://) server,
	// so read it once and check whether the read failed
	$img_data = @file_get_contents ( $img_url );
	if ($img_data === false) {
		echo 'File does not exist';
		return false;
	}
	echo 'File exists';
	// Write the image bytes to a local .jpg file; returns the number of bytes written, or false
	return file_put_contents ( $save_url, $img_data );
}

$college_id = "10026";
$img_url = "http://sinastorage.com/kaoshi.edu.sina.com.cn/college_photo/" . $college_id . ".jpg";
$save_url = $college_id . ".jpg";

// Download the image
$pic_state = grab_pic ( $img_url, $save_url );

// Show the saved copy only if it was written successfully
if ($pic_state !== false) {
	echo "<img src='$save_url'/>";
}
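
A remote server may answer a missing image with an HTML error page instead of failing, and the code above would then save that page as a .jpg. One possible safeguard, sketched here as an assumption rather than the author's method, is to check the JPEG signature before writing: JPEG data starts with the bytes FF D8 FF. The names looks_like_jpeg and save_jpeg are hypothetical.

```php
<?php
/**
 * Hypothetical helper: JPEG data begins with the SOI marker FF D8 followed by
 * another marker that starts with FF, so an HTML error page fails this check.
 */
function looks_like_jpeg(string $data): bool {
    return strncmp($data, "\xFF\xD8\xFF", 3) === 0;
}

/**
 * Hypothetical variant of grab_pic(): save the image only if it is a JPEG.
 * Returns the number of bytes written, or false.
 */
function save_jpeg(string $img_url, string $save_url) {
    $img_data = @file_get_contents($img_url);
    if ($img_data === false || !looks_like_jpeg($img_data)) {
        return false; // download failed or the response is not a JPEG
    }
    return file_put_contents($save_url, $img_data);
}
```

The same check works for a local path, because file_get_contents reads local files as well as URLs.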