Symptom
While writing a video crawler I used the code below. It downloads ordinary images without any problem.
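import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Downloads a file from a URL.
 *
 * @param urlStr   URL of the file to download
 * @param fileName name to save the file under
 * @param savePath directory to save the file in
 * @return the path of the saved file, or "" if the download failed
 */
public static String downLoadFromUrl(String urlStr, String fileName, String savePath) {
    try {
        URL url = new URL(urlStr);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Set the read and connect timeouts to 60 seconds
        conn.setReadTimeout(1000 * 60);
        conn.setConnectTimeout(1000 * 60);
        // try-with-resources closes the stream even if reading or writing fails
        try (InputStream inputStream = conn.getInputStream()) {
            // Read the whole response into a byte array
            byte[] getData = readInputStream(inputStream);
            // Create the save directory (including parents) if it does not exist
            File saveDir = new File(savePath);
            if (!saveDir.exists()) {
                saveDir.mkdirs();
            }
            File file = new File(saveDir + File.separator + fileName);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(getData);
            }
            // System.out.println("info:" + url + " download success");
            return saveDir + File.separator + fileName;
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return "";
}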
/**
 * Reads an input stream fully into a byte array.
 *
 * @param inputStream the stream to read
 * @return all bytes read from the stream
 * @throws IOException if reading fails
 */
public static byte[] readInputStream(InputStream inputStream) throws IOException {
    byte[] buffer = new byte[1024];
    int len;
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    // read() returns -1 once the end of the stream is reached
    while ((len = inputStream.read(buffer)) != -1) {
        bos.write(buffer, 0, len);
    }
    return bos.toByteArray();
}
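The helper above holds the entire response in memory before writing it out, which is fine for images but can get expensive for large videos. A minimal sketch of streaming straight to disk instead, assuming Java 7+ and using java.nio.file.Files.copy; the class name StreamingDownload and the method saveStream are hypothetical and not part of the original code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a stream to a file without buffering it all in memory.
public class StreamingDownload {

    /**
     * Writes everything from the stream to target, creating parent
     * directories as needed, and returns the number of bytes written.
     */
    public static long saveStream(InputStream in, Path target) throws IOException {
        Path parent = target.getParent();
        if (parent != null) {
            // No-op if the directory already exists
            Files.createDirectories(parent);
        }
        // Overwrite an existing file, matching FileOutputStream's default behavior
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

In downLoadFromUrl this could replace the readInputStream and FileOutputStream steps, for example saveStream(conn.getInputStream(), Paths.get(savePath, fileName)).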
However, when crawling videos from a certain site, the download failed with a 403 error:
Server returned HTTP response code: 403 for URL
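Troubleshooting
Following advice found online, I added the line below, but the request still failed with the same error.
// Send a browser User-Agent so the server does not reject the request as a bot with 403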
conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36");
Opening the link directly in a browser showed the following page:
403 Forbidden
You don't have permission to access the URL on this server.
denied by Referer ACL
Powered by Tengine
CDN Request Id: 3df1972216820741360505586e
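The same error page can be read from Java without switching to a browser. On a 4xx or 5xx response, conn.getInputStream() throws, while conn.getErrorStream() exposes the body the server sent. A minimal sketch, where the class name HttpErrorBody and the method readErrorBody are hypothetical and the body is assumed to be UTF-8 text:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: returns the server's error page for failed requests.
public class HttpErrorBody {

    /** Returns the error body for status >= 400, or "" if there is none. */
    public static String readErrorBody(HttpURLConnection conn) throws IOException {
        int status = conn.getResponseCode();
        if (status < 400) {
            return "";
        }
        // getErrorStream() may be null if the server sent no body
        InputStream err = conn.getErrorStream();
        if (err == null) {
            return "";
        }
        try (InputStream in = err) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int len;
            while ((len = in.read(buffer)) != -1) {
                bos.write(buffer, 0, len);
            }
            // Assumption: the error page is UTF-8 encoded
            return new String(bos.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Printing this string when a download fails would surface a hint such as "denied by Referer ACL" directly in the crawler's log.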
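The "denied by Referer ACL" line suggests that the target site checks where the request comes from. I tried adding the target site's domain to the request headers, which solved the problem. Like the User-Agent line, these must be set before conn.getInputStream() is called: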
conn.setRequestProperty("Origin", "https://www.duifangfuwu.com");
conn.setRequestProperty("Referer", "https://www.duifangfuwu.com");
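The header setup can be collected into one small helper that is called right after openConnection(). This is a minimal sketch: the class name RefererHeaders and the method apply are hypothetical. It sets only User-Agent and Referer, on the assumption (based on the "denied by Referer ACL" message) that the Referer check is what matters; the JDK's default HttpURLConnection may also treat Origin as a restricted header and drop it unless configured otherwise, so the sketch does not rely on it.

```java
import java.net.URLConnection;

// Hypothetical helper: makes a request look like it came from a page on the target site.
public class RefererHeaders {

    private static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36";

    /**
     * Sets the headers on a connection that has not been connected yet.
     * siteOrigin is the target site's own address, e.g. "https://www.duifangfuwu.com".
     */
    public static void apply(URLConnection conn, String siteOrigin) {
        // setRequestProperty throws IllegalStateException once the connection is open
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setRequestProperty("Referer", siteOrigin);
    }
}
```

In downLoadFromUrl, the call would go between url.openConnection() and conn.getInputStream(), for example RefererHeaders.apply(conn, "https://www.duifangfuwu.com").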