Server returned HTTP response code: 403 for URL

nathan0529

Last modified: 2023-04-21 19:01:38

Tags: http, java, network protocols

First published: 2023-04-21 18:59:42

Article link: https://blog.csdn.net/wnsh1990/article/details/130295061

Scenario and symptoms

While writing a video crawler, I wrote the code below. It downloads ordinary images without problems.

 
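import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

	/**
	 * Downloads a file from a network URL.
	 *
	 * @param urlStr   URL of the file to download
	 * @param fileName name to save the file under
	 * @param savePath directory to save the file in
	 * @return the path of the saved file, or an empty string if the download failed
	 */
	public static String downLoadFromUrl(String urlStr, String fileName, String savePath) {
		try {
			URL url = new URL(urlStr);
			HttpURLConnection conn = (HttpURLConnection) url.openConnection();
			// Set the read and connect timeouts to 60 seconds
			conn.setReadTimeout(1000 * 60);
			conn.setConnectTimeout(1000 * 60);

			// Create the save directory (and any missing parent directories) if needed
			File saveDir = new File(savePath);
			if (!saveDir.exists()) {
				saveDir.mkdirs();
			}
			File file = new File(saveDir + File.separator + fileName);

			// try-with-resources closes both streams even if reading or writing fails
			try (InputStream inputStream = conn.getInputStream();
					FileOutputStream fos = new FileOutputStream(file)) {
				// Read the whole response into a byte array, then write it to disk
				byte[] getData = readInputStream(inputStream);
				fos.write(getData);
			}
			return saveDir + File.separator + fileName;
		} catch (Exception e) {
			e.printStackTrace();
		}
		return "";
	}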

	/**
	 * Reads an input stream to the end and returns its contents as a byte array.
	 *
	 * @param inputStream the stream to read
	 * @return all bytes read from the stream
	 * @throws IOException if reading from the stream fails
	 */
	public static byte[] readInputStream(InputStream inputStream) throws IOException {
		byte[] buffer = new byte[1024];
		int len;
		ByteArrayOutputStream bos = new ByteArrayOutputStream();
		// read() returns -1 once the end of the stream is reached
		while ((len = inputStream.read(buffer)) != -1) {
			bos.write(buffer, 0, len);
		}
		return bos.toByteArray();
	}

However, when crawling videos from one particular website, the request failed with a 403 error:

Server returned HTTP response code: 403 for URL

Resolution

After searching online, I added the following code, but the error persisted:

// Send a browser User-Agent so the server does not reject the request as a bot with a 403
conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36");

Opening the link directly in a browser returned this page:

403 Forbidden
You don't have permission to access the URL on this server.

denied by Referer ACL

Powered by Tengine
CDN Request Id: 3df1972216820741360505586e
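
The same error page can also be read from Java instead of a browser. The following is a minimal sketch, not part of the original code: the class name ErrorBodyReader and the method describeResponse are hypothetical. It relies on the fact that HttpURLConnection.getInputStream() throws an IOException for a 4xx/5xx response, while getErrorStream() exposes the body the server sent along with the error.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original post.
public class ErrorBodyReader {

    /**
     * Returns "<status code>: <response body>" for the given URL.
     * For 4xx/5xx responses the body is taken from getErrorStream().
     */
    public static String describeResponse(String urlStr) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
        conn.setConnectTimeout(1000 * 60);
        conn.setReadTimeout(1000 * 60);
        try {
            int code = conn.getResponseCode();
            InputStream body = code >= 400 ? conn.getErrorStream() : conn.getInputStream();
            return code + ": " + readAll(body);
        } finally {
            conn.disconnect();
        }
    }

    /** Reads the stream to the end as UTF-8 text; a null stream yields an empty string. */
    private static String readAll(InputStream in) throws IOException {
        if (in == null) {
            return ""; // the server sent no body
        }
        try (InputStream stream = in) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int len;
            while ((len = stream.read(buffer)) != -1) {
                bos.write(buffer, 0, len);
            }
            return new String(bos.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Logging this string when a download fails would surface a message such as "denied by Referer ACL" directly in the crawler's output, assuming the server includes it in the error body.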

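I guessed that the target site was blocking the request; the "denied by Referer ACL" line on the error page points to a check on the Referer header. Adding the target site's domain to the request headers solved the problem: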

     conn.setRequestProperty("Origin", "https://www.duifangfuwu.com");
     conn.setRequestProperty("Referer", "https://www.duifangfuwu.com");
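
To see the effect of the Referer header in isolation, here is a small self-contained sketch. It is an illustration under assumptions, not the target site's actual logic: the class RefererDemo, its methods, and the /video.mp4 path are hypothetical, and the JDK's built-in com.sun.net.httpserver.HttpServer stands in for a server that enforces a Referer allow-list. The sketch sets only Referer because, to my knowledge, HttpURLConnection in OpenJDK-based runtimes treats Origin as a restricted header and silently ignores it unless the system property sun.net.http.allowRestrictedHeaders is set to true, so the Referer line is likely the one that matters here.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

// Hypothetical demo class for illustration; not part of the original post.
public class RefererDemo {

    /** Sends a GET request, optionally with a Referer header, and returns the status code. */
    public static int fetchStatus(String urlStr, String referer) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        if (referer != null) {
            conn.setRequestProperty("Referer", referer);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Starts a local stand-in server that answers 403 unless Referer starts with allowedPrefix. */
    public static HttpServer startAclServer(String allowedPrefix) throws IOException {
        // Port 0 lets the operating system pick a free port
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/video.mp4", exchange -> {
            String referer = exchange.getRequestHeaders().getFirst("Referer");
            boolean allowed = referer != null && referer.startsWith(allowedPrefix);
            exchange.sendResponseHeaders(allowed ? 200 : 403, -1); // -1 means no response body
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startAclServer("https://www.duifangfuwu.com");
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/video.mp4";
        try {
            System.out.println("without Referer: " + fetchStatus(url, null));
            System.out.println("with Referer:    " + fetchStatus(url, "https://www.duifangfuwu.com"));
        } finally {
            server.stop(0);
        }
    }
}
```

Against this stand-in server, the request without a Referer is rejected with 403 and the request with the allowed Referer gets 200, which mirrors the behavior observed with the real site.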
