[Java Video Tutorial] Day 39: Web Scraping


URLConnection

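URL overview

When we go online we normally use a browser: we type a web address into the address bar and press Enter to submit the request.

If the address is correct, the browser receives the data returned by the server.

In other words, a web address describes where a particular resource on a website is located.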


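To make it easy to fetch the resource that a web address refers to, Java wraps the concept of a web address in a dedicated class: the URL class.

URI: Uniform Resource Identifier.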

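https protocol
:// separator
www.baidu.com host domain name (resolves to an IP address)
/img/superlogo_c4d7df0a003d3db9b65e9ef0fe6da1ec.png location of the resource on the host
? separator between the resource path and the query parameters
where=super query parameter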
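
The breakdown above can be checked from code. The following is a minimal sketch, assuming the listed parts join into the single address used below (the original screenshot is not available); the class name UrlPartsDemo and the method describe are made up for illustration. It obtains the URL object through URI.create(...).toURL(), which yields the same java.net.URL type that new URL(...) would.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class UrlPartsDemo {
    // Splits an address into protocol, host, path and query and returns them as one line of text.
    static String describe(String address) {
        try {
            URL url = URI.create(address).toURL();
            return "protocol=" + url.getProtocol()
                    + ", host=" + url.getHost()
                    + ", path=" + url.getPath()
                    + ", query=" + url.getQuery();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + address, e);
        }
    }

    public static void main(String[] args) {
        // Assumed full address, assembled from the parts listed in the text
        System.out.println(describe(
                "https://www.baidu.com/img/superlogo_c4d7df0a003d3db9b65e9ef0fe6da1ec.png?where=super"));
    }
}
```

Parsing happens locally, so no network access is needed to run this.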

https://www.zhihu.com/video/1079479364412014592

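Introduction to URLConnection

Overview:

From a URL object you can open a network connection to the resource it identifies, and through that connection you can download the resource's data.

Java describes this connection with a class of its own: URLConnection.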
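
As a rough illustration of that flow, here is a minimal sketch that opens a URLConnection and reads the whole resource as text. The class name FetchTextDemo, the method fetch, and the choice of UTF-8 are assumptions made for this example, and readAllBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

final class FetchTextDemo {
    // Opens a connection to the resource and returns its content decoded as UTF-8 text.
    static String fetch(String address) {
        try {
            // URL object -> URLConnection
            URLConnection conn = URI.create(address).toURL().openConnection();
            // URLConnection -> input stream carrying the resource's bytes
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: java FetchTextDemo <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

The same code also works with a file: address, which is a convenient way to try it without a network connection.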


Using URLConnection to access a website and read the response:

<!doctype html>
<html>
<head>
<meta charset=utf-8 />
<title>自然风光图片 - 自然风景图片 (天堂图片网)</title>
<meta name="description" content="自然风光图片 - 自然风景图片," />
<meta name="keywords" content="" />
<link type="text/css" href="/img/a.css" rel="stylesheet" />
<meta name="applicable-device" content="pc"/>
<link rel="alternate" media="only screen and(max-width: 640px)" href="http://m.ivsky.com/tupian/ziranfengguang/">
<meta name="mobile-agent" content="format=html5; url=http://m.ivsky.com/tupian/ziranfengguang/">
<meta name="mobile-agent" content="format=xhtml; url=http://m.ivsky.com/tupian/ziranfengguang/">
<script type="text/javascript" src="/img/jq.js"></script>
<script type="text/javascript" src="/img/a.js"></script>
</head>
<body>
<div id="header">
	<div class="box">
		<div id="logo"><a href="http://www.ivsky.com">天堂图片网</a></div>
		<ul id="menu">
			<li><a href="/">首页</a></li>
			<li><a href="/tupian/" class="a_now">图片大全</a></li>
			<li><a href="/bizhi/">桌面壁纸</a></li>
		</ul>
		<div id="search">
			<div class="inp"><input type="text" id="ser_inp" class="ser_inp" /></div>
			<div class="inp-btn"><input type="submit" value="" id="ser_btn" class="ser_btn" /></div>
		</div>
		<div id="login">
			<a href="#" id="lb01" rel="nofollow">注册/登录</a>
			<a href="#" id="lb02" rel="nofollow">上传图片</a>
		</div>
	</div>
	<div class="hbg"></div>
</div>

<div class="box">
	<div id="alltop"><script>dy("alltop");</script></div>
	<div id="tplisttop1"><script>dy("tplisttop1");</script></div>
	<div id="tplisttop2"><script>dy("tplisttop2");</script></div>
</div>
<div class="box">
	<div class="pos">当前位置:<a href='http://www.ivsky.com/'>首页</a> > <a href="/tupian/">图片大全</a> > <a href="/tupian/ziranfengguang/">自然风光</a> </div>
	<div class="left">
		<div class="sort">
			<ul class="tpmenu"><li class="s1"><a href="/tupian/">所有分类</a></li><li class="s2on"><a href="/tupian/ziranfengguang/">自然风光</a></li><li class="s3"><a href="/tupian/chengshilvyou/">城市旅游</a></li><li class="s4"><a href="/tupian/dongwutupian/">动物图片</a></li><li class="s5"><a href="/tupian/zhiwuhuahui/">植物花卉</a></li><li class="s6"><a href="/tupian/haiyangshijie/">海洋世界</a></li><li class="s7"><a href="/tupian/renwutupian/">人物图片</a></li><li class="s8"><a href="/tupian/meishishijie/">美食世界</a></li><li class="s9"><a href="/tupian/wupin/">物品物件</a></li><li class="s10"><a href="/tupian/yundongtiyu/">运动体育</a></li><li class="s11"><a href="/tupian/jiaotongyunshu/">交通运输</a></li><li class="s12"><a href="/tupian/jianzhuhuanjing/">建筑环境</a></li><li class="s13"><a href="/tupian/jiaju/">装饰装修</a></li><li class="s14"><a href="/tupian/guanggaosheji/">广告设计</a></li><li class="s15"><a href="/tupian/katongtupian/">卡通图片</a></li><li class="s16"><a href="/tupian/jieritupian/">节日图片</a></li><li class="s17"><a href="/tupian/shejisucai/">设计素材</a></li><li class="s18"><a href="/tupian/yishu/">艺术绘画</a></li><li class="s19"><a href="/tupian/qita/">其他类别</a></li></ul>
		</div>
		 <div class="sline" id="sline2">                 
			<div><b>小分类</b><a href="/tupian/ziranfengjing_t2800/"  title="自然风景图片">自然风景</a> <a href="/tupian/tiankong_t811/"  title="天空图片">天空</a> <a href="/tupian/lantianbaiyun_t1485/"  title="蓝天白云图片">蓝天白云</a> <a href="/tupian/yangguang_t599/"  title="阳光图片">阳光</a> <a href="/tupian/richu_t165/"  title="日出图片">日出</a> <a href="/tupian/wanxia_t84/"  title="晚霞图片">晚霞</a> <a href="/tupian/xiyang_t15/"  title="夕阳图片">夕阳</a> <a href="/tupian/luori_t8846/"  title="落日图片">落日</a> <a href="/tupian/xingkong_t810/"  title="星空图片">星空</a> <a href="/tupian/yekong_t19444/"  title="夜空图片">夜空</a> <a href="/tupian/tudi_t4728/"  title="土地图片">土地</a> <a href="/tupian/gebi_t19445/"  title="戈壁图片">戈壁</a> <a href="/tupian/shamo_t2726/"  title="沙漠图片">沙漠</a> <a href="/tupian/xiagu_t94/"  title="峡谷图片">峡谷</a> <a href="/tupian/shanmai_t95/"  title="山脉图片">山脉</a> <a href="/tupian/shanchuan_t2213/"  title="山川图片">山川</a> <a href="/tupian/shanlin_t19446/"  title="山林图片">山林</a> <a href="/tupian/senlin_t2303/"  title="森林图片">森林</a> <a href="/tupian/shulin_t3166/"  title="树林图片">树林</a> <a href="/tupian/caoyuan_t933/"  title="草原图片">草原</a> <a href="/tupian/tianye_t1472/"  title="田野图片">田野</a> <a href="/tupian/caodi_t2229/"  title="草地图片">草地</a> <a href="/tupian/tianyuan_t3171/"  title="田园图片">田园</a> <a href="/tupian/nongtian_t5097/"  title="农田图片">农田</a> <a href="/tupian/maitian_t268/"  title="麦田图片">麦田</a> <a href="/tupian/caitian_t19447/"  title="菜田图片">菜田</a> <a href="/tupian/daotian_t3887/"  title="稻田图片">稻田</a> <a href="/tupian/titian_t10931/"  title="梯田图片">梯田</a> <a href="/tupian/heliu_t11199/"  title="河流图片">河流</a> <a href="/tupian/xiliu_t169/"  title="溪流图片">溪流</a> <a href="/tupian/pubu_t163/"  title="瀑布图片">瀑布</a> <a href="/tupian/hubo_t11602/"  title="湖泊图片">湖泊</a> <a href="/tupian/daoyu_t503/"  title="岛屿图片">岛屿</a> <a href="/tupian/haidao_t934/"  title="海岛图片">海岛</a> <a href="/tupian/dahai_t660/"  title="大海图片">大海</a> <a href="/tupian/hailang_t499/"  title="海浪图片">海浪</a> <a href="/tupian/haitan_t1225/"  title="海滩图片">海滩</a> <a 
href="/tupian/shatan_t600/"  title="沙滩图片">沙滩</a> <a href="/tupian/haian_t20/"  title="海岸图片">海岸</a> <a href="/tupian/haijing_t16177/"  title="海景图片">海景</a> <a href="/tupian/chunji_t8857/"  title="春季图片">春季</a> <a href="/tupian/xiaji_t3145/"  title="夏季图片">夏季</a> <a href="/tupian/qiuji_t3374/"  title="秋季图片">秋季</a> <a href="/tupian/dongji_t3144/"  title="冬季图片">冬季</a> <a href="/tupian/siji_t5102/"  title="四季图片">四季</a> <a href="/tupian/tianqiqihou_t19449/"  title="天气气候图片">天气气候</a> <a href="/tupian/bingxuejingse_t19450/"  title="冰雪景色图片">冰雪景色</a> <a href="/tupian/bingshan_t471/"  title="冰山图片">冰山</a> <a href="/tupian/diqiu_t2081/"  title="地球图片">地球</a> <a href="/tupian/shandian_t2740/"  title="闪电图片">闪电</a> <a href="/tupian/bingchuan_t14218/"  title="冰川图片">冰川</a> <a href="/tupian/yuzhou_t3889/"  title="宇宙图片">宇宙</a> <a href="/tupian/daziran_t3005/"  title="大自然图片">大自然</a> <a href="/tupian/weimei_t5095/"  title="唯美图片">唯美</a> <a href="/tupian/qitejingguan_t19451/"  title="奇特景观图片">奇特景观</a> </div>
		</div>
		<div id="tplistleft"><script>dy("tplistleft");</script></div>
		<ul class="ali">
		    <li>
				<div class="il_img"><a href="/tupian/bingxue_v46975/" title="高山冬季冰雪风景图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/16/bingxue-007.jpg" alt="高山冬季冰雪风景图片"></a></div>
				<p><a href="/tupian/bingxue_v46975/" target="_blank">高山冬季冰雪风景图片(14张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/yekong_v46963/" title="风云变幻的夜空图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/15/yekong-009.jpg" alt="风云变幻的夜空图片"></a></div>
				<p><a href="/tupian/yekong_v46963/" target="_blank">风云变幻的夜空图片(10张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/dacaoyuan_v46961/" title="一望无际的大草原图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/15/dacaoyuan-002.jpg" alt="一望无际的大草原图片"></a></div>
				<p><a href="/tupian/dacaoyuan_v46961/" target="_blank">一望无际的大草原图片(11张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/yuhou_de_caocong_v46962/" title="雨后的草丛图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/15/yuhou_de_caocong-006.jpg" alt="雨后的草丛图片"></a></div>
				<p><a href="/tupian/yuhou_de_caocong_v46962/" target="_blank">雨后的草丛图片(10张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/lvyouyou_de_caocong_v46959/" title="绿油油的草丛图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/15/lvyouyou_de_caocong-001.jpg" alt="绿油油的草丛图片"></a></div>
				<p><a href="/tupian/lvyouyou_de_caocong_v46959/" target="_blank">绿油油的草丛图片(12张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/pingtan_de_caodi_v46960/" title="平坦的草地图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/15/pingtan_de_caodi-010.jpg" alt="平坦的草地图片"></a></div>
				<p><a href="/tupian/pingtan_de_caodi_v46960/" target="_blank">平坦的草地图片(11张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/junqiao_de_shanmai_v46952/" title="峻峭的山峰图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/15/junqiao_de_shanmai-010.jpg" alt="峻峭的山峰图片"></a></div>
				<p><a href="/tupian/junqiao_de_shanmai_v46952/" target="_blank">峻峭的山峰图片(11张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/baiyun_v46828/" title="天空中飘动的白云图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/11/baiyun-007.jpg" alt="天空中飘动的白云图片"></a></div>
				<p><a href="/tupian/baiyun_v46828/" target="_blank">天空中飘动的白云图片(10张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/senlin_xiaolu_v46807/" title="森林里幽静的小路图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/10/senlin_xiaolu-012.jpg" alt="森林里幽静的小路图片"></a></div>
				<p><a href="/tupian/senlin_xiaolu_v46807/" target="_blank">森林里幽静的小路图片(13张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/jiebing_de_shuzhi_v46788/" title="结冰的树枝图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/09/jiebing_de_shuzhi-001.jpg" alt="结冰的树枝图片"></a></div>
				<p><a href="/tupian/jiebing_de_shuzhi_v46788/" target="_blank">结冰的树枝图片(11张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/pubu_v46772/" title="山涧中的瀑布图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/09/pubu.jpg" alt="山涧中的瀑布图片"></a></div>
				<p><a href="/tupian/pubu_v46772/" target="_blank">山涧中的瀑布图片(12张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/juanjuan_xishui_v46770/" title="涓涓流淌的溪水图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/09/juanjuan_xishui-007.jpg" alt="涓涓流淌的溪水图片"></a></div>
				<p><a href="/tupian/juanjuan_xishui_v46770/" target="_blank">涓涓流淌的溪水图片(12张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/shulin_xuejing_v46753/" title="树林里的雪景图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/07/shulin_xuejing-002.jpg" alt="树林里的雪景图片"></a></div>
				<p><a href="/tupian/shulin_xuejing_v46753/" target="_blank">树林里的雪景图片(9张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/shamo_v46726/" title="广阔的沙漠风景图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/04/shamo-003.jpg" alt="广阔的沙漠风景图片"></a></div>
				<p><a href="/tupian/shamo_v46726/" target="_blank">广阔的沙漠风景图片(11张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/wuyun_v46717/" title="天空的乌云图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/03/wuyun-002.jpg" alt="天空的乌云图片"></a></div>
				<p><a href="/tupian/wuyun_v46717/" target="_blank">天空的乌云图片(10张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/canyue_v46679/" title="一轮残月图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201801/31/canyue.jpg" alt="一轮残月图片"></a></div>
				<p><a href="/tupian/canyue_v46679/" target="_blank">一轮残月图片(10张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/bingchuan_v46686/" title="美丽的冰川景色图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/01/bingchuan-010.jpg" alt="美丽的冰川景色图片"></a></div>
				<p><a href="/tupian/bingchuan_v46686/" target="_blank">美丽的冰川景色图片(9张)</a></p>
			</li><li>
				<div class="il_img"><a href="/tupian/xueshan_v46687/" title="美丽的雪山景色图片" target="_blank"><img src="http://img.ivsky.com/img/tupian/li/201802/01/xueshan.jpg" alt="美丽的雪山景色图片"></a></div>
				<p><a href="/tupian/xueshan_v46687/" target="_blank">美丽的雪山景色图片(10张)</a></p>
			</li>
		</ul>
		<div id="tplistleft1"><script>dy("tplistleft1");</script></div>
		<div class="pagelist"><span class='page-cur'>1</span><a href='/tupian/ziranfengguang/index_2.html'>2</a><a href='/tupian/ziranfengguang/index_3.html'>3</a><a href='/tupian/ziranfengguang/index_4.html'>4</a><a href='/tupian/ziranfengguang/index_5.html'>5</a><a href='/tupian/ziranfengguang/index_6.html'>6</a><a href='/tupian/ziranfengguang/index_7.html'>7</a><a href='/tupian/ziranfengguang/index_8.html'>8</a><a href='/tupian/ziranfengguang/index_9.html'>9</a><a href='/tupian/ziranfengguang/index_10.html'>10</a><a href='/tupian/ziranfengguang/index_11.html'>11</a><a class='page-next' href='/tupian/ziranfengguang/index_2.html'>下一页</a></div>
		<div id="tplistleft2"><script>dy("tplistleft2");</script></div>
		<div id="tplistleft3"><script>dy("tplistleft3");</script></div>
	</div>
	<div class="right">
		<div id="tplistr1"><script>dy("tplistr1");</script></div>
        <div class="rb">
			<div class="rtit">最近更新的...</div>
			<div class="htag"><a href="/tupian/dianchi_t29713/">滇池图片</a><a href="/tupian/keaide_xueren_t38510/">可爱的雪人图片</a><a href="/tupian/chenchuan_t36836/">沉船图片</a><a href="/tupian/shancun_t37004/">山村图片</a><a href="/tupian/shuying_t8785/">树影图片</a><a href="/tupian/yunxiao_t29697/">云霄图片</a><a href="/tupian/xiafengxing_xiagu_t35996/">狭缝型峡谷图片</a><a href="/tupian/longjuanfeng_t35759/">龙卷风图片</a><a href="/tupian/shangu_t1386/">山谷图片</a><a href="/tupian/ninghua_t22549/">凝华图片</a><a href="/tupian/zhifengche_t6402/">纸风车图片</a><a href="/tupian/dongyuan_t28805/">冻原图片</a><a href="/tupian/feizhuliu_t4775/">非主流图片</a><a href="/tupian/haitan_xiyang_t35789/">海滩夕阳图片</a><a href="/tupian/qiutian_t62/">秋天图片</a><a href="/tupian/hongyanshi_t528/">红岩石图片</a><a href="/tupian/wusong_t42104/">雾淞图片</a><a href="/tupian/haizhongdao_t39000/">海中岛图片</a><a href="/tupian/xingqiu_t812/">星球图片</a><a href="/tupian/daotian_t3887/">稻田图片</a></div>
		</div>
		<div id="tplistr2"><script>dy("tplistr2");</script></div>
	</div>
</div>

<div class="box">
<div id="tplistbtm"><script>dy("tplistbtm");</script></div><div id="tplistbtm2"><script>dy("tplistbtm2");</script></div>
<div id="tppagebtm"><script>dy("tppagebtm");</script></div><div id="tppagebtm2"><script>dy("tppagebtm2");</script></div>
</div>
<div id="footer">
	<div class="box">
		<div id="fl">
			<dl>
				<dt>关于</dt>
				<dd><a href="/about/about.html" rel="nofollow">关于天堂</a></dd>
				<dd><a href="/about/team.html" rel="nofollow">团队成员</a></dd>
				<dd><a href="/about/disclaimer.html" rel="nofollow">免责声明</a></dd>
			</dl>
			<dl>
				<dt>帮助</dt>
				<dd><a href="/about/tougao.html" rel="nofollow">用户投稿</a></dd>
				<dd><a href="/about/faq.html" rel="nofollow">常见问题</a></dd>
			</dl>
			<dl>
				<dt>联系</dt>
				<dd><a href="/about/contact.html" rel="nofollow">联系我们</a></dd>
				<dd><a href="/about/guestbook.html" rel="nofollow">留言反馈</a></dd>
				<dd><a href="/about/ad.html" rel="nofollow">广告投放</a></dd>
			</dl>
			<dl>
				<dt>关注</dt>
				<dd class="sina"><a href="http://weibo.com/ivskycom" target="_blank" rel="nofollow">新浪微博</a></dd>
				<dd class="q"><a href="http://t.qq.com/ivskycom" target="_blank" rel="nofollow">腾讯微博</a></dd>
			</dl>
		</div>
		<div id="fr">
			<p>&copy; 2005-2017 <a href="http://www.ivsky.com/">天堂图片网</a>  <a href="http://www.miibeian.gov.cn/" target="_blank" rel="nofollow">闽ICP备:05021777号</p>
			<p>本站提供的图片仅供学习和交流使用,版权归原作者所有,请勿用于任何商业用途</p>
			<p>更多信息请浏览本站免责声明</p>
		</div>
	</div>
</div>
<script>dy("tbox");</script>
<div id="tj"><script>dy("tj");</script></div>
</body>
</html>
https://www.zhihu.com/video/1079479612031119360

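Downloading an image with URL

Download an image with URL and URLConnection, given its address on a web page:

  1. Take the image's address and wrap it in a URL object;
  2. Obtain a URLConnection object from that URL object;
  3. Open an input stream from the connection, read the data, and write it to a local file.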
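import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;
import org.apache.commons.io.IOUtils;

public class PicDownLoadDemo {
	// Demonstrates using the commons-io library to simplify the download code
	public static void main(String[] args) throws Exception {
		// 1. Create a URL object from the given image address
		URL url = new URL("http://img.ivsky.com/img/tupian/pre/201709/25/shandian-004.jpg");
		// 2. Get an input stream directly from the URL object
		//    (openStream() is shorthand for openConnection().getInputStream())
		// 3. Create a file output stream for the local file that will hold the download
		try (InputStream is = url.openStream();
				FileOutputStream fos = new FileOutputStream("F:\\2.jpg")) {
			// 4. Use the library to copy the data from the input to the output
			IOUtils.copy(is, fos);
		}
		// 5. try-with-resources closes both streams and releases the resources
	}
}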
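
For comparison, the three steps can also be followed literally with only the JDK, without commons-io. This is a sketch under stated assumptions: the class name JdkOnlyDownload and its download method are hypothetical, and Files.copy stands in for IOUtils.copy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class JdkOnlyDownload {
    // Downloads the resource at 'address' into 'target' and returns the number of bytes written.
    static long download(String address, Path target) {
        try {
            // 1. Wrap the address in a URL object
            URL url = URI.create(address).toURL();
            // 2. Obtain a URLConnection from the URL object
            URLConnection conn = url.openConnection();
            // 3. Open the input stream and copy everything into the local file
            try (InputStream in = conn.getInputStream()) {
                return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: java JdkOnlyDownload <url> <target-file>");
            return;
        }
        long bytes = download(args[0], Paths.get(args[1]));
        System.out.println(bytes + " bytes written to " + args[1]);
    }
}
```

The try-with-resources block closes the input stream even if the copy fails partway, which the manual close() calls at the end of a method would not do.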


Scraping the images on a web page with URL

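Regular expression for matching image links in a web page: (http|https)://[^"(){}<>]+\.(jpg|png|jpeg|gif) (inside a Java string literal, the double quote and the backslash must each be escaped with a backslash)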
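
To see what such a pattern picks out, here is a minimal sketch using java.util.regex. The dot before the extension is escaped so that it matches a literal '.', and the character class simply lists the characters a link must not contain. The class name ImageLinkFinder, the method findImageLinks, and the example.com address are assumptions for illustration; the first sample link is taken from the page output shown earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageLinkFinder {
    // Raw regex: (http|https)://[^"(){}<>]+\.(jpg|png|jpeg|gif)
    static final Pattern IMAGE_LINK =
            Pattern.compile("(http|https)://[^\"(){}<>]+\\.(jpg|png|jpeg|gif)");

    // Returns every substring of 'html' that matches the image-link pattern, in order of appearance.
    static List<String> findImageLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = IMAGE_LINK.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group());
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://img.ivsky.com/img/tupian/li/201802/16/bingxue-007.jpg\" alt=\"x\">"
                + "<a href=\"/tupian/\">no image here</a>"
                + "<img src=\"https://example.com/a.png\">";
        for (String link : findImageLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Relative links such as /tupian/ are skipped because the pattern requires an http or https prefix.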

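Approach:

  1. Create a URL object for the web page whose images are to be scraped;
  2. Use that URL object to open a connection (URLConnection) to the page;
  3. Get an input stream from the connection object;
  4. Read the page's source code from the input stream (the HTML, as one large string);
  5. Use the regular expression to find the URLs of all images in that source code;
  6. Loop over the image addresses found and download the images one by one.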
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import org.apache.commons.io.FilenameUtils;
import org.apache.commons.io.IOUtils;

// MyRegexUtil and MyUUIDUtils are the author's own helper classes; their source is not shown here.
public class PicsDownloadDemo {
	// Prepares the URLConnection so the request passes the server's hotlink protection
	public static URLConnection getConn(URL url) throws IOException {
		URLConnection conn = url.openConnection();
		// Set a browser User-Agent so the request looks like it comes from a browser
		conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36");
		// Set Referer to the root of the target site, so the request appears to originate from that site
		conn.setRequestProperty("Referer", url.getProtocol() + "://" + url.getHost());
		return conn;
	}

	public static void main(String[] args) throws IOException {
		// 1. Create a URL object for the web page whose images are to be scraped
		URL url = new URL("https://image.baidu.com/search/index?tn=baiduimage&ct=201326592&lm=-1&cl=2&ie=gb18030&word=%C3%C0%C5%AE%CD%BC%C6%AC&fr=ala&ala=1&alatpl=adress&pos=0&hs=2&xthttps=111111");
		// 2. Use that URL object to open a connection (URLConnection) to the page
		URLConnection conn = getConn(url);
		String html;
		// 3. Get an input stream from the connection object
		try (InputStream input = conn.getInputStream()) {
			// 4. Read the page's source code from the input stream (the HTML, as one large string)
			html = IOUtils.toString(input, "utf-8");
		}
		// 5. Use the regular expression to find the URLs of all images in the source code
		String[] picPaths = MyRegexUtil.find(html, "(http|https)://[^\"(){}<>]+\\.(jpg|png|jpeg|gif)");
		// 6. Loop over the image addresses and download each one in its own thread
		if (picPaths != null && picPaths.length > 0) {
			for (String picPath : picPaths) {
				new Thread() {
					@Override
					public void run() {
						try {
							download(picPath);
						} catch (IOException e) {
							e.printStackTrace();
						}
					}
				}.start();
			}
		}
	}

	public static void download(String path) throws IOException {
		URL url = new URL(path);
		URLConnection connection = getConn(url);
		// The target directory must already exist; FileOutputStream does not create it
		try (InputStream input = connection.getInputStream();
				OutputStream output = new FileOutputStream("F:\\picdownloads\\" + MyUUIDUtils.getUUID() + "." + FilenameUtils.getExtension(path))) {
			IOUtils.copy(input, output);
		}
		System.out.println(path + " downloaded successfully!");
	}
}
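
The download method names each local file from a generated identifier plus the extension of the image address, using the helpers MyUUIDUtils.getUUID() and FilenameUtils.getExtension(path). The first of these is not shown in the article, so the following is only a guess at the idea using the JDK alone; the class name UniqueFileName and both method names are hypothetical, and the real helper may format the identifier differently.

```java
import java.util.UUID;

final class UniqueFileName {
    // Returns the text after the last '.' in the final path segment, or "" if that segment has no dot.
    static String extensionOf(String path) {
        int lastSlash = path.lastIndexOf('/');
        int lastDot = path.lastIndexOf('.');
        return lastDot > lastSlash ? path.substring(lastDot + 1) : "";
    }

    // Builds "<random-uuid>.<extension>" so that concurrent downloads do not overwrite each other.
    static String forUrl(String imageUrl) {
        return UUID.randomUUID().toString() + "." + extensionOf(imageUrl);
    }

    public static void main(String[] args) {
        System.out.println(forUrl("http://img.ivsky.com/img/tupian/li/201802/16/bingxue-007.jpg"));
    }
}
```

Random names matter here because every image is downloaded in its own thread, so names derived from a shared counter or a timestamp could collide.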