Writing a Simple Web Crawler: Python (1)

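Use urlopen from Python's urllib library (in Python 3 it lives in the urllib.request module) to fetch the entire content of the page at a given URL.
The code is as follows: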

# coding=utf-8
import urllib.request

# Page to fetch
url = "http://www.hzau.edu.cn/2014/ch/"
# urlopen() returns a response object; read() returns the whole body as bytes
get = urllib.request.urlopen(url).read()
print(get)

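Crawl result (read() returns bytes, so line breaks are printed as \n and the UTF-8 encoded Chinese text as \x.. escape sequences):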

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">\n<head>\n  <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />\n<meta http-equiv="X-UA-Compatible" content="IE=9; IE=8; IE=7; IE=EDGE">\n\n  <title>\xe5\x8d\x8e\xe4\xb8\xad\xe5\x86\x9c\xe4\xb8\x9a\xe5\xa4\xa7\xe5\xad\xa6\xe6\xac\xa2\xe8\xbf\x8e\xe4\xbd\xa0</title>\n      <link rel="stylesheet" href="./style/base.css" />\n  <link rel="stylesheet" href="./style/nav.css" />\n  <link rel="stylesheet" href="./style/font.css" />\n \n  <script type="text/javascript" src="./js/jquery.min.js"></script>\n  <script type="text/javascript" src="./js/jquery.slideBox.min.js"></script>\n\n\n  <!--[if IE 6]>\n  <link href="./style/ie6.css" type="text/css" rel="stylesheet" />\n  <script src="./js/DD_belatedPNG_0.0.8a-min.js"></script>\n  <script>\n          DD_belatedPNG.fix(\'img, .sprite, .contact_field, .contact_question, a.ui-dialog-titlebar-close, #bgeffect, #header ,#logo \');\n        </script>\n  <![endif]-->\n\n  <!--[if IE 7]>\n  <link rel="stylesheet" href="./style/ie7.css" />\n  <script src="../js/html5.js"></script>\n  <![endif]-->\n</head>\n<body>\n\n  <div id="bgeffect">\n\n    <div id="layout">\n      <div id="header">\n  <div id="logo" onclick="location.href=\'./\'"></div>\n  <div id="top">\n    <a href="http://www.hzau.edu.cn/2013/home/">\xe8\xae\xbf\xe9\x97\xae\xe6\x97\xa7\xe7\x89\x88</a> | <a href=".//page/sitemap/">\xe7\xbd\x91\xe7\xab\x99\xe5\x9c\xb0\xe5\x9b\xbe</a> | <a href="http://www.hzau.edu.cn/2014/en/">English</a>\n  </div>\n  <div id="search">\n   <form action="http://zhannei.baidu.com/cse/search" method="get" target="_blank" class="bdcs-search-form" id="bdcs-search-form"> <input type="hidden" name="s" value="18271239725310565818" /> <input type="hidden" name="entry" value="1" />               <input type="text" name="q" class="bdcs-search-form-input" id="s" 
placeholder />\n\n<input type="image" src="./images/btn_search_box.gif" width="27" height="24" alt="Search" title="Search" class="bdcs-search-form-submit" id="go" />         </form>\n  </div>\n</div>\n<!--\xe5\xaf\xbc\xe8\x88\xaa\xe7\x9b\xae\xe5\xbd\x95\xe5\xbc\x80\xe5\xa7\x8b-->\n<ul id="nav">\n  <li class="nosub">\n    <a href="./" class="drop">\xe4\xb8\xbb&nbsp;&nbsp;\xe9\xa1\xb5</a>\n  </li>\n  <li>\n  <a href="./about_hzau/brief/" class="drop">\xe5\xad\xa6\xe6\xa0\xa1\xe6\xa6\x82\xe5\x86\xb5</a>\n    <div class="dropdown_1column">\n      <div class="col_1">\n      <ul class="simple">\n        <li><a href="./about_hzau/brief/">\xe5\xad\xa6\xe6\xa0\xa1\xe7\xae\x80\xe4\xbb\x8b</a></li>\n        <li><a href="./about_hzau/xxzc/">\xe5\xad\xa6\xe6\xa0\xa1\xe7\xab\xa0\xe7\xa8\x8b</a></li>\n        <li><a href="./about_hzau/history/">\xe5\x8e\x86\xe5\x8f\xb2\xe6\xb2\xbf\xe9\x9d\xa9</a></li>\n        <li><a href="./about_hzau/college/">\xe9\x99\xa2\xe7\xb3\xbb\xe8\xae\xbe\xe7\xbd\xae</a></li>\n        <li><a href="./about_hzau/department/">\xe6\x9c\xba\xe6\x9e\x84\xe8\xae\xbe\xe7\xbd\xae</a></li>\n        <li><a href="./about_hzau/LeadershipTeam/">\xe7\x8e\xb0\xe4\xbb\xbb\xe9\xa2\x86\xe5\xaf\xbc</a></li>\n        <li><a href="./about_hzau/Successiveleadership/">\xe5\x8e\x86\xe4\xbb\xbb\xe9\xa2\x86\xe5\xaf\xbc</a></li>\n\n      </ul>\n    </div>\n  </div>\n  </li>\n<li>\n  <a href="./education/rcpyjd/" class="drop">\xe4\xba\xba\xe6\x89\x8d\xe5\x9f\xb9\xe5\x85\xbb</a>\n    <div class="dropdown_2columns">\n      <div class="col_2">\n      <ul class="simple">\n        <li><a href="./education/rcpyjd/">\xe5\x9b\xbd\xe5\xae\xb6\xe4\xba\xba\xe6\x89\x8d\xe5\x9f\xb9\xe5\x85\xbb\xe5\x9f\xba\xe5\x9c\xb0</a></li>\n        <li><a href="./education/chuangxinqu/">\xe5\x9b\xbd\xe5\xae\xb6\xe4\xba\xba\xe6\x89\x8d\xe5\x9f\xb9\xe5\x85\xbb\xe6\xa8\xa1\xe5\xbc\x8f\xe5\x88\x9b\xe6\x96\xb0\xe5\xae\x9e\xe9\xaa\x8c\xe5\x8c\xba</a></li>\n        <li><a 
href="./education/scholarship/">\xe5\xa5\x96\xe5\xad\xa6\xe9\x87\x91\xef\xbc\x88\xe5\x8a\xa9\xe5\xad\xa6\xe9\x87\x91\xef\xbc\x89</a></li>\n        <li><a href="./education/gjsyjxsfzx/">\xe5\x9b\xbd\xe5\xae\xb6\xe5\xae\x9e\xe9\xaa\x8c\xe6\x95\x99\xe5\xad\xa6\xe7\xa4\xba\xe8\x8c\x83\xe4\xb8\xad\xe5\xbf\x83</a></li>\n        <li><a href="./education/jxcg/">\xe5\x9b\xbd\xe5\xae\xb6\xe7\xba\xa7\xe6\x95\x99\xe5\xad\xa6\xe6\x88\x90\xe6\x9e\x9c</a></li>\n        <li><a href="./education/jpkc/">\xe5\x9b\xbd\xe5\xae\xb6\xe7\xb2\xbe\xe5\x93\x81\xe8\xaf\xbe\xe7\xa8\x8b</a></li>\n        <li><a href="./education/kjcx/">\xe5\xa4\xa7\xe5\xad\xa6\xe7\x94\x9f\xe7\xa7\x91\xe6\x8a\x80\xe5\x88\x9b\xe6\x96\xb0</a></li>\n        <li><a href="./education/xljk/">\xe5\xa4\xa7\xe5\xad\xa6\xe7\x94\x9f\xe5\xbf\x83\xe7\x90\x86\xe5\x81\xa5\xe5\xba\xb7</a></li>\n        <li><a href="./education/jcxy/">\xe6\x9d\xb0\xe5\x87\xba\xe6\xa0\xa1\xe5\x8f\x8b</a></li>\n        <li><a href="./education/ssxz/">\xe8\x8e\x98\xe8\x8e\x98\xe5\xad\xa6\xe5\xad\x90</a></li>\n      </ul>\n    </div>\n  </div>\n</li>\n\n<li>\n  <a href="./admissions_employment/bkzs/" class="drop">\xe6\x8b\x9b\xe7\x94\x9f\xe5\xb0\xb1\xe4\xb8\x9a</a>\n  <div class="dropdown_1column">\n    <div class="col_1">\n      <ul class="simple">\n        <li><a href="./admissions_employment/bkzs/">\xe6\x9c\xac\xe7\xa7\x91\xe7\x94\x9f\xe6\x8b\x9b\xe7\x94\x9f</a></li>\n        <li><a href
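
The result above is hard to read because it is printed as raw bytes. Decoding the bytes turns the escape sequences back into readable text. Below is a minimal sketch, assuming the page is UTF-8 encoded, as its Content-Type meta tag declares. The helper names decode_page and fetch_text, the 10-second timeout, and the UTF-8 fallback are illustrative assumptions, not part of the original post.

```python
import urllib.request


def decode_page(raw, charset=None):
    """Decode raw response bytes into str.

    Falls back to UTF-8 when no charset is given (an assumption that
    matches the meta tag of the page above, not a general rule).
    """
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_text(url, timeout=10):
    """Fetch a URL and return its body as decoded text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Charset from the HTTP Content-Type header, or None if absent
        charset = response.headers.get_content_charset()
    return decode_page(raw, charset)


# Offline demo: the <title> bytes copied from the crawl result above
sample = (b"<title>\xe5\x8d\x8e\xe4\xb8\xad\xe5\x86\x9c\xe4\xb8\x9a"
          b"\xe5\xa4\xa7\xe5\xad\xa6\xe6\xac\xa2\xe8\xbf\x8e\xe4\xbd\xa0</title>")
print(decode_page(sample))  # prints <title>华中农业大学欢迎你</title>
```

To apply this to the live page, calling print(fetch_text("http://www.hzau.edu.cn/2014/ch/")) should print the HTML with real line breaks and readable Chinese, provided the site is reachable and really serves UTF-8.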