Checking HUST (华科) dorm electricity balances with a Python crawler

Preparation:

  • URL of the HUST electricity query page: http://202.114.18.218/main.aspx

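  • Open the URL in a browser (Chrome), right-click and choose "Inspect" to examine the page elements, then switch to the Network tab of the developer tools. Run an electricity query by hand to find the URL of the request and the data sent with it, as shown in the screenshots below: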

  • [Screenshot of the query request in the Network tab]

  • [Screenshot of the query request in the Network tab]

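  • The screenshots above show that, besides the building and floor information we selected, the request carries two odd-looking fields: __EVENTVALIDATION and __VIEWSTATE (both are hidden form fields that ASP.NET Web Forms pages generate). These make the crawler somewhat more troublesome to write.

  • Learn to use Python's urllib.request library and the BeautifulSoup library.

Code: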
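
Since the two hidden fields are part of the page's HTML form, they can be read out of the page instead of being copied from the browser by hand. The following is a minimal sketch of that idea, not the article's own code. It uses only the standard library's html.parser so that it runs without extra installs (BeautifulSoup would work just as well), and the names HiddenFieldParser, extract_hidden_fields, and SAMPLE_PAGE, as well as the placeholder token values, are assumptions made for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the given HTML."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Hypothetical page fragment: on the real page the values are long Base64 strings.
SAMPLE_PAGE = """
<form method="post" action="main.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="FAKE_VIEWSTATE_TOKEN" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="FAKE_VALIDATION_TOKEN" />
  <input type="text" name="Txtroom" value="" />
</form>
"""

fields = extract_hidden_fields(SAMPLE_PAGE)
for name, value in fields.items():
    print(name, "=", value)
```

The visible text box (Txtroom) is skipped because only inputs with type="hidden" are collected.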

# Environment: Python 3.5.1
import urllib.error
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup


# Request URL
url = "http://202.114.18.218/main.aspx"

# Request headers
head = {}
head['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36'

# Form data sent with the request:

data = {}
data['programId']='东区'
data['txtyq']='沁苑东十舍'
data['txtld']='1层'
data['Txtroom']='120'
data['ImageButton1.x']='56'
data['ImageButton1.y']='13'

data['TextBox2']='2016-3-6 7:08:35'
data['TextBox3']='8.7'

data['__EVENTTARGET']=''
data['__EVENTARGUMENT']=''
data['__LASTFOCUS']=''

data['__EVENTVALIDATION']='/wEWKgKcy5/7BALorceeCQLc1sToBgL+zqXMDgK50MfoBgKhi6GaBQLdnbOlBgLtuMzrDQLrwqHzBQKX+9a3BALahOrMBwLahO6ZAQLahOLMBwLahMqFAQLahJKFAQLahKrHBwLahI6ZAQLahP6FAQKsioTXAwL4w577DwKH0cqFAQKVre6ZAQKVrZKFAQKVrf6FAQK/yONFArDhhMwNArvghMwNAr/I87wGAqTghMwNApSUsNoIAoOU+OMOAoKU+OMOAoGU+OMOAoCU+OMOAoeU+OMOAoaU+OMOAo+UvJ4CAvrV2qsGAtLCmdMIAtLC1eQCAuzR9tkMAuzRirUFOLHrHczOAnP0f8H0iOrdziJ/rT8='

data['__VIEWSTATE']='/wEPDwULLTEyNjgyMDA1OTgPZBYCAgMPZBYOAgEPEA8WBh4NRGF0YVRleHRGaWVsZAUM5qW85qCL5Yy65Z+fHg5EYXRhVmFsdWVGaWVsZAUM5qW85qCL5Yy65Z+fHgtfIURhdGFCb3VuZGdkEBUHBuS4nOWMugznlZnlrabnlJ/mpbwG6KW/5Yy6DOmfteiLkeS6jOacnwzpn7Xoi5HkuIDmnJ8G57Sr6I+YCy3or7fpgInmi6ktFQcG5Lic5Yy6DOeVmeWtpueUn+alvAbopb/ljLoM6Z+16IuR5LqM5pyfDOmfteiLkeS4gOacnwbntKvoj5gCLTEUKwMHZ2dnZ2dnZxYBZmQCBQ8QDxYGHwAFBualvOWPtx8BBQbmpbzlj7cfAmdkEBUUCeS4nOWFq+iIjQnkuJzkuozoiI0J5Lic5YWt6IiNCeS4nOS4g+iIjQnkuJzkuInoiI0J5Lic5Zub6IiNCeS4nOS6lOiIjQnkuJzkuIDoiI0P6ZmE5Lit5a6e6aqM5qW8DOmZhOS4reS4u+alvAnmlZnkuIPoiI0J5Y2X5LqM6IiNCeWNl+S4ieiIjQnljZfkuIDoiI0P5rKB6IuR5Lic5Lmd6IiNEuaygeiLkeS4nOWNgeS6jOiIjRLmsoHoi5HkuJzljYHkuInoiI0P5rKB6IuR5Lic5Y2B6IiNEuaygeiLkeS4nOWNgeS4gOiIjQst6K+36YCJ5oupLRUUCeS4nOWFq+iIjQnkuJzkuozoiI0J5Lic5YWt6IiNCeS4nOS4g+iIjQnkuJzkuInoiI0J5Lic5Zub6IiNCeS4nOS6lOiIjQnkuJzkuIDoiI0P6ZmE5Lit5a6e6aqM5qW8DOmZhOS4reS4u+alvAnmlZnkuIPoiI0J5Y2X5LqM6IiNCeWNl+S4ieiIjQnljZfkuIDoiI0P5rKB6IuR5Lic5Lmd6IiNEuaygeiLkeS4nOWNgeS6jOiIjRLmsoHoi5HkuJzljYHkuInoiI0P5rKB6IuR5Lic5Y2B6IiNEuaygeiLkeS4nOWNgeS4gOiIjQItMRQrAxRnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZxYBAhFkAgkPEA8WBh8ABQnmpbzlsYLlj7cfAQUJ5qW85bGC5Y+3HwJnZBAVBwQx5bGCBDLlsYIEM+WxggQ05bGCBDXlsYIENuWxggst6K+36YCJ5oupLRUHBDHlsYIEMuWxggQz5bGCBDTlsYIENeWxggQ25bGCAi0xFCsDB2dnZ2dnZ2dkZAITDw8WAh4EVGV4dAUQMjAxNi0zLTYgNzozMjowMmRkAhUPDxYCHwMFBDMzLjdkZAIXDzwrAA0CAA8WBB8CZx4LXyFJdGVtQ291bnQCB2QMFCsAAhYIHgROYW1lBQzmioTooajmlbDmja4eCklzUmVhZE9ubHloHgRUeXBlGSlbU3lzdGVtLkRlY2ltYWwsIG1zY29ybGliLCBWZXJzaW9uPTIuMC4wLjAsIEN1bHR1cmU9bmV1dHJhbCwgUHVibGljS2V5VG9rZW49Yjc3YTVjNTYxOTM0ZTA4OR4JRGF0YUZpZWxkBQzmioTooajmlbDmja4WCB8FBQzmioTooajml7bpl7QfBmgfBxkpXFN5c3RlbS5EYXRlVGltZSwgbXNjb3JsaWIsIFZlcnNpb249Mi4wLjAuMCwgQ3VsdHVyZT1uZXV0cmFsLCBQdWJsaWNLZXlUb2tlbj1iNzdhNWM1NjE5MzRlMDg5HwgFDOaKhOihqOaXtumXtBYCZg9kFhACAQ9kFgRmDw8WAh8DBQQzMy43ZGQCAQ8PFgIfAwUQMjAxNi0zLTYgNzozMjowMmRkAgIPZBYEZg8PFgIfAwUEMzYuMWRkAgEPDxYCHwMFEDIwMTYtMy01IDc6MzE6NTlkZAIDD2QWBGYPDxYCHwMFBDM3LjZkZAIBDw8WAh8DBRAyMDE2LTMtNCA3OjMyOjAzZGQCBA9kFgRmDw8WAh8DBQQzOS4wZGQCAQ8PFgIfAwUQMjAxNi0zLTMgNzozMjoyNGRkAgUPZBYEZg8PFgIfAwUENDEuNmRkAgEPDxYCHwMFEDIwMTYtMy0yIDc6MzI6MDlkZAIGD2QWBGYPDxYCHwMFBDQyLjZkZAIBDw8WAh8DBRAyMDE2LTMtMSA3OjMxOjQwZGQCBw9kFgRmDw8WAh8DBQQ0My41ZGQCAQ8PFgIfAwURMjAxNi0yLTI5IDc6MzI6MTVkZAIIDw8WAh4HVmlzaWJsZWhkZAIZDzwrAA0CAA8WBB8CZx8EAgFkDBQrAAMWCB8FBQzlhYXlgLznlLXph48fBmgfBxkrBB8IBQzlhYXlgLznlLXph48WCB8FBQzlrp7mlLbnlLXotLkfBmgfBxkrBB8IBQzlrp7mlLbnlLXotLkWCB8FBQzotK3nlLXml7bpl7QfBmgfBxkrBR8IBQzotK3nlLXml7bpl7QWAmYPZBYEAgEPZBYGZg8PFgIfAwUEODQuMGRkAgEPDxYCHwMFBzUwLjAwMDBkZAICDw8WAh8DBRIyMDE2LTItMTcgMTc6MTk6NDJkZAICDw8WAh8JaGRkGAMFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYCBQxJbWFnZUJ1dHRvbjEFDEltYWdlQnV0dG9uMgUJR3JpZFZpZXcxDzwrAAoBCAIBZAUJR3JpZFZpZXcyDzwrAAoBCAIBZLUoV78/KqHO6pxcUsDjqGujVf0f'

# URL-encode the form data and convert it to bytes
data = urllib.parse.urlencode(data).encode('utf-8')

# Build the request (passing data makes it a POST)
req = urllib.request.Request(url, data, head)

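# Send the request and parse the response
try:
    response = urllib.request.urlopen(req)
except urllib.error.URLError as e:
    print(e.reason)
else:
    # Read the response body and decode it
    html = response.read().decode('utf-8')

    #print(html)

    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")

    # The result tables carry rules="all"; print every cell of the first one
    tables = soup.find_all('table', attrs={"rules": "all"})
    if tables:
        for cell in tables[0].find_all('td'):
            print(cell.get_text(strip=True))
    else:
        print("No result table found; check the form data.")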
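
The loop above prints every cell on its own line, which loses the row structure of the table. As a rough sketch of one way to keep cells from the same row together, the snippet below groups td text by its enclosing tr. It is written against the standard library's html.parser so it runs on its own. TableRowParser, parse_rows, and SAMPLE_TABLE are hypothetical names, and the sample table (two columns, a reading and a timestamp) is an assumed layout rather than the page's confirmed markup.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Group the text of <td> cells by their enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells, such as header rows
                self.rows.append(self._row)
            self._row = None


def parse_rows(html):
    """Return a list of rows, each row being a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    return parser.rows


# Hypothetical sample; the column layout is an assumption for illustration.
SAMPLE_TABLE = """
<table rules="all">
  <tr><th>Reading</th><th>Time</th></tr>
  <tr><td>33.7</td><td>2016-3-6 7:32:02</td></tr>
  <tr><td>36.1</td><td>2016-3-5 7:31:59</td></tr>
</table>
"""

rows = parse_rows(SAMPLE_TABLE)
for reading, timestamp in rows:
    print(timestamp, reading)
```

Note that this sketch collects rows from every table in the input, so on a page with several tables the HTML would need to be narrowed down first.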

Follow-up:

  • Queries for different dorm rooms may each require a different __EVENTVALIDATION and __VIEWSTATE. In my own spot checks, as long as the input values are valid, the pair above works for nearly all room queries in "沁苑东十舍".
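
Because the hidden fields can differ from room to room, one possible refinement is to read fresh values from the live page and merge them with the room selection, instead of hard-coding one captured pair. The sketch below only shows the merging and encoding step and has not been tested against the real site. build_form_data and the placeholder tokens are hypothetical, the TextBox2 and TextBox3 fields from the captured request are left out, and the page may well require intermediate postbacks for its drop-down lists that this sketch does not perform.

```python
import urllib.parse


def build_form_data(hidden_fields, area, building, floor, room):
    """Merge freshly read hidden fields with the room selection and URL-encode them."""
    form = dict(hidden_fields)  # e.g. __VIEWSTATE and __EVENTVALIDATION from the live page
    form.update({
        "programId": area,
        "txtyq": building,
        "txtld": floor,
        "Txtroom": room,
        # Click coordinates of the image button, copied from the captured request.
        "ImageButton1.x": "56",
        "ImageButton1.y": "13",
    })
    return urllib.parse.urlencode(form).encode("utf-8")


# Placeholder tokens standing in for values read from the page just before the query.
fresh = {
    "__VIEWSTATE": "FAKE_VIEWSTATE_TOKEN",
    "__EVENTVALIDATION": "FAKE_VALIDATION_TOKEN",
}
body = build_form_data(fresh, "东区", "沁苑东十舍", "1层", "120")

# Decode the body again to check that the Chinese values survive the round trip.
decoded = urllib.parse.parse_qs(body.decode("utf-8"))
print(decoded["txtyq"])
```

The returned bytes could be passed as the data argument of urllib.request.Request in the same way as in the code above.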
Project address: link