Python Web Scraping: Fetching Physical-Exercise Check-in Counts

Tiger_SM

Category: Python web scraping. Tags: python, pyinstaller, web scraping

Article link: https://blog.csdn.net/weixin_42428485/article/details/100850762

Python Web Scraping: Fetching Physical-Exercise Check-in Counts

Approach:
Code:
Packaging into an executable:
- Installing pyinstaller
- Using pyinstaller
Result:
Summary:

Approach:

Log in with a POST request and keep the cookie, fetch the HTML with a GET request, then process the data.
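The POST-then-GET flow can be tried without access to the real server. The sketch below is only an illustration under stated assumptions: `FakeSite` is a hypothetical stand-in that runs locally, sets a made-up `SESSION` cookie on any POST, and only returns data on GET when that cookie comes back. The `/login` and `/data` paths are also invented for the demo. What it shows is that an opener built with `HTTPCookieProcessor` stores the cookie from the login response and sends it on later requests.

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeSite(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real server: POST sets a cookie, GET checks it."""

    def _reply(self, body, cookie=None):
        self.send_response(200)
        if cookie:
            self.send_header("Set-Cookie", cookie)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # the form fields are ignored by this toy server
        self._reply(b"logged in", cookie="SESSION=abc123; Path=/")

    def do_GET(self):
        has_cookie = "SESSION=abc123" in self.headers.get("Cookie", "")
        self._reply(b"secret data" if has_cookie else b"please log in")

    def log_message(self, *args):
        pass  # silence per-request logging


def make_opener():
    """Opener with its own cookie jar; the empty ProxyHandler skips system proxies."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({}),
        urllib.request.HTTPCookieProcessor(jar),
    )


def demo():
    server = HTTPServer(("127.0.0.1", 0), FakeSite)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:{}".format(server.server_address[1])
    try:
        # A fresh opener has no cookie, so the GET is refused.
        before = make_opener().open(base + "/data").read().decode()
        # POST first, then GET with the same opener: the cookie is sent back.
        opener = make_opener()
        form = urllib.parse.urlencode({"userName": "u", "passwd": "p"}).encode("utf-8")
        opener.open(urllib.request.Request(base + "/login", data=form)).read()
        after = opener.open(base + "/data").read().decode()
    finally:
        server.shutdown()
        server.server_close()
    return before, after


print(demo())
```

The important detail is that the login and the data request go through the same opener; a new opener starts with an empty cookie jar.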

Code:

# attendance.py
import http.cookiejar
import sys
import urllib.error
import urllib.parse
import urllib.request

from bs4 import BeautifulSoup

BASE_URL = "http://10.28.102.51/student/"

# Item names in page order; their values sit in tds[2], tds[4], ..., tds[20].
ITEMS = [
    "Morning exercise",
    "Sports club attendance",
    "Pull-up attendance",
    "Basketball game",
    "Track and field",
    "Sports meet individual event",
    "Competition management",
    "Attendance 4",
    "Attendance 5",
    "Added count",
]


def getcookie(username, password):
    """Log in with a POST request and return an opener that holds the cookie."""
    params = urllib.parse.urlencode({"userName": username, "passwd": password}).encode("utf-8")
    req = urllib.request.Request(BASE_URL + "checkUser.jsp", data=params)  # data= makes it a POST
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    try:
        opener.open(req)
    except urllib.error.URLError as err:
        # Usual causes: the server has shut this service down, or you are not on the intranet.
        print("Login request failed:", err)
        input()
        sys.exit(1)
    return opener


def getdata(username, password):
    """Fetch the attendance page and return a dict of item name -> count."""
    opener = getcookie(username, password)
    get_request = urllib.request.Request(BASE_URL + "queryExerInfo.jsp")  # GET
    data = opener.open(get_request).read()
    soup = BeautifulSoup(data, "html.parser")
    # Every cell we need happens to carry bgcolor="#FFFFFF".
    tds = soup.find_all("td", attrs={"bgcolor": "#FFFFFF"})

    # Cells look like <td bgcolor="#FFFFFF">&nbsp;0 次</td>. split() with no argument
    # also splits on the non-breaking space, so the first token is the whole number.
    datas = {}
    for i, name in enumerate(ITEMS):
        datas[name] = int(tds[2 + 2 * i].get_text().split()[0])
    return datas


def total(datas):
    """Add up all counts (named total so the built-in sum is not shadowed)."""
    return sum(datas.values())


if __name__ == "__main__":
    print("===== Query start =====\n===== Enter student ID =====")
    username = input()
    print("\n===== Results =====\n")
    datas = getdata(username, username)  # the student ID is passed as the password too
    for key, value in datas.items():
        print("{}: {}".format(key, value))
    print("\nTotal: {}".format(total(datas)))
    print("\n===== Query finished =====\n===== Press Enter to exit =====")
    input()  # keep the console window open
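The count extraction is the fragile step, so here is a small standalone sketch of it. It assumes the cell text looks like the `&nbsp;0 次` example from the code comment (a non-breaking space, the number, a space, then 次). `parse_count` is a hypothetical helper name, not part of the script. Taking a single character by index would keep only one digit of a two-digit count, so the sketch pulls out the whole run of digits with a regex instead.

```python
import re


def parse_count(cell_text):
    """Return the first integer found in a table cell's text."""
    match = re.search(r"\d+", cell_text)
    if match is None:
        raise ValueError("no number in cell text: {!r}".format(cell_text))
    return int(match.group())


# "\xa0" is the non-breaking space that &nbsp; turns into after parsing.
cells = ["\xa00 次", "\xa07 次", "\xa012 次"]
counts = [parse_count(text) for text in cells]
print(counts, sum(counts))
```

Because the helper returns an `int`, the total can be computed with the built-in `sum` and there is no need for `eval`.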

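Packaging into an executable:

Installing pyinstaller

Install with pip: pip install pyinstaller. If pip fails, the cause is usually network blocking, and you will need a proxy. (You can also download the package and install it manually, or use the Douban mirror; I won't go into that here.)

Using pyinstaller

Open a command line in the folder that contains the .py file, either with the cd command or by holding Shift, right-clicking inside the folder, and choosing "Open PowerShell window here".

Run pyinstaller -F attendance.py. This generates three folders; the .exe is in the dist folder, and the other folders can be deleted.
You can also give the .exe an icon: pyinstaller -i main.ico -F attendance.py
Note that the .ico file has size requirements; in my tests 256*256 works. Two sites I recommend:
Icon search site
https://www.easyicon.net/
Icon conversion site
https://lvwenhan.com/convertico/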
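Since the icon size matters, it can help to check what sizes an .ico file actually contains before passing it to pyinstaller. The following is a minimal sketch that reads the ICO directory with the standard `struct` module; `ico_sizes` is a hypothetical helper name, and the sample bytes are a hand-built header used only to exercise the function. In the ICO format the width and height are stored as single bytes, and a stored 0 stands for 256.

```python
import struct


def ico_sizes(data):
    """Return (width, height) for every image listed in an ICO file's directory."""
    reserved, kind, count = struct.unpack_from("<HHH", data, 0)
    if reserved != 0 or kind != 1:
        raise ValueError("not an ICO file")
    sizes = []
    for i in range(count):
        # Each directory entry is 16 bytes and starts with width and height.
        width, height = struct.unpack_from("<BB", data, 6 + 16 * i)
        # A stored 0 means 256 pixels, since one byte cannot hold 256.
        sizes.append((width or 256, height or 256))
    return sizes


# Hand-built header describing a single 256x256 image (no pixel data attached).
sample = struct.pack("<HHH", 0, 1, 1) + struct.pack("<BBBBHHII", 0, 0, 0, 0, 1, 32, 0, 22)
print(ico_sizes(sample))
```

For a real file, read it in binary mode first, for example `ico_sizes(open("main.ico", "rb").read())`.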

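Result:

Summary:

Once you have learned this simple scraper, you can scrape a great many websites, because most sites put their data directly in the HTML.
Some sites require a CAPTCHA at login, which calls for image recognition; I recommend Baidu's. If that is too much trouble, you can use selenium's webdriver to drive a real browser and let the user log in by hand. Or simply copy the cookie out of the browser manually, although then the code only works for as long as the cookie stays valid. In a few days I will write a post on how to use selenium.webdriver. The next post will extend this one, mainly covering how to analyze a website, how to use BeautifulSoup, and how to handle HTML responses as well as JSON data; XML is rare enough that I won't cover it.
The exe file is available here: https://pan.baidu.com/s/16a9hTaw66Fom3QilaBaOxA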
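For the copy-the-cookie-by-hand option mentioned in the summary, a rough sketch is to attach the copied value as a `Cookie` header on the request. The cookie name `JSESSIONID` and its value below are placeholders I am assuming for illustration; use whatever name and value your browser's developer tools show for the site. `request_with_cookie` is a hypothetical helper name.

```python
import urllib.request


def request_with_cookie(url, cookie_string):
    """Build a GET request that carries a cookie copied from the browser."""
    return urllib.request.Request(url, headers={"Cookie": cookie_string})


req = request_with_cookie(
    "http://10.28.102.51/student/queryExerInfo.jsp",
    "JSESSIONID=PASTE_VALUE_HERE",  # hypothetical cookie name, placeholder value
)
print(req.get_method(), req.get_header("Cookie"))
# To actually send it: urllib.request.urlopen(req).read()
```

This skips the login step entirely, which is why it stops working as soon as the server expires the cookie.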