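I have only just started learning web scraping, and this post records that experience.
This post simulates logging in to a system and then scrapes the data I want. The login requires a captcha, which can be downloaded directly as an image. The information shown after login is loaded dynamically by JavaScript, so it cannot be extracted from the page HTML with pyquery or XPath. For dynamically loaded content, use "packet capture" instead: find the request that returns the data in the browser's Network panel and call that URL directly.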
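To illustrate why packet capture helps: the captured request returns JSON, which can be parsed directly with no HTML parsing at all. The sketch below is only an illustration. The field names (appoint, userInfo, name, place, date, status) are the ones read by the script later in this post, but SAMPLE_BODY and all of its values are made up, and summarize is a hypothetical helper name.

```python
import json

# Hypothetical response body. The structure mirrors the fields the script in this
# post reads; the values are invented for illustration only.
SAMPLE_BODY = """
[
  {"appoint": {"userInfo": {"name": "Teacher A"}, "place": "Room 101",
               "date": "2017-12-20", "status": "open"}},
  {"appoint": {"userInfo": {"name": "Teacher B"}, "place": "Room 102",
               "date": "2017-12-21", "status": "full"}}
]
"""


def summarize(body, limit=3):
    """Parse a JSON array of appointments and return one text line per record."""
    records = json.loads(body)
    lines = []
    # Slicing avoids an IndexError when fewer than `limit` records come back.
    for record in records[:limit]:
        appoint = record["appoint"]
        lines.append(" ".join([
            str(appoint["userInfo"]["name"]),
            str(appoint["place"]),
            str(appoint["date"]),
            str(appoint["status"]),
        ]))
    return lines


if __name__ == "__main__":
    for line in summarize(SAMPLE_BODY):
        print(line)
```

With a real response, the same function could be given the text of the captured request's response instead of SAMPLE_BODY.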
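Steps:
1. Create a session.
2. Find the login form. This takes some trial and error: enter any account, password, and (captcha), log in, then look in the Network panel for the request that carries Form Data. Its URL is where the form is actually submitted, so use it as the url in the login function.
3. Get the captcha. It can be fetched directly from the captcha URL shown in the Network panel; if the URL ends with a timestamp, remove the timestamp.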
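Step 3 can be sketched with the standard library as follows. This is a minimal sketch under assumptions: the parameter names in TIMESTAMP_PARAMS are guesses at common cache-busting names (check the real name in the Network panel), strip_timestamp is a hypothetical helper, and the timestamp value in the usage example is invented.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names for a cache-busting timestamp parameter; adjust to what the
# Network panel actually shows.
TIMESTAMP_PARAMS = {"t", "ts", "timestamp", "_"}


def strip_timestamp(url, names=TIMESTAMP_PARAMS):
    """Return url without timestamp-like query parameters."""
    parts = urlsplit(url)
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        # Drop named timestamp parameters (e.g. ?t=1513171380000) ...
        if key in names:
            continue
        # ... and bare numeric queries (e.g. ?1513171380000).
        if key.isdigit() and value == "":
            continue
        kept.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_timestamp(
        "http://222.197.178.226:8080/FLOP/vcodeServlet?t=1513171380000"))
```

Other query parameters are kept, so a URL that needs them to work is not broken by the cleanup.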
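Note: this is a university system, so it may not be reachable from outside the campus network. The only purpose here is to record my process of learning web scraping.
The code is as follows: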
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2017/12/13 21:23
# @Author : TsungHan Lee
# @File : Oral_login.py
import requests
import time


def start_ses():
    """Create a session so cookies persist across requests."""
    ses = requests.session()
    return ses
def login(ses, username_, pwd, vc):
    """Submit the login form, then fetch and print the appointment list."""
    # JSESSIONID copied from the browser; it must match the one used for the captcha.
    ses.cookies.update({'JSESSIONID': '450BB24BE1BAD33859B876FB93435682'})
    data = {
        "username": username_,
        "password": pwd,
        "verifycode": vc
    }
    # URL the login form is actually submitted to (found in the Network panel).
    url = 'http://222.197.178.226:8080/FLOP/user/json/login.do'
    res = ses.post(url, data)
    res.raise_for_status()  # stop early on an HTTP error
    time.sleep(1)
    header = {'Connection': 'keep-alive',
              'Cookie': 'JSESSIONID=450BB24BE1BAD33859B876FB93435682',
              'Host': '222.197.178.226:8080',
              'Upgrade-Insecure-Requests': '1',
              'User-Agent': '*******'}
    # The JSON data to fetch: the list is loaded by JavaScript, so request its endpoint directly.
    appoint_info = ses.get('http://222.197.178.226:8080/FLOP/order/json/listToView.do?userId=40905&type=speaking', headers=header).json()
    # Print at most three entries; slicing avoids an IndexError on shorter lists.
    for item in appoint_info[:3]:
        appoint = item['appoint']
        result = str(appoint['userInfo']['name']) + ' ' + str(appoint['place']) + ' ' + str(appoint['date']) + ' ' + str(appoint['status'])
        print(result)
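

def get_vcode(ses):
    """Download the captcha image to vcode.jpeg so it can be read and typed in by hand."""
    url = 'http://222.197.178.226:8080/FLOP/vcodeServlet'
    headers = {'Cookie': 'JSESSIONID=450BB24BE1BAD33859B876FB93435682',
               'Host': '222.197.178.226:8080',
               'Referer': 'http://222.197.178.226:8080/FLOP/front/login',
               'User-Agent': '*******'}
    # Go through the session rather than a bare requests.get, so the captcha
    # request and the login share one session.
    res = ses.get(url, headers=headers)
    with open('vcode.jpeg', 'wb') as f:
        f.write(res.content)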


def main(ses, username, pwd):
    get_vcode(ses)
    # Open vcode.jpeg, read the captcha, and type it in here.
    vc = input('vc:')
    login(ses, username, pwd, vc)


if __name__ == '__main__':
    ses = start_ses()
    username = '2********4'
    pwd = '******'
    main(ses, username, pwd)
Output: