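Anti-scraping
We use flask-login to implement login verification.
Create a new Flask project as follows (index.html and login.html belong in the templates directory, where Flask's render_template looks for them).
Install the packages: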
pip install flask
pip install flask-login
pip install werkzeug
index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>主页</title>
</head>
<body>
<strong>Hello World!</strong>
</body>
</html>
login.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>登录</title>
</head>
<body>
<form action="/login/" method="post">
<p>用户名: </p>
<input type="text" name="username" required>
<br>
<p>密码: </p>
<input type="password" name="password" required>
<br>
<input type="submit" name="okay" value="登录">
</form>
</body>
</html>
web.py
import os
import uuid

from flask import (
    render_template, redirect, url_for, request, Flask
)
from flask_login import (
    login_required, login_user, logout_user,
    LoginManager, UserMixin
)
from werkzeug.security import generate_password_hash, check_password_hash

app = Flask(__name__)
# Set the secret key; os.urandom is suggested during development
# (a fresh key on every start invalidates existing sessions)
app.secret_key = os.urandom(36)

login_manager = LoginManager()
login_manager.init_app(app)  # bind to the Flask app
login_manager.login_view = 'login'  # endpoint of the login page

# Global list of users; in real development a database is recommended
USERS = [
    {
        'id': uuid.uuid4(),  # uuid1 would also work
        'name': 'admin',
        'password': generate_password_hash('abc')  # never store plaintext passwords in the program
    }
]


def create_user(username, password):  # create a user
    user = {
        'name': username,
        'password': generate_password_hash(password),
        'id': uuid.uuid4()
    }
    USERS.append(user)


def get_user(username):  # look up a user record by username
    for user in USERS:
        if user.get('name') == username:
            return user
    return None


class User(UserMixin):  # inherit from the UserMixin base class
    def __init__(self, user):
        self.username = user.get('name')
        self.password_hash = user.get('password')
        self.id = user.get('id')

    def verify_password(self, password):  # verify the password
        if self.password_hash is None:
            return False
        return check_password_hash(self.password_hash, password)

    def get_id(self):  # return the user ID; Flask-Login expects a str
        return str(self.id)

    @staticmethod
    def get(user_id):  # look up a user by ID; used by the user_loader callback
        if not user_id:
            return None
        for user in USERS:
            if str(user.get('id')) == str(user_id):
                return User(user)
        return None


@login_manager.user_loader  # tell Flask-Login how to load the logged-in user
def load_user(user_id):
    return User.get(user_id)


@app.route('/login/', methods=['GET', 'POST'])  # accept both GET and POST requests
def login():  # log the user in
    emsg = ''
    if request.method == 'POST':  # the login form was submitted
        username = request.form.get('username')
        password = request.form.get('password')
        user_info = get_user(username)
        if not user_info:
            emsg = '用户不存在!'  # "User does not exist!"
        else:
            user = User(user_info)
            if not user.verify_password(password):
                emsg = '密码错误!'  # "Wrong password!"
            else:
                login_user(user)  # create the session
                return redirect(url_for('index'))  # redirect to the home page
    return render_template('login.html', emsg=emsg)


@app.route('/')
@login_required  # only logged-in users may access; others are sent to the login page
def index():
    return render_template('index.html')


if __name__ == '__main__':
    app.run(port=8888)
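
The comment on the USERS list warns against keeping plaintext passwords. As a rough illustration of the idea behind salted password hashing, here is a minimal standard-library sketch. It is not how werkzeug's generate_password_hash stores its hashes; the names hash_password and check_password and the iteration count are assumptions made only for this example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def check_password(password, salt, digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    salt, digest = hash_password("abc")
    print(check_password("abc", salt, digest))
    print(check_password("wrong", salt, digest))
```

Only the salt and digest are stored, so the original password cannot be read back from the user record.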
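Crawler
Open http://127.0.0.1:8888/ and the login page appears.
Enter the admin account's password and log in.
The home page is then displayed.
Suppose we want to scrape Hello World!; the first step is to look at the URL:
http://127.0.0.1:8888/
Then write the code: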
import requests
url = 'http://127.0.0.1:8888/'
response = requests.get(url)
print(response.text)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>登录</title>
</head>
<body>
<form action="/login/" method="post">
<p>用户名: </p>
<input type="text" name="username" required>
<br>
<p>密码: </p>
<input type="password" name="password" required>
<br>
<input type="submit" name="okay" value="登录">
</form>
</body>
</html>
The response is the login page: because we are not logged in, the request was intercepted and sent to the login page instead.
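
A crawler can check for this situation instead of silently parsing the wrong page. The sketch below is one possible heuristic, assuming that a page containing a password input is a login page; the class PasswordFieldFinder and the function looks_like_login_page are hypothetical names, not part of requests or Flask.

```python
from html.parser import HTMLParser


class PasswordFieldFinder(HTMLParser):
    """Records whether the document contains <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        input_type = (dict(attrs).get("type") or "").lower()
        if tag == "input" and input_type == "password":
            self.found = True


def looks_like_login_page(html):
    """Heuristic: treat any page with a password field as a login page."""
    finder = PasswordFieldFinder()
    finder.feed(html)
    return finder.found


if __name__ == "__main__":
    login_html = '<form><input type="password" name="password" required></form>'
    home_html = "<body><strong>Hello World!</strong></body>"
    print(looks_like_login_page(login_html))
    print(looks_like_login_page(home_html))
```

Calling looks_like_login_page(response.text) before extracting data lets the script stop early when the login was not accepted.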
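Solutions
1. Simulating login with a form
Go to http://127.0.0.1:8888/login, press F12, open the Network tab, and enter the password.
Click the login button and a POST request shows up.
Open it and find Form Data.
Write a POST request: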
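import requests
import re

session = requests.Session()  # a Session keeps cookies between requests

url = 'http://127.0.0.1:8888/login/'
form_data = {
    'username': 'admin',
    'password': 'abc',
}
session.post(url, data=form_data)  # send the POST request
# The code above is equivalent to logging in through the browser

url = 'http://127.0.0.1:8888/'
response = session.get(url)  # reuses the session cookie set at login
hello_world = re.search('<strong>(.*)</strong>', response.text, re.S)
print(hello_world.group())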
<strong>Hello World!</strong>
The content was fetched successfully.
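
Whether the second request counts as logged in depends on the session cookie being sent back with it. The sketch below reproduces that behaviour with only the standard library: a tiny stand-in server (not the Flask app above) sets a cookie on POST and checks it on GET, and fetch_home compares a client that reuses its cookie jar with one that does not. DemoHandler, fetch_home, and the demo-token value are all hypothetical.

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in server: POST sets a cookie, GET requires it."""

    def _send(self, body, cookie=None):
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        if cookie:
            self.send_header("Set-Cookie", cookie)
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode("utf-8"))
        ok = form.get("username") == ["admin"] and form.get("password") == ["abc"]
        self._send("ok" if ok else "denied",
                   "session=demo-token; Path=/" if ok else None)

    def do_GET(self):
        logged_in = "session=demo-token" in self.headers.get("Cookie", "")
        self._send("<strong>Hello World!</strong>" if logged_in
                   else "<title>login</title>")

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch_home(share_cookies):
    """Log in, then fetch / with or without the cookies from the login."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    no_proxy = urllib.request.ProxyHandler({})  # talk to localhost directly
    try:
        jar = http.cookiejar.CookieJar()
        with_jar = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(jar))
        form = urllib.parse.urlencode(
            {"username": "admin", "password": "abc"}).encode("utf-8")
        with_jar.open(base + "/login/", data=form).read()  # POST login
        opener = with_jar if share_cookies else urllib.request.build_opener(no_proxy)
        return opener.open(base + "/").read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_home(share_cookies=True))
    print(fetch_home(share_cookies=False))
```

With share_cookies=True the stand-in home page comes back; with share_cookies=False the stand-in login page does. In requests, a requests.Session object plays the role of the shared cookie jar.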
Problems in practice:
- The password in the form may be encrypted by JavaScript
- Frequent requests may get the account banned
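
For the second problem, a common mitigation is simply to slow down. The sketch below spaces requests out with a base delay plus random jitter; polite_delays and fetch_all are hypothetical helpers, the default timings are arbitrary assumptions, and nothing here guarantees that a real site will not ban the account.

```python
import random
import time


def polite_delays(n_requests, base=1.0, jitter=0.5):
    """Return one delay in seconds per request: base plus up to `jitter` extra."""
    return [base + random.uniform(0, jitter) for _ in range(n_requests)]


def fetch_all(urls, fetch, sleep=time.sleep, base=1.0, jitter=0.5):
    """Call fetch(url) for each URL, pausing between calls.

    `fetch` is any callable taking a URL (for example a bound session.get);
    `sleep` is injectable so the pacing can be tested without waiting.
    """
    results = []
    delays = polite_delays(len(urls), base, jitter)
    for url, delay in zip(urls, delays):
        results.append(fetch(url))
        sleep(delay)  # wait before the next request
    return results


if __name__ == "__main__":
    pages = fetch_all(
        ["http://127.0.0.1:8888/"] * 3,
        fetch=lambda url: f"fetched {url}",  # stand-in for a real request
        base=0.1,
        jitter=0.1,
    )
    print(pages)
```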
2. Cookies
Log in first, press F12, open the Network tab, reload the page, and click the page request.
Find Cookie under Request Headers.
Copy the value of the session cookie:
.eJwdzj0OwyAMQOGrVMwd8A9gcpkKY1vtmjRTlLsXdX9P-q70it2Pd9q---nP9PpY2lJuotgFm3ujmFGL2LQYiB1nVs3dhuqEcFDPlAdjySpYqPHoFlJcK5qAgvUAZyu9Mg1yI2MHEp3FG68Oc_Q8hAgbGfAEqo3SgpyH73_NlR7nMtVKBWDNrsLSrAPBrIoAAqNypPv-AcSpOTc.YSMsAw.Ww0-emAPf_Q8lbv2UqFY6Nh4M0c
Write the script:
import requests
import re
url = 'http://127.0.0.1:8888/'
headers = {
    'cookie': 'session=.eJwdzj0OwyAMQOGrVMwd8A9gcpkKY1vtmjRTlLsXdX9P-q70it2Pd9q---nP9PpY2lJuotgFm3ujmFGL2LQYiB1nVs3dhuqEcFDPlAdjySpYqPHoFlJcK5qAgvUAZyu9Mg1yI2MHEp3FG68Oc_Q8hAgbGfAEqo3SgpyH73_NlR7nMtVKBWDNrsLSrAPBrIoAAqNypPv-AcSpOTc.YSMsAw.Ww0-emAPf_Q8lbv2UqFY6Nh4M0c'
}
response = requests.get(url, headers=headers)
hello_world = re.search('<strong>(.*)</strong>', response.text, re.S)
print(hello_world.group())
<strong>Hello World!</strong>
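
When the browser sends several cookies, the copied Cookie header is a single string of name=value pairs separated by semicolons. requests also accepts cookies as a dict through its cookies= parameter, so a small converter can be handy. The function cookie_header_to_dict below is a hypothetical helper written for this example, and the sample values are made up.

```python
def cookie_header_to_dict(header):
    """Split a raw Cookie header such as 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in header.split(";"):
        # partition keeps any further '=' characters inside the value
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    raw = "session=abc.def; theme=dark"  # made-up values for illustration
    print(cookie_header_to_dict(raw))
```

The resulting dict can be passed as requests.get(url, cookies=...) instead of building the header by hand.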
Problems in practice:
- The cookies may be encrypted by JavaScript
- Frequent requests may get the account banned
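Using curl2py
One more handy tool is worth recommending here: curl2py, developed by the CSDN blogger 小小明.
Install:
pip install filestools
Find the request, right-click it, and copy it as a cURL command.
Run the command:
curl2py
Then paste the generated code: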
#######################################
# The generated by curl2py.
# author:小小明
#######################################
import requests
import json
headers = {
    "Connection": "keep-alive",
    "Cache-Control": "max-age=0",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-User": "?1",
    "Sec-Fetch-Dest": "document",
    "Referer": "http://127.0.0.1:8888/login/",
    "Accept-Language": "zh-CN,zh;q=0.9"
}
cookies = {
    "session": ".eJwdzj0OwyAMQOGrVMwd8A9gcpkKY1vtmjRTlLsXdX9P-q70it2Pd9q---nP9PpY2lJuotgFm3ujmFGL2LQYiB1nVs3dhuqEcFDPlAdjySpYqPHoFlJcK5qAgvUAZyu9Mg1yI2MHEp3FG68Oc_Q8hAgbGfAEqo3SgpyH73_NlR7nMtVKBWDNrsLSrAPBrIoAAqNypPv-AcSpOTc.YSMsAw.Ww0-emAPf_Q8lbv2UqFY6Nh4M0c"
}
res = requests.get(
    "http://127.0.0.1:8888/",
    headers=headers,
    cookies=cookies
)
print(res.text)
Impressive!
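
To give a feel for what such a conversion involves, here is a deliberately small sketch that pulls the URL, headers, and cookies out of a cURL command. It is not curl2py's actual implementation; parse_curl is a hypothetical function, it understands only -H/--header, and it does not handle other flags that take a value (such as --data).

```python
import shlex


def parse_curl(command):
    """Extract (url, headers, cookies) from a simple cURL command string."""
    tokens = shlex.split(command)  # respects shell-style quoting
    url, headers, cookies = None, {}, {}
    i = 1  # skip the leading "curl"
    while i < len(tokens):
        token = tokens[i]
        if token in ("-H", "--header") and i + 1 < len(tokens):
            name, _, value = tokens[i + 1].partition(":")
            if name.strip().lower() == "cookie":
                # split "a=1; b=2" into individual cookies
                for pair in value.split(";"):
                    key, sep, val = pair.strip().partition("=")
                    if sep and key:
                        cookies[key] = val
            else:
                headers[name.strip()] = value.strip()
            i += 2
        elif token.startswith("-"):
            i += 1  # ignore value-less flags such as --compressed
        else:
            url = token
            i += 1
    return url, headers, cookies


if __name__ == "__main__":
    cmd = ("curl 'http://127.0.0.1:8888/' "
           "-H 'Referer: http://127.0.0.1:8888/login/' "
           "-H 'Cookie: session=abc' --compressed")
    print(parse_curl(cmd))
```

The three return values map directly onto the url, headers=, and cookies= arguments of requests.get.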