https://mp.weixin.qq.com/s/3SI6V8khk3g_1R1Vh19sTw
These notes follow the article published by the WeChat official account linked above.
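I. Preparation
ChromeDriver: the browser driver, a standalone executable that Selenium uses to control Chrome. It is not a browser itself; it launches and drives the locally installed Chrome.
Selenium: a tool that simulates a person operating the browser (clicking, typing, dragging and so on), as if a real user were at the keyboard. It is also often used to get around anti-scraping measures.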
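II. How upvoting works on Chouti
III. Login preparation
First, visit the Chouti site (https://dig.chouti.com/, the URL used in the scripts below) and register a username and password.
IV. Using Selenium to drive the browser and log in automatically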
autoLogin.py
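# Log in to Chouti automatically and save the cookie
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

# Chouti account
username = "your username"
password = "your password"

# url
url = 'https://dig.chouti.com/'


def init():
    # Globals, so that other functions and modules can use them
    global browser, wait
    # Create a browser instance. Selenium 4 takes the driver path through
    # Service (Selenium 3 used webdriver.Chrome('./chromedriver')).
    browser = webdriver.Chrome(service=Service('./chromedriver'))
    # Maximize the window
    browser.maximize_window()
    time.sleep(2)
    # Explicit-wait helper with a 20-second timeout
    wait = WebDriverWait(browser, 20)


def login():
    """Log in to Chouti and save the cookie to cookie.txt."""
    # Open the home page
    browser.get(url)
    # Click the login button to open the login form
    # (find_element_by_id and friends were removed in Selenium 4)
    browser.find_element(By.ID, "login_btn").click()
    # Enter the account name and password
    browser.find_element(By.NAME, "phone").send_keys(username)
    browser.find_element(By.NAME, "password").send_keys(password)
    # Submit the login form
    time.sleep(2)
    click_login_btn_js = 'document.getElementsByClassName("btn-large")[0].click()'
    browser.execute_script(click_login_btn_js)
    # Give the login time to complete
    time.sleep(15)
    # Read the cookie
    get_cookie_js = 'return document.cookie'
    cookie = browser.execute_script(get_cookie_js)
    print(cookie)
    # Save the cookie
    with open('cookie.txt', 'w', encoding='utf-8') as f:
        f.write(cookie)
    # browser.close()


if __name__ == '__main__':
    init()
    login()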
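The script above stores the raw `document.cookie` string, which has the form `name=value; name2=value2`. Note that `document.cookie` only exposes cookies that are not marked HttpOnly. As a minimal sketch, the saved string could be split into a dict like this; `parse_cookie_string` and the sample values are hypothetical and not part of the original article.

```python
def parse_cookie_string(raw):
    """Turn a 'name=value; name2=value2' string into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first '=' only, since a value may itself contain '='.
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies


# Made-up cookie string, for illustration only.
sample = "token=abc123; theme=dark; sig=a=b"
print(parse_cookie_string(sample))
```

A dict like this can be passed to requests through the `cookies=` argument instead of setting the `Cookie` header by hand.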
V. Automatic voting
autoVote.py
# Automatic upvoting
import time
import requests
from lxml import etree

# Read the cookie saved by autoLogin.py
with open('cookie.txt', 'r', encoding='utf-8') as f:
    cookie = f.read().strip()

base_url = 'https://dig.chouti.com/'

# Request headers, copied from the request shown in the browser
header_dict = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Cache-Control": "max-age=0",
    "Connection": "keep-alive",
    "Host": "dig.chouti.com",
    "Referer": "https://dig.chouti.com/",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
}

r1 = requests.get(url=base_url, headers=header_dict)
r1.encoding = r1.apparent_encoding
html = etree.HTML(r1.content)

# IDs of the articles listed on the page
data_id_list = html.xpath("//a[@class='link-title link-statistics']/@data-id")
print(data_id_list)

# The site now hides the voting URL in the page; this is the endpoint
like_url = "https://dig.chouti.com/link/vote"

# Add the cookie to the headers
header_dict['Cookie'] = cookie

# Upvote the first 10 articles, pausing 1 second between requests
for data_id in data_id_list[:10]:
    print(data_id)
    r2 = requests.post(url=like_url, headers=header_dict, data={"linkId": data_id})
    print(r2.text)
    time.sleep(1)
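The XPath in the script selects the `data-id` attribute of every `<a>` tag whose `class` is exactly `link-title link-statistics`. To try that selection logic without network access or lxml, here is a rough sketch using the standard library's `html.parser` instead of XPath. The `DataIdCollector` class and the sample HTML are made up for illustration; the real page markup may differ.

```python
from html.parser import HTMLParser


class DataIdCollector(HTMLParser):
    """Collect data-id values from <a> tags whose class matches exactly."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.data_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Exact match on the class attribute, like @class='...' in XPath.
        if attr_map.get("class") == self.target_class and "data-id" in attr_map:
            self.data_ids.append(attr_map["data-id"])


# Hypothetical page fragment, not copied from the real site.
sample_html = """
<div>
  <a class="link-title link-statistics" data-id="101" href="#">First</a>
  <a class="other" data-id="999" href="#">Ignored</a>
  <a class="link-title link-statistics" data-id="102" href="#">Second</a>
</div>
"""

collector = DataIdCollector("link-title link-statistics")
collector.feed(sample_html)
print(collector.data_ids)
```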
The request headers used in autoVote.py can be viewed and copied from the browser's developer tools.
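Headers copied from the browser usually arrive as plain `Name: value` lines. A small helper along these lines could turn such text into a dict for requests. `headers_from_text` is a hypothetical name, and the sample reuses three of the headers from the script.

```python
def headers_from_text(raw):
    """Convert 'Name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and HTTP/2 pseudo-headers such as ':authority'.
        if not line or line.startswith(":"):
            continue
        # Split on the first ':' only, since values such as URLs contain ':'.
        name, sep, value = line.partition(":")
        if not sep:
            continue
        headers[name.strip()] = value.strip()
    return headers


copied = """
Accept-Language: zh-CN,zh;q=0.9
Host: dig.chouti.com
Referer: https://dig.chouti.com/
"""

header_dict = headers_from_text(copied)
print(header_dict)
```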
VI. Problems
1. Message: 'chromedriver' executable needs to be in PATH
Solution 1:
- Download chromedriver
- As in the login script above, pass the chromedriver path when instantiating the browser
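Before instantiating the browser, it can help to check where chromedriver actually is. The sketch below looks on PATH first and then falls back to a local file. `find_chromedriver` and its `local_path` default are assumptions chosen to match the `./chromedriver` path used in the login script.

```python
import os
import shutil


def find_chromedriver(local_path="./chromedriver"):
    """Return a usable chromedriver path, or None if none is found."""
    # shutil.which searches the directories listed in PATH.
    on_path = shutil.which("chromedriver")
    if on_path:
        return on_path
    # Fall back to an executable file next to the script.
    if os.path.isfile(local_path) and os.access(local_path, os.X_OK):
        return os.path.abspath(local_path)
    return None


driver_path = find_chromedriver()
if driver_path is None:
    print("chromedriver not found: download it or add it to PATH")
else:
    print("using chromedriver at", driver_path)
```

If a path is returned, it can be passed to the browser constructor in place of the hard-coded `./chromedriver`.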
Solution 2: