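Note: this page does not scrape data. It only uses Selenium to log in to a website automatically (the site uses a captcha) and then fetch the full content of one page.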
from selenium import webdriver
import requests
url = 'https://accounts.douban.com/login?alias=&redir=https%3A%2F%2Fwww.douban.com%2F&source=index_nav&error=1001'
driver=webdriver.Chrome()
Open the login page:
driver.get(url)
The page source as rendered by the browser:
response = driver.page_source
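'password' is the id of the password input; the argument to send_keys is the text typed into it. The 'email' input is filled the same way.
driver.find_element('id', 'password').send_keys('xxx')
driver.find_element('id', 'email').send_keys('xxx')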
Type the captcha value by hand in the PyCharm console:
res = input('Enter the captcha: ')
Pass the typed value into the captcha field:
driver.find_element('id', 'captcha_field').send_keys(res)
Click the login button:
driver.find_element('class name', 'btn-submit').click()
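Depending on how the site submits the form, the login may not have finished by the time click() returns, so cookies read straight afterwards can be incomplete. The following is a minimal, library-free sketch of polling for a condition before moving on. `wait_until` is a hypothetical helper, not part of Selenium; Selenium's own `WebDriverWait(driver, timeout).until(...)` serves the same purpose and is probably the better choice in real code.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# One possible check with the driver from this page (an assumption about
# how the site behaves): the browser has left the login URL.
#   wait_until(lambda: driver.current_url != url, timeout=15)
```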
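After logging in, some pages can only be fetched with the login cookies; without them the request returns nothing useful. Copy the browser's cookies into a single Cookie header string:
cookie_selenium = driver.get_cookies()
cookies = []
for i in cookie_selenium:
    cookie = i['name'] + '=' + i['value']
    # print(cookie)
    cookies.append(cookie)
cookie = '; '.join(cookies)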
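The loop above can be folded into a small helper, which also makes it easy to check in isolation. This is only a sketch: `cookies_to_header` is a hypothetical name, and the sample cookie names and values are made up for illustration. It assumes each dict has the 'name' and 'value' keys that driver.get_cookies() returns.

```python
def cookies_to_header(cookie_dicts):
    """Join Selenium-style cookie dicts into one Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookie_dicts)


# Made-up sample data in the shape returned by driver.get_cookies().
sample = [
    {"name": "first", "value": "abc123", "domain": ".example.com"},
    {"name": "second", "value": "xyz", "domain": ".example.com"},
]
print(cookies_to_header(sample))  # prints: first=abc123; second=xyz
```

With the driver from this page, the call would be cookies_to_header(driver.get_cookies()).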
headers = {
    'Cookie': cookie,
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
}
url = 'https://www.douban.com/accounts/'
response1 = requests.get(url, headers=headers)
with open('personsl.html', 'wb') as ff:
    ff.write(response1.content)
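As an alternative to the hand-built Cookie header used above, the cookies could be copied into a cookie jar so that each one keeps its domain and path. A rough sketch follows. `copy_cookies` is a hypothetical helper, and it assumes the jar exposes set(name, value, domain=..., path=...), which is the shape of the `cookies` attribute on a requests.Session.

```python
def copy_cookies(cookie_dicts, jar):
    """Copy Selenium-style cookie dicts into a jar, keeping domain and path."""
    for c in cookie_dicts:
        jar.set(
            c["name"],
            c["value"],
            domain=c.get("domain", ""),  # empty string when the key is absent
            path=c.get("path", "/"),
        )
    return jar
```

With requests this would look like: session = requests.Session(), then copy_cookies(driver.get_cookies(), session.cookies), then session.get(url, headers={'User-Agent': ...}). The session then sends the matching cookies on every request without a manual Cookie header.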