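This is a crawler for backing up your Douban broadcasts (status updates). Using a site hosted in mainland China never feels safe: say the wrong thing and your account can vanish in a second. Often the account itself doesn't matter much. What matters is the content you posted, which is your own hard work and a record of your feelings and memories. So leave yourself a way out.
Straight to the good stuff.
GitHub project: Backup-Douban-Broadcast (back up Douban broadcasts)
Python 3 source code
Required libraries: 1. requests 2. lxml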
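The script below fetches broadcast pages one at a time and pauses between requests so the account doesn't get locked. As a minimal sketch of just that pacing pattern, here is a standard-library-only version. The names `crawl_pages`, `fetch`, and `URL_TEMPLATE` are made up for illustration and are not part of the original script. The fetch function is passed in, so you can try the loop without touching Douban at all.

```python
import time

# URL pattern taken from the script below; the page number fills in %s.
URL_TEMPLATE = 'https://www.douban.com/people/yekingyan/statuses?p=%s'


def crawl_pages(fetch, first_page, last_page, delay=2.0):
    """Call fetch(url) for each page in order, pausing between requests.

    fetch is a hypothetical callable supplied by the caller, for example
    lambda url: requests.get(url, headers=headers).text
    """
    results = []
    for page in range(first_page, last_page + 1):
        url = URL_TEMPLATE % page
        results.append(fetch(url))
        # Pause after every request except the last one.
        if page < last_page:
            time.sleep(delay)
    return results
```

With a real fetch function, something like `crawl_pages(my_fetch, 1, 10)` would return the raw HTML of pages 1 to 10 in order. The 2-second default mirrors the pause used in the script; whether that is slow enough to stay unlocked is an assumption, not something guaranteed.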
#!/usr/bin/python3
from lxml import etree
import requests
import time

# Spoofed request headers: a browser User-Agent plus your own login cookie
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    'Cookie':'bid=HcsfdfDgjY; ll="118283"; _ga=GA1.2.698252965426060945; gr_user_id=55fea3dd-24665146bb-be056d20662f; __utmv=30149280.6427; _vwo_uuid_v2=FA20618BAACsfs063B0F50F1614561268fa58eacb69817156a2; push_doumail_num=0; push_noty_num=0; viewed="26462816"; ap=1; __utmc=30145680; ct=y; ps=y; _gid=GA1.2.654610.1526725718; _pk_ref.100001.8cb4=%5B%22%22%2C%22%22%2C1526752769%2C%22https%3A%2F%2Faccounts.douban.com%2Fsafety%2Funlock_sms%2Fresetpassword%3Fconfirmation%3D5f656742cffec30%26alias%3D%22%5D; _pk_ses.100001.8cb4=*; __utma=30149280.698252912.1506060945.1526737379.1526752773.89; __utmz=30149280.1526752773.89.44.utmcsr=accounts.douban.com|utmccn=(referral)|utmcmd=referral|utmcct=/safety/unlock_sms/resetpassword; __utmt=1; dbcl2="64279887:1ntQKZ/e4dU"; ck=P2O4; _pk_id.100001.8cb4=e7e1a240646f34ee.1506738033.106.1526752928.1526737419.; __utmb=30149280.5.10.1526752773'
}

# Fetch one page of broadcasts and parse it as HTML
def getWeb(page):
    url = 'https://www.douban.com/people/yekingyan/statuses?p=%s' % page
    webData = requests.get(url,headers=headers).text
    s = etree.HTML(webData)
    # Pause between requests. Go too fast and Douban locks the account
    # (a lock, not a ban). Each unlock SMS costs 0.1 yuan :)
    time.sleep(2)
    # Use lxml to extract the broadcast text and the broadcast timestamps
    says = s.xpath('//*[@id="content"]/div/div[1]/div[3]/div/div/div/div[2]/div[1]/blockquote/p/text()')
    times = s.xpath(