先获取整个页面
import requests response_index = requests.get( url='https://dig.chouti.com/', headers={ 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36' } ) print(response_index.text)
print输出效果如下:
初步分析抽屉热搜标题页面,可以看出所有标题位于id为content-list的div下面
我们先解析出所有li标签的位置
soup = BeautifulSoup(response_index.text, 'html.parser') div = soup.find(attrs={'id': 'content-list'})
然后再找出所有的li标签
items = div.find_all(attrs={'class': 'item'})
再分析标题所在的位置
打印出每个标题的id
for item in items:
tag = (item.find(attrs={'class': 'part2'}))
nid = tag.get('share-linkid')
print(nid)
此时,print出所有标题的id
然后对上一篇文章的单个点赞进行for循环就可以了,完整代码如下:
import requests
from bs4 import BeautifulSoup
# 先访问抽屉最热帮,获取cookie(未授权的)
r1 = requests.get(
url='https://dig.chouti.com/',
headers={
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
}
)
r1_cookie_dict = r1.cookies.get_dict()
# 发送用户名和密码认证 + cookie(未授权)
# 注意:防爬虫策略
response_login = requests.post(
url='https://dig.chouti.com/login',
data={
'phone': '8615921302790',
'password': 'a12',
'oneMonth': '1'
},
headers={
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
},
cookies=r1_cookie_dict
)
response_index = requests.get(
url='https://dig.chouti.com/',
headers={
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
}
)
soup = BeautifulSoup(response_index.text, 'html.parser')
div = soup.find(attrs={'id': 'content-list'})
items = div.find_all(attrs={'class': 'item'})
for item in items:
tag = (item.find(attrs={'class': 'part2'}))
nid = tag.get('share-linkid')
print(nid)
# 根据每个新闻id进行点赞
r1 = requests.post(
url='https://dig.chouti.com/link/vote?linksId=%s' % nid,
headers={
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
},
cookies=r1_cookie_dict
)
print(r1.text)
登录上抽屉,查看页面,可以发现已经自动完成单页点赞了
咱们再来看下翻页。
发现了么?一般网站都有这种显眼的规律,而我们再次返回第一页时,发现网址变为https://dig.chouti.com/all/hot/recent/1,所以我们请求主页可以改为它
for page_num in range(1,3): # 对第1到第3页进行点赞
response_index = requests.get(
url='https://dig.chouti.com/all/hot/recent/%s' % page_num,
headers={
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
}
)
soup = BeautifulSoup(response_index.text, 'html.parser')
div = soup.find(attrs={'id': 'content-list'})
items = div.find_all(attrs={'class': 'item'})
for item in items:
tag = (item.find(attrs={'class': 'part2'}))
nid = tag.get('share-linkid')
print(nid)
# 根据每个新闻id进行点赞
r1 = requests.post(
url='https://dig.chouti.com/link/vote?linksId=%s' % nid,
headers={
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
},
cookies=r1_cookie_dict
)
print(r1.text)
效果如下
这个代码有很多可以改善的地方,这里不多讲述。