Automatically Logging In to Chouti and Upvoting Multiple Pages (3)


weixin_34255793

Tags: python, web scraping

Original article: http://www.cnblogs.com/Black-rainbow/p/9216164.html


First, fetch the whole page:

import requests

response_index = requests.get(
    url='https://dig.chouti.com/',
    headers={
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
    }
)

print(response_index.text)

print outputs the raw HTML of the page.
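Because the same User-Agent header is repeated on every request in this article, one option is to set it once on a requests.Session. The sketch below is a minimal illustration: USER_AGENT and make_session are hypothetical names, not part of the original code. prepare_request builds the request without sending it, so the merged headers can be inspected offline.

```python
import requests

# Illustrative constant: the same desktop Chrome User-Agent used in the article
USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
)


def make_session():
    """Hypothetical helper: a Session whose requests all carry USER_AGENT."""
    session = requests.Session()
    session.headers.update({'User-Agent': USER_AGENT})
    return session


session = make_session()
# Build (but do not send) a GET for the front page to check the merged headers
prepared = session.prepare_request(requests.Request('GET', 'https://dig.chouti.com/'))
print(prepared.headers['User-Agent'])
```

To fetch the page for real, call session.get('https://dig.chouti.com/'). Every request made through the session then sends the header automatically.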

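A first look at Chouti's hot-list page shows that all the headlines sit inside the div whose id is content-list.

So we first parse the page and locate that div, which contains all the li tags: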

from bs4 import BeautifulSoup

# Parse the HTML and find the container that holds every headline
soup = BeautifulSoup(response_index.text, 'html.parser')
div = soup.find(attrs={'id': 'content-list'})

Then find all the li tags with class item inside it:

items = div.find_all(attrs={'class': 'item'})
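As an aside, BeautifulSoup's select() accepts CSS selectors, so the two lookups above (the container by id, then the items by class) can be written as a single selector. The HTML here is a hypothetical, heavily trimmed stand-in built only from the id and class names mentioned in this article; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup mirroring the structure described above
SAMPLE_HTML = """
<div id="content-list">
  <ul>
    <li class="item"><div class="part2" share-linkid="101"></div></li>
    <li class="item"><div class="part2" share-linkid="102"></div></li>
  </ul>
</div>
<p class="item">outside the list, ignored by both lookups</p>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# Two-step lookup, as in the article
div = soup.find(attrs={'id': 'content-list'})
items_find = div.find_all(attrs={'class': 'item'})

# Equivalent single CSS selector: elements with class "item" inside #content-list
items_select = soup.select('#content-list .item')

print(len(items_find), len(items_select))  # 2 2
```

Either style works; the selector is more compact, while find/find_all makes each step easier to debug when the page layout changes.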

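Next, look at where each headline's data lives: the element with class part2 carries the headline's id in its share-linkid attribute.

Print each headline's id: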

for item in items:
    tag = item.find(attrs={'class': 'part2'})  # the element that carries the id
    nid = tag.get('share-linkid')  # the headline's id
    print(nid)

This prints the id of every headline on the page.
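The loop above assumes every item contains a part2 element with a share-linkid attribute. If one does not, item.find returns None and tag.get raises AttributeError. A slightly more defensive sketch follows; extract_link_ids is a hypothetical helper name, and the sample markup is again a simplified stand-in rather than Chouti's real HTML.

```python
from bs4 import BeautifulSoup


def extract_link_ids(html):
    """Hypothetical helper: return share-linkid values from div#content-list."""
    soup = BeautifulSoup(html, 'html.parser')
    container = soup.find(attrs={'id': 'content-list'})
    if container is None:
        # Layout changed, or the response was not the list page at all
        return []
    ids = []
    for item in container.find_all(attrs={'class': 'item'}):
        part2 = item.find(attrs={'class': 'part2'})
        # Skip items without the expected element instead of crashing
        if part2 is None:
            continue
        nid = part2.get('share-linkid')
        if nid:
            ids.append(nid)
    return ids


SAMPLE_HTML = """
<div id="content-list"><ul>
  <li class="item"><div class="part2" share-linkid="101"></div></li>
  <li class="item"><div class="other">no part2 here</div></li>
  <li class="item"><div class="part2" share-linkid="103"></div></li>
</ul></div>
"""

print(extract_link_ids(SAMPLE_HTML))      # ['101', '103']
print(extract_link_ids('<html></html>'))  # []
```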

Now all that's left is to wrap the single-upvote code from the previous article in a for loop. The complete code:

import requests
from bs4 import BeautifulSoup

HEADERS = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
}

# First visit Chouti's hot list to obtain a cookie (not yet authorized)
r1 = requests.get(url='https://dig.chouti.com/', headers=HEADERS)
r1_cookie_dict = r1.cookies.get_dict()

# Send the username and password together with the (unauthorized) cookie
# Note: anti-scraping measure. Logging in authorizes the cookie obtained above,
# so that same cookie is what the vote requests below send
response_login = requests.post(
    url='https://dig.chouti.com/login',
    data={
        'phone': '86xxxxxxxxxxx',     # replace with your own phone number
        'password': 'your_password',  # replace with your own password
        'oneMonth': '1'
    },
    headers=HEADERS,
    cookies=r1_cookie_dict
)

response_index = requests.get(url='https://dig.chouti.com/', headers=HEADERS)

soup = BeautifulSoup(response_index.text, 'html.parser')
div = soup.find(attrs={'id': 'content-list'})
items = div.find_all(attrs={'class': 'item'})
for item in items:
    tag = item.find(attrs={'class': 'part2'})
    nid = tag.get('share-linkid')
    print(nid)

    # Upvote each news item by its id
    r_vote = requests.post(
        url='https://dig.chouti.com/link/vote?linksId=%s' % nid,
        headers=HEADERS,
        cookies=r1_cookie_dict
    )
    print(r_vote.text)

Log in to Chouti in a browser and check the page: every item on the first page has already been upvoted automatically.
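Instead of passing r1_cookie_dict into every call by hand, a requests.Session stores the cookie from the first GET and attaches it to every later request, including the login POST and each vote. The sketch below assumes that behaviour is all the site needs; login and vote_request are hypothetical helpers, and the cookie name example_cookie is a placeholder, not Chouti's real cookie name. The last lines only prepare a vote request, to show offline that the stored cookie and the linksId query parameter end up on it.

```python
import requests

USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
)
VOTE_URL = 'https://dig.chouti.com/link/vote'


def login(session, phone, password):
    """Hypothetical helper: get the front page (cookie lands in session.cookies), then log in."""
    session.get('https://dig.chouti.com/')
    return session.post(
        'https://dig.chouti.com/login',
        data={'phone': phone, 'password': password, 'oneMonth': '1'},
    )


def vote_request(nid):
    """Hypothetical helper: build the vote POST; params= builds ?linksId=... for us."""
    return requests.Request('POST', VOTE_URL, params={'linksId': nid})


session = requests.Session()
session.headers.update({'User-Agent': USER_AGENT})

# Offline demo: pretend the first GET set a cookie (name and value are placeholders)
session.cookies.set('example_cookie', 'abc123')
prepared = session.prepare_request(vote_request('101'))
print(prepared.url)                # https://dig.chouti.com/link/vote?linksId=101
print(prepared.headers['Cookie'])  # example_cookie=abc123
```

For a real run, call login(session, phone, password) once and then session.post(VOTE_URL, params={'linksId': nid}) for each id; no cookies argument is needed because the session carries them.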

Next, let's look at pagination.

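Notice the pattern? Like most sites, Chouti puts the page number in the URL in an obvious way. When we go back to page 1, the URL becomes https://dig.chouti.com/all/hot/recent/1, so we can request that URL instead of the home page: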
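The URL pattern can be captured in a tiny helper. This sketch assumes later pages follow the same /all/hot/recent/<n> scheme, which is also what the loop below relies on; page_urls is an illustrative name. Python's range excludes its end value, so covering pages first through last needs range(first, last + 1).

```python
def page_urls(first, last):
    """Return hot-list URLs for pages first..last inclusive (assumed URL pattern)."""
    base = 'https://dig.chouti.com/all/hot/recent/%d'
    # range() stops before its second argument, hence last + 1
    return [base % n for n in range(first, last + 1)]


for url in page_urls(1, 3):
    print(url)
```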

for page_num in range(1, 4):     # upvote pages 1 to 3 (range excludes its end value)
    response_index = requests.get(
        url='https://dig.chouti.com/all/hot/recent/%s' % page_num,
        headers={
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
        }
    )

    soup = BeautifulSoup(response_index.text, 'html.parser')
    div = soup.find(attrs={'id': 'content-list'})
    items = div.find_all(attrs={'class': 'item'})
    for item in items:
        tag = item.find(attrs={'class': 'part2'})
        nid = tag.get('share-linkid')
        print(nid)

        # Upvote each news item by its id
        r_vote = requests.post(
            url='https://dig.chouti.com/link/vote?linksId=%s' % nid,
            headers={
                'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'
            },
            cookies=r1_cookie_dict
        )
        print(r_vote.text)

Log in and check these pages in a browser to see the result.
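Two small refinements may be worth considering when voting across several pages (both are suggestions, not part of the original code). The hot list can change between page requests, so the same id may appear on more than one page; skipping ids already handled avoids duplicate requests. Pausing briefly between votes also keeps the request rate modest. In this sketch the network call is abstracted away: vote_all, the vote callback, and the one-second default delay are hypothetical choices.

```python
import time


def vote_all(pages_of_ids, vote, delay=1.0, sleep=time.sleep):
    """Call vote(nid) once per unique id across all pages, pausing between calls.

    pages_of_ids: iterable of id lists, one list per page
    vote: callback that performs the actual request (e.g. wraps session.post)
    """
    seen = set()
    voted = []
    for ids in pages_of_ids:
        for nid in ids:
            if nid in seen:
                continue  # already handled on an earlier page
            seen.add(nid)
            vote(nid)
            voted.append(nid)
            sleep(delay)  # throttle between vote requests
    return voted


# Offline demo: record calls instead of sending requests, and skip real sleeping
calls = []
result = vote_all([['101', '102'], ['102', '103']], calls.append, sleep=lambda s: None)
print(result)  # ['101', '102', '103']
```

Injecting sleep as a parameter keeps the function easy to test without waiting; in a real run the default time.sleep is used.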