Scraping the first 50 short reviews of 《毒木圣经》 (The Poisonwood Bible) on Douban, with their ratings

ysw116

Category: Python | Tags: python, web scraping, book reviews

Article link: https://blog.csdn.net/ysw116/article/details/82927781

My own code:

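import re
import time

import requests
from bs4 import BeautifulSoup

# Hot short-review pages 1-4 for book 26630480 (The Poisonwood Bible)
urls = ["https://book.douban.com/subject/26630480/comments/hot?p=" + str(i) for i in range(1, 5)]
# Douban tends to reject requests that carry the default python-requests User-Agent
headers = {"User-Agent": "Mozilla/5.0"}
# Ratings look like <span class="user-stars allstar40 rating">; capture the number
pattern = re.compile(r'<span class="user-stars allstar(\d+) rating"')

comment_count = 0  # comments written so far (start at 0 so exactly 50 are kept)
star_total = 0     # sum of the captured rating values
star_count = 0     # number of ratings summed

# Open the output file once instead of once per comment; "w" avoids duplicates on reruns
with open("26630480comments.txt", "w", encoding="utf-8") as f:
    for url in urls:
        response = requests.get(url, headers=headers, timeout=10)
        soup = BeautifulSoup(response.text, "lxml")
        for comment in soup.find_all("span", "short"):
            if comment_count < 50:
                # get_text() never returns None, unlike .string
                f.write(comment.get_text() + "\n")
                comment_count += 1
        for star in pattern.findall(response.text):
            if star_count < 50:
                star_total += int(star)
                star_count += 1
        if comment_count >= 50 and star_count >= 50:
            break  # enough data, skip the remaining pages
        time.sleep(5)  # pause between requests

# Average of the first 50 ratings (raw allstar values, e.g. 50 = five stars)
if star_count:
    print(star_total / star_count)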
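The regex over the raw page and the BeautifulSoup lookup can also be done in a single pass with only the standard library's `html.parser`, which makes the extraction easy to check offline. This is a minimal sketch: the class `ShortCommentParser` is a hypothetical name, the class names `short` and `user-stars allstarNN rating` are taken from the code above, and the sample HTML is made up (real Douban markup may differ and may change over time). It also assumes a `<span class="short">` contains no nested `<span>`.

```python
from html.parser import HTMLParser


class ShortCommentParser(HTMLParser):
    """Hypothetical helper: collects the text of <span class="short"> and the
    number from <span class="user-stars allstarNN rating"> (markup assumed)."""

    def __init__(self):
        super().__init__()
        self.comments = []      # text of each short comment
        self.stars = []         # raw allstar values, e.g. 40
        self._in_short = False  # True while inside <span class="short">
        self._buf = []          # text fragments of the current comment

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "short" in classes:
            self._in_short = True
            self._buf = []
        elif "user-stars" in classes:
            for c in classes:
                suffix = c[len("allstar"):]
                if c.startswith("allstar") and suffix.isdigit():
                    self.stars.append(int(suffix))

    def handle_data(self, data):
        if self._in_short:
            self._buf.append(data)

    def handle_endtag(self, tag):
        # Assumes no nested <span> inside the short comment
        if tag == "span" and self._in_short:
            self.comments.append("".join(self._buf).strip())
            self._in_short = False


# Made-up sample markup for illustration only
sample = """
<li class="comment-item">
  <span class="user-stars allstar50 rating"></span>
  <p class="comment-content"><span class="short">A powerful novel.</span></p>
</li>
<li class="comment-item">
  <span class="user-stars allstar30 rating"></span>
  <p class="comment-content"><span class="short">Too long for me.</span></p>
</li>
"""

parser = ShortCommentParser()
parser.feed(sample)
print(parser.comments)  # ['A powerful novel.', 'Too long for me.']
print(parser.stars)     # [50, 30]
```

In a live run, `response.text` would be fed to the parser instead of `sample`. Note that, as in both scripts here, ratings and comments are collected as separate lists, so a comment without a rating shifts the two out of alignment.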

Code written by someone else:

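import re
import time

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}  # the default python-requests UA tends to be rejected
pattern = re.compile(r'<span class="user-stars allstar(\d+) rating"')

count = 0          # comments printed so far
page = 1           # current page number
s, count_s = 0, 0  # sum of rating values and number of ratings
while count < 50:
    try:
        r = requests.get('https://book.douban.com/subject/26630480/comments/hot?p=' + str(page),
                         headers=headers, timeout=10)
        r.raise_for_status()
    except requests.RequestException as err:
        print(err)
        break
    soup = BeautifulSoup(r.text, 'lxml')
    comments = soup.find_all('span', 'short')
    if not comments:  # ran out of pages; stop instead of looping forever
        break
    for item in comments:
        count += 1
        print(count, item.get_text())
        if count == 50:
            break
    # Parse the ratings once per page, not once per comment
    for star in pattern.findall(r.text):
        s += int(star)
        count_s += 1
    time.sleep(5)  # delay between requests, per douban's robots.txt
    page += 1

if count_s:
    print('\nAverage score: %.2f' % (s / count_s))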
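Parsing the ratings inside `for item in comments`, rather than once per page, re-adds every rating on the page for each comment. The average is then weighted by how many comments each page contributed instead of counting each rating once. A minimal sketch with made-up numbers shows the difference; the function names and the `pages` data are hypothetical.

```python
# Hypothetical data: (comments on the page, raw allstar values on the page)
pages = [
    (["c1", "c2", "c3"], [50, 50, 50]),
    (["c4"], [10]),
]


def average_once_per_page(pages):
    """Count every rating exactly once."""
    total = sum(sum(stars) for _, stars in pages)
    n = sum(len(stars) for _, stars in pages)
    return total / n


def average_once_per_comment(pages):
    """Re-add the page's ratings for each comment, as a nested loop does."""
    total = n = 0
    for comments, stars in pages:
        for _ in comments:
            total += sum(stars)
            n += len(stars)
    return total / n


print(average_once_per_page(pages))     # 40.0
print(average_once_per_comment(pages))  # 46.0
```

The first page has three comments, so its ratings are counted three times and pull the nested-loop average up to 46.0, while counting each rating once gives 40.0. Integer division (`//`) combined with `%d` would additionally truncate the result, which is why a float average is printed instead.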
