Scraping Douban Book Comments with Python

Latest recommended article published on 2024-07-28 15:46:11

weixin_30907935

Tags: python, web crawler

Original link: http://www.cnblogs.com/orcaleZhang/p/8903124.html

Preparation

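1. Go to the Douban Books channel: https://book.douban.com

2. Find a book you are interested in, open its page, and view its comments

3. Examine the URLs of the comment pages; they all share the common prefix https://book.douban.com/subject/book_id/comments?

   where book_id is the book's ID as it appears in the browser address bar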
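The URL pattern from step 3 can be expressed as a small helper. This is a minimal sketch, not part of the original post: the function name `build_comments_url` and the book ID `1234567` are made up for illustration, and any query parameters passed in (for example `start`) are assumptions about what the site accepts.

```python
from urllib.parse import urlencode

# Common prefix of every comment page, as identified in step 3.
COMMENTS_URL = "https://book.douban.com/subject/{book_id}/comments?"


def build_comments_url(book_id, **params):
    """Return the comment-page URL for a book.

    `params` are appended as a query string; which parameters the site
    actually honours is an assumption and should be checked in the browser.
    """
    return COMMENTS_URL.format(book_id=book_id) + urlencode(params)


if __name__ == "__main__":
    # 1234567 is a placeholder ID, not a real book.
    print(build_comments_url(1234567))
    print(build_comments_url(1234567, start=20))
```

Keeping the prefix in one place means the rest of the crawler only needs the book ID copied from the address bar.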

Implementing the crawler

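import requests
from bs4 import BeautifulSoup


# Fetch the HTML page; return an empty string if the request fails
def getHtml(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        return r.text
    except requests.RequestException:
        return ''


# Extract the comments
def getComment(html):
    soup = BeautifulSoup(html, 'html.parser')
    comments_list = []  # list of comment strings, one per line
    comment_nodes = soup.select('.comment > p')
    for node in comment_nodes:
        # Flatten each comment to a single line and end it with a newline
        comments_list.append(node.get_text().strip().replace("\n", "") + '\n')
    return comments_list


# Fetch the comments and save them to a file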
def saveCommentText(fpath):
    pre_url = "https://book.doub
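The listing above breaks off inside saveCommentText, and the rest of the function is not available here. The following is not the author's code. It is a minimal sketch of how a save step of this kind could look, under stated assumptions: the name `save_comment_text`, its parameters, the `start` query parameter, and the page size of 20 are all hypothetical. The fetch and parse steps are passed in as callables so that functions such as getHtml and getComment can be plugged in.

```python
from urllib.parse import urlencode


def save_comment_text(fpath, book_id, fetch, parse, pages=3, page_size=20):
    """Write the comments from up to `pages` comment pages to `fpath`.

    fetch(url) should return the page HTML, or '' on failure.
    parse(html) should return a list of newline-terminated strings.
    Returns the number of comments written.
    """
    base_url = f"https://book.douban.com/subject/{book_id}/comments?"
    saved = 0
    with open(fpath, "w", encoding="utf-8") as f:
        for page in range(pages):
            # Assumption: the page offset is passed as a `start` parameter.
            url = base_url + urlencode({"start": page * page_size})
            html = fetch(url)
            if not html:
                break  # the fetch failed or returned nothing; stop early
            comments = parse(html)
            f.writelines(comments)
            saved += len(comments)
    return saved
```

With the two functions from the listing, a call could look like `save_comment_text("comments.txt", 1234567, getHtml, getComment)`, where `1234567` stands in for a real book ID. Opening the file with an explicit `encoding="utf-8"` avoids platform-dependent defaults when the comments contain Chinese text.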
