Douban Books TOP250 Scraper Project

Latest recommended article published on 2024-07-27 12:20:46

itmei

Category: Scrapers. Tags: python, scraper

Article link: https://blog.csdn.net/itmei/article/details/90552855


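My first scraper project ever, hehe.
It uses Python's requests library to fetch each page and BeautifulSoup to parse it.
The code is below, for anyone who wants a reference.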

# -*- coding: utf-8 -*-
"""
Created on Sat May 25 19:58:21 2019

@author: Administrator
"""

import requests
from bs4 import BeautifulSoup

# Parse the listing pages
def html_parse():
    for url in get_page():
        # Fetch the page for this URL
        resp = requests.get(url)
        # Build a soup object from the response body
        soup = BeautifulSoup(resp.text, 'lxml')
        # Book titles
        alldiv = soup.find_all('div', class_='pl2')
        names = [div.find('a')['title'] for div in alldiv]
        # Authors: the first '/'-separated field of each info line
        allp = soup.find_all('p', class_='pl')
        authors = [p.text.split('/')[0].strip() for p in allp]
        # Ratings
        starspan = soup.find_all('span', class_='rating_nums')
        scores = [s.get_text() for s in starspan]
        # Summaries