Studio Project: Week 4 Notes

m0_58881355

Tags: python

Article link: https://blog.csdn.net/m0_58881355/article/details/124025722

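This week's coding exercises all passed.
Topic: greedy algorithms.

I have also been wanting to try writing a web crawler in Python for a while,
so here is the code.
It crawls Romance of the Three Kingdoms (三国演义) from an online book site.
Data parsing: bs4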

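import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

if __name__ == '__main__':
    # Table-of-contents page that lists every chapter of the novel
    url = 'https://www.shicimingju.com/book/sanguoyanyi.html'
    # Send a browser-like User-Agent so the request looks like an ordinary visit
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36 Edg/100.0.1185.29'
    }
    # Pass the raw bytes to BeautifulSoup so it detects the page encoding itself
    page_text = requests.get(url=url, headers=headers).content
    soup = BeautifulSoup(page_text, 'lxml')
    # Each <li> under .book-mulu holds the link to one chapter
    li_lst = soup.select('.book-mulu > ul > li')
    # The with-block closes the file even if a request fails midway
    with open('sanguoyanyi.text', 'w', encoding='utf-8') as fp:
        for li in li_lst:
            title = li.a.get_text()
            # urljoin resolves the href against the index URL (same scheme and host)
            detail_url = urljoin(url, li.a['href'])
            detail_page_text = requests.get(url=detail_url, headers=headers).content
            detail_soup = BeautifulSoup(detail_page_text, 'lxml')
            div_tag = detail_soup.find('div', class_='chapter_content')
            if div_tag is None:
                # Skip pages that lack the expected container instead of crashing
                print(title + ': chapter content not found, skipped')
                continue
            content = div_tag.text
            fp.write(title + ':' + content + '\n')
            print(title + ' crawled successfully')
    print('Romance of the Three Kingdoms crawled successfully!!!')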
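
To check the bs4 parsing step without hitting the site, the two lookups used above (`select` on the chapter list and `find` on the chapter body) can be tried on a small inline page. This is only a sketch: the HTML snippets, the chapter paths inside them, and the helper names `parse_index` and `parse_chapter` are made up for illustration and merely imitate the structure the script expects. It also uses the built-in `html.parser` instead of `lxml`, so no extra parser needs to be installed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical, hand-written HTML that imitates the table-of-contents page.
INDEX_HTML = """
<div class="book-mulu">
  <ul>
    <li><a href="/book/sanguoyanyi/1.html">Chapter 1</a></li>
    <li><a href="/book/sanguoyanyi/2.html">Chapter 2</a></li>
  </ul>
</div>
"""

# Hypothetical chapter page with the container the crawler looks for.
CHAPTER_HTML = """
<div class="chapter_content">  First paragraph.  </div>
"""

BASE_URL = 'https://www.shicimingju.com/book/sanguoyanyi.html'


def parse_index(html, base_url):
    """Return (title, absolute_url) pairs for every chapter link."""
    soup = BeautifulSoup(html, 'html.parser')
    chapters = []
    for li in soup.select('.book-mulu > ul > li'):
        if li.a is None:
            # Ignore list items that carry no link
            continue
        title = li.a.get_text(strip=True)
        chapters.append((title, urljoin(base_url, li.a['href'])))
    return chapters


def parse_chapter(html):
    """Return the chapter text, or None when the expected <div> is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    div_tag = soup.find('div', class_='chapter_content')
    if div_tag is None:
        return None
    return div_tag.get_text(strip=True)


chapters = parse_index(INDEX_HTML, BASE_URL)
text = parse_chapter(CHAPTER_HTML)
missing = parse_chapter('<p>no content here</p>')

for title, link in chapters:
    print(title, link)
print(text)
print(missing)
```

Keeping the parsing in small functions like these makes it easier to see whether a failure comes from the network request or from a selector that no longer matches the page.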
