Novel Download Tool

Taurus.W_

Last modified: 2024-02-18 14:48:07

Tags: python, web scraping

First published: 2024-02-18 14:41:59

Article link: https://blog.csdn.net/qq_45023811/article/details/136151722

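This article shows how to send HTTP requests with Python's requests library and parse the returned HTML with lxml, extracting information such as the title and paragraph text from a specific web page and saving it to a local file.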
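
Before looking at the full script, here is a minimal sketch of the parsing step in isolation. It runs the same three XPath expressions the script uses against a small inline HTML string instead of a live page. The sample markup and the helper name `parse_chapter` are assumptions made up for illustration; the real page layout may differ.

```python
from lxml import etree

# Hypothetical page markup, invented for this example only.
SAMPLE_HTML = """
<html><body>
  <h1>Chapter 1</h1>
  <div id="nr1">
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
  <ul>
    <li><a href="index.htm">Contents</a></li>
    <li><a href="2.htm">Next</a></li>
  </ul>
</body></html>
"""


def parse_chapter(html):
    """Return (title, body, next_href) extracted from one chapter page."""
    tree = etree.HTML(html)
    # First <h1> on the page is taken as the chapter title.
    title = tree.xpath('//h1/text()')[0]
    # Join the text of every <p> directly inside the div with id "nr1".
    body = '\n'.join(tree.xpath('//div[@id="nr1"]/p/text()'))
    # The second <li> of a <ul> is assumed to hold the "next chapter" link.
    next_href = tree.xpath('//ul/li[2]/a/@href')[0]
    return title, body, next_href


title, body, next_href = parse_chapter(SAMPLE_HTML)
print(title)
print(body)
print(next_href)
```

Each `xpath()` call returns a list, so indexing with `[0]` raises `IndexError` when nothing matches. That is a reasonable place to add a check if the page structure is not guaranteed.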

# Libraries for sending the request and parsing the HTML
import requests
from lxml import etree

# URL of the first chapter to fetch
url = 'https://www.kunnu.com/douluo/40629.htm'
while True:
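    # Present a browser-like User-Agent
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
        }
    # Send the request (the timeout keeps a stalled connection from hanging forever)
    resp = requests.get(url, headers=headers, timeout=10)
    # Set the encoding
    resp.encoding = 'utf-8'
    # Response body
    # print(resp.text)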
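    # Parse the HTML into an element tree
    e = etree.HTML(resp.text)
    # Chapter body: text of every <p> inside the div with id "nr1"
    info = '\n'.join(e.xpath('//div[@id="nr1"]/p/text()'))
    # Chapter title: the first <h1> on the page
    title = e.xpath('//h1/text()')[0]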
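    # Build the next chapter's URL from the "next" link on the page
    url = f'https://www.kunnu.com/douluo/{e.xpath("//ul/li[2]/a/@href")[0]}'
    # print(info)
    # print(title)
    # Save the data; append mode ('a') keeps earlier chapters instead of overwriting them
    with open('斗罗大陆.txt', 'a', encoding='utf-8') as f:
        f.write(title + '\n\n' + info + '\n\n')
    # Exit the loop once the next link resolves to the index page (no more chapters)
    if url == 'https://www.kunnu.com/douluo/':
        break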
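
The loop-and-stop logic can also be sketched as a small function that is easy to try without touching the network. This is only an illustration under stated assumptions: `crawl`, `fetch_page`, `FAKE_SITE`, `delay`, and `max_pages` are hypothetical names that do not appear in the script above, and the stop condition (the next link resolving to the index URL) is taken from the script's own exit check. `urljoin` from the standard library replaces the f-string concatenation so that an empty or relative link is resolved consistently.

```python
import time
from urllib.parse import urljoin

BASE_URL = 'https://www.kunnu.com/douluo/'

# Hypothetical stand-in for the real site: url -> (title, body, next_href).
# An empty next_href marks the last chapter in this made-up data.
FAKE_SITE = {
    BASE_URL + '1.htm': ('Chapter 1', 'Text one.', '2.htm'),
    BASE_URL + '2.htm': ('Chapter 2', 'Text two.', ''),
}


def crawl(start_url, fetch_page, out_path, delay=0.0, max_pages=1000):
    """Follow next-chapter links and write each chapter to out_path.

    fetch_page is any callable mapping a URL to (title, body, next_href).
    Returns the number of chapters written.
    """
    url = start_url
    pages = 0
    # Open the file once so every chapter lands in the same output file.
    with open(out_path, 'w', encoding='utf-8') as f:
        # Stop when the next link resolves to the index page,
        # or when the safety cap on page count is reached.
        while url != BASE_URL and pages < max_pages:
            title, body, next_href = fetch_page(url)
            f.write(title + '\n\n' + body + '\n\n')
            pages += 1
            # Resolve the relative link against the index URL.
            url = urljoin(BASE_URL, next_href)
            # Pause between requests to avoid hammering the server.
            time.sleep(delay)
    return pages
```

For a real run, `fetch_page` would wrap the `requests.get` and XPath steps, and `delay` would be set to a second or so. With the fake data, `crawl(BASE_URL + '1.htm', FAKE_SITE.__getitem__, 'out.txt')` walks both chapters and then stops.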