Scraping Qidian novels with XPath and writing them to MySQL (front end)

Latest recommended article published on 2024-03-24 18:00:20

Cep�Murphy laws

Categories: Web Scraping, Artificial Intelligence

Article link: https://blog.csdn.net/weixin_44600471/article/details/104656543


import requests
import lxml.etree as etree


def getBook(url):
    """Fetch one ranking page and print a record for every book on it."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    response.encoding = 'utf-8'
    # Build the selector object from the page HTML
    selector = etree.HTML(response.text)
    # Select each book block first, then query inside it with relative paths.
    # This keeps the fields of one record together even when a book has no
    # author link or intro, which four separate page-wide lists cannot do.
    books = selector.xpath('//div[@class="book-mid-info"]')
    for n, book in enumerate(books, start=1):
        # Note: end the path with text() or @href to get strings, not elements
        title = book.xpath('./h4/a/text()')
        href = book.xpath('./h4/a/@href')
        author = book.xpath('./p[1]/a[1]/text()')
        intro = book.xpath('./p[2]/text()')
        data = {
            'ID': n,  # position on the current page, restarts at 1 per page
            'Title': title[0] if title else '',
            'BookLink': ('https:' + href[0]) if href else '',
            'Author': author[0] if author else '',
            'Intro': ''.join(intro).strip(),
        }
        print(data)


# url = 'https://www.qidian.com/rank/collect?chn=21&page=1'
# range(1, 100) covers pages 1 through 99
urlList = ['https://www.qidian.com/rank/collect?chn=21&page=' + str(i) for i in range(1, 100)]

for url in urlList:
    getBook(url)
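
The script above only prints each record, while the title mentions writing to MySQL. What follows is a minimal sketch of that step, not the author's implementation. It assumes the PyMySQL driver, a hypothetical table `books` with columns `title`, `book_link`, `author` and `intro`, and placeholder connection settings. The helper names `to_row`, `save_books` and `connect` are made up for illustration.

```python
# Sketch only: the table name, column names and connection settings below
# are assumptions for illustration, not taken from the original post.
INSERT_SQL = (
    "INSERT INTO books (title, book_link, author, intro) "
    "VALUES (%s, %s, %s, %s)"
)


def to_row(book):
    """Turn one scraped record (a dict like the one getBook prints) into a tuple."""
    return (book['Title'], book['BookLink'], book['Author'], book['Intro'])


def save_books(connection, books):
    """Insert records through a DB-API connection that uses %s placeholders."""
    rows = [to_row(book) for book in books]
    if not rows:
        return 0
    cursor = connection.cursor()
    try:
        # Parameterized query: the driver escapes the values itself
        cursor.executemany(INSERT_SQL, rows)
        connection.commit()
    finally:
        cursor.close()
    return len(rows)


def connect():
    """Open a MySQL connection with PyMySQL (placeholder credentials)."""
    import pymysql  # third-party package: pip install pymysql
    return pymysql.connect(host='localhost', user='root', password='secret',
                           database='qidian', charset='utf8mb4')
```

To use it, `getBook` would have to collect its `data` dicts into a list and return that list instead of only printing. The list could then be passed to `save_books(connect(), records)`. Passing the values as query parameters rather than formatting them into the SQL string avoids quoting problems with intros that contain apostrophes.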
