Node crawler (2): fetching novel chapter details

书剑走江湖

Category: node crawler. Tags: node.js

Article link: https://blog.csdn.net/QiuMenglin_/article/details/111667425


/**
 * Fetch the details of each novel chapter
 */

// Import modules
const http = require('http')
const fs = require('fs')
const cheerio = require('cheerio')
const iconv = require('iconv-lite')


// URL of the first chapter
const url = 'http://www.biquge.com/91_91711/4720017.html'
// Number of the chapter to start from
let i = 1
// Maximum number of chapters to fetch
let num = 3

function main (url) {
    startRequest(url)
}

function startRequest (url) {
    http.get(url, res => {
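        // Array that collects the raw chunks of the response body
        const html = []
        res.on('data', (chunk) => {
            // Append each chunk to the array
            html.push(chunk)
        })
        res.on('end', () => {
            // Once all data has arrived, decode it with iconv-lite: iconv.decode takes a Buffer, and Buffer.concat joins the array of chunks into one
            const html1 = iconv.decode(Buffer.concat(html), 'utf-8')
            // Parse the HTML with cheerio, whose syntax is essentially the same as jQuery's
            const $ = cheerio.load(html1, { decodeEntities: false })
            // Extract the data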
            const title = $('.bookname h1').text()
            const arr = []
            const content = $("#content").html()
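            // Split the HTML on the <br><br> separators found by inspecting the page structure
            const contentArr = content.split('<br><br>')

            // Strip leading/trailing whitespace and &nbsp; from each piece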
            contentArr.forEach(elem => {
                const data = trim(elem.toString())
                arr.push(data)
            })

            const bookName = $(".con_top a").eq(1).text()

            // Object holding the chapter data to be saved
            const obj = {
                id: i,
                bookName: bookName,
                title: title,
                content: arr
            }

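            // Get the URL of the next chapter, which is fetched by calling main again
            const link = $(".bottem2 a").eq(2).attr('href')
            // Resolve the href against the current page URL so that both absolute and relative paths work
            const nextLink = new URL(link, url).href

            // Save the data
            saveContent(obj, nextLink)
            console.log(`Chapter ${i + 1}: ${nextLink}`)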
            i++
            if (i <= num) {
                setTimeout(() => {
                    main(nextLink)
                }, 1000)
            }
        })
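    }).on('error', err => {
        // Report network failures instead of letting them crash the process
        console.error(`Request failed for ${url}: ${err.message}`)
    })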
}

function saveContent (obj, nextLink) {
    console.log(`${i}--${obj.title}`)
    // Create the folder for this book if it does not exist yet (recursive also creates data/)
    const dir = `data/${obj.bookName}`
    if (!fs.existsSync(dir)) {
        fs.mkdirSync(dir, { recursive: true })
    }

    // Append the chapter to the file; appendFile creates the file if needed.
    // Note: the result is a comma-separated run of objects, not a valid JSON document on its own.
    fs.appendFile(`${dir}/chapter.json`, JSON.stringify(obj) + ",", 'utf-8', err => {
        if (err) throw err
    })
}

function trim (str) {
    // Remove leading/trailing whitespace, then every &nbsp; entity
    return str.replace(/(^\s*)|(\s*$)/g, '').replace(/&nbsp;/g, '')
}

main(url)
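
The two parsing steps in the script, splitting the chapter body into paragraphs and building the URL of the next chapter, can be tried on their own without any network access. The following is only a minimal sketch: `resolveNextLink` and `splitParagraphs` are hypothetical helper names that do not appear in the original script, and the sample HTML string is made up for illustration. It assumes the page separates paragraphs with `<br>` tags and pads them with `&nbsp;`, as the script above suggests.

```javascript
// Hypothetical helpers for illustration; they are not part of the original script.

// Resolve the href of the "next chapter" link against the URL of the current page.
// The WHATWG URL constructor handles both '/path/x.html' and 'x.html' forms.
function resolveNextLink (currentUrl, href) {
    return new URL(href, currentUrl).href
}

// Split the inner HTML of the content node into clean paragraphs.
// Assumption: paragraphs are separated by one or more <br> tags.
function splitParagraphs (html) {
    return html
        .split(/(?:<br\s*\/?>\s*)+/i)                     // break on runs of <br>, <br/>, <br />
        .map(part => part.replace(/&nbsp;/g, '').trim())  // drop &nbsp; entities and outer whitespace
        .filter(part => part.length > 0)                  // discard empty pieces
}

// Example with made-up input
console.log(resolveNextLink('http://www.biquge.com/91_91711/4720017.html', '/91_91711/4720018.html'))
console.log(splitParagraphs('&nbsp;&nbsp;first<br><br>&nbsp;&nbsp;second<br />'))
```

Keeping these steps as pure functions makes it possible to check them against a saved copy of a page before running the crawler against the live site.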
