20200223: Scraping free novels from Qidian (起点文学)

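This job scraped the free novels on Qidian (起点文学). Because the download script only takes two parameters, I started out entering them by hand, and it cost me dearly. Any manual task with more than 50 repetitions should be done in code!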

from lxml import etree
import requests

# 20 books per page x 5 pages
urls = 'https://www.qidian.com/free/all?orderId=&vip=hidden&style=1&pageSize=20&siteid=1&pubflag=0&hiddenField=1&page=1'
res = requests.get(url=urls).content.decode('utf-8')
ele = etree.HTML(res)
# Book titles; reused in the next step as directory names
name = ele.xpath('//*[@id="free-channel-wrap"]/div/div/div[2]/div[2]/div/ul/li/div[2]/h4/a/text()')
# XPath of a single entry, for reference:
# //*[@id="free-channel-wrap"]/div/div/div[2]/div[2]/div/ul/li[1]/div[2]/h4/a
# Link to each book's page (scheme-relative); it carries the parameter for the next page
next_url = ele.xpath('//*[@id="free-channel-wrap"]/div/div/div[2]/div[2]/div/ul/li/div[2]/h4/a/@href')
# Full catalog URL for each book; the number in it is the book ID
next_url2 = ['http:' + href + '#Catalog' for href in next_url]
# Author
author = ele.xpath('//*[@id="free-channel-wrap"]/div/div/div[2]/div[2]/div/ul/li/div[2]/p[1]/a[1]/text()')
# Categories
label1 = ele.xpath('//*[@id="free-channel-wrap"]/div/div/div[2]/div[2]/div/ul/li/div[2]/p[1]/a[2]/text()')
label2 = ele.xpath('//*[@id="free-channel-wrap"]/div/div/div[2]/div[2]/div/ul/li/div[2]/p[1]/a[3]/text()')
filename = 'test.txt'
with open(filename, 'w', encoding='utf-8') as f:  # 'w' creates the file if it does not exist and clears any existing content
    a = 'Titles: {0}'.format(name)
    f.write(a)
    f.write("\n--------------------------")
with open(filename, 'a', encoding='utf-8') as f:  # 'a' appends after the existing content without clearing it
    b = 'Authors: {0}'.format(author)
    f.write("\n")
    f.write(b)
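As long as every entry has all four fields, the title, author, and category lists line up by position, so one option is to zip them into rows and write a CSV rather than formatting whole lists into a text file. A minimal sketch using only the standard library; the `rows_to_csv` helper, the column names, and the placeholder author and category values are assumptions for illustration, not part of the original script.

```python
import csv
import io

def rows_to_csv(names, authors, labels1, labels2):
    """Zip the parallel lists into rows and return them as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(['title', 'author', 'category', 'subcategory'])
    # zip stops at the shortest list, so rows beyond it are dropped
    for row in zip(names, authors, labels1, labels2):
        writer.writerow(row)
    return buf.getvalue()

# Placeholder values standing in for the XPath results
sample = rows_to_csv(['数风流人物', '盖世双谐'],
                     ['author A', 'author B'],
                     ['category A', 'category B'],
                     ['sub A', 'sub B'])
print(sample)
```

To save to disk, the returned text could be written to a file opened with `encoding='utf-8'`.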
import os
import urllib.request

path = 'image/'
os.makedirs(path, exist_ok=True)  # urlretrieve fails if the target directory is missing
for page in range(1, 6):
    urls = 'https://www.qidian.com/free/all?orderId=&vip=hidden&style=1&pageSize=20&siteid=1&pubflag=0&hiddenField=1&page={0}'.format(page)
    res = requests.get(url=urls).content.decode('utf-8')
    ele = etree.HTML(res)
    for i in range(1, 21):
        # Book title; reused in the next step as a directory name
        name = ele.xpath('//*[@id="free-channel-wrap"]/div/div/div[2]/div[2]/div/ul/li[{0}]/div[2]/h4/a/text()'.format(i))
        print(name)
        # Link to the book's page
        next_url = ele.xpath('//*[@id="free-channel-wrap"]/div/div/div[2]/div[2]/div/ul/li[{0}]/div[2]/h4/a/@href'.format(i))
        print(next_url)
        # Author
        author = ele.xpath('//*[@id="free-channel-wrap"]/div/div/div[2]/div[2]/div/ul/li[{0}]/div[2]/p[1]/a[1]/text()'.format(i))
        # Categories
        label1 = ele.xpath('//*[@id="free-channel-wrap"]/div/div/div[2]/div[2]/div/ul/li[{0}]/div[2]/p[1]/a[2]/text()'.format(i))
        label2 = ele.xpath('//*[@id="free-channel-wrap"]/div/div/div[2]/div[2]/div/ul/li[{0}]/div[2]/p[1]/a[3]/text()'.format(i))
#         filename = '{0}.txt'.format(name)
#         with open(filename,'w') as f: # 'w' creates the file if it does not exist and clears any existing content
#             a=("Title {0}******Author {1}******Categories {2}{3}").format(name,author,label1,label2)
#             f.write(a)
#             f.write("\n--------------------------")
        # Cover image
        img = ele.xpath('//*[@id="free-channel-wrap"]/div/div/div[2]/div[2]/div/ul/li[{0}]/div[1]/a/img/@src'.format(i))
        if not name or not img:
            continue  # this page has fewer than 20 entries
        img2 = 'http:' + img[0]
        print(img2)
        # Use the title string, not the list, as the file name
        urllib.request.urlretrieve(img2, os.path.join(path, '{0}.jpg'.format(name[0])))
['数风流人物']
['//book.qidian.com/info/1017596768']
http://bookcover.yuewen.com/qdbimg/349573/1017596768/150
['盖世双谐']
['//book.qidian.com/info/1017371184']
http://bookcover.yuewen.com/qdbimg/349573/1017371184/150
['我是神话创世主']
['//book.qidian.com/info/1017514184']
http://bookcover.yuewen.com/qdbimg/349573/1017514184/150
['重生创业时代']
['//book.qidian.com/info/1016320879']
http://bookcover.yuewen.com/qdbimg/349573/1016320879/150
['基本剑术']
['//book.qidian.com/info/1016096305']
http://bookcover.yuewen.com/qdbimg/349573/1016096305/150
['庶族无名']
['//book.qidian.com/info/1017361973']
http://bookcover.yuewen.com/qdbimg/349573/1017361973/150
['从火影开始掌控时间']
['//book.qidian.com/info/1017621987']
http://bookcover.yuewen.com/qdbimg/349573/1017621987/150
['妖灵保护协会']
['//book.qidian.com/info/1017549911']
http://bookcover.yuewen.com/qdbimg/349573/1017549911/150
['我在足坛疯狂刷钱']
['//book.qidian.com/info/1017503698']
http://bookcover.yuewen.com/qdbimg/349573/1017503698/150
['万族之劫']
['//book.qidian.com/info/1018027842']
http://bookcover.yuewen.com/qdbimg/349573/1018027842/150
['从九龙夺嫡开始']
['//book.qidian.com/info/1017625899']
http://bookcover.yuewen.com/qdbimg/349573/1017625899/150
['战争工坊']
['//book.qidian.com/info/1017512762']
http://bookcover.yuewen.com/qdbimg/349573/1017512762/150
['我真不想当恶魔']
['//book.qidian.com/info/1017390905']
http://bookcover.yuewen.com/qdbimg/349573/1017390905/150
['我的帝国无双']
['//book.qidian.com/info/1017737715']
http://bookcover.yuewen.com/qdbimg/349573/1017737715/150
['仙魔编辑器']
['//book.qidian.com/info/1017696480']
http://bookcover.yuewen.com/qdbimg/349573/1017696480/150
['都市大进化时代']
['//book.qidian.com/info/1017661402']
http://bookcover.yuewen.com/qdbimg/349573/1017661402/150
['文娱璀璨']
['//book.qidian.com/info/1012621275']
http://bookcover.yuewen.com/qdbimg/349573/1012621275/150
['大国重坦']
['//book.qidian.com/info/1017558916']
http://bookcover.yuewen.com/qdbimg/349573/1017558916/150
['我有一座藏武楼']
['//book.qidian.com/info/1017434674']
http://bookcover.yuewen.com/qdbimg/349573/1017434674/150
['位面练级大师']
['//book.qidian.com/info/1017559230']
http://bookcover.yuewen.com/qdbimg/349573/1017559230/150
['木叶之波风家的崛起']
['//book.qidian.com/info/1017556476']
http://bookcover.yuewen.com/qdbimg/349573/1017556476/150
['大宋很野蛮']
['//book.qidian.com/info/1017512048']
http://bookcover.yuewen.com/qdbimg/349573/1017512048/150
['科技树保姆']
['//book.qidian.com/info/1017675321']
http://bookcover.yuewen.com/qdbimg/349573/1017675321/150
['诸天最强女主']
['//book.qidian.com/info/1017702416']
http://bookcover.yuewen.com/qdbimg/349573/1017702416/150
['孤岛谍战']
['//book.qidian.com/info/1017496873']
http://bookcover.yuewen.com/qdbimg/349573/1017496873/150
['开局50岁我还可以火三年']
['//book.qidian.com/info/1017570591']
http://bookcover.yuewen.com/qdbimg/349573/1017570591/150
['超品命师']
['//book.qidian.com/info/1017694591']
http://bookcover.yuewen.com/qdbimg/349573/1017694591/150
['深宵酒馆']
['//book.qidian.com/info/1017180943']
http://bookcover.yuewen.com/qdbimg/349573/1017180943/150
['我投篮实在太准了']
['//book.qidian.com/info/1018126504']
http://bookcover.yuewen.com/qdbimg/349573/1018126504/150
['洪荒之我不是哪吒']
['//book.qidian.com/info/1017374273']
http://bookcover.yuewen.com/qdbimg/349573/1017374273/150
['赛博英雄传']
['//book.qidian.com/info/1018180913']
http://bookcover.yuewen.com/qdbimg/349573/1018180913/150
['钢铁城市']
['//book.qidian.com/info/1017639935']
http://bookcover.yuewen.com/qdbimg/349573/1017639935/150
['西南崛起']
['//book.qidian.com/info/1016182259']
http://bookcover.yuewen.com/qdbimg/349573/1016182259/150
['诡异游戏空间']
['//book.qidian.com/info/1017664765']
http://bookcover.yuewen.com/qdbimg/349573/1017664765/150
['楚门狼']
['//book.qidian.com/info/1017589032']
http://bookcover.yuewen.com/qdbimg/349573/1017589032/150
['神秘让我强大']
['//book.qidian.com/info/1018023786']
http://bookcover.yuewen.com/qdbimg/349573/1018023786/150
['我要做阁老']
['//book.qidian.com/info/1018002065']
http://bookcover.yuewen.com/qdbimg/349573/1018002065/150
['以力服人']
['//book.qidian.com/info/1018164861']
http://bookcover.yuewen.com/qdbimg/349573/1018164861/150
['放怪物一条生路不行吗']
['//book.qidian.com/info/1018171817']
http://bookcover.yuewen.com/qdbimg/349573/1018171817/150
['独行诸天末日']
['//book.qidian.com/info/1018044323']
http://bookcover.yuewen.com/qdbimg/349573/1018044323/150
['帝国枭色']
['//book.qidian.com/info/1017690656']
http://bookcover.yuewen.com/qdbimg/349573/1017690656/150
['我的1999年']
['//book.qidian.com/info/1017749812']
http://bookcover.yuewen.com/qdbimg/349573/1017749812/150
['楚氏赘婿']
['//book.qidian.com/info/1018195792']
http://bookcover.yuewen.com/qdbimg/349573/1018195792/150
['从灵气复苏到末法时代']
['//book.qidian.com/info/1018230018']
http://bookcover.yuewen.com/qdbimg/349573/1018230018/150
['我能拉低别人的智商']
['//book.qidian.com/info/1018219521']
http://bookcover.yuewen.com/qdbimg/349573/1018219521/150
['我真没想成大佬']
['//book.qidian.com/info/1018094243']
http://bookcover.yuewen.com/qdbimg/349573/1018094243/150
['天命萤惑']
['//book.qidian.com/info/1017494444']
http://bookcover.yuewen.com/qdbimg/349573/1017494444/150
['无限地球卫士']
['//book.qidian.com/info/1018337057']
http://bookcover.yuewen.com/qdbimg/349573/1018337057/150
['百岁大爷激活修仙系统']
['//book.qidian.com/info/1018198459']
http://bookcover.yuewen.com/qdbimg/349573/1018198459/150
['我的分身是玉皇大帝']
['//book.qidian.com/info/1018339789']
http://bookcover.yuewen.com/qdbimg/349573/1018339789/150
['日本战国走一遭']
['//book.qidian.com/info/1012757932']
http://bookcover.yuewen.com/qdbimg/349573/1012757932/150
['重生写推理小说']
['//book.qidian.com/info/1016350338']
http://bookcover.yuewen.com/qdbimg/349573/1016350338/150
['废土修真的日常']
['//book.qidian.com/info/1016234812']
http://bookcover.yuewen.com/qdbimg/349573/1016234812/150
['食戟之盖世龙厨']
['//book.qidian.com/info/1016075145']
http://bookcover.yuewen.com/qdbimg/349573/1016075145/150
['卡塞尔里的混血君王']
['//book.qidian.com/info/1015940062']
http://bookcover.yuewen.com/qdbimg/349573/1015940062/150
['大田园']
['//book.qidian.com/info/1017249858']
http://bookcover.yuewen.com/qdbimg/349573/1017249858/150
['开局八百个火影']
['//book.qidian.com/info/1017469084']
http://bookcover.yuewen.com/qdbimg/349573/1017469084/150
['柯南之我不是蛇精病']
['//book.qidian.com/info/1017470457']
http://bookcover.yuewen.com/qdbimg/349573/1017470457/150
['我死了也变强了']
['//book.qidian.com/info/1016937210']
http://bookcover.yuewen.com/qdbimg/349573/1016937210/150
['我真不是仙二代']
['//book.qidian.com/info/1017596129']
http://bookcover.yuewen.com/qdbimg/349573/1017596129/150
['李朝万古一逆贼']
['//book.qidian.com/info/1015407245']
http://bookcover.yuewen.com/qdbimg/349573/1015407245/150
['我本初唐']
['//book.qidian.com/info/1013429012']
http://bookcover.yuewen.com/qdbimg/349573/1013429012/150
['我的火影真是太稳健了']
['//book.qidian.com/info/1017256868']
http://bookcover.yuewen.com/qdbimg/349573/1017256868/150
['我有好多复活币']
['//book.qidian.com/info/1017380601']
http://bookcover.yuewen.com/qdbimg/349573/1017380601/150
['我的生活能开挂']
['//book.qidian.com/info/1016519510']
http://bookcover.yuewen.com/qdbimg/349573/1016519510/150
['从火影开始的锻造师']
['//book.qidian.com/info/1017501005']
http://bookcover.yuewen.com/qdbimg/349573/1017501005/150
['砂隐的崛起之路']
['//book.qidian.com/info/1017387894']
http://bookcover.yuewen.com/qdbimg/349573/1017387894/150
['回到明朝做昏君']
['//book.qidian.com/info/1017224028']
http://bookcover.yuewen.com/qdbimg/349573/1017224028/150
['浑沌记']
['//book.qidian.com/info/3267635']
http://bookcover.yuewen.com/qdbimg/349573/3267635/150
['我真不想当圣师']
['//book.qidian.com/info/1017456326']
http://bookcover.yuewen.com/qdbimg/349573/1017456326/150
['苦境武学系统']
['//book.qidian.com/info/1017381974']
http://bookcover.yuewen.com/qdbimg/349573/1017381974/150
['漫威之我是防火女']
['//book.qidian.com/info/1017523110']
http://bookcover.yuewen.com/qdbimg/349573/1017523110/150
['机战世界']
['//book.qidian.com/info/1017442662']
http://bookcover.yuewen.com/qdbimg/349573/1017442662/150
['这款游戏绝对有问题']
['//book.qidian.com/info/1017371463']
http://bookcover.yuewen.com/qdbimg/349573/1017371463/150
['民国之远东巨商']
['//book.qidian.com/info/1017287530']
http://bookcover.yuewen.com/qdbimg/349573/1017287530/150
['湖人有个孙大圣']
['//book.qidian.com/info/1017349860']
http://bookcover.yuewen.com/qdbimg/349573/1017349860/150
['刺客伍六七之剑客陆九']
['//book.qidian.com/info/1017483282']
http://bookcover.yuewen.com/qdbimg/349573/1017483282/150
['阿拉德的不正经救世主']
['//book.qidian.com/info/1017377808']
http://bookcover.yuewen.com/qdbimg/349573/1017377808/150
['在超神学院的那些年']
['//book.qidian.com/info/1017546226']
http://bookcover.yuewen.com/qdbimg/349573/1017546226/150
['有个沙雕血族老婆是什么体验']
['//book.qidian.com/info/1017435917']
http://bookcover.yuewen.com/qdbimg/349573/1017435917/150
['当医生遇上不正经系统']
['//book.qidian.com/info/1017087484']
http://bookcover.yuewen.com/qdbimg/349573/1017087484/150
['如何在推理番中装好人']
['//book.qidian.com/info/1017596228']
http://bookcover.yuewen.com/qdbimg/349573/1017596228/150
['三国从救曹操老爹开始']
['//book.qidian.com/info/1017374000']
http://bookcover.yuewen.com/qdbimg/349573/1017374000/150
['回档在2008']
['//book.qidian.com/info/1017562274']
http://bookcover.yuewen.com/qdbimg/349573/1017562274/150
['从主播开始成为巨星']
['//book.qidian.com/info/1016506925']
http://bookcover.yuewen.com/qdbimg/349573/1016506925/150
['神豪从愿望成真开始']
['//book.qidian.com/info/1017422310']
http://bookcover.yuewen.com/qdbimg/349573/1017422310/150
['文体之路']
['//book.qidian.com/info/1017205414']
http://bookcover.yuewen.com/qdbimg/349573/1017205414/150
['时停在玄幻世界']
['//book.qidian.com/info/1017165730']
http://bookcover.yuewen.com/qdbimg/349573/1017165730/150
['漫威里的旅法师']
['//book.qidian.com/info/1017341840']
http://bookcover.yuewen.com/qdbimg/349573/1017341840/150
['漫威之电影大破坏']
['//book.qidian.com/info/1017497343']
http://bookcover.yuewen.com/qdbimg/349573/1017497343/150
['我可以无限装备']
['//book.qidian.com/info/1017500099']
http://bookcover.yuewen.com/qdbimg/349573/1017500099/150
['无限流生存游戏']
['//book.qidian.com/info/1017257783']
http://bookcover.yuewen.com/qdbimg/349573/1017257783/150
['诸天配角交流群']
['//book.qidian.com/info/1016861661']
http://bookcover.yuewen.com/qdbimg/349573/1016861661/150
['这个刺客有毛病']
['//book.qidian.com/info/1017433918']
http://bookcover.yuewen.com/qdbimg/349573/1017433918/150
['我真的重生了']
['//book.qidian.com/info/1017596673']
http://bookcover.yuewen.com/qdbimg/349573/1017596673/150
['从UP主开始']
['//book.qidian.com/info/1016419324']
http://bookcover.yuewen.com/qdbimg/349573/1016419324/150
['老婆的神级陪练']
['//book.qidian.com/info/1017442346']
http://bookcover.yuewen.com/qdbimg/349573/1017442346/150
['大明王冠']
['//book.qidian.com/info/1016942258']
http://bookcover.yuewen.com/qdbimg/349573/1016942258/150
['我是自己的头号黑粉']
['//book.qidian.com/info/1016566684']
http://bookcover.yuewen.com/qdbimg/349573/1016566684/150
['斗罗之黄猿斗罗']
['//book.qidian.com/info/1017795366']
http://bookcover.yuewen.com/qdbimg/349573/1017795366/150
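The hrefs printed above are scheme-relative (they start with `//`). Instead of prefixing `'http:'` by hand, `urllib.parse.urljoin` can resolve them against the URL of the listing page. A small sketch; the `absolute_url` helper name is hypothetical, and using the listing URL as the base is an assumption about how the links should be resolved.

```python
from urllib.parse import urljoin

# Listing page the hrefs were scraped from (query string omitted)
LISTING_URL = 'https://www.qidian.com/free/all'

def absolute_url(href, base=LISTING_URL):
    """Resolve a scheme-relative or relative href against the listing page URL."""
    return urljoin(base, href)

print(absolute_url('//book.qidian.com/info/1017596768'))
```

Note that the result inherits the scheme of the base URL, so it comes out as `https://` here rather than `http://`.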

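These are two separate scripts and need to be kept apart. The second one only takes two parameters: number_book (every book on Qidian has an ID number) and name (used mainly to create the output folder).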
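Both parameters can be derived from the listing script's results rather than typed in. A minimal sketch, assuming the hrefs always contain `/info/<digits>` as in the output above; `book_id_from_href` and `book_params` are hypothetical helpers, not part of the original code.

```python
import re

def book_id_from_href(href):
    """Return the numeric book ID that follows /info/ in an href, or None."""
    match = re.search(r'/info/(\d+)', href)
    return match.group(1) if match else None

def book_params(names, hrefs):
    """Pair each book ID with its title, skipping hrefs without an ID."""
    pairs = []
    for title, href in zip(names, hrefs):
        book_id = book_id_from_href(href)
        if book_id is not None:
            pairs.append((book_id, title))
    return pairs

params = book_params(['重生写推理小说', '浑沌记'],
                     ['//book.qidian.com/info/1016350338',
                      '//book.qidian.com/info/3267635'])
print(params)
```

Each `(number_book, name)` pair could then be fed to the download script in a loop.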

import requests
import re
from bs4 import BeautifulSoup
from requests.exceptions import ReadTimeout, RequestException
import random
import json
import time
import os
import sys
# ID of the book to crawl
number_book='1016350338'
name='重生写推理小说'
def random_user_agent():
    # Avoid shadowing the built-in list; pick one User-Agent string at random
    agents = ['Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
              'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML like Gecko) Chrome/44.0.2403.155 Safari/537.36',
              'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36',
              'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2226.0 Safari/537.36',
              'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36',
              'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36']
    return random.choice(agents)
 
def getJson():
    url = 'https://book.qidian.com/ajax/book/category?_csrfToken=lpPyO6EWOZoz5LggKFp43eq7jbMf8WfSF2ndrCca&bookId='+number_book
    headers = {'User-Agent': random_user_agent(),
               'Referer': 'https://book.qidian.com/info/'+number_book,
               'Cookie': '_csrfToken=BXnzDKmnJamNAgLu4O3GknYVL2YuNX5EE86tTBAm;newstatisticUUID=1564467217_1193332262; qdrs=0%7C3%7C0%7C0%7C1; showSectionCommentGuide=1; qdgd=1; lrbc=1013637116%7C436231358%7C0%2C1003541158%7C309402995%7C0; rcr=1013637116%2C1003541158; bc=1003541158%2C1013637116; e1=%7B%22pid%22%3A%22qd_P_limitfree%22%2C%22eid%22%3A%22qd_E01%22%2C%22l1%22%3A4%7D; e2=%7B%22pid%22%3A%22qd_P_free%22%2C%22eid%22%3A%22qd_A18%22%2C%22l1%22%3A3%7D'
    }
    try:
        # The headers must go in headers=, not params= (which would append them to the query string)
        res = requests.get(url=url, headers=headers)
        if res.status_code == 200:
            volumes = json.loads(res.text)['data']['vs']
            response = {
                'VolumeId_List': [],
                'VolumeNum_List': []
            }
            for volume in volumes:
                # vId is the volume ID, cCnt the number of chapters in that volume
                response['VolumeId_List'].append(str(volume['vId']))
                response['VolumeNum_List'].append(str(volume['cCnt']))
            return response
        else:
            print('No response')
            return None
    except ReadTimeout:
        print("ReadTimeout!")
        return None
    except RequestException:
        print("Request failed!")
        return None
 
def getPage(VolId_List, VolNum_List):
    '''
    Find the pages to crawl by volume ID and pass each page's HTML to parsePage
    :param VolId_List: list of volume IDs
    :param VolNum_List: list with the number of chapters in each volume
    :return: None
    '''
    # Only the first two volumes are crawled
    for i in range(min(2, len(VolId_List))):
        path = os.path.join(name, '卷' + str(i + 1))  # '卷' means "volume"
        mkdir(path)
#         https://book.qidian.com/info/1014218975#Catalog
        url = 'https://read.qidian.com/hankread/'+number_book+'/'+VolId_List[i]
        print('\nCurrent URL: '+url)
        headers = {
            'User-Agent': random_user_agent(),
            'Referer': 'https://book.qidian.com/info/'+number_book,
            'Cookie': 'e1=%7B%22pid%22%3A%22qd_P_hankRead%22%2C%22eid%22%3A%22%22%2C%22l1%22%3A3%7D; e2=%7B%22pid%22%3A%22qd_P_hankRead%22%2C%22eid%22%3A%22%22%2C%22l1%22%3A2%7D; _csrfToken=BXnzDKmnJamNAgLu4O3GknYVL2YuNX5EE86tTBAm; newstatisticUUID=1564467217_1193332262; qdrs=0%7C3%7C0%7C0%7C1; showSectionCommentGuide=1; qdgd=1; e1=%7B%22pid%22%3A%22qd_P_limitfree%22%2C%22eid%22%3A%22qd_E01%22%2C%22l1%22%3A4%7D; e2=%7B%22pid%22%3A%22qd_P_free%22%2C%22eid%22%3A%22qd_A18%22%2C%22l1%22%3A3%7D; rcr=3144877%2C1013637116%2C1003541158; lrbc=3144877%7C52472447%7C0%2C1013637116%7C436231358%7C0%2C1003541158%7C309402995%7C0; bc=3144877'
        }
        try:
            # The headers must go in headers=, not params=
            res = requests.get(url=url, headers=headers)
            if res.status_code == 200:
                print('Crawling volume '+str(i+1)+':')
                parsePage(res.text, url, path, int(VolNum_List[i]))
            else:
                print('No response')
                return None
        except ReadTimeout:
            print("ReadTimeout!")
            return None
        except RequestException:
            print("Request failed!")
            return None
        time.sleep(3)
 
def parsePage(html, url, path, chapNum):
    '''
    Parse a page of novel content and write each chapter to a txt file in its volume directory
    :param html: HTML of the content page
    :param url: URL the page was fetched from
    :param path: path of the volume directory
    :param chapNum: number of chapters in the volume, used for the progress display
    :return: None
    '''
    if html is None:
        print('The page at '+url+' is empty')
        return
    soup = BeautifulSoup(html, 'lxml')
    ChapInfoList = soup.find_all('div', attrs={'class': 'main-text-wrap'})
    alreadySpiderNum = 0.0
    for i in range(len(ChapInfoList)):
        sys.stdout.write('\rCrawled {0}'.format('%.2f%%' % float(alreadySpiderNum/chapNum*100)))
        sys.stdout.flush()
        time.sleep(0.5)
        soup1 = BeautifulSoup(str(ChapInfoList[i]), 'lxml')
        ChapName = soup1.find('h3', attrs={'class': 'j_chapterName'}).span.string
        # Strip characters that are not allowed in Windows file names
        ChapName = re.sub(r'[\\/:*?"<>|]', '', ChapName)
        if ChapName == '无题':  # '无题' means "untitled"; prefix the chapter number so file names stay distinct
            ChapName = '第'+str(i+1)+'章 无题'
        filename = os.path.join(path, ChapName+'.txt')
        readContent = soup1.find('div', attrs={'class': 'read-content j_readContent'}).find_all('p')
        for item in readContent:
            # Take the paragraph text directly instead of matching <p>...</p> with a regex
            paragraph = item.get_text()
            save2file(filename, paragraph)
        alreadySpiderNum += 1.0
    sys.stdout.write('\rCrawled {0}'.format('%.2f%%' % float(alreadySpiderNum / chapNum * 100)))
 
 
def save2file(filename, content):
    # Append one paragraph per line; the with block closes the file on exit
    with open(filename, 'a', encoding='utf-8') as f:
        f.write(content+'\n')
 
def mkdir(path):
    '''
    Create the directory for a volume
    :param path: path to create
    :return: None
    '''
    folder = os.path.exists(path)
    if not folder:
        os.makedirs(path)
    else:
        print('Path '+path+' already exists')
 
def main():
    response = getJson()
    if response is not None:
        VolId_List = response['VolumeId_List']
        VolNum_List = response['VolumeNum_List']
        getPage(VolId_List, VolNum_List)
        print("Finished crawling the novel!")
    else:
        print('Unable to crawl this novel!')
 
if __name__ == '__main__':
    main()
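The script gives up on the first failed request. One option is to retry a few times with a growing delay before returning None. A generic sketch that is not tied to requests: `with_retries`, its parameters, and the `flaky` stand-in function are all hypothetical names introduced for illustration.

```python
import time

def with_retries(func, attempts=3, base_delay=1.0,
                 exceptions=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                return None  # same failure signal the script uses
            sleep(base_delay * 2 ** attempt)

calls = []

def flaky():
    """Stand-in for a request that fails twice and then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError('simulated timeout')
    return 'ok'

result = with_retries(flaky, attempts=3, base_delay=0.0,
                      exceptions=(TimeoutError,))
print(result, len(calls))
```

In the script, `func` could be a lambda wrapping the `requests.get(...)` call with `exceptions=(ReadTimeout,)`. A `timeout=` argument would also have to be passed to `requests.get`, since without one the request waits indefinitely and `ReadTimeout` is never raised.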