Python Novel Downloader, Version 1.0

KingViker

Article link: https://blog.csdn.net/kingviker/article/details/8955662

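First, a disclaimer: I wrote this purely for practice. I no longer read novels; my nearsightedness has gotten so bad that I have even stopped using my phone.

The downloader exists because very few sites let you download the latest novels, while plenty let you read them online. So I wrote this tool to scrape a novel from an online reading site. The code targets one specific website, but that site has been around for a long time and has a large catalogue, so it should cover most needs. I won't name the site here because I'm worried about legal trouble; you can see it in the code below.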

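Since this tool fetches web pages and reads their HTML, it needs an HTML parsing library, and I found pyquery. I consider myself fairly good at jQuery (I have written my own jQuery plugins, can handle most browser-compatibility problems, have used nearly everything in jQuery UI, have customized many jQuery UI widgets, and can patch bugs in the official code myself), so once I found pyquery I wasn't going to pass it up. To install Python packages I use easy_install.

I used pip at first, but found it less convenient than easy_install: when I tried to install pyquery, pip failed with an error while resolving dependencies, whereas easy_install succeeded. For installing easy_install and pip, see: http://blog.csdn.net/qq413041153/article/details/8950247

Once easy_install is installed, run this in cmd: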

easy_install pyquery

Because I had already installed it, easy_install simply reported that pyquery 1.2.4 was already active in easy-install.pth.

Here is the code:

# -*- coding:gbk -*-
'''
file desc:novel downloader
author:kingviker
email:kingviker@163.com.kingviker88@gmail.com
date:2013-05-21
depends:python 2.7.4,pyquery
'''

import os,codecs
from pyquery import PyQuery as pq


saveMode="singleFile" #singleFile or singleChapter

#novel's main webpage.
url = "http://www.dushuge.net/html/14/14712/"
#where the novels will be saved
baseSavePath="E:/enovel/"

#use pyquery to grab the page's content
html_pq = pq(url=url)

#use jQuery-style selectors to get the novel's name
novelName = html_pq("div.book_news_style_text2 > h1").text()
print novelName


#create the novel's directory (and any missing parent directories) if it does not exist yet
if not os.path.exists(baseSavePath+novelName):
    os.makedirs(baseSavePath+novelName)

#pieceList alternates between a piece (volume) title and that piece's chapter list:
#[title, [[chapterName, href], ...], title, [[chapterName, href], ...], ...]
pieceList=[]
chapterList=[]


#find the first piece (volume) heading of the novel
piece = pq(html_pq("div.book_article_texttable").find(".book_article_texttitle")[0])

#record the first piece's title
pieceList.append(piece.text())
print "piece Text:", piece.text()

#walk the following siblings to collect piece titles and chapter links
finished=False
while not finished:
    chapterDiv = piece.next()
    #print "chapter div length:",chapterDiv.length
    piece = chapterDiv
    if chapterDiv.length==0:
        #no more siblings: store the last piece's chapters and stop
        pieceList.append(chapterList[:])
        del chapterList[:]
        finished=True
    elif chapterDiv.attr("class")=="book_article_texttitle":
        #a new piece heading: store the previous piece's chapters, then the new title
        pieceList.append(chapterList[:])
        del chapterList[:]
        pieceList.append(piece.text())
    else:
        #a block of chapter links: collect [chapter name, relative url] pairs
        chapterUrls = chapterDiv.find("a")
        for urlA in chapterUrls:
            urlList_temp = [pq(urlA).text(),pq(urlA).attr("href")]
            chapterList.append(urlList_temp)

print "Download list collected",len(pieceList)


#based on pieceList, grab each chapter page's content and save it
if saveMode == "singleFile":

    #"wb+" truncates any existing file, so every run starts from an empty file
    novelFile = codecs.open(baseSavePath+novelName+".txt","wb+","utf-8")
    #two nested loops: the outer one walks the pieces, the inner one their chapters
    for pieceNum in range(0,len(pieceList),2):
        piece = pieceList[pieceNum]
        print "Start downloading",piece
        chapterList = pieceList[pieceNum+1]
        for chapter in chapterList:
            print "Start downloading",chapter[0],"url:",chapter[1]
            chapterPage = pq(url=url+chapter[1])

            chapterContent = piece+" "+chapter[0]+"\r\n"
            #html() returns None if #booktext is missing; <br> may be serialized as "<br />" or "<br/>"
            bodyHtml = chapterPage("#booktext").html() or ""
            chapterContent += bodyHtml.replace("<br />","\r\n").replace("<br/>","\r\n")
            print "Content length:",len(chapterContent)
            novelFile.write(chapterContent+"\r\n"+"\r\n")

    novelFile.close()
else:
    #same as above, but each chapter is written to its own file
    for pieceNum in range(0,len(pieceList),2):
        piece = pieceList[pieceNum]
        print "Start downloading",piece
        chapterList = pieceList[pieceNum+1]
        for chapter in chapterList:
            print "Start downloading",chapter[0],"url:",chapter[1]
            novelFile = codecs.open(baseSavePath+novelName+os.sep+piece+chapter[0]+".txt","wb","utf-8")
            chapterPage = pq(url=url+chapter[1])

            chapterContent = piece+" "+chapter[0]+"\r\n"
            #html() returns None if #booktext is missing; <br> may be serialized as "<br />" or "<br/>"
            bodyHtml = chapterPage("#booktext").html() or ""
            chapterContent += bodyHtml.replace("<br />","\r\n").replace("<br/>","\r\n")
            print "Content length:",len(chapterContent)
            novelFile.write(chapterContent+"\r\n"+"\r\n")
            novelFile.close()

print "Download complete"
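
The scan in the script builds a flat list that alternates between a piece (volume) title and that piece's chapter list. The same grouping can be sketched on plain data, with no network access. The `group_chapters` helper and the `(kind, payload)` rows are hypothetical stand-ins for the sibling elements on the page, and the sketch is written for Python 3.

```python
def group_chapters(rows):
    """Group (kind, payload) rows into [title, chapters, title, chapters, ...].

    kind is "title" for a volume heading or "links" for a block of
    (chapter_name, href) pairs. Both kinds are made up for this sketch.
    """
    piece_list = []
    chapters = []
    for kind, payload in rows:
        if kind == "title":
            if piece_list:
                # Close the previous volume before starting a new one.
                piece_list.append(chapters)
                chapters = []
            piece_list.append(payload)
        else:
            chapters.extend(payload)
    if piece_list:
        # Close the last volume once the rows run out.
        piece_list.append(chapters)
    return piece_list


rows = [
    ("title", "Volume One"),
    ("links", [("Chapter 1", "1.html"), ("Chapter 2", "2.html")]),
    ("links", [("Chapter 3", "3.html")]),
    ("title", "Volume Two"),
    ("links", [("Chapter 4", "4.html")]),
]
pieces = group_chapters(rows)

# Walk the list two items at a time, as the script's save loops do.
for i in range(0, len(pieces), 2):
    print(pieces[i], [name for name, _ in pieces[i + 1]])
```

Because titles sit at even indexes and chapter lists at odd ones, stepping by two with `range(0, len(pieces), 2)` always pairs a title with its own chapters.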

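To download a different novel, just change the novel's main page URL in the code. The files are saved under E:/enovel/, and saveMode lets you choose between one file per chapter and a single file for the whole novel.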
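
The two save modes can be sketched in isolation as follows. This is only an illustration under stated assumptions: `save_novel` is a hypothetical helper, the chapters are hard-coded instead of downloaded, the output goes to a temporary directory rather than E:/enovel/, and the code is written for Python 3.

```python
import io
import os
import tempfile


def save_novel(base_dir, novel_name, chapters, save_mode="singleFile"):
    """Write (title, text) chapters to disk; return the list of files written."""
    written = []
    if save_mode == "singleFile":
        path = os.path.join(base_dir, novel_name + ".txt")
        # Mode "w" truncates an existing file, so reruns do not append.
        # newline="" keeps the explicit "\r\n" line endings unchanged.
        with io.open(path, "w", encoding="utf-8", newline="") as f:
            for title, text in chapters:
                f.write(title + "\r\n" + text + "\r\n\r\n")
        written.append(path)
    else:
        # One file per chapter, inside a directory named after the novel.
        novel_dir = os.path.join(base_dir, novel_name)
        os.makedirs(novel_dir, exist_ok=True)
        for title, text in chapters:
            path = os.path.join(novel_dir, title + ".txt")
            with io.open(path, "w", encoding="utf-8", newline="") as f:
                f.write(title + "\r\n" + text + "\r\n\r\n")
            written.append(path)
    return written


demo_dir = tempfile.mkdtemp()
demo_chapters = [("Chapter 1", "first text"), ("Chapter 2", "second text")]
single = save_novel(demo_dir, "demo", demo_chapters, "singleFile")
per_chapter = save_novel(demo_dir, "demo", demo_chapters, "singleChapter")
print(len(single), len(per_chapter))
```

Real chapter titles may contain characters that are not valid in file names, which this sketch does not handle.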

I didn't wrap anything in functions because I'm lazy.

Questions and corrections are welcome.

Addendum:

The code uses the codecs module; the original post linked here to an article that helps explain it.
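
For a quick feel of what codecs does in the script, here is a small round-trip sketch: text is encoded to UTF-8 when written and decoded again when read. The sample string and the temporary file path are made up for the example, and the code is written for Python 3.

```python
import codecs
import os
import tempfile

# Made-up sample text mixing Chinese and ASCII characters.
text = u"第一章 开始\r\nChapter body"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# codecs.open encodes on write, so the file receives UTF-8 bytes.
with codecs.open(path, "w", "utf-8") as f:
    f.write(text)

# Reading through codecs.open decodes the bytes back into text.
with codecs.open(path, "r", "utf-8") as f:
    round_trip = f.read()

# Reading in binary mode shows the raw encoded bytes.
with open(path, "rb") as f:
    raw = f.read()

print(round_trip == text)
print(len(text), len(raw))
```

The byte count is larger than the character count because each Chinese character takes several bytes in UTF-8.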
