A Novel Downloader in Python, version 1.0

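        First, a disclaimer: I wrote this purely for practice. I no longer read novels; my nearsightedness has gotten so bad that I have even stopped using my phone.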

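        The point of the downloader: very few sites currently let you download the latest novels, yet many let you read them online, so I wrote this tool to scrape novels from an online-reading site. The code targets one specific site, but that site has been around for a long time and has a large catalogue, so I think it should cover most needs. I won't name the site here; you will see it in the code shortly. I am worried about legal disputes.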

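Because this tool scrapes web pages and reads their HTML, it needs an HTML parsing library, and I found pyquery. I consider myself fairly good at jQuery (I have written my own jQuery plugins, can handle most browser-compatibility problems, have used essentially everything in jQuery UI, have customized many jQuery UI plugins, and can fix bugs in the official code myself), so once I discovered pyquery I could not pass it up. I install Python packages with easy_install.

I used pip at first but found it less convenient than easy_install: when I tried to install pyquery with pip, it failed with an error while resolving the dependencies, whereas easy_install succeeded. For installing easy_install and pip, see: http://blog.csdn.net/qq413041153/article/details/8950247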

Once easy_install is installed, type the following directly in cmd:

easy_install pyquery
        As the screenshot shows, I had already installed it, so easy_install simply reported that pyquery 1.2.4 is already active in easy-install.pth.
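To confirm the install from Python rather than from the easy_install output, a small check like the one below could be used. This is only a sketch: `is_installed` is a hypothetical helper (not part of pyquery or easy_install), and it only tests whether a module can be imported.

```python
import importlib


def is_installed(module_name):
    """Return True if the module can be imported (hypothetical helper)."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# Prints True once pyquery has been installed successfully, False otherwise
print(is_installed("pyquery"))
```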


        Here is the code:

# -*- coding:gbk -*-
'''
file desc:novel downloader
author:kingviker
email:kingviker@163.com.kingviker88@gmail.com
date:2013-05-21
depends:python 2.7.4,pyquery
'''

import os,codecs
from pyquery import PyQuery as pq


saveMode="singleFile" #singleFile or singleChapter

#novel's main webpage.
url = "http://www.dushuge.net/html/14/14712/"
#where the novels will be saved
baseSavePath="E:/enovel/"

#use pyquery to fetch and parse the novel's main page
html_pq = pq(url=url)

#use a jQuery-style selector to get the novel's name
novelName = html_pq("div.book_news_style_text2 > h1").text()
print novelName


#create the novel's directory (and any missing parent directory) if it does not exist yet
if not os.path.exists(baseSavePath+novelName):
    os.makedirs(baseSavePath+novelName)

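#pieceList alternates piece titles and chapter lists: [title, chapters, title, chapters, ...]
#chapterList collects [chapter title, href] pairs for the piece currently being scanned
pieceList=[]
chapterList=[]


#find the first piece title of the novel
piece = pq(html_pq("div.book_article_texttable").find(".book_article_texttitle")[0])

#record the first piece's title
pieceList.append(piece.text())
print "piece Text:", piece.text()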

#walk the sibling elements to build the piece and chapter lists
nextPiece=False
while nextPiece==False:
    chapterDiv = piece.next()
    #print "chapter div length:",chapterDiv.length
    piece = chapterDiv
    if chapterDiv.length==0:
        #no more siblings: store the last piece's chapters and stop
        pieceList.append(chapterList[:])
        del chapterList[:]
        nextPiece=True
    elif chapterDiv.attr("class")=="book_article_texttitle":
        #a new piece title: store the previous piece's chapters, then the new title
        pieceList.append(chapterList[:])
        del chapterList[:]
        pieceList.append(piece.text())
    else:
        #a block of chapter links: collect [title, href] for each link
        chapterUrls = chapterDiv.find("a")
        for urlA in chapterUrls:
            urlList_temp = [pq(urlA).text(),pq(urlA).attr("href")]
            chapterList.append(urlList_temp)

print "download list collected",len(pieceList)


#walk pieceList, fetch each chapter page and save its text
if saveMode == "singleFile":
    
    if os.path.exists(baseSavePath+novelName+".txt"):os.remove(baseSavePath+novelName+".txt")

    #open the output file with codecs; "wb+" truncates an existing file (it does not append) and writes are encoded as utf-8
    novelFile = codecs.open(baseSavePath+novelName+".txt","wb+","utf-8")
    #two nested loops: the outer one steps over (title, chapter list) pairs, the inner one over chapters
    for pieceNum in range(0,len(pieceList),2):
        piece = pieceList[pieceNum]
        print "start downloading",pieceList[pieceNum]
        chapterList = pieceList[pieceNum+1]
        for chapterNum in range(0,len(chapterList)):
            chapter = chapterList[chapterNum]
            print "start downloading",chapter[0],"url:",chapter[1]
            chapterPage = pq(url=url+chapter[1])

            chapterContent = piece+" "+chapter[0]+"\r\n"
            chapterContent += chapterPage("#booktext").html().replace("<br />","\r\n")
            print "content length:",len(chapterContent)
            novelFile.write(chapterContent+"\r\n"+"\r\n")
            
    novelFile.close()
else:
    #same as above, but each chapter is written to its own file
    for pieceNum in range(0,len(pieceList),2):
        piece = pieceList[pieceNum]
        print "start downloading",pieceList[pieceNum]
        chapterList = pieceList[pieceNum+1]
        for chapterNum in range(0,len(chapterList)):
            chapter = chapterList[chapterNum]
            print "start downloading",chapter[0],"url:",chapter[1]
            novelFile = codecs.open(baseSavePath+novelName+os.sep+piece+chapter[0]+".txt","wb","utf-8")
            chapterPage = pq(url=url+chapter[1])

            chapterContent = piece+" "+chapter[0]+"\r\n"
            chapterContent += chapterPage("#booktext").html().replace("<br />","\r\n")
            print "content length:",len(chapterContent)
            novelFile.write(chapterContent+"\r\n"+"\r\n")
            novelFile.close()

print "download finished"

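        To download a different novel, just change the novel's main-page URL (url) in the code. The files are saved under E:/enovel/ (baseSavePath), and saveMode selects between one file per chapter (singleChapter) and a single file for the whole novel (singleFile).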
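The two save modes differ only in where each chapter ends up, which can be sketched as a small path helper. This is an illustration, not part of the script above: `safe_name` and `output_path` are hypothetical names, and replacing characters that Windows rejects in file names is an extra precaution that the original code does not take.

```python
import os
import re

# Characters Windows does not allow in file names (assumption: the target
# is a Windows path such as the E:/ drive used in the script).
_INVALID = re.compile(r'[\\/:*?"<>|]')


def safe_name(text):
    """Replace characters that are invalid in Windows file names with '_'."""
    return _INVALID.sub("_", text).strip()


def output_path(base, novel, mode, piece="", chapter=""):
    """Return the target file path for the given save mode (hypothetical helper)."""
    if mode == "singleFile":
        # the whole novel goes into <base>/<novel>.txt
        return os.path.join(base, safe_name(novel) + ".txt")
    # singleChapter: one file per chapter inside <base>/<novel>/
    return os.path.join(base, safe_name(novel), safe_name(piece + chapter) + ".txt")
```

With such a helper, both branches of the script could share the same download loop and differ only in the path they ask for.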

        I did not wrap the code into functions, because I am lazy.
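For readers who do want functions, the scanning step is a natural first candidate. The sketch below is an assumption about how it could be factored out, not the author's code: `group_chapters` is a hypothetical name, the input is a plain list standing in for the sibling elements that pyquery returns, and `urljoin` is swapped in for the script's string concatenation of `url` and the chapter's href.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2


def group_chapters(items):
    """Group a flat sequence of ("title", text) and ("links", [[name, href], ...])
    items into a list of (piece_title, chapters) tuples (hypothetical helper)."""
    pieces = []
    for kind, value in items:
        if kind == "title":
            # a piece title starts a new group
            pieces.append((value, []))
        elif pieces:
            # chapter links belong to the most recent piece title
            pieces[-1][1].extend(value)
    return pieces


# Made-up sample data in the shape the scan produces
items = [
    ("title", "Vol 1"),
    ("links", [["Ch 1", "1.html"], ["Ch 2", "2.html"]]),
    ("links", [["Ch 3", "3.html"]]),
    ("title", "Vol 2"),
    ("links", [["Ch 4", "4.html"]]),
]
pieces = group_chapters(items)

# Resolve a chapter's relative href against the novel's main page
base = "http://www.dushuge.net/html/14/14712/"
first_url = urljoin(base, pieces[0][1][0][1])
```

Returning (title, chapters) pairs avoids the alternating title/list layout of pieceList, so the download loops no longer need to step through the list two entries at a time.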

        Questions and corrections are welcome.


Addendum:

        The code uses codecs; here is an article that can help you understand codecs: link
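As a quick illustration of what codecs does in the script, the sketch below writes a unicode string through `codecs.open` and reads it back. The file name and sample text are made up for the example, and it writes to a temporary directory rather than to the script's save path.

```python
import codecs
import os
import tempfile

# Sample text with non-ASCII characters and the \r\n line ending the script uses
text = u"\u7b2c\u4e00\u7ae0 Chapter title\r\nbody text"

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# "w" truncates an existing file; codecs encodes every write as UTF-8
with codecs.open(path, "w", "utf-8") as f:
    f.write(text)

# Reading through codecs decodes the bytes back into a unicode string
with codecs.open(path, "r", "utf-8") as f:
    read_back = f.read()

# Reading in plain binary mode shows the encoded bytes stored on disk
with open(path, "rb") as f:
    raw = f.read()
```

Because `codecs.open` opens the underlying file in binary mode, the `\r\n` sequences are written exactly as given, with no newline translation.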
