Novel Downloader V1.0
Fetching the novel
For the source I chose 全本小说网 (qb5.tw). I had originally planned to support several sources, but that turned out to be a hassle, so I'll leave it for a later version.
Fetching the novel list
The only way to find a novel on this site is through the search bar shown in the figure. Observation shows that the search-results URL follows a fixed pattern:
'https://www.qb5.tw/modules/article/search.php?searchkey=' + the encoded keyword. To produce that encoded keyword, I wrote the following function:
import urllib.parse

def str_encode(s):
    # GB2312-encode the keyword, then percent-encode the bytes for the URL
    return urllib.parse.quote(s.encode('gb2312'))
The input string is first encoded as GB2312 bytes, which are then URL-encoded. With the correct URL in hand, we can start scraping the novel list.
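As a quick sanity check, the helper can be exercised on its own; the keyword 小说 below is just an illustrative example, not taken from the site:

```python
import urllib.parse

def str_encode(s):
    # GB2312-encode the keyword, then percent-encode the bytes for the URL
    return urllib.parse.quote(s.encode('gb2312'))

# Build a search URL the same way the scraper does
search_url = 'https://www.qb5.tw/modules/article/search.php?searchkey='
url = search_url + str_encode('小说')
print(url)
```

Decoding the result with urllib.parse.unquote(..., encoding='gb2312') round-trips back to the original keyword, which confirms the two-step encoding is correct.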
def Find_Novel(self, novel_name):
    """Find the novel by name; fill self.novel_list with (name, author, words, state, url)."""
    self.novel_name = novel_name
    url = self.search_url + convert.str_encode(self.novel_name)  # build the search URL
    htmlSource = requests.get(url, headers=self.header)          # fetch the search page
    soup = BeautifulSoup(htmlSource.text, 'lxml')                # parse the HTML
    head = soup.find('head')
    self.novel_list = []
    self.items = []
    if head.title.text != novel_name + self.titleroot:  # redirected to a novel's homepage
        info = soup.find('div', attrs={'id': 'bookdetail'}).find('div', attrs={'id': 'info'})
        # extract this novel's name, author, state and link
        name, author = info.h1.text.split('/ ')
        state = info.p.span.next_sibling.text
        self.items.append(name)
        link = head.link['href']
        novel_info = (name, author, '无法获取', state, link)
        self.novel_list.append(novel_info)
    else:  # a normal search-results page
        trs = soup.find_all('tr', attrs={'align': False})  # one row per novel on this page
        for tr in trs:
            tds = tr.find_all('td')
            novel_info = (tds[0].text, tds[2].text, tds[3].text,
                          tds[5].text, tds[0].find('a')['href'])
            self.items.append(tds[0].text)      # collect every novel's name
            self.novel_list.append(novel_info)  # collect every novel's full record
    if self.items == []:
        print('未检测到小说')
    else:
        print('检索完成!')
This function retrieves every novel on the search page and stores its information in a list for the UI to use later. One special case: when the keyword matches exactly one novel, the site redirects straight to that novel's homepage, so an if on the page title distinguishes the two kinds of page.
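The search-page branch can be tested offline against a small HTML fixture. The markup below is a hypothetical stand-in for the results table layout assumed above (header rows carry an align attribute, data rows do not; name in column 0, author in 2, word count in 3, state in 5), not the site's actual HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical fixture mimicking the assumed results-table layout
html = '''
<table>
<tr align="center"><td>书名</td><td>最新章节</td><td>作者</td><td>字数</td><td>更新</td><td>状态</td></tr>
<tr><td><a href="https://www.qb5.tw/book_1/">示例小说</a></td><td>-</td><td>某作者</td><td>10万</td><td>-</td><td>连载</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')
novels = []
# attrs value False matches tags that LACK the attribute, so header rows are skipped
for tr in soup.find_all('tr', attrs={'align': False}):
    tds = tr.find_all('td')
    novels.append((tds[0].text, tds[2].text, tds[3].text,
                   tds[5].text, tds[0].find('a')['href']))
print(novels)
```

The stdlib html.parser is used here so the snippet runs without lxml; the scraper itself parses with lxml, but the selection logic is identical.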
Fetching the chapter list
def Get_chapter_list(self):
    html = requests.get(self.novel_list[self.choose_index][4], headers=self.header).text
    soup = BeautifulSoup(html, 'lxml')  # parse the HTML
    self.introduction = soup.find('div', attrs={'id': 'intro'}).text
    self.introduction = ''.join(self.introduction.split())  # strip all whitespace from the introduction
    self.img = soup.find('div'