Novel Downloader V1.0
Fetching the novel
For the source I chose 全本小说网 (qb5.tw). I had originally planned to support several sources, but that turned out to be a hassle, so I'll leave it for a later version.
Fetching the novel list
The only way to find a novel on this site is through the search bar shown in the figure. Observation shows that the search-results URL follows a fixed pattern:
'https://www.qb5.tw/modules/article/search.php?searchkey=' + the encoded keyword. To produce that encoded keyword, I wrote the following function:
import urllib.parse

def str_encode(s):
    # GB2312-encode the keyword, then percent-encode the bytes for the URL
    return urllib.parse.quote(s.encode('gb2312'))
The input string is first encoded as GB2312 bytes, which are then URL-encoded. With the correct URL in hand, we can start scraping the novel list.
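As a quick sanity check, the helper can be exercised on its own; the keyword 小说 below is just an illustrative example, not taken from the site:

```python
import urllib.parse

def str_encode(s):
    # GB2312-encode the keyword, then percent-encode the bytes for the URL
    return urllib.parse.quote(s.encode('gb2312'))

# Build a search URL the same way the scraper does
search_url = 'https://www.qb5.tw/modules/article/search.php?searchkey='
url = search_url + str_encode('小说')
print(url)
```

Decoding the result with urllib.parse.unquote(..., encoding='gb2312') round-trips back to the original keyword, which confirms the two-step encoding is correct.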
def Find_Novel(self, novel_name):
    """Find the novel by name; fill self.novel_list with (name, author, words, state, url)."""
    self.novel_name = novel_name
    url = self.search_url + convert.str_encode(self.novel_name)  # build the search URL
    htmlSource = requests.get(url, headers=self.header)          # fetch the search page
    soup = BeautifulSoup(htmlSource.text, 'lxml')                # parse the HTML
    head = soup.find('head')
    self.novel_list = []
    self.items = []
    if head.title.text != novel_name + self.titleroot:  # redirected to a novel's homepage
        info = soup.find('div', attrs={'id': 'bookdetail'}).find('div', attrs={'id': 'info'})
        # extract this novel's name, author, state and link
        name, author = info.h1.text.split('/ ')
        state = info.p.span.next_sibling.text
        self.items.append(name)
        link = head.link['href']
        novel_info = (name, author, '无法获取', state, link)
        self.novel_list.append(novel_info)
    else:  # a normal search-results page
        trs = soup.find_all('tr', attrs={'align': False})  # one row per novel on this page
        for tr in trs:
            tds = tr.find_all('td')
            novel_info = (tds[0].text, tds[2].text, tds[3].text,
                          tds[5].text, tds[0].find('a')['href'])
            self.items.append(tds[0].text)      # collect every novel's name
            self.novel_list.append(novel_info)  # collect every novel's full record
    if self.items == []:
        print('未检测到小说')
    else:
        print('检索完成!')
This function retrieves every novel on the search page and stores its information in a list for the UI to use later. One special case: when the keyword matches exactly one novel, the site redirects straight to that novel's homepage, so an if on the page title distinguishes the two kinds of page.
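The search-page branch can be tested offline against a small HTML fixture. The markup below is a hypothetical stand-in for the results table layout assumed above (header rows carry an align attribute, data rows do not; name in column 0, author in 2, word count in 3, state in 5), not the site's actual HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical fixture mimicking the assumed results-table layout
html = '''
<table>
<tr align="center"><td>书名</td><td>最新章节</td><td>作者</td><td>字数</td><td>更新</td><td>状态</td></tr>
<tr><td><a href="https://www.qb5.tw/book_1/">示例小说</a></td><td>-</td><td>某作者</td><td>10万</td><td>-</td><td>连载</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')
novels = []
# attrs value False matches tags that LACK the attribute, so header rows are skipped
for tr in soup.find_all('tr', attrs={'align': False}):
    tds = tr.find_all('td')
    novels.append((tds[0].text, tds[2].text, tds[3].text,
                   tds[5].text, tds[0].find('a')['href']))
print(novels)
```

The stdlib html.parser is used here so the snippet runs without lxml; the scraper itself parses with lxml, but the selection logic is identical.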
Fetching the chapter list
def Get_chapter_list(self):
    html = requests.get(self.novel_list[self.choose_index][4], headers=self.header).text
    soup = BeautifulSoup(html, 'lxml')  # parse the HTML
    self.introduction = soup.find('div', attrs={'id': 'intro'}).text
    self.introduction = ''.join(self.introduction.split())  # strip all whitespace from the introduction
    self.img = soup.find('div'