[Download All XKCD Comics] A Detailed Walkthrough

Latest recommended article published on 2021-09-15 10:50:00

llxxyy507


Category: Python. Tags: download XKCD comics, Python script example, automatic comic download with Python

This is an original article by the author; do not repost it without the author's permission.

Article link: https://blog.csdn.net/llxxyy507/article/details/103647803


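1. Goal

XKCD is a popular geek webcomic site. Its home page, http://xkcd.com/, has a "Prev" button that takes the user to the previous comic.

The Python script downloads XKCD comics 1 through n, where n is entered by the user. It starts at comic n and works backward to comic 1.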
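Because n comes straight from the keyboard, it can help to validate it before building the starting URL. The following is a minimal sketch, not part of the original script; `parse_comic_number` and `start_url` are hypothetical helper names introduced only for illustration.

```python
def parse_comic_number(text):
    """Convert user input to a positive comic number, or raise ValueError.

    Hypothetical helper for illustration; the original script passes
    input() straight into the URL.
    """
    number = int(text.strip())  # int() raises ValueError for non-numeric input
    if number < 1:
        raise ValueError('comic number must be 1 or greater')
    return number


def start_url(number):
    """Build the starting page URL in the same form the script uses."""
    return 'http://xkcd.com/%d/' % number
```

In the script, this could replace the plain string concatenation: `url = start_url(parse_comic_number(input()))`.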

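2. Preparation

2.1 Make sure the required modules are installed

The script needs three modules: requests, bs4, and os. os ships with Python; the other two must be installed separately. If you are not sure how, see "Installing third-party Python modules from the command line".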
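A quick way to confirm the modules are available before running the script is to ask Python whether it can locate them. This is a small sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_modules` is made up for this example.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'requests' and 'bs4' are the import names the script uses.
# Note: the bs4 module is provided by the 'beautifulsoup4' package on PyPI.
missing = missing_modules(['requests', 'bs4', 'os'])
if missing:
    print('Missing modules:', ', '.join(missing))
else:
    print('All required modules are available.')
```

If anything is reported missing, `pip install requests beautifulsoup4` is the usual way to install both third-party packages.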

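2.2 Script outline

• Prompt the user for a number and build the starting URL from it.

• Download the page with the requests module.

• Use Beautiful Soup to find the URL of the comic image in the page. (This requires knowing the site's HTML structure.)

• Download the comic image with iter_content() and save it to disk.

• Find the URL of the previous comic's link, then repeat.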
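The saving step can be sketched on its own. `iter_content()` on a requests response yields the body as a series of byte chunks, so the file is written piece by piece instead of all at once. The hypothetical helper `save_chunks` below accepts any iterable of bytes, which means it can be tried without a network connection.

```python
import os


def save_chunks(chunks, folder, filename):
    """Write an iterable of bytes chunks to folder/filename and return the path.

    Illustrative helper, not part of the original script.
    """
    os.makedirs(folder, exist_ok=True)  # create the target folder if needed
    path = os.path.join(folder, filename)
    with open(path, 'wb') as image_file:  # closed automatically, even on error
        for chunk in chunks:
            image_file.write(chunk)
    return path
```

With a real response object this would be called as `save_chunks(res.iter_content(100000), 'xkcd', os.path.basename(comicUrl))`.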

The first XKCD comic is at http://xkcd.com/1/. Clicking "Prev" on that page leads to http://xkcd.com/1/#, so a URL that ends with '#' can serve as the loop's termination condition.
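The termination logic can be tried offline by simulating the Prev links. In this sketch the dictionary of hrefs is an assumption made for illustration (in particular, that the Prev href on comic 1 is '#', as the URL above suggests), and `walk_back` is a hypothetical helper, not part of the script.

```python
def walk_back(start_url, prev_links, base='http://xkcd.com'):
    """Return the page URLs visited while following Prev links.

    `prev_links` maps a page URL to the href of its Prev button.
    The walk stops as soon as the next URL ends with '#'.
    """
    visited = []
    url = start_url
    while not url.endswith('#'):
        visited.append(url)
        url = base + prev_links[url]
    return visited


# Simulated Prev hrefs (assumed values, not fetched from the site).
simulated = {
    'http://xkcd.com/3/': '/2/',
    'http://xkcd.com/2/': '/1/',
    'http://xkcd.com/1/': '#',
}
print(walk_back('http://xkcd.com/3/', simulated))
```

Starting from comic 3, the walk visits pages 3, 2, and 1 and then stops, because the URL built from the last href ends with '#'.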

3. Full code

#! python3
# Downloads XKCD comics, starting at comic n (entered by the user) and
# following the "Prev" links back to comic 1.
import os

import bs4
import requests

print('Enter num')
url = 'http://xkcd.com/' + input().strip() + '/'

os.makedirs('xkcd', exist_ok=True)  # store comics in ./xkcd
while not url.endswith('#'):  # the Prev link on the first comic leads to a URL ending in '#'
    # Download the page
    print('Downloading page  | %s...' % url)
    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, features='html.parser')

    # Find the URL of the comic image
    comicElem = soup.select('#comic img')
    if not comicElem:
        print('Could not find comic image')
    else:
        comicUrl = 'http:' + comicElem[0].get('src')

        # Download the image
        print('Downloading image | %s...' % comicUrl)
        res = requests.get(comicUrl)
        res.raise_for_status()

        # Save the image to ./xkcd
        with open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb') as imageFile:
            for chunk in res.iter_content(100000):
                imageFile.write(chunk)

    # Get the Prev button's URL
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prevLink.get('href')

print('DONE')
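The script builds the image URL by prepending 'http:' to the src attribute, which works when src is protocol-relative (starts with '//'). A slightly more general sketch, offered as an alternative and not as the author's method, uses `urllib.parse.urljoin` from the standard library, which also copes with absolute and path-relative values. The helper name `image_url` and the sample src values are made up for illustration.

```python
from urllib.parse import urljoin


def image_url(page_url, src):
    """Resolve an <img> src attribute against the URL of the page it came from."""
    return urljoin(page_url, src)


page = 'http://xkcd.com/1/'
# Hypothetical src values covering the three common forms.
print(image_url(page, '//imgs.xkcd.com/comics/example.png'))       # protocol-relative
print(image_url(page, 'https://imgs.xkcd.com/comics/example.png')) # already absolute
print(image_url(page, '/comics/example.png'))                      # path-relative
```

In the script, `comicUrl = image_url(url, comicElem[0].get('src'))` would take the place of the string concatenation.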