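This script uses XPath to batch-download PPT templates from the 第一PPT site (www.1ppt.com). It only covers the categories in the first row of the site's template navigation; to fetch the other rows, change the contents of the two text files below.
I could not work out XPath expressions for the category links and category names themselves, so I put them into two text files by hand.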
Contents of urls.txt:
http://www.1ppt.com/moban/jianjie/
http://www.1ppt.com/moban/danya/
http://www.1ppt.com/moban/shangwu/
http://www.1ppt.com/moban/yishu/
http://www.1ppt.com/moban/katong/
http://www.1ppt.com/moban/aiqing/
http://www.1ppt.com/moban/ziran/
http://www.1ppt.com/moban/jianzhu/
http://www.1ppt.com/moban/gudian/
http://www.1ppt.com/moban/shishang/
http://www.1ppt.com/moban/renwu/
http://www.1ppt.com/moban/zhiwu/
http://www.1ppt.com/moban/liti/
http://www.1ppt.com/moban/zhongguofeng/
Contents of names.txt:
简洁模板
淡雅模板
商务模板
艺术设计
卡通动漫
浪漫爱情
自然风景
建筑模板
古典模板
时尚模板
人物模板
植物模板
微立体模板
中国风模板
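The script pairs the two files by line position: line n of urls.txt belongs to line n of names.txt. A minimal sketch of reading them as pairs, which also fails early if the two files get out of step, could look like the following. The helper name `load_categories` and the `encoding` parameter are hypothetical additions, not part of the original script, and UTF-8 is only an assumption about how the files were saved.

```python
def load_categories(urls_path="urls.txt", names_path="names.txt", encoding="utf-8"):
    """Return a list of (url, name) pairs, one per non-empty line.

    Hypothetical helper; the encoding default is an assumption.
    """
    with open(urls_path, "r", encoding=encoding) as f:
        urls = [line.strip() for line in f if line.strip()]
    with open(names_path, "r", encoding=encoding) as f:
        names = [line.strip() for line in f if line.strip()]
    # The files are matched line by line, so their lengths must agree.
    if len(urls) != len(names):
        raise ValueError(
            "{} has {} entries but {} has {}".format(
                urls_path, len(urls), names_path, len(names)
            )
        )
    return list(zip(urls, names))
```

Iterating over the returned pairs with `for o_url, o_name in load_categories():` would avoid indexing two separate lists.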
Script:
import requests
from lxml import etree
from fake_useragent import UserAgent
import os
import time
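# Read the category links and names (line n of one file matches line n of the other)
o_names = []
with open("urls.txt","r") as fo:
    oringin_urls = fo.readlines()
# names.txt holds Chinese text; pass encoding=... to open() if the platform default does not match the file
with open("names.txt","r") as fi:
    oringin_names = fi.readlines()
for name in oringin_names:
    o_names.append(name.strip("\n"))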
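# Create the category folder and collect the template titles and IDs
lt = []      # article IDs of the templates on the current listing page
titles = []  # template titles, in the same order as lt
def prepare_get(o_url,o_name):
    pathname = './{}/'.format(o_name)
    if not os.path.exists(pathname):
        os.mkdir(pathname)
    # Request headers
    url = '{}'.format(o_url)
    headers = {
        'User-Agent':UserAgent().random}
    # headers must be passed by keyword: the second positional argument of requests.get is params
    r = requests.get(url,headers=headers)
    r.encoding = 'gb2312'
    # Extract the information with XPath
    etrees = etree.HTML(r.text)
    ppt_titles = etrees.xpath('//ul[@class="tplist"]//h2/a/text()')
    for title in ppt_titles:
        titles.append(title)
    ppt_urls = etrees.xpath('//ul[@class="tplist"]//a/@href')
    #print(len(ppt_titles))
    # Keep every fourth href, i.e. one link per template
    download_urls = []
    for i in range(len(ppt_urls)):
        if i % 4 == 0:
            download_urls.append(ppt_urls[i])
    # Characters 9 to 13 of the href are taken as the article ID
    for i in range(len(download_urls)):
        lt.append(download_urls[i][9:14])
    return o_name
#print(lt)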
def download_ppt(url_number,name,o_name):
    # Open the download page for this article ID
    url = 'http://www.1ppt.com/plus/download.php?open=0&aid={}&cid=3'.format(url_number)
    #print(url)
    r = requests.get(url)
    r.encoding = 'gb2312'
    etrees = etree.HTML(r.text)
    # Take the first download link on that page
    d_url = etrees.xpath("//li[@class='c1']/a/@href")[0]
    #print(d_url)
    r = requests.get(d_url)
    # r.content is raw bytes, so no text encoding needs to be set for the archive
    with open('{}/'.format(o_name) + name + '.zip', 'wb') as f:
        f.write(r.content)
    print('Downloaded ' + name + ' to folder {}'.format(o_name))
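# Process one category at a time: collect its templates, then download each one
for m in range(len(oringin_urls)):
    o_name = o_names[m]
    #print(o_name)
    prepare_get(oringin_urls[m].strip("\n"), oringin_names[m].strip("\n"))
    time.sleep(1)
    for i in range(len(titles)):
        download_ppt(lt[i],titles[i],o_name)
        time.sleep(1)
    # Reset the shared lists before moving on to the next category
    lt = []
    titles = []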
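The slice `[9:14]` in `prepare_get` only works while each template link has the shape `/article/` followed by exactly five digits. That shape is inferred from the slice itself and is not confirmed against the site. Under that assumption, a regular expression picks out the ID whatever its length. The helper name `extract_article_id` is hypothetical.

```python
import re

# Assumed link shape: "/article/<digits>.html" (inferred from the [9:14] slice).
ARTICLE_ID = re.compile(r"/article/(\d+)\.html")

def extract_article_id(href):
    """Return the numeric ID in href as a string, or None if href does not match."""
    match = ARTICLE_ID.search(href)
    return match.group(1) if match else None
```

In `prepare_get` this could replace the slice, for example `lt.append(extract_article_id(download_urls[i]))`, with links that return `None` skipped.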
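The code is my own original work; please do not use it commercially.
The result of a run is as follows:
The folders are created in the directory the script is run from, and each folder ends up with about 20 PPT templates.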
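Each template is saved inside its category folder under its page title, so a title containing a character that is not allowed in file names (`/` everywhere, and on Windows also `\ : * ? " < > |`) would make the `open` call fail. Whether any title on the site actually contains such a character has not been checked. A small sketch of a guard, using the hypothetical helper `safe_filename`:

```python
import re

# Characters that Windows rejects in file names; "/" is also invalid on Linux and macOS.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')

def safe_filename(title, replacement="_"):
    """Replace characters that are invalid in file names and trim surrounding whitespace."""
    cleaned = INVALID_CHARS.sub(replacement, title).strip()
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "untitled"
```

In `download_ppt`, the file name could then be built from `safe_filename(name)` instead of `name`.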
I have also packaged the program as an exe file; send me a private message if you need it.