This is a sinful little crawler.
It scrapes the GIF images from http://www.27gif.net/gifcc and saves each one using its 'mysterious code' as the file name.
------------------------------------------------------------------------------------------------------
import requests
from bs4 import BeautifulSoup

# The listing is walked from page 1 through page 13.
for page in range(1, 14):
    # Request the listing page and collect the container of every GIF post in a list.
    star_url = 'http://www.27gif.net/gifcc/page/%s/' % page
    star_html = requests.get(star_url).text
    star_soup = BeautifulSoup(star_html, 'lxml')
    gif_list = star_soup.find_all('div', class_='wow fadeInUp')
    # Walk through every post in the list.
    for gif_html in gif_list:
        # Skip posts that contain no img tag.
        img = gif_html.find('img')
        if img is None:
            continue
        # The 'alt' attribute carries the file name, normally after a colon;
        # fall back to the whole 'alt' text when there is no colon.
        try:
            gif_name = img['alt'].split(':')[1]
        except IndexError:
            gif_name = img['alt']
        # The GIF's own URL sits in the 'src' attribute, between 'src=' and '&w='.
        # Skip the post when 'src' does not have that shape.
        try:
            gif_url = img['src'].split('src=')[1].split('&w=')[0]
        except IndexError:
            continue
        # Request the GIF's URL and save the image.
        gif_content = requests.get(gif_url).content
        with open(gif_name + '.gif', 'wb') as f:
            f.write(gif_content)
        print(gif_name + ' OK!')
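The two string-splitting steps in the script can be pulled out into small functions so they can be tried without touching the network. The following is only a sketch: the function names `parse_gif_name` and `parse_gif_url` are made up for illustration, and the example URL in the usage note is a hypothetical shape, not one taken from the site.

```python
def parse_gif_name(alt):
    """Return the segment after the first colon, or the whole text if there is no colon."""
    parts = alt.split(':')
    return parts[1] if len(parts) > 1 else alt


def parse_gif_url(src):
    """Return the text between 'src=' and '&w=', or None when 'src=' is absent."""
    if 'src=' not in src:
        return None
    return src.split('src=')[1].split('&w=')[0]
```

For example, `parse_gif_url('http://example.com/thumb.php?src=http://example.com/a.gif&w=300')` gives back `'http://example.com/a.gif'`, while a plain URL with no 'src=' part gives `None`, so the caller can skip that post.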
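When the script finishes, the GIFs will have been saved in the current folder.
Have tissues ready before use, and drink a bottle of Nutri-Express (营养快线) promptly afterwards.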
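Since the file name comes straight from the page's 'alt' text, it may contain characters that a file system refuses. Below is a minimal sketch of a safer save step. The helpers `safe_filename` and `save_gif` and the `folder` parameter are assumptions added for illustration and are not part of the original script.

```python
import os
import re


def safe_filename(name, default='unnamed'):
    """Replace characters that are commonly not allowed in file names with '_'."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', '_', name).strip('_')
    return cleaned or default


def save_gif(content, name, folder='.'):
    """Write the bytes to <folder>/<safe name>.gif and return the resulting path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, safe_filename(name) + '.gif')
    with open(path, 'wb') as f:
        f.write(content)
    return path
```

In the script, `save_gif(gif_content, gif_name)` could stand in for the `with open(...)` block. With the default `folder='.'` the files still land in the current folder.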