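Yesterday I suddenly remembered the Wu Qing style of poetry (乌青体) and found his website: http://wuqing.org
I clicked into "Handwritten Poems" (手写诗): http://wuqing.org/sxs/sxs032
There I found the path where the image file is stored on the server: http://wuqing.org/wp-content/uploads/2013/02/sxs032.jpg
Then I noticed that http://wuqing.org/wp-content/uploads/ has no access restrictions at all...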
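The step from the image URL to the open uploads directory can be sketched as walking up the URL path one level at a time. This is only an illustration in Python 3 (the script in this post is Python 2); `listing_urls` is a made-up helper name, and whether any of these directories actually returns a listing depends on the server.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def listing_urls(image_url):
    """Return the parent directory URLs of image_url, nearest first.

    The site root itself is not included.
    """
    parts = urlsplit(image_url)
    directory = posixpath.dirname(parts.path)
    urls = []
    while directory not in ('', '/'):
        # Rebuild a URL that points at the directory, with a trailing slash.
        urls.append(urlunsplit((parts.scheme, parts.netloc, directory + '/', '', '')))
        directory = posixpath.dirname(directory)
    return urls


for u in listing_urls('http://wuqing.org/wp-content/uploads/2013/02/sxs032.jpg'):
    print(u)
```

Opening each printed URL in a browser shows which level, if any, is browsable.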
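Since I had learned a bit of Python and read a little about front-end development, I got bored and decided to crawl the images, so I wrote the rough little script below.
I first wrote a synchronous, single-threaded version and later changed it to a multithreaded one, but I ran into network connection problems, so I commented that part out again. I'll take another look when I have time _(:з」∠)_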
import os, shutil
import re
import sys
import threading
import urllib, urllib2

import BeautifulSoup
#import bs4

# Python 2 only: make utf8 the default encoding so str() on non-ASCII text does not fail
reload(sys)
sys.setdefaultencoding('utf8')

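class getImgThread(threading.Thread):
    """Downloads a single image in its own thread (used by the commented-out threaded version)."""

    def __init__(self, imgUrl, fileName):
        threading.Thread.__init__(self)
        self.url = imgUrl
        self.fileName = fileName

    def run(self):
        # 'mutex' is the module-level lock created in the main block below;
        # it only guards this print statement
        mutex.acquire()
        print 'getting...', self.url
        mutex.release()
        urllib.urlretrieve(self.url, self.fileName)
        print 'saving...', self.fileName
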
if __name__ == '__main__':
    purl = 'http://wuqing.org/wp-content/uploads/2013/'
    psavepath = r'D:/mycode/Python/MyWorks/wuqingshi'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
    # if os.path.isdir(psavepath):
    #     shutil.rmtree(psavepath)  # delete dir
    # create the directory the images are actually saved to
    if not os.path.isdir(psavepath):
        os.makedirs(psavepath)  # make dir
    mutex = threading.Lock()
    threads = []
    for i in range(1, 13):
        # one sub-directory per month, zero-padded: 01 .. 12
        url = purl + '%02d' % i
        try:
            req = urllib2.Request(url, headers=headers)
            content = urllib2.urlopen(req).read()
            #content = BeautifulSoup.BeautifulSoup(content, from_encoding='GB18030')  # BeautifulSoup
            content = BeautifulSoup.BeautifulSoup(content)
        except Exception, e:
            # skip this month instead of reusing the previous page's content
            print 'failed to open', url, e
            continue
        # every link in the directory listing whose href ends in .jpg
        links = content.findAll(href=re.compile(r'\.jpg$'))
        for link in links:
            picname = str(link.text)
            picurl = url + '/' + picname
            filename = psavepath + r'/' + picname
            print 'getting...', picurl
            try:
                urllib.urlretrieve(picurl, filename)
                print 'saving...', filename
            except Exception, e:
                print 'failed to download', picurl, e
            # try:
            #     threads.append(getImgThread(picurl, filename))
            # except Exception, e:
            #     pass
    # for t in threads:
    #     t.start()
    # for t in threads:
    #     t.join()
    # print 'End'
    print 'all downloading is done!'
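
The commented-out threaded version starts one thread per image all at once. I never tracked down the cause of the connection problems, so the following is only a guess at a gentler approach: cap the number of simultaneous downloads and retry a few times. It is a Python 3 sketch, not the script above; `fetch_with_retry`, `download_all` and their parameters are names made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlretrieve


def fetch_with_retry(fetch, url, filename, retries=3, delay=1.0):
    """Call fetch(url, filename); retry on any exception. Return True on success."""
    for attempt in range(1, retries + 1):
        try:
            fetch(url, filename)
            return True
        except Exception as exc:
            print('attempt %d failed for %s: %s' % (attempt, url, exc))
            if attempt < retries:
                time.sleep(delay)  # wait a little before trying again
    return False


def download_all(jobs, fetch=urlretrieve, max_workers=4, retries=3, delay=1.0):
    """Download (url, filename) pairs with at most max_workers running at a time.

    Returns the list of URLs that still failed after all retries.
    """
    jobs = list(jobs)

    def run(job):
        return fetch_with_retry(fetch, job[0], job[1], retries, delay)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run, jobs))  # results come back in job order
    return [url for (url, _), ok in zip(jobs, results) if not ok]


# Hypothetical usage: collect (picurl, filename) pairs in the loop, then
# failed = download_all(pairs, max_workers=4)
```

Passing `fetch` as a parameter makes it possible to try the retry logic with a fake function before touching the network.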
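P.S. I crawled these just for fun and not for any commercial purpose, and the program is for learning only. Wu Qing's poems are quite fun. Please respect intellectual property: if you want to buy them, follow the Taobao link on his site. I will take this post down if it infringes.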