A Python image-scraping bot for Moko (美空)

pushme_pli

Article link: https://blog.csdn.net/pushme_pli/article/details/7335142

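I've had a lot of time on my hands lately and have been browsing Moko to look at beautiful women. It suddenly struck me: why not download all the photos locally so I can look through them at my leisure? That's how the following code came about: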

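# -*- coding: utf-8 -*-
# Python 2 script (print statements, urllib.urlopen / urllib.urlretrieve)

import urllib
import re
import os

#IMG_REG = re.compile('<img[^>]*?src[^>]*?=[\"\'][^"]*?[\'\"]')
# Photo tags on an album page carry the image URL in the src2 attribute
IMG_REG = re.compile('<img[^>]*?src2=[\"\'][^"]*?[\'\"]')
# Album links on the channel page: group 1 = relative URL, group 2 = title (model name)
URL_REG = re.compile('<a href="(.*?)" title="(.*?)" hidefocus="true" target="_blank">')
LOCAL_DIR = 'c://tmp/pictrue/'

def cbk(a, b, c):
    # urlretrieve report hook: a = blocks transferred, b = block size, c = total size
    if c <= 0:
        return  # size unknown (c == -1): avoid dividing by zero
    per = 100.0 * a * b / c  # float division so '%.2f' shows real decimals
    if per > 100:
        per = 100
    print '%.2f%%' % per

def getPictrueFromOnePage(url, dirPath):
    # Download every photo on one album page into dirPath
    content = urllib.urlopen(url).read()
    for match in IMG_REG.findall(content):
        print match
        if 'http' not in match:
            continue  # src2 is not an absolute URL; index() would raise ValueError
        # Keep everything from "http" on and drop the closing quote
        imgurl = match[match.index("http"):][:-1]
        filename = imgurl[imgurl.rindex("/") + 1:]
        print imgurl
        print filename
        local = dirPath + filename
        urllib.urlretrieve(imgurl, local, cbk)

def mainPorcess(url):
    # Find each model's album on the channel page and download it
    content = urllib.urlopen(url).read()
    i = 0
    for matched in URL_REG.findall(content):
        i = i + 1
        subUrl = 'http://www.moko.cc' + matched[0]
        print subUrl
        try:
            # Folder named after the model: the page is UTF-8, Windows paths expect GBK
            path = LOCAL_DIR + matched[1].decode('utf-8').encode('gbk') + '\\'
            if not os.path.isdir(path):
                os.makedirs(path)  # makedirs also creates LOCAL_DIR if it is missing
        except Exception:
            # Name unusable as a folder (encoding error or illegal characters): use the index
            path = LOCAL_DIR + str(i) + '\\'
            if not os.path.isdir(path):
                os.makedirs(path)
        print path
        getPictrueFromOnePage(subUrl, path)


if __name__ == '__main__':
    mainPorcess('http://www.moko.cc/channels/post/23/1.html')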
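
The script targets Python 2. For Python 3 readers, here is a minimal sketch of the two pieces of logic that are easiest to get wrong, written so they can be checked without touching the network: pulling image URLs out of an album page, and computing the download percentage. It assumes, as the script does, that the real photo URL sits in a src2 attribute; the names extract_images and progress_percent and the sample HTML are hypothetical. In Python 3, urllib.request.urlretrieve calls its report hook with the block count, the block size and the total size, and the total size can be -1 when the server does not report one, which is why the percentage helper returns None in that case.

```python
import re

# Same pattern as IMG_REG in the script (assumes the photo URL is in src2)
IMG_REG = re.compile('<img[^>]*?src2=["\'][^"]*?["\']')


def extract_images(html):
    """Return (image_url, filename) pairs found in an album page's HTML."""
    results = []
    for match in IMG_REG.findall(html):
        start = match.find("http")
        if start == -1:
            continue  # relative or empty src2: skip it instead of raising
        img_url = match[start:-1]  # drop the closing quote
        filename = img_url[img_url.rindex("/") + 1:]
        results.append((img_url, filename))
    return results


def progress_percent(block_num, block_size, total_size):
    """Percentage for a urlretrieve report hook, or None if the size is unknown."""
    if total_size <= 0:
        return None
    return min(100.0, 100.0 * block_num * block_size / total_size)


sample = '<img class="p" src2="http://img.example.com/a/b/photo1.jpg" alt="">'
print(extract_images(sample))
```

A real hook would simply print progress_percent(...) whenever it is not None.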

It downloads the photos automatically and stores them in folders named after each model.
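
In Python 3, str paths are Unicode, so the decode('utf-8').encode('gbk') round trip that the script needs on Windows should no longer be necessary. A minimal sketch of the per-model folder step under that assumption: the helper name model_dir and its character filter are hypothetical. Rather than waiting for mkdir to fail, it replaces the characters Windows rejects in folder names and keeps the numeric fallback only for names that end up empty.

```python
import os
import re
import tempfile

# Characters that Windows does not allow in file or folder names
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def model_dir(base_dir, model_name, index):
    """Create (if needed) and return the folder for one model's photos.

    Illegal characters become "_"; surrounding whitespace and trailing dots
    are removed because Windows does not keep them. If nothing usable is
    left, the album's index is used, like the script's str(i) fallback.
    """
    name = INVALID_CHARS.sub("_", model_name).strip().rstrip(" .")
    if not name:
        name = str(index)
    path = os.path.join(base_dir, name)
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return path


if __name__ == "__main__":
    demo_root = tempfile.mkdtemp()  # throwaway base folder for the demo
    print(model_dir(demo_root, "Alice", 1))
    print(model_dir(demo_root, "a/b:c", 2))
```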

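The program has a couple of limitations:

1. It can only scrape photos from Moko's second-level pages, i.e. the albums grouped by model name.

2. It can only scrape the current page; it cannot move on to the next page automatically.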
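
The second limitation could be addressed by walking the listing pages in order. A minimal sketch, assuming (not checked against the site) that later pages follow the same numbering as the .../channels/post/23/1.html URL in the script, i.e. 2.html, 3.html and so on; page_url, crawl_all_pages, the fetch parameter and max_pages are hypothetical names. Passing fetch in as a parameter keeps the paging logic testable without network access.

```python
import re

# Same link pattern as URL_REG in the script
URL_REG = re.compile('<a href="(.*?)" title="(.*?)" hidefocus="true" target="_blank">')


def page_url(channel_base, page):
    """Build the URL of one listing page (assumes the "<page>.html" scheme)."""
    return "%s/%d.html" % (channel_base.rstrip("/"), page)


def crawl_all_pages(channel_base, fetch, max_pages=50):
    """Collect (album_url, model_name) pairs page by page.

    fetch is any callable that returns a page's HTML as a str. Stops at the
    first page with no album links, or after max_pages pages.
    """
    albums = []
    for page in range(1, max_pages + 1):
        html = fetch(page_url(channel_base, page))
        found = URL_REG.findall(html)
        if not found:
            break  # no album links: assume we ran past the last page
        for href, title in found:
            albums.append(("http://www.moko.cc" + href, title))
    return albums
```

Against the live site, fetch could be something like lambda u: urllib.request.urlopen(u).read().decode("utf-8"), assuming the listing pages are UTF-8 as the script's decode('utf-8') suggests; each returned album URL would then go through the per-album download step.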

One last thing: Python really is incredibly convenient!!!
