A Python script for scraping images from Huaban (花瓣网)

weixin_39810441


#!/usr/bin/env python
# -*- encoding:utf-8 -*-
# author: insun
# http://yxmhero1989.blog.163.com/blog/static/112157956201311994027168/
# Note: this is a Python 2 script (urllib2, print statement).
import urllib, urllib2, re, os

# url = 'http://huaban.com/favorite/'

# Create the output directory on first run.
if not os.path.exists('beauty'):
    os.mkdir('beauty')

def get_huaban_beauty():
    pin_id = 48145457
    limit = 20  # the site allows a limit of up to 100 by default
    # Each match yields (pin_id, file key, image subtype).
    reg = re.compile('"pin_id":(.*?),.+?"file":{"farm":"farm1", "bucket":"hbimg",.+?"key":"(.*?)",.+?"type":"image/(.*?)"', re.S)
    i_headers = {"User-Agent": "Mozilla/5.0(Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1) Gecko/20090624 Firefox/3.5",
                 "Referer": 'http://baidu.com/'}
    while pin_id is not None:
        url = 'http://huaban.com/favorite/beauty/?max=' + str(pin_id) + '&limit=' + str(limit) + '&wfl=1'
        try:
            req = urllib2.Request(url, headers=i_headers)
            html = urllib2.urlopen(req).read()
            groups = re.findall(reg, html)
            print str(pin_id) + " Start to catch " + str(len(groups)) + " photos"
            if not groups:
                # No pins on this page: stop, otherwise the same URL is requested forever.
                break
            for att in groups:
                pin_id = att[0]  # the last pin_id seen becomes the next page's max
                att_url = att[1] + '_fw554'
                img_type = att[2]
                img_url = 'http://img.hb.aicdn.com/' + att_url
                try:
                    # urlretrieve raises IOError on failure; its return value is always truthy.
                    urllib.urlretrieve(img_url, 'beauty/' + att_url + '.' + img_type)
                    print img_url + '.' + img_type + ' download success!'
                except IOError:
                    print img_url + '.' + img_type + ' save failed'
            # print pin_id
        except Exception as e:
            # Stop on a request error instead of retrying the same page endlessly.
            print 'error occurs:', e
            break

get_huaban_beauty()
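The parsing and paging logic of the script above can be tried without touching the network. The following is a minimal Python 3 sketch, not the author's code: the helper names `page_url` and `parse_pins` and the `SAMPLE` string are hypothetical, and the sample only imitates the response shape that the script's regular expression expects. The live site may return something different today.

```python
import re

# Same pattern as in the script above (the brace is escaped for safety):
# captures pin_id, file key and image subtype.
PIN_RE = re.compile(
    r'"pin_id":(.*?),.+?"file":\{"farm":"farm1", "bucket":"hbimg",'
    r'.+?"key":"(.*?)",.+?"type":"image/(.*?)"',
    re.S,
)


def page_url(max_pin_id, limit=20):
    """Build the listing URL for pins older than max_pin_id (hypothetical helper)."""
    return ('http://huaban.com/favorite/beauty/?max=' + str(max_pin_id)
            + '&limit=' + str(limit) + '&wfl=1')


def parse_pins(html):
    """Return a list of (pin_id, image_url, local_filename) tuples (hypothetical helper)."""
    pins = []
    for pin_id, key, img_type in PIN_RE.findall(html):
        name = key + '_fw554'  # same size suffix the script appends
        pins.append((pin_id, 'http://img.hb.aicdn.com/' + name, name + '.' + img_type))
    return pins


# Made-up sample data, shaped to satisfy the pattern; not a real response.
SAMPLE = (
    '{"pin_id":123, "user_id":1, "file":{"farm":"farm1", "bucket":"hbimg", '
    '"key":"abc", "type":"image/jpeg"}}'
    '{"pin_id":122, "user_id":2, "file":{"farm":"farm1", "bucket":"hbimg", '
    '"key":"def", "type":"image/png"}}'
)

pins = parse_pins(SAMPLE)
for pin_id, img_url, filename in pins:
    print(pin_id, img_url, filename)

# The last pin_id on a page becomes the max parameter of the next request.
if pins:
    print(page_url(pins[-1][0]))
```

An empty result from `parse_pins` is the natural stop condition for the paging loop, since there is then no new `pin_id` to request.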
