Python crawler image grabbing: scraping and saving images from web pages

Latest recommended article published on 2022-11-04 10:45:00

胡格


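Article tags: Python crawler image scraping

Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_32364911/article/details/113672673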

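This article presents a Python crawler that downloads images from the car-themed section of a desktop wallpaper site. The program parses HTML tags to find links and image URLs, then downloads each image and saves it to a specified path. It uses a regular expression to match the image filename and includes print output for debugging.

Summary generated by CSDN using AI technology.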

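The target is the car-themed section of a desktop wallpaper site.

The two print statements below are commented out in the script; uncomment them when debugging:

#print tag

#print attrs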
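To see what those two prints would show, here is a minimal sketch in Python 3, where the `HTMLParser` module used by the script is available as `html.parser`. The `AttrRecorder` class and the `sample` markup are made up for illustration and are not taken from the real site.

```python
from html.parser import HTMLParser


class AttrRecorder(HTMLParser):
    """Hypothetical helper: records (tag, attrs) for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples in source order.
        self.seen.append((tag, attrs))


# Made-up markup for illustration only.
sample = (
    '<a class="pic" href="/bizhi/1.html">'
    '<img id="bigImg" src="http://example.com/123.jpg">'
    '</a>'
)

recorder = AttrRecorder()
recorder.feed(sample)
for tag, attrs in recorder.seen:
    print(tag, attrs)

# Looking attributes up by name avoids depending on their position.
img_attrs = dict(recorder.seen[1][1])
print(img_attrs["src"])
```

Because `attrs` is just a list of tuples, a check such as `attrs[0] == ('class', 'pic')` only works when the page emits its attributes in exactly that order. Converting to a dict, as in the last lines, is one way to make the lookup less fragile.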

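#!/usr/bin/env python
# Python 2 script: it relies on urllib2, the HTMLParser module and print statements.

import re
import urllib2
import HTMLParser

base = "http://desk.zol.com.cn"  # site root, prepended to the links found in pages
path = '/home/mk/cars/'          # directory the images are saved to (must already exist)
star = ''                        # link of the album currently being crawled

def get_url(html):
    # Fetch a detail page and feed it to a detail-page parser.
    parser = parse(False)
    request = urllib2.Request(html)
    response = urllib2.urlopen(request)
    resp = response.read()
    parser.feed(resp)

def download(url):
    # Name the file after the "<digits>.jpg" part of the URL.
    pattern = r'[0-9]*\.jpg'
    res = re.search(pattern, url)
    if res is None:
        return  # no .jpg name in the URL, nothing to save
    content = urllib2.urlopen(url).read()
    print 'downloading:', res.group()
    filename = path + res.group()
    f = open(filename, 'wb')  # binary mode, because the content is image data
    f.write(content)
    f.close()

class parse(HTMLParser.HTMLParser):
    # Index=True parses the index page; Index=False parses a detail page.
    def __init__(self, Index):
        self.Index = Index
        HTMLParser.HTMLParser.__init__(self)

    def handle_starttag(self, tag, attrs):
        global star
        #print tag
        #print attrs
        if self.Index:
            # Index page: follow every <a> whose first attribute is class="pic".
            # The second attribute is assumed to hold the link.
            if tag == 'a' and len(attrs) == 4 and attrs[0] == ('class', 'pic'):
                new = base + attrs[1][1]
                print 'found a link:', new
                star = new
                get_url(new)
        else:
            # Detail page: download the <img> whose first attribute is id="bigImg".
            # The second attribute is assumed to hold the image URL.
            if tag == 'img' and len(attrs) > 1 and attrs[0] == ('id', 'bigImg'):
                Image_url = attrs[1][1]
                print 'found a picture:', Image_url
                download(Image_url)
            # Then follow the <a> whose second attribute is class="next".
            if tag == 'a' and len(attrs) == 4 and attrs[1] == ('class', 'next'):
                next_url = base + attrs[2][1]
                print 'found a link:', next_url
                # Stop when the "next" link points back to the album's first page.
                if star != next_url:
                    get_url(next_url)

Index_url = 'http://desk.zol.com.cn/qiche/'
con = urllib2.urlopen(Index_url).read()
Parser_index = parse(True)
Parser_index.feed(con)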
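The saving step of `download` can be tried without touching the network. Below is a rough sketch in Python 3; `filename_from_url` and `save_image` are hypothetical helper names, the URL and the bytes are invented, and the pattern uses `+` instead of the script's `*` so that a URL without digits before `.jpg` is rejected rather than saved as `.jpg`.

```python
import os
import re
import tempfile


def filename_from_url(url):
    """Return the '<digits>.jpg' part of url, or None if there is none."""
    match = re.search(r'[0-9]+\.jpg', url)
    return match.group() if match else None


def save_image(content, url, directory):
    """Write the image bytes into directory, named after the URL."""
    name = filename_from_url(url)
    if name is None:
        return None
    target = os.path.join(directory, name)
    # Binary mode, because image data is bytes rather than text.
    with open(target, 'wb') as f:
        f.write(content)
    return target


# Invented bytes and URL; a temporary directory stands in for the save path.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(b'\xff\xd8fake-image-bytes',
                       'http://example.com/t_s960x600c5/1234.jpg', tmp)
    print(os.path.basename(saved))
```

In a real run the bytes would come from `urllib.request.urlopen(url).read()`, the Python 3 counterpart of the `urllib2.urlopen` call in the script.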

All it does is grab the beautiful wallpapers from the desktop wallpaper site...
