An Example of Scraping Images with Python

Latest recommended article published on 2024-07-04 23:15:41

wjfzzhxy

Category: Crawlers. Tags: python, crawler, images, web pages

Article link: https://blog.csdn.net/wjfzzhxy/article/details/45672333

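After working through the exercise in 《python练习手册》 that scrapes pictures of Japanese models from a Tieba thread, I have a few takeaways.
There are three ways to open (fetch) the web page: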

1. Open the page directly with urllib's urlopen(url)

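import re
import urllib.request  # Python 3; in Python 2 the call was urllib.urlopen

url = "http://tieba.baidu.com/p/2166231880"
# Capture each image URL without its ".jpg" suffix; the dot is escaped so it matches literally
find = re.compile(r'<img pic_type="0" class="BDE_Image" src="(.*?)\.jpg"')

# Read the raw bytes of the page, then decode them before matching
data = urllib.request.urlopen(url).read()
picture_url_list = find.findall(data.decode('utf-8'))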

2. Import requests and fetch the page with get(url)

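import re  # needed for re.compile below
import requests

url = "http://tieba.baidu.com/p/2166231880"

# Capture each image URL without its ".jpg" suffix; the dot is escaped so it matches literally
find = re.compile(r'<img pic_type="0" class="BDE_Image" src="(.*?)\.jpg"')

html = requests.get(url)
data = html.content.decode('utf-8')
# The captured URLs lack the image suffix, so ".jpg" must be appended before downloading
picture_url_list = find.findall(data)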
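
The comment about re-attaching the ".jpg" suffix can be illustrated offline. The following is only a minimal sketch: `sample_html` and the helper `extract_image_urls` are made up for illustration and are not part of the original exercise, and the pattern is the article's regex with the dot before `jpg` escaped.

```python
import re

# The article's pattern, with the dot escaped so it only matches a literal "."
find = re.compile(r'<img pic_type="0" class="BDE_Image" src="(.*?)\.jpg"')

# Hypothetical page fragment, invented purely for this example
sample_html = (
    '<img pic_type="0" class="BDE_Image" src="http://example.com/a/one.jpg" width="560">'
    '<img pic_type="0" class="BDE_Image" src="http://example.com/a/two.jpg" width="560">'
)


def extract_image_urls(html):
    """Return full image URLs by re-attaching the ".jpg" suffix the regex strips."""
    return [stem + ".jpg" for stem in find.findall(html)]


image_urls = extract_image_urls(sample_html)
```

Because the capture group stops just before `.jpg`, each match is only the stem of the URL; the list comprehension puts the suffix back so the result can be handed to a downloader.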

3. Create a Request object with urllib's Request and open that

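import re
import urllib.request  # Python 3; in Python 2 both Request and urlopen lived in urllib2

url = "http://tieba.baidu.com/p/2166231880"

find = re.compile(r'<img pic_type="0" class="BDE_Image" src="(.*?)\.jpg"')

# Build a Request object first, then pass it to urlopen
req = urllib.request.Request(url)
response = urllib.request.urlopen(req)
the_page = response.read()  # raw bytes; decode before applying the regex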
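
One practical reason to build a Request object first is that it can carry extra headers. Below is a rough sketch in Python 3, assuming the site prefers a browser-like User-Agent; the header value and the helper name `fetch` are illustrative assumptions, not something the original exercise specifies. Nothing is sent over the network until `fetch` is called.

```python
import urllib.request

url = "http://tieba.baidu.com/p/2166231880"

# Illustrative header value; an assumption made for this sketch
headers = {"User-Agent": "Mozilla/5.0 (compatible; example-crawler)"}

# The Request object bundles the URL and the headers together
req = urllib.request.Request(url, headers=headers)


def fetch(request, timeout=10):
    """Open the prepared request and return the raw response bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `fetch(req)` would then return the page bytes, which still need decoding as in the other two approaches.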

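Note
When scraping a site, pay attention to which encoding the page uses and decode accordingly; "utf-8" and "GBK" are the two most common choices.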
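
One way to handle both encodings is to try them in order. This is only a heuristic sketch, and `decode_page` is a hypothetical helper name: some GBK byte sequences also happen to be valid UTF-8, so the charset declared in the HTTP headers or the page's meta tag is a more reliable guide when it is available.

```python
def decode_page(raw, encodings=("utf-8", "gbk")):
    """Try each encoding in order and return the first successful decode."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the first encoding, substituting undecodable bytes
    return raw.decode(encodings[0], errors="replace")


# The same text encoded two ways, standing in for pages served with different charsets
utf8_bytes = "中文".encode("utf-8")
gbk_bytes = "中文".encode("gbk")

utf8_text = decode_page(utf8_bytes)
gbk_text = decode_page(gbk_bytes)
```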