Using requests: step 1

What is scraped

The script scrapes Minions (xiaohuangren) images from ivsky.com and saves them to the local disk.

Modules used and what each one does

import requests
from urllib.request import urlretrieve
import re
import os

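requests: sends the HTTP requests
urlretrieve: saves a downloaded image to a file
os: checks whether the target directory exists and creates it
re: regular expression module, used to find the content we need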

Code walkthrough

Set the request headers:

url = "http://www.ivsky.com/tupian/xiaohuangren_t21343/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
    'Referer': url,
    'Connection': 'Keep-alive'
}

Send the request:

s=requests.get(url,headers=headers)
# print(s.url)
s=s.text
# print(s)

Match the data we need with re:

# Capture the src of each thumbnail; the dot before "jpg" is escaped so it matches a literal "."
pattern = r'<div class="il_img".*?<img src="(.*?\.jpg)" width'
pa = re.compile(pattern)
uls = pa.findall(s)
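To see what the pattern captures, it can be run against a small HTML fragment. The fragment below is made up for illustration: it only assumes that each thumbnail sits inside a div with class il_img, as the pattern implies, and the second image URL is invented. The real markup on ivsky.com may differ.

```python
import re

# Hypothetical HTML fragment shaped the way the pattern expects;
# it is not copied from the real page.
sample_html = (
    '<div class="il_img"><a href="/tupian/a.html">'
    '<img src="http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-004.jpg" width="180"></a></div>'
    '<div class="il_img"><a href="/tupian/b.html">'
    '<img src="http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-005.jpg" width="180"></a></div>'
)

# Same pattern as in the walkthrough, with the dot before "jpg" escaped.
pattern = re.compile(r'<div class="il_img".*?<img src="(.*?\.jpg)" width')

# findall returns only the captured group, i.e. the image URLs.
image_urls = pattern.findall(sample_html)
for image_url in image_urls:
    print(image_url)
```

If the thumbnail markup ever spans several lines, the pattern would need the re.S flag so that `.` also matches newlines.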

Save the images with urlretrieve:

for item in uls:
    # print(item)
    # e.g. http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-004.jpg
    # Keep only the "/xiaohuangren-004.jpg" part that follows the two-digit directory
    path = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]
    path = '/root/python/python/taobao%s' % path
    # Create the target directory (and any missing parents) if it does not exist
    os.makedirs(os.path.split(path)[0], exist_ok=True)

    print(path)
    urlretrieve(item, path)
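The re.split call above keeps only the file name at the end of each image URL. As an alternative sketch, the same local path can be derived with urllib.parse and os.path, which does not depend on the URL containing a two-digit directory. local_path_for and its save_dir parameter are names introduced here for illustration; the default directory is the one used in the loop above.

```python
import os
import re
from urllib.parse import urlsplit

def local_path_for(image_url, save_dir="/root/python/python/taobao"):
    """Return save_dir joined with the file name at the end of image_url."""
    filename = os.path.basename(urlsplit(image_url).path)
    return os.path.join(save_dir, filename)

url = "http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-004.jpg"

# Approach from the walkthrough: split on "/NN" and keep the captured "/name.jpg".
via_regex = "/root/python/python/taobao%s" % re.split(
    r"/[0-9]{2}(/.*?\.jpg)", url, maxsplit=2
)[1]

# Alternative: take the last component of the URL path.
via_urlsplit = local_path_for(url)

print(via_regex)
print(via_urlsplit)
```

The regex version raises IndexError when a URL has no two-digit directory before the file name, while the urlsplit version only needs a path that ends in a file name.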

Save the images by writing the bytes to a file:

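for item in uls:
    path = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]
    path = '/root/python/python/taobao%s' % path
    # Make sure the target directory exists before opening the file
    os.makedirs(os.path.split(path)[0], exist_ok=True)
    # .content is the raw response body as bytes
    img_data = requests.get(item).content
    print(path)
    with open(path, "wb") as f:
        f.write(img_data)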
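For larger files, the response body can also be written in chunks instead of being read into memory in one piece. This is a minimal sketch, assuming requests is installed. write_chunks and download are helper names introduced here, and the 8192-byte chunk size and 10-second timeout are arbitrary choices rather than values from the original script.

```python
import os

def write_chunks(chunks, path):
    """Write an iterable of bytes chunks to path and return the byte count."""
    parent = os.path.dirname(path)
    if parent:
        # Create the target directory (and missing parents) if needed.
        os.makedirs(parent, exist_ok=True)
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                total += f.write(chunk)
    return total

def download(url, path, timeout=10):
    """Stream url to path with requests and return the number of bytes written."""
    import requests  # imported here so write_chunks works without requests

    # stream=True defers reading the body until iter_content is consumed.
    with requests.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        return write_chunks(resp.iter_content(chunk_size=8192), path)
```

In the loop above, `download(item, path)` could stand in for the `requests.get(item).content` call and the `with open(...)` block.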

Summary: of the two saving methods, writing the bytes through a file stream was faster than urlretrieve.
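One way to check the speed difference is to time each method on the same list of URLs. The sketch below uses time.perf_counter. time_saver is a hypothetical helper, and save_fn stands for either saving method wrapped as a function that takes (url, path). Network variance and caching can easily dominate a single run, so several runs would be needed before drawing conclusions.

```python
import os
import time

def time_saver(save_fn, urls, dest_dir):
    """Call save_fn(url, path) for every URL and return the elapsed seconds."""
    os.makedirs(dest_dir, exist_ok=True)
    start = time.perf_counter()
    for url in urls:
        # Use the last URL component as the local file name.
        path = os.path.join(dest_dir, os.path.basename(url))
        save_fn(url, path)
    return time.perf_counter() - start

# Example wiring (not run here, it needs network access; the directory is a placeholder):
# from urllib.request import urlretrieve
# elapsed = time_saver(urlretrieve, uls, "/tmp/urlretrieve_test")
# print("urlretrieve: %.2f s" % elapsed)
```

urlretrieve already takes (url, filename), so it can be passed directly; the file-stream variant would need a small wrapper function with the same two parameters.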

Full code

#coding:utf-8
import requests
from urllib.request import urlretrieve
import re
import os


url = "http://www.ivsky.com/tupian/xiaohuangren_t21343/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
    'Referer': url,
    'Connection': 'Keep-alive'
}

s=requests.get(url,headers=headers)
# print(s.url)
s=s.text
# print(s)
# Capture the src of each thumbnail; the dot before "jpg" is escaped so it matches a literal "."
pattern = r'<div class="il_img".*?<img src="(.*?\.jpg)" width'
pa = re.compile(pattern)
uls = pa.findall(s)

# Alternative: save with urlretrieve
# for item in uls:
#     # e.g. http://img.ivsky.com/img/tupian/t/201411/01/xiaohuangren-004.jpg
#     path = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]
#     path = '/root/python/python/taobao%s' % path
#     os.makedirs(os.path.split(path)[0], exist_ok=True)
#     print(path)
#     urlretrieve(item, path)

# print(len(uls))

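# Save each image by writing the response bytes to a file
for item in uls:
    path = re.split(r"/[0-9]{2}(/.*?\.jpg)", item, maxsplit=2)[1]
    path = '/root/python/python/taobao%s' % path
    # Make sure the target directory exists before opening the file
    os.makedirs(os.path.split(path)[0], exist_ok=True)
    img_data = requests.get(item).content
    print(path)
    with open(path, "wb") as f:
        f.write(img_data)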