Scraping Bing's background image with Python and Scrapy

Latest recommended article published 2021-06-13 14:31:40

weixin_34383618


Tags: python

Original link: https://my.oschina.net/zooy/blog/783869


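I have been looking at Python recently and wanted to fetch Bing's background image: crawl it on a daily schedule and save it locally for use as a wallpaper. After reading a few examples online, I wrote a small Python script to grab the image myself.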

Installing Scrapy

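I installed Scrapy by following the official tutorial at http://scrapy-chs.readthedocs.io/zh_CN/0.24/intro/tutorial.html. My system is Ubuntu, so I followed the Ubuntu instructions and then created a Scrapy project. Implementation steps:

1. Request the URL and parse the image address out of the response body. The regular expression is not great (in my opinion).
2. Fetch the image as a byte stream and save it to a local directory.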
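Step 1 can be tried on its own, without Scrapy, against a fragment of page source. The following is a minimal sketch, assuming Python 3; the `SAMPLE` fragment, the `example.com` address, and the `extract_image_url` helper are hypothetical and only illustrate the idea. It uses a slightly different pattern from the spider's: a negated character class stops the capture at the first closing quote.

```python
import re

# Hypothetical pattern: capture the quoted URL that follows "g_img={url:".
# [^"]* stops at the first closing quote, so the match cannot run past the URL.
G_IMG_PATTERN = re.compile(r'g_img=\{url:\s*"(http[^"]*)"')


def extract_image_url(html):
    """Return the background image URL found in html, or None if absent."""
    match = G_IMG_PATTERN.search(html)
    return match.group(1) if match else None


# Made-up fragment for illustration only; the real page source may differ.
SAMPLE = (
    'var g_img={url: "http://example.com/az/hprichbg/rb/Sample_1920x1080.jpg",'
    "id:'bgDiv'};"
)

url = extract_image_url(SAMPLE)
```

Returning `None` when nothing matches lets the caller skip the download cleanly if the page layout changes.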

# -*- coding: utf-8 -*-
import scrapy as sc
import os
import re
import cookielib
import urllib2


def get_file(url):
    """Download url and return the raw bytes, or None on failure."""
    try:
        cj = cookielib.LWPCookieJar()
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        req = urllib2.Request(url)
        operate = opener.open(req)
        data = operate.read()
        return data
    except Exception as e:
        # Report the error and let the caller decide what to do
        print(e)
        return None


class ExampleSpider(sc.Spider):

    name = "bing"
    allowed_domains = ["cn.bing.com"]
    start_urls = (
        "http://cn.bing.com/",
    )

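    def parse(self, response):
        # Capture the image URL that follows g_img={url: in the page source
        bg = re.compile(r'g_img=\{url:..(http:.*?)",id:')
        path = "/home/zooy/Pictures/bing"
        if not os.path.exists(path):
            os.makedirs(path)

        match = bg.search(response.body)
        if match is None:
            self.log("background image URL not found")
            return
        url = match.group(1)
        file_path = os.path.join(path, url.split('/')[-1])
        # Skip the download if the file already exists
        if not os.path.isfile(file_path):
            data = get_file(url)
            if data is not None:
                with open(file_path, "wb") as f:
                    f.write(data)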
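The script above targets Python 2 (`cookielib`, `urllib2`). For readers on Python 3, step 2 could be sketched roughly as follows, using only the standard library. This is an illustration under stated assumptions rather than the original implementation: the `save_if_missing` helper, its `fetch` parameter, and the `timeout` value are hypothetical names and choices introduced here.

```python
import os
import urllib.request
from http.cookiejar import LWPCookieJar


def get_file(url, timeout=30):
    """Return the body at url as bytes, or None if the request fails."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(LWPCookieJar())
    )
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.read()
    except (OSError, ValueError) as e:
        # URLError is a subclass of OSError; ValueError covers malformed URLs
        print(e)
        return None


def save_if_missing(directory, url, fetch=get_file):
    """Save url into directory under its last path segment.

    Returns True if a new file was written, False if the file already
    existed or the download failed.
    """
    os.makedirs(directory, exist_ok=True)
    target = os.path.join(directory, url.split("/")[-1])
    if os.path.isfile(target):
        return False
    data = fetch(url)
    if data is None:
        return False
    with open(target, "wb") as f:
        f.write(data)
    return True
```

Passing the downloader in as `fetch` keeps the file-handling logic testable without network access, and checking for `None` before opening the file avoids leaving an empty file behind when the download fails.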

3. A crontab entry runs the program once a day. The sh script is as follows:

#!/bin/sh
export PATH=$PATH:/usr/local/bin
cd /home/zooy/workcode/tutorial
nohup scrapy crawl bing >> bing.log 2>&1 &

Now the script fetches one image from cn.bing.com to the local disk every day.

Reposted from: https://my.oschina.net/zooy/blog/783869
