Setting Up an Nginx Image Server

Latest recommended article published on 2022-07-13 15:52:37

渔夫的石头


Tags: linux nginx web crawler

Article link: https://blog.csdn.net/qq_40025970/article/details/116722557


Introduction

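Project Requirements

Image links scraped from WeChat official account articles stop loading once they are inserted into new HTML. The plan is therefore to download the images to a cloud host and replace the original WeChat links with links served from that host, so the images display correctly.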

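Approach

Download the images to the cloud host (Scrapy ImagesPipeline)
Replace the default WeChat links with the cloud host's link addresses
Start the Nginx service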

Detailed Steps

Downloading the images:

Edit settings.py to enable the image pipeline

ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}

Also in settings.py, set the directory where images are stored

IMAGES_STORE = '/path/to/valid/dir'
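
To make it clearer where files end up under IMAGES_STORE: as far as I know, the default ImagesPipeline saves each image under a relative path of the form full/<SHA-1 of the image URL>.jpg. The sketch below imitates that naming with the standard library only. The helper name default_image_path and the example URL are made up for illustration, and the real scheme may differ between Scrapy versions or when file_path is overridden.

```python
import hashlib


def default_image_path(url: str) -> str:
    """Imitate the relative path the default ImagesPipeline is assumed to use."""
    # Assumption: the file name is the SHA-1 hex digest of the image URL.
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"full/{image_guid}.jpg"


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(default_image_path("https://example.com/some-image.png"))
```

The full/ prefix is why the Nginx configuration later in this post exposes a /full/ location.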

In items.py, add the fields (keys) that hold the image data

import scrapy

class MyItem(scrapy.Item):
    # ... other item fields ...
    image_urls = scrapy.Field()
    images = scrapy.Field()

In the spider, extract the image links and store them under the corresponding key

# ... other fields ...
item['image_urls'] = response.css('div#js_content').css('img::attr(data-src)').getall()
# ... other fields ...
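
For readers who want to see what that selector picks out without running Scrapy, here is a rough standard-library sketch. It is not the Scrapy API: it swaps in html.parser, and the class name DataSrcCollector and the sample HTML are invented for illustration. It collects the data-src attribute of every img inside div#js_content.

```python
from html.parser import HTMLParser


class DataSrcCollector(HTMLParser):
    """Collect data-src values of <img> tags nested inside <div id="js_content">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # > 0 while the parser is inside div#js_content
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "div":
            # Start counting at div#js_content, then track nested divs.
            if self.div_depth or attr_map.get("id") == "js_content":
                self.div_depth += 1
        elif tag == "img" and self.div_depth and attr_map.get("data-src"):
            self.urls.append(attr_map["data-src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1


def extract_data_src(html: str) -> list:
    collector = DataSrcCollector()
    collector.feed(html)
    return collector.urls


if __name__ == "__main__":
    sample = (
        '<div id="js_content"><p><img data-src="https://example.com/a.png"></p></div>'
        '<img data-src="https://example.com/outside.png">'
    )
    print(extract_data_src(sample))
```

WeChat articles keep the real image address in data-src rather than src, which is why the selector targets that attribute.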

For details on this part, see the Scrapy image pipeline documentation.

Replacing the links with cloud host addresses

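import ast

from bs4 import Tag


def change_img_links(df):
    '''Point the src attribute of each image at the copy on the cloud host.

    args:
        df, pandas dataframe; each row has new_content (iterable of parsed
            elements) and images (string form of the image pipeline output)
    '''
    base_url = '*****'

    for row in df.itertuples():
        # Parse the stored image list once per row, not once per element
        img_link_local = ast.literal_eval(row.images)
        if not img_link_local:
            continue
        for ele in row.new_content:
            if isinstance(ele, Tag) and ele.name == 'img' and ele.has_attr('data-src'):
                img_link = ele.attrs['data-src']
                # Find the downloaded image whose original URL matches this tag
                for img_local in img_link_local:
                    if img_link == img_local['url']:
                        ele.attrs['src'] = base_url + img_local.get('path')
                        break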
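
The matching step can also be done with a dictionary lookup instead of a nested loop. The sketch below is only an illustration under some assumptions: the images value is the string form of a list of dicts with url and path keys (the shape the Scrapy image pipeline normally produces), and the base URL http://example.com:99/ and the helper name build_url_map are hypothetical.

```python
import ast
from urllib.parse import urljoin


def build_url_map(images_field: str, base_url: str) -> dict:
    """Map each original image URL to its address on the cloud host."""
    # images_field is assumed to look like "[{'url': ..., 'path': 'full/<hash>.jpg'}, ...]"
    records = ast.literal_eval(images_field)
    return {rec["url"]: urljoin(base_url, rec["path"]) for rec in records}


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    images = "[{'url': 'https://example.com/orig.png', 'path': 'full/abc.jpg'}]"
    url_map = build_url_map(images, "http://example.com:99/")
    print(url_map.get("https://example.com/orig.png"))
```

With such a map, each img tag needs a single url_map.get(data_src) call, and a missing key simply means the image was not downloaded.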

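Setting up the Nginx service (OS: Linux, Ubuntu 18.04)

1. Install Nginx (reference: installing nginx). One problem along the way: running make failed with the following error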

   cc1: all warnings being treated as errors
   objs/Makefile:460: recipe for target 'objs/src/core/ngx_murmurhash.o' failed
   make[1]: *** [objs/src/core/ngx_murmurhash.o] Error 1
   make[1]: Leaving directory '/usr/local/nginx-1.11.3'
   Makefile:8: recipe for target 'build' failed
   make: *** [build] Error 2

I then fixed the make error by following this link:
Installing nginx: fixing make.
2. Configure the server

server {
    listen       99;  # port 80 was already in use, so the port was changed
    server_name  localhost;

    location / {
        root   html;
        index  index.html index.htm;
    }

    # Requests to /full/ map to the physical path /mnt/weixin_images/full/ on the cloud host
    location /full/ {
        root /mnt/weixin_images;
        autoindex on;
    }
}
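
The mapping in the /full/ location works because the root directive prepends its value to the whole request URI (unlike alias, which replaces the matched prefix). A tiny Python sketch of that rule, with a made-up helper name resolve_with_root and a made-up file name:

```python
import posixpath


def resolve_with_root(root: str, uri: str) -> str:
    """Return the file path Nginx would look up for `uri` under a `root` directive."""
    # With `root`, the full URI (including /full/) is appended to the root path.
    return posixpath.join(root, uri.lstrip("/"))


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(resolve_with_root("/mnt/weixin_images", "/full/abc.jpg"))
```

So IMAGES_STORE should be /mnt/weixin_images for this configuration, since the image pipeline itself adds the full/ subdirectory.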

Starting the Nginx service
Then I ran into a 403 error and fixed it by following this link:
Link: fixing 403 Forbidden.
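
The linked fix is not reproduced here, but one common cause of a 403 is that the Nginx worker user cannot traverse some directory on the way to the file. Assuming that is the problem, and assuming the worker is neither the owner nor in the group of the files, the sketch below checks the "other" permission bits from / down to a given path. The helper name access_report and the example path are hypothetical.

```python
import os
import stat


def access_report(path: str) -> list:
    """Return (path, ok) pairs from / down to `path`.

    ok is True when "other" users have execute permission on a directory
    or read permission on a regular file.
    """
    current = os.path.abspath(path)
    parts = []
    while True:
        parts.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent

    report = []
    for p in reversed(parts):
        mode = os.stat(p).st_mode
        needed = stat.S_IXOTH if stat.S_ISDIR(mode) else stat.S_IROTH
        report.append((p, bool(mode & needed)))
    return report


if __name__ == "__main__":
    # Hypothetical path, adjust to a file that actually exists on the host.
    target = "/mnt/weixin_images/full"
    if os.path.exists(target):
        for p, ok in access_report(target):
            print("ok     " if ok else "BLOCKED", p)
```

Any line marked BLOCKED is a candidate for chmod, or for changing the user directive in nginx.conf, depending on what the linked article recommends.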

Thanks to the articles written by all these experts, the Nginx service is finally up and running.
