Crawler: the status code is 200, but the page is an error page

一位路过的程序员

Category: js逆向 (JS reverse engineering). Tags: python, crawler, post

Article link: https://blog.csdn.net/weixin_45541986/article/details/116757929

The output first:

E:\Python38\python.exe E:/PycharmProjects/test.py
http://www.szse.cn/api/disc/announcement/annList?random=0.16208973259833276
<Response [200]>
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <link href="/maintain/images/favicon.ico" rel="shortcut icon" type="image/x-icon">
    <title>深圳证券交易所</title>
    <title>50x</title>
  <style>
    * {
      padding: 0;
      margin: 0;
    }

    html,
    body {
      width: 100%;
      height: 100%;
      position: relative;
      background: #fff;
    }

    #wrap {
      position: absolute;
      top: 0;
      right: 0;
      bottom: 0;
      left: 0;
      margin: auto;
    }

    #contImg {
      max-width: 100%;
      max-height: 100%;
    }
  </style>
</head>

<body>
  <div id="wrap">
    <img id="contImg" src="">
  </div>
</body>
<script>
  (function () {
    var img = new Image();
    var src = '/maintain/images/50x_b.png';
    var wrap = document.getElementById('wrap');
    var contImg = document.getElementById('contImg');
    var vWidth = window.innerWidth;
    var vHeight = window.innerHeight;


    img.onload = function () {
      window.cartoonIWidth = img.width;
      window.cartoonIHeight = img.height;

      cartoonHImgOnloaded(vWidth, vHeight, wrap, contImg);
    };
    img.src = src;

    window.cartoonHImgOnloaded = function (vWidth, vHeight, wrap, contImg) {
      var wrAspectRatio = cartoonIWidth / cartoonIHeight;
      var wrWidth = cartoonIWidth;
      var wrHeight = cartoonIHeight;

      if (wrWidth < vWidth && wrHeight < vHeight) {
        wrap.style.width = wrWidth + 'px';
        wrap.style.height = wrHeight + 'px';
        contImg.style.height = cartoonIHeight + 'px';
      }

      if (wrWidth >= vWidth) {
        var h = vWidth * .9 / wrAspectRatio;

        if (h <= vHeight) {
          wrap.style.width = '90%';
          contImg.style.height = h + 'px';
          wrap.style.height = h + 'px';
        }
      }

      if (wrHeight >= vHeight) {
        var h = vHeight * .9;
        var w = h * wrAspectRatio;

        if (w <= vWidth) {
          wrap.style.height = '90%';
          contImg.style.height = h + 'px';
          wrap.style.width = w + 'px';
        }
      }

      contImg.src = src;
    }


  })()
</script>
</html>

Process finished with exit code 0

Code

import json
import time

import requests

# The API URL carries a `random` query parameter, built here from the current timestamp.
random_param = '0.' + str(time.time()).replace(".", '')
url = "http://www.szse.cn/api/disc/announcement/annList?random=" + random_param
print(url)
# Headers copied from the browser request. Note that Content-Type is not set here.
headers = {
    "Host": "www.szse.cn",
    "Referer": "http://www.szse.cn/disclosure/bond/notice/index.html",
    "Origin": "http://www.szse.cn",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36",
}
# Request payload: page 1, 30 items per page.
data = {"seDate": ["", ""],
        "channelCode": ["bondinfoNotice_disc"],
        "smallCategoryId": ["013901"],
        "pageSize": 30,
        "pageNum": 1
        }
# The payload is sent as a JSON string in the request body.
response = requests.post(url=url, headers=headers, data=json.dumps(data))
print(response)
print(response.content.decode())

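Here we can see that the requests call itself succeeded: the status code is 200. The page that came back, however, is an error page (its title is 50x).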
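Since the status code alone cannot be trusted in a case like this, one way to catch it is to check whether the body actually parses as JSON. The following is a minimal sketch, assuming the API returns JSON on success; the helper name `is_json` is hypothetical and not part of requests, and the sample JSON body is made up.

```python
import json


def is_json(body: str) -> bool:
    """Return True if the body parses as JSON, False otherwise."""
    try:
        json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# An HTML error page is rejected even if it arrived with status code 200.
html_body = "<!DOCTYPE html><html><head><title>50x</title></head></html>"
print(is_json(html_body))

# A JSON body is accepted (this payload shape is invented for the example).
json_body = '{"data": []}'
print(is_json(json_body))
```

With the script above, `is_json(response.text)` could be called right after the request to decide whether the response is worth processing.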

After many attempts, adding the browser's request headers to headers one at a time, the data finally came back once "Content-Type": "application/json" was added.
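The working request can be sketched as follows. It is the same script with the one extra header; the function names `build_request` and `fetch_announcements` are only for illustration, and the URL, headers and payload are the ones from the code above. `requests` is imported inside the fetching function so that the header-building part can be inspected without sending anything.

```python
import json
import time


def build_request():
    """Build the URL, headers and JSON body for the announcement list API."""
    # Same `random` query parameter as in the original script.
    random_param = "0." + str(time.time()).replace(".", "")
    url = "http://www.szse.cn/api/disc/announcement/annList?random=" + random_param
    headers = {
        "Host": "www.szse.cn",
        "Referer": "http://www.szse.cn/disclosure/bond/notice/index.html",
        "Origin": "http://www.szse.cn",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36",
        # The header that was missing: it tells the server the body is JSON.
        "Content-Type": "application/json",
    }
    payload = {
        "seDate": ["", ""],
        "channelCode": ["bondinfoNotice_disc"],
        "smallCategoryId": ["013901"],
        "pageSize": 30,
        "pageNum": 1,
    }
    return url, headers, json.dumps(payload)


def fetch_announcements():
    """Send the POST request and return the decoded JSON response."""
    import requests  # third-party; imported here so build_request() works without it

    url, headers, body = build_request()
    response = requests.post(url, headers=headers, data=body, timeout=10)
    response.raise_for_status()
    return response.json()


# Inspect the request without touching the network.
url, headers, body = build_request()
print(headers["Content-Type"])
# Calling fetch_announcements() performs the actual network request.
```

requests can also handle both steps itself: passing `json=payload` instead of `data=json.dumps(payload)` serializes the dict and sets `Content-Type: application/json` when that header has not been supplied.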

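I think this is also a kind of anti-scraping measure: the request has to declare the type of the data it sends before the server will return any data. Without the header, the server is not told that the body is JSON.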
