Python Web Crawler

LZ_Luzhuo

This article was written by Luzhuo; please keep this notice when reposting.
Original: https://blog.csdn.net/rozol/article/details/80010235

Request headers:

GET / HTTP/1.1
- Request line: request method (GET / POST / PUT / DELETE / …), path, and protocol version
Host: www.baidu.com
- Target host plus port number (the port is omitted when it is the default)
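Connection: keep-alive [important]
- Connection type: the client sends Connection: keep-alive
  - If the server supports it, it replies with Connection: keep-alive and keeps the connection open
  - If the server does not support it, it replies with Connection: close and closes the connection
- keep-alive lets a connection be reused, so a new connection does not have to be set up for every request, which reduces the load on the server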
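The keep-alive behaviour described above can be observed without touching a real site. The sketch below is a self-contained demo under stated assumptions: it starts a throwaway local server with Python's standard http.server, then sends two requests over one http.client.HTTPConnection. The handler class and function names are hypothetical; the handler simply echoes the client's source port, so if the connection is reused, both responses report the same port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    # Hypothetical demo handler. HTTP/1.1 makes it keep the connection open between requests
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the client's source port so the caller can tell
        # whether both requests travelled over the same TCP connection
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # lets the client find the end of the body
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_twice_on_one_connection():
    # Port 0 asks the OS for any free port on localhost
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPortHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        conn = http.client.HTTPConnection(host, port, timeout=5)
        ports = []
        for _ in range(2):
            # Explicit for illustration; HTTP/1.1 keeps connections open by default
            conn.request("GET", "/", headers={"Connection": "keep-alive"})
            resp = conn.getresponse()
            ports.append(resp.read().decode())  # read fully before reusing the connection
        conn.close()
        return ports
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_twice_on_one_connection())  # both entries show the same client port
```

Note that urllib.request.urlopen sends Connection: close and opens a new connection per call, so reusing a connection needs http.client (or a third-party session object).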
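Upgrade-Insecure-Requests: 1
- Tells the server that the client prefers an encrypted (https) response; on pages that opt in, the browser automatically replaces http requests for resources with https requests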
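User-Agent: Mozilla/5.0 ... [important]
- Identifies the browser (name, version, operating system); it is worth collecting a variety of these strings for use in crawlers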
Accept: text/html, ...
- Data types the client accepts: */* (anything) / image/jpeg (JPEG image) / image/gif (GIF image) / text/html (HTML text) / application/xxx (data or files) / …
Referer: https://blog.csdn.net/rozol [important]
- Page navigation: the URL of the page the request came from, i.e. the page that linked to the requested page
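A crawler usually sets the headers marked [important] itself. Below is a minimal sketch using urllib.request.Request from the standard library; the USER_AGENTS pool, the build_request helper and the target URL are illustrative assumptions, not values from the original article. A random User-Agent is picked for each request, and the request is only built here, not sent.

```python
import random
import urllib.request

# Hypothetical pool of collected User-Agent strings (replace with ones you gather)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0",
]


def build_request(url, referer):
    """Build (but do not send) a request that looks like it came from a browser."""
    headers = {
        "User-Agent": random.choice(USER_AGENTS),  # a different browser per request
        "Referer": referer,                        # the page we "came from"
        "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.9",
    }
    return urllib.request.Request(url, headers=headers)


req = build_request("https://www.baidu.com/", "https://blog.csdn.net/rozol")
# urllib stores header names via str.capitalize(), so look them up as "User-agent"
print(req.get_header("User-agent"))
# urllib.request.urlopen(req) would actually send the request
```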
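Accept-Encoding: gzip, deflate, br [important]
- Encodings the client accepts, which are generally compression formats; crawlers usually leave this header out, because otherwise the server may return a compressed body that the crawler has to decompress itself (urllib, for example, does not do this automatically)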
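If a crawler does send Accept-Encoding, it has to undo the compression named in the response's Content-Encoding header. A minimal sketch with the standard gzip and zlib modules, assuming the body has already been read as bytes; decode_body is a hypothetical helper, and br (Brotli) is deliberately not handled because Python's standard library has no Brotli decoder.

```python
import gzip
import zlib


def decode_body(raw, content_encoding):
    """Undo the compression named in a Content-Encoding response header."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        # 'deflate' is normally zlib-wrapped, but some servers send raw deflate data
        try:
            return zlib.decompress(raw)
        except zlib.error:
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    if encoding == "identity":
        return raw  # not compressed
    # e.g. 'br' would need a third-party Brotli package
    raise ValueError(f"unsupported Content-Encoding: {content_encoding}")


compressed = gzip.compress("<html>你好</html>".encode("utf-8"))
print(decode_body(compressed, "gzip").decode("utf-8"))  # <html>你好</html>
```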
Accept-Language: zh-CN,zh;q=0.9
- Preferred languages: en / en-us / zh / zh-cn; the q value (0 to 1) gives the relative preference
Cookie: PSTM=1496126358; ...
- Cookies the server set earlier (for example login or session state), sent back with the request
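A Cookie header is a list of name=value pairs separated by "; ". The sketch below, which assumes simple cookie values, uses http.cookies.SimpleCookie to split such a header into a dict and joins a dict back into a header value; the helper names and the SESSIONID cookie are hypothetical.

```python
from http.cookies import SimpleCookie


def parse_cookie_header(header_value):
    """Turn a Cookie header value such as 'a=1; b=2' into a dict."""
    jar = SimpleCookie()
    jar.load(header_value)
    return {name: morsel.value for name, morsel in jar.items()}


def build_cookie_header(cookies):
    """Join name/value pairs back into a Cookie request header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# PSTM comes from the example header above; SESSIONID is a made-up second cookie
cookies = parse_cookie_header("PSTM=1496126358; SESSIONID=abc123")
print(cookies)                       # {'PSTM': '1496126358', 'SESSIONID': 'abc123'}
print(build_cookie_header(cookies))  # PSTM=1496126358; SESSIONID=abc123
```

For a longer crawl session, http.cookiejar.CookieJar combined with urllib.request.HTTPCookieProcessor can store cookies from Set-Cookie responses and send them back automatically.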
Content-Type: application/x-www-form-urlencoded
- Data type of the body sent with a POST request
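With application/x-www-form-urlencoded, the POST body is the form fields percent-encoded as key=value&key=value. A minimal sketch with urllib.parse.urlencode; the field names, values and URL are hypothetical.

```python
import urllib.parse
import urllib.request

# Hypothetical form fields and URL, for illustration only
form = {"username": "demo", "keyword": "python爬虫"}
body = urllib.parse.urlencode(form).encode("utf-8")  # key=value&key=value, percent-encoded

req = urllib.request.Request(
    "https://example.com/login",
    data=body,  # passing data makes urllib send a POST
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
print(req.get_method())  # POST
print(req.data)
# urllib.request.urlopen(req) would send the form to the server
```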

Response headers:

Server: Tengine
- Server software
Connection: keep-alive
- Connection type: here the server keeps the connection open
Keep-Alive: timeout=20
- How long (in seconds) an idle TCP connection is kept open
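Cache-Control: must-revalidate, no-cache, private
- Caching directives: no-cache tells the client not to use a cached copy without first revalidating it with the server, and private means only the browser (not shared proxies) may store it
- When the client sends the request header Cache-Control: max-age=0, it is asking for a fresh copy: any cached copy must be revalidated before use
- The server replies with Cache-Control: no-cache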
Pragma: no-cache
- Same as Cache-Control: no-cache; this is the older HTTP/1.0 form
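The Keep-Alive, Cache-Control and Pragma values above are comma-separated lists of directives. The naive parser below is a sketch (it does not handle quoted values that themselves contain commas) that turns them into dicts; may_reuse_cached_copy is a deliberately rough, hypothetical check, not a full HTTP caching implementation.

```python
def parse_directives(header_value):
    """Split a comma-separated header such as Cache-Control or Keep-Alive
    into {directive: value}; directives without '=' map to True."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else True
    return directives


def may_reuse_cached_copy(headers):
    """Rough check: False if the response forbids serving it from cache without
    revalidation (no-cache / no-store, or the legacy Pragma: no-cache)."""
    cc = parse_directives(headers.get("Cache-Control", ""))
    pragma = parse_directives(headers.get("Pragma", ""))
    return not (cc.get("no-cache") or cc.get("no-store") or pragma.get("no-cache"))


response_headers = {
    "Keep-Alive": "timeout=20",
    "Cache-Control": "must-revalidate, no-cache, private",
    "Pragma": "no-cache",
}
print(parse_directives(response_headers["Keep-Alive"]))     # {'timeout': '20'}
print(parse_directives(response_headers["Cache-Control"]))  # {'must-revalidate': True, 'no-cache': True, 'private': True}
print(may_reuse_cached_copy(response_headers))              # False
```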
Content-Type: text/html;charset=UTF-8
- Type of the resource (media type) and its character encoding
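Because Content-Type carries the charset, a crawler can decode the response bytes with it instead of guessing. A minimal sketch that borrows the standard email.message.Message parser for the header parameters; split_content_type is a hypothetical helper, and falling back to utf-8 when no charset is given is an assumption (some Chinese sites use gbk, for example).

```python
from email.message import Message


def split_content_type(header_value, default_charset="utf-8"):
    """Return (media_type, charset) from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    media_type = msg.get_content_type()                     # e.g. 'text/html'
    charset = msg.get_content_charset() or default_charset  # lower-cased, e.g. 'utf-8'
    return media_type, charset


media_type, charset = split_content_type("text/html;charset=UTF-8")
print(media_type, charset)  # text/html utf-8
raw_body = "<p>Python爬虫</p>".encode("utf-8")  # stand-in for bytes read from a response
print(raw_body.decode(charset))
```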
Date: Sat, 14 Apr 2018 14:08:23 GMT
- Time at which the server sent the response
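Expires: Thu, 01 Jan 1970 00:00:00 GMT
- Until this time, the cached copy can be used directly; a date in the past, like this one, marks the response as already expired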
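Comparing Expires against Date tells whether a response was already stale when it was sent. A minimal sketch using email.utils.parsedate_to_datetime, which understands this HTTP date format; is_fresh is a hypothetical helper, and it ignores Cache-Control: max-age, which takes precedence over Expires when both are present.

```python
from email.utils import parsedate_to_datetime


def is_fresh(headers):
    """Return True if Expires is later than the server's Date header.

    If either date cannot be parsed (an Expires of "0" is common),
    the response is treated as already expired.
    """
    try:
        date = parsedate_to_datetime(headers["Date"])
        expires = parsedate_to_datetime(headers["Expires"])
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ones ValueError
        return False
    return expires > date


headers = {
    "Date": "Sat, 14 Apr 2018 14:08:23 GMT",
    "Expires": "Thu, 01 Jan 1970 00:00:00 GMT",
}
print(parsedate_to_datetime(headers["Date"]))  # 2018-04-14 14:08:23+00:00
print(is_fresh(headers))  # False: an Expires date in the past means already stale
```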
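Transfer-Encoding: chunked
- The server sends the resource in chunks, each prefixed with its size, so the total length does not have to be known in advance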
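http.client (and therefore urllib) reassembles chunked bodies automatically, so this is normally invisible to a crawler. To show what the format looks like on the wire, here is a minimal decoder sketch, assuming the raw body is available as a binary stream; decode_chunked is a hypothetical name. Each chunk is a hexadecimal size line followed by that many bytes, and a chunk of size 0 ends the body.

```python
import io


def decode_chunked(stream):
    """Reassemble a Transfer-Encoding: chunked body read from a binary stream.

    Each chunk is '<hex size>[;extension]\\r\\n<data>\\r\\n'; a size of 0 ends the body.
    """
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("stream ended before the terminating 0-size chunk")
        size = int(size_line.split(b";", 1)[0].strip(), 16)  # ignore chunk extensions
        if size == 0:
            break
        body += stream.read(size)
        stream.read(2)  # skip the CRLF after the chunk data
    # Skip optional trailer headers up to the final empty line
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)


raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(io.BytesIO(raw)))  # b'Wikipedia'
```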