# Web Scraping Notes from a Complete Beginner

猪1妖

Category: web scraping. Article tags: python, web scraping

Article link: https://blog.csdn.net/weixin_43591988/article/details/89361200

Scraping notes

I learned this from the following link:
https://www.jqhtml.com/13264.html

1. Using requests

import requests
response = requests.get('http://www.baidu.com/')

In the code above, response holds Baidu's reply, including its HTML.
Then use

print(response.text)

to print Baidu's HTML.

Requests with parameters (for now I think of them as labels attached to the request)

import requests
data = {'name': 'jack', 'age': 20}
resp = requests.get('http://www.baidu.com/', params=data)
print(resp.text)

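With this, name and age are added as parameters to the request URL (as ?name=jack&age=20), not to the HTML itself. Printing resp.url shows the final URL.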
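
To see what params does to the URL without sending anything, here is a minimal offline sketch that uses only the standard library. build_url is a hypothetical helper, not part of requests; it only approximates how the parameters end up in the query string.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append params to base_url as a query string (hypothetical helper)."""
    query = urlencode(params)
    # Start a query string with '?', or extend an existing one with '&'.
    separator = '&' if '?' in base_url else '?'
    return base_url + separator + query


data = {'name': 'jack', 'age': 20}
url = build_url('http://www.baidu.com/', data)
print(url)
```

The printed URL ends in ?name=jack&age=20, which is the same kind of string resp.url shows after a real request with params.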

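Parsing JSON (I do not know yet what it is used for)
json is a standard library module with its own parser.
A requests response has a json() method; resp.json() and json.loads(resp.text) give the same parsed result.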


import requests
import json
resp = requests.get('http://httpbin.org/get')

print(resp.json())
print(json.loads(resp.text))

The results of resp.json() and json.loads(resp.text) are the same; there is no difference.
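
As for what the parsing is for: it turns JSON text into ordinary Python objects that can be indexed. A small offline sketch, where sample_text is made-up data shaped like a JSON reply and not real server output:

```python
import json

# Made-up sample text standing in for resp.text; not real server output.
sample_text = '{"args": {"name": "jack"}, "url": "http://httpbin.org/get?name=jack"}'

parsed = json.loads(sample_text)

print(type(parsed))            # the JSON object becomes a dict
print(parsed['args']['name'])  # nested values are reachable by key
```

Once the text is a dict, single fields can be picked out directly instead of searching through the raw string.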
Fetching an image from a web page

import requests

resp = requests.get('http://www.baidu.com/img/baidu_jgylogo3.gif')
print(resp.content)
print(resp.text)

The output is completely unreadable...

content returns the raw bytes; text returns a decoded string.
So to fetch an image, use the content attribute.
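
The difference between bytes and str can be tried without any network access. In this sketch, raw and decoded are stand-ins for resp.content and resp.text, assuming the page is encoded as UTF-8:

```python
# Stand-in for resp.content: the bytes as they arrive over the network.
raw = '百度'.encode('utf-8')

# Stand-in for resp.text: the same bytes decoded into a string.
decoded = raw.decode('utf-8')

print(type(raw), raw)          # bytes, shown as escaped byte values
print(type(decoded), decoded)  # str, shown as readable characters
```

An image is not encoded text, so decoding its bytes gives nonsense; that is why printing both content and text for the GIF looks unreadable.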
Saving the image

with open('logo.gif','wb') as f:
    f.write(resp.content)

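Adding this open call after the code above saves the image as logo.gif in the current working directory (the same folder as the .py file if the script is run from there). The 'wb' mode opens the file for writing in binary, which is needed because resp.content is bytes.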
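
A small offline sketch of why the mode is 'wb'. fake_image is made-up data standing in for resp.content, and the file goes into a temporary folder rather than next to the script:

```python
import os
import tempfile

# Made-up bytes standing in for resp.content; not a complete GIF file.
fake_image = b'GIF89a\x01\x00\x01\x00'

path = os.path.join(tempfile.mkdtemp(), 'logo.gif')

# 'wb' = write + binary: bytes are stored exactly as given.
with open(path, 'wb') as f:
    f.write(fake_image)

with open(path, 'rb') as f:
    saved = f.read()
print(saved == fake_image)

# Text mode ('w') only accepts str, so writing bytes fails.
text_mode_error = None
try:
    with open(path, 'w') as f:
        f.write(fake_image)
except TypeError as exc:
    text_mode_error = exc
    print('text mode rejects bytes:', exc)
```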

Adding headers

To get past anti-scraping checks, add headers so that the request looks like it comes from a browser.

import requests
 
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36'}
resp = requests.get('http://www.baidu.com', headers=headers)
print(resp.text)

Search online for a User-Agent string to put in headers.
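
To check which headers a request would carry without sending it, here is a rough offline sketch. Note that it uses urllib.request from the standard library instead of requests, so it only illustrates the idea; the shortened User-Agent value is an example, not a recommendation.

```python
from urllib.request import Request

# Shortened example value; any browser User-Agent string could go here.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# The request object is only built here, never sent.
req = Request('http://www.baidu.com', headers=headers)

# urllib stores header names capitalized as 'User-agent'.
print(req.header_items())
```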

Code for getting various pieces of information


import requests
 
response = requests.get('http://www.baidu.com/')
print(type(response.status_code)) # status code
print(type(response.text)) # page source
print(type(response.headers)) # response headers
print(type(response.cookies)) # cookies
print(type(response.url)) # requested URL
print(type(response.history)) # redirect history

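I did not understand the output.

I printed text, cookies and url and could not tell why the output looked the way it did. If it shows only things like <class 'str'>, that is because the code above prints type(...) of each attribute, which gives the class rather than the value; drop type() to see the actual data.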
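
The difference between printing a type and printing a value can be seen with plain variables. Here status_code and url are stand-ins for response.status_code and response.url, with made-up values:

```python
status_code = 200              # stand-in for response.status_code
url = 'http://www.baidu.com/'  # stand-in for response.url

print(type(status_code))  # the class of the object
print(status_code)        # the value itself

print(type(url))
print(url)
```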
