Python requests queries: the requests library for Python web scraping

weixin_39765290

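Introduction to the requests library

requests is a third-party library for sending HTTP requests. It was written to work on both Python 2 and Python 3, although recent releases support Python 3 only.

Installation:

pip install requests

Usage:

import requests

Sending requests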

response = requests.get(url)

response = requests.post(url)

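Response content

Every request returns a Response object, which wraps the data the server sent back over HTTP.

Main attributes and methods of the Response object:

response.status_code

The HTTP status code.

response.headers

The response headers, as a dictionary-like object with case-insensitive keys.

response.content

The raw response body as bytes. Use this for images, audio, and video.

response.encoding

The encoding used to decode response.text. Set encoding first and then read text to fix garbled characters.

response.text

The page source as a string, decoded from the raw bytes using response.encoding.

response.cookies

The cookies returned by the server.

response.json()

Parses a JSON response body into Python objects (a dictionary for a JSON object).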

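import requests

response = requests.get('http://www.baidu.com')
print(response.status_code)       # 200
print(response.headers)           # headers returned by the server
print(response.content)           # raw data, bytes
print(response.content.decode())  # page source, decoded as UTF-8 (the default for decode())
print(response.text)              # page source decoded as ISO-8859-1, so Chinese text is garbled
# When the Content-Type response header has a charset attribute, text is decoded with that charset.
# If there is no charset but the type is text/*, ISO-8859-1 is used.
response.encoding = 'utf-8'
print(response.text)              # page source decoded as UTF-8, which fixes the garbled Chinese
print(response.cookies)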
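The charset rule described in the comments above can be checked offline. As a minimal sketch, the helper get_encoding_from_headers from requests.utils (which requests uses to choose the initial response.encoding) is called here on hand-written header dictionaries. The header values are illustrative assumptions, not taken from a real server.

```python
from requests.utils import get_encoding_from_headers

# Content-Type with an explicit charset: that charset is used.
with_charset = get_encoding_from_headers({'content-type': 'text/html; charset=utf-8'})

# A text/* type without a charset: falls back to ISO-8859-1.
text_only = get_encoding_from_headers({'content-type': 'text/html'})

# A non-text type without a charset: no encoding is chosen.
binary = get_encoding_from_headers({'content-type': 'image/png'})

print(with_charset, text_only, binary)
```

When the fallback produces garbled text, setting response.encoding (for example to 'utf-8' or to response.apparent_encoding) before reading response.text is the usual fix.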

Query parameters

Passing parameters to a GET request in the URL (the query string)

import requests

payload = {'wd': 'python'}
response = requests.get('http://www.baidu.com/s', params=payload)
response.encoding = 'utf-8'
print(response.text)
print(response.url)  # the final request URL: http://www.baidu.com/s?wd=python
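To see how params ends up in the URL without touching the network, a request can be built and prepared but not sent. This is only a sketch: the extra 'pn' parameter is an assumption added for illustration and is not part of the original example.

```python
import requests

# Build the request without sending it, so the final URL can be inspected offline.
request = requests.Request(
    'GET',
    'http://www.baidu.com/s',
    params={'wd': 'python', 'pn': 10},  # 'pn' is an illustrative extra parameter
)
prepared = request.prepare()

print(prepared.url)
```

Values are URL-encoded for you, so there is no need to build the query string by hand.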

Submitting parameters with a POST request

import requests

data = {'user': 'qqq'}  # form parameters
response = requests.post('http://httpbin.org/post', data=data)
response.encoding = 'utf-8'
print(response.text)
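requests.post also accepts a json argument, which the example above does not use. The sketch below prepares both variants without sending them, to show how the body and the Content-Type header differ. The comparison is an addition for illustration.

```python
import json
import requests

url = 'http://httpbin.org/post'

# data= sends the dictionary as an application/x-www-form-urlencoded form.
form_request = requests.Request('POST', url, data={'user': 'qqq'}).prepare()

# json= serializes the dictionary to JSON and sets the Content-Type accordingly.
json_request = requests.Request('POST', url, json={'user': 'qqq'}).prepare()

print(form_request.headers['Content-Type'], form_request.body)
print(json_request.headers['Content-Type'], json.loads(json_request.body))
```

Use data for classic HTML form endpoints and json for APIs that expect a JSON body.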

Timeouts

import requests

# If the server has not responded within 5 seconds, a timeout exception is raised, which you can then handle.
response = requests.get('https://www.google.com', timeout=5)
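The exception handling mentioned above might look like the following sketch. The helper name fetch_status and its return convention (None on failure) are assumptions made for this example, not part of requests.

```python
import requests


def fetch_status(url, timeout=5):
    """Return the HTTP status code, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Raised when connecting or reading takes longer than `timeout` seconds.
        print('request timed out:', url)
        return None
    except requests.exceptions.RequestException as exc:
        # Base class for the other requests errors (connection refused, bad URL, ...).
        print('request failed:', exc)
        return None
    return response.status_code
```

Calling fetch_status('https://www.google.com', timeout=5) would then return a status code or None instead of raising.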

Handling cookies

For example, after logging in you can save the cookies and pass them along with later requests:

import requests

data = {'account_name': 'asda', 'password': 'qwe123'}
result = requests.post('https://qiye.163.com/login/', data=data)
if result.status_code == 200:
    cookies = result.cookies
    response = requests.get('https://qiye.163.com/News', cookies=cookies)

A request that carries the post-login cookies can then access data that requires a login.

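Session

A session keeps state between the client and the server across requests.

session = requests.session()
session.get(url)  # the Session API is almost the same as the top-level requests API

A session stores cookies automatically and sends them with the next request, which is convenient.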
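The cookie persistence of a session can be illustrated without a real login. In this sketch the cookie is set by hand as a stand-in for one a server would return. The cookie name, its value, and the example.com URL are all hypothetical.

```python
import requests

session = requests.Session()

# Stand-in for a cookie the server would set after login (hypothetical name and value).
session.cookies.set('sessionid', 'abc123')

# Prepare (but do not send) a later request made through the same session.
request = requests.Request('GET', 'http://example.com/profile')
prepared = session.prepare_request(request)

print(prepared.headers.get('Cookie'))
```

In real use the cookie jar is filled by the server's Set-Cookie headers, so the manual set() call is not needed.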

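SSL certificate verification

Accessing a site without a trusted certificate

import requests

# For HTTPS requests, requests verifies the server certificate and raises an exception if verification fails.
response = requests.get('https://www.12306.cn')
print(response.status_code)

Turning off certificate verification

import requests

# Verification is off, but a certificate warning is still emitted.
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

Suppressing the warning when verification is off

import requests
from requests.packages import urllib3

# Turn off the warning.
urllib3.disable_warnings()
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)

Setting a certificate manually

import requests

# Use a local certificate and key.
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
print(response.status_code)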

Sending request headers

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
}
# Send the request with custom headers; without them Zhihu rejects the request.
r = requests.get('https://www.zhihu.com', headers=headers, verify=False)

Disabling redirects: allow_redirects=False

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
}
r = requests.get('https://www.zhihu.com', headers=headers, verify=False, allow_redirects=False)  # do not follow redirects
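When the same headers are needed on every request, they can be set once on a session instead of being repeated. This is a sketch of that variant, prepared without sending anything. The Referer value is a hypothetical per-request header added for illustration.

```python
import requests

session = requests.Session()
# Set the User-Agent once; every request made through this session inherits it.
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
})

# Per-request headers are merged with the session defaults.
request = requests.Request('GET', 'https://www.zhihu.com',
                           headers={'Referer': 'https://www.zhihu.com/'})
prepared = session.prepare_request(request)

# Header lookup is case-insensitive.
print(prepared.headers['user-agent'])
print(prepared.headers['Referer'])
```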

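Setting proxies

Plain proxy

proxies = {'http': 'http://183.232.188.18:80', 'https': 'http://183.232.188.18:80'}
r = requests.get('http://www.baidu.com', proxies=proxies)  # send the request through the proxy

Proxy with a password

import requests

proxies = {
    'http': 'http://user:password@127.0.0.1:9743/',
    'https': 'http://user:password@127.0.0.1:9743/',  # needed because the target URL below is HTTPS
}
response = requests.get('https://www.taobao.com', proxies=proxies)
print(response.status_code)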
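If the proxy username or password contains characters such as '@' or ':', they need to be percent-encoded before being placed in the proxy URL. The helper below is a sketch under that assumption. The name build_proxies and the sample password are hypothetical.

```python
from urllib.parse import quote


def build_proxies(host, port, user, password):
    """Return a proxies dict for an HTTP proxy that requires credentials."""
    # Percent-encode the credentials so characters such as '@' or ':'
    # cannot be mistaken for URL delimiters.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{credentials}@{host}:{port}"
    # Use the same proxy for both plain HTTP and HTTPS targets.
    return {'http': proxy_url, 'https': proxy_url}


proxies = build_proxies('127.0.0.1', 9743, 'user', 'p@ss:word')
print(proxies['http'])
```

The resulting dictionary is passed to requests.get(..., proxies=proxies) as in the examples above.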

SOCKS proxies

Since version 2.10.0, Requests supports proxies that use the SOCKS protocol. To use them, install the extra dependency:

pip install requests[socks]

A SOCKS proxy is used in the same way as an HTTP proxy:

import requests

proxies = {
    'http': 'socks5://user:pass@host:port',
    'https': 'socks5://user:pass@host:port',
}
requests.get('http://example.org', proxies=proxies)

Converting JSON data

r = requests.get('http://httpbin.org/ip')
print(r.json())  # when the response body is JSON, json() parses it into a dictionary
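json() raises an exception when the body is not valid JSON, for example when a server returns an HTML error page. A defensive wrapper might look like this sketch. The name parse_json and the default argument are assumptions for illustration.

```python
import requests


def parse_json(response, default=None):
    """Return the decoded JSON body, or `default` if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:
        # json() raises a ValueError subclass when the body cannot be decoded.
        return default
```

For instance, parse_json(requests.get('http://httpbin.org/ip'), default={}) would give a dictionary either way.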

File upload

import requests

# Attach the file to the body of the POST request via the files argument.
with open('favicon.ico', 'rb') as f:
    files = {'file': f}
    response = requests.post('http://httpbin.org/post', files=files)
print(response.text)
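To see what an upload looks like on the wire, the request can be prepared without being sent. In this sketch an in-memory io.BytesIO object stands in for a real file. The file name, its contents, and the extra 'note' form field are hypothetical.

```python
import io
import requests

# An in-memory stand-in for a file opened with open(path, 'rb').
fake_file = io.BytesIO(b'hello requests')

# Each files value may be a (filename, file object, content type) tuple.
files = {'file': ('hello.txt', fake_file, 'text/plain')}
data = {'note': 'uploaded by script'}  # ordinary form fields travel in the same body

prepared = requests.Request('POST', 'http://httpbin.org/post',
                            files=files, data=data).prepare()

print(prepared.headers['Content-Type'])  # multipart/form-data with a generated boundary
print(len(prepared.body), 'bytes in the encoded body')
```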

Uploading multiple multipart-encoded files

You can send several files in one request. For example, suppose you want to upload several image files to an HTML form with a multiple-file field named "images".

To do this, pass the files as a list of tuples, where each tuple has the form (form_field_name, file_info):

>>> url = 'http://httpbin.org/post'
>>> multiple_files = [
...     ('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
...     ('images', ('bar.png', open('bar.png', 'rb'), 'image/png'))]
>>> r = requests.post(url, files=multiple_files)
>>> r.text
{
  ...
  'files': {'images': 'data:image/png;base64,iVBORw ....'}
  'Content-Type': 'multipart/form-data; boundary=3131623adb2043caaeb5538cc7aa0b3a',
  ...
}

Authentication

Some sites pop up a username and password prompt and only allow access after you enter credentials (HTTP Basic authentication).

import requests
from requests.auth import HTTPBasicAuth

r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
# Shorthand: r = requests.get('http://120.27.34.24:9001', auth=('user', '123'))
print(r.status_code)
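What the auth argument actually does can be seen by preparing the request without sending it. This is a minimal sketch that reuses the address and credentials from the example above.

```python
import requests

# auth=('user', '123') is shorthand for HTTPBasicAuth('user', '123').
prepared = requests.Request(
    'GET', 'http://120.27.34.24:9001', auth=('user', '123')
).prepare()

# The credentials are sent base64-encoded (not encrypted) in the Authorization header.
print(prepared.headers['Authorization'])
```

Because base64 is trivially reversible, Basic authentication is only safe over HTTPS.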

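Downloading large files

When you download a large file or a large amount of data with requests.get, stream mode is recommended.

With stream=False (the default), get downloads the whole body immediately and holds it in memory; a very large file can exhaust memory.

With stream=True, the body is not downloaded immediately. The download starts only when you iterate over it with iter_content or iter_lines, or access the content attribute. Note that the connection must stay open until the body has been downloaded.

iter_content: iterates over the body chunk by chunk

iter_lines: iterates over the body line by line

Downloading a large file with either method keeps memory use low, because only a small part of the data is read at a time.

Example code:

r = requests.get(url_file, stream=True)
with open("file_path", "wb") as f:
    for chunk in r.iter_content(chunk_size=512):
        if chunk:
            f.write(chunk)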
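The streaming download can be wrapped in reusable functions that also close the connection and check the status code. This is a sketch, not the only way to do it. The function names save_response and download_file, the default chunk size, and the timeout are assumptions for illustration.

```python
import requests


def save_response(response, path, chunk_size=8192):
    """Write a streamed response body to `path`; return the number of bytes written."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    return written


def download_file(url, path, timeout=10):
    """Download `url` to `path` without holding the whole body in memory."""
    # The with block closes the connection even if writing fails part-way.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()  # fail early on 4xx/5xx responses
        return save_response(response, path)
```

A call such as download_file(url_file, 'file_path') then replaces the loop shown above.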

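Example: simulating a GitHub login with requests

'''Approach: logging in to GitHub requires the cookies from the login page and a User-Agent header, and the POST form contains a token parameter that can only be obtained by requesting the login page first.'''
import re
import requests
import urllib3

urllib3.disable_warnings()  # suppress warnings


def get_params():
    start_url = 'https://github.com/login'  # get the cookies and the token from the login page
    response = requests.get(start_url, verify=False)  # SSL verification turned off
    cookies = response.cookies
    # print(response.text)
    token = re.findall(r'', response.text)[0]  # extract the token with a regular expression (the pattern is missing here)
    return cookies, token


def login():
    post_url = 'https://github.com/session'  # the URL the login form is actually submitted to
    cookies, token = get_params()
    # The headers should include Referer, which tells the server the request came from that link (anti-hotlinking check).
    headers = {
        'Host': 'github.com',
        'Referer': 'https://github.com/login',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
        'Accept-Encoding': 'gzip, deflate, br',
    }
    # data was obtained by capturing the login request; these are the form parameters submitted at login.
    data = {
        'commit': 'Sign in',
        'utf8': '✓',
        'authenticity_token': token,
        'login': 'xxxxxx',
        'password': 'xxxxxxxx',
    }
    r = requests.post(url=post_url, data=data, headers=headers, cookies=cookies, verify=False)
    print(r.text)


if __name__ == '__main__':
    login()

Finally, search the printed text for "Start a project" (it appears on the GitHub home page when you open it in a browser).

If it is found, the login succeeded!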
