Python Web Scraping Study Notes, Part 2: Making HTTP Requests and Passing Parameters

盛桃云

Article link: https://blog.csdn.net/bowei026/article/details/90183795

Getting the response content

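The response object exposes the following attributes and methods:
text: the response body decoded to a string
status_code: the HTTP status code
encoding: the character encoding used to decode text
content: the response body as raw bytes; printing it shows control characters as escape sequences such as \n (newline), \t (tab), and \r (carriage return)
r.json(): if the body is a JSON string, parses it with the JSON decoder built into Requests and returns the resulting Python object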
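
The relationship between these attributes can be mimicked without a network call. The sketch below only approximates what Requests does: the byte string is a made-up body standing in for r.content, and utf-8 is an assumed value for r.encoding.

```python
import json

# Made-up raw body standing in for r.content (bytes, as received from the server).
content = b'{\n  "args": {"key1": "value1"}\n}\n'

# Assumed encoding standing in for r.encoding.
encoding = 'utf-8'

# Roughly what r.text is: the raw bytes decoded with that encoding.
text = content.decode(encoding)

# Roughly what r.json() returns: the decoded body parsed into Python objects.
data = json.loads(text)

print(repr(content))   # the bytes repr shows \n as an escape sequence
print(text)            # the decoded string prints real line breaks
print(data['args'])    # a regular Python dict
```

This is why printing r.content shows literal \n sequences while printing r.text shows line breaks.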

Passing request parameters

import requests

# Query parameters to append to the URL (named params so the built-in dict is not shadowed)
params = {'key1': 'value1', 'key2': 'value2'}
link = 'http://httpbin.org/get'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
    'Content-Type': 'text/html',
}
r = requests.get(link, headers=headers, params=params)
print(r.content)      # raw bytes of the response body
print(r.status_code)  # HTTP status code
print(r.json())       # body parsed from JSON into a dict
Output:
b'{\n "args": {\n "key1": "value1", \n "key2": "value2"\n }, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Content-Type": "text/html", \n "Host": "httpbin.org", \n "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"\n }, \n "origin": "223.72.90.250, 223.72.90.250", \n "url": "https://httpbin.org/get?key1=value1&key2=value2"\n}\n'
200
{'args': {'key1': 'value1', 'key2': 'value2'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Content-Type': 'text/html', 'Host': 'httpbin.org', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}, 'origin': '223.72.90.250, 223.72.90.250', 'url': 'https://httpbin.org/get?key1=value1&key2=value2'}

The output shows that the params argument passed the request parameters correctly as key1=value1&key2=value2. To pretty-print compact JSON, an online formatter such as http://www.bejson.com/ can be used.
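
Both steps can also be reproduced locally with the standard library. The sketch below uses urlencode as a stand-in to show how a dict becomes a query string, and json.dumps as an offline alternative to an online formatter; the compact JSON string is a shortened, made-up version of the response above.

```python
import json
from urllib.parse import urlencode

params = {'key1': 'value1', 'key2': 'value2'}

# Build the same kind of query string that ends up after the "?" in the URL.
query = urlencode(params)
url = 'http://httpbin.org/get?' + query
print(url)

# Pretty-print a compact JSON document locally instead of using an online tool.
compact = '{"args": {"key1": "value1", "key2": "value2"}, "url": "https://httpbin.org/get?key1=value1&key2=value2"}'
pretty = json.dumps(json.loads(compact), indent=2, ensure_ascii=False)
print(pretty)
```

With a live response, the same formatting can be applied to the dict returned by r.json().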

Customizing request headers

The example above passed a User-Agent through the headers argument. More header fields can be passed the same way, for example:
import requests

link = 'http://httpbin.org/get'
# Host is overridden here for illustration; normally it should match the server in the URL
headers = {
    'Host': 'www.santostang.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
    'Content-Type': 'text/html',
}
r = requests.get(link, headers=headers)
print(r.status_code)
Any field listed under Request Headers in the browser's developer tools can be added in the same way.
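
To check which headers a request would carry without sending anything, the request can be prepared and inspected offline. This is a minimal sketch: the User-Agent and Accept-Language values are hypothetical, and a request that is actually sent through requests.get would additionally carry the library's defaults such as Accept-Encoding.

```python
import requests

link = 'http://httpbin.org/get'
headers = {
    'User-Agent': 'my-crawler/0.1',          # hypothetical value, for illustration only
    'Accept-Language': 'en-US,en;q=0.8',     # hypothetical value, for illustration only
}

# Build the request without sending it, so the headers can be inspected offline.
prepared = requests.Request('GET', link, headers=headers).prepare()

# Header names are matched case-insensitively.
for name, value in prepared.headers.items():
    print(name + ': ' + value)
```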

Sending a POST request

import requests

# Values to send in the request body (named data so the built-in dict is not shadowed)
data = {'key1': 'value1', 'key2': 'value2'}
headers = {
    'Host': 'www.santostang.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
    'Content-Type': 'text/html',
}
r = requests.post('http://httpbin.org/post', headers=headers, data=data)
print(r.text)
Output:
{
  "args": {},
  "data": "key1=value1&key2=value2",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "23",
    "Content-Type": "text/html",
    "Host": "www.santostang.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36"
  },
  "json": null,
  "origin": "223.72.90.250, 223.72.90.250",
  "url": "https://www.santostang.com/post"
}
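A POST request passes its parameter values through the data argument. In this run the values appear under "data" rather than "form" because the headers force Content-Type to text/html; without that override, Requests sends a dict as application/x-www-form-urlencoded and httpbin lists the fields under "form". The echoed url shows www.santostang.com, presumably because the Host header was overridden as well.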
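
The effect of the Content-Type header on a POST body can be inspected offline by preparing requests without sending them. This is a sketch for comparison only; the json argument is not used in the original example and is shown here as an alternative way to send a dict.

```python
import json
import requests

url = 'http://httpbin.org/post'
payload = {'key1': 'value1', 'key2': 'value2'}

# data=dict with no Content-Type override: sent as a URL-encoded form.
form_req = requests.Request('POST', url, data=payload).prepare()
print(form_req.headers['Content-Type'], form_req.body)

# data=dict with Content-Type forced to text/html, as in the example above:
# the body is identical, but the server is told it is not a form.
html_req = requests.Request('POST', url, data=payload,
                            headers={'Content-Type': 'text/html'}).prepare()
print(html_req.headers['Content-Type'], html_req.body)

# json=dict: the body is serialized as JSON instead.
json_req = requests.Request('POST', url, json=payload).prepare()
print(json_req.headers['Content-Type'])
print(json.loads(json_req.body))   # parses back into the original dict
```

The URL-encoded body is 23 characters long, which matches the Content-Length reported in the output above.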

Setting a timeout

import requests

r = requests.post('http://httpbin.org/post', timeout=0.001)
print(r.text)
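Output:
Because the timeout of 0.001 seconds is far too short, the request fails. The traceback begins with socket.timeout: timed out, and Requests raises the error as a subclass of requests.exceptions.Timeout.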
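
In a crawler, a timeout is usually caught rather than allowed to stop the program. The following is a minimal sketch under that assumption; post_with_timeout is a hypothetical helper name, not part of Requests.

```python
import requests

def post_with_timeout(url, timeout):
    """Return the response body as text, or None if the request fails."""
    try:
        r = requests.post(url, timeout=timeout)
        return r.text
    except requests.exceptions.Timeout:
        # Raised when connecting or reading takes longer than `timeout` seconds.
        print('timed out after', timeout, 'seconds:', url)
    except requests.exceptions.RequestException as exc:
        # Any other Requests failure, for example no network connection.
        print('request failed:', exc)
    return None
```

Calling post_with_timeout('http://httpbin.org/post', timeout=5) returns the body on success and None on failure. The timeout argument of Requests also accepts a (connect, read) tuple when the two phases need different limits.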

Scraping the Douban Top 250 movies

import requests
from bs4 import BeautifulSoup

def get_movies():
    headers = {
        'Host': 'movie.douban.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
    }
    movies = []
    # The list is paginated, 25 movies per page: start=0, 25, ..., 225
    for i in range(0, 10):
        # Fetching a page is a GET request
        r = requests.get('https://movie.douban.com/top250?start=' + str(i * 25), headers=headers)

        # The 'lxml' parser requires the lxml package to be installed
        soup = BeautifulSoup(r.text, 'lxml')
        # Each title is the first <span> inside the <a> of a <div class="hd">
        div_list = soup.find_all('div', class_='hd')
        for div in div_list:
            title = div.a.span.text
            movies.append(title)

    return movies


movies = get_movies()
for i, movie in enumerate(movies):
    print(str(i + 1) + "==" + movie)

Running the program prints the titles of Douban's Top 250 movies.
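
The pagination used by the loop can be sketched on its own: ten pages of 25 movies each, selected by the start query parameter. The constant and function names below are illustrative, not part of the original script.

```python
BASE_URL = 'https://movie.douban.com/top250'
PAGE_SIZE = 25   # movies per page, as implied by start = i * 25
TOTAL = 250      # size of the whole list

def page_urls():
    """Return the URL of every page of the Top 250 list, in order."""
    return [BASE_URL + '?start=' + str(start)
            for start in range(0, TOTAL, PAGE_SIZE)]

urls = page_urls()
print(len(urls))
print(urls[0])
print(urls[-1])
```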

The BeautifulSoup documentation is available at https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
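
The title extraction can be tried offline on a small HTML fragment. The markup below is made up to mimic the structure the script relies on (a div with class hd containing a link and spans); the real page may differ. The built-in html.parser is used here so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Made-up fragment imitating the structure the scraper expects.
html = (
    '<div class="hd"><a href="#"><span class="title">Movie A</span>'
    '<span class="other">Alternate title</span></a></div>'
    '<div class="hd"><a href="#"><span class="title">Movie B</span></a></div>'
)

soup = BeautifulSoup(html, 'html.parser')

# div.a is the first <a> inside the div; .span is the first <span> inside that link.
titles = [div.a.span.text for div in soup.find_all('div', class_='hd')]

for i, title in enumerate(titles):
    print(str(i + 1) + "==" + title)
```

Only the first span of each link is read, so alternate titles in later spans are ignored.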
