Python Web Scraping Engineer -- Step-by-Step Tutorial -- 04 Sending GET Requests.py

Author: 要争就争第一

Article link: https://blog.csdn.net/m0_75084899/article/details/138046484

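This article shows how to send GET requests in Python with the Request object from the urllib library. It focuses on three ways to URL-encode Chinese characters so that the crawler retrieves the right data: copying the URL from the browser, urllib.parse.quote, and urllib.parse.urlencode.

(Summary generated automatically by CSDN.)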

There are several ways to fetch data over the web. This part covers the GET method.

Most of the HTML, images, JS, CSS, and other resources delivered to a browser are requested with the GET method; it is the main way to fetch data.
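As a quick check of how urllib chooses the method: the `Request` object sends GET when no request body (`data`) is given and switches to POST when one is. The minimal sketch below only inspects `Request.get_method()` and sends nothing over the network; the URL is simply the Bing search address used later in this article.

```python
from urllib.request import Request

url = "https://cn.bing.com/search?pglt=41&q=python"

# No request body: urllib will send a GET request
get_req = Request(url)
print(get_req.get_method())   # GET

# Passing a body (bytes) makes urllib default to POST
post_req = Request(url, data=b"q=python")
print(post_req.get_method())  # POST

# The method can also be set explicitly
head_req = Request(url, method="HEAD")
print(head_req.get_method())  # HEAD
```

This is why none of the examples below pass a `data` argument: leaving it out is what makes them GET requests.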

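The parameters of a GET request are carried in the URL itself. If they contain Chinese characters, they must be percent-encoded (URL-encoded) first, which can be done in the following ways.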
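Percent-encoding takes the UTF-8 bytes of each character that is not allowed as-is in a URL and writes every byte as `%` followed by two uppercase hex digits. The hand-written function below (the name `percent_encode` is hypothetical, for illustration only) is a simplified sketch of that rule, not a replacement for `urllib.parse.quote`; it reproduces the `python%E7%88%AC%E8%99%AB` string used in Method 1.

```python
def percent_encode(text: str) -> str:
    """Simplified sketch: keep unreserved ASCII characters, %XX-encode every other UTF-8 byte."""
    unreserved = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
    parts = []
    for byte in text.encode("utf-8"):  # iterating over bytes yields ints
        if byte in unreserved:
            parts.append(chr(byte))
        else:
            parts.append(f"%{byte:02X}")
    return "".join(parts)

print("爬虫".encode("utf-8"))        # b'\xe7\x88\xac\xe8\x99\xab'
print(percent_encode("python爬虫"))  # python%E7%88%AC%E8%99%AB
```

Each Chinese character here takes three UTF-8 bytes, which is why "爬虫" becomes six `%XX` groups. In real code, use the standard library functions shown in Methods 2 and 3.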

Method 1: copy the URL straight from the browser's address bar. When pasted, the Chinese text comes out already encoded.

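from urllib.request import urlopen, Request

# The browser address bar shows: https://cn.bing.com/search?pglt=41&q=python爬虫
# Copied and pasted, the Chinese keyword is already percent-encoded:
url = 'https://cn.bing.com/search?pglt=41&q=python%E7%88%AC%E8%99%AB'
# Send a browser User-Agent instead of urllib's default "Python-urllib/3.x"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0'
headers = {'User-Agent': user_agent}

request = Request(url, headers=headers)  # no data argument, so this is a GET request
response = urlopen(request)
page = response.read().decode()[:1500]  # decode as UTF-8 (the default) and keep the first 1500 characters

print(page)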
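Because a URL copied from the browser is hard to read once encoded, it helps to decode it back and confirm it really contains the intended keyword. A minimal sketch using the standard library functions `unquote`, `urlsplit`, and `parse_qs` on the URL from the example above:

```python
from urllib.parse import unquote, urlsplit, parse_qs

url = 'https://cn.bing.com/search?pglt=41&q=python%E7%88%AC%E8%99%AB'

# Decode the whole URL to check it by eye
print(unquote(url))  # https://cn.bing.com/search?pglt=41&q=python爬虫

# Or split out the query string and decode each parameter separately
params = parse_qs(urlsplit(url).query)
print(params)  # {'pglt': ['41'], 'q': ['python爬虫']}
```

`parse_qs` returns a list for every key because a query string may repeat the same parameter.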

Method 2: urllib.parse.quote() encodes a single value

from urllib.request import urlopen, Request
from urllib.parse import quote

# urllib.parse.quote() encodes a single value
arg = "python爬虫师"
url = f'https://cn.bing.com/search?pglt=41&q={quote(arg)}'  # ...&q=python%E7%88%AC%E8%99%AB%E5%B8%88
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0'
headers = {'User-Agent': user_agent}

request = Request(url, headers=headers)
response = urlopen(request)
page = response.read().decode()[:1500]

print(page)
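`quote()` is the right tool when you assemble the query string yourself, but two of its defaults are worth knowing: it leaves `/` unescaped (`safe='/'`), and it encodes spaces as `%20`, whereas `quote_plus()` encodes them as `+` and escapes `/` too. A short comparison; the sample strings other than `arg` are illustrative:

```python
from urllib.parse import quote, quote_plus

arg = "python爬虫师"
print(quote(arg))  # python%E7%88%AC%E8%99%AB%E5%B8%88

# Spaces: %20 with quote(), + with quote_plus()
print(quote("python 爬虫"))       # python%20%E7%88%AC%E8%99%AB
print(quote_plus("python 爬虫"))  # python+%E7%88%AC%E8%99%AB

# Slashes: kept by quote() by default, escaped with safe="" or by quote_plus()
print(quote("a/b"))           # a/b
print(quote("a/b", safe=""))  # a%2Fb
print(quote_plus("a/b"))      # a%2Fb
```

When a parameter value may itself contain `/`, passing `safe=""` (or using `quote_plus()`) keeps it from being read as part of the URL path structure.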

Method 3: urllib.parse.urlencode() encodes key-value pairs

from urllib.request import urlopen, Request
from urllib.parse import urlencode

# urllib.parse.urlencode() encodes key-value pairs and already emits "q=...",
# so the URL template must not contain "q=" again (that would give "q=q=...")
arg = {"q": "python爬虫大师"}
url = f'https://cn.bing.com/search?pglt=41&{urlencode(arg)}'
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0'
headers = {'User-Agent': user_agent}

request = Request(url, headers=headers)
response = urlopen(request)
page = response.read().decode()[:1500]

print(page)
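With `urlencode()` it is often simpler to put every query parameter, including `pglt`, into one dict and let it build the whole query string. It writes each `key=value` pair itself, joins them with `&`, and by default encodes values with `quote_plus`. A minimal sketch reusing this article's parameters; `parse_qs` is used only to show that the string decodes back to the original values:

```python
from urllib.parse import urlencode, parse_qs

params = {"pglt": 41, "q": "python爬虫大师"}
query = urlencode(params)  # non-string values such as 41 are converted with str()
print(query)  # pglt=41&q=python%E7%88%AC%E8%99%AB%E5%A4%A7%E5%B8%88

url = f"https://cn.bing.com/search?{query}"

# Decoding the query string recovers the original values (as strings)
print(parse_qs(query))  # {'pglt': ['41'], 'q': ['python爬虫大师']}

# Spaces become '+' because urlencode uses quote_plus by default
print(urlencode({"q": "python 爬虫"}))  # q=python+%E7%88%AC%E8%99%AB
```

Keeping all parameters in a dict means the URL template never contains a hand-written `key=`, so it cannot accidentally be duplicated.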