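There are several ways to retrieve data over the network; this section covers the GET method.
Most of what is delivered to the browser (HTML, images, JS, CSS, …) is requested with the GET method. It is the primary way of retrieving data.
The parameters of a GET request all appear in the URL. If they contain Chinese characters, those characters must be percent-encoded (URL-encoded), which can be done in any of the following ways.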
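As a small illustration of GET being the default, the sketch below builds two `urllib.request.Request` objects without sending anything. The `example.com` URL and the `key=value` payload are placeholders chosen for this example, not part of the original tutorial.

```python
from urllib.request import Request

# Hypothetical URL used only for illustration; no request is actually sent.
example_url = 'https://example.com/index.html'

# With no data argument, urllib builds a GET request.
get_request = Request(example_url)
print(get_request.get_method())

# Supplying a bytes payload switches the default method to POST.
post_request = Request(example_url, data=b'key=value')
print(post_request.get_method())
```

This is why the snippets in this section never name the method explicitly: they pass only a URL and headers, so urllib treats them as GET requests.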
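Method 1: copy the URL straight from the browser's address bar; the Chinese characters are percent-encoded automatically when pasted.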
from urllib.request import urlopen,Request
# In the browser's address bar the URL reads: https://cn.bing.com/search?pglt=41&q=python爬虫
url = 'https://cn.bing.com/search?pglt=41&q=python%E7%88%AC%E8%99%AB'
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0'
headers = { 'User-Agent' : user_agent }
request = Request(url, headers=headers)
response = urlopen(request)
page = response.read().decode()[:1500]
print(page)
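To check what a pasted URL actually contains, the query string can be decoded back into readable text. The following is a minimal sketch that works offline; it uses only `urlsplit` and `unquote` from the standard library, and the variable names are chosen for this example.

```python
from urllib.parse import unquote, urlsplit

# The percent-encoded URL as pasted from the browser.
url = 'https://cn.bing.com/search?pglt=41&q=python%E7%88%AC%E8%99%AB'

# Take only the part after "?" and turn the %XX sequences back into text.
query = urlsplit(url).query
decoded = unquote(query)
print(decoded)
```

Each Chinese character becomes three `%XX` groups because it is encoded as UTF-8 first, and UTF-8 uses three bytes for these characters.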
Method 2: use urllib.parse.quote() to encode a single value.
from urllib.request import urlopen,Request
from urllib.parse import quote
# urllib.parse.quote() percent-encodes a single value
arg = "python爬虫师"
url = f'https://cn.bing.com/search?pglt=41&q={quote(arg)}'
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0'
headers = { 'User-Agent' : user_agent }
request = Request(url, headers=headers)
response = urlopen(request)
page = response.read().decode()[:1500]
print(page)
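A few details of `quote()` are easy to miss, so here is a short offline sketch of how it treats spaces, slashes, and query separators. The sample strings are made up for illustration; `quote_plus()` is a related standard-library function that the original snippet does not use.

```python
from urllib.parse import quote, quote_plus

# quote() encodes a space as %20, while quote_plus() uses "+".
with_space = quote('python 爬虫')
with_plus = quote_plus('python 爬虫')
print(with_space)
print(with_plus)

# "/" is left alone by default; pass safe='' to encode it as well.
slash_kept = quote('a/b')
slash_encoded = quote('a/b', safe='')
print(slash_kept, slash_encoded)

# "&" and "=" inside a value are encoded, so they cannot break the query string.
separators = quote('a&b=c')
print(separators)
```

When the value is going into a query string rather than a path, `safe=''` or `quote_plus()` is usually the safer choice, since a literal `/` or space in a search term is then encoded too.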
Method 3: use urllib.parse.urlencode() to encode key-value pairs.
from urllib.request import urlopen,Request
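from urllib.parse import urlencode
# urllib.parse.urlencode() encodes key-value pairs into a "key=value" query string
arg = {"q":"python爬虫大师"}
# urlencode() already emits "q=...", so do not write "q=" by hand
url = f'https://cn.bing.com/search?pglt=41&{urlencode(arg)}'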
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0'
headers = { 'User-Agent' : user_agent }
request = Request(url, headers=headers)
response = urlopen(request)
page = response.read().decode()[:1500]
print(page)
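Because `urlencode()` takes a whole dictionary, every parameter can be passed through it instead of writing part of the query string by hand. The sketch below builds the same Bing URL that way and decodes it again with `parse_qs` as a check. It runs offline, and the variable names are chosen for this example.

```python
from urllib.parse import parse_qs, urlencode

# All query parameters in one dictionary, including the fixed pglt value.
params = {'pglt': 41, 'q': 'python爬虫大师'}

# urlencode() joins the pairs with "&" and percent-encodes each value.
query = urlencode(params)
url = f'https://cn.bing.com/search?{query}'
print(url)

# parse_qs() reverses the encoding; every value comes back as a list of strings.
round_trip = parse_qs(query)
print(round_trip)
```

Note that `urlencode()` encodes spaces as `+`, so a query built with it can look slightly different from one built with `quote()`, even though servers normally treat both forms the same way in a query string.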