Passing Chinese in Python URLs: Handling URLs That Contain Chinese Characters in a Python Crawler

weixin_39766910

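While practicing with urllib, I ran into the problem of a URL containing Chinese characters. Take http://dotamax.com/ as an example: in the page source, the search box at the top has name=q. After you type something and click search, the input is passed with the GET method; for example, searching for "意" turns the URL into http://dotamax.com/search/?q=意. However, Chinese characters are not allowed in a URL, so use urllib.parse.quote to percent-encode them.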

import urllib.parse
import urllib.request

url = "http://dotamax.com/"
search = "search/?q=" + urllib.parse.quote("意")  # percent-encode only the keyword
html = urllib.request.urlopen(url + search)

With this, the page can be fetched normally.
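As a minimal sketch of the step above, the quoting can be wrapped in a small helper. The name build_search_url is hypothetical and not from the original; the base URL and the q parameter are taken from the example. Only the URL string is built here, no request is sent.

```python
import urllib.parse

BASE_URL = "http://dotamax.com/"  # site used in the example above


def build_search_url(keyword):
    """Hypothetical helper: return the search URL with the keyword percent-encoded."""
    # safe="" means even "/" inside the keyword is escaped,
    # which is usually what you want for a query value.
    return BASE_URL + "search/?q=" + urllib.parse.quote(keyword, safe="")


print(build_search_url("意"))
```

The resulting string can then be passed to urllib.request.urlopen as before.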

Note that you cannot simply call quote on the entire URL.

print(urllib.parse.quote("http://dotamax.com/search/?q=意"))

The code above prints:

http%3A//dotamax.com/search/%3Fq%3D%E6%84%8F

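As you can see, ':', '?', and '=' have all been percent-encoded as well, so only the trailing Chinese part should be passed to quote and then appended to the rest of the URL.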
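Another way to encode only the query part, not used in the original, is urllib.parse.urlencode, which takes the parameters as a dict and percent-encodes the values. The sketch below assumes the same q parameter as the example.

```python
from urllib.parse import urlencode

params = {"q": "意"}        # query parameters as a plain dict
query = urlencode(params)   # values are percent-encoded as UTF-8 by default
url = "http://dotamax.com/search/?" + query
print(url)
```

Note that urlencode encodes a space as "+" rather than "%20", which is the usual convention for form-style query strings.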

But there is a more convenient way:

import urllib.parse

b = b'/:?='

print(urllib.parse.quote("http://dotamax.com/search/?q=意", b))

The output is:

http://dotamax.com/search/?q=%E6%84%8F

This is the result we want. Running help on the quote function shows the following:

Help on function quote in module urllib.parse:

quote(string, safe='/', encoding=None, errors=None)

quote('abc def') -> 'abc%20def'

Each part of a URL, e.g. the path info, the query, etc., has a

different set of reserved characters that must be quoted.

RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists

the following reserved characters.

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |

"$" | ","

Each of these characters is reserved in some component of a URL,

but not necessarily in all of them.

By default, the quote function is intended for quoting the path

section of a URL. Thus, it will not encode '/'. This character

is reserved, but in typical usage the quote function is being

called on a path where the existing slash characters are used as

reserved characters.

string and safe may be either str or bytes objects. encoding must

not be specified if string is a str.

The optional encoding and errors parameters specify how to deal with

non-ASCII characters, as accepted by the str.encode method.

By default, encoding='utf-8' (characters are encoded with UTF-8), and

errors='strict' (unsupported characters raise a UnicodeEncodeError).

None
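The encoding and errors parameters described in the help text can be illustrated with a short sketch. It assumes a current Python 3 release, where encoding is accepted when the input is a str; GBK is chosen here only as an example of a non-default encoding.

```python
from urllib.parse import quote, unquote

utf8_form = quote("意")                  # default: the UTF-8 bytes are percent-encoded
gbk_form = quote("意", encoding="gbk")   # same character, GBK bytes instead

print(utf8_form)
print(gbk_form)

# Decoding must use the same encoding that was used for quoting.
assert unquote(gbk_form, encoding="gbk") == "意"

# With errors="strict" (the default), a character the codec cannot
# represent raises UnicodeEncodeError.
try:
    quote("意", encoding="ascii")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```

Which encoding a site expects depends on the server; UTF-8 is the common case, but some older Chinese sites expect GBK.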

safe lists the characters that quote leaves unencoded; it can be a str or a bytes object.
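A small sketch of the safe parameter, using the same URL as above: passing it as bytes or as str gives the same result. It also shows a caveat of quoting the whole URL this way: a keyword that itself contains a "safe" character such as "=" is left unescaped.

```python
from urllib.parse import quote

url = "http://dotamax.com/search/?q=意"

as_bytes = quote(url, safe=b"/:?=")   # safe given as bytes
as_str = quote(url, safe="/:?=")      # safe given as str
print(as_bytes == as_str)
print(as_str)

# Caveat: "=" is in safe, so it is not escaped inside the keyword either.
keyword = "a=b"
print(quote(keyword, safe="/:?="))    # "=" kept as-is
print(quote(keyword, safe=""))        # "=" escaped as %3D
```

For keywords that may contain reserved characters, quoting only the keyword and then concatenating is the safer choice.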

For more detailed usage, see:

http://www.nowamagic.net/academy/detail/1302863
