Passing Chinese characters in Python URLs: handling URLs that contain Chinese in a Python crawler

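While practicing with urllib, I ran into URLs that contain Chinese characters. Take http://dotamax.com/ as an example: looking at the page source, the search box at the top submits its input via GET, so if we search for "意", the URL becomes http://dotamax.com/search/?q=意. However, raw Chinese characters are not allowed in a URL; they have to be percent-encoded first, and urllib.parse.quote does exactly that.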

```python
import urllib.parse
import urllib.request

url = "http://dotamax.com/"
# Percent-encode only the Chinese keyword
search = "search/?q=" + urllib.parse.quote("意")
html = urllib.request.urlopen(url + search)
```

With this, the page can be fetched normally.
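To check the encoding without touching the network, the sketch below builds the same search URL and decodes the keyword again with urllib.parse.unquote. It assumes the site expects UTF-8, which is quote's default; whether dotamax.com actually does is an assumption.

```python
import urllib.parse

base = "http://dotamax.com/"
keyword = "意"

# Percent-encode only the keyword; quote() uses UTF-8 by default
encoded = urllib.parse.quote(keyword)
url = base + "search/?q=" + encoded

print(encoded)  # %E6%84%8F
print(url)      # http://dotamax.com/search/?q=%E6%84%8F

# unquote() reverses the transformation and recovers the original text
assert urllib.parse.unquote(encoded) == keyword
```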

Note that you should not call quote on the entire URL.

```python
import urllib.parse

print(urllib.parse.quote("http://dotamax.com/search/?q=意"))
```

The code above prints:

http%3A//dotamax.com/search/%3Fq%3D%E6%84%8F

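As you can see, ':', '?' and '=' have all been percent-encoded (only '/' survives, because it is quote's default safe character), so the result is no longer a usable URL. The fix is to call quote on the Chinese part only and append the result to the rest of the URL.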
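A minimal sketch of the "quote only the keyword" approach, wrapped in a hypothetical helper named build_search_url (not from the article). Passing safe="" is my own choice here: it also escapes '/', '&' and '=' typed inside the keyword, so user input cannot change the structure of the URL.

```python
import urllib.parse


def build_search_url(base: str, keyword: str) -> str:
    """Hypothetical helper: return base + 'search/?q=' + encoded keyword.

    safe="" tells quote() to escape every reserved character in the
    keyword, including '/', which it would otherwise leave alone.
    """
    return base + "search/?q=" + urllib.parse.quote(keyword, safe="")


print(build_search_url("http://dotamax.com/", "意"))
# http://dotamax.com/search/?q=%E6%84%8F
print(build_search_url("http://dotamax.com/", "a&b=c/d"))
# http://dotamax.com/search/?q=a%26b%3Dc%2Fd
```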

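But there is a more convenient way: pass the characters that should stay unescaped as quote's second argument, safe: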

```python
import urllib.parse

# Characters that quote() should leave unescaped (bytes or str both work)
b = b'/:?='
print(urllib.parse.quote("http://dotamax.com/search/?q=意", b))
```

The output is:

http://dotamax.com/search/?q=%E6%84%8F

This is exactly the result we want. Running help() on quote shows the following:
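One caveat with quoting the whole URL via safe: it assumes the URL still contains raw, unencoded text. The sketch below shows that running it on an already-encoded URL encodes each '%' again as '%25'. As an alternative not used in the article, urllib.parse.urlencode builds the query string from a dict and quotes each key and value separately.

```python
import urllib.parse

SAFE = "/:?="  # the same characters as b'/:?=' in the article

raw_url = "http://dotamax.com/search/?q=意"
encoded_once = urllib.parse.quote(raw_url, SAFE)
print(encoded_once)   # http://dotamax.com/search/?q=%E6%84%8F

# Pitfall: quoting an already-encoded URL encodes '%' again as '%25'
encoded_twice = urllib.parse.quote(encoded_once, SAFE)
print(encoded_twice)  # http://dotamax.com/search/?q=%25E6%2584%258F

# Alternative: let urlencode() build the query string from a dict
query_url = "http://dotamax.com/search/?" + urllib.parse.urlencode({"q": "意"})
print(query_url)      # http://dotamax.com/search/?q=%E6%84%8F
```

Note that urlencode quotes with quote_plus by default, so a space in a value becomes '+' rather than '%20'.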

Help on function quote in module urllib.parse:

quote(string, safe='/', encoding=None, errors=None)
quote('abc def') -> 'abc%20def'

Each part of a URL, e.g. the path info, the query, etc., has a
different set of reserved characters that must be quoted.

RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists
the following reserved characters.

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","

Each of these characters is reserved in some component of a URL,
but not necessarily in all of them.

By default, the quote function is intended for quoting the path
section of a URL. Thus, it will not encode '/'. This character
is reserved, but in typical usage the quote function is being
called on a path where the existing slash characters are used as
reserved characters.

string and safe may be either str or bytes objects. encoding must
not be specified if string is a str.

The optional encoding and errors parameters specify how to deal with
non-ASCII characters, as accepted by the str.encode method.
By default, encoding='utf-8' (characters are encoded with UTF-8), and
errors='strict' (unsupported characters raise a UnicodeEncodeError).
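The encoding and errors parameters described above can be tried directly. In this sketch, the GBK case is hypothetical (check the target site's charset before choosing an encoding); the other calls only show how quote forwards the parameters to str.encode and what happens with bytes input.

```python
import urllib.parse

# Default encoding is UTF-8
utf8 = urllib.parse.quote("意")
print(utf8)  # %E6%84%8F

# Hypothetical case: a site that expects GBK-encoded parameters
gbk = urllib.parse.quote("意", encoding="gbk")
print(gbk)

# errors is passed to str.encode(): "ignore" silently drops characters
# the chosen encoding cannot represent
ignored = urllib.parse.quote("意abc", encoding="ascii", errors="ignore")
print(ignored)  # abc

# With the default errors="strict", the same call raises instead
try:
    urllib.parse.quote("意abc", encoding="ascii")
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc.reason)

# encoding only applies to str input; combining it with bytes is an error
try:
    urllib.parse.quote(b"abc", encoding="utf-8")
except TypeError as exc:
    print("TypeError:", exc)
```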


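safe lists the characters that quote should leave unescaped, and it may be either a str or a bytes object. Also note that encoding and errors only apply when string is a str: passing encoding together with a bytes string raises a TypeError.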
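A small sketch of how safe behaves, using a made-up path "/意/a b" for illustration: the default safe='/' keeps path separators, safe="" escapes them too, and giving safe as a str or as bytes yields the same result.

```python
import urllib.parse

path = "/意/a b"  # illustrative path, not from the article

# Default safe='/': slashes stay, the Chinese text and the space are encoded
default_safe = urllib.parse.quote(path)
print(default_safe)  # /%E6%84%8F/a%20b

# safe="": nothing is exempt, so '/' becomes %2F as well
no_safe = urllib.parse.quote(path, safe="")
print(no_safe)       # %2F%E6%84%8F%2Fa%20b

# safe given as str or as bytes produces the same result
url = "http://dotamax.com/search/?q=意"
as_str = urllib.parse.quote(url, safe="/:?=")
as_bytes = urllib.parse.quote(url, safe=b"/:?=")
print(as_str == as_bytes)  # True
```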

More detailed usage can be found here:

http://www.nowamagic.net/academy/detail/1302863
