Several ways to urlencode/urldecode in the shell

1. First, why do we need urlencode at all?
http://www.genome.iastate.edu/community/angenmap/URLEncoding.html
URL Encoding is the process of converting string into valid URL format. Valid URL format means that the URL contains only what is termed "alpha | digit | safe | extra | escape" characters. You can read more about the what and the whys of these terms on the World Wide Web Consortium site: http://www.w3.org/Addressing/URL/url-spec.html and http://www.w3.org/International/francois.yergeau.html.
URL encoding is normally performed to convert data passed via html forms, because such data may contain special character, such as "/", ".", "#", and so on, which could either: a) have special meanings; or b) is not a valid character for an URL; or c) could be altered during transfer.   For instance, the "#" character needs to be encoded because it has a special meaning of that of an html anchor.   The <space> character also needs to be encoded because is not allowed on a valid URL format.   Also, some characters, such as "~" might not transport properly across the internet.
2. The urlencode encoding rules; here I follow the JDK documentation:
http://docs.oracle.com/javase/1.5.0/docs/api/java/net/URLEncoder.html
Utility class for HTML form encoding. This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format. For more information about HTML form encoding, consult the HTML specification.
When encoding a String, the following rules apply:
The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
The special characters ".", "-", "*", and "_" remain the same.
The space character " " is converted into a plus sign "+".
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
For example using UTF-8 as the encoding scheme the string "The string ü@foo-bar" would get converted to "The+string+%C3%BC%40foo-bar" because in UTF-8 the character ü is encoded as two bytes C3 (hex) and BC (hex), and the character @ is encoded as one byte 40 (hex).
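The rules quoted above can be sketched directly in bash. This is only an illustration under some assumptions: it needs bash 4 or later and the standard od and tr tools, and form_urlencode is a made-up helper name, not an existing command. It encodes whatever bytes the shell hands it, so the result depends on the encoding of the terminal.

```shell
# Hypothetical helper: encode one argument following the
# application/x-www-form-urlencoded rules quoted above.
form_urlencode() {
    local hex out= byte ch
    # Dump the argument as space-separated hex bytes, e.g. " 54 68 65 20 ..."
    hex=$(printf '%s' "$1" | od -An -v -tx1 | tr -d '\n')
    for byte in $hex; do
        case $byte in
            20)
                out+='+' ;;                      # space becomes a plus sign
            2a|2d|2e|5f|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
                # "*", "-", ".", "_", digits and ASCII letters stay unchanged
                printf -v ch "\\x$byte"
                out+=$ch ;;
            *)
                out+="%${byte^^}" ;;             # every other byte becomes %XY
        esac
    done
    printf '%s\n' "$out"
}

# On a UTF-8 terminal this reproduces the example from the JDK documentation:
form_urlencode 'The string ü@foo-bar'
# The+string+%C3%BC%40foo-bar
```

Working on the hex dump instead of on characters keeps the loop byte-oriented, so multi-byte characters are handled without any locale tricks.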
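3. How do we handle urlencode in the shell?
Most languages provide dedicated functions/APIs for URL encoding and decoding, e.g. Python, Java, and Perl.
The shell does not seem to have a dedicated command for this, but that is no obstacle for a shell user.
Don't forget the *nix design philosophy mentioned in the previous post: Where there is a shell, there is a way.
Once you understand how URL encoding and decoding work, handling them in the shell is easy. Enough talk, here is the code:
(1) Encoding: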
june@~ 23:40:29>
echo '手机' | tr -d '\n' | xxd -plain | sed 's/\(..\)/%\1/g'              #echo '手机' |tr -d '\n' |od -An -tx1|tr ' ' %
%ca%d6%bb%fa
june@~ 23:40:46>
Then try it in your browser:
http://www.baidu.com/s?wd=%ca%d6%bb%fa
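One thing the one-liner above hides is that it encodes whatever bytes the terminal produces. %ca%d6%bb%fa is the GBK encoding of '手机'; on a UTF-8 terminal the same command gives %e6%89%8b%e6%9c%ba. The sketch below makes that visible. percent_encode_bytes is a hypothetical helper name, and od is used instead of xxd only because it is more widely installed.

```shell
# Hypothetical helper: read raw bytes on stdin and write each one as %xx,
# regardless of which character set produced them.
percent_encode_bytes() {
    od -An -v -tx1 | tr -d ' \n' | sed 's/\(..\)/%\1/g'
    echo
}

# The UTF-8 bytes of '手机', written as octal escapes so the example does not
# depend on the encoding of the terminal or of this file.
printf '\346\211\213\346\234\272' | percent_encode_bytes
# %e6%89%8b%e6%9c%ba
```

Assuming an iconv build with GBK support, a UTF-8 terminal should be able to reproduce the GBK result shown above with: printf '%s' '手机' | iconv -f UTF-8 -t GBK | percent_encode_bytes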
(2) Decoding:
june@~ 23:50:11>
url="http://www.baidu.com/s?wd=%ca%d6%bb%fa"
printf '%b\n' "$(printf '%s' "$url" | sed 's/\\/\\\\/g;s/%\([0-9a-fA-F][0-9a-fA-F]\)/\\x\1/g')"
http://www.baidu.com/s?wd=手机
june@~ 23:50:13>
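The decoding step can also be wrapped in a small function. This is a sketch, assuming bash (its printf accepts \xHH escapes with %b, which is not guaranteed by POSIX); form_urldecode is a made-up name. Unlike the one-liner above, it also turns "+" back into a space, as the form-encoding rules in section 2 require, and passes the data as an argument to %b rather than as the printf format, so a stray "%" in the input is printed literally.

```shell
# Hypothetical helper: decode one application/x-www-form-urlencoded string.
form_urldecode() {
    local s=${1//+/ }      # '+' stands for a space in form encoding
    # Double literal backslashes first, then turn each %XY into \xXY
    s=$(printf '%s' "$s" | sed 's/\\/\\\\/g;s/%\([0-9a-fA-F][0-9a-fA-F]\)/\\x\1/g')
    printf '%b\n' "$s"     # %b expands the \xXY escapes into raw bytes
}

form_urldecode 'a%2Fb+c%3D100%'
# a/b c=100%
```

The decoded bytes are written as-is, so they only display correctly when the terminal uses the same encoding that was used when the string was encoded.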
4. In fact, the --data-urlencode option of curl, which we use all the time, already performs urlencode encoding:
june@~ 23:58:38>
curl -v -L -G --data-urlencode 'wd=手机' "http://www.baidu.com/s"
* About to connect() to www.baidu.com port 80 (#0)
*   Trying 220.181.112.143... connected
* Connected to www.baidu.com (220.181.112.143) port 80 (#0)
> GET /s?wd=%CA%D6%BB%FA HTTP/1.1
> User-Agent: curl/7.20.1 (i686-pc-cygwin) libcurl/7.20.1 OpenSSL/0.9.8r zlib/1.2.5 libidn/1.18 libssh2/1.2.5
> Host: www.baidu.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Sat, 05 May 2012 15:58:45 GMT
< Server: BWS/1.0
< Content-Length: 101019
< Content-Type: text/html;charset=gbk
< Cache-Control: private
< Set-Cookie: BAIDUID=F1BD65A89533B12403EAB701C73D4638:FG=1; expires=Sat, 05-May-42 15:58:44 GMT; path=/; domain=.baidu.com
< P3P: CP=" OTI DSP COR IVA OUR IND COM "
< Connection: Keep-Alive
<
<!DOCTYPE html><!--STATUS OK--><html><head>
<meta http-equiv="X-UA-Compatible" content="IE=7">
<meta http-equiv="content-type" content="text/html;charset=gb2312">
<title>百度搜索_手机      </title>
.......

