Thoroughly solving encoding problems in Python CGI programming

Answering this for late-comers because I don't think that the posted answers get to the root of the problem, which is the lack of locale environment variables in a CGI context. I'm using Python 3.2.

  1. open() opens file objects in text (str) or binary (bytes) mode for reading and/or writing. In text mode, the encoding used to encode strings written to the file and to decode bytes read from it can be given in the call. If it isn't, it is determined by locale.getpreferredencoding(), which on Linux uses the encoding from your locale environment settings, normally UTF-8 (e.g. from LANG=en_US.UTF-8):

    >>> f = open('foo', 'w')   # open file for writing in text mode
    >>> f.encoding
    'UTF-8'                    # encoding is from the environment
    >>> f.write('€')           # write a Unicode string
    1
    >>> f.close()
    >>> exit()
    user@host:~$ hd foo
    00000000  e2 82 ac         |...|   # data is UTF-8 encoded
  2. sys.stdout is in fact a file opened for writing in text mode, with an encoding based on locale.getpreferredencoding(). You can write strings to it just fine, and they are encoded to bytes using sys.stdout's encoding. print() writes to sys.stdout by default; print() itself has no encoding, it is the file it writes to that has one:

    >>> import sys
    >>> sys.stdout.encoding
    'UTF-8'                    # encoding is from the environment
    >>> exit()
    user@host:~$ python3 -c 'print("€")' > foo
    user@host:~$ hd foo
    00000000  e2 82 ac 0a      |....|  # data is UTF-8 encoded; \n is from print()

    You cannot write bytes to sys.stdout; use sys.stdout.buffer.write() for that. If you try to write bytes with sys.stdout.write(), it raises a TypeError, and if you pass bytes to print(), print() converts the bytes object to its string representation, so an escape sequence like \xff is output as the four characters \, x, f, f:

    user@host:~$ python3 -c 'print(b"\xe2\xf82\xac")' > foo
    user@host:~$ hd foo
    00000000  62 27 5c 78 65 32 5c 78  66 38 32 5c 78 61 63 27  |b'\xe2\xf82\xac'|
    00000010  0a                                                |.|
  3. In a CGI script you need to write to sys.stdout, and you can use print() to do it. However, a CGI script process in Apache has no locale environment settings, because they are not part of the CGI specification. The sys.stdout encoding therefore defaults to ANSI_X3.4-1968, in other words ASCII, and if you try to print() a string that contains non-ASCII characters you'll get "UnicodeEncodeError: 'ascii' codec can't encode character...: ordinal not in range(128)".

  4. A simple solution is to pass the Apache process's LANG environment variable through to the CGI script with the PassEnv directive of mod_env, in the server or virtual host configuration: PassEnv LANG. On Debian/Ubuntu, make sure that the line ". /etc/default/locale" in /etc/apache2/envvars is uncommented, so that Apache runs with the system default locale rather than the C (POSIX) locale, which also uses ASCII encoding. The following CGI script should then run without errors in Python 3.2:

    #!/usr/bin/env python3
    import sys
    # CGI header, then the blank line that ends the headers
    print('Content-Type: text/html; charset=utf-8')
    print()
    print('<html><body><pre>' + sys.stdout.encoding + '</pre>h€lló wörld</body></html>')
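The mechanics in points 1 to 3 can be checked without Apache. sys.stdout is an io.TextIOWrapper around a binary buffer, so a TextIOWrapper over an io.BytesIO can stand in for it. This is a minimal sketch under that assumption: the helper names (bytes_written, fails_to_encode, write_bytes_raises_typeerror, printed) are hypothetical, and the "ascii" encoding only imitates a missing-locale CGI environment rather than reproducing one.

```python
import io


def bytes_written(text, encoding):
    """Return the bytes a text-mode stream with this encoding passes to its
    binary buffer, the way sys.stdout passes print() output to sys.stdout.buffer."""
    raw = io.BytesIO()
    # newline="\n" disables newline translation so the bytes are predictable
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    try:
        stream.write(text)  # the str -> bytes conversion happens here
        stream.flush()
        return raw.getvalue()
    finally:
        stream.detach()  # release raw without closing it


def fails_to_encode(text, encoding):
    """True if writing text raises UnicodeEncodeError, as print() does on an
    ASCII-encoded sys.stdout (point 3)."""
    try:
        bytes_written(text, encoding)
    except UnicodeEncodeError:
        return True
    return False


def write_bytes_raises_typeerror(data):
    """True if a text stream's write() rejects a bytes object (point 2)."""
    stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
    try:
        stream.write(data)
    except TypeError:
        return True
    return False


def printed(value):
    """Return the text print() writes for value; bytes go through str(),
    which gives their repr instead of decoding them."""
    out = io.StringIO()
    print(value, file=out)
    return out.getvalue()


if __name__ == "__main__":
    print(bytes_written("€", "utf-8"))  # b'\xe2\x82\xac', the bytes hd shows in point 1
    print(fails_to_encode("h€lló wörld", "ascii"))  # True
    print(write_bytes_raises_typeerror(b"\xe2\x82\xac"))  # True
    print(printed(b"\xe2\x82\xac"), end="")  # b'\xe2\x82\xac'
```

On Python 3.7 and later, PEP 538 (C locale coercion) and PEP 540 (UTF-8 mode) usually make an interpreter started without locale variables use UTF-8 for sys.stdout, so the ASCII fallback in point 3 mainly affects older versions such as the 3.2 used here.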

https://stackoverflow.com/questions/9322410/set-encoding-in-python-3-cgi-scripts

Reposted from: https://www.cnblogs.com/peter1994/p/7655315.html
