How to avoid Unicode pitfalls in Mojolicious

Reposted from: http://showmetheco.de/articles/2010/10/how-to-avoid-unicode-pitfalls-in-mojolicious.html

Unicode is hard. Unicode in Perl is even harder, because sometimes Perl is just too smart. Since Mojolicious is a web framework, it is no wonder that it should support Unicode really well. But even though Mojolicious tries hard to make things easy for the developer, one must really understand what's going on behind the scenes.

There is really good documentation on using Unicode in Perl. Raise your hand if you haven't read perluniintro, perlunicode, perlunifaq or perlunitut. I knew it.

A byte and a character are not the same thing. Sometimes a character takes exactly one byte (all ASCII characters), but most other Unicode characters take several. For example, in UTF-8 the dollar sign $ is 1 byte long, but the euro sign € is 3. In Perl we want to work with characters.
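
As a rough illustration of the byte/character difference, here is a minimal sketch using only the core Encode module. The variable names are made up for this example, and the characters are written as code point escapes so the encoding of the script file itself plays no role.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Characters given as code point escapes (illustrative variable names).
my $dollar = "\x{24}";     # DOLLAR SIGN
my $euro   = "\x{20AC}";   # EURO SIGN

# length() on a character string counts characters.
my $dollar_chars = length $dollar;
my $euro_chars   = length $euro;

# Encoding to UTF-8 yields octets, so length() now counts bytes.
my $dollar_bytes = length encode('UTF-8', $dollar);
my $euro_bytes   = length encode('UTF-8', $euro);

printf "dollar: %d character, %d byte(s)\n", $dollar_chars, $dollar_bytes;
printf "euro:   %d character, %d byte(s)\n", $euro_chars,   $euro_bytes;
```

Both strings are one character long, but only the dollar sign fits into a single UTF-8 byte.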

For decoding bytes to characters and encoding characters to bytes we use the Encode module. I kept forgetting which is encoding and which is decoding. But think for a minute and imagine the evil outside world that lives in bytes: when you take something from it and bring it into Perl, you have to decode it so you can understand it. Conversely, when you have to send something out, you have to encode it.
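
The "decode on the way in, encode on the way out" rule can be sketched like this, again with the core Encode module only. The input octets are hard-coded here as a stand-in for data arriving from the outside world, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets as they might arrive from outside: "ü" encoded as UTF-8 (0xC3 0xBC).
my $octets_in = "\xC3\xBC";

# Inbound: decode octets into Perl characters.
my $chars = decode('UTF-8', $octets_in);

# Inside the program we work with characters; uc() sees one letter, U+00FC.
my $upper = uc $chars;

# Outbound: encode characters back into octets before sending them anywhere.
my $octets_out = encode('UTF-8', $upper);

printf "in: %d bytes, inside: %d character, out: %d bytes\n",
    length $octets_in, length $chars, length $octets_out;
```

Had uc been applied to the undecoded octets, it would have operated on two unrelated bytes instead of one letter.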

There is also the utf8 pragma. Many people assume that using it makes every Unicode problem go away and that Perl will do everything for you. But the utf8 pragma should be used ONLY when the source of your module or Perl file itself contains UTF-8 encoded Unicode characters. It doesn't take care of file handles or databases.
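
A small sketch of what the pragma actually changes: only how string literals in the source are interpreted. This assumes the script file itself is saved as UTF-8; the variable names are illustrative.

```perl
use strict;
use warnings;

my ($with_pragma, $without_pragma);

{
    use utf8;    # literals in this block are decoded from UTF-8 source
    $with_pragma = length 'ü';
}

{
    no utf8;     # literals in this block stay as raw source bytes
    $without_pragma = length 'ü';
}

printf "with utf8: %d, without utf8: %d\n", $with_pragma, $without_pragma;
```

The pragma is lexically scoped and affects nothing but the source text, which is why file handles and database connections still need their own decoding.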

Back to the evil world. It lives in bytes. When using Mojolicious, the client's request comes in as bytes and Mojolicious answers in bytes too. To make things easy for the developer, everything inside is automatically decoded into Perl characters. This way parameters, captures and stash values are all Perl characters. You can run regexes on them, sort them, count their length, etc.

When a developer bypasses Mojolicious's decoding mechanism, for example by opening a file or a database connection, they must make sure to decode octets into Perl characters themselves (e.g., using a ':utf8' layer for file handles, Mojo::ByteStream for strings, or the database module's own methods).
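
For the file handle case, a minimal sketch in plain Perl might look as follows. An in-memory buffer stands in for a real file so the example is self-contained, and it uses the ':encoding(UTF-8)' layer, the stricter, validating relative of ':utf8'. The variable names are assumptions made for this example.

```perl
use strict;
use warnings;

# Stand-in for a file on disk: the UTF-8 octets of "für\n" in a buffer.
my $buffer = "f\xC3\xBCr\n";

# Without a layer, reads return raw octets.
open my $raw_fh, '<', \$buffer or die "open: $!";
my $octets = <$raw_fh>;
close $raw_fh;

# With an encoding layer, reads return decoded Perl characters.
open my $fh, '<:encoding(UTF-8)', \$buffer or die "open: $!";
my $chars = <$fh>;
close $fh;

chomp($octets, $chars);
printf "raw: %d bytes, decoded: %d characters\n", length $octets, length $chars;
```

The same three-letter word is 4 long when read raw and 3 long when read through the layer, which is exactly the mismatch that later breaks regexes and length checks.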

Below are some examples that show what is right and what is wrong to do in Mojolicious.

EXAMPLES

Mojo::JSON

decode accepts a sequence of octets (not characters!), detects the correct encoding (UTF-8, UTF-16, etc.) and returns a Perl arrayref or hashref with correctly decoded values.

use utf8;

# Wrong
$json->decode('{"foo":"ü"}');

# Right
$json->decode(b('{"foo":"ü"}')->encode('UTF-8'));

encode accepts a Perl structure with correctly decoded values and returns octets (not characters) in UTF-8.

use utf8;

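# Wrong: octets from encode are concatenated with a character string
$json->encode({foo => 'ü'}) . 'ü';

# Right: decode the octets first, then concatenate characters
b($json->encode({foo => 'ü'}))->decode('UTF-8') . 'ü';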

Mojo::DOM

parse accepts Perl characters OR octets, detecting and decoding them automatically.

use utf8;

# Right
$dom->parse(b('ü')->encode('UTF-8'));

# Right too
$dom->parse('ü');

The DOM returns Perl characters, which must therefore be encoded before printing them, for example to the console.

# Wrong
print $dom->at('a')->text;

# Right
print b($dom->at('a')->text)->encode('UTF-8');

ojo

When crawling a web site, don't forget to encode the result before printing it to the console. Mojo::ByteStream's say method not only adds a newline character, but also automatically encodes the data.

# Wrong
perl -Mojo -e 'print g("mojolicio.us")->dom->at("title")->text'

# Right
perl -Mojo -e 'b(g("mojolicio.us")->dom->at("title")->text)->say'

SEE ALSO

http://www.slideshare.net/Penfold/perl-and-unicode

