Converting to UTF-8 in Perl: how do UTF-8, utf8 and :utf8 differ?

Do these three versions all behave differently?

use open qw( :encoding(UTF-8) :std );

use open qw( :encoding(UTF8) :std );

use open qw( :utf8 :std );

Solution

First, :utf8 only marks the text as UTF-8; it does not check that it is valid. See this post on PerlMonks for more information.
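A minimal sketch of that difference, assuming an in-memory filehandle (`\$bytes`) stands in for a real file and that the sample bytes are made up for illustration. The same malformed input is read once through :utf8 and once through :encoding(UTF-8), and `utf8::valid` reports whether the resulting string is well-formed.

```perl
use strict;
use warnings;

# Hypothetical sample input that is NOT valid UTF-8: 0xC3 starts a
# two-byte sequence, but 0x28 ("(") is not a continuation byte.
my $bytes = "ok \xC3\x28\n";

# :utf8 only flags what it reads as UTF-8 and does not validate it, so the
# string comes back flagged as UTF-8 while still holding a malformed
# sequence (under "use warnings" readline may warn, but returns it anyway).
open(my $lax, '<:utf8', \$bytes) or die "open: $!";
my $lax_line = <$lax>;
close $lax;

# :encoding(UTF-8) actually decodes. With PerlIO::encoding's default
# fallback the bad byte is replaced by a \xHH escape and a warning is issued.
open(my $strict, '<:encoding(UTF-8)', \$bytes) or die "open: $!";
my $strict_line = <$strict>;
close $strict;

# utf8::valid is false only for a UTF-8-flagged string that is malformed.
printf "lax:    valid=%d\n", utf8::valid($lax_line)    ? 1 : 0;
printf "strict: valid=%d\n", utf8::valid($strict_line) ? 1 : 0;
```

Only the validity flags are printed on purpose: doing further string operations on the malformed :utf8 result can trigger "Malformed UTF-8 character" errors, which is the internal inconsistency the FAQ below warns about.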

:encoding is an extension layer for PerlIO; perldoc perliol says:

":encoding"
    use Encoding;

makes this layer available, although PerlIO.pm "knows" where to find it. It is an example of a layer which takes an argument, as it is called thus:

    open( $fh, "<:encoding(iso-8859-7)", $pathname );
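To illustrate a layer that takes an argument, here is a small sketch that decodes Greek text from ISO-8859-7 and writes it back out as UTF-8. It assumes in-memory filehandles in place of real files, and the three sample bytes (αβγ in ISO-8859-7) are chosen only for illustration.

```perl
use strict;
use warnings;

# "αβγ" in ISO-8859-7: one byte per character (hypothetical sample data).
my $greek_bytes = "\xE1\xE2\xE3";

# The argument in :encoding(...) names the encoding to decode from.
open(my $in, '<:encoding(iso-8859-7)', \$greek_bytes) or die "open: $!";
my $text = do { local $/; <$in> };   # slurp the whole handle
close $in;

# Show the decoded characters as code points.
printf "%d characters: %s\n", length $text,
    join ' ', map { sprintf 'U+%04X', ord } split //, $text;
# prints "3 characters: U+03B1 U+03B2 U+03B3"

# A different argument on the output side re-encodes the same
# characters as UTF-8 (two bytes each for these Greek letters).
open(my $out, '>:encoding(UTF-8)', \my $utf8_bytes) or die "open: $!";
print {$out} $text;
close $out;

printf "%d UTF-8 bytes\n", length $utf8_bytes;   # prints "6 UTF-8 bytes"
```

Inside the program the string is just characters; the layer argument decides which byte encoding is used at each boundary.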

The other two questions are answered in the Unicode FAQ, perldoc perlunifaq:

What is the difference between ":encoding" and ":utf8"?

Because UTF-8 is one of Perl's internal formats, you can often just skip the encoding or decoding step, and manipulate the UTF8 flag directly. Instead of ":encoding(UTF-8)", you can simply use ":utf8", which skips the encoding step if the data was already represented as UTF8 internally. This is widely accepted as good behavior when you're writing, but it can be dangerous when reading, because it causes internal inconsistency when you have invalid byte sequences. Using ":utf8" for input can sometimes result in security breaches, so please use ":encoding(UTF-8)" instead. Instead of "decode" and "encode", you could use "_utf8_on" and "_utf8_off", but this is considered bad style. Especially "_utf8_on" can be dangerous, for the same reason that ":utf8" can. There are some shortcuts for oneliners; see "-C" in perlrun.
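The FAQ's recommendation of `decode`/`encode` over `_utf8_on` can be sketched with Encode directly. The sample strings are assumptions for illustration; `decode`, `encode`, `FB_CROAK`, `LEAVE_SRC` and `_utf8_on` are all part of Encode, although `_utf8_on` is documented there as an internal function, which is exactly why it is discouraged.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $bytes = "caf\xC3\xA9";               # "café" encoded as UTF-8 (5 bytes)

# decode: bytes -> characters, checking the input as it goes.
my $chars = decode('UTF-8', $bytes);
print length($chars), " characters\n";   # 4

# encode: characters -> bytes, e.g. before writing to a raw handle.
my $back = encode('UTF-8', $chars);
print length($back), " bytes\n";         # 5

# With FB_CROAK, malformed input is an error instead of being replaced.
# LEAVE_SRC keeps decode from modifying $bad in place.
my $bad = "caf\xC3\x28";
my $ok  = eval {
    decode('UTF-8', $bad, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
print $ok ? "decoded\n" : "rejected: $@";

# The discouraged shortcut: flip the UTF-8 flag without any check.
# The string is now flagged as UTF-8 but is not valid UTF-8.
my $flipped = $bad;
Encode::_utf8_on($flipped);
print utf8::valid($flipped) ? "valid\n" : "malformed\n";   # malformed
```

Reading through :utf8 has the same effect as the `_utf8_on` call here, while :encoding(UTF-8) behaves like `decode`.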

What's the difference between "UTF-8" and "utf8"?

"UTF-8" is the official standard. "utf8" is Perl's way of being liberal in what it accepts. If you have to communicate with things that aren't so liberal, you may want to consider using "UTF-8". If you have to communicate with things that are too liberal, you may have to use "utf8". The full explanation is in Encode. "UTF-8" is internally known as "utf-8-strict". The tutorial uses UTF-8 consistently, even where utf8 is actually used internally, because the distinction can be hard to make, and is mostly irrelevant. For example, utf8 can be used for code points that don't exist in Unicode, like 9999999, but if you encode that to UTF-8, you get a substitution character (by default; see "Handling Malformed Data" in Encode for more ways of dealing with this.) Okay, if you insist: the "internal format" is utf8, not UTF-8. (When it's not some other encoding.)
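A short sketch of how the spellings from the question resolve in Encode, and of the 9999999 example above. It relies on `Encode::find_encoding` returning an encoding object whose `name` is the canonical name; per the Encode documentation, names are case-insensitive but the hyphen matters, so "UTF8" ends up as the lax `utf8`. That is why `:encoding(UTF-8)` and `:encoding(UTF8)` in the question are not the same layer.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);

# Canonical names behind the spellings used in the question.
for my $name ('UTF-8', 'utf-8', 'UTF8', 'utf8') {
    printf "%-6s -> %s\n", $name, find_encoding($name)->name;
}
# UTF-8  -> utf-8-strict
# utf-8  -> utf-8-strict
# UTF8   -> utf8
# utf8   -> utf8

# A code point Perl can store internally but Unicode does not define.
my $big = chr(9_999_999);

my $lax    = encode('utf8',  $big);   # lax: copies Perl's own byte sequence
my $strict = encode('UTF-8', $big);   # strict: substitution character instead

printf "utf8:  %d bytes\n", length $lax;
printf "UTF-8: %s\n", join ' ', map { sprintf '%02X', ord } split //, $strict;
```

The lax result is only meaningful to other "liberal" Perl code; anything that expects standard UTF-8 should get the strict encoding.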

The open pragma (i.e., use open) only sets the default PerlIO layers for input and output. As for :std, perldoc open says:

The ":std" subpragma on its own has no effect, but if combined with the ":utf8" or ":encoding" subpragmas, it converts the standard filehandles (STDIN, STDOUT, STDERR) to comply with encoding selected for input/output handles. For example, if both input and out are chosen to be ":encoding(utf8)", a ":std" will mean that STDIN, STDOUT, and STDERR are also in ":encoding(utf8)". On the other hand, if only output is chosen to be in ":encoding(koi8r)", a ":std" will cause only the STDOUT and STDERR to be in "koi8r". The ":locale" subpragma implicitly turns on ":std".
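A minimal sketch of :std in practice, using the first variant from the question. It assumes nothing beyond core Perl: `PerlIO::get_layers` is built in and lists the layers on a handle, so you can see that the encoding layer was pushed onto the standard handles as well.

```perl
use strict;
use warnings;
# A single layer list (no IN/OUT) applies to both input and output;
# :std additionally applies it to STDIN, STDOUT and STDERR.
use open qw( :encoding(UTF-8) :std );

my @stdout_layers = PerlIO::get_layers(*STDOUT);
my @stderr_layers = PerlIO::get_layers(*STDERR);
print "STDOUT: @stdout_layers\n";
print "STDERR: @stderr_layers\n";

# Goes to STDOUT as UTF-8 bytes, without a "Wide character" warning.
print "caf\x{e9} \x{263A}\n";
```

Without :std, the same `use open` line would only affect filehandles opened later in its lexical scope, and printing the string above to STDOUT would warn about a wide character.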

So :std is a subpragma specific to open.pm: it applies the layers selected above (such as :utf8 or :encoding(UTF-8)) to the standard streams as well.
