Why does Java allow control characters in its identifiers?

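While exploring exactly which characters are permitted in Java identifiers, I stumbled upon something so extremely curious that it seems nearly certain to be a bug.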

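I had expected Java identifiers to follow the usual rule: they start with a character that has the Unicode property ID_Start and continue with characters that have the property ID_Continue, with an exception granted for leading underscores and for dollar signs. That did not prove to be the case. Instead I found extreme divergence from that rule, or from any other notion of a normal identifier I have heard of.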
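As a rough sketch of that expectation, the hypothetical IdentifierRules class below compares it with the rule Java actually applies. JLS §3.8 defines a "Java letter" and a "Java letter-or-digit" through Character.isJavaIdentifierStart(int) and Character.isJavaIdentifierPart(int). For the expected rule, the sketch assumes Character.isUnicodeIdentifierStart and isUnicodeIdentifierPart are acceptable stand-ins for ID_Start and ID_Continue. They are close but not identical, and isUnicodeIdentifierPart also admits identifier-ignorable characters, so the sketch strips those out explicitly.

```java
// Hypothetical comparison of Java's identifier rule with a rough
// "ID_Start, then ID_Continue, plus '_' and '$'" expectation.
public class IdentifierRules {

    // Java's rule (JLS 3.8): the first code point passes isJavaIdentifierStart,
    // every later one passes isJavaIdentifierPart.
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty()) return false;
        if (!Character.isJavaIdentifierStart(s.codePointAt(0))) return false;
        return s.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    // Approximation of the expected rule: isUnicodeIdentifierStart/Part stand in
    // for ID_Start/ID_Continue, identifier-ignorable code points are excluded,
    // and '_' (at the start) and '$' (anywhere) are allowed explicitly.
    static boolean isExpectedIdentifier(String s) {
        if (s.isEmpty()) return false;
        int first = s.codePointAt(0);
        if (first != '_' && first != '$' && !Character.isUnicodeIdentifierStart(first)) return false;
        return s.codePoints().skip(1).allMatch(cp -> cp == '$'
                || (Character.isUnicodeIdentifierPart(cp) && !Character.isIdentifierIgnorable(cp)));
    }

    // Renders control and format characters visibly, e.g. <U+001B>.
    static String show(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (Character.isISOControl(cp) || Character.getType(cp) == Character.FORMAT) {
                sb.append(String.format("<U+%04X>", cp));
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] samples = { "var_", "var_" + (char) 0x1B, "var_" + (char) 0x00, "$price", "caf\u00E9" };
        for (String s : samples) {
            System.out.printf("%-16s java=%-5b expected=%b%n",
                    show(s), isJavaIdentifier(s), isExpectedIdentifier(s));
        }
    }
}
```

With ESC or NUL appended, isJavaIdentifier returns true while isExpectedIdentifier returns false. That gap is the kind of divergence described above.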

The Short Demo

Consider the following demonstration, which shows that the ASCII ESC character (octal 033) is permitted in Java identifiers:

$ perl -le 'print qq(public class escape { public static void main(String argv[]) { String var_\033 = "i am escape: \033"; System.out.println(var_\033); }})' > escape.java
$ javac escape.java
$ java escape | cat -v
i am escape: ^[
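The JDK's own classification can be queried directly, without compiling a test program. JLS §3.8 defines which characters may continue an identifier as those for which Character.isJavaIdentifierPart(int) returns true. This small sketch (the class name EscCheck is illustrative) asks about ESC. Note that ESC is accepted as a part but not as a start, which fits the demo placing it after var_.

```java
// Asks java.lang.Character how it classifies ASCII ESC (U+001B, octal 033).
public class EscCheck {
    public static void main(String[] args) {
        int esc = 0x1B;
        System.out.println("isJavaIdentifierStart(ESC) = " + Character.isJavaIdentifierStart(esc)); // false
        System.out.println("isJavaIdentifierPart(ESC)  = " + Character.isJavaIdentifierPart(esc));  // true
        System.out.println("isIdentifierIgnorable(ESC) = " + Character.isIdentifierIgnorable(esc)); // true
    }
}
```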

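It gets worse than that, though. Almost infinitely worse, in fact. Even NULL is allowed! So are thousands of other code points that are not even identifier characters. I have tested this on Solaris, Linux, and a Mac running Darwin, and all give the same results.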
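To see how far the "identifier-ignorable" category reaches, this hypothetical IgnorableScan sketch walks every code point and collects those for which Character.isIdentifierIgnorable returns true. The Javadoc defines that set as the non-whitespace ISO controls (U+0000..U+0008, U+000E..U+001B, U+007F..U+009F) plus every character with general category FORMAT (Cf). The ISO-control part is therefore fixed at 56 code points, NULL included. The FORMAT count depends on the Unicode version of the JDK you run, so no total is claimed here.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates identifier-ignorable code points and splits them by kind.
public class IgnorableScan {

    static List<Integer> ignorables() {
        List<Integer> out = new ArrayList<>();
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isIdentifierIgnorable(cp)) out.add(cp);
        }
        return out;
    }

    static long countIsoControls(List<Integer> cps) {
        return cps.stream().filter(cp -> Character.isISOControl(cp)).count();
    }

    static long countFormat(List<Integer> cps) {
        return cps.stream().filter(cp -> Character.getType(cp) == Character.FORMAT).count();
    }

    public static void main(String[] args) {
        List<Integer> all = ignorables();
        System.out.println("ignorable code points: " + all.size());
        System.out.println("  ISO controls:        " + countIsoControls(all));
        System.out.println("  FORMAT (Cf):         " + countFormat(all));
        System.out.println("  includes NULL:       " + all.contains(0));
    }
}
```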

The Long Demo

Here is a test program that shows all of the unexpected code points that Java allows as part of a legal identifier name.

#!/usr/bin/env perl
#
# test-java-idchars - find which bogus code points Java allows in its identifiers
#
#   usage: test-java-idchars [low high]
#   e.g.:  test-java-idchars 0 255
#
# Without arguments, tests Unicode code points
# from 0 .. 0x1000.  You may go further with a
# higher explicit argument.
#
# Produces a report at the end.
#
# You can ^C it prematurely to end the program then
# and get a report of its progress up to that point.
#
# Tom Christiansen
# tchrist@perl.com
# Sat Jan 29 10:41:09 MST 2011

use strict;
use warnings;

# The script itself is plain ASCII, so no source-encoding pragma is needed;
# "use encoding" is no longer supported by modern Perl (5.26 and later).
use open IO => ":utf8";

use charnames ();

$| = 1;

my @legal;

my ($start, $stop) = (0, 0x1000);

if (@ARGV != 0) {
    if (@ARGV == 1) {
        for (($stop) = @ARGV) {
            $_ = oct if /^0/;   # support 0OCTAL, 0xHEX, 0bBINARY
        }
    }
    elsif (@ARGV == 2) {
        for (($start, $stop) = @ARGV) {
            $_ = oct if /^0/;
        }
    }
    else {
        die "usage: $0 [ [start] stop ]\n";
    }
}

for my $cp ( $start .. $stop ) {
    my $char = chr($cp);
    next if $char =~ /[\s\w]/;

    my $type = "?";
    for ($char) {
        $type = "Letter"      if /\pL/;
        $type = "Mark"        if /\pM/;
        $type = "Number"      if /\pN/;
        $type = "Punctuation" if /\pP/;
        $type = "Symbol"      if /\pS/;
        $type = "Separator"   if /\pZ/;
        $type = "Control"     if /\pC/;
    }

    my $name = $cp ? (charnames::viacode($cp) || "") : "NULL";
    next if $name eq "" && $cp > 0xFF;
    my $msg = sprintf("U+%04X %s", $cp, $name);

    print "testing \\p{$type} $msg...";

    open(TESTPROGRAM, ">:utf8", "testchar.java") || die $!;
    print TESTPROGRAM <<"End_of_Java_Program";
public class testchar {
    public static void main(String argv[]) {
        String var_$char = "variable name ends in $msg";
        System.out.println(var_$char);
    }
}
End_of_Java_Program
    close(TESTPROGRAM) || die $!;

    system q{
        ( javac -encoding UTF-8 testchar.java \
            && \
          java -Dfile.encoding=UTF-8 testchar | grep variable \
        ) >/dev/null 2>&1
    };

    push @legal, sprintf("U+%04X", $cp) if $? == 0;

    if ($? && $? < 128) {
        print "\n";
        exit;   # from a ^C
    }

    printf "is %s in Java identifiers.\n",
        ($? == 0) ? uc "legal" : "forbidden";
}

END {
    print "Legal but evil code points: @legal\n";
}
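The Perl program above compiles one Java program per code point. For comparison, here is a Java sketch that asks the JDK directly through Character.isJavaIdentifierPart(int), the method JLS §3.8 uses to define a "Java letter-or-digit". Assumptions: the class JavaIdCharScan and its helpers are hypothetical, and isWordish and isSpaceish are only rough stand-ins for Perl's \w and \s. Results can therefore differ from the Perl run at the edges, and also across JDK versions, since each JDK ships its own Unicode database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java counterpart of test-java-idchars: instead of compiling a
// program per code point, it asks Character.isJavaIdentifierPart directly.
public class JavaIdCharScan {

    // Rough stand-in for Perl's \w (an approximation, not Perl's exact definition).
    static boolean isWordish(int cp) {
        switch (Character.getType(cp)) {
            case Character.NON_SPACING_MARK:
            case Character.ENCLOSING_MARK:
            case Character.COMBINING_SPACING_MARK:
            case Character.CONNECTOR_PUNCTUATION:
            case Character.LETTER_NUMBER:
                return true;
            default:
                return Character.isLetterOrDigit(cp);
        }
    }

    // Rough stand-in for Perl's \s.
    static boolean isSpaceish(int cp) {
        return Character.isWhitespace(cp) || Character.isSpaceChar(cp);
    }

    // Code points in [start, stop] that are neither space-ish nor word-ish
    // but that Java still accepts as identifier parts.
    static List<Integer> surprising(int start, int stop) {
        List<Integer> out = new ArrayList<>();
        for (int cp = start; cp <= stop; cp++) {
            if (isSpaceish(cp) || isWordish(cp)) continue;
            if (Character.isJavaIdentifierPart(cp)) out.add(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same argument convention as the Perl script, [[start] stop];
        // Integer.decode accepts 0x hex and leading-0 octal.
        int start = 0, stop = 0x1000;
        if (args.length == 1) {
            stop = Integer.decode(args[0]);
        } else if (args.length == 2) {
            start = Integer.decode(args[0]);
            stop = Integer.decode(args[1]);
        }
        StringBuilder sb = new StringBuilder("Legal but evil code points:");
        for (int cp : surprising(start, stop)) {
            sb.append(String.format(" U+%04X", cp));
        }
        System.out.println(sb);
    }
}
```

For example, java JavaIdCharScan 0 0x20 lists U+0000..U+0008 and U+000E..U+001B, the same 23 code points as the Perl run over that range.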

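Here is a sample run of the program on just the first few code points, skipping those that are whitespace or ordinary identifier characters: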

$ perl test-java-idchars 0 0x20
testing \p{Control} U+0000 NULL...is LEGAL in Java identifiers.
testing \p{Control} U+0001 START OF HEADING...is LEGAL in Java identifiers.
testing \p{Control} U+0002 START OF TEXT...is LEGAL in Java identifiers.
testing \p{Control} U+0003 END OF TEXT...is LEGAL in Java identifiers.
testing \p{Control} U+0004 END OF TRANSMISSION...is LEGAL in Java identifiers.
testing \p{Control} U+0005 ENQUIRY...is LEGAL in Java identifiers.
testing \p{Control} U+0006 ACKNOWLEDGE...is LEGAL in Java identifiers.
testing \p{Control} U+0007 BELL...is LEGAL in Java identifiers.
testing \p{Control} U+0008 BACKSPACE...is LEGAL in Java identifiers.
testing \p{Control} U+000B LINE TABULATION...is forbidden in Java identifiers.
testing \p{Control} U+000E SHIFT OUT...is LEGAL in Java identifiers.
testing \p{Control} U+000F SHIFT IN...is LEGAL in Java identifiers.
testing \p{Control} U+0010 DATA LINK ESCAPE...is LEGAL in Java identifiers.
testing \p{Control} U+0011 DEVICE CONTROL ONE...is LEGAL in Java identifiers.
testing \p{Control} U+0012 DEVICE CONTROL TWO...is LEGAL in Java identifiers.
testing \p{Control} U+0013 DEVICE CONTROL THREE...is LEGAL in Java identifiers.
testing \p{Control} U+0014 DEVICE CONTROL FOUR...is LEGAL in Java identifiers.
testing \p{Control} U+0015 NEGATIVE ACKNOWLEDGE...is LEGAL in Java identifiers.
testing \p{Control} U+0016 SYNCHRONOUS IDLE...is LEGAL in Java identifiers.
testing \p{Control} U+0017 END OF TRANSMISSION BLOCK...is LEGAL in Java identifiers.
testing \p{Control} U+0018 CANCEL...is LEGAL in Java identifiers.
testing \p{Control} U+0019 END OF MEDIUM...is LEGAL in Java identifiers.
testing \p{Control} U+001A SUBSTITUTE...is LEGAL in Java identifiers.
testing \p{Control} U+001B ESCAPE...is LEGAL in Java identifiers.
testing \p{Control} U+001C INFORMATION SEPARATOR FOUR...is forbidden in Java identifiers.
testing \p{Control} U+001D INFORMATION SEPARATOR THREE...is forbidden in Java identifiers.
testing \p{Control} U+001E INFORMATION SEPARATOR TWO...is forbidden in Java identifiers.
testing \p{Control} U+001F INFORMATION SEPARATOR ONE...is forbidden in Java identifiers.
Legal but evil code points: U+0000 U+0001 U+0002 U+0003 U+0004 U+0005 U+0006 U+0007 U+0008 U+000E U+000F U+0010 U+0011 U+0012 U+0013 U+0014 U+0015 U+0016 U+0017 U+0018 U+0019 U+001A U+001B
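In this run, U+000B and U+001C..U+001F are the only control characters rejected. One way to probe that with the JDK is to compare Character.isWhitespace, whose Javadoc lists VERTICAL TABULATION and the FILE, GROUP, RECORD and UNIT SEPARATOR characters as whitespace, with Character.isIdentifierIgnorable, whose control-character ranges leave whitespace out. This is a sketch; the class SeparatorCheck and its describe method are illustrative names, and ESC is included for contrast.

```java
// Compares whitespace status with identifier status for the control
// characters the Perl run rejected, plus ESC for contrast.
public class SeparatorCheck {

    static String describe(int cp) {
        return String.format("U+%04X whitespace=%-5b ignorable=%-5b identifierPart=%b",
                cp,
                Character.isWhitespace(cp),
                Character.isIdentifierIgnorable(cp),
                Character.isJavaIdentifierPart(cp));
    }

    public static void main(String[] args) {
        int[] codePoints = { 0x0B, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F };
        for (int cp : codePoints) {
            System.out.println(describe(cp));
        }
    }
}
```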

Here is another demonstration:

$ perl test-java-idchars 0x600 0x700 | grep -i legal
testing \p{Control} U+0600 ARABIC NUMBER SIGN...is LEGAL in Java identifiers.
testing \p{Control} U+0601 ARABIC SIGN SANAH...is LEGAL in Java identifiers.
testing \p{Control} U+0602 ARABIC FOOTNOTE MARKER...is LEGAL in Java identifiers.
testing \p{Control} U+0603 ARABIC SIGN SAFHA...is LEGAL in Java identifiers.
testing \p{Control} U+06DD ARABIC END OF AYAH...is LEGAL in Java identifiers.
Legal but evil code points: U+0600 U+0601 U+0602 U+0603 U+06DD
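The five Arabic code points reported here have the Unicode general category Cf (format). The Perl script labels them \p{Control} because \pC matches every "Other" category, Cf included. The Javadoc for Character.isIdentifierIgnorable counts every FORMAT character as ignorable, and isJavaIdentifierPart accepts ignorable characters. The sketch below (ArabicFormatCheck is an illustrative name) checks only the five reported code points. A newer JDK with a newer Unicode database may find more Cf characters in U+0600..U+0700, such as U+061C ARABIC LETTER MARK, so the list is not meant to be exhaustive.

```java
// Checks the five Arabic code points reported by the Perl run.
public class ArabicFormatCheck {

    // Code points listed by the Perl run over U+0600..U+0700.
    static final int[] REPORTED = { 0x0600, 0x0601, 0x0602, 0x0603, 0x06DD };

    static boolean isFormat(int cp) {
        return Character.getType(cp) == Character.FORMAT;
    }

    public static void main(String[] args) {
        for (int cp : REPORTED) {
            System.out.printf("U+%04X %s format=%b ignorable=%b part=%b%n",
                    cp,
                    Character.getName(cp),
                    isFormat(cp),
                    Character.isIdentifierIgnorable(cp),
                    Character.isJavaIdentifierPart(cp));
        }
    }
}
```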

The Question

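Can anyone explain this seemingly insane behavior? There are many, many other inexplicably permitted code points all over the place, starting right off with U+0000, which is perhaps the strangest of all. If you run the program over the first 0x1000 code points, you do see certain patterns emerge, such as permitting any and all code points with the property Currency_Symbol. But there is still far too much else that is wholly inexplicable, at least to me.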
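The Currency_Symbol pattern, at least, is documented: the Javadoc for Character.isJavaIdentifierStart accepts any code point whose Character.getType is CURRENCY_SYMBOL. This hypothetical CurrencyCheck sketch lists the currency symbols up to U+1000 in whatever Unicode version your JDK ships, and confirms none of them is rejected as an identifier start.

```java
import java.util.ArrayList;
import java.util.List;

// Checks that every currency symbol (general category Sc) up to a limit
// is accepted by Character.isJavaIdentifierStart.
public class CurrencyCheck {

    static boolean isCurrency(int cp) {
        return Character.getType(cp) == Character.CURRENCY_SYMBOL;
    }

    // Currency symbols in [0, limit] that Java rejects as an identifier start.
    static List<Integer> rejected(int limit) {
        List<Integer> out = new ArrayList<>();
        for (int cp = 0; cp <= limit; cp++) {
            if (isCurrency(cp) && !Character.isJavaIdentifierStart(cp)) out.add(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        int limit = 0x1000;
        StringBuilder sb = new StringBuilder("Currency symbols up to U+1000:");
        for (int cp = 0; cp <= limit; cp++) {
            if (isCurrency(cp)) sb.append(String.format(" U+%04X", cp));
        }
        System.out.println(sb);
        System.out.println("rejected as identifier start: " + rejected(limit));
    }
}
```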
