java to unicode: Unicode escape syntax in Java

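"This article explains the purpose of `u` and `(u)*` when referring to non-ASCII characters in Java, including the UnicodeEscape rule and the role these escapes play in converting Unicode source code to ASCII form and back again. The focus is on what multiple `u`s mean and how the conversion rules work."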

In Java, I learned that the following syntax can be used to write Unicode characters that are not on the keyboard (e.g. non-ASCII characters):

(\u)(u)*(HexDigit)(HexDigit)(HexDigit)(HexDigit)

My question is:

What is the purpose of (u)* in the above syntax?

One use case I do understand is representing the yen sign in Java:

char ch = '\u00A5';

Solution

The Java Language Specification (§3.3, Unicode Escapes) defines the grammar as:

UnicodeEscape:
    \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

UnicodeMarker:
    u
    UnicodeMarker u

which corresponds to the regular expression \\u+\p{XDigit}{4}
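As a rough check of that regular expression, the sketch below compiles it with `java.util.regex.Pattern` and tries it on a few strings. The class name `UnicodeEscapeRegex` and the helper `isEscape` are made up for illustration. The pattern only looks at the shape of one escape; it does not model the spec's notion of an "eligible" backslash.

```java
import java.util.regex.Pattern;

class UnicodeEscapeRegex {
    // The regex from the text as a Java string literal: every regex backslash is doubled.
    static final Pattern ESCAPE = Pattern.compile("\\\\u+\\p{XDigit}{4}");

    // True when the whole string has the shape: backslash, one or more u, four hex digits.
    static boolean isEscape(String s) {
        return ESCAPE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The doubled backslashes below produce a literal backslash in the string value.
        System.out.println(isEscape("\\u00A5"));   // true: one u
        System.out.println(isEscape("\\uuu00a5")); // true: several u's
        System.out.println(isEscape("\\u00G5"));   // false: G is not a hex digit
        System.out.println(isEscape("\\00A5"));    // false: no u at all
    }
}
```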

and

If an eligible \ is followed by u, or more than one u, and the last u is not followed by four hexadecimal digits, then a compile-time error occurs.
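To see the valid side of that rule, here is a minimal sketch (the class name `MultiUDemo` is hypothetical) that writes the yen sign from the question once with a single u and once with three. Both literals should compile to the same character.

```java
class MultiUDemo {
    // Both literals denote U+00A5 (the yen sign); only the number of u's differs.
    static boolean sameChar() {
        char oneU = '\u00A5';
        char threeU = '\uuu00A5';
        return oneU == threeU;
    }

    public static void main(String[] args) {
        System.out.println(sameChar());      // true
        System.out.println((int) '\u00A5');  // 165, i.e. 0xA5
    }
}
```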

So you're right, there can be one or more u after the backslash. The reason is given further down:

The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. The transformation involves converting any Unicode escapes in the source text of the program to ASCII by adding an extra u - for example, \uxxxx becomes \uuxxxx - while simultaneously converting non-ASCII characters in the source text to Unicode escapes containing a single u each.

This transformed version is equally acceptable to a Java compiler and represents the exact same program. The exact Unicode source can later be restored from this ASCII form by converting each escape sequence where multiple u's are present to a sequence of Unicode characters with one fewer u, while simultaneously converting each escape sequence with a single u to the corresponding single Unicode character.

So this input

\u0020ä

becomes

\uu0020\u00e4
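The Unicode-to-ASCII direction can be sketched in a few lines. This is an illustration under assumptions, not the tool the specification has in mind. The names `AsciiEncoder` and `toAscii` are invented. A backslash is treated as eligible when the run of backslashes ending at it has odd length, and characters are handled one UTF-16 code unit at a time.

```java
class AsciiEncoder {
    // Adds one extra u to every existing escape and escapes every non-ASCII char with a single u.
    static String toAscii(String src) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0; // length of the backslash run directly before the current char
        for (int i = 0; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '\\') {
                out.append(c);
                backslashes++;
                continue;
            }
            if (c == 'u' && backslashes % 2 == 1) {
                // An eligible backslash followed by u: an escape that was already there.
                out.append("uu");
            } else if (c > 127) {
                // Non-ASCII char: emit a fresh escape with exactly one u.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
            backslashes = 0;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the first escape as literal text in the string;
        // the second escape is a real one and becomes a-umlaut at compile time.
        String input = "\\u0020\u00e4";
        System.out.println(toAscii(input));
    }
}
```

Run on the input from the example, this should print \uu0020\u00e4, matching the output shown above.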

Here the uu in the first escape means "this was already a Unicode escape sequence in the original source", while the single u in the second says "an automatic tool converted a non-ASCII character into a Unicode escape."

This information is useful when you convert back from ASCII to Unicode: it lets you restore as much of the original code as possible.
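The way back can be sketched the same way. The code below is a simplified illustration with invented names (`AsciiDecoder`, `fromAscii`, `hexValue`). An escape with several u's loses one u, and an escape with a single u is replaced by the character it denotes. Malformed escapes raise an exception here, loosely mirroring the compile-time error the specification requires.

```java
class AsciiDecoder {
    // Parses exactly four hex digits starting at 'from'; rejects anything else.
    static int hexValue(String s, int from) {
        if (from + 4 > s.length()) throw new IllegalArgumentException("truncated escape");
        int value = 0;
        for (int k = from; k < from + 4; k++) {
            int d = Character.digit(s.charAt(k), 16);
            if (d < 0) throw new IllegalArgumentException("bad hex digit at index " + k);
            value = value * 16 + d;
        }
        return value;
    }

    static String fromAscii(String ascii) {
        StringBuilder out = new StringBuilder();
        int n = ascii.length();
        int backslashes = 0; // backslashes copied directly before position i
        int i = 0;
        while (i < n) {
            char c = ascii.charAt(i);
            if (c != '\\') { out.append(c); backslashes = 0; i++; continue; }
            int j = i + 1;
            while (j < n && ascii.charAt(j) == 'u') j++;
            int uCount = j - (i + 1);
            boolean eligible = backslashes % 2 == 0;
            if (!eligible || uCount == 0) { out.append(c); backslashes++; i++; continue; }
            int value = hexValue(ascii, j);
            if (uCount == 1) {
                out.append((char) value);          // single u: restore the character itself
            } else {
                out.append('\\');                  // several u's: keep the escape, drop one u
                for (int k = 1; k < uCount; k++) out.append('u');
                out.append(ascii, j, j + 4);
            }
            backslashes = 0;
            i = j + 4;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String restored = fromAscii("\\uu0020\\u00e4");
        System.out.println(restored.length());     // 7: six chars of escape text plus one a-umlaut
    }
}
```

Applied to the ASCII form from the example, this should give back the original text: the first escape keeps its escape form with one u, and the second becomes the character ä again.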
