Java trim() vs strip(): The difference between the String trim() and strip() methods in Java 11

In short: strip() is the "Unicode-aware" evolution of trim().

Problem

String::trim has existed from early days of Java when Unicode had not fully evolved to the standard we widely use today.

The definition of space used by String::trim is any code point less than or equal to the space code point (\u0020), commonly referred to as ASCII or ISO control characters.

Unicode-aware trimming routines should use Character::isWhitespace(int).

Additionally, developers have not been able to specifically remove indentation white space or to specifically remove trailing white space.

Solution

Introduce trimming methods that are Unicode white space aware and provide additional control of leading only or trailing only.
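The following is a minimal sketch of how the Java 11 methods strip(), stripLeading() and stripTrailing() behave next to trim(). The class name StripDemo and the sample string are made up for illustration; EM SPACE (U+2003) is used because it is Unicode white space but lies above U+0020, so trim() leaves it alone.

```java
// Illustrative sketch (requires Java 11+); StripDemo and sample() are hypothetical names.
class StripDemo {
    // EM SPACE (U+2003): Unicode white space, but greater than U+0020.
    static final String EM_SPACE = "\u2003";

    // Sample text wrapped in an EM SPACE and an ordinary space on each side.
    static String sample() {
        return EM_SPACE + " hello " + EM_SPACE;
    }

    public static void main(String[] args) {
        String s = sample();
        // trim() stops at the EM SPACE on both ends, so nothing is removed.
        System.out.println("[" + s.trim() + "]");
        // strip() removes all leading and trailing Unicode white space.
        System.out.println("[" + s.strip() + "]");
        // stripLeading() removes only the leading white space.
        System.out.println("[" + s.stripLeading() + "]");
        // stripTrailing() removes only the trailing white space.
        System.out.println("[" + s.stripTrailing() + "]");
    }
}
```

In this sketch only the strip family removes the EM SPACE, and the leading-only and trailing-only variants leave the opposite end untouched.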

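A common trait of these new methods is that they use a different (newer) definition of "white space" than older methods such as String.trim(). Bug JDK-8200373 explains: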

The current JavaDoc for String::trim does not make it clear which definition of “space” is being used in the code. With additional trimming methods coming in the near future that use a different definition of space, clarification is imperative. String::trim uses the definition of space as any codepoint that is less than or equal to the space character codepoint (\u0020.) Newer trimming methods will use the definition of (white) space as any codepoint that returns true when passed to the Character::isWhitespace predicate.
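The two definitions can be written side by side as predicates. This is only a sketch of the rules as quoted above, not the JDK's actual implementation; the class and method names (SpaceDefinitions, isTrimSpace, isStripSpace) are invented for the example.

```java
// Illustrative sketch; the names here are hypothetical, not JDK API.
class SpaceDefinitions {
    // trim()'s definition: any code point less than or equal to U+0020.
    static boolean isTrimSpace(int codePoint) {
        return codePoint <= ' ';
    }

    // The newer definition: whatever Character.isWhitespace(int) accepts.
    static boolean isStripSpace(int codePoint) {
        return Character.isWhitespace(codePoint);
    }

    public static void main(String[] args) {
        // A control character, the ASCII space, NO-BREAK SPACE, and EM SPACE.
        int[] samples = {0x0001, 0x0020, 0x00A0, 0x2003};
        for (int cp : samples) {
            System.out.printf("U+%04X trim=%b strip=%b%n",
                    cp, isTrimSpace(cp), isStripSpace(cp));
        }
    }
}
```

The definitions differ in both directions: a control character such as U+0001 counts as space for trim() but not for strip(), while EM SPACE (U+2003) is the reverse. NO-BREAK SPACE (U+00A0) is removed by neither, because Character.isWhitespace excludes non-breaking spaces.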

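The isWhitespace(char) method was added to Character in JDK 1.1, but the isWhitespace(int) method was not introduced to the Character class until JDK 1.5. The latter method (the one that takes an int parameter) was added to support supplementary characters. The Javadoc comments for the Character class define supplementary characters (typically modeled as an int-based "code point") versus BMP characters (typically modeled as a single char):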

The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values … A char value, therefore, represents Basic Multilingual Plane (BMP) code points, including the surrogate code points, or code units of the UTF-16 encoding. An int value represents all Unicode code points, including supplementary code points. … The methods that only accept a char value cannot support supplementary characters. … The methods that accept an int value support all Unicode characters, including supplementary characters.
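To make the char versus int distinction concrete, here is a small sketch using a supplementary character. It uses Character.isLetter rather than isWhitespace purely for illustration, since the same char/int overload pair exists for both; the class name SupplementaryDemo and the choice of U+1D400 (MATHEMATICAL BOLD CAPITAL A) are assumptions made for the example.

```java
// Illustrative sketch; SupplementaryDemo is a hypothetical name.
class SupplementaryDemo {
    // U+1D400 MATHEMATICAL BOLD CAPITAL A, a code point above U+FFFF.
    static final int BOLD_A = 0x1D400;

    // Builds a String holding the single supplementary character.
    static String boldA() {
        return new String(Character.toChars(BOLD_A));
    }

    public static void main(String[] args) {
        String s = boldA();
        // Two char values (a surrogate pair) encode one code point.
        System.out.println(s.length());
        System.out.println(s.codePointCount(0, s.length()));
        // The char overload sees only the high surrogate, which is not a letter.
        System.out.println(Character.isLetter(s.charAt(0)));
        // The int overload sees the whole code point and recognizes the letter.
        System.out.println(Character.isLetter(s.codePointAt(0)));
    }
}
```

The char-based call can only inspect one half of the surrogate pair, which is why the int-based overloads are needed to classify supplementary characters.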
