Java 14 is out! Classes without the "class" keyword? And is it coming for Lombok too?

JDK/Java 14 reached GA (General Availability) on March 17, 2020.

JDK 14 features at a glance:

  • JEP 305: Pattern Matching for instanceof (Preview)

  • JEP 358: Helpful NullPointerExceptions

  • JEP 361: Switch Expressions (Standard)

  • JEP 345: NUMA-Aware Memory Allocation for G1

  • JEP 349: JFR Event Streaming

  • JEP 366: Deprecate the ParallelScavenge + SerialOld GC Combination

  • JEP 363: Remove the CMS Garbage Collector

  • JEP 364: ZGC on macOS

  • JEP 368: Text Blocks (Second Preview)


This release contains more JEPs (Java/JDK Enhancement Proposals) than Java 12 and 13 combined. The 16 new features are listed below:

305: Pattern Matching for instanceof (Preview)

Pattern matching lets developers express common program logic more concisely and safely. Pattern matching for the instanceof operator supports conditionally extracting components from an object. This language feature is currently in preview.

343: Packaging Tool (Incubator)

Creates a tool for packaging self-contained Java applications.

345: NUMA-Aware Memory Allocation for G1

Improves G1 performance on large machines by implementing NUMA-aware memory allocation.

349: JFR Event Streaming

Exposes JDK Flight Recorder data for continuous monitoring.

352: Non-Volatile Mapped Byte Buffers

Adds new JDK-specific file mapping modes so that the FileChannel API can be used to create MappedByteBuffer instances that refer to non-volatile memory (NVM).

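358: Helpful NullPointerExceptions

Improves the usability of NullPointerExceptions generated by the JVM by describing precisely which variable was null. The authors of the proposal want to give developers and support staff useful information about why a program terminated abnormally, and to improve program understanding by more clearly associating a dynamic exception with static program code. https://openjdk.java.net/jeps/358

359: Records (Preview)

Records provide a compact syntax for declaring classes that are transparent holders of shallowly immutable data, helping developers write more concise code. The feature is aimed at domain-specific classes that mainly hold data and provide no domain behavior.

361: Switch Expressions (Standard)

Switch expressions were a preview feature in JDK 12 and 13 and are now a standard feature in JDK 14. They extend switch so that it can be used as either a statement or an expression, and so that both forms can use either the traditional switch syntax or the simplified "case L ->" label syntax, which has no fall through and gives each arm its own scope. These changes simplify everyday coding and prepare the way for pattern matching in switch.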

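362: Deprecate the Solaris and SPARC Ports

Deprecates the Solaris/SPARC, Solaris/x64, and Linux/SPARC ports so that they can be removed in a future release.

363: Remove the Concurrent Mark Sweep (CMS) Garbage Collector

Removes the CMS (Concurrent Mark Sweep) garbage collector.

364: ZGC on macOS

Ports the ZGC garbage collector to macOS.

365: ZGC on Windows

Ports the ZGC garbage collector to Windows.

366: Deprecate the ParallelScavenge + SerialOld GC Combination

Deprecates the combination of the ParallelScavenge and SerialOld garbage collection algorithms.

367: Remove the Pack200 Tools and API

Removes the pack200 and unpack200 tools, and the Pack200 API in the java.util.jar package.

368: Text Blocks (Second Preview)

Text blocks, much like triple-quoted strings in Python, are multi-line string literals that avoid the need for most escape sequences, format the string automatically in a predictable way, and give the developer control over the format when desired. It is not a particularly complex feature, but it is a great convenience when embedding HTML in Java code, and it greatly improves readability. It is currently in its second preview.

370: Foreign-Memory Access API (Incubator)

Adds an API that allows Java programs to safely and efficiently access foreign memory outside the Java heap.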

http://openjdk.java.net/projects/jdk/14/

JEP 305: Pattern Matching for instanceof 

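This feature is clearly about instanceof. Today we usually write code like the following. It is hardly ideal: it reads awkwardly and is redundant, because we need both a type test and an explicit cast:

if (obj instanceof String) {
    String s = (String) obj;
    // use s
}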

So what does the new form look like? See below. Pretty neat, isn't it:

if (obj instanceof String s) {
    //todo can use s here
} else {
    //todo can't use s here
}

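It can also be used in more complex conditions. Note that the form below must use && and not ||. The reason is straightforward: with ||, the right-hand side is evaluated only when the instanceof test has failed, so s would not be bound there:

if (obj instanceof String s && s.contains("afei")) {
    ... ...
}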
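The scoping rule can be tried out in a small sketch. It assumes JDK 16 or later, where this feature is final (on JDK 14 it needs --enable-preview); the class and method names are made up for illustration.

```java
public class InstanceofDemo {
    // Returns a short description of obj using pattern matching for instanceof.
    static String describe(Object obj) {
        // s is in scope on the right of && because that side runs only
        // when the instanceof test has already succeeded.
        if (obj instanceof String s && s.contains("afei")) {
            return "string containing afei: " + s;
        }
        // With a negated test, the binding i is in scope after the if,
        // because the if body always returns.
        if (!(obj instanceof Integer i)) {
            return "neither a matching string nor an integer";
        }
        return "integer doubled: " + (i * 2);
    }

    public static void main(String[] args) {
        System.out.println(describe("hello afei"));
        System.out.println(describe(21));
        System.out.println(describe(3.14));
    }
}
```

Replacing && with || in the first condition does not compile, since s would be read on a path where the test has failed.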

JEP 358: Helpful NullPointerExceptions

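This feature is interesting and genuinely useful. Imagine a line of code like the one below throwing a NullPointerException: we have no way of knowing whether it was caused by a, by a.b, or by a.b.c: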

int index = a.b.c.i ;

So we might have to rewrite the code like this to make the problem easier to locate when an NPE is thrown:

if (a != null) {
    if (a.b != null) {
        if (a.b.c != null) {
            int index = a.b.c.i;
        }
    }
}

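JEP 358 solves exactly this problem. Suppose the code is still written as int index = a.b.c.i, and the NPE is caused by a.b being null. The exception message then looks like this, which is far more helpful (note that in JDK 14 these detailed messages are off by default and are enabled with -XX:+ShowCodeDetailsInExceptionMessages):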

Exception in thread "main" java.lang.NullPointerException:
 Cannot read field "c" because "a.b" is null
 at Prog.main(Prog.java:5)

Arrays work the same way. Given a line such as int height = a[i][j][k], where the NPE is caused by a[i][j] being null, the message looks like this:

Exception in thread "main" java.lang.NullPointerException:
 Cannot load from object array because "a[i][j]" is null
 at Prog.main(Prog.java:5)
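A minimal sketch for trying this out follows. The classes A, B and C and the method probe are hypothetical stand-ins for the a.b.c.i chain above. The exact message text depends on the JDK version and on whether -XX:+ShowCodeDetailsInExceptionMessages is active, so the sketch only prints whatever message the JVM produced.

```java
public class NpeDemo {
    static class C { int i; }
    static class B { C c; }
    static class A { B b; }

    // Returns the message of the NullPointerException raised by a.b.c.i,
    // or a "no NPE" string if the whole chain is non-null.
    static String probe(A a) {
        try {
            int index = a.b.c.i;
            return "no NPE, index=" + index;
        } catch (NullPointerException e) {
            // Without the helpful-message feature the message may be null.
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        A a = new A(); // a.b is left null on purpose
        System.out.println(probe(a));
    }
}
```

Running it once with and once without the flag on JDK 14 is a quick way to compare the two behaviors.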

JEP 361: Switch Expressions (Standard)

This feature carries over from JDK 13's JEP 354: Switch Expressions (Preview). Consider the following code in the old switch syntax:

switch (day) {
    case MONDAY:
    case FRIDAY:
    case SUNDAY:
        System.out.println(6);
        break;
    case TUESDAY:
        System.out.println(7);
        break;
    case THURSDAY:
    case SATURDAY:
        System.out.println(8);
        break;
    case WEDNESDAY:
        System.out.println(9);
        break;
}

This code is rather verbose. The new syntax is clearly much more concise:

switch (day) {
    case MONDAY, FRIDAY, SUNDAY -> System.out.println(6);
    case TUESDAY -> System.out.println(7);
    case THURSDAY, SATURDAY -> System.out.println(8);
    case WEDNESDAY -> System.out.println(9);
}

On top of that, the new switch syntax can be used directly as an expression:

int numLetters = switch (day) {
    case MONDAY, FRIDAY, SUNDAY -> 6;
    case TUESDAY -> 7;
    case THURSDAY, SATURDAY -> 8;
    case WEDNESDAY -> 9;
};

The new switch syntax is far more flexible than the old one!
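For completeness, here is a self-contained sketch of the switch expression, assuming JDK 14 or later. The enum Day and the class name are illustrative; one arm uses a block with yield to show how a multi-statement arm returns its value.

```java
public class SwitchDemo {
    enum Day { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY }

    // All seven constants are covered, so no default arm is needed.
    static int numLetters(Day day) {
        return switch (day) {
            case MONDAY, FRIDAY, SUNDAY -> 6;
            case TUESDAY -> 7;
            case THURSDAY, SATURDAY -> 8;
            case WEDNESDAY -> {
                // A block arm hands back its value with yield.
                int n = day.name().length();
                yield n;
            }
        };
    }

    public static void main(String[] args) {
        for (Day d : Day.values()) {
            System.out.println(d + " -> " + numLetters(d));
        }
    }
}
```

Removing one of the arms makes the compiler reject the switch expression, because it would no longer be exhaustive.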

JEP 345: NUMA-Aware Memory Allocation for G1

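Before looking at this feature, we need to understand NUMA. NUMA (non-uniform memory access) is a memory architecture designed for multiprocessor computers in which memory access time depends on where the memory sits relative to the processor. Under NUMA, a processor accesses its own local memory somewhat faster than non-local memory (memory attached to another processor, or memory shared between processors). In the figure below, a CPU in Node0 that accesses memory in Node0 is making a local access; if it accesses memory in Node1, that is a remote access, and it performs worse:

(Figure: NUMA architecture)

The defining trait of NUMA is that shared memory is physically distributed, and the collection of all that memory forms the global address space. Processors therefore see different access times for different memory: accessing local memory is clearly faster than accessing globally shared memory or remote memory. In addition, memory in a NUMA system may be layered: local memory, memory shared within a group, and globally shared memory.

JEP 345 aims to improve G1 performance on large machines through NUMA-aware memory allocation. More and more modern multi-socket servers use NUMA, which means that memory is not equidistant from every socket and access performance differs between sockets: the greater the distance, the higher the latency and the worse the performance (https://openjdk.java.net/jeps/345). With the JVM option -XX:+UseNUMA set, G1's regions are spread evenly across all available NUMA nodes when the JVM initializes (that is, when the Java application starts).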

JEP 349: JFR Event Streaming

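To make it easier to understand what a running JVM is doing, earlier Java versions introduced JFR, the JDK Flight Recorder. It is not very flexible to use, though. The JVM exposes more than 500 data points through JFR, but most of them are available only by parsing a JFR recording file, not in real time. To use JFR data, a user has to start a recording, stop it, dump the recording to disk, and then parse the recording file.

// The command below starts JFR immediately, collects JVM information for 60s using the settings in template.jfc, and writes it to output.jfr.
// Once the recording completes, the jfr file can be copied to your working environment and analyzed with the jmc GUI.
// It contains almost everything needed to troubleshoot JVM problems, including exception information at the time of a heap dump.
jcmd <PID> JFR.start name=test duration=60s settings=template.jfc filename=output.jfr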

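This works well for profiling an application, but it is not friendly to real-time monitoring, because the data JFR collects cannot be displayed live on a dashboard. JEP 349 makes it possible to consume the recorded data directly through asynchronous subscription, without analyzing a dump file, as the following code shows: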

// requires: import java.time.Duration; import jdk.jfr.consumer.RecordingStream;
try (var rs = new RecordingStream()) {
    rs.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));
    rs.enable("jdk.JavaMonitorEnter").withThreshold(Duration.ofMillis(10));
    rs.onEvent("jdk.CPULoad", event -> {
        System.out.println(event.getFloat("machineTotal"));
    });
    rs.onEvent("jdk.JavaMonitorEnter", event -> {
        System.out.println(event.getClass("monitorClass"));
    });
    rs.start(); // blocks the current thread until the stream is closed
}

JEP 366: Deprecate the ParallelScavenge + SerialOld GC Combination

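The ParallelScavenge + SerialOld GC combination is being deprecated, which means it will be removed in some future JDK release.

The JDK team's rationale for deprecating this combination is that it requires a significant amount of maintenance effort and is rarely used. Its intended scenario is a very large young generation paired with a very small old generation, because only then are the pauses of collecting the old generation with SerialOld barely acceptable. In practice this scenario is rare, and it is risky as well. In short, if the old generation can be collected with UseParallelOldGC, why would anyone still want SerialOld?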

JEP 363: Remove the CMS Garbage Collector

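It was bound to happen. Ever since G1 arrived, CMS has been on its way out: it was deprecated in JDK 9 (JEP 291: Deprecate the Concurrent Mark Sweep (CMS) Garbage Collector), so its complete removal was only a matter of time.

Region-based generational collection is the way things are heading, and the CMS design has fallen somewhat behind. Its fragmentation problem is like a time bomb planted in your JVM instance: a full GC may hit at the peak of your business traffic, and that full GC is a SerialOld collection using the mark-sweep-compact algorithm, the slowest form of garbage collection in the JVM, with pauses of several seconds or even more than ten seconds.

Of course, if you still configure CMS (-XX:+UseConcMarkSweepGC) on JDK 14, the JVM does not fail. It only prints a warning and falls back to starting with the default collector: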

Java HotSpot(TM) 64-Bit Server VM warning: Ignoring option UseConcMarkSweepGC; \
support was removed in <version>
and the VM will continue execution using the default collector.

JEP 364: ZGC on macOS

Simple enough: ZGC is now supported on macOS, and there is not much more to say.

JEP 368: Text Blocks (Second Preview)

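This feature corresponds to JDK 13's JEP 355: Text Blocks (Preview); this is just the second preview, so I will only briefly introduce the new syntax.

Given a piece of SQL, the old syntax looks like this: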

String query = "SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB`\n" +
 "WHERE `CITY` = 'INDIANAPOLIS'\n" +
 "ORDER BY `EMP_ID`, `LAST_NAME`;\n";

The new syntax looks like this:

String query = """
 SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB`
 WHERE `CITY` = 'INDIANAPOLIS'
 ORDER BY `EMP_ID`, `LAST_NAME`;
 """;

Given a script to execute, the old syntax looks like this:

ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
Object obj = engine.eval("function hello() {\n" +
 " print('\"Hello, world\"');\n" +
 "}\n" +
 "\n" +
 "hello();\n");

And the new syntax looks like this:

ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
Object obj = engine.eval("""
                         function hello() {
                             print('"Hello, world"');
                         }

                         hello();
                         """);

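Author: HollisChuang. Source: Juejin. Link: https://juejin.im/post/5e7421b8518825497468178a

JDK 14 was officially released on March 17, 2020 and is now available for download. It brings 16 new features; this part covers one of them, JEP 359: Records.

The harshest criticism comes from the top

Back in February 2019, Java language architect Brian Goetz wrote an article (cr.openjdk.java.net/~briangoetz… ) that examined the Java language in detail and criticized it. Like many programmers, he complained that "Java is too verbose" or has too much "ceremony". He noted that developers who want to create plain data carriers usually have to write a lot of low-value, repetitive, error-prone code: constructors, getters/setters, equals(), hashCode(), toString(), and so on.

As a result, many people rely on their IDE to generate this code. Others use third-party libraries such as Lombok to generate these methods, which can lead to surprising behavior and poor debuggability.

So what exactly does Brian Goetz mean by a plain data carrier? He gives a simple example: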

final class Point {
    public final int x;
    public final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // state-based implementations of equals, hashCode, toString
    // nothing else
}

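The Point here is a plain data carrier. It represents a "point" with an x coordinate and a y coordinate, and it provides only a constructor plus methods such as equals and hashCode.

Brian Goetz therefore floated an idea: Java could express this kind of plain data carrier in a different way.

Other object-oriented languages have long had dedicated constructs for plain data carriers, such as case classes in Scala, data classes in Kotlin, and record in C#. Although these constructs differ semantically, what they have in common is that some or all of the class's state can be described directly in the class header, and that the class contains nothing but plain data.

So he asked whether Java could also define a plain data carrier like this: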

record Point(int x, int y) { }

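The master said: let there be records, and there were records

As he complained, we usually have to write a lot of code before a class becomes useful, such as:

  • a toString() method

  • hashCode() and equals() methods

  • getter methods

  • a public constructor

For simple classes like this, these methods are usually boring, repetitive, and exactly the kind of thing that can easily be generated mechanically (IDEs usually offer this).

Reading someone else's code can be even more of a headache. For example, they may have used the IDE-generated hashCode() and equals() to handle all the fields of the class, but how can you be sure they are correct without checking every line of the implementation? And what happens if a field is added during refactoring and the methods are not regenerated?

Brian Goetz proposed using record to define a plain data carrier, and so Java 14 includes a new feature, JEP 359: Records, authored by Brian Goetz himself.

The goal of Records is to extend the Java language syntax with a compact way to declare a class that is "the fields, just the fields, and nothing but the fields". Given such a declaration, the compiler can generate all the methods automatically and have every field participate in methods such as hashCode(). This is a preview feature in JDK 14.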

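When in doubt, decompile

Using a record is simple, much like declaring a Java class:

record Person (String firstName, String lastName) {}

Above, we define a Person record with two components, firstName and lastName, and an empty body.

This looks like syntactic sugar, so how is it actually implemented?

Let's compile it first. Remember to pass --enable-preview, because records are still a preview feature in JDK 14.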

> javac --enable-preview --release 14 Person.java
Note: Person.java uses preview language features.
Note: Recompile with -Xlint:preview for details.

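As noted above, a record is just a class whose purpose is to hold and expose data. Decompiling it with javap (the -p option includes private members) gives the following: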

public final class Person extends java.lang.Record {
  private final String firstName;
  private final String lastName;
  public Person(java.lang.String, java.lang.String);
  public java.lang.String toString();
  public final int hashCode();
  public final boolean equals(java.lang.Object);
  public java.lang.String firstName();
  public java.lang.String lastName();
 }

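The decompiled class tells us the following:

1. A final class Person is generated, which means it cannot have subclasses.

2. The class extends java.lang.Record, similar to how an enum declared with the enum keyword implicitly extends java.lang.Enum.

3. The class has two private final fields, so the fields of a class defined with record are always private final.

4. There is a public constructor whose parameters are the two components. Looking at its body in the bytecode, it amounts to the following familiar code: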

public Person(String firstName, String lastName) {
    this.firstName = firstName;
    this.lastName = lastName;
}

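5. There are two getter methods, named firstName and lastName. This differs from the JavaBean naming convention; perhaps this is a way of telling us that a record is not a JavaBean.

6. toString(), hashCode() and equals() are generated for us as well. Notably, these three methods rely on invokedynamic to dynamically invoke the appropriate method containing the implicit implementation.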
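The generated members listed above can be exercised directly. The sketch below assumes JDK 16 or later, where records are final (on JDK 14 they need --enable-preview); the wrapper class RecordDemo is only there to make the example self-contained.

```java
public class RecordDemo {
    record Person(String firstName, String lastName) {}

    public static void main(String[] args) {
        Person a = new Person("Ada", "Lovelace");
        Person b = new Person("Ada", "Lovelace");

        // Accessors are named after the components, not getFirstName().
        System.out.println(a.firstName());

        // equals() and hashCode() are based on all components.
        System.out.println(a.equals(b));
        System.out.println(a.hashCode() == b.hashCode());

        // toString() lists the record name and every component.
        System.out.println(a);
    }
}
```

Two records with equal components compare equal even though they are distinct objects, which is the state-based equality the Point example described.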

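More things you can do

In the previous example we created a simple record. Can a record also contain other fields and methods? Let's take a look.

1. We cannot add instance fields to a record, but we can add static fields.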

record Person (String firstName, String lastName) {
    static int x;
}

2. We can define static methods and instance methods, and instance methods can work with the object's state.

record Person (String firstName, String lastName) {
    static int x;

    public static void doX(){
        x++;
    }

    public String getFullName(){
        return firstName + " " + lastName;
    }
}

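3. We can also add constructors.

record Person (String firstName, String lastName) {
    static int x;

    // compact canonical constructor: validates the arguments before the fields are assigned
    public Person{
        if(firstName == null){
            throw new IllegalArgumentException( "firstName can not be null !");
        }
    }

    // additional constructor: delegates to the canonical constructor
    public Person(String fullName){
        this(fullName.split(" ")[0], fullName.split(" ")[1]);
    }
}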
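A runnable sketch of the two constructor forms is shown below, assuming JDK 16 or later (or JDK 14 with --enable-preview). The wrapper class RecordCtorDemo and the sample names are illustrative only.

```java
public class RecordCtorDemo {
    record Person(String firstName, String lastName) {
        // Compact canonical constructor: no parameter list and no field
        // assignments; the fields are assigned after this body runs.
        public Person {
            if (firstName == null) {
                throw new IllegalArgumentException("firstName can not be null !");
            }
        }

        // Any other constructor has to start with a this(...) call.
        public Person(String fullName) {
            this(fullName.split(" ")[0], fullName.split(" ")[1]);
        }
    }

    public static void main(String[] args) {
        Person p = new Person("Ada Lovelace");
        System.out.println(p.firstName() + " / " + p.lastName());
        try {
            new Person(null, "Lovelace");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the single-argument constructor delegates to the canonical one, the null check in the compact constructor also guards objects created from a full name.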

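So we can add static fields and methods to a record. The question is: should we?

Remember that the goal behind records is to let developers group related fields together as a single immutable data item without writing verbose code. This means that whenever you want to add more fields or methods to a record, you should consider whether a full class would be the better choice.

Summary

Records solve a common problem with using classes as data wrappers. A plain data class shrinks dramatically from several lines of code to a single line.

However, record is currently a preview language feature, which means that although it is fully implemented, it has not yet been standardized in the JDK.

So here is the question: once you are on Java 14, will you still use Lombok? Oh wait, you probably won't be on it any time soon, because you may not even be comfortable with Java 8 yet~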

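Finally, a plug for Kotlin:

Kotlin is a new JVM-based programming language developed by JetBrains.

Its main design goals:

Create a Java-compatible language.

Make it safer than Java by statically detecting common pitfalls, such as null pointer dereferences.

Make it more concise than Java, through variable type inference, higher-order functions (closures), extension functions, mixins, first-class delegation, and so on.

Make it simpler than Scala, its most mature competitor.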

https://github.com/JetBrains/kotlin

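JVM Ecosystem Report 2020: Kotlin becomes the second most popular JVM language

The JVM Ecosystem Report 2020 was released recently. Produced jointly by Snyk and The Java Magazine (Oracle's bimonthly publication), it aims to understand the landscape of JDK implementations, tools, platforms and applications.

The survey was conducted in the second half of 2019, with developers, architects and team leads from every continent taking part. The resulting report covers the following points: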

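  • 36% of developers switched from Oracle JDK to OpenJDK

Oracle JDK still leads with a 34% share, but last year's report put it at 70%. Within one year, 36% of developers switched from Oracle JDK to an OpenJDK distribution.

  • Only 9% of participants are willing to pay for a JDK

This may explain why people moved from Oracle JDK to OpenJDK.

  • The release cadence change in JDK 9 affected the payment decisions of nearly half of respondents

Starting with JDK 9, a new Java version is released every March and September, a major change to the JDK release cadence. It affected many users' update strategies, because the six-month cadence also affects support cycles. The change has security implications too, because security fixes are not backported to older versions. The survey shows that for at least 41% of respondents, the new cadence influenced their decision on paying for support.

  • 64% of users say Java 8 is still the most commonly used Java SE version

Last year's report put this figure at 79%. With the first long-term support release, Java 11, out since September 2018, this is slowly changing. A quarter of the developers surveyed now run Java 11 in production.

  • Kotlin overtakes Scala and Clojure to become the second most popular language on the JVM

Needless to say, most JVM users (9 in 10) use Java as their main language. This year Kotlin gained a lot of popularity, growing from 2.4% usage last year to 5.5%.

  • Spring dominates the Java ecosystem

The survey shows that 6 in 10 respondents rely on Spring Framework to build their applications. For a third-party open source framework, that is a very high market share. Spring has grown into the leading framework in the Java ecosystem. Among Spring users, about 2/3 have adopted Spring 5.

The server side is also dominated by Spring, with Spring Boot holding half of the market and Spring MVC nearly another third.

  • IntelliJ IDEA dominates the IDE market

IntelliJ IDEA is currently the most widely used IDE in the JVM community, used by 62% of developers. Its large set of out-of-the-box features and native Kotlin support have made it increasingly popular. Eclipse IDE, in second place, fell from 38% last year to 20% this year. Apache NetBeans holds third place with a 10% share, roughly the same as last year.

  • Maven is still the most widely used build tool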


For details, see the full JVM Ecosystem Report 2020: https://snyk.io/blog/jvm-ecosystem-report-2020/

PYPL PopularitY of Programming Language

https://pypl.github.io/PYPL.html

Kotlin 1.4 is due to be released in spring 2020.


English reading practice (original JEP texts):

JEP 361: Switch Expressions (Standard)

Author: Gavin Bierman
Owner: Jan Lahoda
Type: Feature
Scope: SE
Status: Closed / Delivered
Release: 14
Component: tools / javac
Discussion: amber dash dev at openjdk dot java dot net
Effort: S
Duration: M
Relates to: JEP 354: Switch Expressions (Preview)
Reviewed by: Alex Buckley, Brian Goetz
Endorsed by: Brian Goetz
Created: 2019/09/04 06:35
Updated: 2020/02/24 20:57
Issue: 8230539

Summary

Extend switch so it can be used as either a statement or an expression, and so that both forms can use either traditional case ... : labels (with fall through) or new case ... -> labels (with no fall through), with a further new statement for yielding a value from a switch expression. These changes will simplify everyday coding, and prepare the way for the use of pattern matching in switch. This was a preview language feature in JDK 12 and JDK 13.

History

Switch expressions were proposed in December 2017 by JEP 325. JEP 325 was targeted to JDK 12 in August 2018 as a preview feature. One aspect of JEP 325 was the overloading of the break statement to return a result value from a switch expression. Feedback on JDK 12 suggested that this use of break was confusing. In response to the feedback, JEP 354 was created as an evolution of JEP 325. JEP 354 proposed a new statement, yield, and restored the original meaning of break. JEP 354 was targeted to JDK 13 in June 2019 as a preview feature for further review. Feedback on JDK 13 suggests that this feature is now ready to be made final and permanent in JDK 14.

Motivation

As we prepare to enhance the Java programming language to support pattern matching (JEP 305), several irregularities of the existing switch statement -- which have long been an irritation to users -- become impediments. These include the default control flow behavior between switch labels (fall through), the default scoping in switch blocks (the whole block is treated as one scope), and the fact that switch works only as a statement, even though it is often more natural to express multi-way conditionals as expressions.

The current design of Java's switch statement follows closely languages such as C and C++, and supports fall through semantics by default. Whilst this traditional control flow is often useful for writing low-level code (such as parsers for binary encodings), as switch is used in higher-level contexts, its error-prone nature starts to outweigh its flexibility. For example, in the following code, the many break statements make it unnecessarily verbose, and this visual noise often masks hard to debug errors, where missing break statements would mean accidental fall through.

switch (day) {
    case MONDAY:
    case FRIDAY:
    case SUNDAY:
        System.out.println(6);
        break;
    case TUESDAY:
        System.out.println(7);
        break;
    case THURSDAY:
    case SATURDAY:
        System.out.println(8);
        break;
    case WEDNESDAY:
        System.out.println(9);
        break;
}

We propose to introduce a new form of switch label, "case L ->", to signify that only the code to the right of the label is to be executed if the label is matched. We also propose to allow multiple constants per case, separated by commas. The previous code can now be written:

switch (day) {
    case MONDAY, FRIDAY, SUNDAY -> System.out.println(6);
    case TUESDAY                -> System.out.println(7);
    case THURSDAY, SATURDAY     -> System.out.println(8);
    case WEDNESDAY              -> System.out.println(9);
}

The code to the right of a "case L ->" switch label is restricted to be an expression, a block, or (for convenience) a throw statement. This has the pleasing consequence that should an arm introduce a local variable, it must be contained in a block and is thus not in scope for any of the other arms in the switch block. This eliminates another annoyance with traditional switch blocks where the scope of a local variable is the entire block:

switch (day) {
    case MONDAY:
    case TUESDAY:
        int temp = ...     // The scope of 'temp' continues to the }
        break;
    case WEDNESDAY:
    case THURSDAY:
        int temp2 = ...    // Can't call this variable 'temp'
        break;
    default:
        int temp3 = ...    // Can't call this variable 'temp'
}

Many existing switch statements are essentially simulations of switch expressions, where each arm either assigns to a common target variable or returns a value:

int numLetters;
switch (day) {
    case MONDAY:
    case FRIDAY:
    case SUNDAY:
        numLetters = 6;
        break;
    case TUESDAY:
        numLetters = 7;
        break;
    case THURSDAY:
    case SATURDAY:
        numLetters = 8;
        break;
    case WEDNESDAY:
        numLetters = 9;
        break;
    default:
        throw new IllegalStateException("Wat: " + day);
}

Expressing this as a statement is roundabout, repetitive, and error-prone. The author meant to express that we should compute a value of numLetters for each day. It should be possible to say that directly, using a switch expression, which is both clearer and safer:

int numLetters = switch (day) {
    case MONDAY, FRIDAY, SUNDAY -> 6;
    case TUESDAY                -> 7;
    case THURSDAY, SATURDAY     -> 8;
    case WEDNESDAY              -> 9;
};

In turn, extending switch to support expressions raises some additional needs, such as extending flow analysis (an expression must always compute a value or complete abruptly), and allowing some case arms of a switch expression to throw an exception rather than yield a value.

Description

Arrow labels

In addition to traditional "case L :" labels in a switch block, we define a new simplified form, with "case L ->" labels. If a label is matched, then only the expression or statement to the right of the arrow is executed; there is no fall through. For example, given the following switch statement that uses the new form of labels:

static void howMany(int k) {
    switch (k) {
        case 1  -> System.out.println("one");
        case 2  -> System.out.println("two");
        default -> System.out.println("many");
    }
}

The following code:

howMany(1);
howMany(2);
howMany(3);

results in the following output:

one
two
many
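The difference in control flow can be made visible by recording which arms run. The following sketch, assuming JDK 14 or later, contrasts the two label forms; the class and method names are made up for illustration.

```java
public class FallThroughDemo {
    // Traditional labels without break: execution falls through
    // into every following group.
    static String colon(int k) {
        StringBuilder sb = new StringBuilder();
        switch (k) {
            case 1: sb.append("one ");
            case 2: sb.append("two ");
            default: sb.append("many");
        }
        return sb.toString();
    }

    // Arrow labels: only the code to the right of the matching label runs.
    static String arrow(int k) {
        StringBuilder sb = new StringBuilder();
        switch (k) {
            case 1 -> sb.append("one");
            case 2 -> sb.append("two");
            default -> sb.append("many");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(colon(1)); // one two many
        System.out.println(arrow(1)); // one
    }
}
```

The colon version needs an explicit break after each group to behave like the arrow version.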

Switch expressions

We extend the switch statement so it can be used as an expression. For example, the previous howMany method can be rewritten to use a switch expression, so it uses only a single println.

static void howMany(int k) {
    System.out.println(
        switch (k) {
            case  1 -> "one";
            case  2 -> "two";
            default -> "many";
        }
    );
}

In the common case, a switch expression will look like:

T result = switch (arg) {
    case L1 -> e1;
    case L2 -> e2;
    default -> e3;
};

A switch expression is a poly expression; if the target type is known, this type is pushed down into each arm. The type of a switch expression is its target type, if known; if not, a standalone type is computed by combining the types of each case arm.
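Both typing cases can be observed in a short sketch, assuming JDK 14 or later. The class and method names are hypothetical; the second method uses var so that there is no target type and the standalone type has to be computed from the arms.

```java
public class SwitchTypeDemo {
    // The target type long is pushed into each arm, so the int
    // literals are widened to long.
    static long asLong(int k) {
        long result = switch (k) {
            case 1 -> 10;
            default -> 20;
        };
        return result;
    }

    // With var there is no target type. The arms have types int and
    // double, which combine to double by numeric promotion.
    static Object standalone(int k) {
        var v = switch (k) {
            case 1 -> 1;
            default -> 2.0;
        };
        return v; // boxed as a Double
    }

    public static void main(String[] args) {
        System.out.println(asLong(1));
        System.out.println(standalone(1));
    }
}
```

The behavior mirrors the conditional operator, where an int arm and a double arm likewise produce a double.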

Yielding a value

Most switch expressions will have a single expression to the right of the "case L ->" switch label. In the event that a full block is needed, we introduce a new yield statement to yield a value, which becomes the value of the enclosing switch expression.

int j = switch (day) {
    case MONDAY  -> 0;
    case TUESDAY -> 1;
    default      -> {
        int k = day.toString().length();
        int result = f(k);
        yield result;
    }
};

A switch expression can, like a switch statement, also use a traditional switch block with "case L:" switch labels (implying fall through semantics). In this case, values are yielded using the new yield statement:

int result = switch (s) {
    case "Foo": 
        yield 1;
    case "Bar":
        yield 2;
    default:
        System.out.println("Neither Foo nor Bar, hmmm...");
        yield 0;
};

The two statements, break (with or without a label) and yield, facilitate easy disambiguation between switch statements and switch expressions: a switch statement but not a switch expression can be the target of a break statement; and a switch expression but not a switch statement can be the target of a yield statement.

Rather than being a keyword, yield is a restricted identifier (like var), which means that classes named yield are illegal. If there is a unary method yield in scope, then the expression yield(x) would be ambiguous (could be either a method call, or a yield statement whose operand is a parenthesized expression), and this ambiguity is resolved in favor of the yield statement. If the method invocation is preferred then the method should be qualified, with this for an instance method or the class name for a static method.
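The qualification rule can be illustrated with a sketch, assuming JDK 14 or later. The class YieldDemo and its static method named yield are hypothetical and exist only to create the name clash.

```java
public class YieldDemo {
    // Declaring a method named yield is still legal.
    static int yield(int x) {
        return x * 2;
    }

    static int compute(int k) {
        return switch (k) {
            case 0 -> 0;
            default -> {
                // The call is qualified with the class name, so it is
                // parsed as a method invocation.
                int doubled = YieldDemo.yield(k);
                // This is the yield statement of the switch expression.
                yield doubled;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(compute(3));
    }
}
```

An instance method would be qualified with this instead of the class name.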

Exhaustiveness

The cases of a switch expression must be exhaustive; for all possible values there must be a matching switch label. (Obviously switch statements are not required to be exhaustive.)

In practice this normally means that a default clause is required; however, in the case of an enum switch expression that covers all known constants, a default clause is inserted by the compiler to indicate that the enum definition has changed between compile-time and runtime. Relying on this implicit default clause insertion makes for more robust code; now when code is recompiled, the compiler checks that all cases are explicitly handled. Had the developer inserted an explicit default clause (as is the case today) a possible error will have been hidden.

Furthermore, a switch expression must either complete normally with a value, or complete abruptly by throwing an exception. This has a number of consequences. First, the compiler checks that for every switch label, if it is matched then a value can be yielded.

int i = switch (day) {
    case MONDAY -> {
        System.out.println("Monday"); 
        // ERROR! Block doesn't contain a yield statement
    }
    default -> 1;
};
i = switch (day) {
    case MONDAY, TUESDAY, WEDNESDAY: 
        yield 0;
    default: 
        System.out.println("Second half of the week");
        // ERROR! Group doesn't contain a yield statement
};

A further consequence is that the control statements, break, yield, return and continue, cannot jump through a switch expression, such as in the following:

z: 
    for (int i = 0; i < MAX_VALUE; ++i) {
        int k = switch (e) { 
            case 0:  
                yield 1;
            case 1:
                yield 2;
            default: 
                continue z; 
                // ERROR! Illegal jump through a switch expression 
        };
    ...
    }

Dependencies

This JEP evolved from JEP 325 and JEP 354. However, this JEP is standalone, and does not depend on those two JEPs.

Future support for pattern matching, beginning with JEP 305, will build on this JEP.

Risks and Assumptions

The need for a switch statement with case L -> labels is sometimes unclear. The following considerations supported its inclusion:

  • There are switch statements that operate by side-effects, but which are generally still "one action per label". Bringing these into the fold with new-style labels makes the statements more straightforward and less error-prone.

  • That the default control flow in a switch statement's block is to fall through, rather than to break out, was an unfortunate choice early in Java's history, and continues to be a matter of significant angst for developers. By addressing this for the switch construct in general, not just for switch expressions, the impact of this choice is reduced.

  • By teasing the desired benefits (expression-ness, better control flow, saner scoping) into orthogonal features, switch expressions and switch statements could have more in common. The greater the divergence between switch expressions and switch statements, the more complex the language is to learn, and the more sharp edges there are for developers to cut themselves on.

https://openjdk.java.net/jeps/361

JEP 368: Text Blocks (Second Preview)

Owner: Jim Laskey
Type: Feature
Scope: SE
Status: Closed / Delivered
Release: 14
Component: specification / language
Discussion: amber dash dev at openjdk dot java dot net
Effort: M
Duration: M
Relates to: JEP 355: Text Blocks (Preview)
Reviewed by: Alex Buckley
Created: 2019/09/30 14:10
Updated: 2020/03/05 02:05
Issue: 8231623

Summary

Add text blocks to the Java language. A text block is a multi-line string literal that avoids the need for most escape sequences, automatically formats the string in a predictable way, and gives the developer control over the format when desired. This is a preview language feature in JDK 14.

History

Text blocks were proposed by JEP 355 in early 2019 as a follow-on to explorations begun in JEP 326 (Raw String Literals), which was withdrawn and did not appear in JDK 12. JEP 355 was targeted to JDK 13 in mid 2019 as a preview feature. Feedback on JDK 13 suggested that this feature should be previewed again, with the addition of two new escape sequences.

Goals

  • Simplify the task of writing Java programs by making it easy to express strings that span several lines of source code, while avoiding escape sequences in common cases.

  • Enhance the readability of strings in Java programs that denote code written in non-Java languages.

  • Support migration from string literals by stipulating that any new construct can express the same set of strings as a string literal, interpret the same escape sequences, and be manipulated in the same ways as a string literal.

  • Add escape sequences for managing explicit white space and newline control.

Non-Goals

  • It is not a goal to define a new reference type, distinct from java.lang.String, for the strings expressed by any new construct.

  • It is not a goal to define new operators, distinct from +, that take String operands.

  • Text blocks do not directly support string interpolation. Interpolation may be considered in a future JEP.

  • Text blocks do not support raw strings, that is, strings whose characters are not processed in any way.

Motivation

In Java, embedding a snippet of HTML, XML, SQL, or JSON in a string literal "..." usually requires significant editing with escapes and concatenation before the code containing the snippet will compile. The snippet is often difficult to read and arduous to maintain.

More generally, the need to denote short, medium, and long blocks of text in a Java program is near universal, whether the text is code from other programming languages, structured text representing golden files, or messages in natural languages. On the one hand, the Java language recognizes this need by allowing strings of unbounded size and content; on the other hand, it embodies a design default that strings should be small enough to denote on a single line of a source file (surrounded by " characters), and simple enough to escape easily. This design default is at odds with the large number of Java programs where strings are too long to fit comfortably on a single line.

Accordingly, it would improve both the readability and the writability of a broad class of Java programs to have a linguistic mechanism for denoting strings more literally than a string literal -- across multiple lines and without the visual clutter of escapes. In essence, a two-dimensional block of text, rather than a one-dimensional sequence of characters.

Still, it is impossible to predict the role of every string in Java programs. Just because a string spans multiple lines of source code does not mean that newline characters are desirable in the string. One part of a program may be more readable when strings are laid out over multiple lines, but the embedded newline characters may change the behavior of another part of the program. Accordingly, it would be helpful if the developer had precise control over where newlines appear, and, as a related matter, how much white space appears to the left and right of the "block" of text.

HTML example

Using "one-dimensional" string literals

String html = "<html>\n" +
              "    <body>\n" +
              "        <p>Hello, world</p>\n" +
              "    </body>\n" +
              "</html>\n";

Using a "two-dimensional" block of text

String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
              """;

SQL example

Using "one-dimensional" string literals

String query = "SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB`\n" +
               "WHERE `CITY` = 'INDIANAPOLIS'\n" +
               "ORDER BY `EMP_ID`, `LAST_NAME`;\n";

Using a "two-dimensional" block of text

String query = """
               SELECT `EMP_ID`, `LAST_NAME` FROM `EMPLOYEE_TB`
               WHERE `CITY` = 'INDIANAPOLIS'
               ORDER BY `EMP_ID`, `LAST_NAME`;
               """;

Polyglot language example

Using "one-dimensional" string literals

ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
Object obj = engine.eval("function hello() {\n" +
                         "    print('\"Hello, world\"');\n" +
                         "}\n" +
                         "\n" +
                         "hello();\n");

Using a "two-dimensional" block of text

ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
Object obj = engine.eval("""
                         function hello() {
                             print('"Hello, world"');
                         }
                         
                         hello();
                         """);

Description

This section is identical to the same section in this JEP's predecessor, JEP 355, except for the addition of the subsection on new escape sequences.

A text block is a new kind of literal in the Java language. It may be used to denote a string anywhere that a string literal could appear, but offers greater expressiveness and less accidental complexity.

A text block consists of zero or more content characters, enclosed by opening and closing delimiters.

The opening delimiter is a sequence of three double quote characters (""") followed by zero or more white spaces followed by a line terminator. The content begins at the first character after the line terminator of the opening delimiter.

The closing delimiter is a sequence of three double quote characters. The content ends at the last character before the first double quote of the closing delimiter.

The content may include double quote characters directly, unlike the characters in a string literal. The use of \" in a text block is permitted, but not necessary or recommended. Fat delimiters (""") were chosen so that " characters could appear unescaped, and also to visually distinguish a text block from a string literal.

The content may include line terminators directly, unlike the characters in a string literal. The use of \n in a text block is permitted, but not necessary or recommended. For example, the text block:

"""
line 1
line 2
line 3
"""

is equivalent to the string literal:

"line 1\nline 2\nline 3\n"

or a concatenation of string literals:

"line 1\n" +
"line 2\n" +
"line 3\n"

If a line terminator is not required at the end of the string, then the closing delimiter can be placed on the last line of content. For example, the text block:

"""
line 1
line 2
line 3"""

is equivalent to the string literal:

"line 1\nline 2\nline 3"

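The two equivalences above can be checked directly. The following is a minimal sketch, assuming JDK 15 or later (or JDK 14 with --enable-preview); the class name TextBlockEquivalence and its constants are made up for illustration.

```java
public class TextBlockEquivalence {

    // Closing delimiter on its own line: the string ends with a newline.
    static final String WITH_NEWLINE = """
            line 1
            line 2
            line 3
            """;

    // Closing delimiter on the last content line: no trailing newline.
    static final String WITHOUT_NEWLINE = """
            line 1
            line 2
            line 3""";

    public static void main(String[] args) {
        System.out.println(WITH_NEWLINE.equals("line 1\nline 2\nline 3\n"));  // true
        System.out.println(WITHOUT_NEWLINE.equals("line 1\nline 2\nline 3")); // true
    }
}
```
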
A text block can denote the empty string, although this is not recommended because it needs two lines of source code:

String empty = """
""";

Here are some examples of ill-formed text blocks:

String a = """""";   // no line terminator after opening delimiter
String b = """ """;  // no line terminator after opening delimiter
String c = """
           ";        // no closing delimiter (text block continues to EOF)
String d = """
           abc \ def
           """;      // unescaped backslash (see below for escape processing)

Compile-time processing

A text block is a constant expression of type String, just like a string literal. However, unlike a string literal, the content of a text block is processed by the Java compiler in three distinct steps:

  1. Line terminators in the content are translated to LF (\u000A). The purpose of this translation is to follow the principle of least surprise when moving Java source code across platforms.

  2. Incidental white space surrounding the content, introduced to match the indentation of Java source code, is removed.

  3. Escape sequences in the content are interpreted. Performing interpretation as the final step means developers can write escape sequences such as \n without them being modified or deleted by earlier steps.

The processed content is recorded in the class file as a CONSTANT_String_info entry in the constant pool, just like the characters of a string literal. The class file does not record whether a CONSTANT_String_info entry was derived from a text block or a string literal.

At run time, a text block is evaluated to an instance of String, just like a string literal. Instances of String that are derived from text blocks are indistinguishable from instances derived from string literals. Two text blocks with the same processed content will refer to the same instance of String due to interning, just like for string literals.
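
The interning behaviour described above can be observed with a reference comparison. This is only a sketch, assuming JDK 15 or later; the class and field names are hypothetical, and == is used on strings here purely to show that both constants resolve to the same instance.

```java
public class TextBlockInterning {

    // A text block and a string literal with the same processed content.
    static final String FROM_BLOCK = """
            Hello, world""";

    static final String FROM_LITERAL = "Hello, world";

    static boolean sameInstance() {
        // Reference comparison on purpose: both are constant expressions
        // with identical content, so they refer to one interned String.
        return FROM_BLOCK == FROM_LITERAL;
    }

    public static void main(String[] args) {
        System.out.println(sameInstance()); // true
    }
}
```
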

The following sections discuss compile-time processing in more detail.

1. Line terminators

Line terminators in the content are normalized from CR (\u000D) and CRLF (\u000D\u000A) to LF (\u000A) by the Java compiler. This ensures that the string derived from the content is equivalent across platforms, even if the source code has been translated to a platform encoding (see javac -encoding).

For example, if Java source code that was created on a Unix platform (where the line terminator is LF) is edited on a Windows platform (where the line terminator is CRLF), then without normalization, the content would become one character longer for each line. Any algorithm that relied on LF being the line terminator might fail, and any test that needed to verify string equality with String::equals would fail.

The escape sequences \n (LF), \f (FF), and \r (CR) are not interpreted during normalization; escape processing happens later.
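
To make the effect of normalization concrete, here is a small sketch that mimics it on ordinary strings. It is not the compiler's implementation; the normalize helper and the class name are assumptions introduced for illustration.

```java
public class LineTerminatorSketch {

    // Illustrative stand-in for the compiler's normalization step:
    // CRLF and lone CR both become LF.
    static String normalize(String content) {
        // Replace CRLF first so each pair collapses to a single LF.
        return content.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String unix = "line 1\nline 2\n";
        String windows = "line 1\r\nline 2\r\n";

        // Without normalization the two strings differ by one character per line.
        System.out.println(unix.equals(windows));            // false
        System.out.println(unix.equals(normalize(windows))); // true
    }
}
```
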

2. Incidental white space

The text blocks shown above were easier to read than their concatenated string literal counterparts, but the obvious interpretation for the content of a text block would include the spaces added to indent the embedded string so that it lines up neatly with the opening delimiter. Here is the HTML example using dots to visualize the spaces that the developer added for indentation:

String html = """
..............<html>
..............    <body>
..............        <p>Hello, world</p>
..............    </body>
..............</html>
..............""";

Since the opening delimiter is generally positioned to appear on the same line as the statement or expression which consumes the text block, there is no real significance to the fact that 14 visualized spaces start each line. Including those spaces in the content would mean the text block denotes a string different from the one denoted by the concatenated string literals. This would hurt migration, and be a recurring source of surprise: it is overwhelmingly likely that the developer does not want those spaces in the string. Also, the closing delimiter is generally positioned to align with the content, which further suggests that the 14 visualized spaces are insignificant.

Spaces may also appear at the end of each line, especially when a text block is populated by copy-pasting snippets from other files (which may themselves have been formed by copy-pasting from yet more files). Here is the HTML example reimagined with some trailing white space, again using dots to visualize spaces:

String html = """
..............<html>...
..............    <body>
..............        <p>Hello, world</p>....
..............    </body>.
..............</html>...
..............""";

Trailing white space is most often unintentional, idiosyncratic, and insignificant. It is overwhelmingly likely that the developer does not care about it. Trailing white space characters are similar to line terminators, in that both are invisible artifacts of the source code editing environment. With no visual guide to the presence of trailing white space characters, including them in the content would be a recurring source of surprise, as it would affect the length, hash code, etc, of the string.

Accordingly, an appropriate interpretation for the content of a text block is to differentiate incidental white space at the start and end of each line, from essential white space. The Java compiler processes the content by removing incidental white space to yield what the developer intended. String::indent can then be used to manage indentation if desired. Using | to visualize margins:

|<html>|
|    <body>|
|        <p>Hello, world</p>|
|    </body>|
|</html>|

The re-indentation algorithm takes the content of a text block whose line terminators have been normalized to LF. It removes the same amount of white space from each line of content until at least one of the lines has a non-white space character in the leftmost position. The position of the opening """ characters has no effect on the algorithm, but the position of the closing """ characters does have an effect if placed on its own line. The algorithm is as follows:

  1. Split the content of the text block at every LF, producing a list of individual lines. Note that any line in the content which was just an LF will become an empty line in the list of individual lines.

  2. Add all non-blank lines from the list of individual lines into a set of determining lines. (Blank lines -- lines that are empty or are composed wholly of white space -- have no visible influence on the indentation. Excluding blank lines from the set of determining lines avoids throwing off step 4 of the algorithm.)

  3. If the last line in the list of individual lines (i.e., the line with the closing delimiter) is blank, then add it to the set of determining lines. (The indentation of the closing delimiter should influence the indentation of the content as a whole -- a significant trailing line policy.)

  4. Compute the common white space prefix of the set of determining lines, by counting the number of leading white space characters on each line and taking the minimum count.

  5. Remove the common white space prefix from each non-blank line in the list of individual lines.

  6. Remove all trailing white space from all lines in the modified list of individual lines from step 5. This step collapses wholly-white-space lines in the modified list so that they are empty, but does not discard them.

  7. Construct the result string by joining all the lines in the modified list of individual lines from step 6, using LF as the separator between lines. If the final line in the list from step 6 is empty, then the joining LF from the previous line will be the last character in the result string.

The escape sequences \b (backspace) and \t (tab) are not interpreted by the algorithm; escape processing happens later.

The re-indentation algorithm will be normative in The Java Language Specification. Developers will have access to it via String::stripIndent, a new instance method.
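
The seven steps above can be sketched in plain Java. This is a simplified illustration rather than the JDK's code; the names ReindentSketch, reindent, and leadingWhitespace are hypothetical, and the input is assumed to be raw text block content whose line terminators are already LF.

```java
public class ReindentSketch {

    // Counts the leading white space characters on a line.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    static String reindent(String content) {
        // Step 1: split at every LF; limit -1 keeps trailing empty lines.
        String[] lines = content.split("\n", -1);

        // Steps 2-4: the determining lines are all non-blank lines plus the
        // last line (the one holding the closing delimiter), even if blank.
        int prefix = Integer.MAX_VALUE;
        for (int i = 0; i < lines.length; i++) {
            boolean isLast = (i == lines.length - 1);
            if (!lines[i].isBlank() || isLast) {
                prefix = Math.min(prefix, leadingWhitespace(lines[i]));
            }
        }

        StringBuilder result = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            // Step 5: remove the common prefix from non-blank lines.
            if (!line.isBlank()) {
                line = line.substring(prefix);
            }
            // Step 6: strip trailing white space; blank lines become empty.
            line = line.stripTrailing();
            // Step 7: join with LF as the separator.
            if (i > 0) {
                result.append('\n');
            }
            result.append(line);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Content with the closing delimiter aligned under the content.
        String aligned = "    <html>\n        <body>\n    </html>\n    ";
        System.out.println(reindent(aligned).equals("<html>\n    <body>\n</html>\n")); // true
    }
}
```

On JDK 15 or later, the result can be compared with String::stripIndent for the same input.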

Significant trailing line policy

Normally, one would format a text block in two ways: first, position the left edge of the content to appear under the first " of the opening delimiter, and second, place the closing delimiter on its own line to appear exactly under the opening delimiter. The resulting string will have no white space at the start of any line, and will not include the trailing blank line of the closing delimiter.

However, because the trailing blank line is considered a determining line, moving it to the left has the effect of reducing the common white space prefix, and therefore reducing the amount of white space that is stripped from the start of every line. In the extreme case, where the closing delimiter is moved all the way to the left, that reduces the common white space prefix to zero, effectively opting out of white space stripping.

For example, with the closing delimiter moved all the way to the left, there is no incidental white space to visualize with dots:

String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
""";

Including the trailing blank line with the closing delimiter, the common white space prefix is zero, so zero white space is removed from the start of each line. The algorithm thus produces: (using | to visualize the left margin)

|              <html>
|                  <body>
|                      <p>Hello, world</p>
|                  </body>
|              </html>

Alternatively, suppose the closing delimiter is not moved all the way to the left, but rather under the t of html so it is eight spaces deeper than the variable declaration:

String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
        """;

The spaces visualized with dots are considered to be incidental:

String html = """
........      <html>
........          <body>
........              <p>Hello, world</p>
........          </body>
........      </html>
........""";

Including the trailing blank line with the closing delimiter, the common white space prefix is eight, so eight white spaces are removed from the start of each line. The algorithm thus preserves the essential indentation of the content relative to the closing delimiter:

|      <html>
|          <body>
|              <p>Hello, world</p>
|          </body>
|      </html>

Finally, suppose the closing delimiter is moved slightly to the right of the content:

String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
                  """;

The spaces visualized with dots are considered to be incidental:

String html = """
..............<html>
..............    <body>
..............        <p>Hello, world</p>
..............    </body>
..............</html>
..............    """;

The common white space prefix is 14, so 14 white spaces are removed from the start of each line. The trailing blank line is stripped to leave an empty line, which being the last line is then discarded. In other words, moving the closing delimiter to the right of the content has no effect, and the algorithm again preserves the essential indentation of the content:

|<html>
|    <body>
|        <p>Hello, world</p>
|    </body>
|</html>
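
The three placements of the closing delimiter can be compared side by side. The sketch below assumes JDK 15 or later; the class and constant names are invented for illustration, and the content is shortened to keep the expected strings readable.

```java
public class ClosingDelimiterPlacement {

    // Closing delimiter aligned with the content: all 8 leading spaces are incidental.
    static final String ALIGNED = """
        <p>
            Hello
        </p>
        """;

    // Closing delimiter 4 columns left of the content: 4 spaces per line are kept.
    static final String SHIFTED_LEFT = """
        <p>
            Hello
        </p>
    """;

    // Closing delimiter to the right of the content: same result as ALIGNED.
    static final String SHIFTED_RIGHT = """
        <p>
            Hello
        </p>
            """;

    public static void main(String[] args) {
        System.out.println(ALIGNED.equals("<p>\n    Hello\n</p>\n"));                  // true
        System.out.println(SHIFTED_LEFT.equals("    <p>\n        Hello\n    </p>\n")); // true
        System.out.println(SHIFTED_RIGHT.equals(ALIGNED));                             // true
    }
}
```
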

3. Escape sequences

After the content is re-indented, any escape sequences in the content are interpreted. Text blocks support all of the escape sequences supported in string literals, including \n, \t, \', \", and \\. See section 3.10.6 of The Java Language Specification for the full list. Developers will have access to escape processing via String::translateEscapes, a new instance method.

Interpreting escapes as the final step allows developers to use \n, \f, and \r for vertical formatting of a string without it affecting the translation of line terminators in step 1, and to use \b and \t for horizontal formatting of a string without it affecting the removal of incidental white space in step 2. For example, consider this text block that contains the \r escape sequence (CR):

String html = """
              <html>\r
                  <body>\r
                      <p>Hello, world</p>\r
                  </body>\r
              </html>\r
              """;

The CR escapes are not processed until after the line terminators have been normalized to LF. Using Unicode escapes to visualize LF (\u000A) and CR (\u000D), the result is:

|<html>\u000D\u000A
|    <body>\u000D\u000A
|        <p>Hello, world</p>\u000D\u000A
|    </body>\u000D\u000A
|</html>\u000D\u000A
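
The ordering can be verified with a shortened version of the example. This is a sketch assuming JDK 15 or later; the class name CarriageReturnEscape is hypothetical.

```java
public class CarriageReturnEscape {

    // Each \r escape survives normalization and indentation stripping,
    // and is only turned into a CR character in the final step.
    static final String HTML = """
            <html>\r
            </html>\r
            """;

    public static void main(String[] args) {
        // Every line ends with CR followed by the normalized LF.
        System.out.println(HTML.equals("<html>\r\n</html>\r\n")); // true
    }
}
```
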

Note that it is legal to use " freely inside a text block, even next to the opening or closing delimiter. For example:

String story = """
    "When I use a word," Humpty Dumpty said,
    in rather a scornful tone, "it means just what I
    choose it to mean - neither more nor less."
    "The question is," said Alice, "whether you
    can make words mean so many different things."
    "The question is," said Humpty Dumpty,
    "which is to be master - that's all."
    """;

However, sequences of three " characters require the escaping of at least one " to avoid mimicking the closing delimiter:

String code = 
    """
    String text = \"""
        A text block inside a text block
    \""";
    """;

New escape sequences

To allow finer control of the processing of newlines and white space, we introduce two new escape sequences.

First, the \<line-terminator> escape sequence explicitly suppresses the insertion of a newline character.

For example, it is common practice to split very long string literals into concatenations of smaller substrings, and then hard wrap the resulting string expression onto multiple lines:

String literal = "Lorem ipsum dolor sit amet, consectetur adipiscing " +
                 "elit, sed do eiusmod tempor incididunt ut labore " +
                 "et dolore magna aliqua.";

With the \<line-terminator> escape sequence this could be expressed as:

String text = """
                Lorem ipsum dolor sit amet, consectetur adipiscing \
                elit, sed do eiusmod tempor incididunt ut labore \
                et dolore magna aliqua.\
                """;

For the simple reason that character literals and traditional string literals don't allow embedded newlines, the \<line-terminator> escape sequence is only applicable to text blocks.

Second, the new \s escape sequence simply translates to a single space (\u0020).

Escape sequences aren't translated until after incidental space stripping, so \s can act as a fence to prevent the stripping of trailing white space. Using \s at the end of each line in this example guarantees that each line is exactly six characters long:

String colors = """
    red  \s
    green\s
    blue \s
    """;

The \s escape sequence can be used in both text blocks and traditional string literals.
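
Both new escape sequences can be exercised in one short program. The following is a sketch assuming JDK 15 or later (or JDK 14 with --enable-preview); the class and constant names are made up for illustration.

```java
public class NewEscapes {

    // \<line-terminator> joins the lines, so the result contains no newline.
    static final String ONE_LINE = """
            Lorem ipsum dolor sit amet, \
            consectetur adipiscing elit.\
            """;

    // \s protects the trailing spaces, so every line is six characters long.
    static final String COLORS = """
            red  \s
            green\s
            blue \s
            """;

    public static void main(String[] args) {
        System.out.println(ONE_LINE.contains("\n")); // false
        COLORS.lines().forEach(line -> System.out.println(line.length())); // 6, 6, 6
    }
}
```
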

Concatenation of text blocks

Text blocks can be used anywhere a string literal can be used. For example, text blocks and string literals may be concatenated interchangeably:

String code = "public void print(Object o) {" +
              """
                  System.out.println(Objects.toString(o));
              }
              """;

However, concatenation involving a text block can become rather clunky. Take this text block as a starting point:

String code = """
              public void print(Object o) {
                  System.out.println(Objects.toString(o));
              }
              """;

Suppose it needs to be changed so that the type of o comes from a variable. Using concatenation, the text block that contains the trailing code will need to start on a new line. Unfortunately, the straightforward insertion of a newline in the program, as below, will cause a long span of white space between the type and the text beginning with o:

String code = """
              public void print(""" + type + """
                                                 o) {
                  System.out.println(Objects.toString(o));
              }
              """;

The white space can be removed manually, but this hurts readability of the quoted code:

String code = """
              public void print(""" + type + """
               o) {
                  System.out.println(Objects.toString(o));
              }
              """;

A cleaner alternative is to use String::replace or String::format, as follows:

String code = """
              public void print($type o) {
                  System.out.println(Objects.toString(o));
              }
              """.replace("$type", type);

String code = String.format("""
              public void print(%s o) {
                  System.out.println(Objects.toString(o));
              }
              """, type);

Another alternative involves the introduction of a new instance method, String::formatted, which could be used as follows:

String source = """
                public void print(%s object) {
                    System.out.println(Objects.toString(object));
                }
                """.formatted(type);

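The three substitution styles above produce the same string. The sketch below assumes JDK 15 or later; the class and method names are hypothetical, and "String" is just an example value for type.

```java
public class TextBlockSubstitution {

    static String viaReplace(String type) {
        return """
                public void print($type o) {
                    System.out.println(Objects.toString(o));
                }
                """.replace("$type", type);
    }

    static String viaFormat(String type) {
        return String.format("""
                public void print(%s o) {
                    System.out.println(Objects.toString(o));
                }
                """, type);
    }

    static String viaFormatted(String type) {
        return """
                public void print(%s o) {
                    System.out.println(Objects.toString(o));
                }
                """.formatted(type);
    }

    public static void main(String[] args) {
        String expected = viaReplace("String");
        System.out.println(expected.equals(viaFormat("String")));    // true
        System.out.println(expected.equals(viaFormatted("String"))); // true
    }
}
```
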
Additional Methods

The following methods will be added to support text blocks:

  • String::stripIndent(): used to strip away incidental white space from the text block content

  • String::translateEscapes(): used to translate escape sequences

  • String::formatted(Object... args): simplify value substitution in the text block
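
These methods also work on ordinary strings, for example text assembled or read at run time. The following is a small sketch assuming JDK 15 or later, where the three methods are final API; the class name and sample values are invented for illustration.

```java
public class SupportMethodsDemo {

    // stripIndent: removes incidental white space from a string built at run time.
    static String stripped() {
        return "    a\n      b\n    ".stripIndent();
    }

    // translateEscapes: interprets the two-character sequences \t and \n.
    static String translated() {
        return "col1\\tcol2\\n".translateEscapes();
    }

    // formatted: instance-method form of String.format.
    static String substituted() {
        return "%s has %d items".formatted("cart", 3);
    }

    public static void main(String[] args) {
        System.out.println(stripped().equals("a\n  b\n"));            // true
        System.out.println(translated().equals("col1\tcol2\n"));      // true
        System.out.println(substituted().equals("cart has 3 items")); // true
    }
}
```
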

Alternatives

Do nothing

Java has prospered for over 20 years with string literals that required newlines to be escaped. IDEs ease the maintenance burden by supporting automatic formatting and concatenation of strings that span several lines of source code. The String class has also evolved to include methods that simplify the processing and formatting of long strings, such as a method that presents a string as a stream of lines. However, strings are such a fundamental part of the Java language that the shortcomings of string literals are apparent to vast numbers of developers. Other JVM languages have also made advances in how long and complex strings are denoted. Unsurprisingly, then, multi-line string literals have consistently been one of the most requested features for Java. Introducing a multi-line construct of low to moderate complexity would have a high payoff.

Allow a string literal to span multiple lines

Multi-line string literals could be introduced in Java simply by allowing line terminators in existing string literals. However, this would do nothing about the pain of escaping " characters. \" is the most frequently occurring escape sequence after \n, because of the frequency of code snippets. The only way to avoid escaping " in a string literal would be to provide an alternate delimiter scheme for string literals. Delimiters were much discussed for JEP 326 (Raw String Literals), and the lessons learned were used to inform the design of text blocks, so it would be misguided to upset the stability of string literals.

Adopt another language's multi-line string literal

According to Brian Goetz:

Many people have suggested that Java should adopt multi-line string literals from Swift or Rust. However, the approach of “just do what language X does” is intrinsically irresponsible; nearly every feature of every language is conditioned by other features of that language. Instead, the game is to learn from how other languages do things, assess the tradeoffs they’ve chosen (explicitly and implicitly), and ask what can be applied to the constraints of the language we have and user expectations within the community we have.

For JEP 326 (Raw String Literals), we surveyed many modern programming languages and their support for multi-line string literals. The results of these surveys influenced the current proposal, such as the choice of three " characters for delimiters (although there were other reasons for this choice too) and the recognition of the need for automatic indentation management.

Do not remove incidental white space

If Java introduced multi-line string literals without support for automatically removing incidental white space, then many developers would write a method to remove it themselves, or lobby for the String class to include a removal method. However, that implies a potentially expensive computation every time the string is instantiated at run time, which would reduce the benefit of string interning. Having the Java language mandate the removal of incidental white space, both in leading and trailing positions, seems the most appropriate solution. Developers can opt out of leading white space removal by careful placement of the closing delimiter.

Raw string literals

For JEP 326 (Raw String Literals), we took a different approach to the problem of denoting strings without escaping newlines and quotes, focusing on the raw-ness of strings. We now believe that this focus was wrong, because while raw string literals could easily span multiple lines of source code, the cost of supporting unescaped delimiters in their content was extreme. This limited the effectiveness of the feature in the multi-line use case, which is a critical one because of the frequency of embedding multi-line (but not truly raw) code snippets in Java programs. A good outcome of the pivot from raw-ness to multi-line-ness was a renewed focus on having a consistent escape language between string literals, text blocks, and related features that may be added in future.

Testing

Tests that use string literals for the creation, interning, and manipulation of instances of String should be duplicated to use text blocks too. Negative tests should be added for corner cases involving line terminators and EOF.

Tests should be added to ensure that text blocks can embed Java-in-Java, Markdown-in-Java, SQL-in-Java, and at least one JVM-language-in-Java.

https://openjdk.java.net/jeps/368


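Kotlin Developer Community

China's first Kotlin developer community official account, sharing and discussing topics such as the Kotlin programming language, Spring Boot, Android, React.js/Node.js, functional programming, and programming philosophy.

The noisier the world, the more it needs quiet reflection.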
