Java Scanner newlines: how do you keep scanner.next() from including the newline?

I am trying to read the words in a text file using scanner.next() with the delimiter set to " ", but the scanner includes the newline/carriage return in the token.

I have scoured the internet for a good example of this problem and have not found one, so I am posting it here. I can't find a similar problem posted here on SO. I also looked over the documentation for Scanner and Pattern (http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html) but I still cannot find a way to solve this.

Text file:

This is a test
to see if1 this, is working
ok!

Code:

    int i = 0;
    String string;

    try (Scanner scanner = new Scanner(new File(filename))) {
        scanner.useDelimiter(" ");
        while (scanner.hasNext()) {
            string = scanner.next();
            System.out.println(i++ + ": " + string);
        }
    } catch (IOException io_error) {
        System.out.println(io_error);
    }

Output:

0: This
1: is
2: a
3: test
to
4: see
5: if1
6: this,
7: is
8: working
ok!

As you can see, #3 and #8 have two words separated by a newline. (I know I can separate these into two separate strings.)

Solution

The default delimiter used by a scanner is whitespace as recognized by Character.isWhitespace, whose documentation reads:

Determines if the specified character is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:

It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').

It is '\t', U+0009 HORIZONTAL TABULATION.

It is '\n', U+000A LINE FEED.

It is '\u000B', U+000B VERTICAL TABULATION.

It is '\f', U+000C FORM FEED.

It is '\r', U+000D CARRIAGE RETURN.

It is '\u001C', U+001C FILE SEPARATOR.

It is '\u001D', U+001D GROUP SEPARATOR.

It is '\u001E', U+001E RECORD SEPARATOR.

It is '\u001F', U+001F UNIT SEPARATOR.
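The criteria above can be checked directly. The following is a minimal sketch; the class name WhitespaceCheck and the helper isJavaWhitespace are made up for illustration, and only Character.isWhitespace comes from the JDK.

```java
public class WhitespaceCheck {
    // Hypothetical helper: thin wrapper around the JDK's Character.isWhitespace.
    static boolean isJavaWhitespace(char c) {
        return Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        // Space, line feed, carriage return, tab, and a non-breaking space.
        char[] samples = {' ', '\n', '\r', '\t', '\u00A0'};
        for (char c : samples) {
            // Print the code point in hex so invisible characters are readable.
            System.out.printf("U+%04X -> %b%n", (int) c, isJavaWhitespace(c));
        }
    }
}
```

Space, line feed, carriage return, and tab all report true, while the non-breaking space U+00A0 reports false, matching the exclusion in the first criterion.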

So, just don't set any specific delimiter. Keep the default, and newlines will be considered as a delimiter just like spaces, which means the token won't include newline characters.
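As a rough sketch of this advice, the example below tokenizes the question's sample text with the default delimiter. It reads from a String rather than a file to stay self-contained, and the class and method names are assumptions for illustration. The second method is a variation not mentioned in the answer: if a custom delimiter is still wanted, a pattern such as "[ \\r\\n]+" treats runs of spaces and line breaks as a single separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerWords {
    // Tokenizes with the Scanner's default delimiter (any run of whitespace).
    static List<String> words(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    // Variation (an assumption, not from the answer): an explicit delimiter
    // that matches one or more spaces, carriage returns, or line feeds.
    static List<String> wordsWithExplicitDelimiter(String text) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("[ \\r\\n]+");
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same content as the question's text file, with mixed line endings.
        String text = "This is a test\r\nto see if1 this, is working\nok!";
        List<String> tokens = words(text);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(i + ": " + tokens.get(i));
        }
    }
}
```

Run on the sample text, this prints eleven tokens numbered 0 through 10, with "test" and "to" as separate entries and no line breaks inside any token.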
