Java reading words: trying to read 2 words at a time from a file in Java

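This post discusses a problem encountered when trying to use Java to read pairs of words from a text file and store them in a TreeSet. In the code example, Scanner splits its input on whitespace by default, so it can never match two consecutive words as a single token. The solution is to read the file line by line, parse the words with the split() method, and build the word pairs yourself.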

I'm trying to write a simple program to read a text file and store pairs of words in a Set. Here is the code I wrote for that:

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;
    import java.util.TreeSet;

    public class Main {
        public static void main(String[] args) {
            TreeSet<String> phraseSet = new TreeSet<String>();
            try {
                Scanner readfile = new Scanner(new File("data.txt"));
                // Meant to read two words at a time, but hasNext("\\w{2}")
                // is false for the first token, so this loop never runs
                while (readfile.hasNext("\\w{2}")) {
                    String phrase = readfile.next("\\w{2}");
                    phraseSet.add(phrase);
                }
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
            for (String p : phraseSet) {
                System.out.println(p);
            }
        }
    }

The code compiles but prints out a blank line (The while loop is never entered).

The data.txt file contents are:

    There are seven words in this line.
    And then there are few more words in this line.

I'm expecting the following Strings in my TreeSet (in sorted order, of course):

    There are
    are seven
    seven words
    words in
    in this
    this line
    line And
    And then
    then there
    there are
    ....
    this line

Solution

Your main problem is that Scanner by default parses tokens by whitespace.

According to the API:

A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types using the various next methods.

If you take a look at next(String pattern) (hasNext(String pattern) is its non-consuming counterpart), you'll see that it

Returns the next token if it matches the pattern constructed from the specified string. If the match is successful, the scanner advances past the input that matched the pattern.

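That is, by the time you ask the Scanner to check for your token, it has already broken the input up by whitespace, so asking it to find a token with a space in the middle will always fail. Note also that the pattern \\w{2} matches exactly two word characters (such as "in"), not two words. The first token, "There", has five characters, so hasNext("\\w{2}") returns false and the while loop is never entered.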
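A small sketch can make this visible. It is illustrative only: the class ScannerTokenDemo and its helpers tokens and nextTokenMatches are hypothetical names, and it reads the first line of data.txt from a String instead of the file. It lists the tokens Scanner produces with its default whitespace delimiter, then asks hasNext(String) about a few patterns, which must match the whole next token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerTokenDemo {
    // Collects the tokens Scanner produces with its default (whitespace) delimiter.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    // hasNext(String) reports whether the whole next token matches the regex.
    static boolean nextTokenMatches(String text, String regex) {
        try (Scanner sc = new Scanner(text)) {
            return sc.hasNext(regex);
        }
    }

    public static void main(String[] args) {
        String line = "There are seven words in this line.";
        // Each token is a single word; note that "line." keeps its period
        System.out.println(tokens(line));
        // "There" is not exactly two word characters
        System.out.println(nextTokenMatches(line, "\\w{2}"));      // false
        // A two-word pattern can never match a one-word token
        System.out.println(nextTokenMatches(line, "\\w+\\s\\w+")); // false
        // The token "in" really is two word characters
        System.out.println(nextTokenMatches("in this", "\\w{2}")); // true
    }
}
```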

A better way to do this would be to have the Scanner read in a line at a time, and then just split() the line and parse it yourself:

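    Scanner readfile = new Scanner(new File("data.txt"));
    while (readfile.hasNextLine()) {
        // Split on runs of whitespace; trim() avoids an empty first element
        String[] words = readfile.nextLine().trim().split("\\s+");
        // Pair each word with the word that follows it on the same line
        for (int i = 0; i < words.length - 1; i++) {
            phraseSet.add(words[i] + " " + words[i + 1]);
        }
    }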

Your question didn't explicitly mention it, but from your example output (which includes "line And"), it looks like you want pairs to continue across line breaks. This approach makes that slightly more complicated, but you can remember the last word of each line and pair it with the first word of the next one, like so:

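    String lastWord = null;
    while (readfile.hasNextLine()) {
        String line = readfile.nextLine().trim();
        if (line.isEmpty()) {
            continue; // skip blank lines so lastWord never becomes ""
        }
        String[] words = line.split("\\s+");
        // Join the last word of the previous line with the first word of this one
        if (lastWord != null) {
            phraseSet.add(lastWord + " " + words[0]);
        }
        for (int i = 0; i < words.length - 1; i++) {
            phraseSet.add(words[i] + " " + words[i + 1]);
        }
        lastWord = words[words.length - 1];
    }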

If this is actually what you're looking for, you're probably better off just using next() to pull one word at a time and pairing each word with the previous one, since line breaks are then just more whitespace.
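Here is a minimal sketch of that next()-based approach, under a few stated assumptions: WordPairs and wordPairs are hypothetical names, the text is read from a String so the sketch runs without data.txt, and leading/trailing punctuation is stripped (an assumption, not part of the answer above) so the output shows "this line" as in the expected output rather than "this line.".

```java
import java.util.Scanner;
import java.util.SortedSet;
import java.util.TreeSet;

public class WordPairs {
    // Reads one whitespace-delimited token at a time with Scanner.next(), so a
    // line break is just more whitespace and "line And" becomes a pair too.
    static SortedSet<String> wordPairs(Scanner in) {
        SortedSet<String> pairs = new TreeSet<>();
        String previous = null;
        while (in.hasNext()) {
            // Assumption: strip leading/trailing punctuation so "line." becomes "line"
            String word = in.next().replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (word.isEmpty()) {
                continue; // the token was nothing but punctuation
            }
            if (previous != null) {
                pairs.add(previous + " " + word);
            }
            previous = word;
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Reads from a String for illustration; use
        // new Scanner(new File("data.txt")) to read the real file.
        String text = "There are seven words in this line.\n"
                + "And then there are few more words in this line.\n";
        try (Scanner in = new Scanner(text)) {
            for (String pair : wordPairs(in)) {
                System.out.println(pair);
            }
        }
    }
}
```

Because TreeSet orders Strings with compareTo, capitalized pairs such as "And then" and "There are" are printed before the lowercase ones, and duplicate pairs like "this line" are stored only once.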

To sum up

You cannot use Scanner to directly look for multi-word tokens, you'll have to do the parsing yourself.
