java canonicalize: Searching for anagrams with Java 8

I have to write a program that reads a file, finds the anagrams in it, and prints each word followed by its anagrams. The text file is very big: after reading it with a Scanner, the size of listOfWords is 25000.

Output example:

word anagram1 anagram2 anagram3 ...

word2 anagram1 anagram2...

I have code that works, but it is very slow:

private static List<String> listOfWords = new ArrayList<>();

private static List<ArrayList<String>> allAnagrams = new ArrayList<>();

public static void main(String[] args) throws Exception {

URL url = new URL("www.xxx.pl/textFile.txt");

Scanner scanner = new Scanner(url.openStream());

while (scanner.hasNext()) {

String nextToken = scanner.next();

listOfWords.add(nextToken);

}

scanner.close();

while (listOfWords.isEmpty() == false) {

ArrayList<String> anagramy = new ArrayList<>();

String wzor = listOfWords.remove(0);

anagramy.add(wzor);

char[] ch = wzor.toCharArray();

Arrays.sort(ch);

for (int i = 0; i < listOfWords.size(); i++) {

String slowo = listOfWords.get(i);

char[] cha = slowo.toCharArray();

Arrays.sort(cha);

if (Arrays.equals(ch, cha)) {

anagramy.add(slowo);

listOfWords.remove(i);

i--;

}

}

allAnagrams.add(anagramy);

}

for (ArrayList<String> ar : allAnagrams) {

String result = "";

if (ar.size() > 1) {

for (int i = 1; i < ar.size(); i++) {

result += ar.get(i) + " "; // append each anagram instead of overwriting the previous one

}

System.out.println(ar.get(0) + " " + result);

}

}

}

I have to write it with Java 8 streams, but I don't know how. Is it possible to use streams both for reading from a URL and for searching for anagrams? Could you help me with finding anagrams using a Stream? My teacher told me the code should be shorter than mine, which reads the whole list first. Only a few lines; is that possible?

Solution

Let's create a separate method that sorts the letters of a word. You can do this with the Stream API as well:

private static String canonicalize(String s) {

return Stream.of(s.split("")).sorted().collect(Collectors.joining());

}
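To see why sorting the letters helps, here is a minimal sketch of the same idea written with a plain char array instead of a stream. It is an alternative for illustration, not the method used in the rest of this answer, and the class name CanonicalizeSketch is made up. All anagrams of a word end up with the same canonical string, so that string can serve as a grouping key.

```java
import java.util.Arrays;

// Hypothetical class name, used only for this illustration.
class CanonicalizeSketch {

    // Sorts the characters of a word so that all of its anagrams share one key.
    static String canonicalize(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("cab"));                             // abc
        System.out.println(canonicalize("tat").equals(canonicalize("att"))); // true
    }
}
```

Each word is canonicalized exactly once, whereas the nested loops in the question re-sort the remaining words on every pass.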

Now you can take a Reader, extract the words from it, and group them by canonical form:

Map<String, Set<String>> map = new BufferedReader(reader).lines()

.flatMap(Pattern.compile("\\W+")::splitAsStream)

.collect(Collectors.groupingBy(Anagrams::canonicalize, Collectors.toSet()));

Next, you can remove the single-word groups, using the Stream API for the third time:

return map.values().stream().filter(list -> list.size() > 1).collect(Collectors.toList());

Now you can pass any reader to this code to extract anagrams from it. Here's the complete code:

import java.io.*;

import java.util.*;

import java.util.regex.Pattern;

import java.util.stream.*;

public class Anagrams {

private static String canonicalize(String s) {

return Stream.of(s.split("")).sorted().collect(Collectors.joining());

}

public static List<Set<String>> getAnagrams(Reader reader) {

Map<String, Set<String>> map = new BufferedReader(reader).lines()

.flatMap(Pattern.compile("\\W+")::splitAsStream)

.collect(Collectors.groupingBy(Anagrams::canonicalize, Collectors.toSet()));

return map.values().stream().filter(list -> list.size() > 1).collect(Collectors.toList());

}

public static void main(String[] args) throws IOException {

getAnagrams(new StringReader("abc cab tat aaa\natt tat bbb"))

.forEach(System.out::println);

}

}

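It prints (the order of the groups is not guaranteed, because HashMap and HashSet do not define an iteration order):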

[att, tat]

[abc, cab]

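If you want to use a URL, just replace the StringReader with new InputStreamReader(new URL("www.xxx.pl/textFile.txt").openStream(), StandardCharsets.UTF_8). This also requires importing java.net.URL and java.nio.charset.StandardCharsets, and the address passed to new URL(...) must include a protocol such as http://, otherwise a MalformedURLException is thrown.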
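As a rough sketch of the URL variant, the version below accepts any InputStream, so the same method works for url.openStream() and for in-memory test data. The class name UrlAnagramsSketch, the extra filter that drops empty tokens, and the inline sample text standing in for the remote file are assumptions of mine, not part of the code above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
class UrlAnagramsSketch {

    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    static String canonicalize(String s) {
        return Stream.of(s.split("")).sorted().collect(Collectors.joining());
    }

    // Reads words from the stream as UTF-8 and returns the groups with at least two anagrams.
    static List<Set<String>> getAnagrams(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.lines()
                    .flatMap(NON_WORD::splitAsStream)
                    .filter(word -> !word.isEmpty()) // assumption: ignore empty tokens from blank lines
                    .collect(Collectors.groupingBy(UrlAnagramsSketch::canonicalize, Collectors.toSet()))
                    .values().stream()
                    .filter(group -> group.size() > 1)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass a full address (including http:// or https://) as the first argument;
        // without one, a small in-memory sample is used instead.
        InputStream in = args.length > 0
                ? new URL(args[0]).openStream()
                : new ByteArrayInputStream(
                        "abc cab tat aaa\natt tat bbb".getBytes(StandardCharsets.UTF_8));
        getAnagrams(in).forEach(System.out::println);
    }
}
```

The try-with-resources block closes the stream once the words have been collected, which matters more for a network connection than for a StringReader.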

If you want to separate the first word of each anagram group from the rest, the solution should be modified slightly:

public static Map<String, Set<String>> getAnagrams(Reader reader) {

Map<String, List<String>> map = new BufferedReader(reader).lines()

.flatMap(Pattern.compile("\\W+")::splitAsStream)

.distinct() // remove repeating words

.collect(Collectors.groupingBy(Anagrams::canonicalize));

return map.values().stream()

.filter(list -> list.size() > 1)

.collect(Collectors.toMap(list -> list.get(0),

list -> new TreeSet<>(list.subList(1, list.size()))));

}

Here the result is a map whose key is the first word of each anagram group (the one that occurs first in the input file) and whose value is the remaining words, sorted alphabetically. I take a sublist to skip the first element, then copy it into a TreeSet to sort it; an alternative would be list.stream().skip(1).sorted().collect(Collectors.toList()).

Example usage:

getAnagrams(new StringReader("abc cab tat aaa\natt tat bbb\ntta\ncabr\nrbac cab crab cabrc cabr"))

.entrySet().forEach(System.out::println);
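To get the "word anagram1 anagram2 ..." layout asked for in the question, the map can be turned into lines with one more stream. This is only a sketch: the class name AnagramLinesSketch and the helper toLines are hypothetical, and sorting the lines is my own choice to make the output order predictable, since the map returned by Collectors.toMap has no defined order.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
class AnagramLinesSketch {

    static String canonicalize(String s) {
        return Stream.of(s.split("")).sorted().collect(Collectors.joining());
    }

    // Same approach as the modified getAnagrams above: first word -> remaining anagrams, sorted.
    static Map<String, Set<String>> getAnagrams(Reader reader) {
        Map<String, List<String>> map = new BufferedReader(reader).lines()
                .flatMap(Pattern.compile("\\W+")::splitAsStream)
                .distinct() // remove repeated words
                .collect(Collectors.groupingBy(AnagramLinesSketch::canonicalize));
        return map.values().stream()
                .filter(list -> list.size() > 1)
                .collect(Collectors.toMap(list -> list.get(0),
                        list -> new TreeSet<>(list.subList(1, list.size()))));
    }

    // Hypothetical helper: renders each entry as "word anagram1 anagram2 ...".
    static List<String> toLines(Map<String, Set<String>> anagrams) {
        return anagrams.entrySet().stream()
                .map(e -> e.getKey() + " " + String.join(" ", e.getValue()))
                .sorted() // assumption: alphabetical order of lines, for stable output
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "abc cab tat aaa\natt tat bbb\ntta\ncabr\nrbac cab crab cabrc cabr";
        toLines(getAnagrams(new StringReader(input))).forEach(System.out::println);
        // abc cab
        // cabr crab rbac
        // tat att tta
    }
}
```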
