Java: Finding the Most Frequent English Word in a Passage

陈夏明

Original article: https://blog.csdn.net/u012325167/article/details/50884373

Today I came across a problem that asks for the most frequently occurring word in a given passage. I am sharing my solution here.

Problem:

Find the English word that occurs most often in the following passage:
Look to the skies above London and you’ll see the usual suspects rainclouds, plane and pigeons. But by the end of the year, you might just see something else.

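Approach:

The passage consists of three kinds of characters: letters (which form the English words), spaces, and punctuation marks. To count the words, the punctuation and spaces have to be stripped out first.

Steps:
1. Replace every punctuation mark in the string with a space
2. Split the string on runs of one or more spaces
3. Count how many times each word occurs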
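
The three steps above can be sketched compactly as follows. This is only an illustration, not the implementation used below: the class name `WordSteps` and the method `countWords` are made up for this example, and it assumes Java 8 or later for `Map.merge`. It also folds steps 1 and 2 together by splitting on any run of non-letter characters, which treats every punctuation mark (including a typographic apostrophe) as a separator.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, used only for this sketch.
class WordSteps {
    /** Splits the text on non-letters and tallies each word. */
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Steps 1 and 2 combined: any run of non-letter characters is a separator.
        for (String word : text.split("[^A-Za-z]+")) {
            if (word.isEmpty()) {
                continue; // split yields a leading "" when the text starts with a separator
            }
            // Step 3: store 1 for a new word, otherwise add 1 to its count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the end of the year, you'll see");
        System.out.println(counts.get("the")); // prints 2
    }
}
```

Note that counting is case-sensitive here, just as in the implementations below, so "But" and "but" would be counted as different words.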

Implementation 1:

package count;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Count {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();

        String str = "Look to the skies above London and you'll see the usual suspects rainclouds, plane and pigeons. But by the end of the year, you might just see something else.";
        str = str.replace('\'', ' '); // replace apostrophes with spaces (so "you'll" becomes "you" and "ll")
        str = str.replace(',', ' ');  // replace commas with spaces
        str = str.replace('.', ' ');  // replace periods with spaces

        String[] strings = str.split("\\s+"); // "\\s+" is a regular expression matching one or more whitespace characters
//      String[] strings = str.split(" +");   // " +" also works here; it matches one or more space characters only

        Map<String, Integer> map = new HashMap<>();
        List<String> list = new ArrayList<>(); // holds each distinct word, in order of first appearance

        for (String s : strings) {
            if (map.containsKey(s)) { // the word is already in the map: increment its count
                int x = map.get(s);
                x++;
                map.put(s, x);
            } else { // the word is not in the map yet, so this is its first occurrence: store it with a count of 1
                map.put(s, 1);
                list.add(s); // record it in the list as a newly seen word
            }
        }

        int max = 0;             // highest occurrence count seen so far
        String maxString = null; // the word that has that count
        /*
         * Take each word from the list and look up its count in the map.
         * Nothing is actually sorted; we only keep track of the word
         * with the highest count.
         */
        for (String s : list) {
            int x = map.get(s);
            if (x > max) {
                maxString = s;
                max = x;
            }
        }

        System.out.println(maxString);

        long end = System.currentTimeMillis();

        System.out.println("Elapsed: " + (end - start) + " ms");
    }
}
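
Implementation 1 keeps a separate list only to remember the distinct words in the order they first appear. As a possible simplification, a `LinkedHashMap` already iterates in insertion order, so the list can be dropped. The sketch below assumes Java 8 or later; the class name `CountLinked` and the method `mostFrequent` are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical variant of Implementation 1 without the auxiliary list.
class CountLinked {
    static String mostFrequent(String[] words) {
        // LinkedHashMap iterates in insertion order, so no separate list is needed.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        String best = null;
        int max = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > max) { // strict '>' keeps the first-seen word on ties
                best = e.getKey();
                max = e.getValue();
            }
        }
        return best; // null when the input array is empty
    }

    public static void main(String[] args) {
        String[] words = "b a b a c".split("\\s+");
        System.out.println(mostFrequent(words)); // prints b
    }
}
```

With a plain `HashMap` the iteration order is unspecified, so which word wins a tie could vary; the insertion-ordered map makes the first-seen word win, matching the behavior of the list-based loop.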

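Implementation 2:

Convert the elements of the map into key-value Entry objects and store all of them in a List.
Then sort the List with Collections.sort(), supplying our own implementation of the Comparator interface.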

package count;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

public class Count2 {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        String str = "Look to the skies above London and you'll see the usual suspects rainclouds, plane and pigeons. But by the end of the year, you might just see something else.";
        str = str.replace('\'', ' ');
        str = str.replace(',', ' ');
        str = str.replace('.', ' ');
        String[] strings = str.split("\\s+"); // "\\s+" is a regular expression matching one or more whitespace characters

        /*
         * Same as Implementation 1: first store each word with its count.
         */
        Map<String, Integer> map = new HashMap<>();
        for (String s : strings) {
            if (map.containsKey(s)) {
                int x = map.get(s);
                x++;
                map.put(s, x);
            } else {
                map.put(s, 1);
            }
        }
        /*
         * Build a List containing the key-value pairs.
         */
        List<Map.Entry<String, Integer>> list = new ArrayList<>(map.entrySet());
        /*
         * Sort the List.
         * An anonymous inner class implements Comparator and its compare method
         * so that entries are ordered by count, descending
         * (because we want the word that occurs most often).
         */
        Collections.sort(list, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Entry<String, Integer> o1, Entry<String, Integer> o2) {
                // Descending order; Integer.compare avoids the subtraction idiom
                return Integer.compare(o2.getValue(), o1.getValue());
            }
        });

        /*
         * Print the word that occurs most often.
         */
        System.out.println(list.get(0).getKey());

        long end = System.currentTimeMillis();

        System.out.println("Elapsed: " + (end - start) + " ms");
    }
}
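
Since only the top entry is needed, sorting the whole list does more work than strictly necessary. One alternative, shown here as a sketch rather than as part of the original solution, is to ask for the maximum entry directly with `Collections.max` and `Map.Entry.comparingByValue()` (Java 8 or later). The class name `CountMax` and the sample counts are made up for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative to sorting: pick the maximum entry in one pass.
class CountMax {
    static String mostFrequent(Map<String, Integer> counts) {
        // Single pass over the entries; no list copy and no sort.
        // Throws NoSuchElementException if the map is empty.
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        // Sample counts chosen for illustration only.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 4);
        counts.put("see", 2);
        counts.put("London", 1);
        System.out.println(mostFrequent(counts)); // prints the
    }
}
```

Sorting remains the better fit when the full ranking is wanted, for example to print the top few words rather than just the first.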
