Most Common Word (MostCommonWord)

import java.util.HashMap;

/**
 * @author LemonLin
 * @Description :StringMostCommonWord
 * @date 19.6.10-22:14
 * Given a paragraph and a list of banned words, return the most frequent word that is not in the list
 * of banned words.  It is guaranteed there is at least one word that isn't banned, and that the
 * answer is unique.  Words in the list of banned words are given in lowercase, and free of
 * punctuation.  Words in the paragraph are not case sensitive.  The answer is in lowercase.
 *
 * Example:
 * Input:
 * paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
 * banned = ["hit"]
 * Output: "ball"
 * Explanation:
 * "hit" occurs 3 times, but it is a banned word.
 * "ball" occurs twice (and no other word does), so it is the most frequent non-banned word
 * in the paragraph.
 * Note that words in the paragraph are not case sensitive,
 * that punctuation is ignored (even if adjacent to words, such as "ball,"),
 * and that "hit" isn't the answer even though it occurs more because it is banned.
 *
 * Note:
 * 1 <= paragraph.length <= 1000.
 * 0 <= banned.length <= 100.
 * 1 <= banned[i].length <= 10.
 * The answer is unique, and written in lowercase (even if its occurrences in paragraph
 * may have uppercase symbols, and even if it is a proper noun.)
 * paragraph only consists of letters, spaces, or the punctuation symbols !?',;.
 * There are no hyphens or hyphenated words.
 * Words only consist of letters, never apostrophes or other punctuation symbols.
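 * Approach:
 * Use a HashMap whose key is a word and whose value is the number of times that word occurs.
 * 1. To extract the words, first replace every punctuation symbol in ",.:!?';" with a space,
 *    then split the paragraph on the regular expression \\s+ (one or more whitespace characters).
 * 2. Count the frequency of each word in the HashMap.
 * 3. For each word in banned, set its value in the HashMap to 0.
 * 4. Return the key that now has the largest value.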
 */
public class StringMostCommonWord {
    public String mostCommonWord(String paragraph, String[] banned) {
        // Replace each punctuation symbol with a space, so "ball," becomes "ball ".
        String symbol = ",.:!?';";
        for (char c : symbol.toCharArray()) {
            paragraph = paragraph.replace(c, ' ');
        }
        // Words are not case sensitive: normalize to lowercase before splitting.
        String[] words = paragraph.toLowerCase().split("\\s+");
        HashMap<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // split() yields an empty first token when the text starts with a separator.
            if (word.isEmpty()) {
                continue;
            }
            counts.merge(word, 1, Integer::sum);
        }
        // A banned word can never win: force its count to 0.
        for (String bannedWord : banned) {
            counts.put(bannedWord.toLowerCase(), 0);
        }
        // Pick the word with the largest remaining count.
        int max = 0;
        String result = "";
        for (String word : counts.keySet()) {
            int count = counts.get(word);
            if (count > max) {
                max = count;
                result = word;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String pa ="Bob. hIt, baLl";
        String[] ba = new String[]{"bob", "hit"};
        System.out.println(new StringMostCommonWord().mostCommonWord(pa, ba));
    }
}
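
The four steps in the approach can also be folded into a single pass. The sketch below is one possible variant, not the author's method: it splits on runs of non-letters with the regular expression `[^a-z]+` instead of replacing punctuation first, and it keeps the banned words in a `HashSet` rather than setting their counts to 0. The class name `MostCommonWordSketch` is hypothetical, and the code assumes, as the problem statement guarantees, that the banned words are already lowercase and that the answer is unique.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical class name; an illustrative variant, not part of the original solution.
final class MostCommonWordSketch {

    static String mostCommonWord(String paragraph, String[] banned) {
        // Assumption: banned words are already lowercase (guaranteed by the problem).
        Set<String> bannedSet = new HashSet<>(Arrays.asList(banned));
        Map<String, Integer> counts = new HashMap<>();
        String best = "";
        int bestCount = 0;

        // Splitting on runs of non-letters handles punctuation and spaces in one step.
        for (String word : paragraph.toLowerCase().split("[^a-z]+")) {
            // A leading separator produces an empty first token; skip it and banned words.
            if (word.isEmpty() || bannedSet.contains(word)) {
                continue;
            }
            // merge() returns the updated count for this word.
            int count = counts.merge(word, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.";
        System.out.println(mostCommonWord(paragraph, new String[] {"hit"})); // prints "ball"
    }
}
```

Tracking the running maximum while counting is safe here only because the answer is guaranteed to be unique; with ties, the word returned would depend on the order in which words appear.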
