leetcode 819. Most Common Word: approach and Python implementation

千追万追

Category: leetcode

Article link: https://blog.csdn.net/qq_35175413/article/details/97932984

Given a paragraph and a list of banned words, return the most frequent word that is not in the list of banned words. It is guaranteed there is at least one word that isn't banned, and that the answer is unique.

Words in the list of banned words are given in lowercase, and free of punctuation. Words in the paragraph are not case sensitive. The answer is in lowercase.

Example:

Input: 
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]
Output: "ball"
Explanation: 
"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. 
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"), 
and that "hit" isn't the answer even though it occurs more because it is banned.

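Approach

This is an easy problem: find the most frequent word in a sentence, excluding the banned words.

The approach is clear at first glance. First turn the sentence into a word list, then build a dict that maps each word to its number of occurrences, then iterate over the dict to find the maximum.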

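There is one pitfall. From the problem statement I assumed it would be enough to strip the punctuation, keep only letters and spaces, and then split. But there is a test case like this: "a, a, a, a, b,b,b,c, c", where some words are separated by a comma with no space. So I scanned character by character instead: whenever I hit a non-letter, the run of letters collected so far is appended to wordlist. The official solution simply replaces each character in "!?',;." with a space and then splits, which also feels a bit odd to me.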
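The replace-then-split idea described above can be sketched roughly as follows. This is a minimal illustration based only on the description in this post, not the official code itself; the helper name split_words is made up for the example.

```python
from typing import List


def split_words(paragraph: str) -> List[str]:
    """Lowercase the text, turn the listed punctuation into spaces, then split.

    split_words is a hypothetical helper name used only for this sketch.
    """
    for mark in "!?',;.":
        paragraph = paragraph.replace(mark, " ")
    # split() without arguments collapses runs of whitespace,
    # so "b,b" -> "b b" and "a, a" -> "a  a" both split cleanly.
    return paragraph.lower().split()


print(split_words("a, a, a, a, b,b,b,c, c"))
```

Replacing with a space rather than deleting the punctuation is what makes "b,b,b" come out as three words instead of one.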

Code

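To convert the letters in a string to lowercase in Python, use string = string.lower(). My code calls it on every single character; calling it once on the whole paragraph would be more efficient.
I also did not convert banned to a set. In theory a set should make the lookups faster, but when I tried it the memory usage went up and the runtime did not improve. Python is a mystery.
To check whether a character is a letter I used ord(c) >= ord('a') and ord(c) <= ord('z'); char.isalpha() would do it directly (note that isalpha() also accepts non-ASCII letters). Similar methods:
- isdigit(): is it a digit
- isalnum(): is it a digit or a letter
- isupper(): is it an uppercase letter
- islower(): is it a lowercase letter
- isspace(): is it a whitespace character, that is, a space (' '), tab ('\t'), CR ('\r'), newline ('\n'),
  vertical tab ('\v') or form feed ('\f')
- isascii(): is it an ASCII character, that is, whether the code point of c is in the range 0 to 127 (available since Python 3.7)
dict.keys() is a method, so remember the parentheses.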
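As a quick illustration of the notes above, here is a small sketch comparing the ord() range check with the built-in str methods. The helper name is_ascii_lower is invented for this example.

```python
def is_ascii_lower(c: str) -> bool:
    """The range check used in this post: true only for 'a'..'z'.

    Hypothetical helper; expects a single character.
    """
    return ord('a') <= ord(c) <= ord('z')


# Both checks agree on plain ASCII letters.
print(is_ascii_lower('b'), 'b'.isalpha())

# isalpha() is broader: it also accepts non-ASCII letters.
print(is_ascii_lower('é'), 'é'.isalpha())

# A few of the other predicates from the list.
print('7'.isdigit(), 'a7'.isalnum(), '\t'.isspace())

# Membership can be tested on the dict itself; .keys() is optional here.
counts = {'ball': 2}
print('ball' in counts, 'ball' in counts.keys())
```

For this problem the input is plain English text, so either check gives the same words.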

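from typing import List


class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        # Scan character by character, collecting runs of letters into words.
        nextword = ''
        wordlist = []
        for c in paragraph:
            c = c.lower()
            if ord(c) >= ord('a') and ord(c) <= ord('z'):
                nextword += c
            elif len(nextword):
                # A non-letter ends the current word.
                wordlist.append(nextword)
                nextword = ''
        # Flush the last word if the paragraph ends with a letter.
        if len(nextword):
            wordlist.append(nextword)
        # Mark banned words with a placeholder so they are skipped when counting.
        for i, word in enumerate(wordlist):
            if word in banned:
                wordlist[i] = '$'

        # Count the occurrences of each remaining word.
        worddict = {}
        for word in wordlist:
            if word == '$':
                continue
            if word in worddict.keys():
                worddict[word] += 1
            else:
                worddict[word] = 1

        # Pick the word with the highest count.
        ans = ''
        maximum = 0
        for word in worddict.keys():
            if worddict[word] > maximum:
                ans = word
                maximum = worddict[word]
        return ans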

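My version is rather verbose, and I figured an expert could do this in three lines of Python. Sure enough, in the discussion section someone really did it in three lines: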

import collections
import re

class Solution:
    def mostCommonWord(self, p, banned):
        ban = set(banned)
        # Alternative: words = re.sub(r'[^a-zA-Z]', ' ', p).lower().split()
        # \w+ matches runs of letters, digits and underscores.
        words = re.findall(r'\w+', p.lower())
        return collections.Counter(w for w in words if w not in ban).most_common(1)[0][0]

It uses regular expressions, which I don't know yet.
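For readers in the same position, here is a rough sketch of what the two regular expressions in that snippet do, run on the tricky test case from earlier. The sample strings are only illustrative.

```python
import re

text = "a, a, a, a, b,b,b,c, c"

# Option 1: replace every character that is not a letter with a space, then split.
words_sub = re.sub(r'[^a-zA-Z]', ' ', text).lower().split()

# Option 2: pull out every run of "word" characters directly.
words_findall = re.findall(r'\w+', text.lower())

print(words_sub)
print(words_sub == words_findall)  # the two agree on this input

# They differ once digits or underscores appear, because \w also matches those.
sample = "it's 2 b_c"
print(re.findall(r'\w+', sample))                   # ['it', 's', '2', 'b_c']
print(re.sub(r'[^a-zA-Z]', ' ', sample).split())    # ['it', 's', 'b', 'c']
```

For paragraphs that contain only letters, spaces and the listed punctuation, both options produce the same word list.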

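What is worth learning here is collections.Counter(list), which the official solution also uses.

As we all know, Python has a number of built-in data types such as str, int, list, tuple and dict. On top of these, the collections module provides several additional data types: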

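namedtuple(): creates tuple subclasses whose elements can be accessed by name
deque: a double-ended queue with fast appends and pops from either end

Counter: a counter, mainly used for counting

Common patterns for Counter objects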

sum(c.values())                 # total of all counts
c.clear()                       # reset all counts
list(c)                         # list unique elements
set(c)                          # convert to a set
dict(c)                         # convert to a regular dictionary
c.items()                       # convert to a list of (elem, cnt) pairs
Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
c.most_common()[:-n-1:-1]       # n least common elements
+c                              # remove zero and negative counts

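Counter is a subclass of dict, so it supports the dict methods.
Counter('gallahad') builds a counter from an iterable (list, tuple, string, etc.); given a dict or other mapping, the values are taken as the counts.

OrderedDict: an ordered dictionary
defaultdict: a dictionary with default values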
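To make the Counter notes concrete, here is a small sketch using the words from the example paragraph of this problem (already lowercased and stripped of punctuation by hand).

```python
from collections import Counter

words = "bob hit a ball the hit ball flew far after it was hit".split()

c = Counter(words)           # count the elements of an iterable
print(c["hit"], c["ball"])   # 3 2
print(c["missing"])          # 0: a missing key counts as zero, no KeyError
print(c.most_common(2))      # [('hit', 3), ('ball', 2)]

# Built from a mapping, the values are used as the counts.
from_mapping = Counter({"a": 2, "b": 1})
print(from_mapping.most_common(1))  # [('a', 2)]
```

most_common(n) returns (element, count) pairs sorted by descending count, which is why the three-line solution ends with [0][0].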
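A brief sketch of a few of the other types listed here, kept close to the word-counting theme of this post. The names count_words and WordCount are made up for the example.

```python
from collections import defaultdict, deque, namedtuple


def count_words(words):
    """Count words with defaultdict(int); hypothetical helper for this sketch."""
    counts = defaultdict(int)  # missing keys start at 0, so no if/else is needed
    for w in words:
        counts[w] += 1
    return counts


counts = count_words(["ball", "hit", "ball"])
print(dict(counts))  # {'ball': 2, 'hit': 1}

# namedtuple: a tuple whose fields can be read by name.
WordCount = namedtuple("WordCount", ["word", "count"])
top = WordCount("ball", counts["ball"])
print(top.word, top.count)  # ball 2

# deque: appends and pops are cheap at both ends.
d = deque(["hit"])
d.appendleft("bob")
d.append("a")
print(d.popleft(), d.pop())  # bob a
```

defaultdict(int) would remove the if/else branch from the counting loop in my solution, although Counter is the more direct tool for this problem.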
