LeetCode 819. Most Common Word: Approach and Python Implementation

Given a paragraph and a list of banned words, return the most frequent word that is not in the list of banned words.  It is guaranteed there is at least one word that isn't banned, and that the answer is unique.

Words in the list of banned words are given in lowercase, and free of punctuation.  Words in the paragraph are not case sensitive.  The answer is in lowercase.

 

Example:

Input: 
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]
Output: "ball"
Explanation: 
"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. 
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"), 
and that "hit" isn't the answer even though it occurs more because it is banned.

Approach

This is an easy problem: find the most frequent word in a paragraph, excluding the banned words.

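The approach is clear at a glance. First turn the paragraph into a word list, then build a dict that maps each word to its number of occurrences, and finally iterate over the dict to find the maximum.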
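A minimal sketch of these three steps on the example above. To keep step 1 short it splits on spaces and strips the punctuation "!?',;." from each token. That is enough for this example but not in general (the next paragraph explains why). The variable names are illustrative.

```python
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
banned = ["hit"]

# Step 1: word list. Splitting on spaces and stripping punctuation from each token
# works for this example only; it breaks on tokens like "b,b,b".
words = [w.strip("!?',;.").lower() for w in paragraph.split()]

# Step 2: word -> number of occurrences, skipping banned words
counts = {}
for w in words:
    if w not in banned:
        counts[w] = counts.get(w, 0) + 1

# Step 3: the word with the largest count
answer = max(counts, key=counts.get)
print(answer)  # ball
```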

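There is a catch, though. From the problem statement I assumed I could just strip the punctuation, keep only letters and spaces, and then split. Then it threw a test case like "a, a, a, a, b,b,b,c, c", where some words are separated only by punctuation with no space in between. So I scanned character by character instead: whenever I hit a non-letter, I append the run of letters collected so far to the word list. The official solution simply replaces every character in "!?',;." with a space and then splits... which also feels rather odd to me.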
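To make the pitfall concrete, here is a small sketch on that test case. It compares deleting the punctuation before splitting with replacing each character of "!?',;." by a space, which is the idea attributed to the official solution above. This is an illustration, not the official code.

```python
paragraph = "a, a, a, a, b,b,b,c, c"
PUNCT = "!?',;."

# Deleting punctuation first glues "b,b,b,c" into a single token
naive = "".join(ch for ch in paragraph if ch not in PUNCT).split()

# Replacing each punctuation character with a space keeps the words apart
spaced = paragraph
for ch in PUNCT:
    spaced = spaced.replace(ch, " ")
fixed_words = spaced.split()

print(naive)        # ['a', 'a', 'a', 'a', 'bbbc', 'c']
print(fixed_words)  # ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c']
```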

Code

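  • In Python, string = string.lower() converts the letters of a string to lowercase. My code calls it on every single character; calling it once on the whole paragraph would be much more efficient, since it avoids one method call per character.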
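  • I also didn't convert banned into a set. In theory a set speeds up membership tests (average O(1) instead of a linear scan of the list), but when I tried it, memory usage went up and the runtime didn't drop. Python is a mystery. (A likely reason: the banned list is short, so the saving per lookup is small compared with the noise in LeetCode's timings.)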
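  • To check whether a character is a letter I used ord(c) >= ord('a') and ord(c) <= ord('z'); c.isalpha() does this directly (note that isalpha() also accepts non-ASCII letters such as 'é'). Similar methods:
    • isdigit(): is it a digit
    • isalnum(): is it a letter or a digit
    • isupper(): is it an uppercase letter
    • islower(): is it a lowercase letter
    • isspace(): is it whitespace, such as a space (' '), tab ('\t'), carriage return ('\r'), newline ('\n'),
      vertical tab ('\v') or form feed ('\f'); Python also counts other Unicode whitespace
    • isascii(): is it an ASCII character, i.e. is its code point between 0 and 127 (Python 3.7+)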
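  • dict.keys() is a method, so remember the parentheses. (For a membership test, word in worddict already checks the keys, so .keys() isn't needed there.)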
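A quick sketch for trying the tips above on a few sample characters (chosen arbitrarily). `in_a_to_z` is a hypothetical helper wrapping the ord() range check used in the solution; str.isascii() needs Python 3.7 or later.

```python
paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
lowered = paragraph.lower()  # one call for the whole paragraph instead of one per character


def in_a_to_z(c: str) -> bool:
    # hypothetical helper: the ord() range check, written as a chained comparison
    return ord("a") <= ord(c) <= ord("z")


names = ["a-z", "isalpha", "isdigit", "isalnum", "isupper", "islower", "isspace", "isascii"]
for ch in ["a", "Z", "7", " ", "\t", "!", "é"]:  # sample characters chosen for illustration
    flags = [in_a_to_z(ch), ch.isalpha(), ch.isdigit(), ch.isalnum(),
             ch.isupper(), ch.islower(), ch.isspace(), ch.isascii()]
    # show only the checks that return True for this character
    print(repr(ch), [name for name, ok in zip(names, flags) if ok])
```

'é' passes isalpha() but not the a-z range check; on ASCII input that has already been lowercased, the two checks agree.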
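
from typing import List  # pre-imported on LeetCode; needed when running locally


class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        # 1. Split the paragraph into lowercase words, character by character
        nextword = ''
        wordlist = []
        for c in paragraph:
            c = c.lower()
            if ord(c) >= ord('a') and ord(c) <= ord('z'):
                nextword += c  # still inside a word
            elif len(nextword):
                # a non-letter ends the current word
                wordlist.append(nextword)
                nextword = ''
        if len(nextword):
            # flush the last word if the paragraph ends with a letter
            wordlist.append(nextword)

        # 2. Replace banned words with a placeholder so they are skipped when counting
        for i, word in enumerate(wordlist):
            if word in banned:
                wordlist[i] = '$'

        # 3. Count the remaining words
        worddict = {}
        for word in wordlist:
            if word == '$':
                continue
            if word in worddict.keys():
                worddict[word] += 1
            else:
                worddict[word] = 1

        # 4. Find the word with the highest count
        ans = ''
        maximum = 0
        for word in worddict.keys():
            if worddict[word] > maximum:
                ans = word
                maximum = worddict[word]
        return ans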
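For comparison, a sketch of the same character-by-character approach with the tips above applied: lower() called once, isalpha(), a set for banned, and dict.get() for counting. This is a rework for illustration, not the original solution above; names such as banned_set are arbitrary. Like the original, it relies on the problem's guarantee that a unique answer exists.

```python
from typing import List


class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        banned_set = set(banned)   # average O(1) membership tests
        counts = {}                # word -> occurrences
        letters = []               # letters of the word currently being read
        # lower() once for the whole paragraph; the extra space flushes the last word
        for c in paragraph.lower() + " ":
            if c.isalpha():
                letters.append(c)
            elif letters:
                word = "".join(letters)
                if word not in banned_set:
                    counts[word] = counts.get(word, 0) + 1
                letters = []
        # key with the largest count; the problem guarantees a unique answer
        return max(counts, key=counts.get)


print(Solution().mostCommonWord("Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"]))  # ball
```

Collecting letters in a list and joining once avoids building a new string on every `+=`, although for paragraphs this short the difference is small.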

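My version is rather long-winded; I suspected an expert could do it in three lines of Python. Sure enough, the Discussion section has a three-line solution...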

    import collections
    import re

    class Solution:
        def mostCommonWord(self, p, banned):
            ban = set(banned)
            # words = re.sub(r'[^a-zA-Z]', ' ', p).lower().split()
            words = re.findall(r'\w+', p.lower())
            # count the non-banned words; most_common(1) returns [(word, count)]
            return collections.Counter(w for w in words if w not in ban).most_common(1)[0][0]

It uses regular expressions, which I'm not familiar with.
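To see what the two regular expressions in that snippet do, here is a small sketch that runs both on the example paragraph and on the no-space test case mentioned earlier.

```python
import re

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."

# \w+ matches each maximal run of "word" characters (letters, digits, underscore)
words_findall = re.findall(r"\w+", paragraph.lower())

# [^a-zA-Z] matches any single non-letter; turning each one into a space
# and splitting yields the same tokens here
words_sub = re.sub(r"[^a-zA-Z]", " ", paragraph).lower().split()

# The no-space test case is split correctly as well
tricky_words = re.findall(r"\w+", "a, a, a, a, b,b,b,c, c".lower())

print(words_findall)
print(words_findall == words_sub)
print(tricky_words)
```

The two patterns differ on digits and underscores: `\w+` keeps "abc_1" as one token, while `[^a-zA-Z]` treats "_" and "1" as separators and leaves just "abc".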

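What's worth learning here is collections.Counter, which counts the elements of a list (or any other iterable); the official solution uses it too.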
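A minimal sketch of Counter on the example's word list (written out by hand here), with "hit" banned.

```python
from collections import Counter

# Word list of the example, written out by hand
words = ["bob", "hit", "a", "ball", "the", "hit", "ball",
         "flew", "far", "after", "it", "was", "hit"]
ban = {"hit"}

counts = Counter(w for w in words if w not in ban)  # any iterable works, here a generator
print(counts["ball"])               # 2
print(counts["missing"])            # 0: a missing key counts as zero instead of raising KeyError
print(counts.most_common(1))        # [('ball', 2)]
print(counts.most_common(1)[0][0])  # ball
```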

Python has built-in data types such as str, int, list, tuple and dict. On top of these, the collections module provides several additional data types:

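  • namedtuple(): a factory function that creates tuple subclasses whose elements can be accessed by name
  • deque: a double-ended queue that supports fast appends and pops at both ends
  • Counter: a counter, mainly used for counting hashable objects
    • Common patterns for working with Counter objects: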

      sum(c.values())                 # total of all counts
      c.clear()                       # reset all counts
      list(c)                         # list unique elements
      set(c)                          # convert to a set
      dict(c)                         # convert to a regular dictionary
      c.items()                       # convert to a list of (elem, cnt) pairs
      Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
      c.most_common()[:-n-1:-1]       # n least common elements
      +c                              # remove zero and negative counts
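    • Counter subclasses dict, so it supports the dict methods, with a few differences: for example, update() adds counts instead of replacing them, and fromkeys() is not implemented
    • Counter('gallahad') builds a counter from an iterable (a list, tuple, string, etc.); a mapping such as Counter({'red': 4, 'blue': 2}) or keyword arguments such as Counter(cats=4, dogs=8) are taken as counts directly
  • OrderedDict: a dict that remembers the order in which keys were inserted
  • defaultdict: a dict that supplies a default value for missing keys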
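A short sketch exercising a few of the patterns and types listed above; the inputs and names (neg, Point, pt) are made up for illustration. defaultdict(int) in particular could replace the if/else counting in my solution.

```python
from collections import Counter, defaultdict, deque, namedtuple

# Counter built from an iterable (here a string)
c = Counter("gallahad")
print(sum(c.values()))          # total of all counts: 8
print(c.most_common()[:-3:-1])  # the n = 2 least common elements

# Unary + keeps only positive counts
neg = Counter(a=2, b=0, c=-1)
print(+neg)                     # Counter({'a': 2})

# defaultdict(int): missing keys start at int() == 0, so no if/else is needed
worddict = defaultdict(int)
for word in ["bob", "hit", "ball", "hit", "ball"]:
    worddict[word] += 1
print(dict(worddict))           # {'bob': 1, 'hit': 2, 'ball': 2}

# deque: cheap appends and pops at both ends
d = deque([1, 2, 3])
d.appendleft(0)
d.append(4)
print(d.popleft(), d.pop())     # 0 4

# namedtuple: a tuple whose fields are also accessible by name (Point is a made-up example)
Point = namedtuple("Point", ["x", "y"])
pt = Point(1, 2)
print(pt.x + pt.y, pt[0])       # 3 1
```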

 
