Counting the Ten Most Frequent Words with Python

Latest recommended article published on 2022-08-26 09:48:48

weixin_33834075

Tags: python, database

Original link: https://yq.aliyun.com/articles/493250


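1. Description

This is a Python interview question:

"A readable file has ten thousand lines, each containing a single word. Words may repeat. Find the 10 words that occur most often in those ten thousand lines."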

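2. Approach

Read the file into a list, then deduplicate it with a set to get a reference collection of unique words. For each unique word, count how many times it occurs in the full list, storing the word as the key and its count as the value in a dictionary. Sort the counts in descending order and keep the first 10 (the largest counts belong to the most frequent words), then print the matching words together with their counts.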

3. Code

 
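    #!/usr/bin/python
    # coding: utf-8

    # Read every line, stripping the newline so each entry is a clean word
    all_C = []
    with open("words.txt", 'r') as f:
        for line in f:
            all_C.append(line.strip())

    # Unique words
    all_set = set(all_C)

    # Count the occurrences of each unique word
    counts = {}
    for key in all_set:
        counts[key] = all_C.count(key)

    # The ten highest counts, as a list
    tens = sorted(counts.values(), reverse=True)[0:10]
    print(tens)

    # Collect the words whose count is among the ten highest.
    # The word is the key, so words with equal counts do not overwrite
    # each other; ties at the cutoff can yield more than ten entries.
    tendict = {}
    for k in counts.keys():
        if counts[k] in tens:
            tendict[k] = counts[k]

    print("The 10 most frequent words are: %s \n" % tendict)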

Run the script:

    # python tens.py

[Figure: screenshot of the script output]

The practice file has 10,001 lines, one word per line; reading it as a file is still fast.
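Because the practice file itself is not reproduced here, a file of the same shape can be generated for experimenting. The following is only a minimal sketch: the function name `write_practice_file`, its parameters, and the vocabulary passed to it are all hypothetical and not part of the original post.

```python
import random


def write_practice_file(path, vocabulary, n_lines=10001, seed=0):
    """Write n_lines lines to path, one randomly chosen word per line.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(n_lines):
            f.write(rng.choice(vocabulary) + "\n")
```

For example, `write_practice_file("words.txt", ["apple", "banana", "cherry"])` would produce a 10,001-line `words.txt` that the scripts in this post can read.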

Code 2, based on someone else's solution:

 
     
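    #!/usr/bin/python
    # coding: utf-8

    result = {}
    with open("words.txt", 'r') as fopen:
        # Jump to the end to learn the file size, then rewind
        fopen.seek(0, 2)
        end = fopen.tell()  # renamed from "all" to avoid shadowing the builtin
        fopen.seek(0, 0)
        # Read line by line until the end position is reached
        while fopen.tell() < end:
            lines = fopen.readline().strip()
            if lines in result:
                result[lines] += 1
            else:
                result[lines] = 1

    # Sort (word, count) pairs by count, descending, and keep the first ten
    print(sorted(result.items(), key=lambda k: k[1], reverse=True)[:10])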
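Both versions above build the count dictionary by hand. The standard library's `collections.Counter` can do the same in a single pass, and its `most_common(n)` method returns the top `n` entries directly. The sketch below is one possible variant, not the original author's code: the function name `top_words` is hypothetical, and skipping blank lines and reading the file as UTF-8 are assumptions.

```python
from collections import Counter


def top_words(path, n=10):
    """Return the n most frequent non-empty lines of path as (word, count) pairs."""
    with open(path, "r", encoding="utf-8") as f:
        # One pass over the file; blank lines are skipped (an assumption)
        counts = Counter(line.strip() for line in f if line.strip())
    # most_common sorts by count, descending
    return counts.most_common(n)
```

Calling `top_words("words.txt")` returns a list of at most ten `(word, count)` tuples. Unlike the first script, it never returns more than `n` entries when several words share the same count.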