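When reading an English paper, the most frequent words are often technical terms that you may not recognize. It helps to analyze the paper's word frequencies first and learn the meaning of the high-frequency words before you start reading, so the paper is easier to follow. This post shows how to use Python to print how often each word appears in an English paper.
First, the paper has to be converted from PDF to txt. Converting directly usually produces garbled text, so it is better to convert the PDF to Word format first and then copy the content into a plain txt file.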
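Since garbled text is the main risk in this conversion step, it can be worth checking the resulting text before counting anything. The following is only a minimal sketch, not part of the original code: the function name `find_suspicious_lines` and the `max_ratio` threshold are assumptions made for illustration. It flags lines in which a large share of the characters is non-ASCII. That is a rough hint of garbling in an English paper, not proof, because names and math symbols can also be non-ASCII.

```python
def find_suspicious_lines(text, max_ratio=0.2):
    """Return (line_number, line) pairs whose share of non-ASCII
    characters is greater than max_ratio (an assumed threshold)."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines have nothing to check
        # Count the characters outside the ASCII range
        non_ascii = sum(1 for ch in stripped if ord(ch) > 127)
        if non_ascii / len(stripped) > max_ratio:
            flagged.append((number, stripped))
    return flagged


# Made-up sample text: the second line imitates garbled output
sample = (
    "Deep learning is widely used.\n"
    "\ufffd\ufffd\ufffd\ufffd abc\n"
    "We propose a method."
)
for number, line in find_suspicious_lines(sample):
    print(number, repr(line))
```

Flagged lines are best reviewed by hand and fixed in the txt file before running the word count.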
The code, with detailed comments, is below.
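# Word-frequency analysis of a paper
# You should convert the file to text format first
# Read the text and save all the words in a list
def readtxt(filename):
    wordsL = []  # use this list to save the words
    # the with-statement closes the file even if reading fails
    with open(filename, 'r') as fr:
        for line in fr:  # each iteration yields one line of the file
            # strip surrounding whitespace, then split the line into words
            wordsL.extend(line.strip().split())
    return wordsL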
# Count the frequency of every word and store it in a dictionary,
# then sort the dictionary by value from large to small
def count(wordsL):
    wordsD = {}
    for x in wordsL:
        # skip the words that we don't need
        if Judge(x):
            continue
        # count
        if not x in wor