Garbled English computer terms: how does a computer tell gibberish from English?

乐姐理财说

Published 2021-07-08 16:25:36

Tags: garbled English computer terms

# Detect English module
# http://inventwithpython.com/hacking (BSD Licensed)

# To use, type this code:
#   import detectEnglish
#   detectEnglish.isEnglish(someString) # returns True or False
# (There must be a "dictionary.txt" file in this directory with all English
# words in it, one word per line. You can download this from
# http://invpy.com/dictionary.txt)

# Good habit: name constants in uppercase. The ' \t\n' below is a space
# followed by two escape characters, tab and newline.
UPPERLETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LETTERS_AND_SPACE = UPPERLETTERS + UPPERLETTERS.lower() + ' \t\n'

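def loadDictionary():
    dictionaryFile = open('dictionary.txt')
    # englishWords = {} defines an empty dictionary. Every word becomes a key
    # of this dictionary, and every value is None.
    englishWords = {}
    # dictionaryFile.read() returns the whole file as one big string, and
    # split('\n') breaks it into a list (the file has one word per line).
    for word in dictionaryFile.read().split('\n'):
        # We don't care what value is stored under each key, so we use None
        # (of type NoneType), which means "no value here".
        englishWords[word] = None
    dictionaryFile.close()
    return englishWords

# Load the dictionary in the global scope of detectEnglish, so that any
# Python program that imports detectEnglish can see and use it.
ENGLISH_WORDS = loadDictionary()

# Takes a string argument and returns a float between 0.0 (no words
# recognized) and 1.0 (all words recognized): the proportion of words
# identified as English.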

def getEnglishCount(message):
    message = message.upper()
    message = removeNonLetters(message)
    possibleWords = message.split()
    # The message may contain no letters at all, such as '1234568'. In that
    # case removeNonLetters returns a string with no letters, split() turns
    # it into an empty list, and we return early.
    if possibleWords == []:
        return 0.0  # no words at all, so return 0.0

    matches = 0
    for word in possibleWords:
        if word in ENGLISH_WORDS:
            matches += 1
    # Division in Python must avoid dividing by zero. That cannot happen
    # here, because an empty possibleWords already returned above. This is
    # one way to guard against a divide-by-zero error.
    return float(matches) / len(possibleWords)

# Removes special symbols and digits (any character not in LETTERS_AND_SPACE).

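def removeNonLetters(message):
    lettersOnly = []
    for symbol in message:
        if symbol in LETTERS_AND_SPACE:
            lettersOnly.append(symbol)
    return ''.join(lettersOnly)

# Whether text counts as English is decided by thresholds on the proportion
# of dictionary words and the proportion of letters.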

def isEnglish(message, wordPercentage=20, letterPercentage=85):
    # By default, 20% of the words must exist in the dictionary file, and
    # 85% of all the characters in the message must be letters or spaces
    # (not punctuation or numbers).
    if len(message) == 0:
        return False  # an empty message would otherwise divide by zero below
    wordsMatch = getEnglishCount(message) * 100 >= wordPercentage
    numLetters = len(removeNonLetters(message))
    messageLettersPercentage = float(numLetters) / len(message) * 100
    lettersMatch = messageLettersPercentage >= letterPercentage
    return wordsMatch and lettersMatch
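
To see how the two thresholds behave without downloading dictionary.txt, here is a minimal sketch of the same idea. It is an illustration and not the module itself: SAMPLE_WORDS is a hypothetical six-word stand-in for the real dictionary, the snake_case function names are invented for this example, and a set is used in place of the dict of None values (an assumption that only membership tests are needed).

```python
# Sketch only: SAMPLE_WORDS is a hypothetical stand-in for dictionary.txt.
LETTERS_AND_SPACE = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                     + 'abcdefghijklmnopqrstuvwxyz'
                     + ' \t\n')
SAMPLE_WORDS = {'HELLO', 'WORLD', 'THIS', 'IS', 'A', 'TEST'}


def remove_non_letters(message):
    """Keep only letters, spaces, tabs and newlines."""
    return ''.join(ch for ch in message if ch in LETTERS_AND_SPACE)


def english_word_ratio(message, words=SAMPLE_WORDS):
    """Fraction (0.0 to 1.0) of whitespace-separated words found in `words`."""
    possible_words = remove_non_letters(message.upper()).split()
    if not possible_words:
        return 0.0  # nothing to count, and this avoids dividing by zero
    matches = sum(1 for word in possible_words if word in words)
    return matches / len(possible_words)


def letter_ratio(message):
    """Fraction (0.0 to 1.0) of characters that are letters or whitespace."""
    if not message:
        return 0.0  # an empty message has no letters at all
    return len(remove_non_letters(message)) / len(message)


def is_english(message, word_percentage=20, letter_percentage=85):
    """Apply both thresholds; both must pass for the text to count as English."""
    return (english_word_ratio(message) * 100 >= word_percentage
            and letter_ratio(message) * 100 >= letter_percentage)


if __name__ == '__main__':
    samples = ['Hello world, this is a test.', 'Xq zvk wplm rrtq', '1234568 !!!']
    for text in samples:
        print(repr(text), english_word_ratio(text), letter_ratio(text),
              is_english(text))
```

The second sample is made entirely of letters, so it passes the letter threshold but fails the word threshold. The third fails both. Requiring both checks is what lets the test reject gibberish that merely looks alphabetic.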
