Using Python to tell whether a string of letters is pinyin or an English word

weixin_30483013

Tags: python

Original link: http://www.cnblogs.com/aloiswei/p/8976596.html

Environment: Windows 10, Python 3.6

First, the idea behind the algorithm:

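Start by building a local pinyin library (without tone marks; the author provides one with the original post). Then match the input string against that library using forward maximum matching: at each position, take the longest prefix that is a valid pinyin syllable, cut it off, and repeat on the rest of the string.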
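As a minimal sketch of the first step, the snippet below loads a toneless pinyin library into a set. The file name `pinyin.txt`, its one-syllable-per-line layout, and the helper name `load_pinyin_lib` are assumptions made for illustration, not details from the original post.

```python
def load_pinyin_lib(path="pinyin.txt"):
    """Read toneless pinyin syllables, one per line, into a set.

    The path and the file layout are assumed; adapt them to your own library file.
    """
    with open(path, encoding="utf-8") as f:
        # Skip blank lines and normalise to lowercase.
        return {line.strip().lower() for line in f if line.strip()}


# Hypothetical usage; requires a pinyin.txt file to exist:
# pinyinLib = load_pinyin_lib("pinyin.txt")

# A tiny hand-written subset, enough to try the matching on short inputs.
sample_lib = {"wo", "ai", "zhong", "guo", "o", "pen"}
```

A set is used here because the matching step tests many candidate prefixes, and membership tests on a set take constant time on average.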

The Python code is below:

def pinyin_or_word(string):
    '''
    Judge whether a string is pinyin or an English word.
    pinyinLib (a collection of toneless pinyin syllables) comes from
    a txt file and must be loaded before this function is called.
    '''
    string = string.lower()
    max_len = 6  # longest pinyin syllable, e.g. "zhuang"
    result = []
    while string:
        # Forward maximum matching: try the longest prefix first.
        for i in range(min(max_len, len(string)), 0, -1):
            if string[:i] in pinyinLib:
                result.append(string[:i])
                string = string[i:]  # continue with the unmatched rest
                break
        else:
            # No prefix of any length is a pinyin syllable.
            print("This may be an English word!")
            return []
    return result

In [1]: pinyin_or_word("woaizhongguo")
Out[1]: ['wo', 'ai', 'zhong', 'guo']

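The function takes a string as its argument. It returns the list of recognized pinyin syllables, or, if the string cannot be split into pinyin, prints a message that it may be an English word and returns an empty list.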

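In fact, this algorithm has flaws:

① For example, if you enter the English word 'open', it returns the pinyin 'o' + 'pen';

② Although it is described as telling pinyin from English words, what it really does is detect pinyin. It cannot strictly identify words; for an accurate judgment, an English word list has to be added.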
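Both flaws come down to the missing English word list. A minimal sketch of the fix suggested in ② follows: look the string up in a word list first, and fall back to pinyin matching only when it is not a known word. The function names `segment_pinyin` and `classify`, and the tiny `pinyin_lib` and `english_words` sets, are hypothetical stand-ins for illustration; a real setup would load both from full library files.

```python
def segment_pinyin(text, pinyin_lib, max_len=6):
    """Forward maximum matching; returns a syllable list, or None on failure."""
    text = text.lower()
    result = []
    while text:
        for i in range(min(max_len, len(text)), 0, -1):
            if text[:i] in pinyin_lib:
                result.append(text[:i])
                text = text[i:]
                break
        else:
            return None  # no prefix is a known syllable
    return result


def classify(text, pinyin_lib, english_words):
    """Return ('english', [word]), ('pinyin', syllables) or ('unknown', [])."""
    word = text.lower()
    if word in english_words:
        # The word list takes priority over pinyin segmentation.
        return "english", [word]
    syllables = segment_pinyin(word, pinyin_lib)
    if syllables:
        return "pinyin", syllables
    return "unknown", []


# Tiny illustrative libraries (assumed, not from the original post).
pinyin_lib = {"wo", "ai", "zhong", "guo", "o", "pen"}
english_words = {"open", "python"}

print(classify("open", pinyin_lib, english_words))          # ('english', ['open'])
print(classify("woaizhongguo", pinyin_lib, english_words))  # ('pinyin', ['wo', 'ai', 'zhong', 'guo'])
```

Checking the word list first is a design choice with a cost: strings that are both valid pinyin and English words, such as 'can' or 'men', will be labeled English. Whether that priority is right depends on the input you expect.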

Reposted from: https://www.cnblogs.com/aloiswei/p/8976596.html