Python: tokenizing an English sentence, removing duplicates, and sorting

Latest recommended article published 2023-05-12 10:24:21

Mei憨憨

Tags: python

Article link: https://blog.csdn.net/a3213383291/article/details/121451244

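Problem: given the following English passage

A major drawback of cross-network recommender solutions is that they can only be applied to users that are overlapped across networks. Thus, the non-overlapped users, which form the majority of users are ignored. As a solution, we propose CnGAN, a novel multi-task learning based recommend architecture.
Write a function that does the following: 1) counts how many distinct words there are; 2) sorts the words in ascending order by the sum of each word's ASCII code values (for the word they, the sum is 116+104+101+121=442). A word that appears more than once is counted only once, and each word is printed on its own line together with its sum.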
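To make the scoring rule concrete, here is a minimal sketch that recomputes the example from the problem statement. The helper name `ascii_sum` is hypothetical and is not part of the solution described in this post.

```python
def ascii_sum(word):
    """Return the sum of the character codes of every character in word."""
    return sum(ord(ch) for ch in word)

print(ascii_sum("they"))  # 116 + 104 + 101 + 121 = 442
```

Note that upper and lower case letters have different codes, so under this rule "A" and "a" are treated as different words.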

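Approach:

Define a custom tokenizer function that splits the English sentence into words and stores them in a list, then convert the list to a set to remove duplicates. Define a Sum function that uses the built-in ord() to compute the ASCII sum of each word in the list and stores the sums in a list m. Finally, define a Sort function that zips the word list with m into a dictionary and sorts the dictionary items by value.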
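The whole approach can also be sketched compactly before walking through the step-by-step version. This is only an illustration under stated assumptions: it swaps the hand-written tokenizer for `re.findall`, treats only spaces, periods and commas as separators, and uses the hypothetical name `distinct_words_by_ascii_sum`.

```python
import re

def distinct_words_by_ascii_sum(text):
    """Return (word, ascii_sum) pairs for the distinct words, smallest sum first."""
    # Spaces, periods and commas separate words; hyphens stay inside a word.
    words = re.findall(r"[^ .,]+", text)
    # dict.fromkeys removes duplicates while keeping first-occurrence order.
    unique = list(dict.fromkeys(words))
    sums = {w: sum(ord(ch) for ch in w) for w in unique}
    return sorted(sums.items(), key=lambda item: item[1])

pairs = distinct_words_by_ascii_sum("we see, we do.")
print(len(pairs))  # number of distinct words
for word, total in pairs:
    print(word, total)
```

The sections that follow build the same pipeline from three separate functions, which makes each step easier to inspect.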

Steps:

1. Tokenizer function Qie(p)

def Qie(p):
    separators = (" ", ".", ",")  # characters that end a word
    words = []  # words in order of appearance
    index = 0   # current position in p
    while index < len(p):
        start = index  # start position of the current word
        # Advance until a separator or the end of the string
        while index < len(p) and p[index] not in separators:
            index += 1
        if index > start:  # skip empty tokens, e.g. when p starts with a separator
            words.append(p[start:index])
        # Skip spaces, periods and commas without adding them to words
        while index < len(p) and p[index] in separators:
            index += 1
    # Convert the list to a set to remove duplicates
    new_li = list(set(words))
    # Restore first-occurrence order (optional)
    new_li.sort(key=words.index)
    # len(new_li) is the number of distinct words
    return new_li
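The set followed by `sort(key=words.index)` calls `words.index` once per word, which is quadratic in the number of tokens. For longer texts, one possible alternative is `dict.fromkeys`, which keeps the first occurrence of each key in insertion order (guaranteed since Python 3.7). The helper name `unique_in_order` below is hypothetical.

```python
def unique_in_order(words):
    """Drop repeated words, keeping each word at its first position."""
    return list(dict.fromkeys(words))

tokens = ["users", "that", "are", "users", "that"]
print(unique_in_order(tokens))  # ['users', 'that', 'are']
```

For the short passage in this exercise either approach is fine; the difference only matters as the input grows.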

2. Summing function Sum

# Compute the ASCII sum of each word
def Sum(new_li):
    m = []
    # Nested loops: for every word, add up the code of each character
    for word in new_li:
        total = 0  # named total to avoid shadowing the built-in sum()
        for ch in word:
            total += ord(ch)
        m.append(total)
    return m

3. Sorting function Sort

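# Print the words in ascending order of their ASCII sums
def Sort(new_li, m):
    # Pair each word with its sum and collect the pairs in a dictionary
    lm = dict(zip(new_li, m))
    # Sort the (word, sum) items by the sum, smallest first
    lh = sorted(lm.items(), key=lambda item: item[1])
    for item in lh:
        print(item)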
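Different words can share the same ASCII sum (any two anagrams do, for example). Python's `sorted` is stable, so tied words keep their input order. If a deterministic alphabetical order among ties is preferred, one option is a compound sort key, sketched here with the hypothetical helper `sort_by_sum`.

```python
def sort_by_sum(words, sums):
    """Pair each word with its sum and sort by sum, then by the word itself."""
    pairs = zip(words, sums)
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]))

# "on" and "no" are anagrams, so both sum to 111 + 110 = 221.
for word, total in sort_by_sum(["on", "no", "a"], [221, 221, 97]):
    print(word, total)
```

The problem statement does not specify how ties should be ordered, so this is an optional refinement rather than a requirement.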

4. Calling the functions from the main program

# The passage is assigned to p only; the original also bound it to str, which shadowed the built-in
p = "A major drawback of cross-network recommender solutions is that they can only be applied to users that are overlapped across networks. Thus, the non-overlapped users, which form the majority of users are " \
    "ignored. As a solution, we propose CnGAN, a novel multi-task learning based recommend architecture."
lis = Qie(p)

count = len(lis)
print("Number of distinct words:", count)
m = Sum(lis)
print("Sorted result:")
Sort(lis, m)