Implementing an English-to-Chinese Translator with a Hash Table (Python Version)

吃掉鹅咩里啃

Last modified 2023-09-25 19:19:35

Tags: data structures, hash table, python

First published 2023-08-28 23:45:28

Article link: https://blog.csdn.net/weixin_64346531/article/details/132549753

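1. Preface

The purpose of the data structures course project is to deepen understanding of the basic theory of data structures, to master the design of algorithms for the various operations on them, to strengthen the combined use of basic knowledge and methods, to improve understanding of algorithms and software design skills, and to develop, through practice, the habit and ability of analyzing and solving problems independently.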

2. Topic

Store CET-4/CET-6 English vocabulary in a hash table (capacity 10000) and implement English-to-Chinese lookup.

3. Design

3.1 Framework overview

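3.2 Function definitions (assuming some basic knowledge of hash tables)

3.2.1 Initializing the hash table

The def __init__(self, m) function

Parameter m	Length of the hash table
Function body	Define a dictionary with m empty slots as the hash table: each key is a slot index (hash value), and each value holds a word and its translation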
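
The table layout described above can be sketched in a few lines. The helper name make_table is hypothetical and used only for illustration; it mirrors the dictionary comprehension in the article's __init__.

```python
# Hypothetical helper (not part of HashTable.py): build the empty table.
def make_table(m):
    # One entry per slot index; (None, None) marks an empty slot.
    return {i: (None, None) for i in range(m)}

table = make_table(5)
print(len(table))   # number of slots
print(table[0])     # an empty slot
```

A plain list of m tuples would work equally well here; the dictionary simply makes the slot index explicit as the key.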

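3.2.2 Insert function

def insert(self, word, translation)

Parameter word	The word to insert
Parameter translation	The translation to insert
Function body	Use the division (remainder) method to compute the word's slot index, then store the word and its translation at that slot; if the slot is already taken, step forward to the next free slot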
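
As a minimal sketch of the insertion step, the snippet below uses a deterministic toy hash (the sum of character codes) instead of Python's built-in hash, so the slot numbers are reproducible. The names toy_hash and insert, and the table length 7, are assumptions made for this demo only.

```python
def toy_hash(word, m):
    # Demo assumption: sum of character codes, reduced by the division method.
    return sum(ord(c) for c in word) % m

def insert(table, word, translation):
    m = len(table)
    d = toy_hash(word, m)
    # Linear probing: try the home slot, then the following slots in turn.
    for _ in range(m):
        if table[d][0] is None or table[d][0] == word:
            table[d] = (word, translation)
            return d
        d = (d + 1) % m
    raise OverflowError("table is full")

table = {i: (None, None) for i in range(7)}
first = insert(table, "act", "行动")   # home slot 4 is free
second = insert(table, "cat", "猫")    # same home slot, so it moves on to 5
print(first, second)
```

"act" and "cat" contain the same letters, so the toy hash sends both to slot 4; the second insertion collides and lands in the next slot.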

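3.2.3 Search function

def search(self, word)

Parameter word	The word to look up
Function body	A lookup ends in one of three ways:

1. The word exists

Compute the slot index with the division method and probe forward until the word is found.

2. The hash index is occupied but the word is absent

Compute the slot index with the division method and probe forward; if an empty slot is reached before the word is found, the lookup fails. Other words occupy the word's hash index, but the word itself was never entered in the dictionary.

3. The hash index is empty

The dictionary holds no entry at the word's hash index, so the word is absent.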
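
The three outcomes above can be sketched as follows. This is an illustration under assumptions, not the article's exact code: toy_hash is a hypothetical deterministic hash, the table has 7 slots, and the probe count starts at 1 for a direct hit.

```python
def toy_hash(word, m):
    # Demo assumption: sum of character codes, reduced by the division method.
    return sum(ord(c) for c in word) % m

def insert(table, word, translation):
    d = toy_hash(word, len(table))
    while table[d][0] is not None:      # demo table is never full
        d = (d + 1) % len(table)
    table[d] = (word, translation)

def search(table, word):
    m = len(table)
    index = toy_hash(word, m)
    for probes in range(1, m + 1):
        stored, translation = table[index]
        if stored is None:              # reached an empty slot: word is absent
            return None
        if stored == word:              # found
            return probes, translation
        index = (index + 1) % m
    return None                         # table full and word not present

table = {i: (None, None) for i in range(7)}
insert(table, "act", "行动")            # lands in slot 4
insert(table, "cat", "猫")              # collides at 4, lands in slot 5

print(search(table, "cat"))   # case 1: found after probing slots 4 and 5
print(search(table, "tac"))   # case 2: slot 4 is occupied, word is absent
print(search(table, "dog"))   # case 3: home slot 6 is empty
```

Bounding the loop by the table length means a lookup still terminates when every slot is occupied.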

3.3 Detailed algorithm design (main flow on the far left)

4. Implementation

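HashTable.py

"""
Project brief:
Store CET-4/CET-6 English vocabulary (more than 5000 words) in a hash table (capacity 10000) and implement English-to-Chinese lookup
Requirements:
1. Given a word, print the number of probes and the translation if it exists; otherwise report that it does not exist
2. Print the collision rate of the whole hash table
3. Look up an effective algorithm for computing the hash value of a string

Usage:
    1. Create the hash table and choose its length
        Format: obj = HashTable(length)
    2. Prepare a word list named vocab_data.txt that stores the words and their translations
        Format:
            with open("path/vocab_data.txt", "r", encoding="utf-8") as file:
                for line in file:
                    word, translation = line.strip().split(":")
                    obj.insert(word, translation)
    3. Look up a word and its probe count
        Format:
        if obj.search('word') is None:
            print("Not found")
        else:
            search_count, translation = obj.search('word')
            print("Probes:", search_count)
            print("Translation:", translation)
    4. Print the collision rate (place it inside the else branch)
        Format: print(obj.calculate_conflict_rate())
    5. Print the hash index of a string (place it inside the else branch)
        Format: print(obj.hash_value('word'))

Notes on adding words to vocab_data.txt:
    The code uses the ASCII colon ':' as the separator
    Write every word in lowercase, one entry per line, in exactly this form:
    word1:translation1
    word2:translation2
    ....
"""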

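class HashTable:
    # Build the hash table
    def __init__(self, m):
        # Create an empty table with m slots; (None, None) marks an empty slot
        self.dictionary = {i: (None, None) for i in range(m)}
        # Table length
        self.m = m
        # Number of stored words, and number of insertions whose home slot was taken
        self.n = 0
        self.collisions = 0

    # Insert a word
    def insert(self, word, translation):

        """
            Python's built-in hash() returns the hash value of an object; it works on strings and numbers
            but not on list, set, or dict, which are unhashable
            For example:
            hash(7) % 10000 --> 7 (a small non-negative integer hashes to itself)
            hash('abnormal') % 10000 --> 5584 in one run (string hashes are randomized per process, so the value changes on restart)

            For any element [key, value] of a hash table (key is a string or a number, m is the table length)
                        h(key) = key % m
            generalizes to  h(key) = hash(key) % m
            In plain terms: numbers only -> numbers and strings

            for i in ["a","abandon","abnormal","abroad","aboard","absence"]:
                print(hash(i)%6) >>> 3 5 4 2 3(collision) 2(collision) in one run
        """

        # The slot index is the key; (word, translation) is the value
        # Home slot of the word
        d = hash(word) % self.m
        collided = False
        # Linear probing: on a collision, step to the next slot until one is free.
        # At most m slots are tried, so a full table cannot loop forever.
        for _ in range(self.m):
            stored = self.dictionary[d][0]
            if stored is None:
                self.dictionary[d] = (word, translation)
                self.n += 1
                if collided:
                    self.collisions += 1
                return
            if stored == word:
                # The word is already stored: update its translation
                self.dictionary[d] = (word, translation)
                return
            collided = True
            d = (d + 1) % self.m
        raise OverflowError("hash table is full")

    # Look up a word
    def search(self, word):
        # Home slot of the word
        index = hash(word) % self.m
        # count is the number of slots examined; a direct hit counts as 1
        for count in range(1, self.m + 1):
            stored, translation = self.dictionary[index]
            # An empty slot means the word is not in the table
            if stored is None:
                return None
            # Found: return the probe count and the translation
            if stored == word:
                return count, translation
            # Otherwise move on to the next slot
            index = (index + 1) % self.m
        # Every slot was examined without finding the word
        return None

    # Compute the collision rate
    def calculate_conflict_rate(self):
        # Collision rate = insertions that found their home slot occupied / words stored
        # (n / m would be the load factor, which is a different quantity)
        if self.n == 0:
            return '0%'
        conflict_rate = f'{(self.collisions / self.n) * 100}%'
        return conflict_rate

    # Return the whole table -- use with care on low-end machines
    def display(self):
        return self.dictionary

    # Return the hash index of a string
    def hash_value(self, word):
        return hash(word) % self.m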

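if __name__ == "__main__":
    # Create the hash table (capacity 10000, as the topic requires)
    hash_table = HashTable(10000)
    # Insert the words and translations into the hash table
    with open("vocab_data.txt", "r", encoding="utf-8") as file:
        for line in file:
            line = line.strip()
            # Skip blank lines and lines without the ':' separator
            if ":" not in line:
                continue
            # Split on the first ':' only, so a ':' inside a translation is kept
            word, translation = line.split(":", 1)
            hash_table.insert(word, translation)
    # Look the word up once and print the result
    result = hash_table.search('abandon')
    if result is None:
        print("Not found")
    else:
        search_count, translation = result
        print("Probes:", search_count)
        print("Translation:", translation)
        # Print the collision rate
        print("Collision rate:", hash_table.calculate_conflict_rate())
        # Print the word's hash index
        print("Hash index:", hash_table.hash_value('abandon'))
        # Print the whole hash table (one entry per slot, so the output is long)
        print("Hash table:", hash_table.display())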
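
The project brief asks for an effective string-hash algorithm, while the code above relies on Python's built-in hash, whose values for strings differ between runs. As one possible alternative, here is a minimal sketch of a polynomial string hash. The function name poly_hash and the multiplier 131 are assumptions chosen for illustration; the article does not use them.

```python
def poly_hash(word, m, seed=131):
    # Polynomial hash: treat the characters as digits in base `seed`.
    h = 0
    for ch in word:
        # Keep the running value within 32 bits.
        h = (h * seed + ord(ch)) & 0xFFFFFFFF
    # Division method: reduce to a slot index in [0, m).
    return h % m

print(poly_hash("a", 10000))        # 97, the character code of 'a'
print(poly_hash("ab", 10000))       # (97 * 131 + 98) % 10000 = 2805
print(poly_hash("abandon", 10000))  # some index between 0 and 9999
```

Because the result depends only on the characters, the same word maps to the same slot in every run, which makes collision counts reproducible.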

Appendix: format of vocab_data.txt

a:一个、一双(art)
abandon:放弃(v)、放纵(n)
ability:能力(n)
able:能够的(adj)
abnormal:不正常的(adj)
about:关于(prep)、大约(adv)
above:在...上面(prep)、在上面(adv)
abroad:在国外、到国外(adv)
absence:缺席、缺乏(n)
absent:缺席的(adj)

... (first ten entries shown)
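
A minimal sketch of reading this format, assuming one word:translation entry per line. The helper name parse_vocab is hypothetical, and io.StringIO stands in for the real file so the example runs on its own.

```python
import io

def parse_vocab(lines):
    entries = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and lines without the ':' separator.
        if ":" not in line:
            continue
        # Split on the first ':' only, so the translation may contain ':'.
        word, translation = line.split(":", 1)
        entries.append((word.lower(), translation))
    return entries

sample = io.StringIO("a:一个、一双(art)\nabandon:放弃(v)、放纵(n)\n\nability:能力(n)\n")
entries = parse_vocab(sample)
print(len(entries))
print(entries[1])
```

Each returned pair can be passed directly to insert(word, translation).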