Weekly Report (Hash Tables, NumPy, Heap Sort, enumerate)

日富一日 yyt

Article link: https://blog.csdn.net/2301_80396365/article/details/138027814

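This article covers the basic concepts of hash tables, their characteristics, and collision resolution by chaining; array creation and manipulation in NumPy (for example full, ones, and randint); heap sort; and the built-in function enumerate.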

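1. Hash Tables (Basics)

Concepts

Hash function: a function that maps a key in the lookup table to the address where that key is stored, that is, Hash(key) = Addr.
Hash table: a data structure that is accessed directly by key. In other words, a hash table establishes a direct mapping between keys and storage addresses.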

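Characteristics of hash tables

Fast lookup: a hash table uses a hash function to map keys to storage positions, so lookup, insertion, and deletion take constant time, O(1), on average. Heavy collisions can degrade this toward O(n).
Dynamic resizing: the size of a hash table can change. In most implementations the table grows automatically as elements are added, keeping the load factor low to preserve performance.
Unordered: in general, elements in a hash table are not stored in any particular order, so their order is unrelated to insertion order.
Unique keys: keys in a hash table are unique. If the same key is inserted again, the new value overwrites the old one.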
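
As a small illustration of the key-uniqueness property, the sketch below uses Python's built-in dict, which is a hash table. The variable names are made up for this example. Note that dict happens to preserve insertion order (guaranteed since Python 3.7), so the "unordered" property describes hash tables in general rather than every implementation.

```python
# Illustrative only: Python's built-in dict is a hash table.
prices = {}
prices["apple"] = 3
prices["pear"] = 5
prices["apple"] = 4  # same key again: the new value replaces the old one

print(prices["apple"])   # 4
print(len(prices))       # 2, because "apple" is stored only once
print("pear" in prices)  # True; membership tests are O(1) on average
```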

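Hash collisions

Chaining (linked-list method)

A collision occurs when two different keys hash to the same address. One family of solutions, open addressing, handles a collision by searching the hash table itself for another free slot.

The advantage of this approach is that it needs no extra storage. Its drawbacks are that data tends to cluster, which makes the distribution uneven, and that it undermines the original goal: the hash value computed from a key no longer tells you exactly where the data is stored.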
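
To make the clustering problem concrete, here is a minimal sketch of linear probing, one common open-addressing scheme. The function names `probe_insert` and `probe_find` are hypothetical and not part of the original article; the modulus 13 mirrors the chaining example that follows.

```python
def probe_insert(table, key):
    """Insert key using linear probing and return the slot that was used."""
    size = len(table)
    start = key % size
    for step in range(size):
        slot = (start + step) % size
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


def probe_find(table, key):
    """Return the slot holding key, or -1 if the key is absent."""
    size = len(table)
    start = key % size
    for step in range(size):
        slot = (start + step) % size
        if table[slot] is None:
            return -1  # an empty slot ends the probe sequence
        if table[slot] == key:
            return slot
    return -1


table = [None] * 13
slots = [probe_insert(table, k) for k in (78, 26, 39, 14)]
print(slots)  # [0, 1, 2, 3]
```

The keys 78, 26, and 39 all hash to slot 0, so they spill into slots 1 and 2. Key 14 hashes to slot 1 but ends up in slot 3: its hash value no longer says where it is actually stored, which is the drawback described above.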

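Chaining is arguably one of the cleanest ways to resolve hash collisions. When a collision occurs, a linked list is built with the colliding slot as its head node, and every item that hashes to that slot is stored in that list.

With chaining, storing, looking up, or deleting an item never disturbs the position of items in other slots. As long as the chains stay short, linked-list operations are cheap, so the overall performance of the hash table benefits.

With chaining, each slot of the hash table holds the head node of a linked list. The head node may or may not store data itself.

Implementing chaining in code: the implementation needs two classes, a node class and a hash table class.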


'''
Node class
'''
class HashNode():
    def __init__(self, key, value):
        # Keep the key so lookups can tell colliding items apart
        self.key = key
        self.value = value
        self.next_node = None

'''
Hash table class
'''
class HashTable():
    def __init__(self):
        # Hash table with an initial size of 15 slots
        self.table = [None] * 15
        # Number of items actually stored
        self.size = 0

    '''
    Store data
    '''
    def put(self, key, value):
        hash_val = self.hash_code(key)
        # New node
        new_node = HashNode(key, value)
        if self.table[hash_val] is None:
            # Scheme in which the head node stores data
            self.table[hash_val] = new_node
        else:
            # Walk to the end of the chain and append the new node
            move = self.table[hash_val]
            while move.next_node is not None:
                move = move.next_node
            move.next_node = new_node
        self.size += 1

    '''
    Look up data
    '''
    def get(self, key):
        # Start at the head node of the slot and walk the chain.
        # Check for None first, otherwise move.key fails at the end of the chain.
        move = self.table[self.hash_code(key)]
        while move is not None and move.key != key:
            move = move.next_node
        if move is None:
            # The data does not exist
            return -1
        return move.value

    def hash_code(self, key):
        # For illustration only: the modulus 13 is fixed
        hash_val = key % 13
        return hash_val


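# Source data
src_nums = [25, 78, 56, 32, 88, 26, 39, 82, 14]
# Hash table object
hash_table = HashTable()
# Add the data to the hash table
for n in src_nums:
    hash_table.put(n, n)
# Print the data stored in the head node of each non-empty slot
for i in hash_table.table:
    if i is not None:
        print(i.value, end=" ")
# End the current line before printing the lookup result
print()
print(hash_table.get(26))
'''
Output:
78 14 56 32 88 25 
26
'''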

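Common hash algorithms

Folding method: split the key into several parts with the same number of digits (the last part may be shorter), then take the sum of these parts, discarding any carry, as the hash value.

Folding comes in two forms: shift folding and boundary folding.

Shift folding: align the lowest digits of all the parts, then add them.
Boundary folding: fold the key back and forth along the part boundaries, so that every other part is reversed, then align and add.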
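
The two variants can be traced by hand on a single key. The sketch below uses the order number 19981112 from the examples that follow, split into 3-digit parts; treating "discard the carry" as keeping the low three digits is an assumption based on the usual textbook definition, whereas the article's own code takes the sum modulo the table size instead.

```python
key = "19981112"

# Split into 3-digit parts; the last part may be shorter.
parts = [key[i:i + 3] for i in range(0, len(key), 3)]
print(parts)  # ['199', '811', '12']

# Shift folding: add the parts as they are.
shift_sum = sum(int(p) for p in parts)
print(shift_sum)         # 1022
print(shift_sum % 1000)  # 22, i.e. the carry into the fourth digit is dropped

# Boundary folding: reverse every other part before adding.
boundary_parts = [p[::-1] if n % 2 else p for n, p in enumerate(parts)]
print(boundary_parts)    # ['199', '118', '12']
boundary_sum = sum(int(p) for p in boundary_parts)
print(boundary_sum)      # 329
```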

'''
Shift-folding hash algorithm
'''
def hash_code(key, hash_table_size):
    # Convert the key to a string
    key_s = str(key)
    # Running sum of the parts
    s = 0
    # Slice the key into 3-digit parts
    for i in range(0, len(key_s), 3):
        s += int(key_s[i:i + 3])
    # Reduce the sum to a valid index instead of discarding the carry
    return s % hash_table_size

# Product information: [order number, amount]
products = [[20201011, 400.00], [19981112, 300], [20221212, 200]]
# Length of the hash table
hash_size = 10
# Hash table
hash_table = [None] * hash_size
# Store the data in the hash table
for p in products:
    key = hash_code(p[0], hash_size)
    hash_table[key] = p[1]
# Show the data in the hash table
print("Data in the hash table:", hash_table)
# Look up an amount by order number
hash_val = hash_code(19981112, hash_size)
val = hash_table[hash_val]
print("The amount for order {0} is {1}".format(19981112, val))
'''
Output:
Data in the hash table: [None, None, 300, 400.0, None, None, 200, None, None, None]
The amount for order 19981112 is 300
'''

'''
Boundary-folding hash algorithm
'''
def hash_code(key, hash_table_size):
    # Convert the key to a string
    key_s = str(key)
    # Running sum of the parts
    s = 0
    # Slice the key into 3-digit parts
    for i in range(0, len(key_s), 3):
        # Current part
        tmp_s = key_s[i:i + 3]
        # Reverse every other part (the 2nd, 4th, ...)
        if (i // 3) % 2 != 0:
            tmp_s = tmp_s[::-1]
        s += int(tmp_s)
    return s % hash_table_size

# Product information (sample data): [order number, amount]
products = [[20201011, 400.00], [19981112, 300], [20221212, 200]]
# Length of the hash table
hash_size = 10
# Hash table
hash_table = [None] * hash_size
# Store the data in the hash table
for p in products:
    key = hash_code(p[0], hash_size)
    hash_table[key] = p[1]
# Show the data in the hash table
print("Data in the hash table:", hash_table)
# Look up an amount by order number
hash_val = hash_code(19981112, hash_size)
val = hash_table[hash_val]
print("The amount for order {0} is {1}".format(19981112, val))
'''
Output:
Data in the hash table: [None, None, None, 400.0, None, None, 200, None, None, 300]
The amount for order 19981112 is 300
'''

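2. NumPy Basics

Creating arrays

full: numpy.full(shape, fill_value, dtype=None). shape is the shape of the array (how many rows and columns, usually given as a tuple), fill_value is the value to fill with, and dtype is the element type. It creates an array in which every element has the same value.

import numpy as np

# Create an array with 2 rows and 3 columns
n2 = np.full(shape=(2, 3), fill_value=1, dtype=np.int16)
n2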

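ones: np.ones(shape, dtype=None, order='C'). Returns an array of the given shape and type filled with ones.

# np.float was removed in NumPy 1.24; use the built-in float (or np.float64) instead
n4 = np.ones((3, 2), dtype=float)

randint: np.random.randint(low, high=None, size=None, dtype=int). low is the smallest possible value, high is the upper bound (exclusive), and size plays the role of shape here. It returns an array of the given shape whose elements are random integers in [low, high).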

n5 = np.random.randint(1,10,(3,4))

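eye(): np.eye(N, M=None, k=0, dtype=float, order='C'). N is the number of rows, M is the number of columns (defaults to N), and k is the diagonal offset. It returns a 2-D array with ones on the k-th diagonal and zeros elsewhere; with the default arguments this is the identity matrix.

linspace(): np.linspace(start, stop, num). Creates an arithmetic sequence of num values from start to stop (stop is included by default); the common difference is computed for you.

arange(): np.arange([start,] stop[, step,], dtype=None). start is the first value and stop is the end value (exclusive). It creates a one-dimensional array of evenly spaced values.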
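
The last three functions have no example above, so here is a short sketch. The variable names are illustrative; the calls use only the parameters described in this section.

```python
import numpy as np

identity = np.eye(3, dtype=int)  # 3x3 identity matrix
shifted = np.eye(3, 4, k=1)      # 3x4, ones on the diagonal just above the main one
points = np.linspace(0, 1, 5)    # 5 evenly spaced values, both endpoints included
steps = np.arange(1, 10, 2)      # start at 1, step by 2, stop before 10

print(identity)
print(shifted)
print(points)
print(steps)  # [1 3 5 7 9]
```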

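Splitting

split: np.split(ary, indices_or_sections, axis=0). ary is the array to split. indices_or_sections is either an integer or an iterable of indices: an integer N splits the array into N equal sub-arrays (an error is raised if an equal split is not possible), while an iterable splits the array at the given indices.

n3 = np.arange(12)  # assumed sample data: the original definition of n3 is not shown
n3 = n3.reshape((6, 2))
display(n3)  # inspect n3 (display is available in Jupyter/IPython; use print elsewhere)
n5 = np.split(n3, 2, axis=0)  # split along axis 0 (rows) into 2 equal parts of 3 rows each
n6 = np.split(n3, 1, axis=1)  # split along axis 1 (columns) into 1 part
n7 = np.split(n3, [1, 3], axis=0)  # split before row indices 1 and 3, giving 3 parts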

Reading and writing binary files

Read
- load()
Write
- save('filename.npy', arr)
- savez('filename.npz', arr1, arr2) saves several arrays in one file

a = np.arange(1,13).reshape(3,4)
np.save('arr.npy',a)
c = np.load('arr.npy')
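
The example above covers save and load for a single array. As a sketch of savez with several arrays, the snippet below writes to a temporary directory so that it leaves no files behind; the file name and the keyword names `first` and `second` are arbitrary choices for this illustration.

```python
import os
import tempfile

import numpy as np

a = np.arange(1, 13).reshape(3, 4)
b = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "arrays.npz")
    # Keyword names become the keys used to retrieve each array later
    np.savez(path, first=a, second=b)
    with np.load(path) as data:
        names = sorted(data.files)
        a_loaded = data["first"]
        b_loaded = data["second"]

print(names)  # ['first', 'second']
```

If the arrays are passed positionally instead, NumPy stores them under the automatic names arr_0, arr_1, and so on.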

Reading and writing text files

loadtxt()
savetxt()
genfromtxt(): the delimiter parameter specifies the character that separates the values
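
A minimal round trip through a comma-separated text file might look like the sketch below. The file name and sample data are assumptions made for the example; genfromtxt is shown on an in-memory string with a missing field, which it fills with nan by default.

```python
import io
import os
import tempfile

import numpy as np

arr = np.arange(1, 7).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "arr.csv")
    np.savetxt(path, arr, fmt="%d", delimiter=",")  # write integers separated by commas
    with open(path) as f:
        text = f.read()
    loaded = np.loadtxt(path, delimiter=",")        # values come back as floats

# genfromtxt tolerates missing fields; the empty one becomes nan
with_gap = np.genfromtxt(io.StringIO("1,2,3\n4,,6"), delimiter=",")

print(text)
print(loaded)
print(with_gap)
```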

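Sorting

Direct sorting
- sort()
- axis=1 sorts within each row; axis=0 sorts within each column
- A one-dimensional array has only axis=0; there is no axis=1
Indirect sorting
- argsort: returns the indices that would sort the array
- lexsort: returns the indices that sort by several keys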
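
The difference between direct and indirect sorting is easier to see on small arrays. This is an illustrative sketch with made-up data; note that lexsort treats the last key in the tuple as the primary sort key.

```python
import numpy as np

scores = np.array([[3, 1, 2],
                   [2, 7, 0]])

by_row = np.sort(scores, axis=1)  # sort within each row
by_col = np.sort(scores, axis=0)  # sort within each column
print(by_row)
print(by_col)

ages = np.array([30, 25, 35])
order = np.argsort(ages)          # indices that would sort ages
print(order)        # [1 0 2]
print(ages[order])  # [25 30 35]

primary = np.array([2, 1, 2, 1])
secondary = np.array([9, 8, 7, 6])
# Sort by primary first, then by secondary to break ties
idx = np.lexsort((secondary, primary))
print(idx)          # [3 1 2 0]
```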

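Duplicates and deduplication

np.unique(arr) ===> returns the sorted unique elements of arr
np.tile(A, reps) ===> A is the array to repeat, reps is the number of repetitions
np.repeat(a, repeats, axis=0) ===> repeats each element repeats times; axis=0 repeats row by row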
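
The following sketch contrasts the three functions on tiny arrays (the data is invented for illustration): tile repeats the array as a whole, while repeat repeats each element or row.

```python
import numpy as np

values = np.array([3, 1, 3, 2, 1])
unique_values = np.unique(values)
print(unique_values)  # [1 2 3]

pair = np.array([1, 2])
tiled = np.tile(pair, 3)       # the whole array, three times over
repeated = np.repeat(pair, 3)  # each element three times
print(tiled)     # [1 2 1 2 1 2]
print(repeated)  # [1 1 1 2 2 2]

grid = np.array([[1, 2],
                 [3, 4]])
rows_repeated = np.repeat(grid, 2, axis=0)  # each row twice
print(rows_repeated)
```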

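Common statistical functions

np.sum(arr): axis=1 sums each row, axis=0 sums each column, axis=None (the default) sums all elements.
np.mean() # mean
np.var() # variance
np.std() # standard deviation
np.median() # median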
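
As a quick check of the axis conventions, here is a sketch on a 2x3 array (sample data chosen for this example). By default np.var and np.std compute the population variance and standard deviation, that is, they divide by the number of elements.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

total = np.sum(arr)             # all elements
col_sums = np.sum(arr, axis=0)  # one sum per column
row_sums = np.sum(arr, axis=1)  # one sum per row
print(total)     # 21
print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]

print(np.mean(arr))    # 3.5
print(np.median(arr))  # 3.5
print(np.var(arr))     # population variance, 17.5 / 6
print(np.std(arr))     # square root of the variance
```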

3. Heap Sort

class Heap:
    def __init__(self):
        self.heap = []

    def parent(self, i):
        return (i - 1) // 2

    def left_child(self, i):
        return 2 * i + 1

    def right_child(self, i):
        return 2 * i + 2

    def insert(self, value):
        # Append the element to the end of the heap
        self.heap.append(value)
        current_index = len(self.heap) - 1

        # Sift the element up until the heap property holds again
        while current_index != 0 and self.heap[self.parent(current_index)] < self.heap[current_index]:
            self.heap[current_index], self.heap[self.parent(current_index)] = self.heap[self.parent(current_index)], self.heap[current_index]
            current_index = self.parent(current_index)

    def max_heapify(self, i):
        # Restore the max-heap property for the subtree rooted at index i
        left = self.left_child(i)
        right = self.right_child(i)
        largest = i

        if left < len(self.heap) and self.heap[left] > self.heap[largest]:
            largest = left

        if right < len(self.heap) and self.heap[right] > self.heap[largest]:
            largest = right

        if largest != i:
            self.heap[i], self.heap[largest] = self.heap[largest], self.heap[i]
            self.max_heapify(largest)

    def build_max_heap(self, arr):
        # Build a max-heap from a copy of arr, so the caller's list is not modified
        self.heap = list(arr)
        for i in range(len(self.heap) // 2 - 1, -1, -1):
            self.max_heapify(i)

    def extract_max(self):
        # Remove and return the maximum value of the heap
        if len(self.heap) == 0:
            return None

        max_value = self.heap[0]
        # Move the last element to the root, drop the last slot, then sift down
        self.heap[0] = self.heap[-1]
        del self.heap[-1]
        self.max_heapify(0)
        return max_value

    def get_max(self):
        # Return the maximum value without removing it
        if len(self.heap) > 0:
            return self.heap[0]
        return None

    def heap_sort(self, arr):
        # Heap sort: the result is in descending order
        self.build_max_heap(arr)
        sorted_arr = []
        while len(self.heap) > 0:
            sorted_arr.append(self.extract_max())
        return sorted_arr


# Example usage
arr = [12, 11, 13, 5, 6, 7]
heap = Heap()
sorted_arr = heap.heap_sort(arr)
print("Sorted array is:", sorted_arr)
# Sorted array is: [13, 12, 11, 7, 6, 5]

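class Heap: defines a class named Heap that represents the heap data structure (a max-heap here).
__init__(self): the initializer; creates an empty heap.
parent(self, i), left_child(self, i), right_child(self, i): compute the indices of an element's parent, left child, and right child in the heap.
insert(self, value): inserts an element into the heap. The new element is appended to the end of the heap and then moved up by repeatedly comparing it with its parent and swapping, until the heap property holds.
max_heapify(self, i): maintains the max-heap property. If the subtree rooted at index i violates the max-heap property, the node is swapped with the larger of its children, and max_heapify is called recursively on that child to repair the rest.
build_max_heap(self, arr): builds a max-heap. Starting from the middle of the array, it calls max_heapify on every non-leaf node, working back to the root.
extract_max(self): removes and returns the maximum value. The last element is moved to the top of the heap and removed from the end, then max_heapify is called to restore the max-heap property.
get_max(self): returns the maximum value in the heap without removing it.
heap_sort(self, arr): the heap sort function. It first builds a max-heap, then repeatedly extracts the maximum and appends it to the result until the heap is empty, and finally returns the result, which is sorted in descending order.
Example usage: creates a heap instance, heap-sorts the given array, and prints the sorted array.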
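
For comparison, Python's standard library already provides a heap in the heapq module. It is a min-heap, so the same idea yields ascending order. The sketch below is an alternative to the hand-written class above, not a replacement for it, and the function name `heap_sort_ascending` is made up for this example.

```python
import heapq


def heap_sort_ascending(items):
    """Sort items in ascending order using the standard-library min-heap."""
    heap = list(items)   # copy, so the caller's list is left untouched
    heapq.heapify(heap)  # rearrange into a min-heap in O(n)
    return [heapq.heappop(heap) for _ in range(len(heap))]


data = [12, 11, 13, 5, 6, 7]
print(heap_sort_ascending(data))  # [5, 6, 7, 11, 12, 13]
print(heapq.nlargest(3, data))    # [13, 12, 11]
print(data)                       # [12, 11, 13, 5, 6, 7], unchanged
```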

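4. Built-in Functions

enumerate is a commonly used Python built-in function for getting both the index and the value of each element during iteration. It is simple to use: it takes an iterable as its argument and returns an iterator that yields (index, value) tuples.

The signature of enumerate is:

enumerate(iterable, start=0)

iterable: required. The iterable to enumerate, such as a list, tuple, or string.

start: optional. The starting index, 0 by default. If start is given, the index begins counting from that value.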

A common use of enumerate is to get both the index and the value while looping over a sequence such as a list or tuple. Here is a simple example:

colors = ['red', 'green', 'blue']

for index, color in enumerate(colors):
    print(f"Index: {index}, Color: {color}")
# Output:
# Index: 0, Color: red
# Index: 1, Color: green
# Index: 2, Color: blue

To start counting from 1, pass start=1:

colors = ['red', 'green', 'blue']

for index, color in enumerate(colors, start=1):
    print(f"Index: {index}, Color: {color}")
# Output:
# Index: 1, Color: red
# Index: 2, Color: green
# Index: 3, Color: blue

This way, the index starts counting from 1.
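
To see what enumerate does under the hood, it can be approximated with a small generator. The function `my_enumerate` below is a hypothetical stand-in written for illustration; the real built-in is implemented in C but behaves the same way for these inputs.

```python
def my_enumerate(iterable, start=0):
    """Yield (index, item) pairs, counting up from start."""
    n = start
    for item in iterable:
        yield n, item
        n += 1


colors = ['red', 'green', 'blue']
pairs = list(my_enumerate(colors, start=1))
print(pairs)  # [(1, 'red'), (2, 'green'), (3, 'blue')]

# The generator matches the built-in, and both work on any iterable
print(pairs == list(enumerate(colors, start=1)))  # True
print(list(enumerate("ab")))                      # [(0, 'a'), (1, 'b')]
```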