Making Good Use of Data Structures


momo大魔王


Article link: https://blog.csdn.net/weixin_38087754/article/details/97791227


I. Queue

Implementation: circular array or linked list

Application: closely tied to BFS

See Graphs and Search.
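A circular-array queue can be sketched as follows. This is a minimal illustration assuming a fixed capacity; the class name `CircularQueue` and its method names are hypothetical, not taken from the post.

```python
class CircularQueue:
    """Fixed-capacity FIFO queue backed by a circular array (hypothetical helper)."""

    def __init__(self, capacity: int):
        self.data = [None] * capacity
        self.head = 0  # index of the front element
        self.size = 0  # number of stored elements

    def is_empty(self) -> bool:
        return self.size == 0

    def is_full(self) -> bool:
        return self.size == len(self.data)

    def enqueue(self, x) -> bool:
        if self.is_full():
            return False
        # The tail position wraps around to the start of the array.
        tail = (self.head + self.size) % len(self.data)
        self.data[tail] = x
        self.size += 1
        return True

    def dequeue(self):
        if self.is_empty():
            raise IndexError("dequeue from empty queue")
        x = self.data[self.head]
        self.data[self.head] = None  # drop the reference to the removed item
        self.head = (self.head + 1) % len(self.data)
        self.size -= 1
        return x
```

In everyday Python code, `collections.deque` already provides O(1) `append` and `popleft` and is the usual queue for BFS; the sketch only shows how the wrap-around indexing works.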

II. Stack (operations at one end only)

Implementation: plain array or linked list

Application: DFS

See Graphs and Search.
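To make the stack/DFS link concrete, here is a minimal iterative DFS in which a plain Python list serves as the stack. The adjacency-dict graph format and the function name `dfs_order` are assumptions made for illustration.

```python
from typing import Dict, List


def dfs_order(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Return nodes in depth-first visiting order, using a list as the stack."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()  # take from the top (LIFO)
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push neighbors in reverse so the first-listed neighbor is explored first.
        for nxt in reversed(graph.get(node, [])):
            if nxt not in visited:
                stack.append(nxt)
    return order
```

Replacing the list with a `collections.deque` and taking elements from the left with `popleft` turns the same loop into a breadth-first traversal.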

155. Min Stack

Design a stack that supports push, pop, top, and retrieving the minimum element in constant time.

push(x) -- Push element x onto stack.
pop() -- Removes the element on top of the stack.
top() -- Get the top element.
getMin() -- Retrieve the minimum element in the stack.

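What is being tested: if the minimum happens to be on top, how do we get the second smallest element after popping it? And if the new minimum is again on top, we then need the third smallest, and so on.

This naturally calls for extra space to record that information, so each stack entry is a pair (value, minimum of the stack up to and including this entry).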

class MinStack:

    def __init__(self):
        self.stack = []

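    def push(self, x):
        # Store (value, minimum of the stack including this value), so the minimum
        # of the remaining entries is still available after this entry is popped.
        if not self.stack:
            self.stack.append((x, x))
        else:
            self.stack.append((x, min(x, self.stack[-1][1])))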

    def pop(self):
        self.stack.pop()

    def top(self):
        return self.stack[-1][0]

    def getMin(self):
        return self.stack[-1][1]

84. Largest Rectangle in Histogram

Given n non-negative integers representing the histogram's bar height where the width of each bar is 1, find the area of largest rectangle in the histogram.

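The problem asks for a maximum over a sequence whose order cannot be changed, which usually hints at DP. Why not DP here? Because the brute-force solution is already O(n^2), and DP offers no way to improve on it.

By the "shortest plank" (wooden bucket) principle, a rectangle's height is limited by its shortest bar. So treat each bar as the shortest bar of its rectangle and extend it as far as possible: for a bar of height 5, for example, find the first bar to its left that is shorter than 5 and the first bar to its right that is shorter than 5.

Boundary handling: pad the heights with a 0 on each end, so that the first and last "planks" always have a shorter neighbor.

The problem thus reduces to finding the nearest smaller value on both sides of every bar.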
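The reduction can be checked with a direct O(n^2) sketch: for every bar, scan outward to the nearest shorter bar on each side. The function name `largest_rectangle_brute` is hypothetical; it is only meant as a reference for cross-checking the faster stack version. It needs no 0 padding because the index bounds stop the scans.

```python
from typing import List


def largest_rectangle_brute(heights: List[int]) -> int:
    """O(n^2) reference: treat each bar as the shortest one and extend it."""
    best = 0
    n = len(heights)
    for i, h in enumerate(heights):
        left = i
        while left > 0 and heights[left - 1] >= h:  # extend left while bars are not shorter
            left -= 1
        right = i
        while right < n - 1 and heights[right + 1] >= h:  # extend right likewise
            right += 1
        best = max(best, h * (right - left + 1))
    return best
```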

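Enter the monotonic stack (if you have not seen it before, just memorize it): before pushing a value x, pop every value on top of the stack that is larger than x (or smaller than x, for the opposite direction).

Property: the values in the stack are always monotonically increasing (or decreasing).

What we exploit here, though, is the push/pop process itself. Right after x is pushed, the value just below it in the stack is the first value to the left of x that is smaller than x (smaller than or equal to x, when equal values are kept on the stack).

And before x is pushed, every value y popped because of x has x as the first value to its right that is smaller than y.

So a monotonic stack finds, for every element, the nearest smaller value on both sides in O(n) total time, because each index is pushed and popped at most once.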
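The push/pop argument can be written as a standalone helper that returns both neighbor arrays in one pass. The name `nearest_smaller` and the sentinels -1 and n (meaning "no such element") are assumptions for this sketch. As noted above, a single pass makes one side non-strict: here the right neighbor is strictly smaller and the left neighbor is smaller or equal.

```python
from typing import List, Tuple


def nearest_smaller(nums: List[int]) -> Tuple[List[int], List[int]]:
    """One-pass monotonic stack.

    right[i]: index of the first value right of i that is strictly smaller (n if none).
    left[i]:  index of the first value left of i that is smaller or equal (-1 if none).
    """
    n = len(nums)
    left = [-1] * n
    right = [n] * n
    stack = []  # indices whose values are non-decreasing from bottom to top
    for i, x in enumerate(nums):
        # Every index popped here has x as its first strictly smaller value on the right.
        while stack and nums[stack[-1]] > x:
            right[stack.pop()] = i
        # The index left on top is the nearest value <= x on the left.
        if stack:
            left[i] = stack[-1]
        stack.append(i)
    return left, right
```

The histogram area for bar i is then nums[i] * (right[i] - left[i] - 1). With equal heights, the leftmost bar of an equal run still gets its full width, so the maximum area is not affected by the non-strict left side.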

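Key idea:

When an element is popped, both of its nearest smaller neighbors are known at once: the new top of the stack on the left and the current index i on the right. So every pop yields one candidate rectangle area, height of the popped bar × (i - new top - 1), and the global maximum is the largest of these candidates.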

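from typing import List

class Solution:
    def largestRectangleArea(self, heights: List[int]) -> int:
        # Pad with 0 on both ends: the left 0 is never popped (heights are non-negative),
        # and the right 0 forces every remaining bar to be popped at the end.
        heights = [0] + heights + [0]
        mono_stack = []  # indices whose heights are non-decreasing from bottom to top
        res = 0

        for i, val in enumerate(heights):
            # Pop every bar taller than the current one: i is its right boundary,
            # and the new stack top is its left boundary.
            while mono_stack and val < heights[mono_stack[-1]]:
                height_index = mono_stack.pop()
                width = i - mono_stack[-1] - 1
                res = max(res, width * heights[height_index])
            mono_stack.append(i)

        return res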

85. Maximal Rectangle

Given a 2D binary matrix filled with 0's and 1's, find the largest rectangle containing only 1's and return its area.

Input:
[
  ["1","0","1","0","0"],
  ["1","0","1","1","1"],
  ["1","1","1","1","1"],
  ["1","0","0","1","0"]
]
Output: 6

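A variant of Largest Rectangle in Histogram: use each row as the base, compute for every column the height of the run of consecutive 1s ending at that row, and pass those heights to largestRectangleArea to get the largest area for that row. The answer is the maximum over all rows.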

from typing import List

class Solution:
    def maximalRectangle(self, matrix: List[List[str]]) -> int:

        def largestRectangleArea(heights):
            # Same monotonic-stack routine as in problem 84.
            heights = [0] + heights + [0]
            mono_stack = []
            res = 0
            for i, val in enumerate(heights):
                while mono_stack and val < heights[mono_stack[-1]]:
                    height_index = mono_stack.pop()
                    res = max(res, (i - mono_stack[-1] - 1) * heights[height_index])
                mono_stack.append(i)
            return res

        if not matrix or not matrix[0]:
            return 0
        col = len(matrix[0])

        # heights[j] = number of consecutive '1's ending at the current row in column j
        heights = [0] * col
        res = 0
        for row in matrix:
            for j in range(col):
                heights[j] = heights[j] + 1 if row[j] == '1' else 0
            res = max(res, largestRectangleArea(heights))

        return res

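III. Hash Table

The point of a hash table is fast insertion and lookup: O(1) insert, O(1) find and O(1) delete on average. When the key is a string, each operation is typically O(L), where L is the average string length, because the whole key has to be hashed.

In Java, the main difference between Hashtable and HashMap is that Hashtable is thread-safe (its methods are synchronized, i.e. locked), while HashMap is not.

A hash set is a hash table that stores only keys, with no associated values.

A hash table applies a hash function to map each key to a bucket with index in [0, capacity - 1]; when one bucket receives several keys, a collision occurs. Designing a hash table means balancing the number of buckets against the number of keys. We want the hash function to scatter keys as irregularly as possible so that collisions are rare, and the number of stored keys should stay well below the number of buckets; otherwise collisions become frequent and performance suffers.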

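Example with a string key (a polynomial hash):

def hashfunc(string: str, hash_capacity: int) -> int:
    h = 0
    for ch in string:
        h = h * 31 + ord(ch)   # 31 is an empirical multiplier that works well (Java's String.hashCode uses it too)
        h = h % hash_capacity  # take the remainder at every step to keep the number small (prevents overflow)
    return h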

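If the hash function mapped every key to a distinct bucket it would be perfect, but in general collisions are unavoidable. So how are they resolved?

The two main families are open hashing (separate chaining: each bucket holds a linked list of the entries that hash to it) and closed hashing (open addressing: on a collision, probe other slots of the same array until a free one is found). When the table becomes too full it has to be rehashed, which is expensive: a larger bucket array is allocated, and every entry is traversed and re-inserted using the hash function for the new capacity.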
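The Design HashMap solution uses open hashing (chaining). For contrast, here is a minimal sketch of closed hashing with linear probing that rehashes into a table twice as large once it is more than half full. The class name `LinearProbingMap`, the initial capacity of 8 and the 0.5 load-factor threshold are assumptions chosen for illustration; deletion is left out because it needs tombstone markers.

```python
class LinearProbingMap:
    """Closed hashing (open addressing) with linear probing and doubling rehash."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity: int = 8):
        self.keys = [self._EMPTY] * capacity
        self.vals = [None] * capacity
        self.count = 0

    def _slot(self, key) -> int:
        """Probe from hash(key) until the key or an empty slot is found."""
        cap = len(self.keys)
        i = hash(key) % cap
        while self.keys[i] is not self._EMPTY and self.keys[i] != key:
            i = (i + 1) % cap  # collision: try the next slot
        return i

    def put(self, key, value) -> None:
        i = self._slot(key)
        if self.keys[i] is self._EMPTY:
            self.keys[i] = key
            self.count += 1
        self.vals[i] = value
        if self.count * 2 > len(self.keys):  # load factor above 0.5
            self._rehash()

    def get(self, key, default=None):
        i = self._slot(key)
        return default if self.keys[i] is self._EMPTY else self.vals[i]

    def _rehash(self) -> None:
        # Every entry must be re-inserted, because slot indices depend on the capacity.
        old = [(k, v) for k, v in zip(self.keys, self.vals) if k is not self._EMPTY]
        self.keys = [self._EMPTY] * (2 * len(self.keys))
        self.vals = [None] * len(self.keys)
        self.count = 0
        for k, v in old:
            self.put(k, v)
```

Keeping the load factor at or below one half guarantees that a probe always reaches an empty slot, so `_slot` terminates.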

Design HashMap

class ListNode:
    def __init__(self, key, val):
        self.pair = (key, val)
        self.next = None

class MyHashMap:

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.m = 1000  # number of buckets
        self.h = [None]*self.m
        

    def put(self, key, value):
        """
        value will always be non-negative.
        :type key: int
        :type value: int
        :rtype: void
        """
        index = key % self.m
        if self.h[index] is None:
            self.h[index] = ListNode(key, value)
        else:
            cur = self.h[index]
            while True:
                if cur.pair[0] == key:
                    cur.pair = (key, value)  # update an existing key
                    return
                if cur.next is None:
                    break
                cur = cur.next
            cur.next = ListNode(key, value)  # append to the end of the chain
        

    def get(self, key):
        """
        Returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key
        :type key: int
        :rtype: int
        """
        index = key % self.m
        cur = self.h[index]
        while cur:
            if cur.pair[0] == key:
                return cur.pair[1]
            else:
                cur = cur.next
        return -1
            
        

    def remove(self, key):
        """
        Removes the mapping of the specified value key if this map contains a mapping for the key
        :type key: int
        :rtype: void
        """
        index = key % self.m
        cur = prev = self.h[index]
        if not cur: return
        if cur.pair[0] == key:
            self.h[index] = cur.next
        else:
            cur = cur.next
            while cur:
                if cur.pair[0] == key:
                    prev.next = cur.next
                    break
                else:
                    cur, prev = cur.next, prev.next

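Complexity analysis

With M keys, the space complexity is O(M) (plus the bucket array).

Time complexity depends heavily on the design. Usually the values in one bucket are stored in an array or linked list, and some implementations switch a bucket to a search tree when it holds too many values (Java 8's HashMap, for example, converts large buckets into red-black trees). Ideally each bucket holds a constant number of values, so insertion and search are both O(1). In the worst case, when all N keys land in the same bucket, appending is still O(1) but search is O(N).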

Applications of HashSet (finding duplicates)

Single Number

Given a non-empty array of integers, every element appears twice except for one. Find that single one.

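Follow-up: can you solve it in linear time without using extra space?

Use XOR: A XOR A = 0, A XOR 0 = A, and XOR is commutative and associative, so XOR-ing all the numbers cancels every pair and leaves the single number. For example, 4 ^ 1 ^ 2 ^ 1 ^ 2 = 4.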

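from typing import List

class Solution:
    def singleNumber(self, nums: List[int]) -> int:
        # HashSet version: O(n) time, O(n) space
        # hash_set = set()
        # for num in nums:
        #     if num not in hash_set:
        #         hash_set.add(num)
        #     else:
        #         hash_set.remove(num)
        # return list(hash_set)[0]

        # XOR version: O(n) time, O(1) space; every pair cancels out
        res = 0
        for num in nums:
            res ^= num
        return res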

264. Ugly Number II

Write a program to find the n-th ugly number.

Ugly numbers are positive numbers whose prime factors only include 2, 3, 5.

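Duplicates must be handled (for example, 6 = 2 × 3 = 3 × 2 would otherwise be generated twice). How can the problem be solved in O(n) time?

A hash set removes the duplicates, but it is unordered, and that must not be overlooked: my first attempt ignored it and read the result with res[0], which went wrong. Instead, each round takes the smallest value out of res, multiplies it by 2, 3 and 5, and adds the products back to res. Note that min(res) scans the whole set, so this version is O(n^2); the commented-out three-pointer version is the one that runs in O(n).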

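class Solution:
    def nthUglyNumber(self, n):
        # Set version: O(n^2), because min(res) scans the whole set every round
        res = set([1])
        factor = [2, 3, 5]
        for i in range(n - 1):
            umin = min(res)        # the next ugly number in increasing order
            res.remove(umin)
            for f in factor:
                res.add(umin * f)  # the set silently drops duplicates such as 6

        return min(res)

        # O(n) three-pointer version: i2, i3, i5 point at the ugly number
        # that will next be multiplied by 2, 3 and 5 respectively.
        # ugly = [1]
        # i2, i3, i5 = 0, 0, 0
        # while n > 1:
        #     u2, u3, u5 = 2 * ugly[i2], 3 * ugly[i3], 5 * ugly[i5]
        #     umin = min(u2, u3, u5)
        #     # Advance every pointer that produced umin, so duplicates are skipped
        #     if umin == u2:
        #         i2 += 1
        #     if umin == u3:
        #         i3 += 1
        #     if umin == u5:
        #         i5 += 1
        #     ugly.append(umin)
        #     n -= 1
        # return ugly[-1]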
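Because `min(res)` scans the whole set, a heap is the natural replacement: it yields the smallest candidate in O(log n), and a separate set still filters duplicates, giving O(n log n) overall. This is a minimal sketch with a hypothetical function name `nth_ugly_heap`; the three-pointer DP remains the O(n) solution.

```python
import heapq


def nth_ugly_heap(n: int) -> int:
    """Return the n-th ugly number using a min-heap plus a 'seen' set."""
    heap = [1]
    seen = {1}
    for _ in range(n - 1):
        smallest = heapq.heappop(heap)  # the next ugly number in increasing order
        for f in (2, 3, 5):
            nxt = smallest * f
            if nxt not in seen:  # e.g. 6 = 2*3 = 3*2 must be pushed only once
                seen.add(nxt)
                heapq.heappush(heap, nxt)
    return heap[0]
```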

Applications of hash tables (when more information has to be stored)

Isomorphic Strings

Two strings are isomorphic if the characters in s can be replaced to get t.

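class Solution:
    def isIsomorphic(self, s: str, t: str) -> bool:
        # First idea: equal occurrence counts would be enough.
        # But Counter only records counts; this problem also requires the order to match,
        # so positions must be recorded: (a, b, a) vs (a, a, b).
        # c1 = collections.Counter(s)
        # c2 = collections.Counter(t)
        # return sorted(c1.values()) == sorted(c2.values())

        # Second idea: record the positions of every character and compare them.
        # d1, d2 = {}, {}
        # for i, val in enumerate(s):
        #     d1[val] = d1.get(val, []) + [i]
        # for i, val in enumerate(t):
        #     d2[val] = d2.get(val, []) + [i]
        # return sorted(d1.values()) == sorted(d2.values())

        # Every character pair is unique: in egg and add, e only ever maps to a and g only to d.
        # So #distinct pairs == #distinct chars in s == #distinct chars in t
        # (assumes len(s) == len(t), as the problem guarantees),
        # e.g. (a b b) and (c d d) give the pairs {(a, c), (b, d)}.
        return len(set(zip(s, t))) == len(set(s)) == len(set(t))

        # Alternative: compare the first-occurrence index of every character.
        # In Python 3, map() returns an iterator, so wrap it in list() before comparing.
        # return list(map(s.find, s)) == list(map(t.find, t))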
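The heading's point, that a hash table can store more than mere presence, can be shown more explicitly than with the zip/set one-liner: keep two dictionaries recording the mapping s to t and t to s, and fail as soon as either conflicts. A minimal sketch; the function name `is_isomorphic_maps` is hypothetical.

```python
def is_isomorphic_maps(s: str, t: str) -> bool:
    """Check isomorphism by storing the character mapping in both directions."""
    if len(s) != len(t):
        return False
    s_to_t, t_to_s = {}, {}
    for a, b in zip(s, t):
        # setdefault records the first pairing and returns the stored partner afterwards
        if s_to_t.setdefault(a, b) != b or t_to_s.setdefault(b, a) != a:
            return False
    return True
```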

LRU Cache

Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations: get and put.

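When designing a data structure, first work out which operations it needs. On put, if the key already exists, its entry must be moved to the most recently used position; if the cache is full, the least recently used entry must be evicted. So we need operations at both ends, plus the ability to remove an entry from the middle. Frequent insertion and removal call for a linked list, and locating each node directly calls for a hash table. To insert or delete a node in O(1) we must know both its previous and next nodes, hence a doubly linked list.

So the design is: hash table + doubly linked list.

The whole process needs only two primitives, removing a node from the middle and appending a node at the tail, so they are defined as separate _remove and _add helpers.

At first I did not store the key in the node, but then there was no way to delete the evicted entry from the hash table, so each node also stores its key.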

class Node:

    def __init__(self, key, val):
        self.val = val
        self.prev = None
        self.next = None
        # Stored so the entry can be deleted from the hash table on eviction
        self.key = key


class LRUCache:

    def __init__(self, capacity: int):
        self.hash = {}  # key -> Node
        self.capacity = capacity
        # Dummy head/tail: head.next is the least recently used node,
        # tail.prev is the most recently used one.
        self.head = Node(-1, -1)
        self.tail = Node(-1, -1)
        self.head.next = self.tail
        self.tail.prev = self.head

    def get(self, key: int) -> int:
        if key in self.hash:
            n = self.hash[key]
            # Move the node to the tail (most recently used)
            self._remove(n)
            self._add(n)
            return n.val
        else:
            return -1

    def put(self, key: int, value: int) -> None:
        if key in self.hash:
            self._remove(self.hash[key])

        n = Node(key, value)
        self._add(n)
        self.hash[key] = n

        if len(self.hash) > self.capacity:
            # Evict the least recently used node, right after the dummy head
            n = self.head.next
            self._remove(n)
            self.hash.pop(n.key)

    def _remove(self, node):
        # Unlink node from the list in O(1)
        prev = node.prev
        _next = node.next
        prev.next = _next
        _next.prev = prev

    def _add(self, node):
        # Insert node just before the dummy tail in O(1)
        p = self.tail.prev
        p.next = node
        node.prev = p
        node.next = self.tail
        self.tail.prev = node
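In Python, `collections.OrderedDict` already combines a hash table with ordering and provides `move_to_end` and `popitem(last=False)`, so the same design can be sketched far more compactly. This is an alternative, not the implementation above; the class name `LRUCacheOD` is hypothetical.

```python
from collections import OrderedDict


class LRUCacheOD:
    """LRU cache on top of OrderedDict: the front holds the least recently used key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key: int, value: int) -> None:
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
```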

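IV. Heap

Heap implementation and analysis

When you need the minimum but also want to support updates and deletions, Java's TreeMap is the usual tool. Python's standard library has no TreeMap, so a heap (heapq) is the common substitute; note, however, that a plain heap only gives fast access to the minimum, and updating or deleting an arbitrary element costs O(n) without extra bookkeeping.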
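A common way to get "minimum plus arbitrary delete" out of `heapq` is lazy deletion: record deletions in a counter and discard those entries only when they reach the top. A minimal sketch, assuming every `remove(x)` targets a value currently in the heap; the class name `LazyHeap` is hypothetical.

```python
import heapq
from collections import Counter


class LazyHeap:
    """Min-heap that supports remove(x) through lazy deletion (hypothetical helper)."""

    def __init__(self):
        self.heap = []
        self.deleted = Counter()  # value -> number of deletions not yet applied
        self.size = 0             # number of live (not deleted) elements

    def push(self, x) -> None:
        heapq.heappush(self.heap, x)
        self.size += 1

    def remove(self, x) -> None:
        # O(1): only record the deletion; assumes x is currently in the heap
        self.deleted[x] += 1
        self.size -= 1

    def _prune(self) -> None:
        # Discard deleted values once they surface at the top
        while self.heap and self.deleted[self.heap[0]] > 0:
            self.deleted[self.heap[0]] -= 1
            heapq.heappop(self.heap)

    def top(self):
        self._prune()
        return self.heap[0]

    def pop(self):
        self._prune()
        x = heapq.heappop(self.heap)
        self.size -= 1
        return x
```

Each element is pushed and popped at most once, so the amortized cost per operation stays O(log n); an update can be expressed as `remove(old)` followed by `push(new)`.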

347. Top K Frequent Elements
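For 347, a size-k min-heap of (frequency, value) tuples is enough, since there is no tie-breaking rule to customize. A minimal sketch with a hypothetical function name `top_k_frequent`; `heapq.nlargest(k, counts, key=counts.get)` is a shorter standard-library alternative.

```python
import heapq
from collections import Counter
from typing import List


def top_k_frequent(nums: List[int], k: int) -> List[int]:
    """Return the k most frequent values, most frequent first."""
    counts = Counter(nums)
    heap = []  # min-heap of (frequency, value); its root is the weakest candidate kept
    for value, freq in counts.items():
        heapq.heappush(heap, (freq, value))
        if len(heap) > k:
            heapq.heappop(heap)  # drop the least frequent candidate
    # Sort the k survivors by descending frequency.
    return [value for _, value in sorted(heap, reverse=True)]
```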

692. Top K Frequent Words

Given a non-empty list of words, return the k most frequent elements.

Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first.

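A classic heap use case is online ordering, e.g. finding the top k largest elements: keep a min-heap of at most k elements, push each new element and pop the smallest whenever the size exceeds k. The heap then always holds the k largest elements seen so far, at O(log k) per element.

The problem requires words with equal frequency to be ordered alphabetically, so the comparison has to be redefined: define a class and override the magic methods __lt__ and __eq__. Because the min-heap evicts its "smallest" element, for equal frequencies the alphabetically later word must count as smaller, so that it is evicted first.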

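import collections
import heapq
from typing import List

class Element:
    def __init__(self, freq, word):
        self.freq = freq
        self.word = word
    
    def __lt__(self, other):
        # Lower frequency is "smaller"; on ties, the alphabetically later word is "smaller",
        # so the min-heap pops it first and the earlier word survives.
        if self.freq == other.freq:
            return self.word > other.word
        return self.freq < other.freq
    
    def __eq__(self, other):
        return self.freq == other.freq and self.word == other.word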


class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        c = collections.Counter(words)
        heap = []
        for word, freq in c.items():
            heapq.heappush(heap, Element(freq, word))
            if len(heap) > k:
                heapq.heappop(heap)
        
        res = []
        for _ in range(k):
            res.append(heapq.heappop(heap).word)
        
        return res[::-1]

295. Find Median from Data Stream

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle values.

For example,

[2,3,4], the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

void addNum(int num) - Add an integer number from the data stream to the data structure.
double findMedian() - Return the median of all elements so far.

import heapq

class MedianFinder:

    def __init__(self):
        """
        initialize your data structure here.
        """
        self.small = []  # max-heap (values stored negated) for values <= mid
        self.large = []  # min-heap for values > mid
        self.mid = None  # the middle element; the heap sizes differ by at most 1

    def addNum(self, num: int) -> None:
        # The numbers must stay ordered as they arrive online, which is exactly what heaps are for.
        # Keep mid explicitly: a max-heap holds values below mid and a min-heap holds
        # values above it, which makes mid easy to update.

        # Use `is None`, not `not self.mid`: otherwise a first number of 0 would be
        # mistaken for "empty". Also assign self.mid rather than a local alias such as
        # mid = self.mid: the lists are shared references, but rebinding a local name
        # would not change the attribute.
        if self.mid is None:
            self.mid = num
            return

        if num <= self.mid:
            # heapq is a min-heap by default; negate values to use it as a max-heap
            heapq.heappush(self.small, -num)
        else:
            heapq.heappush(self.large, num)

        # Rebalance so the heap sizes differ by at most 1, shifting mid accordingly
        if len(self.small) > len(self.large) + 1:
            smaller = -heapq.heappop(self.small)
            heapq.heappush(self.large, self.mid)
            self.mid = smaller
        if len(self.large) > len(self.small) + 1:
            larger = heapq.heappop(self.large)
            heapq.heappush(self.small, -self.mid)
            self.mid = larger

    def findMedian(self) -> float:
        small, large = self.small, self.large

        if len(small) == len(large):
            return self.mid
        elif len(small) > len(large):
            return (self.mid - small[0]) / 2.0
        else:
            return (self.mid + large[0]) / 2.0

23. Merge k Sorted Lists

Merge k sorted linked lists and return it as one sorted list. Analyze and describe its complexity.

Example:

Input:
[
  1->4->5,
  1->3->4,
  2->6
]
Output: 1->1->2->3->4->4->5->6

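With a heap, this problem is almost identical to the previous ones. The only difficulty is comparison inside the heap: pushing [node.val, node] fails, because when two first elements are equal the heap goes on to compare the second elements, and ListNode objects do not support <. So either redefine the magic methods (the commented-out element class), or put a tie-breaking number in the second position, ideally one that is unique. Each of the N nodes is pushed and popped once on a heap holding at most k entries, so the total time is O(N log k) and the extra space is O(k).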

# class element:
#     def __init__(self, x, node):
#         self.val = x
#         self.node = node
        
#     def __lt__(self, other):
#         return self.val < other.val
    
#     def __eq__(self, other):
#         return self.val == other.val
    


# class Solution:
#     def mergeKLists(self, lists: List[ListNode]) -> ListNode:  
#         res = ListNode(-1)
#         dummy = res
        
#         l = len(lists)
#         heap = []
#         for head in lists:
#             if head:
#                 heapq.heappush(heap, element(head.val, head))
        
#         while heap:
#             umin = heapq.heappop(heap).node
#             res.next = umin
#             res = res.next
#             if umin.next:
#                 heapq.heappush(heap, element(umin.next.val, umin.next))
        
#         return dummy.next


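import heapq
import itertools
from typing import List

# ListNode (val, next) is the singly linked list class provided by LeetCode.

class Solution:
    def mergeKLists(self, lists: List[ListNode]) -> ListNode:
        result = ListNode(-1)  # dummy head
        min_heap = list()
        head = result
        # A unique, increasing counter breaks ties between equal values, so the heap
        # never has to compare two ListNode objects (random.random() works in practice,
        # but two equal random numbers would raise a TypeError).
        tie = itertools.count()
        for n in lists:
            if n:
                heapq.heappush(min_heap, (n.val, next(tie), n))
        while min_heap:
            _, _, n = heapq.heappop(min_heap)
            head.next = n
            if n.next:
                heapq.heappush(min_heap, (n.next.val, next(tie), n.next))
            head = head.next
        return result.next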
