Basic Algorithm Implementations in Python 3

Author: KpLn_HJL

Last modified: 2024-02-27 06:03:29

Category: Study Notes. Tags: Python, Algorithm

First published: 2023-12-13 23:59:41

Article link: https://blog.csdn.net/sinat_41679123/article/details/134973482

Common algorithms implemented in Python 3.

List/array

Operation	Time complexity
transform to set	$O(n)$
in	$O(n)$
len	$O(1)$

Stack

Property: last in, first out (LIFO).
Typical application: evaluating Reverse Polish Notation (RPN).
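
A stack maps naturally onto RPN evaluation: push operands, and on each operator pop two operands and push the result. The following is a minimal sketch rather than a definitive implementation. The function name eval_rpn is illustrative, and it assumes the expression arrives as a list of string tokens, supports only the four basic operators, and truncates division toward zero.

```python
def eval_rpn(tokens: list) -> int:
    """Evaluate an RPN expression given as a list of string tokens."""
    stack = []
    for token in tokens:
        if token in ("+", "-", "*", "/"):
            # The operand pushed last is the right-hand side
            right = stack.pop()
            left = stack.pop()
            if token == "+":
                stack.append(left + right)
            elif token == "-":
                stack.append(left - right)
            elif token == "*":
                stack.append(left * right)
            else:
                # Assumption: integer division truncating toward zero
                stack.append(int(left / right))
        else:
            stack.append(int(token))
    return stack.pop()
```

For example, the tokens ["2", "1", "+", "3", "*"] stand for (2 + 1) * 3.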

Monotonic stack

Queue

Linked List

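Maximum Number of Overlapping Intervals

Problem: given several intervals, find the maximum number of intervals that overlap at any point. For example, for the input [[1, 2], [1, 5], [3, 4]], the output should be 2.
Core idea: put the endpoints of all intervals together and sweep through them in sorted order; each time a start point is passed, cnt += 1, and each time an end point is passed, cnt -= 1. The answer is the largest value cnt reaches.
Complexity: with n intervals, the time complexity is $O(n \log n)$ because the endpoints are sorted, and the space complexity is $O(n)$ for the endpoint list.

Code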

def max_overlap_cnt(intervals: list) -> int:
    interval_indexs = []
    for each_interval in intervals:
        interval_indexs.append((each_interval[0], 1))
        interval_indexs.append((each_interval[1], -1))
    # When item[0] is equal, sort() falls back to item[1], so at the same
    # coordinate an end (-1) is processed before a start (+1).
    interval_indexs.sort()
    max_cnt, cnt = 0, 0
    for index_pair in interval_indexs:
        cnt += index_pair[1]
        max_cnt = max(max_cnt, cnt)
    return max_cnt

String

Cantor’s Diagonal

How can we construct a string that does not occur in a given set but has the same length as its members? For example, given a list of n strings, each of length n, find a string of length n that does not appear in the list.
Use Cantor's diagonal argument: build a string of length n whose 1st character differs from the 1st character of the 1st given string, whose 2nd character differs from the 2nd character of the 2nd string, and so on. After going through the whole list, the result differs from every given string in at least one position.
Related LeetCode problem: 1980. Find Unique Binary String
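
The diagonal construction can be sketched in a few lines. This is a minimal illustration, assuming the input is a list of n binary strings of length n (as in the LeetCode problem); the function name find_missing_binary_string is hypothetical.

```python
def find_missing_binary_string(strings: list) -> str:
    """Return a binary string of length n that differs from every given string.

    Assumes `strings` holds n binary strings, each of length n.
    """
    # Flip the i-th character of the i-th string, so the result differs
    # from string i at position i.
    return "".join("1" if s[i] == "0" else "0" for i, s in enumerate(strings))
```

The result is one valid answer, not necessarily the only one.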

KMP algorithm (TODO)

Trie Tree

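Each node has 2 attributes, children and is_end. children is a dict with characters as keys and child nodes as values; is_end marks whether a word ends at this node.

Usually it has 2 core functions, insert and search; the code below also implements startsWith for prefix queries.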

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class Trie:

    def __init__(self):
        self.root = TrieNode()
        

    def insert(self, word: str) -> None:
        node = self.root
        for each_char in word:
            if each_char not in node.children:
                node.children[each_char] = TrieNode()
            node = node.children[each_char]
        node.is_end = True
        

    def search(self, word: str) -> bool:
        node = self.root
        for each_char in word:
            if each_char not in node.children:
                return False
            node = node.children[each_char]
        return node.is_end
        

    def startsWith(self, prefix: str) -> bool:
        node = self.root
        for each_char in prefix:
            if each_char not in node.children:
                return False
            node = node.children[each_char]
        return True
        


# Your Trie object will be instantiated and called as such:
# obj = Trie()
# obj.insert(word)
# param_2 = obj.search(word)
# param_3 = obj.startsWith(prefix)

Time complexity
Insert: $O(\text{length})$ , where length is the length of the inserted word
Search: $O(\text{length})$
Space complexity: $O(\text{length} \cdot n)$ in the worst case, where n is the number of words in the trie and length is the maximum word length. Words that share a prefix share nodes, so actual usage is often lower.

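Tree

Huffman Tree

Problem: a board of length N is to be cut into pieces of lengths A1, A2, ..., An. The cost of each cut is the length of the board before that cut. For example, cutting a board of length 7 into pieces of 3 and 4 costs 7. Find the minimum total cost.
Example: the input is the array of final piece lengths and the output is the cutting cost. For the input 3, 4, 5, 1, 2, the output should be 33. One optimal plan: cut 15 into 6 and 9 (cost 15), cut 6 into 3 and 3 (cost 6), cut one of the 3s into 1 and 2 (cost 3), and cut 9 into 4 and 5 (cost 9). The total cost is 15 + 6 + 3 + 9 = 33.
Core idea: the contribution of each final piece to the total cost depends on how many cuts it goes through. In the example, the piece of length 4 goes through 2 cuts, so it contributes 2 * 4 = 8; likewise the piece of length 5 contributes 2 * 5 = 10. Since the order of the cuts is unrestricted, the problem reduces to building a Huffman tree whose leaves are the final pieces. The minimum cost is the sum of (leaf value * leaf depth), which equals the sum of all internal node values.

Code
The code to build the Huffman tree is as follows: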

import heapq


class TreeNode:
    def __init__(self, x: int = None):
        self.val = x
        # Children start as None; calling TreeNode() here would recurse forever
        self.left = None
        self.right = None

    def __lt__(self, other):
        return self.val < other.val

def build_huffman_tree(vals: list) -> TreeNode:
    heap = []
    for each_val in vals:
        heapq.heappush(heap, TreeNode(each_val))
    while len(heap) > 1:
        node1, node2 = heapq.heappop(heap), heapq.heappop(heap)
        parent_node = TreeNode(node1.val + node2.val)
        parent_node.left = node1
        parent_node.right = node2
        heapq.heappush(heap, parent_node)
    return heap[0]
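
If only the minimum cutting cost is needed, the tree does not have to be materialized: summing the merged values while building bottom-up gives the answer directly. A minimal sketch under that assumption (the function name min_cut_cost is illustrative):

```python
import heapq


def min_cut_cost(lengths: list) -> int:
    """Minimum total cost to cut one board into the given piece lengths."""
    heap = list(lengths)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        # Merging the two shortest pieces is the reverse of one cut,
        # and the merged length is exactly the cost of that cut.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total
```

For the example above, min_cut_cost([3, 4, 5, 1, 2]) merges 1 + 2, 3 + 3, 4 + 5 and 6 + 9, for a total of 3 + 6 + 9 + 15 = 33.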

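Graph

Rule of thumb: use DP on acyclic graphs, Dijkstra when the graph has cycles but no negative edge weights, and Bellman-Ford when negative edge weights are present (it also detects negative cycles).
For shortest paths, BFS is enough on an unweighted graph; on a weighted graph we usually use Dijkstra.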

Dijkstra

Core: among all edges leaving the nodes seen so far, always follow the one that yields the lowest total path weight to an unseen node.

Because we always extend the lowest-weight path first and the graph has no negative edge weights, the first time a node is popped from the heap its path is guaranteed to be the shortest.

Time complexity: $O(E \log E)$ for the heap-based version below, since each edge can push one entry onto the heap; this is often written as $O((V + E) \log V)$ .
Space complexity: $O(V + E)$ for the distance table and the heap.

Code

import heapq


class Dijkstra:
    def __init__(self, graph: dict) -> None:
        """
        graph: {node1: {node2: weight12, node3: weight13, ...}, ...}
        All weights must be non-negative.
        """
        self.graph = graph

    def shortest_path_dijkstra(self, start_node: int, end_node: int) -> int:
        """
        Use Dijkstra to find the length of the shortest path from start_node
        to end_node. Return -1 if no path exists.
        """
        heap = [(0, start_node)]
        dist = {}  # node -> finalized shortest distance
        while heap:
            path, node = heapq.heappop(heap)
            if node in dist:
                # Already finalized through a path that is no longer
                continue
            dist[node] = path
            if node == end_node:
                return path
            # Nodes without outgoing edges may be missing from the graph dict
            for each_neighbor, n_path in self.graph.get(node, {}).items():
                if each_neighbor not in dist:
                    heapq.heappush(heap, (path + n_path, each_neighbor))
        return -1

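Bellman-Ford

Similar to BFS, but do not stop at the destination node: keep relaxing every edge until no path length changes. Without a negative cycle this takes at most V - 1 rounds, so the time complexity is $O(VE)$ .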
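
The idea of relaxing edges until nothing changes can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name bellman_ford, the edge-list format (u, v, weight) and the 0..num_nodes-1 node numbering are all assumptions made for the example.

```python
def bellman_ford(num_nodes: int, edges: list, start: int) -> list:
    """Shortest distances from start; edges is a list of (u, v, weight).

    Assumes nodes are numbered 0..num_nodes-1. Unreachable nodes keep
    distance inf. Raises ValueError if a negative cycle is reachable.
    """
    dist = [float("inf")] * num_nodes
    dist[start] = 0
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # paths no longer change, so we can stop early
    # One extra pass: any further improvement means a negative cycle
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from start")
    return dist
```

Unlike Dijkstra, this tolerates negative edge weights, at the price of scanning every edge in each round.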

Union-Find (Disjoint Set)

There are 2 main operations: union (merge two sets) and find (look up the root of a set).

In the find function, repoint every node on the search path directly to the ultimate root (path compression). This keeps the height of the tree from growing.

class UnionFindSet:
    def __init__(self, nums: list) -> None:
        self.parent = [i for i in nums]
        self.height = [1] * len(nums)

    def find(self, x: int) -> int:
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x: int, y: int) -> None:
        x, y = self.find(x), self.find(y)
        if x == y:
            return
        if self.height[x] < self.height[y]:
            self.parent[x] = y
        else:
            self.parent[y] = x
            if self.height[x] == self.height[y]:
                self.height[x] += 1

    def same(self, x: int, y: int) -> bool:
        return self.find(x) == self.find(y)

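Time complexity: $O(\log n)$ per operation with union by height alone, where n is the number of nodes, because a tree of n nodes built this way has height at most $\log n$ . Combined with path compression, the amortized cost per operation drops to nearly constant, $O(\alpha(n))$ .
Space complexity: $O(n)$

Related problem: Redundant Connection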

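Bit Manipulation

Enumerating Binary Subsets

A small trick for enumerating binary subsets. To enumerate all binary subsets of x:

Set sub = x, then repeat sub = (sub - 1) & x until sub = 0. Every value sub takes along the way is a binary subset of x.

Note: both x and 0 are subsets of x.

Definition of a binary subset: the 1 bits of the subset may only appear at positions where x has a 1 bit.
Code: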

sub = x
while sub != 0:
	sub = (sub - 1) & x

Each value of sub above is a subset of x. Do not miss the case sub = 0 after the loop exits: 0 is also a subset of x.
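
One way to avoid forgetting the final 0 is to wrap the loop in a generator that yields before it checks the exit condition. A minimal sketch (the name iter_submasks is illustrative):

```python
def iter_submasks(x: int):
    """Yield every binary subset of x, from x itself down to 0."""
    sub = x
    while True:
        yield sub
        if sub == 0:
            break
        # Drop the lowest set bit of sub, then restore the lower bits of x
        sub = (sub - 1) & x
```

For x = 0b101 this yields 5, 4, 1 and 0, in that order.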

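Math

Prime Sieve

Problem: given positive integers m and n, find all primes in the interval [m, n].

To solve this, use the sieve of Eratosthenes to precompute a table of all primes up to n.

Core idea: for every integer p with 2 <= p <= n, cross out 2p, 3p, 4p, ...; once all numbers have been processed, the numbers that were never crossed out are the primes.
Improvement: p only needs to range over the primes not exceeding n, and crossing out can start from p * p (smaller multiples such as 2 * p were already crossed out when handling smaller primes such as 2).
Complexity: time $O(n \log n)$ for the basic version, $O(n \log \log n)$ with the improvement.

Code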

# Get the list of primes in the range [m, n]
def get_primes(m: int, n: int) -> list:
    records = [True] * (n + 1)
    for i in range(2, n + 1):
        if not records[i]:
            continue
        for j in range(i * i, n + 1, i):
            records[j] = False
    # 0 and 1 are not prime, so start no lower than 2
    return [prime for prime in range(max(m, 2), n + 1) if records[prime]]

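Fast Exponentiation

Problem: compute b^e % m
Core idea: write the exponent in binary, e.g. $3^{13} = 3^{(1101)_2} = 3^{1 * 8} * 3^{1 * 4} * 3^{0 * 2} * 3^{1 * 1}$ . Compute $3^1, 3^2, 3^4, 3^8, \dots$ in turn by repeated squaring, and multiply the current power into the result wherever the corresponding binary digit is 1.

Complexity: to compute the $n$-th power, $O(\log_2 n)$ intermediate results are needed, so the time complexity is $O(\log_2 n)$

Code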

def fast_exp(b: int, e: int) -> int:
    if e < 0:
        b = 1 / b
        e *= -1
    result = 1
    while e != 0:
        if e & 1 == 1:
            result *= b
        e >>= 1
        b *= b
    return result


def fast_exp_mod(b: int, e: int, m: int) -> int:
    b %= m
    result = 1 % m  # covers m == 1, where every result is 0
    while e != 0:
        if (e & 1) == 1:
            result = (result * b) % m
        e >>= 1
        b = (b * b) % m
    return result

Get greatest common divisor (gcd)

Euclidean algorithm: given two positive integers a and b such that a > b, the common divisors of a and b are the same as the common divisors of a - b and b. Subtracting b repeatedly is the same as taking a % b.

Implementation: replace the larger number with larger % smaller, and keep doing this until one of them is 0; the other one is then the gcd.

def calc_gcd(num1: int, num2: int) -> int:
    while num1 > 0 and num2 > 0:
        num1, num2 = max(num1, num2), min(num1, num2)
        num1 = num1 % num2
    return max(num1, num2)  # one of them is 0 here; the other is the gcd

Reservoir Sampling

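Context: we have a stream of unknown size n, and we want to draw k samples without replacement, each item with equal probability.
Method: put the first k items into the reservoir. Then, for the i-th item (i > k), keep it with probability k/i, and if it is kept, let it replace a uniformly chosen item in the reservoir. Keep doing so until the end; every item then ends up in the reservoir with probability k/n (1/n when k = 1).
Proof
For the i-th item (i > k), the probability of being in the final reservoir is:
$\begin{aligned} p &= \underbrace{\frac{k}{i}}_{\text{probability of being chosen}}*\underbrace{(1-\frac{1}{i+1})*(1-\frac{1}{i+2})*\dots*(1-\frac{1}{n})}_{\text{probability of not being replaced}} \\ &=\frac{k}{i}*\frac{i}{i+1}*\frac{i+1}{i+2}*\dots*\frac{n-1}{n} \\ &= \frac{k}{n} \end{aligned}$
Here a later item j is kept with probability k/j and then evicts one particular slot with probability 1/k, so a given reservoir item is replaced at step j with probability 1/j. The same product, starting from j = k + 1, shows that the first k items also survive with probability k/n.

So every item ends up in the reservoir with probability k/n.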

Time complexity: $O(n)$
Space complexity: $O(k)$

Related LeetCode problem: 398. Random Pick Index
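
The method can be sketched as below. This is a minimal illustration under stated assumptions: the function name reservoir_sample and the optional rng parameter (a random.Random instance, useful for reproducible runs) are not from the original. Drawing j uniformly from [0, i) and replacing slot j only when j < k keeps the i-th item with probability k/i and picks the evicted slot uniformly.

```python
import random


def reservoir_sample(stream, k: int, rng=None) -> list:
    """Return k items sampled uniformly without replacement from an iterable."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            # Fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # j < k happens with probability k / i, and j is then
            # a uniformly chosen slot to replace
            j = rng.randrange(i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

If the stream has fewer than k items, the sketch simply returns all of them.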

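Dynamic Programming (DP)

0/1 Knapsack

Problem: there are n items with weights and values $w_i, v_i$ , and the knapsack can carry a total weight of W. Find the maximum total value that can be carried.
Core idea: let dp[i][j] be the maximum value obtainable from the first i items with capacity j. The state transition is:
$dp[i][j] = \begin{cases} dp[i - 1][j], & w[i] > j \\ \max(dp[i - 1][j], dp[i - 1][j - w[i]] + v[i]), & w[i] \leq j \end{cases}$
The first case means the i-th item weighs more than the capacity j, so it cannot be taken and the answer comes from the first i - 1 items.
The second case means the i-th item fits, so we decide whether to take it: if not, the value is the same as in the first case; if so, the capacity shrinks by w[i] and the value grows by v[i].
Complexity: time $O(nW)$ , space $O(nW)$

Code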

# input example: items = [(4, 8), (6, 10), (2, 6), (2, 3), (5, 7), (1, 2)], W = 12
# output result: 24
def knapsack_01(items: list, W: int) -> int:
    n = len(items)
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for row in range(1, len(dp)):
        for col in range(1, len(dp[0])):
            if items[row - 1][0] > col:
                dp[row][col] = dp[row - 1][col]
            else:
                dp[row][col] = max(dp[row - 1][col], dp[row - 1][col - items[row - 1][0]] + items[row - 1][1])
    return dp[-1][-1]

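Each row of the table depends only on the previous row, so the space complexity can be reduced further. Note that each cell depends on the same column and on columns to the left in the previous row, so the columns must be updated from right to left.

Complexity: space $O(W)$

Code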

# input example: items = [(4, 8), (6, 10), (2, 6), (2, 3), (5, 7), (1, 2)], W = 12
# output result: 24
def knapsack_01_space_optimized(items: list, W: int) -> int:
    n = len(items)
    dp = [0] * (W + 1)
    for row in range(n):
        for col in range(len(dp) - 1, -1, -1):
            if items[row][0] <= col:
                dp[col] = max(dp[col], dp[col - items[row][0]] + items[row][1])
    return dp[-1]

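Unbounded (Complete) Knapsack

Very similar to the 0/1 knapsack; the difference is that each item is available in unlimited quantity:
Problem: there are n items with weights and values $w_i, v_i$ , each available in unlimited quantity, and the knapsack can carry a total weight of W. Find the maximum total value that can be carried.
Core idea: let dp[i][j] be the maximum value obtainable from the first i items with capacity j. The state transition is:
$dp[i][j] = \begin{cases} dp[i - 1][j], & w[i] > j \\ \max(dp[i - 1][j], dp[i][j - w[i]] + v[i]), & w[i] \leq j \end{cases}$
The first case means the i-th item weighs more than the capacity j, so it cannot be taken and the answer comes from the first i - 1 items.
The second case means the i-th item fits, so we decide whether to take it. If not, the value is the same as in the first case. If so, because copies are unlimited, the value of taking k copies is derived from the value of taking k - 1 copies: the source state has already accounted for one copy and may still take more, so i stays the same, the capacity shrinks by w[i], and the value grows by v[i].
Complexity: time $O(nW)$ , space $O(nW)$

Code

# Unbounded knapsack; items is a List[tuple(int, int)] holding the weight and value of each item
# input example: items = [(3, 4), (4, 5), (2, 3)], W = 7
# output result: 10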
def knapsack_full(items: list, W: int) -> int:
    dp = [[0] * (W + 1) for _ in range(len(items) + 1)]
    for row in range(1, len(dp)):
        for col in range(1, len(dp[0])):
            dp[row][col] = max(dp[row - 1][col], dp[row][col - items[row - 1][0]] + items[row - 1][1] if col - items[row - 1][0] >= 0 else 0)
    return dp[-1][-1]

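Each cell depends on the same column in the previous row and on columns to the left in the same row, so the space complexity can again be reduced. Note that the columns must be updated from left to right here; drawing the table update by hand helps to see the direction.

Complexity: space $O(W)$

Code

# Space-optimized unbounded knapsack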
def knapsack_full_space_optimized(items: list, W: int) -> int:
    dp = [0] * (W + 1)
    for row in range(len(items)):
        for col in range(W + 1):
            dp[col] = max(dp[col], dp[col - items[row][0]] + items[row][1] if col - items[row][0] >= 0 else 0)
    return dp[-1]

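Greedy

The greedy idea is to pick the locally optimal choice at every step, in the expectation that the local optima add up to a global optimum; this has to be justified for each problem.

N People Crossing a Bridge

Problem: on a dark night, n children, numbered 0, 1, ..., n - 1, stand on one side of a bridge and need to cross it. The bridge is narrow, so at most two people may cross at a time. They have only one flashlight, so after each crossing someone has to bring the flashlight back. Child i takes T[i] to cross, and when two people cross together the crossing takes the time of the slower one. What is the minimum total time for all children to cross?
Core idea:
The time of each crossing is determined by the slowest person in it, so consider sending the 2 slowest people across first (2, because the bridge holds 2 people at a time), then solve recursively for the rest. Below, T is sorted in ascending order, so $T_0$ is the fastest and $T_{n - 1}$ is the slowest.

There are 2 ways to get the 2 slowest people across:

The fastest escorts the slowest, the fastest returns, the fastest escorts the second slowest, the fastest returns. Time: $T_{n - 1} + T_0 + T_{n - 2} + T_0$
The fastest and the second fastest cross, the fastest returns, the slowest and the second slowest cross, the second fastest returns. Time: $T_1 + T_0 + T_{n - 1} + T_1$

Pick the cheaper of the 2 plans, then recurse.
Complexity: time $O(N^2)$ (each recursive call copies the list by slicing), space (mainly the recursion stack depth) $O(N)$
Code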

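def minimum_time(times: list) -> int:
    # times must be sorted in ascending order before the first call
    if not times:
        return 0
    if len(times) == 1:
        return times[0]
    if len(times) == 2:
        return times[1]
    if len(times) == 3:
        return sum(times)
    # Send the two slowest across with the cheaper of the two plans, then recurse
    return min(2 * times[0] + times[-1] + times[-2], times[0] + 2 * times[1] + times[-1]) + minimum_time(times[:-2])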

Sorting

Insertion Sort

Iterate over the whole list; for each element, scan the already-sorted new list to find where to insert it.

Time complexity: $O(n^2)$
Space complexity: $O(n)$

from typing import List


class Solution:
    def sortArray(self, nums: List[int]) -> List[int]:
        sorted_nums = []
        for each_num in nums:
            i = 0
            while i < len(sorted_nums):
                if sorted_nums[i] < each_num:
                    i += 1
                else:
                    break
            sorted_nums.insert(i, each_num)
        return sorted_nums

Bubble Sort

Each pass bubbles the largest remaining element up to its final position.

Time complexity: $O(n^2)$
Space complexity: $O(1)$

from typing import List


class Solution:
    def sortArray(self, nums: List[int]) -> List[int]:
        for i in range(len(nums) - 1, -1, -1):
            for j in range(1, i + 1):
                if nums[j - 1] > nums[j]:
                    nums[j - 1], nums[j] = nums[j], nums[j - 1]
        return nums

Selection Sort

Each pass selects the minimum (or maximum) element of the unsorted part and puts it in its final position.

Time complexity: $O(n^2)$
Space complexity: $O(1)$

from typing import List


class Solution:
    def sortArray(self, nums: List[int]) -> List[int]:
        for i in range(len(nums)):
            # Start from the current element instead of a hard-coded upper bound
            cur_min, cur_min_index = nums[i], i
            for j in range(i, len(nums)):
                if nums[j] < cur_min:
                    cur_min = nums[j]
                    cur_min_index = j
            nums[i], nums[cur_min_index] = nums[cur_min_index], nums[i]
        return nums

Quick Sort

Each round picks a pivot, puts all smaller numbers on the left and all larger numbers on the right, then sorts both sides recursively.

Time complexity: $O(n \log n)$ on average; in the worst case it is $O(n^2)$ , when the pivot is always the largest/smallest element.

Space complexity: $O(n)$ . Key point: skip the pivot (and its duplicates) when adding elements into smaller and larger.

import random
from typing import List


class Solution:
    def sortArray(self, nums: List[int]) -> List[int]:
        if len(nums) < 2:
            return nums
        pivot = random.choice(nums)
        smaller_nums = [item for item in nums if item < pivot]
        larger_nums = [item for item in nums if item > pivot]
        equals = [item for item in nums if item == pivot]
        return self.sortArray(smaller_nums) + equals + self.sortArray(larger_nums)

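In-place version with three-way partitioning. Space complexity: $O(1)$ extra, apart from the recursion stack.

import random
from typing import List


class Solution: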
    def sortArray(self, nums: List[int]) -> List[int]:
        def helper(left: int, right: int) -> None:
            if left >= right:
                return
            pivot = nums[random.randint(left, right)]
            small_index, large_index = left, right
            p = left
            while p <= large_index:
                if nums[p] < pivot:
                    nums[small_index], nums[p] = nums[p], nums[small_index]
                    small_index += 1
                    p += 1
                elif nums[p] > pivot:
                    nums[large_index], nums[p] = nums[p], nums[large_index]
                    large_index -= 1
                else:
                    p += 1
            helper(left, small_index - 1)
            helper(large_index + 1, right)
        helper(0, len(nums) - 1)
        return nums

Merge Sort

Merge-sort the left half and the right half, then merge the two sorted lists.

Time complexity: $O(n \log n)$
Space complexity: $O(n)$

from typing import List


class Solution:
    def sortArray(self, nums: List[int]) -> List[int]:
        def merge_sorted_list(nums1: list, nums2: list) -> list:
            p1, p2 = 0, 0
            res = []
            while p1 < len(nums1) or p2 < len(nums2):
                if p1 < len(nums1) and p2 < len(nums2):
                    if nums1[p1] <= nums2[p2]:
                        res.append(nums1[p1])
                        p1 += 1
                    else:
                        res.append(nums2[p2])
                        p2 += 1
                elif p1 < len(nums1):
                    res += nums1[p1:]
                    break
                elif p2 < len(nums2):
                    res += nums2[p2:]
                    break
            return res
        if len(nums) <= 1:  # also covers the empty list, which would otherwise recurse forever
            return nums
        sorted1, sorted2 = self.sortArray(nums[:len(nums) // 2]), self.sortArray(nums[len(nums) // 2:])
        return merge_sorted_list(sorted1, sorted2)

Counting sort

Use an auxiliary count list to store the count of each element, then transform it into a prefix-sum list, so that each entry tells how many numbers are less than or equal to the current number. Use this number to decide the final position of each element.

Time complexity: $O(n + k)$ , where n is the number of elements and k is the size of the value range.
Space complexity: $O(n + k)$

from typing import List


class Solution:
    def sortArray(self, nums: List[int]) -> List[int]:
        # Values are assumed to lie in [-5*10^4, 5*10^4]
        count_list = [0] * 100001
        offset = 50000
        for each_num in nums:
            count_list[each_num + offset] += 1
        for i in range(1, len(count_list)):
            count_list[i] += count_list[i - 1]
        res = [0] * len(nums)
        for each_num in nums:
            position = count_list[each_num + offset] - 1
            res[position] = each_num
            count_list[each_num + offset] -= 1
        return res

Binary Search

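Each step discards half of the remaining search range, so the time complexity is $O(\log n)$ . The list must be sorted.

def binary_search(nums: list, target: int) -> int:
    # Returns the index of the first element >= target
    # (or len(nums) - 1 if every element is smaller than target).
    left, right = 0, len(nums) - 1
    while left < right:
        mid = (left + right) >> 1
        if nums[mid] < target:
            left = mid + 1
        else:
            right = mid
    return left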
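
For comparison, Python's standard library offers the same kind of lower-bound search in the bisect module. A short usage sketch, with an example list chosen purely for illustration:

```python
import bisect

nums = [1, 3, 3, 5, 8]
# bisect_left returns the first index whose element is >= target
first_ge = bisect.bisect_left(nums, 3)
# If target is larger than every element, the result is len(nums)
past_end = bisect.bisect_left(nums, 9)
```

Note the difference at the boundary: bisect_left can return len(nums), which signals that no element is greater than or equal to the target.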