Grokking Algorithms (《算法图解》): Reading Summary

kunn4938

Tags: algorithms, linked lists, python, data structures

Article link: https://blog.csdn.net/weixin_42367960/article/details/109152553

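Grokking Algorithms

An introduction to algorithms that reads as easily as a novel

This post collects my notes on the main points of the book. The content is what the book says, plus my own understanding and code implementations.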

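The famous physicist Richard Feynman is credited with the Feynman Algorithm, which goes as follows:

(1) Write down the problem;

(2) Think real hard;

(3) Write down the solution.

Of everything in the book, this passage left the deepest impression on me, haha! It is a reminder that if you don't give up, and you settle down and think seriously, the effort pays off. I have now more or less finished the book, so I am writing up these notes for future reference.

The book has eleven chapters. The first three are fairly basic, and the core material is in Chapters 4 to 9. Below is a summary of the main content of each chapter, with examples: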

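Chapter 1: Introduction to Algorithms

Search algorithm: binary search. The input is a sorted list of elements (an array works too; the point is that it must be sorted). Each step checks the middle of the remaining range. If the item is present, its position is returned; otherwise null is returned (None in Python).
Running time: Big O notation describes the running time of an algorithm. Note that Big O is not a speed measured in seconds; it describes how fast the running time grows as the number of elements (the input size) grows.

Common Big O running times:

Running time	Explanation
O(log n)	Logarithmic time, e.g. binary search
O(n)	Linear time, e.g. simple search
O(n * log n)	E.g. quicksort (average case), one of the faster sorting algorithms
O(n²)	E.g. selection sort, a slow sorting algorithm
O(n!)	E.g. the brute-force solution to the traveling salesperson problem, a very slow algorithm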
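To make the first two rows of the table concrete, here is a small sketch that compares the worst-case number of checks for simple search and binary search. The function names are my own, not from the book, and the binary search count uses the standard bound floor(log2 n) + 1.

```python
import math


def simple_search_steps(n):
    """Worst-case number of checks for simple (linear) search over n items."""
    return n


def binary_search_steps(n):
    """Worst-case number of checks for binary search over n sorted items."""
    if n <= 0:
        return 0
    return math.floor(math.log2(n)) + 1


for n in (100, 1_000_000, 4_000_000_000):
    print(n, simple_search_steps(n), binary_search_steps(n))
```

For 100 items binary search needs at most 7 checks, and for 4 billion items at most 32, while simple search may have to look at every item.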

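Binary search is much faster than simple search, and the Big O running times quoted here describe the worst case. The following uses binary search to find the position of a given item in a list: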

def binary_search(list_data, item):
    begin = 0
    last = len(list_data) - 1

    while begin <= last:
        mid = (begin + last) // 2
        mid_value = list_data[mid]
        if mid_value == item:
            return mid

        elif mid_value > item:
            last = mid - 1

        else:
            begin = mid + 1
    return None


if __name__ == '__main__':
    list_data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(binary_search(list_data, 5))

The loop keeps updating the left and right endpoints until the item is found or the search range is empty.
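As an alternative to the hand-written loop, the same lookup can be sketched with the `bisect` module from Python's standard library. The wrapper name `binary_search_bisect` is my own; `bisect_left` returns the leftmost position where the item could be inserted, so the wrapper still has to check that the item is really there.

```python
from bisect import bisect_left


def binary_search_bisect(sorted_items, item):
    """Return the index of item in sorted_items, or None if it is absent."""
    index = bisect_left(sorted_items, item)
    if index < len(sorted_items) and sorted_items[index] == item:
        return index
    return None


print(binary_search_bisect([1, 2, 3, 4, 5, 6, 7, 8, 9], 5))   # 4
print(binary_search_bisect([1, 2, 3, 4, 5, 6, 7, 8, 9], 10))  # None
```

Unlike the loop version, this returns the leftmost match when the list contains duplicates.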

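Chapter 2: Selection Sort

Basic data structures: arrays and linked lists
Sorting algorithm: selection sort

An array is stored contiguously in memory, so reading an element by index is fast. Because the storage is contiguous, inserting or deleting may require shifting elements, or moving the whole array to a new block of memory when there is no room to grow. In a linked list, every element stores the address of the next element, so the elements do not have to sit next to each other. A linked list has to be read in order, so jumping to an arbitrary element is slow. On the other hand, inserting or deleting an element is simpler: only the address stored in the preceding element has to change.

	Array	Linked list
Read	O(1)	O(n)
Insert	O(n)	O(1)
Delete	O(n)	O(1)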
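The linked-list column of the table can be illustrated with a minimal singly linked list. This is only a sketch: the `Node` class and the helper names are my own, and the O(1) insert assumes you already hold a reference to the node after which you want to insert.

```python
class Node:
    """One element of a singly linked list: a value plus a pointer to the next node."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def insert_after(node, value):
    """O(1): only the pointer stored in `node` has to change."""
    node.next = Node(value, node.next)
    return node.next


def read_at(head, index):
    """O(n): walk from the head one node at a time."""
    node = head
    for _ in range(index):
        node = node.next
    return node.value


def to_list(head):
    """Collect the values into a Python list for printing."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


head = Node(1, Node(2, Node(4)))
insert_after(head.next, 3)   # insert 3 after the node holding 2
print(to_list(head))         # [1, 2, 3, 4]
print(read_at(head, 2))      # 3
```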

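Selection sort is a neat algorithm, but it is not very fast: it runs in O(n²) time. The following uses selection sort to sort a list in descending order:

# Return the index of the largest value in the list
def find_largest(arr):
    largest = arr[0]
    largest_index = 0
    for i in range(1, len(arr)):
        if arr[i] > largest:
            largest = arr[i]
            largest_index = i
    return largest_index


def sort_arr(arr):
    remaining = list(arr)  # work on a copy so the caller's list is not emptied
    sorted_arr = []
    for _ in range(len(remaining)):
        largest_index = find_largest(remaining)
        sorted_arr.append(remaining.pop(largest_index))  # move the current maximum to the new list
    return sorted_arr


if __name__ == '__main__':
    test_arr = [1, 2, 4, 3, 5, 3, 6, 7, 9, 0, 3, 2]
    print(sort_arr(test_arr))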

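Chapter 3: Recursion

Recursion: an elegant way to solve problems, and one that people tend to either love or hate. A recursive function is a function that calls itself.

A recursive function has two parts:

Base case: the condition under which the function stops calling itself, which prevents an infinite loop.
Recursive case: the condition under which the function calls itself.

def print_number(i):
    print(i)
    if i <= 0:  # base case
        return
    else:  # recursive case
        print_number(i - 1)

if __name__ == '__main__':
    print_number(5)  # prints 5 4 3 2 1 0, one number per line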

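Call stack: a stack is an important programming concept and a basic data structure (LIFO, last in, first out; equivalently FILO, first in, last out). The computer internally uses a stack called the call stack to hold the variables of the functions that are currently executing. Recursive functions rely on the call stack; the deeper the call stack, the more memory it uses.

The two basic stack operations:

Push (add an item to the top)
Pop (remove the item at the top)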
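Here is a small sketch of both ideas. A Python list is used as a stack (`append` pushes, `pop` pops), and a recursive `factorial` prints an indented trace so the growth and unwinding of the call stack is visible. The `depth` parameter exists only for the illustration.

```python
# A Python list used as a stack
stack = []
stack.append("greet")    # push
stack.append("greet2")   # push
top = stack.pop()        # pop: removes and returns the most recently pushed item
print(top, stack)        # greet2 ['greet']


def factorial(n, depth=0):
    """Recursive factorial that prints one line per call and per return."""
    print("  " * depth + f"call factorial({n})")
    if n <= 1:  # base case
        result = 1
    else:       # recursive case: a new frame is pushed onto the call stack
        result = n * factorial(n - 1, depth + 1)
    print("  " * depth + f"return {result}")
    return result


print(factorial(4))  # 24
```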

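Chapter 4: Quicksort

Divide and conquer (D&C): a recursive approach to solving problems
Quicksort: an important D&C algorithm

Dividing a rectangular plot of land into squares. Base case: the length of one side is an integer multiple of the other. Recursive case: every recursive call must shrink the problem. For the current plot, mark off the largest squares that fit (squares whose side is the shorter side of the plot), then apply the same algorithm to the leftover rectangle until the base case is reached.
How the sum function works. Base case: when the list is empty or has a single element, the sum is 0 or that element. Recursive case: every call must move one step closer to the empty array. For example, sum([2, 4, 6]) can be rewritten as 2 + sum([4, 6]).

The two expressions are equivalent, but the second one passes a shorter list to sum, which shrinks the problem.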

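def get_sum(arr):
    if not arr:  # base case: an empty list sums to 0
        return 0
    return arr[0] + get_sum(arr[1:])  # recursive case: pass a shorter list


def count_items(arr):
    if not arr:  # base case: an empty list has 0 items
        return 0
    return 1 + count_items(arr[1:])  # recursive case


if __name__ == '__main__':
    test_arr = [1, 2, 3, 4, 5, 6, 7, 8]
    if not test_arr:
        print('No value!')
    else:
        print('Sum:', get_sum(test_arr))
        print('Number of items in the list:', count_items(test_arr))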
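The farm-plot problem described at the start of this chapter can be sketched with the same base case and recursive case. The function name is my own, and the sketch assumes both sides are positive integers; it is effectively Euclid's algorithm for the greatest common divisor.

```python
def largest_square_side(width, height):
    """Side of the largest square that tiles a width x height plot exactly."""
    long_side, short_side = max(width, height), min(width, height)
    if long_side % short_side == 0:  # base case: one side is a multiple of the other
        return short_side
    # recursive case: keep only the leftover strip, which is a smaller plot
    return largest_square_side(short_side, long_side % short_side)


print(largest_square_side(1680, 640))  # 80
```

For a 1680 x 640 plot the recursion goes through 640 x 400, 400 x 240, 240 x 160 and 160 x 80, where the base case applies.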

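Quicksort: a sorting algorithm that is much faster than selection sort. The qsort() function in the C standard library is named after it, although the C standard does not require any particular algorithm. The following sorts a list with quicksort:

# Sort a list with quicksort
def quick_sort(arr):
    if len(arr) < 2:
        return arr  # base case: lists with 0 or 1 elements are already sorted
    else:
        pivot = arr[0]  # choose the pivot
        less = [i for i in arr[1:] if i < pivot]
        greater = [i for i in arr[1:] if i >= pivot]
        return quick_sort(less) + [pivot] + quick_sort(greater)  # sorted left part + pivot + sorted right part


if __name__ == '__main__':
    test_arr = [3, 2, 5, 6, 1, 8, 7, 2, 4, 9, 0, 3, 7]
    print(quick_sort(test_arr))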

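Summary:

D&C breaks a problem down step by step. When D&C is applied to a list, the base case is probably an empty array or an array with a single element;
When implementing quicksort, choose the pivot at random. The average running time of quicksort is O(n * log n).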
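The advice about a random pivot can be sketched as follows. This is a variation on the list-comprehension version, not code from the book; `quick_sort_random` is my own name, and `random.randrange` picks the pivot position.

```python
import random


def quick_sort_random(arr):
    """Quicksort with a randomly chosen pivot; returns a sorted list."""
    if len(arr) < 2:
        return arr  # base case
    pivot_index = random.randrange(len(arr))
    pivot = arr[pivot_index]
    rest = arr[:pivot_index] + arr[pivot_index + 1:]  # everything except the pivot
    less = [x for x in rest if x < pivot]
    greater = [x for x in rest if x >= pivot]
    return quick_sort_random(less) + [pivot] + quick_sort_random(greater)


print(quick_sort_random([3, 2, 5, 6, 1, 8, 7, 2, 4, 9, 0, 3, 7]))
```

With a fixed first-element pivot, an already sorted input triggers the O(n²) worst case; a random pivot makes that behavior unlikely for any particular input.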

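Chapter 5: Hash Tables

Hash table: one of the most useful data structures. The chapter covers its internals: implementation, collisions, and hash functions. Hash tables are also known as hash maps, maps, dictionaries, and associative arrays.

A hash table consists of keys and values. In Python, a new hash table is created with {}:

phone_book = dict()  # equivalent to: phone_book = {}

A good hash function (the book suggests looking at SHA) spreads the values evenly across the slots, which keeps hash table operations at O(1) on average.

In practice, a hash table can be built by combining a hash function with an array;
A good hash function keeps collisions (several keys assigned to the same slot) to a minimum; the collisions that still occur are commonly handled by keeping a linked list in the slot;
Lookups, inserts, and deletes on a hash table are very fast: O(1) on average and O(n) in the worst case;
Once the load factor exceeds 0.7, the hash table should be resized;
Hash tables are useful for caching data (for example, on a web server) and for catching duplicates.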
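The points above can be put together in a toy hash table. This is only a sketch under my own assumptions: the class name and methods are hypothetical, Python's built-in `hash()` stands in for the hash function, each slot holds a Python list instead of a real linked list, and the table doubles in size once the load factor passes 0.7. In real code you would simply use `dict`.

```python
class ChainedHashTable:
    """Toy hash table: an array of buckets, collisions resolved by chaining."""

    def __init__(self, capacity=8, max_load=0.7):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0
        self.max_load = max_load

    def _bucket(self, key):
        # The hash function maps a key to a slot in the array
        return self.buckets[hash(key) % len(self.buckets)]

    def load_factor(self):
        return self.size / len(self.buckets)

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:      # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: append to the chain
        self.size += 1
        if self.load_factor() > self.max_load:
            self._resize()

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def _resize(self):
        # Double the number of slots and re-insert every pair
        old_items = [pair for bucket in self.buckets for pair in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for key, value in old_items:
            self._bucket(key).append((key, value))


table = ChainedHashTable()
for i in range(20):
    table.put(f"key{i}", i)
print(table.get("key7"), table.load_factor())
```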

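Chapter 6: Breadth-First Search

Graphs: a data structure for modeling networks
Breadth-first search (BFS): solves shortest-path problems
Queue: a first in, first out (FIFO) data structure

As an example, search your network for a person whose name ends in 'y' (the stand-in test for being a seller). A hash table holds the graph, a queue holds the people still to be checked, and breadth-first search finds the closest match: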

from collections import deque

# A hash table (dict) that stores the graph of relationships
graph = {}
graph["you"] = ["alice", "bob", "claire"]
graph["bob"] = ["anuj", "peggy"]
graph["claire"] = ["peggy"]
graph["alice"] = ["thom", "jonny"]
graph["anuj"] = []
graph["peggy"] = []
graph["thom"] = []
graph["jonny"] = []


def person_is_seller(name):
    # Stand-in test: a name ending in 'y' marks a seller
    return name[-1] == 'y'


def search(name):
    search_deque = deque()  # queue of people still to be checked
    search_deque += graph[name]
    searched = []  # people who have already been checked
    while search_deque:
        person = search_deque.popleft()  # take the person at the left end (FIFO)
        if person not in searched:
            if person_is_seller(person):
                print(person + ' is a seller')
                return True
            else:
                search_deque += graph[person]
                searched.append(person)
    return False  # the queue is empty and nobody matched


if __name__ == '__main__':
    search("you")

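Summary:

Breadth-first search tells you whether there is a path from A to B and, if there is, finds the shortest one (in an unweighted graph);
When facing a shortest-path problem, try modeling it as a graph and then solve it with breadth-first search;
Keep the elements that have already been checked in a list, otherwise the search may loop forever.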
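The search in this chapter only reports whether a match exists. A possible extension, sketched below under my own naming (`shortest_path`, `is_goal`), records each node's parent so the shortest path itself can be rebuilt. The graph is the same one used in this chapter.

```python
from collections import deque

graph = {
    "you": ["alice", "bob", "claire"],
    "bob": ["anuj", "peggy"],
    "claire": ["peggy"],
    "alice": ["thom", "jonny"],
    "anuj": [], "peggy": [], "thom": [], "jonny": [],
}


def shortest_path(graph, start, is_goal):
    """Return the shortest path from start to the nearest goal node, or None."""
    queue = deque([start])
    parents = {start: None}  # doubles as the record of visited nodes
    while queue:
        node = queue.popleft()
        if node != start and is_goal(node):
            path = []
            while node is not None:  # walk back through the parents
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


print(shortest_path(graph, "you", lambda name: name.endswith("y")))
# ['you', 'alice', 'jonny']
```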

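Chapter 7: Dijkstra's Algorithm

Weighted graphs: compared with the graphs in the previous chapter, every edge carries a weight (a cost)
Dijkstra's algorithm: finds the shortest path in a weighted graph
When Dijkstra's algorithm applies: graphs without negative-weight edges (the book restricts itself to directed acyclic graphs, but cycles are fine as long as no weight is negative)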

# Build the weighted graph
graph = {}
graph["start"] = {}
graph["start"]["a"] = 6
graph["start"]["b"] = 2
graph["a"] = {}
graph["a"]["fin"] = 1
graph["b"] = {}
graph["b"]["fin"] = 5
graph["b"]["a"] = 3
graph["fin"] = {}
processed = []

# Cost table, updated while the algorithm runs
infinity = float("inf")
costs = {}
costs["a"] = 6
costs["b"] = 2
costs["fin"] = infinity

# Parent table, updated while the algorithm runs
parents = {}
parents["a"] = "start"
parents["b"] = "start"
parents["fin"] = None


# Find the cheapest node that has not been processed yet
def find_low_cost_node(costs):
    low_cost = float("inf")
    low_cost_node = None
    for node in costs:
        cost = costs[node]
        if cost < low_cost and node not in processed:
            low_cost = cost
            low_cost_node = node
    print("Cheapest unprocessed node:", low_cost_node)
    return low_cost_node


# Update the costs and parents
def main():
    node = find_low_cost_node(costs)
    while node is not None:
        cost = costs[node]
        neighbors = graph[node]
        for n in neighbors.keys():
            new_cost = cost + neighbors[n]
            if costs[n] > new_cost:  # found a cheaper way to reach n
                costs[n] = new_cost
                parents[n] = node
        processed.append(node)
        node = find_low_cost_node(costs)
    print("Lowest cost to reach fin:", costs["fin"])


if __name__ == '__main__':
    main()

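Summary:

Breadth-first search finds the shortest path in an unweighted graph;
Dijkstra's algorithm finds the shortest path in a weighted graph;
When the graph has negative-weight edges, consider the Bellman-Ford algorithm instead.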
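The book only mentions Bellman-Ford by name, so the following is my own minimal sketch of it, not code from the book. The graph is a made-up example with one negative edge (`a` to `b`), chosen so that processing nodes greedily in order of cost, as Dijkstra's algorithm does, would settle on 7 for `fin` while the true lowest cost is 6.

```python
def bellman_ford(nodes, edges, source):
    """Lowest cost from source to every node; edges are (from, to, weight) tuples."""
    dist = {node: float("inf") for node in nodes}
    dist[source] = 0
    # Relax every edge up to len(nodes) - 1 times
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, weight in edges:
            if dist[u] + weight < dist[v]:
                dist[v] = dist[u] + weight
                changed = True
        if not changed:  # nothing improved, so the costs are final
            break
    # One more pass: any further improvement means a negative cycle
    for u, v, weight in edges:
        if dist[u] + weight < dist[v]:
            raise ValueError("graph contains a negative-weight cycle")
    return dist


# Hypothetical graph with a negative edge from a to b
nodes = ["start", "a", "b", "fin"]
edges = [
    ("start", "a", 4),
    ("start", "b", 2),
    ("a", "b", -3),
    ("a", "fin", 7),
    ("b", "fin", 5),
]
print(bellman_ford(nodes, edges, "start"))
# {'start': 0, 'a': 4, 'b': 1, 'fin': 6}
```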

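Chapter 8: Greedy Algorithms

NP-complete problems: NP stands for nondeterministic polynomial time; no fast (polynomial-time) exact algorithm is known for these problems
Approximation algorithms: find approximate solutions to NP-complete problems
Greedy strategy: a very simple problem-solving strategy

A greedy algorithm takes the locally optimal choice at every step, hoping to end up with the globally optimal solution. That is not guaranteed in general, but the result is often close enough. Take the problem of covering all the states with as few radio stations as possible: pick the station that covers the most states not yet covered, and repeat until every state is covered: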

states_needed = {"mt", "wa", "or", "id", "nv", "ut", "ca", "az"}  # a set literal
# set(["id", "nv", "ut"]) is equivalent to {"id", "nv", "ut"}

stations = {}
stations["kone"] = {"id", "nv", "ut"}
stations["ktwo"] = {"wa", "id", "mt"}
stations["kthree"] = {"or", "nv", "ca"}
stations["kfour"] = {"nv", "ut"}
stations["kfive"] = {"ca", "az"}

final_stations = set()

# Repeatedly pick the station that covers the most states not yet covered
while states_needed:
    best_station = None
    states_covered = set()
    for station, states in stations.items():  # iterate over the key-value pairs
        covered = states_needed & states  # set intersection
        if len(covered) > len(states_covered):
            best_station = station
            states_covered = covered
    if best_station is None:  # no station covers the remaining states; avoid looping forever
        break
    states_needed -= states_covered
    final_stations.add(best_station)
print(final_stations)  # the stations kone, ktwo, kthree and kfive (set order may vary)

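Summary:

A greedy algorithm picks the locally optimal choice at each step and tries to reach the globally optimal solution that way; when it falls short of the optimum, it serves as an approximation algorithm;
No fast solution is known for NP-complete problems, so the usual approach when facing one is an approximation algorithm.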
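For comparison, here is a sketch of the exact, brute-force approach to the same station data: try every combination of stations, smallest first, until one covers all the states. The function name `exact_cover` is my own. With n stations there are 2^n subsets to try, which is why this only works for tiny inputs and why the greedy approximation is used in practice.

```python
from itertools import combinations

states_needed = {"mt", "wa", "or", "id", "nv", "ut", "ca", "az"}
stations = {
    "kone": {"id", "nv", "ut"},
    "ktwo": {"wa", "id", "mt"},
    "kthree": {"or", "nv", "ca"},
    "kfour": {"nv", "ut"},
    "kfive": {"ca", "az"},
}


def exact_cover(states_needed, stations):
    """Smallest set of stations covering every state, found by exhaustive search."""
    names = list(stations)
    for size in range(1, len(names) + 1):          # smallest combinations first
        for combo in combinations(names, size):
            covered = set().union(*(stations[name] for name in combo))
            if states_needed <= covered:           # every needed state is covered
                return set(combo)
    return None                                    # no combination covers everything


print(exact_cover(states_needed, stations))
```

On this small data set the exact answer uses four stations, the same number the greedy algorithm finds.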

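Chapter 9: Dynamic Programming

Dynamic programming: tackles hard problems by breaking a big problem into smaller subproblems

As an example, take two words and fill in the grids for their longest common substring (matching characters must be contiguous) and their longest common subsequence (matching characters keep their order but need not be contiguous):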

import numpy as np

word1 = 'fish'
word2 = 'fosh'

row_length = len(word1)
column_length = len(word2)
substring_grid = np.zeros((row_length, column_length), dtype=int)
subsequence_grid = np.zeros((row_length, column_length), dtype=int)


def cell(grid, row, col):
    # Cells outside the grid count as 0 (a negative index would otherwise wrap around)
    return grid[row][col] if row >= 0 and col >= 0 else 0


# Longest common substring: matching characters must be contiguous
def find_max_samechars():
    for index1, item1 in enumerate(word1):
        for index2, item2 in enumerate(word2):
            if item1 == item2:
                substring_grid[index1][index2] = cell(substring_grid, index1 - 1, index2 - 1) + 1
            else:
                substring_grid[index1][index2] = 0
    print('Longest common substring grid:')
    print(substring_grid)


# Longest common subsequence: every match counts, gaps are allowed
def find_all_samechars():
    for index1, item1 in enumerate(word1):
        for index2, item2 in enumerate(word2):
            if item1 == item2:
                subsequence_grid[index1][index2] = cell(subsequence_grid, index1 - 1, index2 - 1) + 1
            else:
                subsequence_grid[index1][index2] = max(cell(subsequence_grid, index1 - 1, index2),
                                                       cell(subsequence_grid, index1, index2 - 1))
    print('Longest common subsequence grid:')
    print(subsequence_grid)


if __name__ == '__main__':
    find_max_samechars()
    find_all_samechars()
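The grids give the lengths; reading the actual answers back out of them can be sketched as follows. These helpers are my own (not from the book) and use plain lists with one extra row and column of zeros, which avoids special handling at the edges of the grid.

```python
def longest_common_substring(a, b):
    """Return the longest run of characters that appears contiguously in both a and b."""
    # grid[i][j] = length of the common substring ending at a[i-1] and b[j-1]
    grid = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best_len, best_end = 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                grid[i][j] = grid[i - 1][j - 1] + 1
                if grid[i][j] > best_len:
                    best_len, best_end = grid[i][j], i
    return a[best_end - best_len:best_end]


def longest_common_subsequence(a, b):
    """Return one longest sequence of characters appearing in order in both a and b."""
    grid = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                grid[i][j] = grid[i - 1][j - 1] + 1
            else:
                grid[i][j] = max(grid[i - 1][j], grid[i][j - 1])
    # Walk back from the bottom-right corner to recover the characters
    chars = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            chars.append(a[i - 1])
            i, j = i - 1, j - 1
        elif grid[i - 1][j] >= grid[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(chars))


print(longest_common_substring("fish", "fosh"))    # sh
print(longest_common_subsequence("fish", "fosh"))  # fsh
```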