Sorting and Searching (2)

MrUncle德鲁

Category: Data Structures and Algorithms (Python)  Tags: searching and sorting

Article link: https://blog.csdn.net/FANGLICHAOLIUJIE/article/details/89429741

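Searching

1. Sequential search

Unordered list
Takes a list and the item to search for, and returns a Boolean indicating whether the item was found.
The time complexity is O(n).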

def sequentialSearch(alist, item):
    pos = 0
    found = False
    while pos < len(alist) and not found:
        if alist[pos] == item:
            found = True
        else:
            pos += 1
    return found

alist = [1,2,3,4,6,7,97,4,3,23,22]
sequentialSearch(alist,0)

False
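
The same scan can be written more compactly with a for loop. The sketch below is only an illustration; the name sequential_search is made up here and is not part of the original post. For a list, Python's built-in in operator performs the same left-to-right scan.

```python
def sequential_search(items, target):
    """Return True if target occurs in items, scanning left to right."""
    for value in items:
        if value == target:
            return True   # stop at the first match
    return False          # reached the end without a match


alist = [1, 2, 3, 4, 6, 7, 97, 4, 3, 23, 22]
print(sequential_search(alist, 0))    # False
print(sequential_search(alist, 97))   # True
print(97 in alist)                    # True
```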

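Ordered list
If the list is sorted in ascending order, the search can stop early: once an item larger than the target is seen, the target cannot appear later in the list.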

def orderdSequentialSearch(alist, item):
    pos = 0
    found = False
    stop = False
    while pos < len(alist) and not found and not stop:
        if alist[pos] == item:
            found = True
        else:
            if alist[pos] < item:
                pos += 1
            else:
                stop = True
    return found

alist = [1,3,4,5,7,65,90]
orderdSequentialSearch(alist,90)

True

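2. Binary search

Binary search starts at the middle of the list, and each comparison halves the range still to be searched.
It requires the list being searched to be sorted.
O(log n)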

def binarySearch(alist, item):
    first = 0
    last = len(alist)-1 
    found = False
    while not found and first <= last:  # note the boundary condition: with < instead of <= the result can be wrong
        mid = (first + last) // 2
        if alist[mid] == item:
            found = True
        else:
            if alist[mid] > item:
                last = mid - 1
            else:
                first = mid + 1
    return found

alist = [0,1,2,8,13,18,32,43]
print(binarySearch(alist, 2))
print(binarySearch(alist, 43))
print(binarySearch(alist, 20))

True
True
False

Binary search implemented recursively

def binarySearch(alist, item):
    if len(alist) == 0:
        return False
    else:
        midpoint = len(alist) // 2
        if alist[midpoint] == item:
            return True
        else:
            if alist[midpoint] > item:              
                return binarySearch(alist[:midpoint], item)  # no need to subtract 1: a slice excludes its right endpoint
            else:
                return binarySearch(alist[midpoint+1: ], item)

alist = [0,1,2,8,13,18,32,43]
print(binarySearch(alist, 2))
print(binarySearch(alist, 43))
print(binarySearch(alist, 20))

True
True
False
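
The same membership test can also be sketched with the standard library's bisect module, which locates the insertion point in a sorted list without copying it. This is an alternative shown for comparison, not the method used in the post; the function name binary_search_bisect is made up for illustration.

```python
from bisect import bisect_left


def binary_search_bisect(sorted_items, target):
    """Return True if target is in sorted_items (which must be sorted ascending)."""
    # bisect_left returns the leftmost index where target could be inserted
    # while keeping the list sorted.
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


alist = [0, 1, 2, 8, 13, 18, 32, 43]
print(binary_search_bisect(alist, 2))    # True
print(binary_search_bisect(alist, 43))   # True
print(binary_search_bisect(alist, 20))   # False
```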

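3. Hash search

Each position in a hash table, usually called a slot, can hold one item and is named by an integer starting from 0.
The mapping between an item and the slot where it belongs in the hash table is called the hash function.
Load factor: number of items stored / table size.
Collision (or clash): two items have the same hash value and therefore compete for the same slot.
Perfect hash function: given a collection of items, a hash function that maps each item to a unique slot.
There are many kinds of hash function; common ones include the simple remainder method, the folding method, and the mid-square method.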

def hash(astring, tablesize):
    sum = 0
    for pos in range(len(astring)):
        sum += ord(astring[pos])
    return sum % tablesize

print(hash('cat', 11))
print(hash('act', 11))

4
4

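The function above has a weakness: anagrams such as 'cat' and 'act' always hash to the same value. As an improvement, weight each character by its position in the string.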

def hash(astring, tablesize):
    sum = 0
    for pos in range(len(astring)):
        sum +=  (pos + 1) * ord(astring[pos])
    return sum % tablesize

print(hash('cat', 11))
print(hash('act', 11))

3
5
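
The folding and mid-square methods mentioned earlier can be sketched in the same style. The function names, the two-digit piece length, and the choice of which digits count as the "middle" are assumptions made for this illustration, not definitions taken from the post.

```python
def folding_hash(digits, tablesize, piece_len=2):
    """Folding method: split a digit string into pieces, add them, take the remainder."""
    pieces = [int(digits[i:i + piece_len])
              for i in range(0, len(digits), piece_len)]
    return sum(pieces) % tablesize


def mid_square_hash(key, tablesize):
    """Mid-square method: square the key, take the middle digits, take the remainder."""
    squared = str(key * key)
    mid = len(squared) // 2
    middle = squared[max(mid - 1, 0):mid + 1]   # up to two digits around the centre
    return int(middle) % tablesize


# 43 + 65 + 55 + 46 + 01 = 210, and 210 % 11 = 1
print(folding_hash("4365554601", 11))   # 1
# 44 * 44 = 1936, middle digits 93, and 93 % 11 = 5
print(mid_square_hash(44, 11))          # 5
```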

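Collision resolution: when two items have the same hash value, there must be a systematic way to place the second item in the hash table. This process is called collision resolution.
Method 1: open addressing with linear probing
- Definition: if an item's hash value collides, search the table slot by slot starting from that hash value until an empty slot is found.
- Drawback: a tendency toward clustering; items bunch together in parts of the table instead of being spread evenly.
Method 2: extended linear probing
- Instead of searching slot by slot, skip a fixed number of slots each time. The skip size is chosen by hand, but it must allow every slot in the table to be visited; otherwise part of the table will never be used. To ensure this, the table size is usually chosen to be a prime number.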
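
To see why a prime table size helps with method 2, the sketch below lists the slots a probe visits before it returns to its starting slot. The helper probe_sequence is hypothetical and exists only for this illustration.

```python
def probe_sequence(start, skip, tablesize):
    """Return the slots visited from start, stepping by skip, until the walk returns to start."""
    visited = []
    slot = start
    while True:
        visited.append(slot)
        slot = (slot + skip) % tablesize
        if slot == start:
            return visited


# Prime table size: a skip of 3 reaches all 11 slots.
print(probe_sequence(0, 3, 11))   # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8]
# Table size 12 shares the factor 3 with the skip: only 4 slots are ever probed.
print(probe_sequence(0, 3, 12))   # [0, 3, 6, 9]
```

With a table size of 12 and a skip of 3, two thirds of the table can never be reached from slot 0, which is the unused region the text warns about.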

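Implementing the Map abstract data type

Map() creates a new, empty map.
put(key, val) adds a new key-value pair. If the key already exists, the new value replaces the old one.
get(key) returns the value stored for the given key.
del deletes a key-value pair with a statement of the form del map[key].
len() returns the number of key-value pairs.
in tests whether a given key is in the map, as in key in map.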

class HashTable:
    def __init__(self):
        self.size = 11
        self.slots = [None] * self.size
        self.data = [None] * self.size
    def put(self, key, data):
        hashvalue = self.hashfunction(key,len(self.slots))
        if self.slots[hashvalue] == None:
            self.slots[hashvalue] = key
            self.data[hashvalue] = data
        else:
            nextslot = self.rehash(hashvalue, len(self.slots))
            while self.slots[nextslot] != None and self.slots[nextslot] != key:
                nextslot = self.rehash(nextslot,len(self.slots))
            if self.slots[nextslot] == None:
                self.slots[nextslot] = key
                self.data[nextslot] = data
            else:
                self.data[nextslot] = data
    def hashfunction(self, key,size):
        return key%size
    def rehash(self, oldhash, size):
        return (oldhash + 1) % size
    
    def get(self, key):
        startslot = self.hashfunction(key,len(self.slots))
        data = None
        stop = False
        found = False
        position = startslot
        while self.slots[position] != None and not found and not stop:
            if self.slots[position] == key:
                found = True 
                data = self.data[position]
            else:
                position = self.rehash(position,len(self.slots))
                if position == startslot:
                    stop = True
        return data
    
    def __getitem__(self,key):
        return self.get(key)
    def __setitem__(self, key, data):
        self.put(key,data)

H = HashTable()
H[54] = 'cat'
H[26] = 'dog'
H[93] = 'lion'
H[17] = 'tiger'
H[77] = 'bird'
H[31] = 'cow'
H[44] = 'goat'
H[55] = 'pig'
H[20] = 'chicken'
print(H.slots)
print(H.data)

[77, 44, 55, 20, 26, 93, 17, None, None, 31, 54]
['bird', 'goat', 'pig', 'chicken', 'dog', 'lion', 'tiger', None, None, 'cow', 'cat']

print(H[20])
print(H[17])

chicken
tiger

H[20] = 'duck'
print(H.data)

['bird', 'goat', 'pig', 'duck', 'dog', 'lion', 'tiger', None, None, 'cow', 'cat']

print(H[99])

None
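
The HashTable class above covers put and get but not the len() and in operations from the Map list, and its put would loop forever if the table were full and the key absent. The sketch below is one possible way to add these pieces; the class name SmallMap, the _find helper, and the choice to raise OverflowError on a full table are all assumptions made here. Deletion is left out because removing items under open addressing needs extra bookkeeping.

```python
class SmallMap:
    """Open-addressing map with linear probing; integer keys only, no deletion."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * size   # keys
        self.data = [None] * size    # values, parallel to slots
        self.count = 0               # number of stored key-value pairs

    def _find(self, key):
        """Return the slot holding key, else the first empty slot on its
        probe path, else None when the table is full."""
        start = key % self.size
        position = start
        while True:
            if self.slots[position] is None or self.slots[position] == key:
                return position
            position = (position + 1) % self.size
            if position == start:    # wrapped around: table is full
                return None

    def __setitem__(self, key, value):
        position = self._find(key)
        if position is None:
            raise OverflowError("hash table is full")
        if self.slots[position] is None:   # new key
            self.slots[position] = key
            self.count += 1
        self.data[position] = value        # insert or overwrite

    def __getitem__(self, key):
        position = self._find(key)
        if position is None or self.slots[position] is None:
            return None
        return self.data[position]

    def __len__(self):
        return self.count

    def __contains__(self, key):
        position = self._find(key)
        return position is not None and self.slots[position] is not None


m = SmallMap()
m[54] = 'cat'
m[26] = 'dog'
m[54] = 'lion'           # replaces the value, does not add a pair
print(len(m))            # 2
print(54 in m, 99 in m)  # True False
```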

Sorting

1. Bubble sort

O(n^2)

def bubbleSort(alist):
    for passnum in range(len(alist) - 1, 0 ,-1):
        for i in range(passnum):          
            if alist[i] > alist[i+1]:
                # swap the adjacent pair with tuple assignment
                alist[i], alist[i+1] = alist[i+1], alist[i]

alist = [54,26,93,17,77,31,44,55,20,87]
bubbleSort(alist)
print(alist)

[17, 20, 26, 31, 44, 54, 55, 77, 87, 93]

2. Short bubble sort

def shortBubbleSort(alist):
    exchanges = True
    passnum = len(alist) - 1
    while passnum > 0 and exchanges:
        exchanges = False
        for i in range(passnum):
            if alist[i] > alist[i+1]:
                exchanges = True
                temp = alist[i]
                alist[i] = alist[i+1]
                alist[i+1] = temp
        passnum -= 1

alist=[20,30,40,90,50,60,70,80,100,110]
shortBubbleSort(alist)
print(alist)

[20, 30, 40, 50, 60, 70, 80, 90, 100, 110]

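3. Selection sort

Each pass finds the largest value in the part of the list that is still unsorted and moves it to its correct position.
Compared with bubble sort, it makes the same number of comparisons but not the same number of exchanges: selection sort exchanges at most once per pass.
O(n^2)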

def selectionSort(alist):
    for fillslot in range(len(alist) - 1, 0, -1):
        positionOfMax = 0
        for location in range(1, fillslot + 1):  # fillslot + 1 so that the last unsorted element is also examined
            if alist[location] > alist[positionOfMax]:
                positionOfMax = location
        temp = alist[fillslot]
        alist[fillslot] = alist[positionOfMax]
        alist[positionOfMax] = temp

alist = [54,26,93,17,77,31,44,55,20]
selectionSort(alist)
print(alist)

[17, 20, 26, 31, 44, 54, 55, 77, 93]
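
The claim that bubble sort and selection sort compare equally often but exchange differently can be checked by counting. The two functions below are instrumented copies written for this illustration; their names and return format are assumptions, not part of the original listings.

```python
def bubble_sort_counts(items):
    """Bubble sort a copy; return (sorted list, comparisons, exchanges)."""
    items = list(items)
    comparisons = exchanges = 0
    for passnum in range(len(items) - 1, 0, -1):
        for i in range(passnum):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                exchanges += 1
    return items, comparisons, exchanges


def selection_sort_counts(items):
    """Selection sort a copy; return (sorted list, comparisons, exchanges)."""
    items = list(items)
    comparisons = exchanges = 0
    for fillslot in range(len(items) - 1, 0, -1):
        position_of_max = 0
        for location in range(1, fillslot + 1):
            comparisons += 1
            if items[location] > items[position_of_max]:
                position_of_max = location
        # exactly one exchange per pass
        items[fillslot], items[position_of_max] = items[position_of_max], items[fillslot]
        exchanges += 1
    return items, comparisons, exchanges


alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
print(bubble_sort_counts(alist)[1:])      # (36, 20)
print(selection_sort_counts(alist)[1:])   # (36, 8)
```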

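4. Insertion sort

It always maintains a sorted sublist in the lower positions of the list. Each new item is then "inserted" back into that sublist, so that the sorted sublist grows by one item.
O(n^2)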

def insertionSort(alist):
    for index in range(1,len(alist)):
        currentvalue = alist[index]
        position = index
        while position > 0 and alist[position-1] > currentvalue:
            alist[position] = alist[position-1]
            position = position - 1
        alist[position] = currentvalue

alist = [54,26,93,17,77,31,44,55,20]
insertionSort(alist)
print(alist)

[17, 20, 26, 31, 44, 54, 55, 77, 93]
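
The idea of inserting each new item into an already sorted sublist can also be sketched with insort from the standard library's bisect module. This variant builds a new list instead of sorting in place, and the function name is made up for illustration. It finds each position by binary search, but shifting items to make room still leaves the worst case at O(n^2).

```python
from bisect import insort


def insertion_sort_with_insort(items):
    """Return a new sorted list built by inserting each item into a sorted list."""
    result = []
    for value in items:
        insort(result, value)   # locate the position, then insert there
    return result


alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
print(insertion_sort_with_insort(alist))
# [17, 20, 26, 31, 44, 54, 55, 77, 93]
```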

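5. Shell sort

Shell sort (sometimes called the "diminishing increment sort") improves on insertion sort by breaking the original list into a number of smaller sublists, each of which is sorted with insertion sort.
The complexity of Shell sort depends on the increments chosen and falls between O(n) and O(n^2).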

def shellSort(alist):
    sublistcount = len(alist) // 2
    while sublistcount > 0:
        for startposition in range(sublistcount):
            gapInsertionSort(alist,startposition, sublistcount)
        print("after increments of size ", sublistcount, " The list is  ", alist)
        sublistcount = sublistcount // 2

def gapInsertionSort(alist,start, gap):
    for i in range(start + gap, len(alist), gap):
        currentvalue = alist[i]
        position = i
        while position >= gap and alist[position - gap] > currentvalue:
            alist[position] = alist[position-gap]
            position = position - gap
        alist[position] = currentvalue

alist = [54,26,93,17,77,31,44,55,20]
shellSort(alist)
print(alist)

after increments of size  4  The list is   [20, 26, 44, 17, 54, 31, 93, 55, 77]
after increments of size  2  The list is   [20, 17, 44, 26, 54, 31, 77, 55, 93]
after increments of size  1  The list is   [17, 20, 26, 31, 44, 54, 55, 77, 93]
[17, 20, 26, 31, 44, 54, 55, 77, 93]
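
Since the behaviour depends on the increments, the sketch below takes the increment sequence as a parameter. The listing above halves the gap each time; here the sequence 1, 4, 13, 40, ... (h = 3h + 1, often attributed to Knuth) is swapped in as one alternative. The function names are assumptions for this illustration, and the sketch returns a sorted copy instead of sorting in place.

```python
def knuth_gaps(n):
    """Return the increments 1, 4, 13, 40, ... that are below n, largest first."""
    gaps = []
    h = 1
    while h < n:
        gaps.append(h)
        h = 3 * h + 1
    return gaps[::-1]


def shell_sort_with_gaps(items, gaps):
    """Return a sorted copy of items; gaps must end with 1 for a full sort."""
    items = list(items)
    for gap in gaps:
        # gapped insertion sort over all sublists at once
        for i in range(gap, len(items)):
            current = items[i]
            position = i
            while position >= gap and items[position - gap] > current:
                items[position] = items[position - gap]
                position -= gap
            items[position] = current
    return items


alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
print(knuth_gaps(len(alist)))                          # [4, 1]
print(shell_sort_with_gaps(alist, knuth_gaps(len(alist))))
# [17, 20, 26, 31, 44, 54, 55, 77, 93]
```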

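6. Merge sort

Merge sort is a recursive algorithm that uses a divide-and-conquer strategy, repeatedly splitting the list in half. If the list is empty or has one item, it is sorted by definition (the base case). If the list has more than one item, we split it and recursively call merge sort on both halves. Once the two halves are sorted, the fundamental operation, called a merge, is performed. Merging is the process of taking two smaller sorted lists and combining them into a single new sorted list.
It consists of two main steps: splitting the list in half, and merging.
The complexity is O(n log n).
The mergeSort function needs extra space to hold the two halves, because they are extracted with slicing. If the list is large, this extra space can be a critical factor and can cause problems when working with large data sets.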

def mergeSort(alist):
    print("Splitting ", alist)
    if len(alist) > 1:
        mid = len(alist) // 2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]
        mergeSort(lefthalf)
        mergeSort(righthalf)
        i,j,k = 0,0,0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] < righthalf[j]:
                alist[k] = lefthalf[i]
                i = i+1
            else:
                alist[k] = righthalf[j]
                j = j+1
            k = k+1
            
        while i < len(lefthalf):
            alist[k] = lefthalf[i]
            i += 1
            k += 1
        
        while j < len(righthalf):
            alist[k] = righthalf[j]
            j += 1
            k += 1
    print("merging ", alist)

alist = [54,26,93,17,77,31,44,55,20]
mergeSort(alist)
print(alist)

Splitting  [54, 26, 93, 17, 77, 31, 44, 55, 20]
Splitting  [54, 26, 93, 17]
Splitting  [54, 26]
Splitting  [54]
merging  [54]
Splitting  [26]
merging  [26]
merging  [26, 54]
Splitting  [93, 17]
Splitting  [93]
merging  [93]
Splitting  [17]
merging  [17]
merging  [17, 93]
merging  [17, 26, 54, 93]
Splitting  [77, 31, 44, 55, 20]
Splitting  [77, 31]
Splitting  [77]
merging  [77]
Splitting  [31]
merging  [31]
merging  [31, 77]
Splitting  [44, 55, 20]
Splitting  [44]
merging  [44]
Splitting  [55, 20]
Splitting  [55]
merging  [55]
Splitting  [20]
merging  [20]
merging  [20, 55]
merging  [20, 44, 55]
merging  [20, 31, 44, 55, 77]
merging  [17, 20, 26, 31, 44, 54, 55, 77, 93]
[17, 20, 26, 31, 44, 54, 55, 77, 93]
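
To look at the merge step in isolation, it can be pulled out into its own function. The sketch below is a variant written for illustration, not the listing above: it returns new lists instead of overwriting alist, and it uses <= so that equal items keep their original order. The names merge and merge_sort are assumptions.

```python
def merge(left, right):
    """Combine two sorted lists into one new sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= takes from the left first on ties
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])           # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Return a sorted copy of items."""
    if len(items) <= 1:               # base case: already sorted
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge([17, 26, 54, 93], [20, 31, 44, 55, 77]))
# [17, 20, 26, 31, 44, 54, 55, 77, 93]
print(merge_sort([54, 26, 93, 17, 77, 31, 44, 55, 20]))
# [17, 20, 26, 31, 44, 54, 55, 77, 93]
```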

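7. Quick sort

Quick sort also uses a divide-and-conquer strategy, but it does not use additional storage.
Quick sort first selects a value, called the pivot value. Although there are many ways to choose the pivot value, we simply use the first item in the list. The role of the pivot value is to help split the list. The position where the pivot value belongs in the final sorted list, commonly called the split point, is used to divide the list for the subsequent calls to quick sort.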

def quickSort(alist):
    quickSortHelper(alist, 0, len(alist) - 1)
    
def quickSortHelper(alist, first, last):
    if first < last:
        splitpoint = partition(alist, first, last)
        quickSortHelper(alist, first, splitpoint - 1)
        quickSortHelper(alist,splitpoint + 1, last)
    
def partition(alist, first, last):
    pivotvalue = alist[first]
    
    leftmark = first + 1
    rightmark = last
    done = False
    while not done:
        while leftmark <= rightmark and alist[leftmark] < pivotvalue:
            leftmark += 1
        while alist[rightmark] >= pivotvalue and rightmark >= leftmark:
            rightmark -= 1
            
        if rightmark < leftmark:
            done = True
        else:
            temp = alist[leftmark]
            alist[leftmark] = alist[rightmark]
            alist[rightmark] = temp
    
    temp = alist[first]
    alist[first] = alist[rightmark]
    alist[rightmark] = temp
    
    return rightmark

alist = [54,26,93,17,77,31,44,55,20]
quickSort(alist)
print(alist)

[17, 20, 26, 31, 44, 54, 55, 77, 93]
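
Using the first item as the pivot gives lopsided splits on input that is already sorted. One commonly used alternative is to take the median of the first, middle, and last items. The sketch below swaps that choice into the same partition scheme; it is an illustration under that assumption, not the author's listing, and the function names are made up here.

```python
def median_of_three(items, first, last):
    """Return the index of the median of the first, middle, and last items."""
    mid = (first + last) // 2
    candidates = sorted([(items[first], first),
                         (items[mid], mid),
                         (items[last], last)])
    return candidates[1][1]


def partition(items, first, last):
    """Partition items[first..last] around a median-of-three pivot; return the split point."""
    pivot_index = median_of_three(items, first, last)
    # Move the pivot to the front so the rest matches the first-item scheme.
    items[first], items[pivot_index] = items[pivot_index], items[first]
    pivot = items[first]
    left, right = first + 1, last
    while True:
        while left <= right and items[left] < pivot:
            left += 1
        while right >= left and items[right] >= pivot:
            right -= 1
        if right < left:
            break
        items[left], items[right] = items[right], items[left]
    items[first], items[right] = items[right], items[first]
    return right


def quick_sort(items, first=0, last=None):
    """Sort items in place."""
    if last is None:
        last = len(items) - 1
    if first < last:
        split = partition(items, first, last)
        quick_sort(items, first, split - 1)
        quick_sort(items, split + 1, last)


alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
quick_sort(alist)
print(alist)   # [17, 20, 26, 31, 44, 54, 55, 77, 93]
```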

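Summary

Sequential search is O(n) for both ordered and unordered lists.
Binary search of an ordered list is O(log n) in the worst case.
Hash tables can provide constant-time search.
Bubble sort, selection sort, and insertion sort are O(n^2) algorithms.
Shell sort improves on insertion sort by sorting incremental sublists. It falls between O(n) and O(n^2).
Merge sort is O(n log n), but the merge process requires extra space.
Quick sort is O(n log n), but may degrade to O(n^2) if the split points are not near the middle of the list. It requires no extra space.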
