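Searching
1. Sequential search
The list is unordered. The function takes a list and the item to search for, and returns a Boolean indicating whether the item was found. The complexity is O(n).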
def sequentialSearch(alist, item):
    pos = 0
    found = False
    # Scan from the left until the item is found or the list is exhausted.
    while pos < len(alist) and not found:
        if alist[pos] == item:
            found = True
        else:
            pos += 1
    return found

alist = [1, 2, 3, 4, 6, 7, 97, 4, 3, 23, 22]
sequentialSearch(alist, 0)
False
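When the list is sorted in ascending order, the search can also stop as soon as it reaches an item larger than the target. The worst case is still O(n).
def orderdSequentialSearch(alist, item):
    pos = 0
    found = False
    stop = False
    while pos < len(alist) and not found and not stop:
        if alist[pos] == item:
            found = True
        else:
            if alist[pos] < item:
                pos += 1
            else:
                # Every remaining item is larger, so the target cannot appear.
                stop = True
    return found

alist = [1, 3, 4, 5, 7, 65, 90]
orderdSequentialSearch(alist, 90)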
True
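2. Binary search
Binary search starts at the middle item and halves the search range on every step, but it requires the list being searched to be sorted. The complexity is O(log n).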
def binarySearch(alist, item):
    first = 0
    last = len(alist) - 1
    found = False
    while not found and first <= last:
        mid = (first + last) // 2
        if alist[mid] == item:
            found = True
        else:
            if alist[mid] > item:
                last = mid - 1   # continue in the left half
            else:
                first = mid + 1  # continue in the right half
    return found

alist = [0, 1, 2, 8, 13, 18, 32, 43]
print(binarySearch(alist, 2))
print(binarySearch(alist, 43))
print(binarySearch(alist, 20))
True
True
False
# Recursive version: each call searches one half of the list.
def binarySearch(alist, item):
    if len(alist) == 0:
        return False
    else:
        midpoint = len(alist) // 2
        if alist[midpoint] == item:
            return True
        else:
            if alist[midpoint] > item:
                return binarySearch(alist[:midpoint], item)
            else:
                return binarySearch(alist[midpoint + 1:], item)

alist = [0, 1, 2, 8, 13, 18, 32, 43]
print(binarySearch(alist, 2))
print(binarySearch(alist, 43))
print(binarySearch(alist, 20))
True
True
False
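The recursive version above slices the list on every call, and each slice copies part of the list. As a rough sketch of an alternative, the standard library's bisect module can find the position of an item in a sorted list without copying. The helper name bisectSearch is made up for this example.

```python
from bisect import bisect_left

def bisectSearch(alist, item):
    """Return True if item is in the sorted list alist."""
    # bisect_left returns the leftmost index at which item could be
    # inserted while keeping alist sorted.
    pos = bisect_left(alist, item)
    return pos < len(alist) and alist[pos] == item

alist = [0, 1, 2, 8, 13, 18, 32, 43]
print(bisectSearch(alist, 2))
print(bisectSearch(alist, 43))
print(bisectSearch(alist, 20))
```

The three calls should agree with the two binarySearch versions above.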
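3. Hash search
Each position in a hash table, usually called a slot, can hold one item and is named by an integer starting from 0. The mapping between an item and the slot where it belongs in the hash table is called the hash function.
Load factor: the number of items already stored / the size of the table.
Collision (conflict): two items have the same hash value and would occupy the same slot.
Perfect hash function: given a collection of items, a hash function that maps each item to a unique slot.
There are many kinds of hash functions. Common ones include the simple remainder method, the folding method, and the mid-square method.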
# Note: this definition shadows Python's built-in hash().
def hash(astring, tablesize):
    total = 0
    # Add up the ordinal values of all characters.
    for pos in range(len(astring)):
        total += ord(astring[pos])
    return total % tablesize

print(hash('cat', 11))
print(hash('act', 11))
4
4
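The function above has a weakness: anagrams such as 'cat' and 'act' always get the same hash value. To improve it, use the position of each character as a weight.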
def hash(astring, tablesize):
    total = 0
    # Weight each character by its 1-based position in the string.
    for pos in range(len(astring)):
        total += (pos + 1) * ord(astring[pos])
    return total % tablesize

print(hash('cat', 11))
print(hash('act', 11))
3
5
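The folding method and the mid-square method named in the list of hash functions are not shown in code here. The following is a minimal sketch of both for non-negative integers. The function names, the two-digit group size, and the rule for picking the "middle" two digits are assumptions made for illustration, not a standard definition.

```python
def foldingHash(number, tablesize, groupsize=2):
    """Split the digits into groups, add the groups, take the remainder."""
    digits = str(number)
    total = 0
    for start in range(0, len(digits), groupsize):
        total += int(digits[start:start + groupsize])
    return total % tablesize

def midSquareHash(number, tablesize):
    """Square the number, take two digits near the middle, take the remainder."""
    squared = str(number * number)
    # Assumed convention: start one digit left of the midpoint.
    start = max(len(squared) // 2 - 1, 0)
    middle = int(squared[start:start + 2])
    return middle % tablesize

print(foldingHash(4365554601, 11))  # groups 43 + 65 + 55 + 46 + 01 = 210
print(midSquareHash(44, 11))        # 44 * 44 = 1936, middle digits 93
```

With a table of size 11 these calls give 210 % 11 = 1 and 93 % 11 = 5.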
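Collision resolution: when two items have the same hash value, there must be a systematic way to place the second item in the hash table. This process is called collision resolution.
Method 1: open addressing with linear probing
Definition: if an item's hash value collides, search the hash table sequentially, starting from that hash value, until an empty slot is found.
Drawback: it tends to produce clustering. Items pile up in one region of the table and are not evenly distributed.
Method 2: extended linear probing
Instead of visiting the slots one by one, skip a fixed number of slots each time. The skip size is chosen by hand, but it must guarantee that every slot in the hash table can eventually be visited. Otherwise, part of the table will never be used. To ensure this, the table size is usually set to a prime number.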
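Why a prime table size helps can be checked directly: repeated skipping reaches every slot exactly when the skip and the table size share no common factor, and a prime size guarantees that for any skip from 1 to size - 1. The helper below is a small illustration with a made-up name, probeSequence, not part of the hash table code in these notes.

```python
def probeSequence(start, skip, tablesize):
    """List the slots visited by stepping `skip` slots at a time from `start`."""
    visited = []
    slot = start
    while slot not in visited:
        visited.append(slot)
        slot = (slot + skip) % tablesize
    return visited

# Prime table size: a skip of 3 visits all 11 slots.
print(probeSequence(0, 3, 11))
# Size 10 with a skip of 2: only the even slots are ever visited.
print(probeSequence(0, 2, 10))
```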
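Implementing the map abstract data type
Map() creates a new, empty map.
put(key, val) adds a new key-value pair. If the key already exists, the old value is replaced with the new value.
get(key) returns the value stored for the given key.
del removes a key-value pair, using the form del map[key].
len() returns the number of key-value pairs.
in tests whether a given key is in the map.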
class HashTable:
    def __init__(self):
        self.size = 11
        self.slots = [None] * self.size  # keys
        self.data = [None] * self.size   # values, parallel to slots

    def put(self, key, data):
        hashvalue = self.hashfunction(key, len(self.slots))
        if self.slots[hashvalue] is None:
            self.slots[hashvalue] = key
            self.data[hashvalue] = data
        elif self.slots[hashvalue] == key:
            # The key already sits in its home slot: replace the value.
            # Without this check the key would be stored a second time.
            self.data[hashvalue] = data
        else:
            # Linear probing: step forward until an empty slot or the
            # same key is found (assumes the table is not full).
            nextslot = self.rehash(hashvalue, len(self.slots))
            while self.slots[nextslot] is not None and self.slots[nextslot] != key:
                nextslot = self.rehash(nextslot, len(self.slots))
            if self.slots[nextslot] is None:
                self.slots[nextslot] = key
                self.data[nextslot] = data
            else:
                self.data[nextslot] = data  # replace

    def hashfunction(self, key, size):
        return key % size

    def rehash(self, oldhash, size):
        return (oldhash + 1) % size

    def get(self, key):
        startslot = self.hashfunction(key, len(self.slots))
        data = None
        stop = False
        found = False
        position = startslot
        # Probe until the key is found, an empty slot is hit,
        # or the search wraps around to the starting slot.
        while self.slots[position] is not None and not found and not stop:
            if self.slots[position] == key:
                found = True
                data = self.data[position]
            else:
                position = self.rehash(position, len(self.slots))
                if position == startslot:
                    stop = True
        return data

    def __getitem__(self, key):
        return self.get(key)

    def __setitem__(self, key, data):
        self.put(key, data)
H = HashTable()
H[54] = 'cat'
H[26] = 'dog'
H[93] = 'lion'
H[17] = 'tiger'
H[77] = 'bird'
H[31] = 'cow'
H[44] = 'goat'
H[55] = 'pig'
H[20] = 'chicken'
print(H.slots)
print(H.data)
[77, 44, 55, 20, 26, 93, 17, None, None, 31, 54]
['bird', 'goat', 'pig', 'chicken', 'dog', 'lion', 'tiger', None, None, 'cow', 'cat']
print(H[20])
print(H[17])
chicken
tiger
H[20] = 'duck'
print(H.data)
['bird', 'goat', 'pig', 'duck', 'dog', 'lion', 'tiger', None, None, 'cow', 'cat']
print(H[99])
None
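The HashTable class above implements only put and get, while the map description also lists del, len and in. Deleting from an open-addressing table is subtle: simply emptying a slot would cut the probe chain for keys stored after it. One common workaround is to leave a "tombstone" marker behind. The class below is a self-contained sketch of that idea. The class name ProbingMap, the _DELETED marker and the _find helper are all made up for this example, and it assumes integer keys as in the demo above.

```python
class ProbingMap:
    """Hypothetical linear-probing map that supports del, len() and in."""

    _DELETED = object()  # tombstone: marks a slot whose key was removed

    def __init__(self, size=11):
        self.slots = [None] * size
        self.data = [None] * size
        self.count = 0

    def _find(self, key):
        """Return the index of the slot holding key, or None if absent."""
        size = len(self.slots)
        position = key % size
        for _ in range(size):
            if self.slots[position] is None:
                return None  # a never-used slot ends the probe chain
            if self.slots[position] is not self._DELETED and self.slots[position] == key:
                return position
            position = (position + 1) % size
        return None

    def __setitem__(self, key, value):
        position = self._find(key)
        if position is not None:  # existing key: replace the value
            self.data[position] = value
            return
        size = len(self.slots)
        position = key % size
        for _ in range(size):
            # Empty slots and tombstones can both be reused.
            if self.slots[position] is None or self.slots[position] is self._DELETED:
                self.slots[position] = key
                self.data[position] = value
                self.count += 1
                return
            position = (position + 1) % size
        raise OverflowError("hash table is full")

    def __getitem__(self, key):
        position = self._find(key)
        return None if position is None else self.data[position]

    def __delitem__(self, key):
        position = self._find(key)
        if position is None:
            raise KeyError(key)
        self.slots[position] = self._DELETED
        self.data[position] = None
        self.count -= 1

    def __len__(self):
        return self.count

    def __contains__(self, key):
        return self._find(key) is not None

m = ProbingMap()
m[77] = 'bird'
m[44] = 'goat'  # collides with 77 and lands in the next slot
m[55] = 'pig'   # collides twice
del m[44]
print(len(m), 44 in m, m[55])
```

After the deletion, 55 is still reachable because the probe walks past the tombstone instead of stopping there. Like get above, a lookup of a missing key returns None.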
Sorting
1. Bubble sort
def bubbleSort(alist):
    # After each pass the largest remaining item has bubbled to the end.
    for passnum in range(len(alist) - 1, 0, -1):
        for i in range(passnum):
            if alist[i] > alist[i + 1]:
                alist[i], alist[i + 1] = alist[i + 1], alist[i]

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20, 87]
bubbleSort(alist)
print(alist)
[17, 20, 26, 31, 44, 54, 55, 77, 87, 93]
2. Short bubble sort
def shortBubbleSort(alist):
    exchanges = True
    passnum = len(alist) - 1
    # Stop early once a full pass makes no exchange: the list is sorted.
    while passnum > 0 and exchanges:
        exchanges = False
        for i in range(passnum):
            if alist[i] > alist[i + 1]:
                exchanges = True
                alist[i], alist[i + 1] = alist[i + 1], alist[i]
        passnum -= 1

alist = [20, 30, 40, 90, 50, 60, 70, 80, 100, 110]
shortBubbleSort(alist)
print(alist)
[20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
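3. Selection sort
On each pass, find the largest value in the part of the list that is still unsorted and put it in its correct position. Compared with bubble sort, it makes the same number of comparisons but not the same number of exchanges: only one exchange per pass. O(n^2)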
def selectionSort(alist):
    for fillslot in range(len(alist) - 1, 0, -1):
        positionOfMax = 0
        # Find the largest item in alist[0..fillslot].
        for location in range(1, fillslot + 1):
            if alist[location] > alist[positionOfMax]:
                positionOfMax = location
        # One exchange per pass moves it into its final position.
        alist[fillslot], alist[positionOfMax] = alist[positionOfMax], alist[fillslot]

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
selectionSort(alist)
print(alist)
[17, 20, 26, 31, 44, 54, 55, 77, 93]
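The claim that bubble sort and selection sort make the same number of comparisons but a different number of exchanges can be checked by instrumenting both. The counting variants below are a sketch written for this purpose. The function names are made up, and the logic mirrors the two sorts above with counters added.

```python
def bubbleSortCounts(alist):
    """Bubble sort in place; return (comparisons, exchanges)."""
    comparisons = exchanges = 0
    for passnum in range(len(alist) - 1, 0, -1):
        for i in range(passnum):
            comparisons += 1
            if alist[i] > alist[i + 1]:
                alist[i], alist[i + 1] = alist[i + 1], alist[i]
                exchanges += 1
    return comparisons, exchanges

def selectionSortCounts(alist):
    """Selection sort in place; return (comparisons, exchanges)."""
    comparisons = exchanges = 0
    for fillslot in range(len(alist) - 1, 0, -1):
        positionOfMax = 0
        for location in range(1, fillslot + 1):
            comparisons += 1
            if alist[location] > alist[positionOfMax]:
                positionOfMax = location
        alist[fillslot], alist[positionOfMax] = alist[positionOfMax], alist[fillslot]
        exchanges += 1
    return comparisons, exchanges

data = [54, 26, 93, 17, 77, 31, 44, 55, 20]
print(bubbleSortCounts(list(data)))     # each call sorts its own copy
print(selectionSortCounts(list(data)))
```

For this 9-item list both sorts make 36 comparisons (8 + 7 + ... + 1). Bubble sort makes one exchange per out-of-order pair, while selection sort makes one exchange per pass, 8 in total.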
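4. Insertion sort
It always maintains a sorted sublist in the lower positions of the list. Each new item is then "inserted" back into that sublist, so that the sorted sublist grows by one item. O(n^2)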
def insertionSort(alist):
    for index in range(1, len(alist)):
        currentvalue = alist[index]
        position = index
        # Shift larger items one place to the right to open a gap.
        while position > 0 and alist[position - 1] > currentvalue:
            alist[position] = alist[position - 1]
            position = position - 1
        alist[position] = currentvalue

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
insertionSort(alist)
print(alist)
[17, 20, 26, 31, 44, 54, 55, 77, 93]
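5. Shell sort
Shell sort (sometimes called the "diminishing increment sort") improves on insertion sort by breaking the original list into several smaller sublists, each of which is sorted with insertion sort. The complexity of Shell sort depends on the choice of increments and lies between O(n) and O(n^2).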
def shellSort(alist):
    sublistcount = len(alist) // 2
    while sublistcount > 0:
        # Insertion-sort each sublist of items that are sublistcount apart.
        for startposition in range(sublistcount):
            gapInsertionSort(alist, startposition, sublistcount)
        print("after increments of size", sublistcount, "The list is", alist)
        sublistcount = sublistcount // 2

def gapInsertionSort(alist, start, gap):
    for i in range(start + gap, len(alist), gap):
        currentvalue = alist[i]
        position = i
        while position >= gap and alist[position - gap] > currentvalue:
            alist[position] = alist[position - gap]
            position = position - gap
        alist[position] = currentvalue

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
shellSort(alist)
print(alist)
after increments of size 4 The list is [20, 26, 44, 17, 54, 31, 93, 55, 77]
after increments of size 2 The list is [20, 17, 44, 26, 54, 31, 77, 55, 93]
after increments of size 1 The list is [17, 20, 26, 31, 44, 54, 55, 77, 93]
[17, 20, 26, 31, 44, 54, 55, 77, 93]
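Since the complexity depends on the increments, it can be useful to make the gap sequence a parameter. The sketch below does that, and also generates the 1, 4, 13, 40, ... sequence (h = 3h + 1) often attributed to Knuth as one possible choice. Both function names are made up for this example, and no claim is made here about which sequence is best.

```python
def shellSortWithGaps(alist, gaps):
    """Sort alist in place; gaps must be decreasing and end with 1."""
    for gap in gaps:
        # Gapped insertion sort over all sublists at once.
        for i in range(gap, len(alist)):
            currentvalue = alist[i]
            position = i
            while position >= gap and alist[position - gap] > currentvalue:
                alist[position] = alist[position - gap]
                position -= gap
            alist[position] = currentvalue

def knuthGaps(n):
    """Return the gaps 1, 4, 13, 40, ... that are below n, largest first."""
    gaps = []
    h = 1
    while h < n:
        gaps.append(h)
        h = 3 * h + 1
    return gaps[::-1]

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
shellSortWithGaps(alist, knuthGaps(len(alist)))
print(alist)
```

For the 9-item list the generated gaps are [4, 1]. Passing [4, 2, 1] instead uses the same halving increments as shellSort above.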
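6. Merge sort
A recursive algorithm that uses a divide-and-conquer strategy: it keeps splitting the list in half. If the list is empty or has one item, it is sorted by definition (the base case). If the list has more than one item, we split the list and recursively call merge sort on both halves. Once the two halves are sorted, the basic operation called a merge is performed. Merging is the process of taking two smaller sorted lists and combining them into a single, sorted new list.
There are two main steps: splitting the list in half, and merging.
The complexity is O(n log n).
The mergeSort function needs extra space to hold the two halves, because they are extracted with slicing. If the list is large, this extra space can be a critical factor and may cause problems when working with large data sets.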
def mergeSort(alist):
    print("Splitting", alist)
    if len(alist) > 1:
        mid = len(alist) // 2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]
        mergeSort(lefthalf)
        mergeSort(righthalf)
        # Merge the two sorted halves back into alist.
        i, j, k = 0, 0, 0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] < righthalf[j]:
                alist[k] = lefthalf[i]
                i = i + 1
            else:
                alist[k] = righthalf[j]
                j = j + 1
            k = k + 1
        # Copy whatever is left over in either half.
        while i < len(lefthalf):
            alist[k] = lefthalf[i]
            i += 1
            k += 1
        while j < len(righthalf):
            alist[k] = righthalf[j]
            j += 1
            k += 1
    print("merging", alist)

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
mergeSort(alist)
print(alist)
Splitting [54, 26, 93, 17, 77, 31, 44, 55, 20]
Splitting [54, 26, 93, 17]
Splitting [54, 26]
Splitting [54]
merging [54]
Splitting [26]
merging [26]
merging [26, 54]
Splitting [93, 17]
Splitting [93]
merging [93]
Splitting [17]
merging [17]
merging [17, 93]
merging [17, 26, 54, 93]
Splitting [77, 31, 44, 55, 20]
Splitting [77, 31]
Splitting [77]
merging [77]
Splitting [31]
merging [31]
merging [31, 77]
Splitting [44, 55, 20]
Splitting [44]
merging [44]
Splitting [55, 20]
Splitting [55]
merging [55]
Splitting [20]
merging [20]
merging [20, 55]
merging [20, 44, 55]
merging [20, 31, 44, 55, 77]
merging [17, 20, 26, 31, 44, 54, 55, 77, 93]
[17, 20, 26, 31, 44, 54, 55, 77, 93]
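The slicing in mergeSort creates new lists at every level of the recursion. One way to limit this, sketched below under the assumption that a single extra list of the same length is acceptable, is to allocate one buffer up front and sort index ranges instead of slices. It still needs O(n) extra space. The names mergeSortWithBuffer and _mergeSortRange are made up for this example.

```python
def mergeSortWithBuffer(alist):
    """Merge sort that reuses one buffer instead of slicing at every level."""
    buffer = [None] * len(alist)
    _mergeSortRange(alist, buffer, 0, len(alist))

def _mergeSortRange(alist, buffer, low, high):
    # Sorts alist[low:high]; high is exclusive.
    if high - low < 2:
        return
    mid = (low + high) // 2
    _mergeSortRange(alist, buffer, low, mid)
    _mergeSortRange(alist, buffer, mid, high)
    # Merge the two sorted runs into the buffer.
    i, j, k = low, mid, low
    while i < mid and j < high:
        if alist[i] <= alist[j]:
            buffer[k] = alist[i]
            i += 1
        else:
            buffer[k] = alist[j]
            j += 1
        k += 1
    while i < mid:
        buffer[k] = alist[i]
        i += 1
        k += 1
    while j < high:
        buffer[k] = alist[j]
        j += 1
        k += 1
    # Copy the merged run back into the original list.
    for k in range(low, high):
        alist[k] = buffer[k]

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
mergeSortWithBuffer(alist)
print(alist)
```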
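7. Quick sort
Quick sort uses a divide-and-conquer strategy, but without using extra storage. It first selects a value, called the pivot value. Although there are many ways to choose the pivot value, we will simply use the first item in the list. The role of the pivot value is to help split the list. The actual position where the pivot value belongs in the final sorted list, commonly called the split point, is used to divide the list for the subsequent calls to quick sort.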
def quickSort(alist):
    quickSortHelper(alist, 0, len(alist) - 1)

def quickSortHelper(alist, first, last):
    if first < last:
        splitpoint = partition(alist, first, last)
        quickSortHelper(alist, first, splitpoint - 1)
        quickSortHelper(alist, splitpoint + 1, last)

def partition(alist, first, last):
    pivotvalue = alist[first]
    leftmark = first + 1
    rightmark = last
    done = False
    while not done:
        # Move leftmark right past items smaller than the pivot.
        while leftmark <= rightmark and alist[leftmark] < pivotvalue:
            leftmark += 1
        # Move rightmark left past items not smaller than the pivot.
        while rightmark >= leftmark and alist[rightmark] >= pivotvalue:
            rightmark -= 1
        if rightmark < leftmark:
            done = True
        else:
            alist[leftmark], alist[rightmark] = alist[rightmark], alist[leftmark]
    # Put the pivot at the split point.
    alist[first], alist[rightmark] = alist[rightmark], alist[first]
    return rightmark

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
quickSort(alist)
print(alist)
[17, 20, 26, 31, 44, 54, 55, 77, 93]
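Using the first item as the pivot gives very uneven splits on a list that is already sorted. One of the "many ways to choose the pivot value" is the median of the first, middle and last items. The sketch below swaps that median into the first position and then partitions in the same way as above. It is an illustration rather than a tuned implementation, and all names in it are made up for this example.

```python
def quickSortMedian3(alist):
    """Quick sort in place with a median-of-three pivot."""
    _quickSortHelper(alist, 0, len(alist) - 1)

def _quickSortHelper(alist, first, last):
    if first < last:
        splitpoint = _partition(alist, first, last)
        _quickSortHelper(alist, first, splitpoint - 1)
        _quickSortHelper(alist, splitpoint + 1, last)

def _medianOfThree(alist, first, last):
    """Index of the median of the first, middle and last items."""
    mid = (first + last) // 2
    candidates = sorted([(alist[first], first), (alist[mid], mid), (alist[last], last)])
    return candidates[1][1]

def _partition(alist, first, last):
    # Move the chosen pivot to the front, then partition around it.
    pivotindex = _medianOfThree(alist, first, last)
    alist[first], alist[pivotindex] = alist[pivotindex], alist[first]
    pivotvalue = alist[first]
    leftmark = first + 1
    rightmark = last
    while True:
        while leftmark <= rightmark and alist[leftmark] < pivotvalue:
            leftmark += 1
        while rightmark >= leftmark and alist[rightmark] >= pivotvalue:
            rightmark -= 1
        if rightmark < leftmark:
            break
        alist[leftmark], alist[rightmark] = alist[rightmark], alist[leftmark]
    alist[first], alist[rightmark] = alist[rightmark], alist[first]
    return rightmark

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
quickSortMedian3(alist)
print(alist)
```

On sorted input the median of three is the middle item, so each split lands near the center instead of at one end.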
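Summary
Sequential search is O(n) for both ordered and unordered lists.
Binary search of an ordered list is O(log n) in the worst case.
Hash tables can provide constant-time searching.
Bubble sort, selection sort, and insertion sort are O(n^2) algorithms.
Shell sort improves on insertion sort by sorting incremental sublists. It falls between O(n) and O(n^2).
Merge sort is O(n log n), but the merge process requires extra space.
Quick sort is O(n log n), but may degrade to O(n^2) if the split points are not near the middle of the list. It does not require extra space.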