7.2 Insertion Sort
void InsertionSort( ElementType A[], int N )
{
    int j, p;
    ElementType Tmp;

    for( p = 1; p < N; p++ )
    {
        Tmp = A[ p ];
        // we compare A[j] and A[j-1], so j > 0
        for( j = p; j > 0 && A[ j - 1 ] > Tmp; j-- )
            A[ j ] = A[ j - 1 ];
        A[ j ] = Tmp;
    }
}
Time complexity: O(N^2)
7.3 A Lower Bound for Simple Sorting Algorithms
An inversion in an array of numbers is any ordered pair (i, j) having the property that i < j but A[i] > A[j].
Theorem: The average number of inversions in an array of N distinct numbers is N(N - 1)/4.
Theorem: Any algorithm that sorts by exchanging adjacent elements requires Ω(N^2) time on average, since each adjacent exchange removes at most one inversion.
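To make the definition concrete, here is a brute-force O(N^2) inversion counter (a sketch; CountInversions is our name, and the element type is taken to be int). A reverse-ordered array of N distinct elements attains the maximum of N(N - 1)/2 inversions.

```c
#include <assert.h>

/* Brute-force O(N^2) count of pairs (i, j) with i < j and A[i] > A[j]. */
static int CountInversions( const int A[], int N )
{
    int i, j, count = 0;

    for( i = 0; i < N; i++ )
        for( j = i + 1; j < N; j++ )
            if( A[ i ] > A[ j ] )
                count++;
    return count;
}
```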
7.4 Shellsort
Shellsort is sometimes referred to as diminishing increment sort.
Shellsort uses a sequence h1, h2, ..., ht, called the increment sequence. Any increment sequence will do as long as h1 = 1.
After a phase using some increment hk, we have A[i] <= A[i+hk] for every i; all elements spaced hk apart are sorted. The file is then said to be hk-sorted.
An important property of Shellsort is that an hk-sorted file that is then h(k-1)-sorted remains hk-sorted.
The action of an hk-sort is to perform an insertion sort on hk independent subarrays.
A popular (but poor) choice for the increment sequence is ht = [N/2] and hk = [h(k+1)/2].
void Shellsort( ElementType A[], int N )
{
    int i, j, Increment;
    ElementType Tmp;

    for( Increment = N / 2; Increment > 0; Increment /= 2 )
        // insertion sort with stride Increment
        for( i = Increment; i < N; i++ )
        {
            Tmp = A[ i ];
            for( j = i; j >= Increment; j -= Increment )
                if( Tmp < A[ j - Increment ] )
                    A[ j ] = A[ j - Increment ];
                else
                    break;  // without this break, Tmp lands in the wrong slot
            A[ j ] = Tmp;
        }
}
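The hk-sorted invariant can be checked with a small helper (a sketch; IsHSorted is our name, and the element type is taken to be int):

```c
#include <assert.h>

/* Returns 1 iff A[i] <= A[i + h] for every valid i, i.e. the array is h-sorted. */
static int IsHSorted( const int A[], int N, int h )
{
    int i;

    for( i = 0; i + h < N; i++ )
        if( A[ i ] > A[ i + h ] )
            return 0;
    return 1;
}
```

Note that an array can be 2-sorted without being 1-sorted: {1, 3, 2, 4} satisfies A[i] <= A[i+2] everywhere but is not fully sorted.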
7.4.1 Worst-Case Analysis of Shellsort
Theorem: The worst-case running time of Shellsort, using Shell's increments (ht = [N/2] and hk = [h(k+1)/2]), is Θ(N^2).
Hibbard's increments: 1, 3, 7, ..., 2^k - 1. Consecutive increments in this sequence have no common factors.
Theorem: The worst-case running time of Shellsort using Hibbard's increments is Θ(N^(3/2)).
7.5 Heapsort
Building a binary heap of N elements takes O(N) time. (Building a heap: http://en.wikipedia.org/wiki/Binary_heap)
We can then perform N consecutive DeleteMin operations to get a sorted array, so the total running time is O(N*logN).
The main problem with this approach is the use of an extra array, which requires extra space.
A clever way to avoid a second array uses the fact that after each DeleteMin, the heap shrinks by 1, so we can store the deleted element in the slot freed at the end of the array. (To get an array in increasing order, we should use DeleteMax, putting the first deleted element at A[N-1], the second at A[N-2], ...)
// max heap implementation
#define LeftChild( i )  ( 2 * ( i ) + 1 )  // array starts at 0

// N is the current heap size, i is the root of the subtree to fix
void PercDown( ElementType A[], int i, int N )
{
    int Child;
    ElementType Tmp;

    for( Tmp = A[ i ]; LeftChild( i ) < N; i = Child )
    {
        Child = LeftChild( i );
        // max heap, so move down the larger child;
        // only look at A[Child + 1] when the right child exists (Child != N - 1)
        if( Child != N - 1 && A[ Child + 1 ] > A[ Child ] )
            Child++;
        if( Tmp < A[ Child ] )
            A[ i ] = A[ Child ];
        else
            break;
    }
    A[ i ] = Tmp;
}
void Swap( ElementType *a, ElementType *b )
{
    ElementType Tmp = *a;
    *a = *b;
    *b = Tmp;
}

void Heapsort( ElementType A[], int N )  // N is the # of elements
{
    int i;

    for( i = N / 2; i >= 0; i-- )  // build heap: PercDown from the last node with a child
        PercDown( A, i, N );
    for( i = N - 1; i > 0; i-- )   // the last remaining element needs no PercDown
    {
        Swap( &A[ 0 ], &A[ i ] );  // move the current max to its final slot
        PercDown( A, 0, i );
    }
}
Binary heap index conventions:
- Root at A[0]: left child of i is 2i + 1, right child is 2i + 2
- Root at A[1]: left child of i is 2i, right child is 2i + 1
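A quick sketch of both conventions as macros (the names are ours, not from the text):

```c
#include <assert.h>

/* Root stored at A[0] (the convention used by the heap code above). */
#define LeftChild0( i )   ( 2 * ( i ) + 1 )
#define RightChild0( i )  ( 2 * ( i ) + 2 )
#define Parent0( i )      ( ( ( i ) - 1 ) / 2 )

/* Root stored at A[1], with A[0] unused. */
#define LeftChild1( i )   ( 2 * ( i ) )
#define RightChild1( i )  ( 2 * ( i ) + 1 )
#define Parent1( i )      ( ( i ) / 2 )
```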
7.6 Mergesort
Mergesort runs in O(N*logN) worst-case running time. The required extra space is O(N).
The fundamental operation in this algorithm is merging two sorted lists. The time to merge two sorted lists is clearly linear, because at most N - 1 comparisons are made.
This algorithm is a classic divide-and-conquer strategy.
Mergesort routine:
void MSort( ElementType A[], ElementType TmpArray[], int Left, int Right )
{
    int Center;

    if( Left < Right )
    {
        Center = ( Left + Right ) / 2;
        MSort( A, TmpArray, Left, Center );
        MSort( A, TmpArray, Center + 1, Right );
        Merge( A, TmpArray, Left, Center + 1, Right );
    }
}
void Mergesort( ElementType A[], int N )
{
    ElementType *TmpArray;

    TmpArray = malloc( N * sizeof( ElementType ) );
    if( TmpArray != NULL )
    {
        MSort( A, TmpArray, 0, N - 1 );
        free( TmpArray );
    }
    else
        FatalError( "No space for tmp array!!!" );
}
Merge routine:
// Lpos = start of left half, Rpos = start of right half
void Merge( ElementType A[], ElementType TmpArray[], int Lpos, int Rpos, int RightEnd )
{
    int i, LeftEnd, NumElements, TmpPos;

    LeftEnd = Rpos - 1;
    TmpPos = Lpos;
    NumElements = RightEnd - Lpos + 1;  // total # of elements to be merged

    // main loop
    while( Lpos <= LeftEnd && Rpos <= RightEnd )
        if( A[ Lpos ] <= A[ Rpos ] )
            TmpArray[ TmpPos++ ] = A[ Lpos++ ];
        else
            TmpArray[ TmpPos++ ] = A[ Rpos++ ];

    while( Lpos <= LeftEnd )   // copy rest of first half
        TmpArray[ TmpPos++ ] = A[ Lpos++ ];
    while( Rpos <= RightEnd )  // copy rest of second half
        TmpArray[ TmpPos++ ] = A[ Rpos++ ];

    // copy TmpArray back
    for( i = 0; i < NumElements; i++, RightEnd-- )
        A[ RightEnd ] = TmpArray[ RightEnd ];
}
7.7 Quicksort
As its name implies, quicksort is the fastest known sorting algorithm in practice.
Its average running time is O(N*logN). It has O(N^2) worst-case performance, but this can be made exponentially unlikely.
Like mergesort, quicksort is a divide-and-conquer recursive algorithm.
The basic algorithm to sort an array S consists of the following four steps:
1. If the number of elements in S is 0 or 1, then return.
2. Pick any element v in S. This is called the pivot.
3. Partition S - {v} (the remaining elements in S) into two disjoint groups: S1 = {x ∈ S - {v} | x ≤ v} and S2 = {x ∈ S - {v} | x ≥ v}.
4. Return { quicksort(S1) followed by v followed by quicksort(S2) }.
The reason quicksort is faster than mergesort is that the partitioning step can actually be performed in place and very efficiently.
7.7.1 Picking the Pivot
A wrong way
One choice is to use the first element as the pivot. This is acceptable if the input is random.
- If the input is presorted or in reverse order, then the pivot provides poor partitions.
A safe maneuver
Another way is to choose the pivot randomly.
- Random number generation is generally expensive and does not reduce the average running time of the rest of the algorithm at all.
Median-of-Three Partitioning
The median of a group of N numbers is the [N/2]th largest number, which is the best pivot. A good estimate can be obtained by picking three elements randomly and using the median of those three as the pivot.
The randomness turns out not to help much, so the common course is to use as pivot the median of the left, right, and center elements.
7.7.2 Partitioning Strategy
1. Get the pivot element out of the way by swapping it with the last element.
2. While i is to the left of j, we move i right, skipping over elements that are smaller than the pivot. We move j left, skipping over elements that are larger than the pivot. When i and j have stopped, if i is to the left of j, those elements are swapped.
3. The final part of the partitioning is to swap the pivot element with the element pointed to by i.
We should stop i and j when they see a key equal to the pivot, and advance both after the swap to avoid an infinite loop: Swap(&A[i++], &A[j--]);
If we stop i and j on equal keys, the partitions stay balanced and the total running time is O(N logN) even when all the elements are equal; otherwise the running time degrades to O(N^2).
7.7.3 Small Arrays
For very small arrays, quicksort does not perform as well as insertion sort. A good cutoff is around N = 10.
7.7.4 Actual Quicksort Routines
When selecting the pivot, we sort A[Left], A[Center], and A[Right]: the smallest goes to A[Left], the largest to A[Right], and then we swap A[Center] with A[Right - 1]:
+ A[Left] and A[Right] act as sentinels, so we do not need to worry about running off the ends of the array.
+ i starts at A[Left + 1] and j starts at A[Right - 2].
Driver for quicksort:
void Quicksort( ElementType A[], int N )
{
    Qsort( A, 0, N - 1 );
}
Code to perform median-of-three partitioning:
// return the median of Left, Center, Right, and hide the pivot at Right - 1
ElementType Median3( ElementType A[], int Left, int Right )
{
    int Center = ( Left + Right ) / 2;

    if( A[ Left ] > A[ Center ] )
        Swap( &A[ Left ], &A[ Center ] );
    if( A[ Left ] > A[ Right ] )
        Swap( &A[ Left ], &A[ Right ] );
    if( A[ Center ] > A[ Right ] )
        Swap( &A[ Center ], &A[ Right ] );

    Swap( &A[ Center ], &A[ Right - 1 ] );  // hide pivot
    return A[ Right - 1 ];
}
Main quicksort routine:
#define Cutoff ( 3 )  // can be anywhere from about 3 to 20
void Qsort( ElementType A[], int Left, int Right )
{
    int i, j;
    ElementType Pivot;

    if( Left + Cutoff <= Right )
    {
        Pivot = Median3( A, Left, Right );
        i = Left; j = Right - 1;
        for( ; ; )
        {
            // i and j always advance before each comparison, even when A[i] == Pivot,
            // so we cannot get an infinite loop
            while( A[ ++i ] < Pivot ) { }  // i starts from Left + 1
            while( A[ --j ] > Pivot ) { }  // j starts from Right - 2
            if( i < j )
                Swap( &A[ i ], &A[ j ] );  // Swap should be an inline function
            else
                break;
        }
        Swap( &A[ i ], &A[ Right - 1 ] );  // restore pivot
        Qsort( A, Left, i - 1 );
        Qsort( A, i + 1, Right );
    }
    else  // do insertion sort on the subarray
        InsertionSort( A + Left, Right - Left + 1 );
}
A wrong way to implement the partitioning loop:
i = Left + 1; j = Right - 2;
for( ; ; )
{
    // i and j do not advance on every iteration,
    // so we get an infinite loop when A[i] == A[j] == Pivot
    while( A[ i ] < Pivot ) i++;
    while( A[ j ] > Pivot ) j--;
    if( i < j )
        Swap( &A[ i ], &A[ j ] );
    else
        break;
}
7.7.6 A Linear-Expected-Time Algorithm for Selection
To find the kth largest element in an array, we can use a priority queue (as in heapsort) and find it in O(N + klogN).
Alternatively we can use quickselect, which is similar to quicksort. Its average running time is O(N) and the worst case is O(N^2).
Steps of quickselect:
1. If |S| = 1, then k = 1 and return the element in S as the answer. If a cutoff for small arrays is used and |S| is small enough, then sort S and return the kth smallest element.
2. Pick a pivot element, v ∈ S.
3. Partition S - {v} into S1 and S2, as was done with quicksort.
4. If k ≤ |S1|, return quickselect(S1, k). If k = |S1| + 1, the pivot is the answer. Otherwise, return quickselect(S2, k - |S1| - 1).
In contrast to quicksort, quickselect makes only one recursive call instead of two.
// Find the kth smallest element; since the array starts at 0,
// the answer ends up in A[k - 1]
void Qselect( ElementType A[], int k, int Left, int Right )
{
    int i, j;
    ElementType Pivot;

    if( Left + Cutoff <= Right )
    {
        Pivot = Median3( A, Left, Right );  // Median3 hides the pivot at Right - 1
        i = Left; j = Right - 1;
        for( ; ; )
        {
            while( A[ ++i ] < Pivot ) { }
            while( A[ --j ] > Pivot ) { }
            if( i < j )
                Swap( &A[ i ], &A[ j ] );
            else
                break;
        }
        Swap( &A[ i ], &A[ Right - 1 ] );  // restore pivot
        // compare k with i + 1, the rank of the pivot's final position
        if( k <= i )
            Qselect( A, k, Left, i - 1 );
        else if( k > i + 1 )
            Qselect( A, k, i + 1, Right );
        // if k == i + 1, the pivot is the answer and we are done
    }
    else  // do insertion sort on the subarray
        InsertionSort( A + Left, Right - Left + 1 );
}
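For a concrete check, the routines above can be combined into a self-contained sketch (ElementType taken as int; the Swap helper is ours, and Cutoff is set to 10 here). After Qselect(A, k, 0, N - 1), the kth smallest element sits at A[k - 1].

```c
#include <assert.h>

typedef int ElementType;
#define Cutoff ( 10 )

static void Swap( ElementType *a, ElementType *b )
{
    ElementType t = *a; *a = *b; *b = t;
}

static void InsertionSort( ElementType A[], int N )
{
    int j, p;
    ElementType Tmp;

    for( p = 1; p < N; p++ )
    {
        Tmp = A[ p ];
        for( j = p; j > 0 && A[ j - 1 ] > Tmp; j-- )
            A[ j ] = A[ j - 1 ];
        A[ j ] = Tmp;
    }
}

static ElementType Median3( ElementType A[], int Left, int Right )
{
    int Center = ( Left + Right ) / 2;

    if( A[ Left ] > A[ Center ] )  Swap( &A[ Left ], &A[ Center ] );
    if( A[ Left ] > A[ Right ] )   Swap( &A[ Left ], &A[ Right ] );
    if( A[ Center ] > A[ Right ] ) Swap( &A[ Center ], &A[ Right ] );
    Swap( &A[ Center ], &A[ Right - 1 ] );  /* hide pivot */
    return A[ Right - 1 ];
}

static void Qselect( ElementType A[], int k, int Left, int Right )
{
    int i, j;
    ElementType Pivot;

    if( Left + Cutoff <= Right )
    {
        Pivot = Median3( A, Left, Right );
        i = Left; j = Right - 1;
        for( ; ; )
        {
            while( A[ ++i ] < Pivot ) { }
            while( A[ --j ] > Pivot ) { }
            if( i < j )
                Swap( &A[ i ], &A[ j ] );
            else
                break;
        }
        Swap( &A[ i ], &A[ Right - 1 ] );  /* restore pivot */
        if( k <= i )
            Qselect( A, k, Left, i - 1 );
        else if( k > i + 1 )
            Qselect( A, k, i + 1, Right );
    }
    else
        InsertionSort( A + Left, Right - Left + 1 );
}
```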
7.9 A General Lower Bound for Sorting
Any algorithm for sorting that uses only comparisons requires Ω(NlogN) comparisons in the worst case, so mergesort and heapsort are optimal to within a constant factor.
7.9.1 Decision Trees
A decision tree is an abstraction used to prove lower bounds.
In our context, a decision tree is a binary tree in which each node (state) represents a set of possible orderings among the elements, consistent with the comparisons that have been made. The results of the comparisons are the tree edges.
Different algorithms have different decision trees.
The number of comparisons used by the sorting algorithm in the worst case is equal to the depth of the deepest leaf. The average number of comparisons used is equal to the average depth of the leaves.
Lemma 7.1: Let T be a binary tree of depth d. Then T has at most 2^d leaves.
Lemma 7.2: A binary tree with L leaves must have depth at least [log L]. (from Lemma 7.1)
Theorem 7.6: Any sorting algorithm that uses only comparisons between elements requires at least log(N!) comparisons in the worst case.
Proof: a decision tree to sort N elements must have at least N! leaves, one for each of the N! possible orderings of N distinct elements.
Theorem 7.7: Any sorting algorithm that uses only comparisons between elements requires Ω(NlogN) comparisons. (since log(N!) = Ω(NlogN))
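The last step, log(N!) = Ω(NlogN), can be verified directly by keeping only the largest N/2 factors of N!:

```latex
\log(N!) = \sum_{i=1}^{N} \log i
         \ge \sum_{i=N/2}^{N} \log i
         \ge \frac{N}{2} \log \frac{N}{2}
         = \frac{N}{2} \log N - \frac{N}{2}
         = \Omega(N \log N)
```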