Quicksort
Various algorithms exist for thesorting of a given list of numbers/strings.Some of the algorithms are quite simple such as bubble sort, whileothers are more complex. Each algorithm has its own pros and cons,and no single algorithm can be nominated as the best. Quicksort isone of the well known sorting algorithms and is the subject of thiswriteup.
Problem: We are given aset of numbers that must be arranged in non-decreasing order. Weneed to use Quicksort for the solution.
Note: In the text below, lg(n) means log of nto base 2.
Solution #1: Basic Quicksort, recursive.
Quicksort was developed by C.A.R. Hoare and isa widely used sorting technique, based on divide and conquer. Itis an O(n*n) algorithm in the worst case, which is as bad asinsertion and selection sorts. However, its popularity derivesfrom the fact that it has an excellent average case behaviour ofO(n*log(n)). In practice, Quicksort almost always achieves thisbehaviour.
Quicksort first selects an element that is tobe used as split-point from the list of given numbers. All thenumbers smaller than the split-point are brought to one side ofthe list and the rest on the other. This operation is calledsplitting.
Consider the list: 4 3 1 7 5 9 6 2 8
Now, assume the first element of the list, 4 asthe split-point. Then, after splitting the list will be arrangedas:
3 1 2 4 7 5 9 6 8
Or in some other order, say: 2 3 1 4 ...
That is, the exact order is immaterial. What isimportant is that the order should be:
- Elements less than split-point element.
- Split-point element.
- Elements greater than split-point element.
After this, the list is divided into lists, onecontaining the elements less than the split-point element and theother containing the elements more than the split-point element.These two lists are again individually sorted using Quicksort,that is by again finding a split-point and dividing into twoparts etc. In the program given below, we will let recursion sortthe smaller lists.
In the example, the two sub-lists, are:
2 3 1 and 7 5 9 6 8.
Let us consider another example,this time in more detail.
The list to be sorted is shown atthe top. The situation after each of the split operations is also shown, andfor the first split - all the comparison-exchanges are shown in more detail.The number with a bold square around it is the split point and the partof the list shown in pink is the part that’s being worked upon inthat split operation.
In the original list 53 is chosen as thesplit point. One by one each number is compared to the split point. If itsgreater, the list is left as is. If not, the splitpoint is moved one stepto the right and the number in consideration exchanged with the splitpoint.
At the end of the split operation, thesplitpoint element is brought into the middle.
Please make sure at this point you understandhow the splitting is happening, based on the drawing above.
Now let us see the program:
#include <stdio.h> #define MAXELT 50 typedef int key; typedef int index; key list[MAXELT]; void main() { int i=-1,j,n; char t[10]; void quicksort(index,index); do { //read the list of numbers if (i!=-1) list[i++]=n; else i++; printf("\nEnter the numbers <End by #>"); fflush(stdin); scanf("%[^\n]",t); if (sscanf(t,"%d",&n)<1) break; } while (1); quicksort(0,i-1); //sort printf("The list obtained is "); //print for (j=0;j<i;j++) printf("\n %d",list[j]); } void quicksort(index first,index last) { void split(index,index,index*); index splitpoint; if (first<last) { split(first,last,&splitpoint); //split quicksort(first,splitpoint-1); //sort the first sub-list quicksort(splitpoint+1,last); //sort the second sub-list } } void split(index first,index last,index *splitpoint) { key x; index unknown; void interchange(int*,int*); x=list[first]; //find the split-point element *splitpoint=first; //find the splitpoint for (unknown=first+1;unknown<=last;unknown++) { //move through the list if (list[unknown]<x) { //comparing each element //with the split-point element *splitpoint=(*splitpoint)+1; //move the splitpoint interchange(&list[*splitpoint],&list[unknown]); //move the split-point element } } interchange(&list[first],&list[*splitpoint]); //get the split-point element //in the middle return; } void interchange(int *x,int *y) //interchange { int temp; temp=*x; *x=*y; *y=temp; } //solution 1 ends
As you can see, Quicksort is not an inplacesort, since it needs extra space to store the recursion stack.Also, it has a very bad worst case behaviour. As implementedabove, the worst case occurs when the list is already in order,or if it is in the reverse order. This makes Quicksort anunnatural sort, since it has to do most work when all the work isalready done! However, in practice, Quicksort works quite well.You can include counters to count the number of key-comparisons/swapsdone to find its behaviour and test it with different types ofinput data. The amount of extra storage space needed is O(n).
Quicksort can keep doing swaps of elements withequal keys. At times it may be important that equal keys remainin the same order as in the original list. So, to avoidthis, the following scheme may be used:
'Attach to each record its sequence number inthe original file. This number is made a part of the key in such away that it is the least significant part. This keeps the recordsin the same relative order as they were in input.'
In particular, all keys can be changed from A(i)to A(i)*n+i (i starts from zero) to make all keys distinct. This,after sorting can be changed to floor(A(i)/n). The equal keyswill be in the same relative order as before.
Solution #1: Modifed Quicksort, non-recursive.
In our second attempt towards implementation wewill make a number of improvements:
- First we will see how we can use a stack to removerecursion.
- We will choose the split- point more carefully,as a median of the first, the middle and the last elements. This iswill reduce the probability of hitting worst case behaviour.
- We will try a different splitting strategy.
- We will look at the sizes of the sub-lists and stack only one, smallersub-list. This will make the worst case stack space come down toO(lg(n)) from O(n).
- To sort the sub-lists of size less than10, we will use insertion sort. This will also speed up theprocess, since for small lists, insertion sort is better thanQuicksort.
Let us look at the program:
#include <stdio.h> #define MAXELT 100 #define INFINITY 32760 //numbers in list should not exceed //this. change the value to suit your //needs #define SMALLSIZE 10 //not less than 3 #define STACKSIZE 100 //should be ceiling(lg(MAXSIZE)+1) int list[MAXELT+1]; //one extra, to hold INFINITY struct { //stack element. int a,b; } stack[STACKSIZE]; int top=-1; //initialise stack void main() //overhead! { int i=-1,j,n; char t[10]; void quicksort(int); do { if (i!=-1) list[i++]=n; else i++; printf("\nEnter the numbers <End by #>"); fflush(stdin); scanf("%[^\n]",t); if (sscanf(t,"%d",&n)<1) break; } while (1); quicksort(i-1); printf("The list obtained is "); for (j=0;j<i;j++) printf("\n %d",list[j]); } void interchange(int *x,int *y) //swap { int temp; temp=*x; *x=*y; *y=temp; } void split(int first,int last,int *splitpoint) { int x,i,j,s,g; //here, atleast three elements are needed if (list[first]<list[(first+last)/2]) { //find median s=first; g=(first+last)/2; } else { g=first; s=(first+last)/2; } if (list[last]<=list[s]) x=s; else if (list[last]<=list[g]) x=last; else x=g; interchange(&list[x],&list[first]); //swap the split-point element //with the first x=list[first]; i=first; //initialise j=last+1; while (i<j) { do { //find j j--; } while (list[j]>x); do { i++; //find i } while (list[i]<x); interchange(&list[i],&list[j]); //swap } interchange(&list[i],&list[j]); //undo the extra swap interchange(&list[first],&list[j]); //bring the split-point //element to the first *splitpoint=j; } void push(int a,int b) //push { top++; stack[top].a=a; stack[top].b=b; } void pop(int *a,int *b) //pop { *a=stack[top].a; *b=stack[top].b; top--; } void insertion_sort(int first,int last) { int i,j,c; for (i=first;i<=last;i++) { j=list[i]; c=i; while ((list[c-1]>j)&&(c>first)) { list[c]=list[c-1]; c--; } list[c]=j; } } void quicksort(int n) { int first,last,splitpoint; push(0,n); while (top!=-1) { pop(&first,&last); for (;;) { if (last-first>SMALLSIZE) { //find the larger sub-list split(first,last,&splitpoint); //push the smaller list if (last-splitpoint<splitpoint-first) { push(first,splitpoint-1); first=splitpoint+1; } else { push(splitpoint+1,last); last=splitpoint-1; } } else { //sort the smaller sub-lists //through insertion sort insertion_sort(first,last); break; } } } //iterate for larger list } //end of solution 2
The program removes many disadvantages of theprevious program. Still, it is not necessarily an improvement.While using Quicksort for your own problem, think before usingthe second program. If your language implements recursion well,you may decide to go for a recursive version. However, you maydecide to use the median-of-three and the look-before-pushing-in-stackimprovements. Also, you may decide to keep one recursive call andremove only tail recursion which is more in-efficient and easierto remove without stack. You can also consider which of the twosplit functions to use. The first one does not need the INFINITYmark at the end of the list. However, it is slightly slower.
The way this program is written, the stacknever requires more than 2*ceiling(lg(n)). If we want to save theextra comparison in determining the larger sub-list, we need tostore only the 'first' of the new list on the stack. (If we workimmediately on the second sub-list, the second of the first sub-listcan be known from the first of the second sub-list by subtracting2).
Even in this case, one of the first and thelast can be saved (depending on the sub-list on which we workimmediately) with either a boolean variable to tell us what isstored or using a negative value of the index.
Now, for a comparison between thetwo programs. I created two different lists, each containing10000 numbers (numbers selected randomly from between 1 and 10000).The first implementation took 2880275 and 158007 comparisonsrespectively. The second implementation took around 128395 and143583 comparisons.
This indicates that the worst casebehaviour has improved with the changes we made, as we thought theywould. Have a look at this graphs.
While the first implementation issomewhere midway between O(n*n) and O(nlog(n)), the secondimplementation is much close to O(nlog(n)). Beautiful, isn’t it?(The graphs have been created usingHSB Grapher).
So much for sorting! Thanks for reading.