Cracking the Oyster
Question: "How do I sort a disk file?"
Mistake: answering the question directly, without first doing something important.
What should be done first?
1. A Friendly Conversation: learn more about the requirements and the background; the context makes the problem clearer.
2. Precise Problem Statement:
Input | Output | Constraints |
---|---|---|
A file containing at most n positive integers, each less than n, where n = 10^7. It is a fatal error if any integer occurs twice in the input. No other data is associated with the integer. | A sorted list, in increasing order, of the input integers. | At most a megabyte of storage is available in main memory; ample disk storage is available. The run time can be at most several minutes; a run time of ten seconds need not be decreased. |
3. Program Design:
Solution 1 | Solution 2 | Solution 3 |
---|---|---|
Use a general disk-based Merge Sort as a starting point and trim it to exploit the fact that we are sorting integers. Even so, it might still take a few days to get the code up and running. | If we store each number in seven bytes, then we can store about 143,000 numbers in the available megabyte (taking 1 MB = 1,000 KB = 1,000,000 bytes: 1,000,000 / 7 ≈ 143,000). | If we represent each number as a 32-bit integer, though, then we can store 250,000 numbers in the megabyte (32 bits = 4 bytes, and 1,000,000 / 4 = 250,000). We will therefore use a program that makes 40 passes over the input file (10^7 / 250,000 = 40). On the first pass it reads into memory any integer between 0 and 249,999, sorts the (at most) 250,000 integers and writes them to the output file, and so on. But this scheme reads the file many times, so we would prefer a scheme that reads the input just once and uses no intermediate files. Think about an appropriate representation. |
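
The multi-pass scheme of Solution 3 can be sketched roughly as follows. This is only an illustration under assumptions that are not in the notes: the input is a text file with one integer per line, and the function name `multipass_sort` and its parameters are hypothetical.

```python
def multipass_sort(input_path, output_path, n=10_000_000, chunk=250_000):
    """Sort distinct integers in [0, n) with one full read of the input per value range.

    Assumes (hypothetically) a text file holding one integer per line.
    Returns the number of passes made over the input.
    """
    passes = 0
    with open(output_path, "w") as out:
        for low in range(0, n, chunk):
            high = low + chunk
            # Re-read the entire input, keeping only the integers in [low, high).
            with open(input_path) as src:
                values = (int(line) for line in src if line.strip())
                batch = [v for v in values if low <= v < high]
            # At most `chunk` integers are held in memory at any time.
            batch.sort()
            out.writelines(f"{v}\n" for v in batch)
            passes += 1
    return passes
```

With the default parameters the loop runs 10^7 / 250,000 = 40 times, which is exactly why the scheme is unattractive: the whole input file is read on every pass.
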
4. Implementation Sketch
The bitmap or bit vector representation of a set screams out to be used. We can represent a toy set of nonnegative integers less than 20 by a string of 20 bits. For instance, we can store the set {1, 2, 3, 5, 8, 13} in this string: 01110100100001000000.
Given the bitmap data structure to represent the set of integers in the file, the program can be written in three natural phases. The first phase initializes the set to empty by turning off all bits. The second phase builds the set by reading each integer in the file and turning on the appropriate bit. The third phase produces the sorted file by inspecting each bit and writing out the appropriate integer if the bit is 1.
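
As a small illustration of the toy example, the sketch below builds the 20-bit string from the set and reads the set back from the string. The helper names `to_bit_string` and `from_bit_string` are made up for this example; position i in the string (counting from 0 on the left) is 1 exactly when i is in the set.

```python
def to_bit_string(values, size):
    """Return a string of `size` characters with '1' at index v for each v in values."""
    bits = ["0"] * size
    for v in values:
        bits[v] = "1"
    return "".join(bits)


def from_bit_string(bit_string):
    """Return the members of the set, in increasing order, encoded by the string."""
    return [i for i, bit in enumerate(bit_string) if bit == "1"]


print(to_bit_string({1, 2, 3, 5, 8, 13}, 20))   # 01110100100001000000
print(from_bit_string("01110100100001000000"))  # [1, 2, 3, 5, 8, 13]
```

Reading the string from left to right already yields the members in sorted order, which is the property the sort relies on.
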
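
The three phases might look like this in Python. This is a minimal sketch, not the book's program: it assumes the integers arrive as an iterable rather than from a disk file, the name `bitmap_sort` is hypothetical, and the duplicate check is added here to reflect the "fatal error" rule in the problem statement.

```python
def bitmap_sort(numbers, n):
    """Return the distinct integers from `numbers`, each in [0, n), in increasing order."""
    # Phase 1: initialize the set to empty (a bytearray starts with every bit off).
    bits = bytearray((n + 7) // 8)

    # Phase 2: turn on the bit for each integer read.
    for x in numbers:
        if not 0 <= x < n:
            raise ValueError(f"integer out of range: {x}")
        byte, mask = x >> 3, 1 << (x & 7)
        if bits[byte] & mask:
            # The problem statement treats a repeated integer as a fatal error.
            raise ValueError(f"duplicate integer: {x}")
        bits[byte] |= mask

    # Phase 3: inspect each bit in order and emit the integer if the bit is 1.
    return [i for i in range(n) if bits[i >> 3] & (1 << (i & 7))]


print(bitmap_sort([13, 2, 8, 1, 5, 3], 20))  # [1, 2, 3, 5, 8, 13]
```

For n = 10^7 the bytearray takes 1,250,000 bytes, so this representation sits in the neighborhood of the one-megabyte budget rather than strictly under it. The input is read exactly once and no intermediate files are needed.
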