Problem Description
Problem link: POJ 1007 DNA Sorting
DNA Sorting
Time Limit: 1000MS
Memory Limit: 10000K
Description
One measure of ``unsortedness'' in a sequence is the number of pairs of entries that are out of order with respect to each other. For instance, in the letter sequence ``DAABEC'', this measure is 5, since D is greater than four letters to its right and E is greater than one letter to its right. This measure is called the number of inversions in the sequence. The sequence ``AACEDGG'' has only one inversion (E and D)---it is nearly sorted---while the sequence ``ZWQM'' has 6 inversions (it is as unsorted as can be---exactly the reverse of sorted).
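The definition turns directly into a brute-force check over all pairs i < j. Here is a minimal Python sketch that reproduces the three examples above. The function name count_inversions is illustrative and does not come from the problem.

```python
def count_inversions(seq: str) -> int:
    """Count pairs (i, j) with i < j and seq[i] > seq[j] by checking every pair."""
    n = len(seq)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if seq[i] > seq[j]  # single uppercase letters compare alphabetically
    )


# The three examples from the problem statement.
print(count_inversions("DAABEC"))   # 5
print(count_inversions("AACEDGG"))  # 1
print(count_inversions("ZWQM"))     # 6
```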
You are responsible for cataloguing a sequence of DNA strings (sequences containing only the four letters A, C, G, and T). However, you want to catalog them, not in alphabetical order, but rather in order of ``sortedness'', from ``most sorted'' to ``least sorted''. All the strings are of the same length.
Input
The first line contains two integers: a positive integer n (0 < n <= 50) giving the length of the strings; and a positive integer m (0 < m <= 100) giving the number of strings. These are followed by m lines, each containing a string of length n.
Output
Output the list of input strings, arranged from ``most sorted'' to ``least sorted''. If two strings are equally sorted, output them in their original order.
Sample Input
10 6
AACATGAAGG
TTTTGGCCAA
TTTGGCCAAA
GATCAGATTT
CCCGGGGGGA
ATCGATGCAT
Sample Output
CCCGGGGGGA
AACATGAAGG
GATCAGATTT
ATCGATGCAT
TTTTGGCCAA
TTTGGCCAAA
Source
East Central North America 1998
Analysis
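The straightforward solution compares every pair of letters in a line to obtain that line's "unsortedness". This value becomes the line's key, and all lines are then stable-sorted by key. A line of length n has n(n-1)/2 pairs, so the total time is T(n,m) = O(m·n²) + O(m log m).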
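A rough count of the work under the stated limits (n ≤ 50, m ≤ 100) shows the gap between the pairwise approach and the single-pass counting approach from the next paragraph. Each pair examined or each letter read is counted as one step, and the sorting cost is left out:

$$
\text{pairwise: } m\binom{n}{2} = m\,\frac{n(n-1)}{2} \le 100 \cdot \frac{50 \cdot 49}{2} = 122\,500
$$

$$
\text{single pass: } m\,n \le 100 \cdot 50 = 5\,000
$$

$$
T_{\text{pairwise}}(n,m) = O(mn^2 + m\log m), \qquad T_{\text{pass}}(n,m) = O(mn + m\log m)
$$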
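The complexity of the straightforward algorithm is unsatisfying. Note, however, that the input contains only the four letters A, C, G and T. We can keep an array of per-letter counts (a random-access structure indexed directly by letter) and compute the "unsortedness" of a line in O(n) time with a single left-to-right pass. Since A < C < G < T, each letter forms an inversion with every greater letter that appeared before it:
- For each letter read, increment that letter's occurrence count;
- If the letter is T, do nothing else;
- If it is G, add the number of T's seen so far to the unsortedness;
- If it is C, add the numbers of T's and G's seen so far;
- If it is A, add the numbers of T's, G's and C's seen so far.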
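The steps above can be written as a short Python sketch. The function name unsortedness is illustrative, and the input is assumed to contain only A, C, G and T; any other letter raises a KeyError. Incrementing the current letter's own count first is harmless, because that count is never added for the same letter.

```python
def unsortedness(dna: str) -> int:
    """Count inversions of a string over {A, C, G, T} in one left-to-right pass."""
    seen = {"A": 0, "C": 0, "G": 0, "T": 0}  # occurrences so far of each letter
    total = 0
    for ch in dna:
        seen[ch] += 1
        if ch == "G":
            total += seen["T"]
        elif ch == "C":
            total += seen["T"] + seen["G"]
        elif ch == "A":
            total += seen["T"] + seen["G"] + seen["C"]
        # ch == "T": no letter is greater than T, so nothing is added
    return total


print(unsortedness("GATCAGATTT"))  # 11
```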
The pseudocode for this problem:
Procedure POJ1007
Begin
    Read the numbers n and m
    Dim array as Pair<key, value> array
    For i from 0 to m - 1
    Begin
        line <- ReadLine
        unsortedness <- 0
        occurs[A] <- 0
        occurs[C] <- 0
        occurs[G] <- 0
        occurs[T] <- 0
        For ch in line
        Begin
            occurs[ch] <- occurs[ch] + 1
            Switch ch
            Begin
                case T: break
                case G: unsortedness <- unsortedness + occurs[T]
                        break
                case C: unsortedness <- unsortedness + occurs[T] + occurs[G]
                        break
                case A: unsortedness <- unsortedness + occurs[T] + occurs[G] + occurs[C]
                        break
            End Switch
        End For
        put <unsortedness, line> into array
    End For
    stable_sort array by key (ascending) and output each line in order
End Procedure
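Below is a runnable Python sketch of the whole procedure. It assumes the entire input is available as one string, and the helper names unsortedness, dna_sort and solve are illustrative. Reading whitespace-separated tokens means the strings may appear on one line or on separate lines. The stable_sort step relies on Python's built-in sorted(), which is documented to be stable.

```python
from typing import List


def unsortedness(dna: str) -> int:
    """Inversion count over {A, C, G, T} via per-letter counts (one pass)."""
    seen = {"A": 0, "C": 0, "G": 0, "T": 0}
    total = 0
    for ch in dna:
        seen[ch] += 1
        if ch == "G":
            total += seen["T"]
        elif ch == "C":
            total += seen["T"] + seen["G"]
        elif ch == "A":
            total += seen["T"] + seen["G"] + seen["C"]
    return total


def dna_sort(strings: List[str]) -> List[str]:
    """Order strings from most to least sorted; equal keys keep input order."""
    return sorted(strings, key=unsortedness)  # sorted() is stable


def solve(text: str) -> str:
    """Parse 'n m' followed by m strings and return the sorted output text."""
    tokens = text.split()
    m = int(tokens[1])  # tokens[0] is n, the common string length (not needed)
    strings = tokens[2:2 + m]
    return "\n".join(dna_sort(strings)) + "\n"


sample = "10 6 AACATGAAGG TTTTGGCCAA TTTGGCCAAA GATCAGATTT CCCGGGGGGA ATCGATGCAT"
print(solve(sample), end="")
```

On the sample, the keys are 10, 36, 37, 11, 9 and 17 in input order, which gives the expected output. If the judge accepts Python, the entry point would be `sys.stdout.write(solve(sys.stdin.read()))` after `import sys`. Otherwise the same logic ports directly to C++ with `std::stable_sort`.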
Summary
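Make full use of the given conditions: special constraints, such as the four-letter alphabet here, can greatly improve an algorithm's efficiency.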
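Quicksort is not a stable sort. Equally sorted strings must keep their input order, so use a stable sort (such as merge sort) or break ties by input position.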
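Stability matters only for equal keys. In the small example below, CA, GA and TA each have one inversion, so the required output keeps them in input order. When only an unstable sort (such as a typical quicksort) is available, a common workaround is to add the original index as a secondary key. The Python sketch below illustrates that workaround; the function names are hypothetical. Python's own sorted() is already stable, so here it only demonstrates the keying, which would make the order deterministic under any comparison sort.

```python
from typing import List


def unsortedness(dna: str) -> int:
    """Inversion count over {A, C, G, T}: add counts of greater letters seen earlier."""
    seen = {"A": 0, "C": 0, "G": 0, "T": 0}
    total = 0
    for ch in dna:
        seen[ch] += 1
        # Only strictly greater letters form an inversion with ch.
        total += sum(seen[g] for g in "ACGT" if g > ch)
    return total


def order_with_index_tiebreak(strings: List[str]) -> List[str]:
    # Sort positions by (key, original index). No two entries compare equal,
    # so the result does not depend on whether the sort algorithm is stable.
    positions = sorted(range(len(strings)), key=lambda i: (unsortedness(strings[i]), i))
    return [strings[i] for i in positions]


words = ["CA", "AC", "GA", "TA"]
print(order_with_index_tiebreak(words))  # ['AC', 'CA', 'GA', 'TA']
```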