POJ 1007 Solution Analysis

anjiao62582

Tags: Data Structures and Algorithms

Original link: http://www.cnblogs.com/HCOONa/archive/2010/07/12/poj-1007.html

Technorati tags: ACM, POJ

Problem Description

Problem link: POJ 1007 DNA Sorting

DNA Sorting

Time Limit: 1000MS
Memory Limit: 10000K

Total Submissions: 46191
Accepted: 18037

Description

One measure of ``unsortedness'' in a sequence is the number of pairs of entries that are out of order with respect to each other. For instance, in the letter sequence ``DAABEC'', this measure is 5, since D is greater than four letters to its right and E is greater than one letter to its right. This measure is called the number of inversions in the sequence. The sequence ``AACEDGG'' has only one inversion (E and D)---it is nearly sorted---while the sequence ``ZWQM'' has 6 inversions (it is as unsorted as can be---exactly the reverse of sorted).
You are responsible for cataloguing a sequence of DNA strings (sequences containing only the four letters A, C, G, and T). However, you want to catalog them, not in alphabetical order, but rather in order of ``sortedness'', from ``most sorted'' to ``least sorted''. All the strings are of the same length.

Input

The first line contains two integers: a positive integer n (0 < n <= 50) giving the length of the strings; and a positive integer m (0 < m <= 100) giving the number of strings. These are followed by m lines, each containing a string of length n.

Output

Output the list of input strings, arranged from ``most sorted'' to ``least sorted''. Since two strings can be equally sorted, then output them according to the original order.

Sample Input

10 6
AACATGAAGG
TTTTGGCCAA
TTTGGCCAAA
GATCAGATTT
CCCGGGGGGA
ATCGATGCAT

Sample Output

CCCGGGGGGA
AACATGAAGG
GATCAGATTT
ATCGATGCAT
TTTTGGCCAA
TTTGGCCAAA

Source

East Central North America 1998

Solution Analysis

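The straightforward solution compares every pair of letters in an input line to obtain that line's "unsortedness", uses it as the line's key, and then stable-sorts all lines by key. Comparing all pairs in one line costs O(n^2), so the overall time complexity is T(n, m) = O(n^2) * O(m) + O(m log m).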
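The pairwise comparison can be sketched in Python as follows. This is only an illustration of the straightforward approach; the function name `count_inversions_naive` is an assumption made for this example and does not appear in the original post.

```python
def count_inversions_naive(s: str) -> int:
    """Count pairs (i, j) with i < j and s[i] > s[j] by checking every pair."""
    n = len(s)
    inversions = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A larger letter standing before a smaller one is one inversion.
            if s[i] > s[j]:
                inversions += 1
    return inversions
```

Applied to the examples in the problem statement, it gives 5 for "DAABEC", 1 for "AACEDGG" and 6 for "ZWQM".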

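The complexity of the straightforward algorithm is unsatisfying. Note that the input contains only the four letters A, C, G and T. Every earlier letter that is larger than the current letter forms exactly one inversion with it, so it is enough to know how many times each letter has already occurred. Since n is also small, we can keep these counts in an array (a random-access data structure) and compute the "unsortedness" of a line in a single O(n) pass: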

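For each letter read, increment that letter's occurrence count;
If the letter is T, do nothing else;
If it is G, add the number of T's seen so far to the unsortedness;
If it is C, add the number of T's and G's seen so far to the unsortedness;
If it is A, add the number of T's, G's and C's seen so far to the unsortedness.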
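The counting rules above can be written out for a single line as a minimal Python sketch. The function name `count_inversions_dna` and the dictionary `seen` are assumptions made for illustration, and the code assumes the line contains only the letters A, C, G and T.

```python
def count_inversions_dna(line: str) -> int:
    """Count inversions of an ACGT string in one pass using per-letter counts."""
    seen = {"A": 0, "C": 0, "G": 0, "T": 0}  # occurrences so far
    inversions = 0
    for ch in line:
        seen[ch] += 1
        if ch == "G":
            # Only T is larger than G.
            inversions += seen["T"]
        elif ch == "C":
            # T and G are larger than C.
            inversions += seen["T"] + seen["G"]
        elif ch == "A":
            # T, G and C are larger than A.
            inversions += seen["T"] + seen["G"] + seen["C"]
        # T is the largest letter, so it adds nothing.
    return inversions
```

For the sample input, this yields 10 for AACATGAAGG and 9 for CCCGGGGGGA, which is consistent with CCCGGGGGGA coming first in the sample output.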

The pseudocode for this problem is given below:

Procedure POJ1007 Begin
	Read the numbers n and m
	Dim array as Pair<key, value> array
	For i from 1 to m Begin
		line <- ReadLine
		unsortedness <- 0
		occurs[A] <- 0
		occurs[C] <- 0
		occurs[G] <- 0
		occurs[T] <- 0
		For ch in line Begin
			occurs[ch] <- occurs[ch] + 1
			Switch ch Begin
				case T:
					break
				case G:
					unsortedness <- unsortedness + occurs[T]
					break
				case C:
					unsortedness <- unsortedness + occurs[T] + occurs[G]
					break
				case A:
					unsortedness <- unsortedness + occurs[T] + occurs[G] + occurs[C]
					break
			End Switch
		End For
		put <unsortedness, line> into array
	End For
	stable_sort array by key and output the lines in that order
End Procedure
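For readers who want something runnable, here is a minimal Python sketch of the whole procedure, following the steps described in the analysis. The names `ALPHABET`, `unsortedness` and `solve` are assumptions made for this example. It relies on the fact that Python's built-in `sorted` is stable, so lines with equal keys keep their input order.

```python
ALPHABET = "ACGT"  # the four letters in ascending order


def unsortedness(line: str) -> int:
    """Number of inversions in a string made of the letters in ALPHABET."""
    counts = [0] * len(ALPHABET)  # occurrences of each letter so far
    total = 0
    for ch in line:
        rank = ALPHABET.index(ch)
        # Every larger letter already seen forms one inversion with ch.
        total += sum(counts[rank + 1:])
        counts[rank] += 1
    return total


def solve(text: str) -> str:
    """Take the full input as text and return the lines from most to least sorted."""
    tokens = text.split()
    _n, m = int(tokens[0]), int(tokens[1])  # string length (unused), string count
    lines = tokens[2:2 + m]
    # sorted() is stable, so ties are output in their original order.
    return "\n".join(sorted(lines, key=unsortedness))
```

To use it as a submission-style program, one could call `print(solve(sys.stdin.read()))` after `import sys`. With the alphabet fixed at four letters, the work per line is O(n), and the sort adds O(m log m) key comparisons.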