In bioinformatics, the author treats DNA as a sequence over the alphabet $\Sigma = \{A, C, G, T\}$, or, in computer science terms, a string.
The book covers many string techniques, especially for handling imperfect data, so I am reading it carefully in my spare time.
§1 Exact Matching
1.1.1 Some basic definitions
Given a string $P$ called the pattern and a longer string $T$ called the text, the exact matching problem is to find all occurrences, if any, of pattern $P$ in text $T$.
For example, if $P = \text{“aba”}$ and $T = \text{“cabac”}$, then $P$ occurs in $T$, starting at position 2.
Definition (string) A string $S$ is an ordered list of characters written contiguously from left to right.
Definition of a string
Definition (substring) For any string $S$, $S[i..j]$ is the (contiguous) substring of $S$ that starts at position $i$ and ends at position $j$ of $S$.
Definition of a substring
Definition (prefix/suffix) In particular, $S[1..i]$ is the prefix of string $S$ that ends at position $i$, and $S[j..|S|]$ is the suffix of string $S$ that begins at position $j$, where $|S|$ denotes the number of characters in string $S$.
Definition of prefix and suffix
Definition (empty string) $S[i..j]$ is the empty string if $i > j$.
Definition of the empty string
Definition (proper prefix/proper suffix/proper substring) A proper prefix, suffix, or substring of $S$ is, respectively, a prefix, suffix, or substring that is neither the entire string $S$ nor the empty string.
Definition of proper prefix, suffix, and substring
Definition (i-th character) For any string $S$, $S(i)$ denotes the $i$-th character of string $S$.
Definition of the character function
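As a rough illustration of the definitions above, here is a minimal C++ sketch that maps the 1-based notation onto a 0-based `std::string`. The helper names `sub`, `prefix`, `suffix`, and `charAt` are my own inventions for this note, not from the book, and the sketch assumes $1 \le i$ and $j \le |S|$ without checking.

```cpp
#include <cstddef>
#include <string>

// S[i..j] with 1-based, inclusive positions; the empty string when i > j.
// Assumes 1 <= i and j <= |S| (no bounds checking in this sketch).
std::string sub(const std::string &S, std::size_t i, std::size_t j) {
    if (i > j) return "";
    return S.substr(i - 1, j - i + 1);
}

// S[1..i]: the prefix of S that ends at position i.
std::string prefix(const std::string &S, std::size_t i) {
    return sub(S, 1, i);
}

// S[j..|S|]: the suffix of S that begins at position j.
std::string suffix(const std::string &S, std::size_t j) {
    return sub(S, j, S.length());
}

// S(i): the i-th character of S, counting from 1.
char charAt(const std::string &S, std::size_t i) {
    return S[i - 1];
}
```

With $S = \text{“cabac”}$, `sub(S, 2, 4)` gives "aba", `prefix(S, 2)` gives "ca", and `suffix(S, 4)` gives "ac".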
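Notation conventions:
I won't go through them… they should all be plain, commonly used symbols. If anyone needs an explanation, I'll post the meaning of some of them here…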
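1.1.2 Brute-force matching
Clearly, if $|P| = n$ and $|T| = m$, the brute-force matching method below takes $\Theta(nm)$ time in the worst case (at most $n(m-n+1)$ character comparisons).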
SUBSTRING(string P, string T)
    for t from 1 to |T|-|P|+1
        matched ← true
        for p from 1 to |P| while matched
            if T(t+p-1) ≠ P(p) then matched ← false
        if matched then return true
    return false
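#include <cstddef>
#include <string>
using std::string;

// Returns true if P occurs in T as a contiguous substring (0-based indices).
bool substring(const string &P, const string &T) {
    if (P.length() > T.length()) return false;  // avoid unsigned underflow below
    const std::size_t endRef = T.length() - P.length();  // last valid start index
    for (std::size_t i = 0; i <= endRef; i++) {
        bool matched = true;
        for (std::size_t j = 0; matched && j < P.length(); j++) {
            if (P[j] != T[i + j]) matched = false;  // mismatch: give up on this start
        }
        if (matched) return true;
    }
    return false;
}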
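The exact matching problem asks for all occurrences, while the function above only reports whether one exists. A minimal sketch of a variant that collects every start position might look like this. The name `allOccurrences` is hypothetical, positions are reported 1-based to match the definitions, and treating an empty pattern as having no occurrences is my own assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the 1-based start positions of every occurrence of P in T,
// using the same brute-force character comparison.
// Assumption: an empty pattern is treated as having no occurrences.
std::vector<std::size_t> allOccurrences(const std::string &P, const std::string &T) {
    std::vector<std::size_t> positions;
    if (P.empty() || P.length() > T.length()) return positions;
    for (std::size_t i = 0; i + P.length() <= T.length(); i++) {
        std::size_t j = 0;
        while (j < P.length() && P[j] == T[i + j]) j++;   // extend the match
        if (j == P.length()) positions.push_back(i + 1);  // convert to 1-based
    }
    return positions;
}
```

For $P = \text{“aba”}$ and $T = \text{“cabac”}$ this returns the single position 2. Overlapping occurrences are all reported, so "aa" in "aaaa" yields positions 1, 2, and 3.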
So we need to try a better algorithm for the exact matching task.
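To make the cost of brute force concrete, the sketch below counts character comparisons over all alignments of $P$ against $T$. The function name `countComparisons` is hypothetical, and unlike a boolean search it does not stop at the first occurrence, so it measures the work of finding all occurrences.

```cpp
#include <cstddef>
#include <string>

// Counts the character comparisons made by a brute-force scan over all
// alignments of P against T (it does not stop at the first occurrence).
std::size_t countComparisons(const std::string &P, const std::string &T) {
    std::size_t comparisons = 0;
    if (P.length() > T.length()) return comparisons;
    for (std::size_t i = 0; i + P.length() <= T.length(); i++) {
        for (std::size_t j = 0; j < P.length(); j++) {
            comparisons++;
            if (P[j] != T[i + j]) break;  // a mismatch ends this alignment
        }
    }
    return comparisons;
}
```

With $P = \text{“aaab”}$ ($n = 4$) and $T$ made of ten copies of 'a' ($m = 10$), each of the $m - n + 1 = 7$ alignments matches three characters before failing on the fourth, giving $7 \times 4 = 28 = n(m-n+1)$ comparisons. Patterns of this shape are what push the brute-force method to its worst case.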
1.1.3 Z-Algorithm
definition Zi(S)(i>1) Z