Problem
As is the case with point mutations, the most common type of sequencing error occurs when a single nucleotide from a read is interpreted incorrectly.
Given: A collection of up to 1000 reads of equal length (at most 50 bp) in FASTA format. Some of these reads were generated with a single-nucleotide error. For each read in the dataset, one of the following applies:
- The read was correctly sequenced and appears in the dataset at least twice (possibly as a reverse complement);
- The read is incorrect: it appears in the dataset exactly once, and its Hamming distance is 1 with respect to exactly one correct read in the dataset (or its reverse complement).
Return: A list of all corrections in the form "[old read]->[new read]". (Each correction must be a single symbol substitution, and you may return the corrections in any order.)
Sample Dataset
>Rosalind_52
TCATC
>Rosalind_44
TTCAT
>Rosalind_68
TCATC
>Rosalind_28
TGAAA
>Rosalind_95
GAGGA
>Rosalind_66
TTTCA
>Rosalind_33
ATCAA
>Rosalind_21
TTGAT
>Rosalind_18
TTTCC
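Before any correction can be attempted, the reads have to be pulled out of the FASTA text. The following is a minimal sketch of one way to do that, assuming the input is available as a single string; the function name `parse_fasta` is a hypothetical helper, not part of the problem statement.

```python
def parse_fasta(text):
    """Return the sequences found in FASTA-formatted text, in file order.

    Hypothetical helper: header lines (starting with '>') are discarded,
    and sequence lines belonging to one record are joined together.
    """
    reads = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current:
                reads.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        reads.append("".join(current))
    return reads


# First three records of the sample dataset.
sample = """>Rosalind_52
TCATC
>Rosalind_44
TTCAT
>Rosalind_68
TCATC
"""
print(parse_fasta(sample))
```

Only the sequences are kept because the expected output refers to reads by their content, not by their FASTA labels.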
Sample Output
TTCAT->TTGAT
GAGGA->GATGA
TTTCC->TTTCA
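The two cases in the problem statement suggest a direct approach: treat a read as correct when it and its reverse complement together occur at least twice, then match every remaining read to the correct read (or reverse complement) at Hamming distance 1. The sketch below is one possible implementation under the guarantees stated in the problem, not a reference solution; the names `reverse_complement`, `hamming`, and `find_corrections` are chosen here for illustration.

```python
from collections import Counter

# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(read):
    """Return the reverse complement of a DNA string."""
    return read.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_corrections(reads):
    """Return corrections as 'old->new' strings, in input order.

    Assumes the guarantees from the problem statement: every read is
    either correct (seen at least twice, counting reverse complements)
    or one substitution away from exactly one correct read.
    """
    counts = Counter(reads)

    correct = set()
    for read in counts:
        rc = reverse_complement(read)
        # Do not count a read twice when it equals its own reverse complement.
        support = counts[read] + (counts[rc] if rc != read else 0)
        if support >= 2:
            correct.add(read)
            correct.add(rc)

    corrections = []
    for read in reads:
        if read in correct:
            continue
        # Sorting only makes the scan order deterministic.
        for candidate in sorted(correct):
            if hamming(read, candidate) == 1:
                corrections.append(f"{read}->{candidate}")
                break
    return corrections


# Reads from the sample dataset, in file order.
sample_reads = [
    "TCATC", "TTCAT", "TCATC", "TGAAA", "GAGGA",
    "TTTCA", "ATCAA", "TTGAT", "TTTCC",
]
for line in find_corrections(sample_reads):
    print(line)
```

On the sample reads this prints the three corrections listed in the sample output. The scan compares each erroneous read against every correct read, which is quadratic in the worst case but comfortably small for at most 1000 reads of at most 50 bp.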