Rosalind Java|Finding a Spliced Motif

A Rosalind programming problem: finding a motif shared by two sequences when the motif is interrupted by introns.

Finding a Spliced Motif

Problem:
A subsequence of a string is a collection of symbols contained in order (though not necessarily contiguously) in the string (e.g., ACG is a subsequence of TATGCTAAGATC). The indices of a subsequence are the positions in the string at which the symbols of the subsequence appear; thus, the indices of ACG in TATGCTAAGATC can be represented by (2, 5, 9).

As a substring can have multiple locations, a subsequence can have multiple collections of indices, and the same index can be reused in more than one appearance of the subsequence; for example, ACG is a subsequence of AACCGGTT in 8 different ways.
Given: Two DNA strings s and t (each of length at most 1 kbp) in FASTA format.
Sample input

>Rosalind_14
ACGTACGTGACG
>Rosalind_18
GTA

Return: One collection of indices of s in which the symbols of t appear as a subsequence of s. If multiple solutions exist, you may return any one.
Sample output

3 8 10
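
The problem statement says that ACG is a subsequence of AACCGGTT in 8 different ways. As a quick check of that claim, here is a minimal sketch that counts the index collections with a one-dimensional dynamic program. The class name SubsequenceCounter and the method countWays are hypothetical names chosen for illustration, and the count is held in a long, which is assumed to be large enough for short examples only (it can overflow for sequences near 1 kbp).

```java
final class SubsequenceCounter {
    // Hypothetical helper: number of index collections at which t appears as a subsequence of s.
    // Assumption: the result fits in a long; this holds for short examples, not for every 1 kbp input.
    static long countWays(String s, String t) {
        // ways[j] = number of ways to match the first j symbols of t using the part of s scanned so far
        long[] ways = new long[t.length() + 1];
        ways[0] = 1; // the empty prefix of t matches in exactly one way
        for (int i = 0; i < s.length(); i++) {
            // Go through j from high to low so that s.charAt(i) is used at most once per match
            for (int j = t.length(); j >= 1; j--) {
                if (s.charAt(i) == t.charAt(j - 1)) {
                    ways[j] += ways[j - 1];
                }
            }
        }
        return ways[t.length()];
    }

    public static void main(String[] args) {
        System.out.println(countWays("AACCGGTT", "ACG")); // prints 8
    }
}
```

Each of A, C and G occurs twice in AACCGGTT and all A's precede all C's, which precede all G's, so the count is 2 x 2 x 2 = 8.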


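The problem gives two sequences and asks us to find, in the longer one, the positions of every base of the shorter one. Put another way: the short sequence is the CDS of the long sequence, the long sequence also contains introns, and we need to report the positions of the CDS bases. (The answer to this problem is not unique.)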

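The approach is as follows:
1. Read the two sequences.
2. Traverse the long and the short sequence with two pointers.
3. Whenever the two bases match, output the 1-based position of that base in the long sequence and move on to the next base of the short sequence.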

The implementation is below:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;

public class Finding_a_Spliced_Motif {
    public static void main(String[] args) {
        ArrayList<String> fasta = BufferedReader2("C:/Users/Administrator/Desktop/rosalind_sseq.txt", "fasta");
        String s = fasta.get(0); // first record: the main sequence
        String t = fasta.get(1); // second record: the subsequence
        ArrayList<Integer> index = new ArrayList<>();
        // Two-pointer scan
        int i = 0; // pointer into the main sequence s
        int j = 0; // pointer into the subsequence t
        // Stop once all of t is matched; the check on i avoids running past the end of s
        while (j < t.length() && i < s.length()) {
            if (t.charAt(j) == s.charAt(i)) {
                index.add(i + 1); // Rosalind positions are 1-based
                j++; // advance in the subsequence
            }
            i++; // always advance in the main sequence
        }
        for (int k = 0; k < index.size(); k++) {
            System.out.print(index.get(k) + " ");
        }
    }

    // Reads a FASTA file. Returns the header lines if choose is "tag",
    // otherwise the sequences, one ArrayList element per record.
    public static ArrayList<String> BufferedReader2(String path, String choose) {
        ArrayList<String> tag = new ArrayList<>();
        ArrayList<String> fasta = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line = reader.readLine();
            StringBuilder sb = new StringBuilder();
            while (line != null) {
                if (line.startsWith(">")) { // header line: a new record begins
                    tag.add(line);
                    // Store the previous record's sequence, joined without line breaks
                    if (sb.length() != 0) {
                        fasta.add(sb.toString());
                        sb.setLength(0); // clear the StringBuilder
                    }
                } else {
                    sb.append(line); // sequence line: append to the current record
                }
                // read next line
                line = reader.readLine();
            }
            fasta.add(sb.toString()); // store the last record
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (choose.equals("tag")) {
            return tag;
        }
        return fasta;
    }
}
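
Because any valid collection of indices is accepted, the greedy scan above prints 3 4 5 for the sample input, not the 3 8 10 shown in the sample output. Both are correct. A small checker makes this easy to confirm; the sketch below is only an illustration, and the class name IndexChecker and the method isValid are hypothetical names that do not appear in the original solution.

```java
final class IndexChecker {
    // Hypothetical helper: true if the 1-based indices are strictly increasing,
    // lie inside s, and the symbols of s at those positions spell out t.
    static boolean isValid(String s, String t, int[] indices) {
        if (indices.length != t.length()) {
            return false;
        }
        int previous = 0; // last accepted position; 0 means "none yet"
        for (int k = 0; k < indices.length; k++) {
            int pos = indices[k];
            if (pos <= previous || pos > s.length()) {
                return false; // not increasing, or outside the sequence
            }
            if (s.charAt(pos - 1) != t.charAt(k)) {
                return false; // wrong base at this position
            }
            previous = pos;
        }
        return true;
    }

    public static void main(String[] args) {
        String s = "ACGTACGTGACG";
        String t = "GTA";
        System.out.println(isValid(s, t, new int[] {3, 8, 10})); // prints true
        System.out.println(isValid(s, t, new int[] {3, 4, 5}));  // prints true
    }
}
```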

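The two-pointer method

The core idea of two-pointer traversal is to scan an array or collection with two pointers, moving in the same direction or in opposite directions, instead of accessing it through a single pointer. It takes advantage of the ordering of the data, which simplifies the computation in some cases. The key to implementing it is deciding when each pointer advances and when the loop stops. In this problem the match condition is that the two bases are equal, fasta.get(1).charAt(j) == fasta.get(0).charAt(i): on a match the index is recorded and j advances, while i advances on every step. The loop terminates once j reaches the length of the second sequence.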
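
To make the advance and stop conditions easier to see, the same scan can be isolated in a method that works on in-memory strings. This is a minimal sketch, not the code from the solution: the class name TwoPointerSketch and the method findIndices are hypothetical, and returning an empty list when t is not a subsequence of s is an assumption made here for illustration (an empty t also yields an empty list).

```java
import java.util.ArrayList;
import java.util.List;

final class TwoPointerSketch {
    // Hypothetical helper: leftmost 1-based positions in s that spell out t,
    // or an empty list if t is not a subsequence of s (assumed convention).
    static List<Integer> findIndices(String s, String t) {
        List<Integer> indices = new ArrayList<>();
        int i = 0; // pointer into the main sequence s
        int j = 0; // pointer into the subsequence t
        // Stop when t is fully matched or s is exhausted
        while (i < s.length() && j < t.length()) {
            if (s.charAt(i) == t.charAt(j)) {
                indices.add(i + 1); // record the 1-based position
                j++;                // advance in t only on a match
            }
            i++;                    // advance in s on every step
        }
        return j == t.length() ? indices : new ArrayList<>();
    }

    public static void main(String[] args) {
        System.out.println(findIndices("ACGTACGTGACG", "GTA")); // prints [3, 4, 5]
    }
}
```

Both pointers only move forward, so the scan reads each base of s at most once.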
