Rosalind Java| RNA Splicing

学术程稻属

Column: Rosalind Java. Tags: java, programming language

Article link: https://blog.csdn.net/m0_64240043/article/details/123466478

A Rosalind programming problem: RNA splicing.

RNA Splicing

Problem
After identifying the exons and introns of an RNA string, we only need to delete the introns and concatenate the exons to form a new string ready for translation.

Given: A DNA string s (of length at most 1 kbp) and a collection of substrings of s acting as introns. All strings are given in FASTA format.
Sample input:

>Rosalind_10
ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG
>Rosalind_12
ATCGGTCGAA
>Rosalind_15
ATCGGTCGAGCGTGT

Return: A protein string resulting from transcribing and translating the exons of s. (Note: Only one solution will exist for the dataset provided.)

Sample output:

MVYIADKQHVASREAYGHMFKVCA
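The operation the problem asks for, deleting each intron and concatenating the remaining exons, fits in a few lines of plain Java. This is a minimal sketch, not the post's own code: the names SpliceSketch and splice are hypothetical, it removes only the first occurrence of each intron in the order given (the same choice the post's cutSplice method makes), and List.of assumes Java 9 or later.

```java
import java.util.List;

public class SpliceSketch {
    // Hypothetical helper: remove the first occurrence of each intron, in the
    // order given, and return the concatenated exons. Introns that are not
    // found are skipped and the sequence is left unchanged for them.
    static String splice(String dna, List<String> introns) {
        String result = dna;
        for (String intron : introns) {
            int start = result.indexOf(intron);
            if (start >= 0) {
                // Keep [0, start) and [start + intron.length(), end), dropping the intron
                result = result.substring(0, start) + result.substring(start + intron.length());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample dataset from the problem statement
        String dna = "ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG";
        List<String> introns = List.of("ATCGGTCGAA", "ATCGGTCGAGCGTGT");
        System.out.println(splice(dna, introns));
    }
}
```

On the sample dataset this should print the 75-nt exon sequence ATGGTCTACATAGCTGACAAACAGCACGTAGCATCTCGAGAGGCATATGGTCACATGTTCAAAGTTTGCGCCTAG, that is, the 100-nt input minus the 10-nt and 15-nt introns.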

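As is well known, on the way from a DNA sequence to a mature RNA the introns must be spliced out of the transcript, and RNAs that serve several functions may even remove introns selectively, which is known as alternative splicing. This problem gives us the sequence to be spliced (the first FASTA record) and the introns to remove (the remaining records), and asks us to return the spliced sequence translated into protein. The approach is as follows: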

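1. Read the input file and extract the sequence to be spliced and the intron sequences.
2. Locate each intron in the sequence to be spliced and cut out the matching fragment.
3. Join the fragments on either side of each cut.
4. Transcribe the DNA into RNA, then translate the RNA into protein.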
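Step 4 is where the post relies on BioJava. As a dependency-free alternative (a swapped-in technique, not what the post uses), transcription is a T-to-U replacement and translation is a lookup in the standard genetic code. The sketch below assumes NCBI translation table 1 packed into a 64-character string indexed by 16 * first + 4 * second + third base (bases ordered U, C, A, G), and it stops at the first stop codon because the expected Rosalind output omits the stop. The class name TranslateSketch is hypothetical.

```java
public class TranslateSketch {
    // Standard genetic code (NCBI translation table 1), 64 entries.
    // Index = 16 * first + 4 * second + third, with bases ordered U, C, A, G;
    // '*' marks a stop codon.
    private static final String AMINO =
            "FFLLSSSSYY**CC*W"     // first base U
            + "LLLLPPPPHHQQRRRR"   // first base C
            + "IIIMTTTTNNKKSSRR"   // first base A
            + "VVVVAAAADDEEGGGG";  // first base G
    private static final String BASES = "UCAG";

    // Transcription: every thymine (T) becomes uracil (U).
    static String transcribe(String dna) {
        return dna.replace('T', 'U');
    }

    // Translate codon by codon, stopping at the first stop codon.
    // A trailing partial codon is ignored.
    static String translate(String rna) {
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= rna.length(); i += 3) {
            int b1 = BASES.indexOf(rna.charAt(i));
            int b2 = BASES.indexOf(rna.charAt(i + 1));
            int b3 = BASES.indexOf(rna.charAt(i + 2));
            if (b1 < 0 || b2 < 0 || b3 < 0) {
                throw new IllegalArgumentException("Not an RNA codon: " + rna.substring(i, i + 3));
            }
            char aa = AMINO.charAt(16 * b1 + 4 * b2 + b3);
            if (aa == '*') {
                break;
            }
            protein.append(aa);
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        // Exons of the sample dataset after both introns have been removed
        String exons = "ATGGTCTACATAGCTGACAAACAGCACGTAGCATCTCGAGAGGCATATGGTCACATGTTCAAAGTTTGCGCCTAG";
        System.out.println(translate(transcribe(exons))); // prints MVYIADKQHVASREAYGHMFKVCA
    }
}
```

Packing the table into one string keeps the sketch short; a Map<String, Character> from codon to amino acid would be the more explicit choice if readability matters more than brevity.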

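The core of the splicing is the cutSplice method: it scans the sequence to be spliced for an intron and, on the first match, uses substring to join the pieces on either side of it. The other two helpers handle the input: readFileContent reads the input file into a single string, and removeIndexRosalind returns an array without its first element, which is used first to discard the empty string that split leaves in front of the first ">Rosalind" label and then to separate the sequence to be spliced (the first record) from the introns. Finally, transcription and translation into protein use BioJava, as in an earlier post in this series: Rosalind Java| Translating RNA into Protein. The implementation is as follows: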

import org.biojava.nbio.core.exceptions.CompoundNotFoundException;
import org.biojava.nbio.core.sequence.ProteinSequence;
import org.biojava.nbio.core.sequence.RNASequence;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

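public class RNA_Splicing {
    public static void main(String[] args) {
        // 1. Read the FASTA file and strip the labels (the ">Rosalind_xxxx" header lines)
        String path = args.length > 0 ? args[0] : "C:/Users/Administrator/Desktop/rosalind_splc.txt";
        String s = readFileContent(path);
        String[] DNA = s.split(">Rosalind_\\d*");
        // split() leaves an empty string before the first header; drop it
        String[] DNA_arr = removeIndexRosalind(DNA);

        // 2. Locate and delete the introns
        // 2.1 The first record is the sequence to be spliced; the remaining records are the introns
        String DNA_main = DNA_arr[0];
        String[] DNA_splice = removeIndexRosalind(DNA_arr);

        // 2.2 Cut each intron out of the main sequence
        for (String s1 : DNA_splice) {
            DNA_main = cutSplice(s1, DNA_main);
        }

        // 3. Transcribe the remaining exons (T -> U) and translate them into protein.
        //    Everything stays inside the try block so a failed RNASequence
        //    construction cannot lead to a NullPointerException.
        String RNA = DNA_main.replace('T', 'U');
        try {
            RNASequence rna = new RNASequence(RNA);
            ProteinSequence pro = rna.getProteinSequence();
            System.out.println(pro);
        } catch (CompoundNotFoundException e) {
            e.printStackTrace();
        }
    }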

    // Helper methods
    // 1. Read a text file and return its content as one string (line breaks are dropped)
    public static String readFileContent(String fileName) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader automatically, even on error
        try (BufferedReader reader = new BufferedReader(new FileReader(new File(fileName)))) {
            String tempStr;
            while ((tempStr = reader.readLine()) != null) {
                sb.append(tempStr.trim());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return sb.toString();
    }

    // 2. Return a copy of arr without its first element (index 0).
    //    Used to drop the empty string produced by split() and to separate
    //    the sequence to be spliced from the introns.
    public static String[] removeIndexRosalind(String[] arr) {
        // Index of the element to remove
        int index = 0;
        // New array holding the remaining elements
        String[] newArr = new String[arr.length - 1];
        // Write position in newArr
        int j = 0;
        // Copy every element except arr[index], preserving the original order
        for (int i = 0; i < arr.length; i++) {
            if (i != index) {
                newArr[j] = arr[i];
                j++;
            }
        }
        return newArr;
    }

    // 3. Find the intron in the main sequence and return the main sequence with it cut out
    public static String cutSplice(String splice, String main) {
        // There are main.length() - splice.length() + 1 possible start positions
        for (int i = 0; i < main.length() - splice.length() + 1; i++) {
            // Compare the window starting at i with the intron
            if (main.substring(i, i + splice.length()).equals(splice)) {
                // Join the part before the match [0, i) and the part after it [i + splice.length(), end)
                return main.substring(0, i) + main.substring(i + splice.length());
            }
        }
        // Intron not found: return the sequence unchanged
        return main;
    }
}
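The code above splits the concatenated file on the regex ">Rosalind_\\d*", which works because every header in a Rosalind dataset has exactly that form and the lines are joined without separators. A more general approach, sketched here under the assumption that the input is ordinary FASTA (a '>' header line followed by one or more sequence lines), keeps each record's ID and preserves file order with a LinkedHashMap. FastaSketch and parseFasta are hypothetical names; in practice the lines could come from java.nio.file.Files.readAllLines(path).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FastaSketch {
    // Hypothetical parser: map each record ID (header text after '>') to its
    // sequence, joining wrapped sequence lines and keeping file order.
    // Blank lines are skipped; lines before the first header are ignored.
    static Map<String, String> parseFasta(List<String> lines) {
        Map<String, String> records = new LinkedHashMap<>();
        String id = null;
        StringBuilder seq = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                if (id != null) {
                    records.put(id, seq.toString()); // close the previous record
                }
                id = line.substring(1).trim();
                seq.setLength(0);
            } else {
                seq.append(line);
            }
        }
        if (id != null) {
            records.put(id, seq.toString()); // close the last record
        }
        return records;
    }

    public static void main(String[] args) {
        // Sample dataset, with the long sequence wrapped over two lines
        List<String> lines = List.of(
                ">Rosalind_10",
                "ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGG",
                "CATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG",
                ">Rosalind_12",
                "ATCGGTCGAA",
                ">Rosalind_15",
                "ATCGGTCGAGCGTGT");
        List<String> seqs = new ArrayList<>(parseFasta(lines).values());
        String main = seqs.get(0);                            // sequence to be spliced
        List<String> introns = seqs.subList(1, seqs.size());  // remaining records
        System.out.println(main.length() + " nt, " + introns.size() + " introns"); // prints: 100 nt, 2 introns
    }
}
```

Because the map keeps insertion order, the first value is the sequence to be spliced and the rest are the introns, the same roles the post assigns with removeIndexRosalind, but without depending on the "Rosalind_" ID format.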
