Rosalind
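Rosalind is a website for learning bioinformatics through programming. It poses a series of biological problems and guides users to solve each one in code. The site officially recommends Python, and thanks to Python's rich ecosystem of open-source packages, Python solutions are already widely covered on CSDN. My main language is Java, so I'm using the CSDN community to keep some simple Java study notes. If anything falls short, please point it out~
First, the problem description: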
Counting DNA Nucleotides
Problem
A string is simply an ordered collection of symbols selected from some alphabet and formed into a word; the length of a string is the number of symbols that it contains.
An example of a length 21 DNA string (whose alphabet contains the symbols ‘A’, ‘C’, ‘G’, and ‘T’) is “ATGCTTCAGAAAGGTCTTACG.”
Given: A DNA string s of length at most 1000 nt.
Sample input:
GAGTTAGGAATTCGGTCGCGAAACTTGCGATCGTGTTACGGCCTGGGTCATTACGAAATTCCCTAGCTCCGCAGTGTTCCTGGAGTGCCATGTCGTCGCTGCCATGCTCAACCGAGAGCAGCCCGTACTACGTGTCTGGTCCTTACAGGACCTAAGCGAATCAATGTGACTACTTTCATAGGTAGGGTTTGTCGTGTCATGGATACGTCTGACAACAACTCGTGGTTGGGGCTGCGCGCATTGATTGTGAGCGAAATACTCGCAAACCAGACTGTCTGAGATAGTCACATCAGACAACCCTGGTCTCTAGCAAAAATCGTTTTCCTATAAATCACGTAACGCAGTAATCTTCAGGCCTCGCGCCAGTCCGCGATAAGACCATCCCCTGCCCTATCCCGCTAATGGCGAGACCCAAAGGACGAGCCTACGACGATCATCGGACAATCAAGGCGGAAGACGCTAGCACTGATTCTTCGGCTCCTAGACAGCGTAGATTTCAGCTATCACCATAATTTTGGTCGTACGGAGGCCCTTTCCCTCGTCAACTCTGTGTCCACGTACGTAACCAGCCACGACAGTATCTTAAACTCCATGGGTCATATATTCGTACAAGTCCGTCGATTAAGCGTTATGGGCTGCTAATTAGACTCCACCTATGCAGGAGTTGTTGTACCGCATCGGCGATTATCCGCCACTCGAAGAAGTTTAACTGCCTATTATATCTTTGGAGACACTGGTTATGTTTTAATACGCACGCATCTTAGTTCAACGGGACGTGGGCGCACGAGCTATCCTGCTAGGATACTTCACTCGCTTCAGTCACCTATGTCTAGGCCCACTATAAGCCGTGACAT
Return: Four integers (separated by spaces) counting the respective number of times that the symbols ‘A’, ‘C’, ‘G’, and ‘T’ occur in s.
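Rosalind compares the answer against exactly four integers separated by single spaces, so any label printed in front of the numbers has to be dropped before submitting. The following is a minimal sketch of one way to produce that exact format, assuming the whole sequence is already held in a String. Instead of a switch, it maps each base to an array slot with `"ACGT".indexOf(...)`. The class name `NucleotideCount` and the methods `countBases` and `format` are my own hypothetical names, not part of any library.

```java
public class NucleotideCount {
    // Returns the counts in A, C, G, T order (case-insensitive).
    // Characters other than A/C/G/T (e.g. whitespace) are ignored.
    public static int[] countBases(String s) {
        int[] counts = new int[4];
        for (int i = 0; i < s.length(); i++) {
            // indexOf maps 'A'->0, 'C'->1, 'G'->2, 'T'->3 and anything else to -1
            int idx = "ACGT".indexOf(Character.toUpperCase(s.charAt(i)));
            if (idx >= 0) {
                counts[idx]++;
            }
        }
        return counts;
    }

    // Joins the four counts with single spaces, the shape Rosalind expects.
    public static String format(int[] counts) {
        return counts[0] + " " + counts[1] + " " + counts[2] + " " + counts[3];
    }

    public static void main(String[] args) {
        String s = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC";
        System.out.println(format(countBases(s))); // prints 20 12 17 21
    }
}
```

Using a lookup string keeps the counting loop free of repeated `case` blocks. The trade-off is that invalid characters are skipped silently rather than reported.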
In short: given a nucleotide sequence, return how many times each base occurs, in the order A, C, G, T. My solution is as follows~~
import java.util.Scanner;

public class Counting_DNA_Nucleotides {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter the DNA sequence:"); // read the sequence from the keyboard
        String line = sc.nextLine().trim();            // store it as a String; trim stray spaces at the ends
        sc.close();
        int a = 0;
        int c = 0;
        int g = 0;
        int t = 0;
        for (int i = 0; i < line.length(); i++) {
            // charAt returns a char, so each case label must be a char literal in single
            // quotes ('A'); a double-quoted "A" is a String and would not compile here
            switch (line.charAt(i)) {
                case 'A':
                case 'a':
                    a++;
                    break;
                case 'C':
                case 'c':
                    c++;
                    break;
                case 'G':
                case 'g':
                    g++;
                    break;
                case 'T':
                case 't':
                    t++;
                    break;
                default:
                    System.out.println("Position " + (i + 1) + " is not A, C, G or T");
                    break;
            }
        }
        // When submitting to Rosalind, paste only the four numbers
        System.out.println("ACGT counts: " + a + " " + c + " " + g + " " + t);
    }
}
Sample output:
ACGT counts: 207 229 198 220
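For comparison, here is a sketch of the same count written with `String.chars()` (Java 8+). It is shorter, but it scans the string once per base (four passes) instead of once in total, which makes no practical difference at 1000 nt. Like the switch above, it treats lowercase letters the same as uppercase. The class `StreamCount` and its `count` method are hypothetical names chosen for this example.

```java
public class StreamCount {
    // Counts how many characters of s equal base, ignoring case.
    public static long count(String s, char base) {
        char upper = Character.toUpperCase(base);
        return s.chars()                       // IntStream of UTF-16 char values
                .map(Character::toUpperCase)   // resolves to toUpperCase(int)
                .filter(ch -> ch == upper)
                .count();
    }

    public static void main(String[] args) {
        String line = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC";
        StringBuilder out = new StringBuilder();
        for (char base : "ACGT".toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(count(line, base));
        }
        System.out.println(out); // prints 20 12 17 21
    }
}
```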
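Some readers may not want to enter the sequence by hand, so here is also a version that reads the sequence from a txt file and prints the base counts. Both approaches work; the code below simply adds a file-reading step to the program above.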
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Counting_DNA_Nucleotides_ReadFile {
    public static void main(String[] args) {
        // Path of the input file rosalind_dna.txt downloaded from Rosalind; edit it to match your machine
        String filePath = "C:/Users/Administrator/Desktop/rosalind_dna.txt";
        String line = readFileContent(filePath); // read the file once and reuse the result
        System.out.println("Sequence loaded from file:");
        System.out.println(line);
        int a = 0;
        int c = 0;
        int g = 0;
        int t = 0;
        for (int i = 0; i < line.length(); i++) {
            // charAt returns a char, so each case label must be a char literal in single
            // quotes ('A'); a double-quoted "A" is a String and would not compile here
            switch (line.charAt(i)) {
                case 'A':
                case 'a':
                    a++;
                    break;
                case 'C':
                case 'c':
                    c++;
                    break;
                case 'G':
                case 'g':
                    g++;
                    break;
                case 'T':
                case 't':
                    t++;
                    break;
                default:
                    System.out.println("Position " + (i + 1) + " is not A, C, G or T");
                    break;
            }
        }
        // When submitting to Rosalind, paste only the four numbers
        System.out.println("ACGT counts: " + a + " " + c + " " + g + " " + t);
    }

    // Reads a text file and returns its content as one String, with line breaks removed
    public static String readFileContent(String fileName) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader automatically, even if an exception is thrown
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            String tempStr;
            while ((tempStr = reader.readLine()) != null) {
                sb.append(tempStr.trim()); // readLine drops the line terminator; trim removes stray spaces
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return sb.toString();
    }
}
Sample output:
Sequence loaded from file:
GAGTTAGGAATTCGGTCGCGAAACTTGCGATCGTGTTACGGCCTGGGTCATTACGAAATTCCCTAGCTCCGCAGTGTTCCTGGAGTGCCATGTCGTCGCTGCCATGCTCAACCGAGAGCAGCCCGTACTACGTGTCTGGTCCTTACAGGACCTAAGCGAATCAATGTGACTACTTTCATAGGTAGGGTTTGTCGTGTCATGGATACGTCTGACAACAACTCGTGGTTGGGGCTGCGCGCATTGATTGTGAGCGAAATACTCGCAAACCAGACTGTCTGAGATAGTCACATCAGACAACCCTGGTCTCTAGCAAAAATCGTTTTCCTATAAATCACGTAACGCAGTAATCTTCAGGCCTCGCGCCAGTCCGCGATAAGACCATCCCCTGCCCTATCCCGCTAATGGCGAGACCCAAAGGACGAGCCTACGACGATCATCGGACAATCAAGGCGGAAGACGCTAGCACTGATTCTTCGGCTCCTAGACAGCGTAGATTTCAGCTATCACCATAATTTTGGTCGTACGGAGGCCCTTTCCCTCGTCAACTCTGTGTCCACGTACGTAACCAGCCACGACAGTATCTTAAACTCCATGGGTCATATATTCGTACAAGTCCGTCGATTAAGCGTTATGGGCTGCTAATTAGACTCCACCTATGCAGGAGTTGTTGTACCGCATCGGCGATTATCCGCCACTCGAAGAAGTTTAACTGCCTATTATATCTTTGGAGACACTGGTTATGTTTTAATACGCACGCATCTTAGTTCAACGGGACGTGGGCGCACGAGCTATCCTGCTAGGATACTTCACTCGCTTCAGTCACCTATGTCTAGGCCCACTATAAGCCGTGACAT
ACGT counts: 207 229 198 220
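On Java 11 or later, the `BufferedReader` loop in `readFileContent` can be replaced by a single `Files.readString` call. The sketch below assumes a Java 11+ JDK and an input file that holds one sequence, possibly wrapped over several lines. It strips every whitespace character, including Windows `\r\n` line endings, before counting. The class `ReadWithNio`, its methods, and the fallback file name `rosalind_dna.txt` in the working directory are hypothetical choices made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadWithNio {
    // Reads the whole file (UTF-8) and removes all whitespace,
    // so a sequence split across lines becomes one continuous string.
    public static String readSequence(Path path) throws IOException {
        return Files.readString(path).replaceAll("\\s+", "");
    }

    // Counts A, C, G, T (case-insensitive) and returns them space-separated.
    public static String countAcgt(String seq) {
        int[] n = new int[4];
        for (char ch : seq.toCharArray()) {
            int idx = "ACGT".indexOf(Character.toUpperCase(ch));
            if (idx >= 0) {
                n[idx]++;
            }
        }
        return n[0] + " " + n[1] + " " + n[2] + " " + n[3];
    }

    public static void main(String[] args) throws IOException {
        // The path can be passed on the command line instead of being hard-coded
        Path path = Paths.get(args.length > 0 ? args[0] : "rosalind_dna.txt");
        System.out.println(countAcgt(readSequence(path)));
    }
}
```

Taking the path from `args` avoids editing the source every time the downloaded file moves, for example `java ReadWithNio C:/Users/Administrator/Desktop/rosalind_dna.txt`.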