Rosalind Java|Matching Random Motifs

Rosalind编程问题之计算随机序列出现并匹配待比对序列的概率。
Rosalind Java|Introduction to Random Strings有异曲同工之妙。

Matching Random Motifs

Problem:
Our aim in this problem is to determine the probability with which a given motif (a known promoter, say) occurs in a randomly constructed genome. Unfortunately, finding this probability is tricky; instead of forming a long genome, we will form a large collection of smaller random strings having the same length as the motif; these smaller strings represent the genome’s substrings, which we can then test against our motif.

Given a probabilistic event A, the complement of A is the collection Ac of outcomes not belonging to A. Because Ac takes place precisely when A does not, we may also call Ac “not A.”

For a simple example, if A is the event that a rolled die is 2 or 4, then Pr(A)=13. Ac is the event that the die is 1, 3, 5, or 6, and Pr(Ac)=23. In general, for any event we will have the identity that Pr(A)+Pr(Ac)=1.

Given: A positive integer N≤100000, a number x between 0 and 1, and a DNA string s of length at most 10 bp.
Sample input

90000 0.6
ATAGCCGA

Return: The probability that if N random DNA strings having the same length as s are constructed with GC-content x (see “Introduction to Random Strings”), then at least one of the strings equals s. We allow for the same random string to be created more than once.
Sample output

0.689


本题给出我们90000条长度和ATAGCCGA一样的序列,这些序列有相同的GC比例0.6。需要我们计算90000条序列中至少一条序列等于给出序列的概率(ATAGCCGA)。有些朋友可能提问,给出的GC比例为0.6,并不等于给出序列(ATAGCCGA)的GC比。在这里GC含量可以理解为GC出现的概率,从而反推AT出现的概率。否则两条GC比例不同但长度相同的序列永远不可能相等。

解题思路如下:
1.读取三个输入数据。
2.统计给出序列的GC数和AT数,根据GC比例计算G/C以及A/T出现的概率。
3.结合第二步,计算给出序列与未知序列相同的概率p。
4.结合第三步,计算给出序列与未知序列不相同的概率1-p。
5.至少一个匹配给出序列的概率等于1-(1-p)n,最后四舍五入到三位小数。


代码如下:

public class Matching_Random_Motifs {
    public static void main(String[] args) {
        //1.输入序列个数
        Scanner sc = new Scanner(System.in);
        System.out.println("请输入序列个数n:");
        int n = sc.nextInt();
        //输入GC比例
        Scanner sr = new Scanner(System.in);
        System.out.println("请输入GC比例:");
        Double GC = sr.nextDouble();
        //输入待比对序列
        Scanner sk = new Scanner(System.in);
        System.out.println("请输入示例序列:");
        String DNA = sk.nextLine();

        //2.计算有相同GC比例和长度的序列中等于待比对DNA的概率
        int ATDNA = 0;
        int GCDNA = 0;
        for (int i = 0; i < DNA.length(); i++) {
            switch (DNA.charAt(i)) {
                //括号中的输出类型为char类型,匹配后面的字符时需要将字符加单引号转化为对应的ASCII码方可匹配。
                case 'A':
                case 'T':
                    ATDNA++;
                    break;
                case 'C':
                case 'G':
                    GCDNA++;
                    break;
                default:
                    break;
            }
        }
        double prob = Math.pow(GC / 2, GCDNA) * Math.pow((1 - GC) / 2, ATDNA);
        double randomprob = 1 - Math.pow(1 - prob, n);
        System.out.println((float) (Math.round(randomprob * 1000)) / 1000);//四舍五入时分子一定要转化成浮点型,否则会默认整数型输出并舍为0
    }
}
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值