关于DNA 碱基序列检验的JAVA代码

This assignment focuses on arrays and file/text processing.  Turn in a file named DNA.java. You will also need the two input files dna.txt and ecoli.txt from the course web site.  Save these files in the same folder as your program. The assignment involves processing data from genome files.  Your program should work with the two given input files.  If you are curious (this is not required), the National Center for Biotechnology Information publishes many other bacteria genome files.  The last page tells you how to use your program to process other published genome files.


Background Information About DNA:
Note: This section explains some information from the field of biology that is related to this assignment. It is for your information only; you do not need to fully understand it to complete the assignment.

Deoxyribonucleic acid (DNA) is a complex biochemical macromolecule that carries genetic information for cellular life forms and some viruses.  DNA is also the mechanism through which genetic information from parents is passed on during reproduction.  DNA consists of long chains of chemical compounds called nucleotides.  Four nucleotides are present in DNA: Adenine (A), Cytosine (C), Guanine (G), and Thymine (T).  DNA has a double-helix structure (see diagram below) containing complementary chains of these four nucleotides connected by hydrogen bonds.  Certain regions of the DNA are called genes.  Most genes encode instructions for building proteins (they're called "protein-coding" genes).  These proteins are responsible for carrying out most of the life processes of the organism.   Nucleotides in a gene are organized into codons.  Codons are groups of three nucleotides and are written as the first letters of their nucleotides (e.g., TAC or GGA).  Each codon uniquely encodes a single amino acid, a building block of proteins. The process of building proteins from DNA has two major phases called transcription and translation, in which a gene is replicated into an intermediate form called mRNA, which is then processed by a structure called a ribosome to build the chain of amino acids encoded by the codons of the gene. 
The sequences of DNA that encode proteins occur between a start codon (which we will assume to be ATG) and a stop codon (which is any of TAA, TAG, or TGA).  Not all regions of DNA are genes; large portions that do not lie between a valid start and stop codon are called intergenic DNA and have other (possibly unknown) functions.  Computational biologists examine large DNA data files to find patterns and important information, such as which regions are genes.  Sometimes they are interested in the percentages of mass accounted for by each of the four nucleotide types.  Often high percentages of Cytosine (C) and Guanine (G) are indicators of important genetic data. For more information, visit the Wikipedia page about DNA: 点击打开链接 点击打开链接

In this assignment you read an input file containing named sequences of nucleotides and produce information about them.  For each nucleotide sequence, your program counts the occurrences of each of the four nucleotides (A, C, G, and T).  The program also computes the mass percentage occupied by each nucleotide type, rounded to one digit past the decimal point.  Next the program reports the codons (trios of nucleotides) present in each sequence and predicts whether or not the sequence is a protein-coding gene.  For us, a protein-coding gene is a string that matches all of the following constraints*:

• begins with a valid start codon (ATG)
• ends with a valid stop codon (one of the following: TAA, TAG, or TGA)
• contains at least 5 total codons (including its initial start codon and final stop codon)
• Cytosine (C) and Guanine (G) combined account for at l
  • 2
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值