Huaqiao University Courseware Series: MATLAB (Bioinformatics Study)
MATLAB Bioinformatics Toolbox
(1) Data Formats and Databases
① getgenbank and genbankread
Example 1: S = getgenbank('M10051')
Returns: S = a structure containing the fields of the GenBank record
Example 2: mitochondria = getgenbank('NC_001807','SequenceOnly',true)
Returns:
mitochondria =
gatcacaggtctatcaccctattaaccactcacgggagctctccatgcat
ttggtattttcgtctggggggtgtgcacgcgatagcattgcgagacgctg
gagccggagcaccctatgtcgcagtatctgtctttgattcctgcctcatt
ctattatttatcgcacctacgttcaatattacaggcgaacatacctacta
aagt . . .
Example: getgenbank('nm_000520', 'ToFile', 'TaySachs_Gene.txt')
s = genbankread('TaySachs_Gene.txt')
Returns: s =
LocusName: 'NM_000520'
LocusSequenceLength: '2255'
LocusNumberofStrands: ''
LocusTopology: 'linear'
LocusMoleculeType: 'mRNA'
LocusGenBankDivision: 'PRI'
LocusModificationDate: '23-SEP-2005'
………………
② fastawrite and fastaread
Example 1:
seq = getgenbank('NM_000546')
codingSeq = seq.Sequence(203:1384)
fastawrite('p53coding.txt','Coding region for p53',codingSeq);
Example 2: p53nt = fastaread('p53coding.txt')
Example 3: Save multiple sequences.
data(1).Sequence = 'ACACAGGAAA'
data(1).Header = 'First sequence'
data(2).Sequence = 'ACGTCAGGTC'
data(2).Header = 'Second sequence'
fastawrite('my_sequences.txt', data)
type('my_sequences.txt')
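The FASTA examples can be combined into one round trip: write two records, read them back, and inspect the result. This is a minimal sketch that assumes the Bioinformatics Toolbox is installed. The temporary file name is an illustrative choice, not part of the original examples.

```matlab
% Sketch: write two records to a FASTA file, then read them back.
% Assumes the Bioinformatics Toolbox; the temporary file name is illustrative.
data(1).Header   = 'First sequence';
data(1).Sequence = 'ACACAGGAAA';
data(2).Header   = 'Second sequence';
data(2).Sequence = 'ACGTCAGGTC';

% Use a fresh file name so the result does not depend on an existing file.
fname = [tempname '.txt'];
fastawrite(fname, data);

% fastaread returns a structure array with Header and Sequence fields.
back = fastaread(fname);
for k = 1:numel(back)
    fprintf('%s: %d nt\n', back(k).Header, length(back(k).Sequence));
end

delete(fname);   % remove the temporary file
```

Each element of back holds one record, so back(2).Sequence gives the second sequence as a character vector.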
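(2) Sequence Conversion
① aa2nt and nt2aa (amino acid sequence to nucleotide sequence, and back)
② nt2int and int2nt; aa2int and int2aa (letter codes to integer codes, and back)
③ dna2rna and rna2dna (DNA to RNA, and back)
④ baselookup and aminolookup (look up nucleotide and amino acid codes and names)
⑤ seqcomplement (complementary strand)
⑥ seqreverse (reversed sequence)
⑦ seqrcomplement (reverse complement)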
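The conversion functions listed above can be tried on one short sequence. The sketch below assumes the Bioinformatics Toolbox is installed. The six-base fragment is made up for illustration and does not come from the course material.

```matlab
% Sketch of the conversion functions on a short, made-up DNA fragment.
% Assumes the Bioinformatics Toolbox is installed.
dna = 'ATGGCT';

rna  = dna2rna(dna);          % replace T with U
back = rna2dna(rna);          % replace U with T
prot = nt2aa(dna);            % translate with the standard genetic code
idx  = nt2int(dna);           % integer codes: A=1, C=2, G=3, T=4
comp = seqcomplement(dna);    % complement, same orientation
rev  = seqreverse(dna);       % same bases in reversed order
rc   = seqrcomplement(dna);   % reverse complement

fprintf('%s %s %s %s %s\n', rna, prot, comp, rev, rc);
```

For this fragment the printed line is AUGGCU MA TACCGA TCGGTA AGCCAT. Note that seqrcomplement gives the same result as applying seqreverse to the output of seqcomplement.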
Statistics functions in the MATLAB Bioinformatics Toolbox
1. basecount (count nucleotides in a sequence)
Example: bases = basecount('TAGCTGGCCAAGCGAGCTTG')
Result: bases =
A: 4
C: 5
G: 7
T: 4
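As a check on the listing above, the same counts can be reproduced with core MATLAB only. This is a sketch of what the count amounts to, not the toolbox's own implementation. The GC fraction at the end is an added illustration of how the counts are typically used.

```matlab
% Sketch: recount the bases with core MATLAB only (no toolbox needed).
seq = 'TAGCTGGCCAAGCGAGCTTG';
letters = 'ACGT';

counts = zeros(1, numel(letters));
for k = 1:numel(letters)
    counts(k) = sum(seq == letters(k));   % positions equal to this base
end

% GC fraction: (C + G) divided by the sequence length.
gc = (counts(2) + counts(3)) / length(seq);

fprintf('A=%d C=%d G=%d T=%d, GC fraction = %.2f\n', counts, gc);
```

This prints A=4 C=5 G=7 T=4, GC fraction = 0.60, in agreement with the basecount result. With the toolbox, the individual counts are available as fields of the returned structure, for example bases.G.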
2. dimercount (count dimers in a sequence)
Example: dimercount('TAGCTGGCCAAGCGAGCTTG')
Result: ans =
AA: 1
AC: 0
AG: 3
AT: 0
CA: 1
CC: 1