Biopython -- SeqRecord

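This post shows how to work with sequences using Biopython's Bio.Seq module. It covers creating Seq objects, basic operations such as the complement, the reverse complement, slicing and counting, and computing GC content. It also shows that a Seq object can be handled much like a string, and how to transcribe and translate it. Finally, it demonstrates basic use of the SeqIO module to read GenBank and FASTA files.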

Creating a Seq object

from Bio.Seq import Seq
my_seq = Seq("AGTACACTGGT")  # a Seq object behaves much like a string, but it is not a Python str
my_seq
Seq('AGTACACTGGT', Alphabet())

Basic operations

Complement

my_seq.complement()  # complementary strand: A<->T, C<->G
Seq('TCATGTGACCA', Alphabet())

Reverse complement

my_seq.reverse_complement()
Seq('ACCAGTGTACT', Alphabet())
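To make these two operations concrete, here is a minimal pure-Python sketch of what they compute. It assumes an uppercase DNA string containing only A, C, G and T; the helper names `complement_str` and `reverse_complement_str` are hypothetical, and this is not Biopython's own implementation (which also handles lowercase letters and IUPAC ambiguity codes).

```python
# Hypothetical helpers for uppercase, unambiguous DNA (A, C, G, T only).
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_str(seq: str) -> str:
    """Swap each base for its Watson-Crick partner: A<->T, C<->G."""
    return seq.translate(DNA_COMPLEMENT)


def reverse_complement_str(seq: str) -> str:
    """Complement, then reverse, so the result reads 5'->3' on the opposite strand."""
    return complement_str(seq)[::-1]


print(complement_str("AGTACACTGGT"))          # TCATGTGACCA
print(reverse_complement_str("AGTACACTGGT"))  # ACCAGTGTACT
```

The results match the two Seq outputs above; the reverse complement is simply the complement read backwards.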

Iterating over the sequence

for index, letter in enumerate(my_seq):
    print("%i %s" % (index, letter))
0 A
1 G
2 T
3 A
4 C
5 A
6 C
7 T
8 G
9 G
10 T

Slicing

my_seq[0:2]
Seq('AG', Alphabet())
my_seq[0::2]  # every second letter, just like string slicing
Seq('ATCCGT', Alphabet())

Counting characters or subsequences

my_seq.count("A")
3
100 * float(my_seq.count("G") + my_seq.count("C")) / len(my_seq)
45.45454545454545

Getting the GC content

from Bio.SeqUtils import GC
GC(my_seq)  # same result as above; GC() also copes with mixed case and the ambiguous nucleotide S
45.45454545454545
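The GC percentage can also be sketched in plain Python. The helper `gc_percent` below is hypothetical: it ignores case, also counts the ambiguous code S (G or C), and divides by the full sequence length. Biopython's exact handling of other ambiguous letters may differ, and recent Biopython releases deprecate GC() in favour of Bio.SeqUtils.gc_fraction(), which returns a fraction between 0 and 1 rather than a percentage.

```python
def gc_percent(seq: str) -> float:
    """Percentage of positions that are G, C or S (S = G or C), ignoring case.

    Assumption for this sketch: the denominator is the full length,
    so N and other ambiguous letters count as non-GC.
    """
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    upper = seq.upper()
    gc_count = upper.count("G") + upper.count("C") + upper.count("S")
    return 100 * gc_count / len(seq)


print(gc_percent("AGTACACTGGT"))  # 45.45454545454545
print(gc_percent("agtacactggt"))  # same value: case is ignored
```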

Working with a Seq object like a string

# convert the Seq object into a plain Python string
str(my_seq)
'AGTACACTGGT'
# concatenating (adding) sequences
protein_seq = Seq("EVRNAK")
dna_seq = Seq("ACGT")
protein_seq + dna_seq
Seq('EVRNAKACGT', Alphabet())
dna_seq + "GGG"
Seq('ACGTGGG', Alphabet())
my_seq.lower()  # returns a lowercase copy
my_seq.upper()  # returns an uppercase copy
Seq('AGTACACTGGT', Alphabet())
"AGCT" in my_seq
False
"AGCT" not in my_seq
True

Transcription

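my_seq.transcribe()  # transcribe coding-strand DNA to mRNA (T -> U)
Seq('AGUACACUGGU', RNAAlphabet())
my_seq.reverse_complement().transcribe()  # if my_seq is the template strand, transcribe its reverse complement
Seq('ACCAGUGUACU', RNAAlphabet())
my_seq.transcribe().back_transcribe()  # back-transcribe mRNA to DNA (U -> T)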
Seq('AGTACACTGGT', DNAAlphabet())
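Transcription itself is just a letter substitution on the coding strand, which is why the second example takes the reverse complement first when my_seq is treated as the template strand. A minimal sketch with hypothetical helpers `transcribe_str` and `back_transcribe_str`, assuming the input DNA is already the coding strand:

```python
def transcribe_str(coding_dna: str) -> str:
    """Coding-strand DNA -> mRNA: replace T with U (either case)."""
    return coding_dna.replace("T", "U").replace("t", "u")


def back_transcribe_str(mrna: str) -> str:
    """mRNA -> coding-strand DNA: replace U with T (either case)."""
    return mrna.replace("U", "T").replace("u", "t")


mrna = transcribe_str("AGTACACTGGT")
print(mrna)                       # AGUACACUGGU
print(back_transcribe_str(mrna))  # AGTACACTGGT
```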

Translation

my_seq.translate()  # translate to protein; 11 nt is not a multiple of 3, hence the warning
/home/djs/miniconda3/lib/python3.8/site-packages/biopython-1.69-py3.8-linux-x86_64.egg/Bio/Seq.py:2092: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn("Partial codon, len(sequence) not a multiple of three. "
Seq('STL', ExtendedIUPACProtein())
my_seq.transcribe().translate()  # translating the mRNA gives the same protein as the coding DNA
Seq('STL', ExtendedIUPACProtein())
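# table: NCBI genetic code ID or name, e.g. 1 ("Standard") or 2 ("Vertebrate Mitochondrial")
# to_stop=True: stop translating at the first in-frame stop codon
my_seq.translate(table=1, to_stop=True)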
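Translation reads the sequence three bases at a time and looks each codon up in a genetic code table. The sketch below hard-codes only the standard code (NCBI table 1), built from the usual 64-letter amino-acid string with codon positions ordered T, C, A, G. The function `translate_str` and its simplifications (no ambiguity codes, a trailing partial codon silently dropped instead of warned about) are assumptions for illustration, not Biopython's translate().

```python
# Standard genetic code (NCBI table 1). Codons are enumerated with each
# position cycling through T, C, A, G, matching the order of AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_str(seq: str, to_stop: bool = False) -> str:
    """Translate DNA or mRNA with the standard code; '*' marks a stop codon.

    Sketch assumptions: only A, C, G, T/U are allowed, and a trailing
    partial codon is silently ignored (Biopython warns about it instead).
    """
    dna = seq.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # drop the incomplete last codon
    protein = []
    for pos in range(0, usable, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if to_stop and amino_acid == "*":
            break  # mimic to_stop=True: end at the first stop codon
        protein.append(amino_acid)
    return "".join(protein)


print(translate_str("AGTACACTGGT"))              # STL
print(translate_str("ATGTAAGGG"))                # M*G
print(translate_str("ATGTAAGGG", to_stop=True))  # M
```

Because U is mapped back to T first, the DNA and mRNA forms of my_seq both give STL, matching the outputs above.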

Working with sequence strings directly

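# the module-level functions also accept plain Python strings
from Bio.Seq import reverse_complement, translate, back_transcribe, transcribe
my_string = "TCGATCGATCGATCGA"
reverse_complement(my_string)  # unchanged: this sequence is its own reverse complement
'TCGATCGATCGATCGA'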
translate(my_string)
'SIDRS'

Simple use of the SeqIO module

from Bio import SeqIO
# each item yielded by SeqIO.parse() is a SeqRecord
for seq_record in SeqIO.parse("test.gbk", "genbank"):
    print(seq_record.id)
    print(seq_record.seq)
    print(len(seq_record))  # length of the sequence
for seq_record in SeqIO.parse("test.fa", "fasta"):
    print(seq_record.id)
    print(seq_record.seq)
    print(len(seq_record))  # length of the sequence
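For FASTA, the core of what SeqIO.parse() does can be sketched as a small generator. This hypothetical `parse_fasta` yields plain (id, sequence) tuples rather than SeqRecord objects, takes the id as the first word of the `>` header line (as SeqIO's FASTA parser does), and skips blank lines; it is meant to show the file structure, not to replace SeqIO.

```python
import io
from typing import Iterator, TextIO, Tuple


def parse_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs from FASTA-formatted text."""
    record_id = None
    chunks = []
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header: emit the previous record, if any.
            if record_id is not None:
                yield record_id, "".join(chunks)
            words = line[1:].split()
            record_id = words[0] if words else ""  # id = first word of the header
            chunks = []
        elif record_id is not None:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if record_id is not None:
        yield record_id, "".join(chunks)


# Usage with an in-memory file; a file opened with open("test.fa") works the same way.
demo = io.StringIO(">seq1 first example\nAGTACA\nCTGGT\n>seq2\nTCGA\n")
for rec_id, seq in parse_fasta(demo):
    print(rec_id, seq, len(seq))
# seq1 AGTACACTGGT 11
# seq2 TCGA 4
```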