Pickling a Tasty Jar of Pickles

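  In Four Generations Under One Roof, Old Master Qi has a jar of pickled vegetables that sees the family safely through three months of trouble; Python has a pickle jar that makes Guido van Rossum ("Uncle Turtle") exclaim "amazing". The pickle jar in question is Python's pickle module.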

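  Anyone familiar with file handling in Python will have noticed that reading strings from a file is easy, but reading numbers, lists, dictionaries and other data is a headache, because both the read method and the readline method return a string. For this reason Python provides the pickle module, which can serialize lists, dictionaries and most other Python objects into a binary file stored on disk.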
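
  As a minimal sketch of that difference, the snippet below contrasts the text form of a dictionary with its pickled form. The variable names and sample data are made up for illustration, and pickle.dumps/pickle.loads are used so that no file is needed.

```python
import pickle

# Hypothetical sample data, used only for illustration.
codon_counts = {"ATG": 3, "TAA": 1}

# Writing the dict to a text file and reading it back would give a str like this.
as_text = str(codon_counts)

# pickle.dumps serializes the object to bytes; pickle.loads rebuilds it.
as_bytes = pickle.dumps(codon_counts)
restored = pickle.loads(as_bytes)

print(type(as_text).__name__)   # the text form is only a string
print(type(restored).__name__)  # the unpickled form is a dict again
print(restored["ATG"])
```

  The restored object is an equal copy of the original dictionary, not the same object.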

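  The script below translates a DNA sequence into an amino acid sequence. The codon tables alone take up many lines, which makes paging down and editing the code tedious. If we pickle the codon tables and read them back only when needed, the code becomes cleaner, more concise and easier on the eye.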

  DNA2Protein_v1.0.py

transl_table_1={
"TTT":"F","TCT":"S","TAT":"Y","TGT":"C",
"TTC":"F","TCC":"S","TAC":"Y","TGC":"C",
"TTA":"L","TCA":"S","TAA":"*","TGA":"*",
"TTG":"L","TCG":"S","TAG":"*","TGG":"W",
"CTT":"L","CCT":"P","CAT":"H","CGT":"R",
"CTC":"L","CCC":"P","CAC":"H","CGC":"R",
"CTA":"L","CCA":"P","CAA":"Q","CGA":"R",
"CTG":"L","CCG":"P","CAG":"Q","CGG":"R",
"ATT":"I","ACT":"T","AAT":"N","AGT":"S",
"ATC":"I","ACC":"T","AAC":"N","AGC":"S",
"ATA":"I","ACA":"T","AAA":"K","AGA":"R",
"ATG":"M","ACG":"T","AAG":"K","AGG":"R",
"GTT":"V","GCT":"A","GAT":"D","GGT":"G",
"GTC":"V","GCC":"A","GAC":"D","GGC":"G",
"GTA":"V","GCA":"A","GAA":"E","GGA":"G",
"GTG":"V","GCG":"A","GAG":"E","GGG":"G",
}
transl_table_2={
"TTT":"F","TCT":"S","TAT":"Y","TGT":"C",
"TTC":"F","TCC":"S","TAC":"Y","TGC":"C",
"TTA":"L","TCA":"S","TAA":"*","TGA":"W",
"TTG":"L","TCG":"S","TAG":"*","TGG":"W",
"CTT":"L","CCT":"P","CAT":"H","CGT":"R",
"CTC":"L","CCC":"P","CAC":"H","CGC":"R",
"CTA":"L","CCA":"P","CAA":"Q","CGA":"R",
"CTG":"L","CCG":"P","CAG":"Q","CGG":"R",
"ATT":"I","ACT":"T","AAT":"N","AGT":"S",
"ATC":"I","ACC":"T","AAC":"N","AGC":"S",
"ATA":"M","ACA":"T","AAA":"K","AGA":"*",
"ATG":"M","ACG":"T","AAG":"K","AGG":"*",
"GTT":"V","GCT":"A","GAT":"D","GGT":"G",
"GTC":"V","GCC":"A","GAC":"D","GGC":"G",
"GTA":"V","GCA":"A","GAA":"E","GGA":"G",
"GTG":"V","GCG":"A","GAG":"E","GGG":"G",
}
transl_table_3={
"TTT":"F","TCT":"S","TAT":"Y","TGT":"C",
"TTC":"F","TCC":"S","TAC":"Y","TGC":"C",
"TTA":"L","TCA":"S","TAA":"*","TGA":"W",
"TTG":"L","TCG":"S","TAG":"*","TGG":"W",
"CTT":"T","CCT":"P","CAT":"H","CGT":"R",
"CTC":"T","CCC":"P","CAC":"H","CGC":"R",
"CTA":"T","CCA":"P","CAA":"Q","CGA":"R",
"CTG":"T","CCG":"P","CAG":"Q","CGG":"R",
"ATT":"I","ACT":"T","AAT":"N","AGT":"S",
"ATC":"I","ACC":"T","AAC":"N","AGC":"S",
"ATA":"M","ACA":"T","AAA":"K","AGA":"R",
"ATG":"M","ACG":"T","AAG":"K","AGG":"R",
"GTT":"V","GCT":"A","GAT":"D","GGT":"G",
"GTC":"V","GCC":"A","GAC":"D","GGC":"G",
"GTA":"V","GCA":"A","GAA":"E","GGA":"G",
"GTG":"V","GCG":"A","GAG":"E","GGG":"G",
}
###The remaining 22 codon tables are omitted here. The codon tables in this article were downloaded from NCBI: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1
protein=""
seq=input("Please enter a DNA sequence:\n")
table=input("Please enter the reference codon table,just enter the number: \n")
#Look the table up by its variable name instead of calling eval on user input
table=globals()["transl_table_"+table.strip()]
if len(seq)%3!=0:
    print("Warning:This sequence can't be translated completely.")
#Translate every complete codon; an incomplete trailing codon is skipped
for i in range(0,len(seq)-len(seq)%3,3):
    codon=seq[i:i+3]
    protein+=table[codon]
print(protein)

1. Storing data with pickle's dump method

pickle.dump(data,file)
#The first argument is the data object to store; the second is the destination file object, which must be opened in 'wb' mode
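
  A small, self-contained sketch of the call above. The list contents and the file name stop_codons.pkl are hypothetical, and a temporary directory is used so the example leaves nothing behind in the working directory.

```python
import os
import pickle
import tempfile

# Hypothetical data to store.
stop_codons = ["TAA", "TAG", "TGA"]

# Hypothetical file name inside a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "stop_codons.pkl")

# 'wb' is required because pickle writes bytes, not text.
with open(path, "wb") as pickle_file:
    pickle.dump(stop_codons, pickle_file)

# Reading the file back in binary mode shows that its content is bytes.
with open(path, "rb") as pickle_file:
    raw = pickle_file.read()

print(type(raw).__name__, len(raw) > 0)
```

  Using a with statement means the file is closed even if dump raises an exception.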

  pickle_transl_table.py

import pickle
transl_table_1={
"TTT":"F","TCT":"S","TAT":"Y","TGT":"C",
"TTC":"F","TCC":"S","TAC":"Y","TGC":"C",
"TTA":"L","TCA":"S","TAA":"*","TGA":"*",
"TTG":"L","TCG":"S","TAG":"*","TGG":"W",
"CTT":"L","CCT":"P","CAT":"H","CGT":"R",
"CTC":"L","CCC":"P","CAC":"H","CGC":"R",
"CTA":"L","CCA":"P","CAA":"Q","CGA":"R",
"CTG":"L","CCG":"P","CAG":"Q","CGG":"R",
"ATT":"I","ACT":"T","AAT":"N","AGT":"S",
"ATC":"I","ACC":"T","AAC":"N","AGC":"S",
"ATA":"I","ACA":"T","AAA":"K","AGA":"R",
"ATG":"M","ACG":"T","AAG":"K","AGG":"R",
"GTT":"V","GCT":"A","GAT":"D","GGT":"G",
"GTC":"V","GCC":"A","GAC":"D","GGC":"G",
"GTA":"V","GCA":"A","GAA":"E","GGA":"G",
"GTG":"V","GCG":"A","GAG":"E","GGG":"G",
}
transl_table_2={
"TTT":"F","TCT":"S","TAT":"Y","TGT":"C",
"TTC":"F","TCC":"S","TAC":"Y","TGC":"C",
"TTA":"L","TCA":"S","TAA":"*","TGA":"W",
"TTG":"L","TCG":"S","TAG":"*","TGG":"W",
"CTT":"L","CCT":"P","CAT":"H","CGT":"R",
"CTC":"L","CCC":"P","CAC":"H","CGC":"R",
"CTA":"L","CCA":"P","CAA":"Q","CGA":"R",
"CTG":"L","CCG":"P","CAG":"Q","CGG":"R",
"ATT":"I","ACT":"T","AAT":"N","AGT":"S",
"ATC":"I","ACC":"T","AAC":"N","AGC":"S",
"ATA":"M","ACA":"T","AAA":"K","AGA":"*",
"ATG":"M","ACG":"T","AAG":"K","AGG":"*",
"GTT":"V","GCT":"A","GAT":"D","GGT":"G",
"GTC":"V","GCC":"A","GAC":"D","GGC":"G",
"GTA":"V","GCA":"A","GAA":"E","GGA":"G",
"GTG":"V","GCG":"A","GAG":"E","GGG":"G",
}
transl_table_3={
"TTT":"F","TCT":"S","TAT":"Y","TGT":"C",
"TTC":"F","TCC":"S","TAC":"Y","TGC":"C",
"TTA":"L","TCA":"S","TAA":"*","TGA":"W",
"TTG":"L","TCG":"S","TAG":"*","TGG":"W",
"CTT":"T","CCT":"P","CAT":"H","CGT":"R",
"CTC":"T","CCC":"P","CAC":"H","CGC":"R",
"CTA":"T","CCA":"P","CAA":"Q","CGA":"R",
"CTG":"T","CCG":"P","CAG":"Q","CGG":"R",
"ATT":"I","ACT":"T","AAT":"N","AGT":"S",
"ATC":"I","ACC":"T","AAC":"N","AGC":"S",
"ATA":"M","ACA":"T","AAA":"K","AGA":"R",
"ATG":"M","ACG":"T","AAG":"K","AGG":"R",
"GTT":"V","GCT":"A","GAT":"D","GGT":"G",
"GTC":"V","GCC":"A","GAC":"D","GGC":"G",
"GTA":"V","GCA":"A","GAA":"E","GGA":"G",
"GTG":"V","GCG":"A","GAG":"E","GGG":"G",
}
###The remaining 22 codon tables are omitted here. The codon tables in this article were downloaded from NCBI: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG1
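for i in [1,2,3,4,5,6,9,10,11,12,13,14,16,21,22,23,24,25,26,27,28,29,30,31,33]:
    file_name="transl_table_"+str(i)+".pkl"
    #eval turns the string into the dict that the variable name refers to.
    #Without it only the name string gets pickled: a pickle with roots but no leaves -_-
    sth=eval("transl_table_"+str(i))
    #'wb' mode is required; the with statement closes the file even if dump fails
    with open(file_name,'wb') as pickle_file:
        pickle.dump(sth,pickle_file)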
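
  One possible alternative to eval, sketched under the assumption that the tables are kept in a single dictionary keyed by table number rather than in separately named variables. The two tiny tables and the dump_tables helper are hypothetical stand-ins, not the script's real data.

```python
import os
import pickle
import tempfile

# Truncated stand-ins for the real codon tables (hypothetical, illustration only).
tables = {
    1: {"ATG": "M", "TGA": "*"},
    2: {"ATG": "M", "TGA": "W"},
}

def dump_tables(tables, out_dir):
    """Write each table to out_dir/transl_table_<number>.pkl and return the paths."""
    paths = []
    for number, table in tables.items():
        path = os.path.join(out_dir, f"transl_table_{number}.pkl")
        with open(path, "wb") as pickle_file:
            pickle.dump(table, pickle_file)
        paths.append(path)
    return paths

out_dir = tempfile.mkdtemp()
written = dump_tables(tables, out_dir)
print([os.path.basename(p) for p in written])
```

  Because the loop iterates over the dictionary itself, no string has to be converted back into a variable name.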

2. Reading data with pickle's load method

pickle.load(file)
#The argument is the file object to read from, which must be opened in 'rb' mode
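
  A minimal round trip, assuming a made-up three-codon table and a hypothetical file name in a temporary directory, to show that load returns an object equal to the one that was dumped.

```python
import os
import pickle
import tempfile

# Hypothetical miniature codon table.
table = {"ATG": "M", "TGG": "W", "TAA": "*"}
path = os.path.join(tempfile.mkdtemp(), "mini_table.pkl")

# Store the table first so that there is something to load.
with open(path, "wb") as pickle_file:
    pickle.dump(table, pickle_file)

# 'rb' is required because the file holds bytes, not text.
with open(path, "rb") as pickle_file:
    restored = pickle.load(pickle_file)

print(restored == table)
print(restored["TGG"])
```

  Note that unpickling can execute arbitrary code, so pickle.load should only be used on files from a source you trust, such as files your own script wrote.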

  DNA2Protein_v2.0.py

import pickle
seq=input("Please enter the DNA sequence:\n")
number=input("Please enter the number for the codon table:\n")
file_name="transl_table_"+str(number)+".pkl"
#The with statement closes the file automatically after loading
with open(file_name,'rb') as pickle_file:
    table=pickle.load(pickle_file)
protein=""
if len(seq)%3==0:
    for i in range(0,len(seq),3):
        codon=seq[i:i+3]
        protein+=table[codon]
else:
    print("Warning:This sequence can't be translated completely.")
    for i in range(0,len(seq),3):
        if i+3<len(seq):
            codon=seq[i:i+3]
            protein+=table[codon]
print(protein)
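
  The translation loop in the script can also be written as a small function, which makes it easy to check on short inputs. This is a sketch only: the translate name and the three-entry table are hypothetical, and like the script it skips an incomplete trailing codon and expects upper-case input.

```python
def translate(seq, table):
    """Translate seq codon by codon, ignoring an incomplete trailing codon."""
    usable = len(seq) - len(seq) % 3  # length rounded down to a multiple of 3
    return "".join(table[seq[i:i + 3]] for i in range(0, usable, 3))

# Hypothetical miniature table; a real run would load one of the .pkl files.
mini_table = {"ATG": "M", "TTT": "F", "TAA": "*"}

print(translate("ATGTTTTAA", mini_table))
print(translate("ATGTTTTAAG", mini_table))  # the trailing "G" is skipped
```

  A codon that is missing from the table raises KeyError, exactly as table[codon] does in the script.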

3. The output is shown below

[Figure: screenshot of the run output]

4. Script size comparison

[Figure: file sizes of the two scripts]
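  Saving the data objects as binary files makes the improved script much smaller, and because it loads only the one codon table it needs, it also holds less in memory at run time. The pickle module is really rather nice.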
