ProteinMPNN application paper: reading notes

Paper: Improving Protein Expression, Stability, and Function with ProteinMPNN

I. Background

  1. Natural proteins are highly optimized for function but are often difficult to produce at a scale suitable for biotechnological applications because of poor expression in heterologous systems, limited solubility, and sensitivity to temperature.

  2. In most natural proteins, evolution has optimized function rather than stability; as a result, they often show poor solubility, poor thermostability, and poor expression in heterologous systems, all of which reduce the yield of functional protein.

  3. Experimental methods such as directed evolution have been used extensively to optimize desirable features in proteins, but they are often prohibitively resource- and labor-intensive. Computational tools have been developed to achieve the benefits of directed evolution while minimizing experimental screening.

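Problems natural proteins face in heterologous systems (e.g., a human protein produced in engineered E. coli rather than in the human body):

  1. Poor expression of functional protein

  2. Unfavorable physical properties: low solubility (industrially, such proteins are hard to purify and to restore to an active form, and inside the host they may accumulate as inclusion bodies, i.e., inactive, insoluble aggregates)

  3. Temperature sensitivity (poor thermostability)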

II. Idea

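Key terms: active site (the region of an enzyme where the substrate binds and reacts) / conserved positions (residues that vary little across homologous sequences) / substrate (the molecule an enzyme acts on)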

The design space is chosen to preserve native protein function by fixing the amino acid identities of residues close to the ligand and of those that are highly conserved in multiple sequence alignments.

In all targets, to preserve the catalytic machinery and the substrate-binding site, we fixed the amino acid identities of the first-shell functional positions, defined as those within 7 Å of the substrate in a ligand-bound crystal structure complex.
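The 7 Å first-shell rule amounts to a distance query between protein atoms and ligand atoms. This is a minimal sketch, not the authors' code: it assumes each residue is given as a list of atom coordinates and counts a residue as first shell if any of its atoms lies within the cutoff of any ligand atom. The excerpt does not say which atoms (all heavy atoms, Cα only, side-chain centroid) the paper measured, and first_shell_positions is a hypothetical helper name.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

def first_shell_positions(
    residues: Dict[int, Sequence[Coord]],
    ligand_atoms: Sequence[Coord],
    cutoff: float = 7.0,
) -> List[int]:
    """Return residue numbers with any atom within `cutoff` Å of any ligand atom.

    Assumption: "within 7 Å" is measured atom-to-atom over all listed atoms.
    """
    selected = []
    for resnum, atoms in residues.items():
        # A residue is first shell if its closest atom-ligand distance is <= cutoff.
        if any(math.dist(a, lig) <= cutoff for a in atoms for lig in ligand_atoms):
            selected.append(resnum)
    return sorted(selected)
```

For real structures, the coordinate dictionaries would be filled from a structure parser such as Biopython's Bio.PDB module; the returned positions are the candidates to keep fixed during sequence design.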

With the design space selected, we generated sequences with ProteinMPNN, predicted their structures with AlphaFold2 (ref. 15), and filtered them by predicted local distance difference test score (pLDDT) and by Cα root-mean-square deviation (RMSD) to the input structure.
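The generate, predict, filter loop described above can be sketched as follows. This is a hedged illustration: generate and predict are hypothetical callables standing in for ProteinMPNN sampling and AlphaFold2 prediction (neither tool's real API is used here), the default thresholds are borrowed from the myoglobin filter reported in this paper (pLDDT > 85.0, Cα RMSD < 1.0 Å), and the final ranking rule is an assumption added for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Design:
    sequence: str
    plddt: float    # overall pLDDT of the predicted model (0-100)
    ca_rmsd: float  # Cα RMSD (Å) of the predicted model to the input backbone

def design_and_filter(
    generate: Callable[[int], List[str]],          # hypothetical ProteinMPNN stand-in
    predict: Callable[[str], Tuple[float, float]],  # hypothetical AlphaFold2 stand-in
    n_designs: int,
    min_plddt: float = 85.0,
    max_rmsd: float = 1.0,
) -> List[Design]:
    """Generate sequences, score each prediction, keep those passing both filters."""
    kept = []
    for seq in generate(n_designs):
        plddt, rmsd = predict(seq)
        if plddt > min_plddt and rmsd < max_rmsd:
            kept.append(Design(seq, plddt, rmsd))
    # Assumed ranking: most confident first, then closest to the input backbone.
    kept.sort(key=lambda d: (-d.plddt, d.ca_rmsd))
    return kept
```

Passing the thresholds as parameters keeps the loop reusable across targets, since the paper uses different cutoffs for different proteins.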

III. Experiments

As model systems we chose one of the first proteins whose structure was solved, the oxygen-storage protein myoglobin, and the widely used protease from tobacco etch virus (TEV).

1. Design of Myoglobin Variants with Increased Stability

Applications: Myoglobin binds heme to carry oxygen in mammalian muscle tissue. It is relevant in clinical applications as a biomarker, as a versatile platform for biocatalytic applications, and in food science as an ingredient in artificial meat products. The globin superfamily, of which myoglobin is a member, has a fold made up of eight α-helical regions, with diversity in the termini and in the two loop regions flanking the heme-binding pocket.

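Key terms: binding site (where a ligand binds) / heme (the iron-containing porphyrin cofactor that binds oxygen) / inpainted regions (segments whose backbone is regenerated by the structure model)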


1. Experimental procedure

We applied the ProteinMPNN design protocol described above to a crystal structure of human myoglobin, nMb (PDB: 3RGK). To preserve the oxygen-storage function, we fixed the identities of 17 positions located around the heme ligand in the heme-bound structure.
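Fixing residues in ProteinMPNN means telling the sampler which positions to leave unchanged. The sketch below builds such a specification as one JSON line, assuming the layout written by the ProteinMPNN repository's helper_scripts/make_fixed_positions_dict.py (structure name, then chain ID, then a list of 1-based positions), which protein_mpnn_run.py reads through its --fixed_positions_jsonl option. As we understand that helper, positions count along the parsed chain sequence, which can differ from PDB residue numbering; verify against the repository version you use. The 17 positions below are hypothetical placeholders, not the residues fixed in the paper, and fixed_positions_record is a hypothetical helper.

```python
import json
from typing import Dict, List

def fixed_positions_record(name: str, fixed: Dict[str, List[int]]) -> str:
    """Return one JSON line: {structure name: {chain ID: [1-based fixed positions]}}."""
    # Sort and de-duplicate positions so the output is deterministic.
    cleaned = {chain: sorted(set(positions)) for chain, positions in fixed.items()}
    return json.dumps({name: cleaned})

# Hypothetical placeholder: 17 consecutive positions on chain A,
# standing in for the heme-pocket residues (NOT the paper's actual list).
heme_pocket = list(range(60, 77))
line = fixed_positions_record("3RGK", {"A": heme_pocket})
```

The resulting line would be saved, newline-terminated, to the .jsonl file passed to the design run.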

Results: Sixty sequences were generated with ProteinMPNN and evaluated for their likelihood to recapitulate the myoglobin backbone coordinates using AlphaFold2 single-sequence predictions (see Supporting Information). Eight of the designs did so with high confidence (pLDDT > 85.0 and Cα RMSD < 1.0 Å; an analogous single-sequence prediction of the native sequence yielded pLDDT = 50.6 and Cα RMSD = 7.5 Å). Four designs with close structural agreement in the heme-binding region were selected for experimental testing.

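  1. pLDDT (predicted Local Distance Difference Test): a metric for assessing the quality of a predicted protein structure.

    1. It is based on the Local Distance Difference Test (lDDT) and estimates the structural confidence of each predicted residue (or local region).

    2. pLDDT ranges from 0 to 100. It is the model's own estimate of how well the predicted structure would agree with the true (experimental) structure on the lDDT metric; no experimental structure is needed to compute it. Higher values indicate a more reliable prediction.

  2. RMSD (root-mean-square deviation of Cα atoms): a measure of the overall structural difference between two protein structures.

    1. It is the root-mean-square deviation of the positions of corresponding Cα atoms in the two structures, computed after optimally superimposing them.

    2. It is usually reported in ångströms (Å); the smaller the Cα RMSD, the more similar the two structures.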
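The two metrics can be made concrete with a short sketch. It assumes the two Cα coordinate lists are already paired residue by residue and already optimally superimposed; the superposition step itself (for example the Kabsch algorithm, or Biopython's Bio.PDB.Superimposer) is left out to keep the example dependency-free. It also assumes a design's overall pLDDT is the mean of its per-residue values, since the excerpt does not say how the paper aggregated pLDDT. Both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]

def ca_rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Cα RMSD (Å) between two paired, ALREADY SUPERIMPOSED coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of matching Cα atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))

def mean_plddt(per_residue: Sequence[float]) -> float:
    """Average per-residue pLDDT (0-100) into one score (assumed aggregation)."""
    if not per_residue:
        raise ValueError("no pLDDT values given")
    return sum(per_residue) / len(per_residue)
```

A copy of the same structure shifted rigidly by 1 Å gives ca_rmsd = 1.0 when no superposition is done, which is why the structures must be superimposed before the number is meaningful.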

2. Results

In myoglobin, we performed a limited backbone redesign to further stabilize the structure.

We also explored limited backbone redesign of poorly ordered regions in an attempt to further stabilize the protein.

We selected these less-conserved loop regions for backbone remodeling with RoseTTAFold joint inpainting. We generated two distinct sets of designs with structural remodeling: one with the region joining helices E and F redesigned, and one that additionally included the CD-loop region.

On these remodeled backbones, we again performed sequence design with ProteinMPNN, keeping the heme-binding site fixed as described above.

Results 1: Protein function is preserved

(b) Size-exclusion chromatography (SEC) traces of 20 designed myoglobin variants.
(c) Soluble yield of the myoglobin designs and of native myoglobin (nMb, shown as a red dashed line).

Thirteen of the twenty designs gave a higher total soluble protein yield than native myoglobin (up to a 4.1-fold increase). All 20 designs had heme-binding spectra similar to that of native myoglobin, agreeing in the Soret maximum (407–413 nm vs 409 nm for the native protein) and in the Q-band features (500, 537, 582, and 630 nm), which suggests that the native heme-binding mechanism is preserved.

Results 2: Thermostability is increased

2. Design of TEV Protease Variants with Improved Stability and Catalytic Activity

For TEV protease, we used evolutionary information to further identify residues critical to activity.

TEVd (PDB: 1LVM) input structure with the positions fixed during redesign highlighted: active-site residues surrounding the substrate (blue), the top 50% most highly conserved residues (yellow), and catalytic residues (pink). The inset shows a zoomed-in view of the active-site region.

1. Experimental procedure

We ranked the amino acid identity at each position by its degree of conservation in the sequence alignment, and we varied the percentage of the most highly conserved residues fixed during sequence redesign between 30 and 70%. We generated four distinct sets of designs that fixed the amino acid identities of either the active-site residues alone, or the active-site residues plus 30, 50, or 70% of the most conserved residues in the TEV family (Figure 3A, see Supporting Information).
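Ranking positions by conservation and fixing the top fraction, together with the active site, can be sketched like this. The conservation score here is per-column Shannon entropy, which is an assumption: the excerpt does not name the metric the authors used. The sketch also assumes alignment columns map one-to-one onto residue positions of the structure (no gaps in the query row), and column_entropy and fixed_set are hypothetical helpers. Setting top_fraction to 0.3, 0.5, or 0.7 mirrors the 30, 50, and 70% design sets.

```python
import math
from collections import Counter
from typing import List, Sequence, Set

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return float("inf")  # all-gap column: treat as least conserved
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())

def fixed_set(msa: Sequence[str], active_site: Set[int], top_fraction: float) -> List[int]:
    """Union of active-site positions and the top `top_fraction` most conserved positions.

    Positions are 1-based alignment columns, assumed to equal residue numbering.
    """
    length = len(msa[0])
    columns = ["".join(seq[i] for seq in msa) for i in range(length)]
    # Lower entropy = more conserved; ties broken by position for determinism.
    ranked = sorted(range(length), key=lambda i: (column_entropy(columns[i]), i))
    n_keep = round(top_fraction * length)
    conserved = {i + 1 for i in ranked[:n_keep]}
    return sorted(conserved | set(active_site))
```

With top_fraction=0.0 only the active-site positions are returned, which corresponds to the unconstrained setting that the paper reports as expressing well but losing activity.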

Results: A total of 144 sequences were generated with ProteinMPNN. All of them were predicted by AlphaFold2 with high confidence to fold into the TEV structure (pLDDT > 87.5; native TEV is predicted with pLDDT = 90), and they share 55 to 85% sequence identity with the parent sequence.
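Sequence identity to the parent, as in the 55 to 85% range quoted above, can be computed by a position-by-position comparison. This sketch assumes the design and the parent have the same length and are already in register, which holds for fixed-backbone ProteinMPNN redesign without insertions or deletions; sequences of different lengths would need an alignment first. percent_identity is a hypothetical helper name.

```python
def percent_identity(design: str, parent: str) -> float:
    """Percent of positions with identical residues in two equal-length sequences."""
    if len(design) != len(parent) or not parent:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(design, parent))
    return 100.0 * same / len(parent)
```

For example, percent_identity("ACDE", "ACDF") returns 75.0, since three of the four positions match.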

129 of the 144 designs showed higher soluble expression than TEVd (average TEVd yield = 1 mg/L of culture; average design yield = 20.1 mg/L of culture).

Designs made without evolutionary constraints had better soluble expression than the parent but were not active on the peptide substrate, whereas the designs with the highest activities were made with the top 50% most conserved residues fixed.
