Genome-wide identification of secreted proteins

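1. Signal peptide prediction

Download the SignalP software

SignalP 5.0 (DTU Health Tech): https://services.healthtech.dtu.dk/service.php?SignalP-5.0
Fill in the registration form with your e-mail address (a university or research-institute address), then get the download link from the e-mail you receive.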

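# Download the software (use the link from your own registration e-mail)
wget https://services.healthtech.dtu.dk/download/1d1d8c14-5304-4800-aafe-8e637b01a7c3/signalp-5.0b.Linux.tar.gz

# Extract the archive (the file name must match the one you downloaded)
tar zxf signalp-5.0b.Linux.tar.gz

# Add the bin directory to PATH (adjust the path to wherever you placed the extracted directory)
echo 'export PATH=$PATH:/opt/biosoft/signalp-5.0/bin' >> ~/.bashrc

# Reload the environment variables
source ~/.bashrc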

# Run SignalP on the genome-wide protein set
signalp -batch 30000 -org euk -fasta protein.fa -gff3 -mature

signalp command-line options:

  -batch int
        Number of sequences that the tool will run simultaneously. Decrease or increase size depending on your system memory. (default 10000)
  -fasta string
        Input file in fasta format.
  -format string
        Output format. 'long' for generating the predictions with plots, 'short' for the predictions without plots. (default "short")
  -gff3
        Make gff3 file of processed sequences.
  -mature
        Make fasta file with mature sequence.
  -org string
        Organism. Archaea: 'arch', Gram-positive: 'gram+', Gram-negative: 'gram-' or Eukarya: 'euk' (default "euk")
  -plot string
        Plots output format. When long output selected, choose between 'png', 'eps' or 'none' to get just a tabular file. (default "png")
  -prefix string
        Output files prefix. (default "Input file prefix")
  -stdout
        Write the prediction summary to the STDOUT.
  -tmp string
        Specify temporary file directory. (default "System default tmpdir")
  -verbose
        Verbose output. Specify '-verbose=false' to avoid printing. (default true)
  -version
        Prints version.
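
Before moving on, it can be useful to check how many proteins SignalP actually flagged. The following is a minimal sketch, assuming the short-format summary is a tab-separated file (here called `protein_summary.signalp5`) in which comment lines start with `#`, column 1 is the sequence ID, and column 2 is the prediction, with `OTHER` meaning no signal peptide. The function name `sp_positive_ids` is hypothetical, and the file name and column layout should be checked against your own output.

```shell
# Print the IDs of sequences whose prediction is anything other than OTHER.
# Assumptions: tab-separated columns, ID in column 1, prediction in column 2,
# header/comment lines start with '#'.
sp_positive_ids() {
    awk -F'\t' '!/^#/ && NF >= 2 && $2 != "OTHER" { print $1 }' "$1"
}

# Example usage (the summary file name is an assumption):
# sp_positive_ids protein_summary.signalp5 > signalp_id.txt
# wc -l < signalp_id.txt
```

The resulting count should roughly match the number of records in the mature-sequence FASTA, which gives a quick consistency check on the run.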

2. Transmembrane domain prediction

Download TMHMM in the same way as above:

https://services.healthtech.dtu.dk/service.php?TMHMM-2.0

# Run TMHMM; test.fa is the mature-sequence FASTA generated in the previous step
tmhmm -noplot test.fa > tmhmm.txt
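
Before filtering, it may help to see how the predicted helix counts are distributed. This is a small sketch, assuming `tmhmm.txt` is the default (long) output and contains one `Number of predicted TMHs:` line per protein, as the grep step in this section expects. The function name `tmh_histogram` is hypothetical.

```shell
# Tally proteins by their predicted number of transmembrane helices.
# Output: one line per helix count, as "<count><TAB><number of proteins>".
tmh_histogram() {
    awk '/Number of predicted TMHs:/ { n[$NF]++ }
         END { for (k in n) printf "%s\t%d\n", k, n[k] }' "$1" | sort -n
}

# Example usage:
# tmh_histogram tmhmm.txt
```

The row for count 0 is the number of IDs that the filtering step should yield.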

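# Extract the IDs of proteins with no predicted transmembrane helix.
# Matching lines look like "# <ID> Number of predicted TMHs:  0", so the ID is the second field.
# Taking the second field with awk also drops the leading "# " and trailing space,
# which would otherwise stop the IDs from matching in the next step.
grep 'Number of predicted TMHs:  0' tmhmm.txt | awk '{print $2}' > secreted_id.txt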

# Extract the secreted proteins from the genome-wide protein set by ID
seqkit grep -f secreted_id.txt protein.fa > secreted.fa



The protein sequences extracted by ID are the predicted secreted proteins of the genome.
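
If seqkit is not installed, the same ID-based extraction can be approximated with awk. This is only a sketch, assuming one ID per line in the list file and that a record's ID is the first whitespace-delimited word after `>` in the FASTA header. The function name `filter_fasta_by_id` is hypothetical.

```shell
# Print the FASTA records in $2 whose ID (first word after '>') is listed in $1.
filter_fasta_by_id() {
    awk 'NR == FNR { if ($1 != "") keep[$1] = 1; next }   # first file: ID list
         /^>/      { id = substr($1, 2); show = (id in keep) }
         show' "$1" "$2"
}

# Example usage:
# filter_fasta_by_id secreted_id.txt protein.fa > secreted.fa
# grep -c '>' secreted.fa    # should equal the number of IDs in secreted_id.txt
```

Comparing the record count of the output with the number of IDs is a cheap way to catch IDs that failed to match, for example because of stray characters in the ID file.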
