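1. Reinstalling PROFphd
1) Run ./PROFphd.run
2) Accept the default installation options
3) Change the file suffixes under /prof/bin/
!!! Rename *.LINUX back to *.UNKNOWN
4) Run the installation test: /prof/scr/test-install.pl
5) Predict: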
./prof ./exa/1ppt.hssp fileRdb=1ppt.out sec
This predicts the secondary structure.
Result:
********************************************************************
# Perl-RDB
# PROFsec
#
# Copyright : Burkhard Rost, CUBIC NYC / LION Heidelberg
# Email : rost@columbia.edu
# WWW : http://cubic.bioc.columbia.edu
# Version : 2000.02
#
# --------------------------------------------------------------------------------
# About your protein :
#
# VALUE PROT_ID : 1ppt
# VALUE PROT_NAME : PANCREATIC HORMONE
# VALUE PROT_NCHN : 1
# VALUE PROT_NRES : 36
# VALUE PROT_NALI : 30
# VALUE PROT_NFAR : 25
# VALUE PROT_NFAR50-5: 7
# VALUE PROT_NFAR40-5: 5
# VALUE PROT_NFAR30-5: 4
# VALUE PROT_NFAR5-5: 0
#
# --------------------------------------------------------------------------------
# About the alignment:
#
# VALUE ALI_ORIG : ./exa/1ppt.hssp
#
# --------------------------------------------------------------------------------
# About PROF specifics:
#
# VALUE PROF_FPAR : sec=/picb/home40/cyd/dys/backup/PROFphd/prof-tmp/prof/net/PROFsec_best.par
# VALUE PROF_NNET : sec=2
#
# --------------------------------------------------------------------------------
# Notation used :
#
# ------------------------------------------------------------------------
# NOTATION HEADER : PROTEIN
# NOTATION PROT_ID : identifier of protein [w]
# NOTATION PROT_NAME : name of protein [w]
# NOTATION PROT_NRES : number of residues [d]
# NOTATION PROT_NCHN : number of chains (if PDB protein) [d]
# NOTATION PROT_NALI : number of proteins aligned in family [d]
# NOTATION PROT_NFAR : number of distant relatives [d]
#
# ------------------------------------------------------------------------
# NOTATION HEADER : ALIGNMENT
# NOTATION HEADER : ALIGNMENT: input file
#
# ------------------------------------------------------------------------
# NOTATION HEADER : INTERNAL
# NOTATION PROF_FPAR : name of parameter file, used [w]
# NOTATION PROF_NNET : number of networks used for prediction [d]
#
#
# ------------------------------------------------------------------------
# NOTATION BODY : PROTEIN
# NOTATION NO : counting residues [d]
# NOTATION AA : amino acid one letter code [A-Z!a-z]
# NOTATION CHN : protein chain [A-Z!a-z]
#
#
# ------------------------------------------------------------------------
# NOTATION BODY : PROFsec
# NOTATION OHEL : observed secondary structure: H=helix, E=extended (sheet), blank=other (loop)
# NOTATION PHEL : PROF predicted secondary structure: H=helix, E=extended (sheet), blank=other (loop) PROF = PROF: Profile network prediction HeiDelberg
# NOTATION RI_S : reliability index for PROFsec prediction (0=lo 9=high) Note: for the brief presentation strong predictions marked by '*'
# NOTATION pH : 'probability' for assigning helix (1=high, 0=low)
# NOTATION pE : 'probability' for assigning strand (1=high, 0=low)
# NOTATION pL : 'probability' for assigning neither helix, nor strand (1=high, 0=low)
# NOTATION OtH : actual neural network output from PROFsec for helix unit
# NOTATION OtE : actual neural network output from PROFsec for strand unit
# NOTATION OtL : actual neural network output from PROFsec for 'no-regular' unit
#
# --------------------------------------------------------------------------------
#
No  AA  OHEL  PHEL  RI_S  pH  pE  pL  OtH  OtE  OtL
1   G   L     L     9     0   0   9   0    1    97
2   P   L     L     9     0   0   9   1    2    93
3   S   L     L     8     0   0   9   2    4    91
4   Q   L     L     8     0   0   9   1    6    88
5   P   L     L     8     0   0   9   2    5    91
6   T   L     L     8     0   0   8   1    9    89
7   Y   L     L     8     0   0   9   0    10   91
8   P   L     L     8     0   0   9   2    5    90
9   G   L     L     8     0   0   8   4    6    87
10  D   L     L     7     0   0   8   4    7    84
11  D   L     L     8     0   0   9   1    6    90
12  A   L     L     9     0   0   9   1    4    95
13  P   L     L     8     0   0   9   7    1    90
14  V   H     H     7     9   0   0   87   0    8
15  E   H     H     8     9   0   0   92   0    4
16  D   H     H     8     9   0   0   93   0    5
17  L   H     H     9     9   0   0   93   0    3
18  I   H     H     8     9   0   0   90   0    4
19  R   H     H     8     9   0   0   92   0    4
20  F   H     H     9     9   0   0   94   0    3
21  Y   H     H     9     9   0   0   95   0    2
22  D   H     H     9     9   0   0   94   0    3
23  N   H     H     9     9   0   0   94   0    3
24  L   H     H     8     9   0   0   93   0    5
25  Q   H     H     8     9   0   0   92   0    4
26  Q   H     H     8     9   0   0   89   0    7
27  Y   H     H     8     9   0   0   91   1    5
28  L   H     H     8     9   0   0   88   2    7
29  N   H     H     7     8   0   1   86   2    10
30  V   H     H     7     8   0   1   85   3    10
31  V   H     H     3     6   0   2   64   4    27
32  T   L     L     0     4   0   4   41   6    44
33  R   L     L     2     3   0   6   27   7    55
34  H   L     L     3     2   1   6   19   18   56
35  R   L     L     5     1   1   6   12   18   68
36  Y   L     L     9     0   0   9   1    3    94
Good!!!!!!!!!!!!!!!!!!!!!!!!!
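For later comparison against a known structure it can help to collapse the PHEL column of such an output file into a single string. The following is only a sketch: the function name phel_string is made up here, and it assumes the table looks like the one above (whitespace-separated fields, residue number in field 1, PHEL in field 4, loops written as L rather than blanks).

```shell
# Print the predicted secondary structure (PHEL column) of a PROF RDB file
# as one string. Assumption: data rows start with a numeric residue number
# and PHEL is the 4th whitespace-separated field, as in the output above.
phel_string() {
    awk '
        /^#/ { next }                                    # skip RDB comment lines
        $1 ~ /^[0-9]+$/ && NF >= 4 { printf "%s", $4 }   # data rows only
        END { printf "\n" }
    ' "$1"
}
```

Usage would be something like phel_string 1ppt.out. If a real output file writes loop residues as blanks, the field positions shift and this sketch would need adjusting.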
Note:
The PROFphd input file can be FASTA, but it must have exactly this format:
>seq_name
ASJOOEWFNFNSK…………..
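A quick check before running a prediction can catch files that do not match this layout. This is a rough sketch, not part of PROFphd: is_single_fasta is a hypothetical helper, and it reads the required format as exactly one header line starting with ">" followed by one sequence line made of letters only.

```shell
# Return 0 if the file holds exactly one ">name" line followed by one
# sequence line. The one-line-sequence rule is an assumption based on
# the format shown above.
is_single_fasta() {
    awk '
        NF == 0 { next }                            # ignore blank lines
        { n++ }
        n == 1 && $0 !~ /^>/ { bad = 1 }            # first line must be the header
        n == 2 && $0 !~ /^[A-Za-z]+$/ { bad = 1 }   # second line: letters only
        END { if (bad || n != 2) exit 1 }
    ' "$1"
}
```

For example, is_single_fasta my_seq.fasta || echo "unexpected format" would flag a multi-sequence file before it reaches prof.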
Running the prediction on a multi-sequence file fails with:
5..--> ERROR during initialising copf
msg=*** ERROR copf:brIniErr: after lib-ut:brIniErr
*** ERROR file (fileMatGcg) '/picb/home40/cyd/dys/backup/PROFphd/prof-tmp/phd/mat/Maxhom_McLachlan.metric' missing!
*** ERROR prof:protRd: after protRdAli, fileIn=mutli_seq.fasta,
***
message from sbr:
*** ERROR protRdAli: failed conversion to HSSP
ok lib-col:sysRunProg
*** ERROR prof: line number of error=106
*** ERROR failed prof:full:doOne dbfile=mutli_seq.fasta, chain=*!
*** ERROR msg from where it failed:
*** ERROR doOne: reading db=mutli_seq.fasta, chain=*! (&protRd)
line=1431
*** ERROR protRd: *** ERROR prof:protRd: after protRdAli, fileIn=mutli_seq.fasta,
***
message from sbr:
*** ERROR protRdAli: failed conversion to HSSP
ok lib-col:sysRunProg
New approach: repeatedly invoke the single-sequence prediction command.
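The idea could be sketched as a loop like the one below. The directory layout, the .fasta suffix and the function name predict_all are assumptions made for illustration; the argument form (input file, fileRdb=output, sec) is copied from the 1ppt example, and PROF_CMD is a hypothetical override that defaults to ./prof.

```shell
# Run the single-sequence prediction once per input file.
# Assumptions: inputs are <in_dir>/*.fasta, outputs go to <out_dir>/<name>.out.
predict_all() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    for f in "$in_dir"/*.fasta; do
        [ -e "$f" ] || continue                  # nothing matched the pattern
        base=$(basename "$f" .fasta)
        # Same argument form as the 1ppt example above.
        "${PROF_CMD:-./prof}" "$f" "fileRdb=$out_dir/$base.out" sec \
            || echo "prediction failed for $f" >&2
    done
}
```

A call such as predict_all seqs out would then process every per-sequence file in turn and keep going if one of them fails.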
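1. Build the input files: put each sequence name together with its sequence into a file of its own, producing 16632 sub-files.
1) Create a file that contains only the names and sequences: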
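/*
combine.cpp
By DYS
Function: Combine each sequence name with its amino-acid sequence
Input files: name.txt protein.txt
Output file: name_protein.txt
*/
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    ifstream name("name.txt");       // sequence names, whitespace-separated
    ifstream aa("protein.txt");      // sequences, in the same order as the names
    ofstream combine("name_protein.txt");
    if (!name || !aa || !combine)
    {
        cerr << "cannot open name.txt, protein.txt or name_protein.txt" << endl;
        return 1;
    }
    string name_str, aa_str;
    // Read one name and one sequence per iteration; stop when either file ends.
    while (name >> name_str && aa >> aa_str)
    {
        combine << name_str << '\n' << aa_str << '\n';
    }
    return 0;
}//int main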
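The combined file still has to be cut into one file per sequence. One possible way, shown only as a sketch: split_pairs and the seq_N.fasta naming are invented for this example, and it assumes name_protein.txt strictly alternates name line and sequence line. A ">" is prepended only when the name does not already carry one.

```shell
# Split alternating name/sequence lines into one two-line file per sequence.
# Output names seq_1.fasta, seq_2.fasta, ... are hypothetical.
split_pairs() {
    in_file=$1
    out_dir=$2
    mkdir -p "$out_dir"
    awk -v dir="$out_dir" '
        NR % 2 == 1 {                            # odd line: sequence name
            n = (NR + 1) / 2
            out = dir "/seq_" n ".fasta"
            hdr = ($0 ~ /^>/) ? $0 : ">" $0      # add ">" only if missing
            print hdr > out
            next
        }
        { print $0 > out; close(out) }           # even line: sequence, then close the file
    ' "$in_file"
}
```

Closing each file right after writing it keeps the number of open files constant, which matters with thousands of outputs. A call would look like split_pairs name_protein.txt seqs.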
Note on expr usage: in expr num1 + num2, the two numbers and the plus sign must be separated by spaces.
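A small illustration of the difference, using throwaway variable names:

```shell
# expr treats each argument separately, so the operator must stand alone.
i=1
i=$(expr "$i" + 1)     # spaces around +: arithmetic is performed
echo "$i"              # prints 2
joined=$(expr 1+1)     # no spaces: one string argument, echoed back unchanged
echo "$joined"         # prints 1+1
```

Without the spaces expr sees a single word and does no addition, which is easy to miss inside a counting loop.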