有段时间没动生物信息学这反面的知识,忽然老白说要计算折叠率(foldrate),文件依然是fasta格式。
http://ibi.hzau.edu.cn/FDserver/cipred.php
网站上可以直接提交单个蛋白质序列以计算foldrate,这样只要写向HTTP提交表单的程序,循环取出.fastx文件中的序列并一一提交就可以了。表单格式可以通过查看网页源代码得到。比较好实现的是php和perl两种方法:
perl的程序比较简洁:
#!/usr/bin/perl
use strict;
use LWP;
my $word = 'MKQTVAAYIAKTLESAGVKRIWGVTGDSLNGLSDSLNRMGTIEWMSTRHEEVAAFAAGAEAQLSGELAVCAGSCGPGNLHLINGLFDCHRNHVPVLAIAAHIPSSEIGSGYFQETHPQELFRECSHYCELVSSPEQIPQVLAIAMRKAVLNRGVSVVVLPGDVALKPAPEGATMHWYHAPQPVVTPEEEELRKLAQLLRYSSNIALMCGSGCAGAHKELVEFAGKIKAPIVHALRGKEHVEYDNPYDVGMTGLIGFSSGFHTMMNADTLVLLGTQFPYRAFYPTDAKIIQIDINPASIGAHSKVDMALVGDIKSTLRALLPLVEEKADRKFLDKALEDYRDARKGLDDLAKPSEKAIHPQYLAQQISHFAADDAIFTCDVGTPTVWAARYLKMNGKRRLLGSFNHGSMANAMPQALGAQATEPERQVVAMCGDGGFSMLMGDFLSVVQMKLPVKIVVFNNSVLGFVAMEMKAGGYLTDGTELHDTNFARIAEACGITGIRVEKASEVDEALQRAFSIDGPVLVDVVVAKEELAIPPQIKLEQAKGFSLYMLRAIISGRGDEVIELAKTNWLR
';
my $content;
my $browser = LWP::UserAgent->new();
my $url = 'http://ibi.hzau.edu.cn/FDserver/cido.php';
$browser->env_proxy( ); # 环境配置
$browser->timeout(200); # 延时
my $response = $browser->post( $url,['textarea'=>$word, 'radiobutton' => 'Unknown','ButtonRatePred' => 'Predict Folding Rate'] );
if ($response->is_success) {
print $response->content;
} else {
print "Bad luck this time\n";
}
获取后的格式就是html,保存在 $response -> content 中。再通过正则式:
(The predicted folding rate: )(log\(Kf\) = )(.*)(/sec.)
即可获取folding rate
php要用到curl (http://bbs.phpchina.com/thread-237353-1-1.html)
这需要将php.ini文件中的 ;extension=php_curl.dll 这一行去掉分号。(否则调用 curl_init() 会报错)
<?php
function curlFunc($array) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $array['url']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CRLF, false);
if (isset($array['header']) && $array['header']) {
curl_setopt($ch, CURLOPT_HEADER, 1);
}
if (isset($array['httpheader'])) {
$headerArr = array();
foreach($array['httpheader'] as $n => $v) {
$headerArr[] = $n . ':' . $v;
}
curl_setopt($ch, CURLOPT_HTTPHEADER, $headerArr);
}
if (isset($array['referer'])) {
curl_setopt($ch, CURLOPT_REFERER, $array['referer']);
}
if (isset($array['post'])) {
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $array['post']);
}
if (isset($array['cookie'])) {
curl_setopt($ch, CURLOPT_COOKIE, $array['cookie']);
}
if (isset($array['useragent'])) {
curl_setopt($ch, CURLOPT_USERAGENT, $array['useragent']);
} else {
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
}
if (isset($array['cookiefile'])) {
curl_setopt($ch, CURLOPT_COOKIEFILE, $array['cookiefile']);
}
if (isset($array['cookiejar'])) {
curl_setopt($ch, CURLOPT_COOKIEJAR, $array['cookiejar']);
}
$r['erro'] = curl_error($ch);
$r['errno'] = curl_errno($ch);
$r['html'] = curl_exec($ch);
$r['http_code'] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
return $r;
}
$a=array('url'=>'http://feedback.hao123.com/',
'post'=>'ac=create&catalog_id=2&description=test',);
$b=curlFunc($a);
echo htmlspecialchars($b['html']);
//var_dump($b);
有了函数curlFunc,就可以提交我们的数据。
下面是封装好的两个函数:
function StringToRate($str)
{
//发送表单信息返回给$b
$a=array('url'=>'http://ibi.hzau.edu.cn/FDserver/cido.php',
'post'=>'textarea='.$str.'&radiobutton=Unknown&ButtonRatePred=Predict Folding Rate',);
$b=curlFunc($a);
//返回HTML
$c=htmlspecialchars($b['html']);
//按格式显示
//var_dump($b);
//从$b中搜索
//匹配的正则式:(The predicted folding rate: )(log\(Kf\) = )(.*)(/sec.)
$regex = '(The predicted folding rate: )(log\(Kf\) = )(.*)(/sec.)';
ereg($regex, $c, $matches);
$result = $matches[3];
return $result;
}
function FileExec($data, $output)
{
//$data为数据库路径 $output为输出文件的路径
//打开数据库
$data_file_name = $data;
$data_file_pointer=fopen($data_file_name,"r");
//打开输出文件
$output_file_name = $output;
$output_file_pointer=fopen($output_file_name,"w");
$data_file_line1 = '1';
$data_file_line2 = '2';
//以两行为单位读入数据
while($data_file_line1 != NULL && $data_file_line2 != NULL){
$data_file_line1=fgets($data_file_pointer);
$data_file_line2=fgets($data_file_pointer);
$time = StringToRate($data_file_line2);
$regex = '(DIP-[0-9]*N)';
ereg($regex, $data_file_line1, $matches);
$writer = $matches[0]."\t".$time."\r\n";
fwrite($output_file_pointer,$writer);
}
echo "congradulations!";
echo "execute file ok!";
}
===================================
本文来自:http://blog.csdn.net/zh405123507
tags:蛋白质 folding rate PERL 表单