蛋白质互作工具开发笔记(三)——批量计算折叠率的小程序

有段时间没动生物信息学这反面的知识,忽然老白说要计算折叠率(foldrate),文件依然是fasta格式。



http://ibi.hzau.edu.cn/FDserver/cipred.php


网站上可以直接提交单个蛋白质序列以计算foldrate,这样只要写向HTTP提交表单的程序,循环取出.fastx文件中的序列并一一提交就可以了。表单格式可以通过查看网页源代码得到。比较好实现的是php和perl两种方法:


perl的程序比较简洁:

#!/usr/bin/perl

use strict;
use LWP;

my $word = 'MKQTVAAYIAKTLESAGVKRIWGVTGDSLNGLSDSLNRMGTIEWMSTRHEEVAAFAAGAEAQLSGELAVCAGSCGPGNLHLINGLFDCHRNHVPVLAIAAHIPSSEIGSGYFQETHPQELFRECSHYCELVSSPEQIPQVLAIAMRKAVLNRGVSVVVLPGDVALKPAPEGATMHWYHAPQPVVTPEEEELRKLAQLLRYSSNIALMCGSGCAGAHKELVEFAGKIKAPIVHALRGKEHVEYDNPYDVGMTGLIGFSSGFHTMMNADTLVLLGTQFPYRAFYPTDAKIIQIDINPASIGAHSKVDMALVGDIKSTLRALLPLVEEKADRKFLDKALEDYRDARKGLDDLAKPSEKAIHPQYLAQQISHFAADDAIFTCDVGTPTVWAARYLKMNGKRRLLGSFNHGSMANAMPQALGAQATEPERQVVAMCGDGGFSMLMGDFLSVVQMKLPVKIVVFNNSVLGFVAMEMKAGGYLTDGTELHDTNFARIAEACGITGIRVEKASEVDEALQRAFSIDGPVLVDVVVAKEELAIPPQIKLEQAKGFSLYMLRAIISGRGDEVIELAKTNWLR
';
my $content;
my $browser = LWP::UserAgent->new();  
my $url = 'http://ibi.hzau.edu.cn/FDserver/cido.php';


$browser->env_proxy( ); # 环境配置
$browser->timeout(200); # 延时

my $response = $browser->post( $url,['textarea'=>$word, 'radiobutton' => 'Unknown','ButtonRatePred' => 'Predict Folding Rate']  );  


if ($response->is_success) {
	print $response->content;
} else {
	print "Bad luck this time\n";
}

获取后的格式就是html,保存在 $response -> content 中。再通过正则式:

(The predicted folding rate: )(log\(Kf\) = )(.*)(/sec.)

即可获取folding rate


php要用到curl (http://bbs.phpchina.com/thread-237353-1-1.html

这需要将php.ini文件中的 ;extension=php_curl.dll 这一行去掉分号。(否则调用 curl_init() 会报错)

<?php
function curlFunc($array) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $array['url']);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CRLF, false);
        if (isset($array['header']) && $array['header']) {
                curl_setopt($ch, CURLOPT_HEADER, 1);
        }
        if (isset($array['httpheader'])) {
                $headerArr = array();
                foreach($array['httpheader'] as $n => $v) {
                        $headerArr[] = $n . ':' . $v;
                }
                curl_setopt($ch, CURLOPT_HTTPHEADER, $headerArr);
        }
        if (isset($array['referer'])) {
                curl_setopt($ch, CURLOPT_REFERER, $array['referer']);
        }
        if (isset($array['post'])) {
                curl_setopt($ch, CURLOPT_POST, 1);
                curl_setopt($ch, CURLOPT_POSTFIELDS, $array['post']);
        }
        if (isset($array['cookie'])) {
                curl_setopt($ch, CURLOPT_COOKIE, $array['cookie']);
        }
        if (isset($array['useragent'])) {
                curl_setopt($ch, CURLOPT_USERAGENT, $array['useragent']);
        } else {
                curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
        }
        if (isset($array['cookiefile'])) {
                curl_setopt($ch, CURLOPT_COOKIEFILE, $array['cookiefile']);
        }
        if (isset($array['cookiejar'])) {
                curl_setopt($ch, CURLOPT_COOKIEJAR, $array['cookiejar']);
        }
        $r['erro'] = curl_error($ch);
        $r['errno'] = curl_errno($ch);
        $r['html'] = curl_exec($ch);
        $r['http_code'] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        return $r;
}

$a=array('url'=>'http://feedback.hao123.com/',
         'post'=>'ac=create&catalog_id=2&description=test',);
$b=curlFunc($a);
echo htmlspecialchars($b['html']);
//var_dump($b);

有了函数curlFunc,就可以提交我们的数据。

下面是封装好的两个函数:

function  StringToRate($str)
{
//发送表单信息返回给$b
$a=array('url'=>'http://ibi.hzau.edu.cn/FDserver/cido.php',
         'post'=>'textarea='.$str.'&radiobutton=Unknown&ButtonRatePred=Predict Folding Rate',);
$b=curlFunc($a);


//返回HTML
$c=htmlspecialchars($b['html']);
//按格式显示
//var_dump($b);

//从$b中搜索
//匹配的正则式:(The predicted folding rate: )(log\(Kf\) = )(.*)(/sec.)

$regex = '(The predicted folding rate: )(log\(Kf\) = )(.*)(/sec.)';

ereg($regex, $c, $matches);

$result = $matches[3];

return $result;

}


function FileExec($data, $output)
{
	//$data为数据库路径  $output为输出文件的路径

	//打开数据库
	$data_file_name = $data;
	$data_file_pointer=fopen($data_file_name,"r");

	//打开输出文件
	$output_file_name = $output;
	$output_file_pointer=fopen($output_file_name,"w");

	$data_file_line1 = '1';
	$data_file_line2 = '2';

	//以两行为单位读入数据
	while($data_file_line1 != NULL && $data_file_line2 != NULL){
		$data_file_line1=fgets($data_file_pointer);
		$data_file_line2=fgets($data_file_pointer);

		$time = StringToRate($data_file_line2);
		$regex = '(DIP-[0-9]*N)';
		ereg($regex, $data_file_line1, $matches);
		
		$writer = $matches[0]."\t".$time."\r\n";
		fwrite($output_file_pointer,$writer);
	}

	echo "congradulations!";
	echo "execute file ok!";



}



===================================

本文来自:http://blog.csdn.net/zh405123507

tags:蛋白质 folding rate PERL 表单



评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值